Towards a Better 16-bit Number Representation for Training Neural Networks

Abstract

Error resilience in neural networks has enabled the adoption of low-precision floating-point representations for mixed-precision training, improving efficiency. Although the IEEE 754 standard has long defined a 16-bit float representation, several alternatives targeting mixed-precision training have since emerged. However, their varying numerical properties and hardware characteristics make them more or less suitable for the task, and there is as yet no commonly accepted 16-bit floating-point representation for neural network training. In this work, we evaluate all the 16-bit float variants and the emerging posit™ number representations proposed for neural network training on a set of Convolutional Neural Networks (CNNs) and other benchmarks to compare their suitability. Posits generally achieve better results, indicating that their non-uniform accuracy distribution is more conducive to the training task. Our analysis suggests that, rather than providing the same accuracy for all weight values as floats do, providing greater accuracy for the more commonly occurring weights of larger magnitude improves training results, challenging previously held assumptions and bringing new insight into the dynamic range and precision requirements. We also evaluate the hardware efficiency of these formats for mixed-precision training using FPGA implementations. Finally, we propose statistics based on the distribution of network weight values as a heuristic for selecting the number representation to be used.