Floating point

Proceedings Article

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob¹, Skirmantas Kligys¹, Bo Chen¹, Menglong Zhu¹, Matthew Tang¹, Andrew Howard¹, Hartwig Adam¹, Dmitry Kalenichenko¹

Google¹

18 Jun 2018

TL;DR: A quantization scheme is proposed that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware.

Abstract: The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.

2,067 citations
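The abstract only states that inference runs in integer-only arithmetic. As a minimal sketch of how that can work, assuming the affine mapping r ≈ S * (q - Z) between a real value r, a uint8 value q, a real scale S and an integer zero-point Z, the example below quantizes two vectors and computes their dot product with integer arithmetic, rescaling only at the end. The helper names (`choose_qparams`, `quantize`, `dequantize`, `int_dot`) are hypothetical, and the final rescale is kept in floating point for readability; a fully integer pipeline would perform that step with a fixed-point multiplier as well.

```python
# Minimal sketch of affine uint8 quantization, r ~= S * (q - Z).
# choose_qparams, quantize, dequantize and int_dot are hypothetical helper names.

def choose_qparams(rmin: float, rmax: float, qmin: int = 0, qmax: int = 255):
    """Pick a real scale S and integer zero-point Z mapping [rmin, rmax] onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)   # include 0.0 so real zero is exactly representable
    scale = (rmax - rmin) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = max(qmin, min(qmax, round(qmin - rmin / scale)))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Round each real value to the nearest grid point and clamp to the uint8 range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map quantized values back to (approximate) real values."""
    return [scale * (q - zero_point) for q in qvalues]

def int_dot(qa, za, qb, zb):
    """Dot product of two quantized vectors using only integer operations.

    Python ints are unbounded; on hardware this sum would typically go into an int32 accumulator.
    """
    return sum((a - za) * (b - zb) for a, b in zip(qa, qb))

a = [0.5, -1.0, 2.0, 0.25]
b = [1.5, 0.75, -0.5, 3.0]
sa, za = choose_qparams(min(a), max(a))
sb, zb = choose_qparams(min(b), max(b))
qa, qb = quantize(a, sa, za), quantize(b, sb, zb)

acc = int_dot(qa, za, qb, zb)          # integer-only accumulation
approx = sa * sb * acc                 # single rescale at the end (float here for clarity)
exact = sum(x * y for x, y in zip(a, b))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Folding the zero-points into the integer accumulation is what keeps the inner loop free of floating point; only one scale factor per output remains to be applied.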

Book

Accuracy and Stability of Numerical Algorithms

Nicholas J. Higham

01 Jan 1996

TL;DR: This book gives a thorough, up-to-date treatment of the behavior of numerical algorithms in finite precision arithmetic by combining algorithmic derivations, perturbation theory, and rounding error analysis.

Abstract: From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method? The answers to these and many related questions are included here. This book gives a thorough, up-to-date treatment of the behavior of numerical algorithms in finite precision arithmetic. It combines algorithmic derivations, perturbation theory, and rounding error analysis. Software practicalities are emphasized throughout, with particular reference to LAPACK and MATLAB. The best available error bounds, some of them new, are presented in a unified format with a minimum of jargon. Because of its central role in revealing problem sensitivity and providing error bounds, perturbation theory is treated in detail. Historical perspective and insight are given, with particular reference to the fundamental work of Wilkinson and Turing, and the many quotations provide further information in an accessible format. The book is unique in that algorithmic developments and motivations are given succinctly and implementation details minimized, so that attention can be concentrated on accuracy and stability results. Here, in one place and in a unified notation, is error analysis for most of the standard algorithms in matrix computations. Not since Wilkinson's Rounding Errors in Algebraic Processes (1963) and The Algebraic Eigenvalue Problem (1965) has any volume treated this subject in such depth. A number of topics are treated that are not usually covered in numerical analysis textbooks, including floating point summation, block LU factorization, condition number estimation, the Sylvester equation, powers of matrices, finite precision behavior of stationary iterative methods, Vandermonde systems, and fast matrix multiplication. 
Although not designed specifically as a textbook, this volume is a suitable reference for an advanced course, and could be used by instructors at all levels as a supplementary text from which to draw examples, historical perspective, statements of results, and exercises (many of which have never before appeared in textbooks). The book is designed to be a comprehensive reference and its bibliography contains more than 1100 references from the research literature.
Audience: Specialists in numerical analysis as well as computational scientists and engineers concerned about the accuracy of their results will benefit from this book. Much of the book can be understood with only a basic grounding in numerical analysis and linear algebra.
About the Author: Nicholas J. Higham is a Professor of Applied Mathematics at the University of Manchester, England. He is the author of more than 40 publications and is a member of the editorial boards of the SIAM Journal on Matrix Analysis and Applications and the IMA Journal of Numerical Analysis. His book Handbook of Writing for the Mathematical Sciences was published by SIAM in 1993.

1,911 citations
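The abstract opens with "What is the most accurate way to sum floating point numbers?" One standard answer to compare against naive summation is Kahan's compensated summation, sketched below in Python (the language choice and test data are assumptions, not taken from the book). The naive version is written as an explicit loop on purpose, since CPython 3.12 and later already compensate floating-point sums inside the built-in `sum()`. `math.fsum` serves as a correctly rounded reference.

```python
import math

def naive_sum(values):
    """Left-to-right recursive summation: every addition is rounded."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Kahan compensated summation."""
    total = 0.0
    compensation = 0.0  # rounding error from the previous addition (with a sign flip)
    for x in values:
        y = x - compensation            # re-inject the part lost last time
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # algebraically zero; in floating point, the lost bits
        total = t
    return total

values = [1.0] + [1e-16] * 10_000       # mathematically close to 1 + 1e-12
naive = naive_sum(values)               # each 1e-16 is below half an ulp of 1.0 and vanishes
kahan = kahan_sum(values)
reference = math.fsum(values)           # correctly rounded sum of the stored doubles
print(naive, kahan, reference)
```

The naive loop stays at exactly 1.0, while the compensated sum recovers the contribution of the small terms to within a few units in the last place.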

Accuracy and Stability of Numerical Algorithms

Nicholas J. Higham

01 Jan 2002

TL;DR: This book gives a thorough, up-to-date treatment of the behavior of numerical algorithms in finite precision arithmetic, combining algorithmic derivations, perturbation theory, and rounding error analysis.

1,565 citations
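Higham's abstract also asks how accurate Gaussian elimination is. A classic textbook illustration (not claimed to be the book's own example) is a 2x2 system with a tiny leading entry: without pivoting, the huge multiplier swamps the second row and the computed x1 is badly wrong, while partial pivoting (swapping in the row with the largest pivot) recovers the solution. The `solve` helper below is a hypothetical, minimal dense solver for illustration only.

```python
def solve(A, b, pivot=True):
    """Gaussian elimination with optional partial pivoting, then back substitution."""
    n = len(A)
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    for k in range(n):
        if pivot:
            # Bring the row with the largest |A[i][k]| (i >= k) into the pivot position.
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            A[k], A[p] = A[p], A[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]  # multiplier; huge when the pivot is tiny
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

eps = 1e-20
A = [[eps, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]                      # exact solution is very close to [1, 1]
print(solve(A, b, pivot=False))     # x1 is lost to cancellation
print(solve(A, b, pivot=True))
```

Without pivoting, 1 - 1e20 and 2 - 1e20 both round to -1e20, so the information in the second equation is destroyed before back substitution even starts.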

Journal Article•DOI•

What every computer scientist should know about floating-point arithmetic

David Goldberg¹

PARC¹

01 Mar 1991, ACM Computing Surveys

TL;DR: This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems, and concludes with examples of how computer system builders can better support floating point.

Abstract: Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

1,372 citations
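Goldberg's tutorial starts from floating-point representation, rounding error, and exceptions such as overflow. The short Python session below illustrates each point, assuming Python floats are IEEE binary64 (true on mainstream platforms). Note that how overflow is surfaced here is Python-specific: float multiplication quietly returns `inf`, while `math.exp` raises `OverflowError`; this is a property of the language, not of the tutorial.

```python
import math
import sys
from fractions import Fraction

# 0.1 has no finite binary expansion, so the stored double is only the nearest binary64 value.
exact_tenth = Fraction(0.1)
print(exact_tenth)                    # the exact rational value actually stored

# Each operand and the sum are rounded, so the "obvious" identity fails.
sum_equal = (0.1 + 0.2 == 0.3)
print(sum_equal)
print(math.isclose(0.1 + 0.2, 0.3))   # compare with a relative tolerance instead

# Machine epsilon: the gap between 1.0 and the next larger double.
eps = sys.float_info.epsilon
print(eps == 2.0 ** -52)
print(1.0 + eps / 2 == 1.0)           # exact tie, rounded to even (back to 1.0)

# Overflow: multiplication silently yields inf, while math.exp raises.
overflowed = sys.float_info.max * 2.0
print(overflowed)
try:
    math.exp(1000.0)
    exp_raised = False
except OverflowError:
    exp_raised = True
print(exp_raised)
```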

Standard

IEEE Standard for Floating-Point Arithmetic

Dan Zuras, M. F. Cowlishaw, Alex Aiken, Matthew Applegate, David H. Bailey, Steve Bass, Dileep Bhandarkar, Mahesh Bhat, David Bindel, Sylvie Boldo, Stephen Canon, Steven R. Carlough, Marius Cornea, John H. Crawford, Joseph D. Darcy, Debjit Das Sarma, Marc Daumas, Bob Davis, Mark Davis, Dick Delp, James Demmel, Mark A. Erle, Hossam A. H. Fahmy, J. P. Fasano, Richard J. Fateman, Eric Feng, Warren E. Ferguson, Alex Fit-Florea, Laurent Fournier, Chip Freitag, Ivan Godard, Roger A. Golliver, David Gustafson, Michel Hack, John R. Harrison, John Hauser, Yozo Hida, Chris N. Hinds, Graydon Hoare, David G. Hough, Jerry Huck, Jim Hull, Michael Ingrassia, David V. James, Rick James, William Kahan, John Kapernick, Richard Karpinski, Jeff Kidder, Plamen Koev, Ren-Cang Li, Zhishun A. Liu, Raymond Mak, Peter Markstein, David W. Matula, Guillaume Melquiond, Nobuyoshi Mori, Ricardo Morin, Ned Nedialkov, Craig Nelson, Stuart Oberman, Jon Okada, Ian Ollmann, Michael Parks, Tom Pittman, Eric Postpischil, Jason Riedy, Eric M. Schwarz, David Scott, Don Senzig, Ilya Sharapov, Jim Shearer, Michael Siu, Ron Smith, Chuck Stevens, Peter Tang, Pamela J. Taylor, James W. Thomas, Brandon Thompson, Wendy Thrash, Neil Toda, Son Dao Trong, Leonard Tsai, Charles Tsen, Fred Tydeman, Liang Wang, Scott Westbrook, Steve Winkler, Anthony Wood, Umit Yalcinalp, Fred Zemke, Paul Zimmermann

01 Jan 2008

1,354 citations
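The 2008 standard defines, among other formats, binary64: 1 sign bit, an 11-bit exponent stored with a bias of 1023, and a 52-bit trailing significand field. The sketch below, assuming CPython floats are binary64 (as on mainstream platforms), reinterprets a float's 8 bytes with `struct` and classifies the value from its fields. `decode_binary64` and `classify` are hypothetical helper names for illustration.

```python
import struct

def decode_binary64(x: float) -> tuple[int, int, int]:
    """Return the (sign, biased exponent, trailing significand) fields of a binary64 value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret the 8 bytes as an unsigned int
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction

def classify(x: float) -> str:
    """Classify a binary64 value from its exponent and fraction fields."""
    _, exponent, fraction = decode_binary64(x)
    if exponent == 0x7FF:                 # all-ones exponent: infinity or NaN
        return "NaN" if fraction else "infinity"
    if exponent == 0:                     # all-zeros exponent: zero or subnormal
        return "subnormal" if fraction else "zero"
    return "normal"

for value in [1.0, -2.0, 0.1, 5e-324, -0.0, float("inf"), float("nan")]:
    s, e, f = decode_binary64(value)
    print(f"{value!r:>8}  sign={s} exponent={e:4d} fraction={f:#015x}  {classify(value)}")
```

The two reserved exponent encodings (all zeros and all ones) are what give the format its signed zeros, subnormals, infinities, and NaNs.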

Year	Papers
2023	105
2022	206
2021	214
2020	302
2019	304
2018	309

Papers published on a yearly basis
