Proceedings ArticleDOI

Decimal floating-point: algorism for computers

15 Jun 2003, pp. 104-111
TL;DR: This work introduces a new approach to decimal floating point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard.
Abstract: Decimal arithmetic is the norm in human calculations, and human centric applications must use a decimal floating point arithmetic to achieve the same results. Initial benchmarks indicate that some applications spend 50% to 90% of their time in decimal processing, because software decimal arithmetic suffers a 100× to 1000× performance penalty over hardware. The need for decimal floating point in hardware is urgent. Existing designs, however, either fail to conform to modern standards or are incompatible with the established rules of decimal arithmetic. We introduce a new approach to decimal floating point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard. A hardware implementation of this arithmetic is in development, and it is expected that this will significantly accelerate a wide variety of applications.
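The abstract's claim that human-centric applications need decimal arithmetic "to achieve the same results" can be illustrated with a minimal sketch. It uses Python's standard `decimal` module as a stand-in for a software decimal floating-point implementation; the helper names `add_binary` and `add_decimal` are hypothetical and are not taken from the paper.

```python
from decimal import Decimal


def add_binary(a: float, b: float) -> float:
    """Add two values using binary (IEEE 754 double) floating point."""
    return a + b


def add_decimal(a: str, b: str) -> Decimal:
    """Add two values using software decimal floating point.

    Operands are passed as strings so that no binary rounding
    happens before the decimal conversion.
    """
    return Decimal(a) + Decimal(b)


if __name__ == "__main__":
    # 0.1 and 0.2 have no exact binary representation, so the sum drifts.
    print(add_binary(0.1, 0.2) == 0.3)                  # False
    # The same sum is exact in decimal arithmetic.
    print(add_decimal("0.1", "0.2") == Decimal("0.3"))  # True
    # Decimal arithmetic also keeps trailing zeros, as hand calculation does.
    print(add_decimal("1.20", "1.30"))                  # 2.50
```

The software route shown here is the kind of implementation whose performance penalty the abstract describes; the sketch demonstrates only the difference in results, not the difference in speed.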


Citations
StandardDOI
01 Jan 2008

1,354 citations

Proceedings ArticleDOI
25 Jun 2007
TL;DR: Two novel architectures for parallel decimal multipliers are introduced based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits and three schemes for fast and efficient generation of partial products in parallel are presented.
Abstract: This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.

130 citations


Cites background from "Decimal floating-point: algorism fo..."

  • ...Thus, it is expected that microprocessor manufacturers include decimal floating-point units in their products oriented to mainframe servers to satisfy the high performance demands of current financial, commercial and user-oriented applications [3]....


Journal ArticleDOI
TL;DR: The IBM POWER6 processor, on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads, is the first commercial hardware implementation of the decimal floating-point formats defined by the IEEE 754R standard.
Abstract: The IBM POWER6™ microprocessor core includes two accelerators for increasing performance of specific workloads. The vector multimedia extension (VMX) provides a vector acceleration of graphic and scientific workloads. It provides single instructions that work on multiple data elements. The instructions separate a 128-bit vector into different components that are operated on concurrently. The decimal floating-point unit (DFU) provides acceleration of commercial workloads, more specifically, financial transactions. It provides a new number system that performs implicit rounding to decimal radix points, a feature essential to monetary transactions. The IBM POWER™ processor instruction set is substantially expanded with the addition of these two accelerators. The VMX architecture contains 176 instructions, while the DFU architecture adds 54 instructions to the base architecture. The IEEE 754R Binary Floating-Point Arithmetic Standard defines decimal floating-point formats, and the POWER6 processor--on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads--is the first commercial hardware implementation of this format.

101 citations

Journal ArticleDOI
TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

93 citations


Cites methods from "Decimal floating-point: algorism fo..."

  • ...Although software DFP implementations [4], [6] satisfy the precision requirements, they are about an order of magnitude slower than hardware implementations [14], [34] and could not satisfy the high-performance demands of future financial, commercial, and user-oriented applications [5]....


Proceedings ArticleDOI
01 Oct 2006
TL;DR: The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.
Abstract: In this work, we present a combinational decimal multiply unit which can be pipelined to reach the desired throughput. With respect to previous implementations of decimal multiplication, the proposed unit is combinational (parallel) and not sequential, has a simpler recoding of the operands which reduces the number of partial product precomputations and uses counters to eliminate the need of the decimal equivalent of a 4:2 adder. The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.

88 citations


Cites background from "Decimal floating-point: algorism fo..."

  • ...Hardware implementations of decimal arithmetic units have recently gained importance because they provide higher accuracy in financial applications [1]....


References
Book ChapterDOI
01 Jun 1989
TL;DR: It is intended that the machine be fully automatic in character, i.e. independent of the human operator after the computation starts.
Abstract: Inasmuch as the completed device will be a general-purpose computing machine it should contain certain main organs relating to arithmetic, memory- storage, control and connection with the human operator. It is intended that the machine be fully automatic in character, i.e. independent of the human operator after the computation starts. A fuller discussion of the implications of this remark will be given in Chapter 3 below.

412 citations


"Decimal floating-point: algorism fo..." refers background in this paper

  • ...This paper introduces a new approach to decimal floating-point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard....


Book
01 Oct 2003
TL;DR: This book provides an introduction to and technical specification of the four major new features of C# 2.0: Generics, Anonymous Methods, Iterators, and Partial Types.
Abstract: C# is a simple, modern, object-oriented, and type-safe programming language that combines the high productivity of rapid application development languages with the raw power of C and C++. Written by the language's architect and design team members, The C# Programming Language is the definitive technical reference for C#. Moving beyond the online documentation, the book provides the complete specification of the language along with descriptions, reference materials, and code samples from the C# design team. The first part of the book opens with an introduction to the language to bring readers quickly up to speed on the concepts of C#. Next follows a detailed and complete technical specification of the C# 1.0 language, as delivered in Visual Studio .NET 2002 and 2003. Topics covered include Lexical Structure, Types, Variables, Conversions, Expressions, Statements, Namespaces, Exceptions, Attributes, and Unsafe Code. The second part of the book provides an introduction to and technical specification of the four major new features of C# 2.0: Generics, Anonymous Methods, Iterators, and Partial Types. Reference tabs and an exhaustive print index allow readers to easily navigate the text and quickly find the topics that interest them most. An enhanced online index allows readers to quickly and easily search the entire text for specific topics. With the recent acceptance of C# as a standard by both the International Organization for Standardization (ISO) and ECMA, understanding the C# specification has become critical. The C# Programming Language is the definitive reference for programmers who want to acquire an in-depth knowledge of C#.

380 citations

Journal ArticleDOI
TL;DR: This paper was first published in Mathematical Tables and Other Aids to Computation just after the ENIAC was announced in 1946 and ranks as one of the classic descriptions of the ENIAC.
Abstract: This paper was first published in Mathematical Tables and Other Aids to Computation just after the ENIAC was announced in 1946. It was the major source of technical information about the machine for the scientific world of the time. Even today it ranks as one of the classic descriptions of the ENIAC. This paper is reprinted by the kind permission of the American Mathematical Society and the National Academy of Sciences.

100 citations


"Decimal floating-point: algorism fo..." refers background in this paper

  • ...Initial benchmarks indicate that some applications spend 50% to 90% of their time in decimal processing, because software decimal arithmetic suffers a 100× to 1000× performance penalty over hardware....


Journal ArticleDOI
Cohen, Hamacher
TL;DR: The design of an arithmetic unit called CADAC (clean arithmetic with decimal base and controlled precision) is described, which combines both complex and interval arithmetic at the level of a programming language such as Fortran or PL/I.
Abstract: This paper describes the design of an arithmetic unit called CADAC (clean arithmetic with decimal base and controlled precision). Programming language specifications for carrying out "ideal" floating-point arithmetic are described first. These specifications include detailed requirements for dynamic precision control and exception handling, along with both complex and interval arithmetic at the level of a programming language such as Fortran or PL/I.

57 citations


Additional excerpts

  • ...#) and in many other software libraries....
