
Showing papers on "Decimal" published in 2010


Journal ArticleDOI
TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

93 citations
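
The signed-digit radix-10 recoding mentioned in the entry above can be illustrated with a small software model. This is only a sketch under an assumption: it uses the common digit set {-5, ..., 5}, which the abstract does not spell out, and the function names `sd_radix10_recode` and `sd_value` are hypothetical. It is not the recoder circuit of the paper.

```python
def sd_radix10_recode(n: int) -> list[int]:
    """Recode a non-negative integer into signed digits in [-5, 5].

    Digits are returned least significant first. The digit set {-5..5}
    is an assumption made for illustration.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = [int(ch) for ch in reversed(str(n))]  # plain decimal digits, LSD first
    recoded = []
    carry_in = 0
    for d in digits:
        # The outgoing transfer depends only on this digit, never on carry_in,
        # so every position could be recoded in parallel in hardware.
        carry_out = 1 if d >= 5 else 0
        recoded.append(d + carry_in - 10 * carry_out)
        carry_in = carry_out
    recoded.append(carry_in)  # final transfer digit (0 or 1)
    return recoded


def sd_value(recoded: list[int]) -> int:
    """Evaluate a signed-digit list (LSD first) back to an integer."""
    return sum(d * 10**i for i, d in enumerate(recoded))
```

With this digit set, only the multiplicand multiples 1X to 5X (and their negations) are needed to form every partial product, which is one way to read the "simplified set of multiplicand multiples" in the abstract.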


Journal ArticleDOI
TL;DR: In this paper, a large sample of children from Grades 3 to 6 performed a numerical comparison task on different categories of pairs of decimal fractions and the success rate and the type of error they made varied with age and categories.

62 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: This paper presents a novel high-speed, low-power architecture for fixed-bit binary to BCD conversion which is at least 28% better in terms of power-delay product than the existing designs.
Abstract: Decimal data processing applications have grown exponentially in recent years, thereby increasing the need for hardware support for decimal arithmetic. Binary to BCD conversion forms the basic building block of decimal digit multipliers. This paper presents a novel high-speed, low-power architecture for fixed-bit binary to BCD conversion which is at least 28% better in terms of power-delay product than the existing designs.

39 citations
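
For readers unfamiliar with binary to BCD conversion, the textbook shift-and-add-3 ("double dabble") method gives a feel for the operation. The sketch below is that generic method, not the architecture of the paper above; the function name `binary_to_bcd` and the default width of 8 bits are assumptions.

```python
def binary_to_bcd(value: int, bits: int = 8) -> list[int]:
    """Convert an unsigned binary value to BCD digits (most significant first)
    using the shift-and-add-3 method."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given bit width")
    ndigits = len(str((1 << bits) - 1))  # enough digits for the largest input
    bcd = [0] * ndigits                  # each entry models one 4-bit BCD digit
    for i in range(bits - 1, -1, -1):
        # Add 3 to every digit that is 5 or more, so the following shift
        # produces a valid decimal digit plus a carry into the next digit.
        bcd = [d + 3 if d >= 5 else d for d in bcd]
        # Shift the whole BCD register left by one bit, bringing in the
        # next binary bit (most significant bit first).
        carry = (value >> i) & 1
        for j in range(ndigits - 1, -1, -1):
            shifted = (bcd[j] << 1) | carry
            carry = shifted >> 4
            bcd[j] = shifted & 0xF
    return bcd
```

For example, `binary_to_bcd(255)` yields the digits `[2, 5, 5]`.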


Proceedings ArticleDOI
24 Mar 2010
TL;DR: This paper analyzes the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existent coarse-grained embedded binary arithmetic blocks and indicates that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.
Abstract: Human-centric applications, like financial and commercial, depend on decimal arithmetic since the results must match exactly those obtained by human calculations. The IEEE-754 2008 standard for floating point arithmetic has definitely recognized the importance of decimal for computer arithmetic. A number of hardware approaches have already been proposed for decimal arithmetic operations, including addition, subtraction, multiplication and division. However, few efforts have been made to develop decimal IP cores able to take advantage of the binary multipliers available in most reconfigurable computing architectures. In this paper, we analyze the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existent coarse-grained embedded binary arithmetic blocks. The proposed circuits were implemented in a Xilinx Virtex 4 FPGA. The results indicate that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.

31 citations


Journal ArticleDOI
TL;DR: An overview of DFP arithmetic in IEEE 754-2008 is given, processors that provide hardware and instruction set support for decimal arithmetic are described, and a survey of hardware designs for decimal addition, subtraction, multiplication, and division is provided.
Abstract: Decimal data and decimal arithmetic operations are ubiquitous in daily life. Although microprocessors normally use binary arithmetic for computations, decimal arithmetic is often required in financial and commercial applications. Due to the increasing importance of and demand for decimal arithmetic, decimal floating-point (DFP) formats and operations are specified in the revised IEEE Standard for Floating-Point Arithmetic (IEEE 754-2008). This paper provides a survey of hardware designs for decimal arithmetic. It gives an overview of DFP arithmetic in IEEE 754-2008, describes processors that provide hardware and instruction set support for decimal arithmetic, and provides a survey of hardware designs for decimal addition, subtraction, multiplication, and division. Finally, it describes potential areas for future research.

29 citations


Patent
27 May 2010
TL;DR: In this article, the authors proposed a method to perform high-speed gradation conversion of an image having pixels whose values have a plurality of components, where the image processor can be applied to a device for gradation conversion.
Abstract: PROBLEM TO BE SOLVED: To perform high-speed gradation conversion of an image having pixels whose values have a plurality of components. SOLUTION: In the image processor, a quantization value OUT of an additional value obtained by adding a pixel value IN of the image and a filter output U that is a result of filtering of a quantization error Q is obtained as the sum of an integer part I of the pixel value IN and an integer part J of a decimal additional value X that is the sum of the filter output U and a decimal part P of the pixel value IN on the basis of a decimal point of the quantization value OUT, and the quantization error Q of the quantization value OUT is obtained as a decimal part of the decimal additional value X that is the sum of the filter output U and the decimal part P of the pixel value IN on the basis of the decimal point of the quantization value OUT. Based thereon, in an error diffusion part 51, each of the pixel value IN and the decimal additional value X is separated into the integer part and the decimal part, and the quantization value OUT and the quantization error Q are obtained to perform the gradation conversion using an error diffusion method. The image processor can be applied to a device for gradation conversion. COPYRIGHT: (C)2010,JPO&INPIT

17 citations
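
The integer/fraction split described in the patent abstract above can be sketched in one dimension. The assumptions here are loud ones: pixel values are fixed-point integers with 8 fractional bits, the "filter" simply forwards the previous quantization error to the next pixel, and the name `diffuse_row` is hypothetical. The patent's actual filter and data widths are not given in the abstract ("decimal part" there means the fractional part).

```python
FRAC_BITS = 8                      # assumed number of fractional bits per pixel value
FRAC_MASK = (1 << FRAC_BITS) - 1


def diffuse_row(pixels: list[int]) -> tuple[list[int], int]:
    """Quantize fixed-point pixels to integers with 1-D error diffusion.

    Returns the quantized values and the error left after the last pixel.
    """
    out = []
    u = 0  # filter output U; here just the previous quantization error Q
    for value in pixels:
        i_part = value >> FRAC_BITS   # integer part I of the pixel value IN
        p_part = value & FRAC_MASK    # fractional part P of IN
        x = u + p_part                # fractional sum X = U + P
        j_part = x >> FRAC_BITS       # integer part J of X
        q = x & FRAC_MASK             # quantization error Q = fractional part of X
        out.append(i_part + j_part)   # OUT = I + J
        u = q
    return out, u
```

Because each step satisfies IN + U = OUT * 2^8 + Q, the total intensity of the row is preserved up to the final leftover error.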


Proceedings ArticleDOI
05 Jun 2010
TL;DR: Three algorithms for accurately converting floating-point numbers to decimal representation are presented, fast (up to 4 times faster than commonly used algorithms that use high-precision integers) and correct: any printed number will evaluate to the same number, when read again.
Abstract: We present algorithms for accurately converting floating-point numbers to decimal representation. They are fast (up to 4 times faster than commonly used algorithms that use high-precision integers) and correct: any printed number will evaluate to the same number, when read again. Our algorithms are fast, because they require only fixed-size integer arithmetic. The sole requirement for the integer type is that it has at least two more bits than the significand of the floating-point number. Hence, for IEEE 754 double-precision numbers (having a 53-bit significand) an integer type with 55 bits is sufficient. Moreover we show how to exploit additional bits to improve the generated output. We present three algorithms with different properties: the first algorithm is the most basic one, and does not take advantage of any extra bits. It simply shows how to perform the binary-to-decimal transformation with the minimal number of bits. Our second algorithm improves on the first one by using the additional bits to produce a shorter (often the shortest) result. Finally we propose a third version that can be used when the shortest output is a requirement. The last algorithm either produces optimal decimal representations (with respect to shortness and rounding) or rejects its input. For IEEE 754 double-precision numbers and 64-bit integers roughly 99.4% of all numbers can be processed efficiently. The remaining 0.6% are rejected and need to be printed by a slower complete algorithm.

17 citations
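
The round-trip property stated in the entry above ("any printed number will evaluate to the same number, when read again") can be demonstrated with a brute-force search. This sketch is not one of the paper's three algorithms: it relies on Python's correctly rounded float formatting and simply tries increasing precision. The name `shortest_roundtrip` is hypothetical.

```python
import math


def shortest_roundtrip(x: float) -> str:
    """Return the correctly rounded scientific-notation string with the fewest
    significant digits that parses back to exactly the same double."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("finite values only")
    # 17 significant digits are always enough to identify an IEEE 754 double.
    for sig_digits in range(1, 18):
        text = format(x, f".{sig_digits - 1}e")
        if float(text) == x:
            return text
    raise AssertionError("unreachable for IEEE 754 doubles")
```

For instance, `0.1` needs a single significant digit, while `0.1 + 0.2` needs all 17. A search like this is far slower than fixed-size integer methods, which is the gap the paper addresses.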


Proceedings ArticleDOI
01 Nov 2010
TL;DR: Each engine solves constraints describing all corner cases of the operation, and generates test vectors to verify these corner cases in the tested design, and describes the constraints of each operation and the steps of each engine to solve these constraints.
Abstract: Decimal floating-point designs require a verification process to prove that the design is in compliance with the IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2008). Our work presents three engines: the first for the verification of the decimal addition-subtraction operation, the second for the verification of the decimal multiplication operation, and the third for the verification of the decimal fused-multiply-add operation. Each engine solves constraints describing all corner cases of the operation, and generates test vectors to verify these corner cases in the tested design. The paper describes the constraints of each operation and the steps of each engine to solve these constraints.

14 citations


Proceedings ArticleDOI
24 Mar 2010
TL;DR: The work reported in this paper is devoted to the FPGA implementation of decimal dividers, which implements a decimal non-restoring like algorithm and uses ripple-carry operators and an SRT-like algorithm that uses carry-free operators.
Abstract: The work reported in this paper is devoted to the FPGA implementation of decimal dividers. Two types of dividers are described. The first one implements a decimal non-restoring like algorithm and uses ripple-carry operators. For medium size operators it gives a good compromise between cost and latency. The second one implements an SRT-like algorithm and uses carry-free operators. Their latencies are close to that of a binary radix-16 divider with the same range, implemented in the same FPGA.

13 citations


Journal ArticleDOI
TL;DR: This article argued that decimalization of currency diffused as a consequence of three forms of isomorphism: normative, coercive, and mimetic, and it is ambiguous as to whether the normative isomorphism was well founded.
Abstract: This paper argues that decimalization of currency diffused as a consequence of all three forms of isomorphism: normative, coercive, and mimetic. Furthermore, it is ambiguous as to whether the normative isomorphism was well founded. The patterns of denominations show variety by country as a consequence of a number of factors, including cultural ones. These patterns tend to follow a powers-of-two (binary) principle for smaller denominations and a purer decimal principle for larger denominations, reflecting their utility for cash transactions and for store-of-value functions, respectively.

13 citations


Journal Article
TL;DR: In this paper, the use of representational tools and activity that helps students to understand decimal size and decimal place value and which encourages use of fractional language to describe decimals is discussed.
Abstract: Use of representational tools and activity that helps students to understand decimal size and decimal place value and which encourages the use of fractional language to describe decimals, is discussed. Use of the game, Colour in Decimats, to introduce students to the Decimat as a model for representing decimals is explained.

Patent
Steffen Wittmann, Thomas Wedi
06 Jan 2010
TL;DR: In this article, a two-dimensional adaptive interpolation filter coefficient decision method is proposed, which can effectively decide an appropriate filter coefficient with a small calculation amount.
Abstract: Provided is a two-dimensional adaptive interpolation filter coefficient decision method which can effectively decide an appropriate filter coefficient with a small calculation amount. The method includes: a motion detection step (S100) which detects an image motion of the block from a reference picture as a motion vector with a decimal pixel accuracy for each of blocks constituting an object picture; an identification step (S102) which identifies one or more blocks having a motion vector indicating a decimal pixel position (p, q) on the reference picture among the blocks having the motion vector of the detected decimal pixel accuracy; and a decision step (S104) which decides a filter coefficient of the decimal pixel position (p, q) according to an image of the one or more blocks identified by the identification step and an image of one or more blocks on the reference picture indicated by the motion vector of the one or more blocks.

Proceedings ArticleDOI
31 Aug 2010
TL;DR: An IEEE 754-2008 compliant parallel decimal floating-point multiplier designed to exploit the features of Virtex-5 FPGAs and implements early estimation of the shift-left amount and efficient decimal rounding.
Abstract: Decimal floating point operations are important for applications that cannot tolerate errors from conversions between binary and decimal formats, for instance, scientific, commercial, and financial applications. In this paper we present an IEEE 754-2008 compliant parallel decimal floating-point multiplier designed to exploit the features of Virtex-5 FPGAs. It is an extension to a previously published decimal fixed-point multiplier. The decimal floating-point multiplier implements early estimation of the shift-left amount and efficient decimal rounding. Additionally, it provides all required rounding modes, exception handling, overflow, and gradual underflow. Several pipeline stages can be added to increase throughput. Furthermore, different modifications are analyzed including shifting by means of hard-wired multipliers and delayed carry propagation adders.

Journal ArticleDOI
TL;DR: The proposed digit set is faithfully encoded as a mix of posibits, negabits, and unibits and is shown to obviate the need for any compare-to-9 operations and leads to minimal penalty subtraction using the addition circuitry.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: A combined decimal division/square root scheme using limited-precision multipliers, adders, and table-lookups is presented, which uses short operators which leads to compact modules, and may have an advantage at the layout level as well as in power optimization.
Abstract: A combined decimal division/square root scheme using limited-precision multipliers, adders, and table-lookups is presented. The combined algorithm, except in the initialization steps, uses a slightly modified digit-recurrence algorithm for division with limited-precision primitives. We describe the proposed combined division/square root algorithm, a design, and its FPGA implementation on a Xilinx Virtex-6 FPGA. We present the cost and delay characteristics for precisions of 7 (single-precision), 8, 14 (double-precision), 16, 24, and 32 decimal digits. The costs range from 1384 to 4066 LUTs with maximum clock frequencies around 68 MHz, and latencies ranging from 102 to 485 ns (with unoptimized routing delays). The proposed scheme uses short (2 to 4 digit-wide) operators which leads to compact modules, and may have an advantage at the layout level as well as in power optimization. The proposed approach is general and can be adapted to other higher radix combined division/square root implementations.

Journal Article
TL;DR: In this article, a micro-genetic approach was used to capture detail of the teaching intervention used to facilitate development in student thought, and a framework for considering cognitive conflict in lesson design was presented.
Abstract: This paper reports on an investigation into managing cognitive conflict in the context of student learning about decimal magnitude. The influence of prior constructs is examined through a brief review of the literature. A micro-genetic approach was used to capture detail of the teaching intervention used to facilitate development in student thought. A framework for considering cognitive conflict in lesson design is presented, and a case is made for the use of measurement tasks to generate data.

01 Jan 2010
TL;DR: The system of Roman numerals, which was widely used throughout the Roman empire, but was replaced in the period between 1200 and 1500 CE by a decimal place-value system using what have come to be known as Hindu-Arabic numerals.
Abstract: 1.1 Mathematical concepts and notation In recent years philosophers of mathematics have begun to show greater interest in the activities involved in doing mathematics. This turn to mathematical practice is motivated in part by the belief that an understanding of what mathematicians do will lead to a better understanding of what mathematics is. One obvious activity that mathematicians engage in is that of writing and manipulating meaningful symbols like numerals, formulas, and diagrams. These notational systems are used to represent abstract concepts and objects, and by operating with their symbolic representations we can learn about their properties. Since such notational systems are crucial ingredients of mathematical practice, a better understanding of such systems and the way we handle them also contributes to a more encompassing understanding of mathematics. Natural numbers are among the most fundamental mathematical objects. In the history of mankind different linguistic systems for their representation have been invented, used, and forgotten. Most readers will have some familiarity with the system of Roman numerals, which was widely used throughout the Roman empire, but was replaced in the period between 1200 and 1500 CE by a decimal place-value system using what have come to be known as Hindu-Arabic numerals. The exact reasons for this transition are still largely in the dark, although popular accounts of this development speculate frequently about certain deficiencies of the Roman numeral


Journal ArticleDOI
01 Sep 2010
TL;DR: Evidence that prospective elementary teachers’ conceptions of the repeating decimal .999… = 1 formed their misconceptions about repeating decimals in early grades rather than in higher-level courses that cover limit notions is presented and that higher-level courses can lead students to reason incorrectly about infinitely repeating decimals.
Abstract: This article investigates prospective elementary teachers’ conceptions of the repeating decimal .999.... Five students from a first-semester undergraduate course “Mathematics for Elementary School ...

Journal ArticleDOI
01 Aug 2010
TL;DR: A parallel implementation of floating-point real FFT-based multiplication is used, since the key operation for fast multiple-precision arithmetic is multiplication.
Abstract: We present efficient parallel algorithms for multiple-precision arithmetic operations of more than several million decimal digits on distributed-memory parallel computers. A parallel implementation of floating-point real FFT-based multiplication is used, since the key operation for fast multiple-precision arithmetic is multiplication. The operation for releasing propagated carries and borrows in multiple-precision addition, subtraction and multiplication was also parallelized. More than 2.576 trillion decimal digits of π were computed on 640 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) with a computing elapsed time of 73h 36min which includes the time required for verification.

14 Oct 2010
TL;DR: This paper presents a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices that halve the area and latency of previous proposals, and presents area and delay figures close to those of optimal binary adder trees.
Abstract: The research and development of hardware designs for decimal arithmetic is currently undergoing intense activity. For the most part, the methods proposed to implement fixed and floating point operations are intended for ASIC designs. Thus, a direct mapping or adaptation of these techniques into an FPGA could be far from an optimal solution. Only a few studies have considered new methods more suitable for FPGA implementations. A basic operation that has not received enough attention in this context is multi-operand BCD addition. For example, it is of interest for low latency implementations of decimal fixed and floating point multipliers and decimal fused multiply-add units. We have explored the most representative proposals for multi-operand BCD addition and found that the resultant implementations in FPGAs are still very inefficient in terms of both area and latency when compared to their binary counterparts. In this paper we present a new method for fast and efficient implementation of multi-operand BCD addition in current FPGA devices. In particular, our proposal maps quite well into the slice structure of the Xilinx Virtex-5/Virtex-6 families and it is highly pipelineable. The synthesis results for a Virtex-6 device indicate that our implementations halve the area and latency of previous proposals, presenting area and delay figures close to those of optimal binary adder trees.
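
As a point of reference for what multi-operand BCD addition computes, a plain software model is sketched below. It sums each digit column in binary and then splits the column total into a decimal digit and a carry. This is a behavioral reference only, under the assumption that operands are lists of decimal digits stored least significant first; the name `bcd_multi_add` is hypothetical, and nothing here reflects the FPGA mapping proposed in the paper.

```python
def bcd_multi_add(operands: list[list[int]]) -> list[int]:
    """Add several BCD operands given as digit lists (least significant first).

    Returns the sum as a digit list, also least significant first.
    """
    for op in operands:
        if any(not 0 <= d <= 9 for d in op):
            raise ValueError("every BCD digit must be in 0..9")
    width = max((len(op) for op in operands), default=0)
    result = []
    carry = 0
    for pos in range(width):
        # Binary sum of one digit column plus the carry from the column below.
        column = sum(op[pos] for op in operands if pos < len(op)) + carry
        result.append(column % 10)   # decimal digit kept in this position
        carry = column // 10         # may exceed 1 when there are many operands
    while carry:
        result.append(carry % 10)
        carry //= 10
    return result or [0]
```

Note that the carry out of a column can be larger than one digit's worth of 1 when many operands are added, which is part of what makes the hardware version harder than two-operand BCD addition.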

Proceedings ArticleDOI
01 Nov 2010
TL;DR: A novel method for hardware design of combined binary/decimal multi-operand adders based on binary CSA trees, which are of interest for VLSI implementation of high performance multipliers and other low latency arithmetic units.
Abstract: We present a novel method for hardware design of combined binary/decimal multi-operand adders. More specifically, we apply this method to architectures based on binary CSA (carry-save adder) trees, which are of interest for VLSI implementation of high performance multipliers and other low latency arithmetic units. A remarkable feature of the proposed method is that it allows the reuse of any binary CSA for computing the sum of BCD operands. Decimal corrections are performed in parallel, separately from the computation of the binary sum, such that the layout of the binary carry-save adder does not require any further rearrangement. As a result, the latency of the binary operation is unaffected by the incorporation of hardware support for decimal, while the latency for the decimal mode is close to the latency figures of dedicated decimal multi-operand adders. We show that our combined architecture is competitive in terms of area and delay with respect to other representative proposals, and that it has a more regular layout when implemented in a submicron VLSI technology.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: A redundant decimal floating-point adder that allows for a carry-free addition hence the addition process does not depend on the width of the operands is proposed.
Abstract: Decimal floating-point addition is important for financial and business applications. There has been a growing interest in implementing decimal floating-point adders in hardware to enhance the speed of the decimal floating-point operations. In this paper, a redundant decimal floating-point adder is proposed with the ultimate objective of enhancing the speed of the decimal floating-point addition. Redundancy allows for a carry-free addition hence the addition process does not depend on the width of the operands. The results show that our design outperforms the conventional design in both the Decimal64 and the Decimal128 IEEE format.

19 May 2010
TL;DR: In this paper, the impact of self-regulation instruction on fractions and decimal numbers on academic achievement and attitude towards mathematics in an elementary school program in Turkey was investigated, and the results suggested that the students in the experimental group had higher academic achievement on fractions and decimal numbers, and higher attitude scores in mathematics, than the control group.
Abstract: The purpose of this study is to find out the impact of self-regulation instruction on fractions and decimal numbers on academic achievement and attitude towards mathematics in an elementary school program in Turkey. The subjects of the study were fourth year elementary school students (N=60). Zimmerman, Bonner and Kovach’s (1996) model related to self-regulation instruction was adapted to fraction and decimal numbers teaching activities and carried out for six weeks during the academic year. Self-regulated learning instruction was implemented in the experimental group. The results in the study suggested that the students in the experimental group had higher academic achievement on fraction and decimal numbers, and attitude scores in mathematics than the control group.

Book ChapterDOI
01 Jan 2010
TL;DR: In fact, there are at least two ways of reading Wittgenstein's Tractatus: (1) a physical and sequential one in strictly numerical order, as in the original edition; and (2) a logical-hierarchical one by means of the top-down structure of the decimal numbers (the tree-like arrangement, see Bazzocchi, 2008) as mentioned in this paper.
Abstract: Both the Tractatus Logico-Philosophicus and the so-called ‘Prototractatus’ manuscript contain numbered propositions. The 726 propositions of the Tractatus are printed in numerical order starting with statements 1, 1.1, 1.11, 1.12, 1.13, 1.2 and ending with 6.522, 6.53, 6.54, 7, while in the ‘Prototractatus’ the same or similar propositions appear in disarray, without any obvious criteria of arrangement or possible reading. More careful consideration of the numbering system, which is substantially the same in the two texts, reveals alternative virtual arrangements. In fact, there are at least two ways of reading Wittgenstein’s Tractatus: (1) a physical and sequential one, in strictly numerical order, as in the original edition; and (2) a logical-hierarchical one by means of the top-down structure of the decimal numbers (the tree-like arrangement, see Bazzocchi, 2008). But there are at least three ways of reading the ‘Prototractatus’ notebook: (1) a physical one, following the compositional order of the notebook pages; (2) a sequential one, reordering propositions in strictly numerical order; (3) a logical-hierarchical one, by means of the top-down structure of the decimal numbers (the tree-like arrangement). Information about the chronological order of composition is, of course, the real contribution of the ‘Prototractatus’ notebook, and it is a pity that in the printed editions rearrangement of the propositions in numerical order, in strict accordance with the second point of view, has hidden the order of composition from most critics.

Journal ArticleDOI
TL;DR: The time optimization of the combinational adder of the decimal digits encoded in the Johnson-Mobius code is performed and the implementation of the optimal variant of the adder using quantum-dot cellular automata is described.
Abstract: The time optimization of the combinational adder of the decimal digits encoded in the Johnson-Mobius code is performed. A number of structural variants of the decimal adder are proposed and analyzed. The implementation of the optimal variant of the adder using quantum-dot cellular automata is described. A successful computer simulation of the adder is performed. The estimations of the hardware costs and the signal propagation delay are given in comparison with the original version of the adder developed earlier by the author.

Journal ArticleDOI
TL;DR: An efficient algorithm to compute base-10 logarithm of a decimal number based on a digit-by-digit iterative computation that does not require look-up tables, curve fitting, decimal-binary conversion, or division operations is presented.
Abstract: The paper presents an efficient algorithm to compute the base-10 logarithm of a decimal number. The algorithm uses 64-bit floating-point arithmetic, and is based on a digit-by-digit iterative computation that does not require look-up tables, curve fitting, decimal-binary conversion, or division operations. It is the first FPGA prototype of its kind that uses a 64-bit (decimal 16-digit) precision. Two numerical examples have been presented for the purpose of illustration. The algorithm produces very accurate results with a maximum absolute error of 3.53x10 . The architecture is pipelined and implemented on the Xilinx Virtex2p FPGA. It costs 6,752 logic cells, outputs at a minimum rate of 51 mega-samples/sec, and consumes 125.7 mW of power. The scheme is very suitable for timing and accuracy critical applications and compliant with the IEEE754-2008 standard (decimal64 format).

Journal ArticleDOI
TL;DR: This review article presents the decimal representation of any real number and investigates whether this representation is unique or not, and which sequences of digits cannot be accepted as decimal representations of real numbers.
Abstract: The representation of natural numbers in decimal form is an unequivocal procedure while for the representation of real numbers some ambiguities concerning the existence of infinitely many digits equal to 9 still emerge. One of the most frequently confronted misunderstandings is whether 0.999… equals 1 or not, and if not what number does this sequence of digits represent. In this review article, the decimal representation of any real number is explicitly presented. In particular, it is investigated whether this representation is unique or not. A condition is given that guarantees the uniqueness of decimal representation for a subset of real numbers, while for the remaining numbers two decimal representations exist. It is also investigated which sequences of digits cannot be accepted as decimal representations of real numbers. Moreover, analogous results are presented in the case where a real number is represented in systems with base n, n being a natural number greater than 1.
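
The question raised in the abstract above has a standard answer via the geometric series; the short derivation below is the usual textbook argument, not taken from the article itself.

```latex
0.999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
\;=\; 9 \cdot \frac{1/10}{1 - 1/10}
\;=\; 9 \cdot \frac{1}{9}
\;=\; 1 .
```

The same argument in base $n$ shows that the expansion with every digit equal to $n-1$ also sums to 1, which matches the abstract's remark about analogous results in other bases.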

Journal ArticleDOI
TL;DR: The authors propose a new concept for converting a binary number with a fractional part to its decimal equivalent and vice versa, using an optical tree and switches based on nonlinear materials to perform this operation.
Abstract: Optics is a very promising candidate for information processing and computing. Several all-optical methods have been proposed for the implementation of all-optical logic and arithmetic devices during the last few decades. In this regard, optical tree architecture can be mentioned as an important approach for converting optical data from binary to decimal and vice versa. In this communication, the authors propose a new concept for converting a binary number having some fraction (fractional value) to its equivalent decimal counterpart and vice versa. To perform this operation, an optical tree and some nonlinear material based switches are used.
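
The arithmetic behind converting a binary number with a fractional part to decimal and back can be sketched in software. This illustrates only the number conversion, not the optical tree of the paper; the function names, the use of `Fraction` for exact values, and the 16-bit limit on fractional output are assumptions made for the example.

```python
from fractions import Fraction


def binary_to_decimal(text: str) -> Fraction:
    """Convert a non-negative binary string such as '101.101' to an exact value."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    for k, bit in enumerate(frac, start=1):
        if bit not in "01":
            raise ValueError("fractional part must contain only 0 and 1")
        value += Fraction(int(bit), 2**k)  # bit k after the point weighs 2**-k
    return value


def decimal_to_binary(value: Fraction, max_frac_bits: int = 16) -> str:
    """Convert a non-negative exact value to binary, truncating the fraction
    after max_frac_bits bits if it does not terminate."""
    if value < 0:
        raise ValueError("value must be non-negative")
    whole = value.numerator // value.denominator
    frac = value - whole
    bits = []
    while frac and len(bits) < max_frac_bits:
        frac *= 2                                   # shift one binary place left
        bit = frac.numerator // frac.denominator    # integer part is the next bit
        bits.append(str(bit))
        frac -= bit
    return format(whole, "b") + ("." + "".join(bits) if bits else "")
```

For example, `binary_to_decimal("101.101")` is 45/8 (5.625), and converting that value back gives `"101.101"`. Values such as 1/10 have no finite binary expansion and are truncated at `max_frac_bits`.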

Patent
24 Jun 2010
TL;DR: In this article, the authors proposed a carry-based arithmetic add/subtract operation on two BCD numbers independent of which BCD number is of a greater magnitude, where the first magnitude is greater than, equal to, or less than the second magnitude.
Abstract: Binary coded decimal (BCD) arithmetic add/subtract operations on two BCD numbers independent of which BCD number is of a greater magnitude include, responsive to the BCD arithmetic add/subtract operation being a subtract operation, performing a BCD arithmetic subtraction operation on a first BCD number and a second BCD number, the first BCD number having a first magnitude and the second BCD number having a second magnitude. The first magnitude is greater than, equal to, or less than the second magnitude. The performing includes: in parallel to a carry generation, partial sums or partial differences of the first and second BCD numbers are computed such that a final result in signed magnitude form is selectable from the partial sums or differences based on carry information without any post processing steps.