Author

Paolo Montuschi

Bio: Paolo Montuschi is an academic researcher from Polytechnic University of Turin. The author has contributed to research in topics: Adder & Radix. The author has an h-index of 24 and has co-authored 126 publications receiving 1987 citations. Previous affiliations of Paolo Montuschi include Instituto Politécnico Nacional & University of Turin.
Topics: Adder, Radix, Square root, Rounding, Token bus network


Papers
Journal Article
TL;DR: The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio.
Abstract: Inexact (or approximate) computing is an attractive paradigm for digital processing at nanometric scales. Inexact computing is particularly interesting for computer arithmetic designs. This paper deals with the analysis and design of two new approximate 4-2 compressors for utilization in a multiplier. These designs rely on different features of compression, such that imprecision in computation (as measured by the error rate and the so-called normalized error distance) can be traded off against circuit-based figures of merit of a design (number of transistors, delay and power consumption). Four different schemes for utilizing the proposed approximate compressors are proposed and analyzed for a Dadda multiplier. Extensive simulation results are provided and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).
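
For orientation, here is a minimal Python sketch of what an exact 4-2 compressor computes and how an error rate can be counted for an inexact variant. The exact compressor is the textbook two-full-adder construction. The function `toy_truncated_compressor` is a hypothetical illustration invented here and is not one of the paper's designs; the error rate shown is simply the fraction of the 32 input patterns that are miscounted, which may differ in detail from the paper's definition.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    total = a + b + c
    return total & 1, total >> 1

def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook exact 4-2 compressor: two cascaded full adders.

    Satisfies x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

def toy_truncated_compressor(x1, x2, x3, x4, cin):
    """Hypothetical inexact variant that ignores x4 (NOT from the paper)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, 0, cin)
    return s, carry, cout

def error_rate(compressor):
    """Fraction of the 32 input patterns whose weighted output is wrong."""
    wrong = 0
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor(*bits)
        if s + 2 * (carry + cout) != sum(bits):
            wrong += 1
    return wrong / 32

print(error_rate(exact_compressor_4_2))      # exact design never miscounts
print(error_rate(toy_truncated_compressor))  # wrong whenever x4 == 1
```

The toy variant saves one input at the cost of being wrong on half of the patterns, which is the kind of accuracy-versus-hardware trade-off the abstract describes.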

447 citations

Proceedings Article
25 Jun 2007
TL;DR: Two novel architectures for parallel decimal multipliers are introduced based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits and three schemes for fast and efficient generation of partial products in parallel are presented.
Abstract: This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.
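
As a small illustration of the 4221 digit coding named in this abstract, the Python sketch below checks two arithmetic properties that follow directly from the weights (4, 2, 2, 1): every 4-bit pattern encodes a valid decimal digit, and inverting the bits yields the nine's complement. The helper names are assumptions made for this example, and the sketch does not reproduce the paper's carry-save adder or recoders.

```python
from itertools import product

WEIGHTS_4221 = (4, 2, 2, 1)  # bit weights, most significant first

def value_4221(bits):
    """Decimal value of a 4-bit pattern under the 4221 weighting."""
    return sum(w * b for w, b in zip(WEIGHTS_4221, bits))

def invert_bits(bits):
    """Bitwise complement of a 4-bit pattern."""
    return tuple(1 - b for b in bits)

for bits in product((0, 1), repeat=4):
    digit = value_4221(bits)
    # The weights sum to 9, so no pattern can exceed a decimal digit.
    assert 0 <= digit <= 9
    # Complementing every bit gives 9 - digit.
    assert value_4221(invert_bits(bits)) == 9 - digit

print(sorted({value_4221(b) for b in product((0, 1), repeat=4)}))
```

By contrast, BCD-8421 leaves six of the sixteen patterns (values 10 to 15) invalid, so a bit-level adder can produce non-digit results that need correction.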

130 citations

Journal Article
TL;DR: The designs of both non-iterative and iterative approximate logarithmic multipliers (ALMs) are studied to further reduce power consumption and improve performance and it is found that the proposed approximate LMs with an appropriate number of inexact bits achieve higher accuracy and lower power consumption than conventional LMs using exact units.
Abstract: In this paper, the designs of both non-iterative and iterative approximate logarithmic multipliers (ALMs) are studied to further reduce power consumption and improve performance. Non-iterative ALMs that use three inexact mantissa adders are presented. The proposed iterative ALMs (IALMs) use a set-one adder in both mantissa adders during an iteration; they also use lower-part-or adders and approximate mirror adders for the final addition. Error analysis and simulation results are also provided; it is found that the proposed approximate LMs with an appropriate number of inexact bits achieve higher accuracy and lower power consumption than conventional LMs using exact units. Compared with conventional LMs with exact units, the normalized mean error distance of 16-bit approximate LMs is decreased by up to 18% and the power-delay product is reduced by up to 37%. The proposed approximate LMs are also compared with previous approximate multipliers; it is found that the proposed approximate LMs are best suited for applications that allow larger errors but require lower energy consumption. Approximate Booth multipliers fit applications with less stringent power requirements that also require smaller errors. Case studies for error-tolerant computing applications are provided.
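
To make the idea of a logarithmic multiplier concrete, the following Python sketch implements the classic Mitchell approximation, which replaces multiplication by an addition of approximate base-2 logarithms. It is offered as background only: it is not the paper's approximate design (which additionally uses inexact adders), and the function name `mitchell_multiply` is an assumption for this example.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for non-negative integers via Mitchell's method.

    Write a = 2**k1 * (1 + x1) and b = 2**k2 * (1 + x2) with 0 <= x < 1,
    approximate log2(1 + x) by x, add the logs, and convert back.
    """
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # position of the leading one
    k2 = b.bit_length() - 1
    r1 = a - (1 << k1)               # x1 scaled by 2**k1
    r2 = b - (1 << k2)               # x2 scaled by 2**k2
    scale = 1 << (k1 + k2)
    frac_sum = (r1 << k2) + (r2 << k1)   # (x1 + x2) scaled by 2**(k1 + k2)
    if frac_sum < scale:
        # x1 + x2 < 1: result is 2**(k1 + k2) * (1 + x1 + x2)
        return scale + frac_sum
    # x1 + x2 >= 1: result is 2**(k1 + k2 + 1) * (x1 + x2)
    return 2 * frac_sum

print(mitchell_multiply(5, 6))   # exact product is 30
print(mitchell_multiply(3, 3))   # worst case: exact product is 9
```

Mitchell's approximation never overestimates the product, and its relative error is at most 1/9 (about 11.1%), reached when both fractional parts equal 0.5 as in the second call.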

109 citations

Journal Article
TL;DR: The authors introduce and prove new algorithms for dividing and square-rooting floating-point expansions, as well as for “normalizing” such expansions, and propose several approximate restoring-divider designs.
Abstract: A 2009 IEEE Transactions on Computers (TC) guest editorial called computer arithmetic “the mother of all computer research and application topics.” Today, one might question what computer arithmetic still offers in terms of advancing scientific research; after all, multiplication and addition haven’t changed. The answer is surprisingly easy: new architectures, processors, problems, application domains, and so forth all require computations and are open to new challenges for computer arithmetic. Big data crunching, exascale computing, low-power constraints, and decimal precision are just a few domains in which advances are implicitly pushing for rapid, deep reshaping of the traditional computer-arithmetic framework. TC (www.computer.org/web/tc) has long published regular submissions as well as special sections on this topic, including one scheduled for 2017. Here, we focus on three recently published papers. In “Parallel Reproducible Summation,” James Demmel and Hong Diep Nguyen (IEEE Trans. Computers, vol. 64, no. 7, 2015, pp. 2060–2070) address result reproducibility in cases where it’s a requirement. They present a technique for floating-point reproducible addition that doesn’t depend on the order in which operations are performed, which makes it appropriate for massively parallel environments. Mioara Joldeş and her colleagues deal with manipulation of floating-point expansions in “Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions” (IEEE Trans. Computers, vol. 65, no. 4, 2016, pp. 1197–1210). Such expansions, which are unevaluated sums of a few floating-point numbers, might be used when one temporarily needs to represent numerical values with a higher precision than that offered by the available floating-point format. The authors introduce and prove new algorithms for dividing and square-rooting floating-point expansions, as well as for “normalizing” such expansions.
In “On the Design of Approximate Restoring Dividers for Error-Tolerant Applications” (IEEE Trans. Computers, vol. 65, no. 8, 2016, pp. 2522–2533), Linbin Chen and his colleagues propose several approximate restoring-divider designs. Their simulation results show that, compared with nonrestoring division schemes, their designs had superior delay, power dissipation, circuit complexity, and error tolerance. Most striking, the approximate designs offer better error tolerance “for quotient-oriented applications (image processing) than remainder-oriented applications (modulo operations).”
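
The floating-point expansions mentioned in this abstract are unevaluated sums of floating-point numbers. A minimal Python sketch of the simplest case, a two-term expansion produced by the well-known TwoSum error-free transformation, is given below. It assumes IEEE 754 double precision with round-to-nearest (the behaviour of Python floats on common platforms) and no overflow; it is background material, not one of the algorithms from the cited papers.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    b_virtual = s - a            # the part of b that made it into s
    a_virtual = s - b_virtual    # the part of a that made it into s
    e = (a - a_virtual) + (b - b_virtual)   # rounding error of the addition
    return s, e

s, e = two_sum(1.0, 1e-16)
print(s, e)   # the small addend is lost from s but preserved in e

# Exact check with rational arithmetic: nothing was lost overall.
assert Fraction(s) + Fraction(e) == Fraction(1.0) + Fraction(1e-16)
```

The pair `(s, e)` represents the sum with roughly twice the working precision, which is the sense in which an expansion "temporarily" extends the available format.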

95 citations

Journal Article
TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

93 citations


Cited by
Book
24 Jun 2003
TL;DR: In Digital Arithmetic, Ercegovac and Lang, two of the field's leading experts, deliver a unified treatment of digital arithmetic, tying underlying theory to design practice in a technology-independent manner and guiding readers to develop sound solutions, avoid known mistakes, and repeat successful design decisions.
Abstract: Digital arithmetic plays an important role in the design of general-purpose digital processors and of embedded systems for signal processing, graphics, and communications. In spite of a mature body of knowledge in digital arithmetic, each new generation of processors or digital systems creates new arithmetic design problems. Designers, researchers, and graduate students will find solid solutions to these problems in this comprehensive, state-of-the-art exposition of digital arithmetic. Ercegovac and Lang, two of the field's leading experts, deliver a unified treatment of digital arithmetic, tying underlying theory to design practice in a technology-independent manner. They consistently use an algorithmic approach in defining arithmetic operations, illustrate concepts with examples of designs at the logic level, and discuss cost/performance characteristics throughout. Students and practicing designers alike will find Digital Arithmetic a definitive reference and a consistent teaching tool for developing a deep understanding of the "arithmetic style" of algorithms and designs. Guides readers to develop sound solutions, avoid known mistakes, and repeat successful design decisions. Presents comprehensive coverage, from fundamental theories to current research trends. Written in a clear and engaging style by two masters of the field. Concludes each chapter with in-depth discussions of the key literature. Includes a full set of over 250 exercises, an on-line appendix with solutions to one-third of the exercises, and 600 lecture slides.

742 citations

Journal Article
TL;DR: The results indicate a high fragmentation among hardware, software and AR solutions, which leads to high complexity in selecting and developing AR systems.
Abstract: Augmented Reality (AR) technologies for supporting maintenance operations have been an academic research topic for around 50 years now. In the last decade, major progress has been made and AR technology is getting closer to being implemented in industry. In this paper, the advantages and disadvantages of AR have been explored and quantified in terms of Key Performance Indicators (KPI) for industrial maintenance. Unfortunately, some technical issues still prevent AR from being suitable for industrial applications. This paper aims to show, through the results of a systematic literature review, the current state of the art of AR in maintenance and the most relevant technical limitations. The analysis included filtering from a large number of publications to 30 primary studies published between 1997 and 2017. The results indicate a high fragmentation among hardware, software and AR solutions, which leads to high complexity in selecting and developing AR systems. The results of the study show the areas where AR technology still lacks maturity. Future research directions encompassing hardware, tracking and user-AR interaction in industrial maintenance are also proposed.

479 citations

Journal Article
TL;DR: Dense, texture‐based flow visualization techniques are discussed, which attempt to provide a complete, dense representation of the flow field with high spatio‐temporal coherency.
Abstract: Flow visualization has been a very attractive component of scientific visualization research for a long time. Usually very large multivariate datasets require processing. These datasets often consist of a large number of sample locations and several time steps. The steadily increasing performance of computers has recently become a driving factor for a reemergence in flow visualization research, especially in texture-based techniques. In this paper, dense, texture-based flow visualization techniques are discussed. This class of techniques attempts to provide a complete, dense representation of the flow field with high spatio-temporal coherency. An attempt at categorizing closely related solutions is incorporated and presented. Fundamentals are briefly addressed, as well as advantages and disadvantages of the methods.

392 citations

Journal Article
TL;DR: A taxonomy of division algorithms is presented which classifies the algorithms based upon their hardware implementations and impact on system design, finding that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable.
Abstract: Many algorithms have been developed for implementing division in hardware. These algorithms differ in many aspects, including quotient convergence rate, fundamental hardware primitives, and mathematical formulations. The paper presents a taxonomy of division algorithms which classifies the algorithms based upon their hardware implementations and impact on system design. Division algorithms can be divided into five classes: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. Many practical division algorithms are hybrids of several of these classes. These algorithms are explained and compared. It is found that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable. An implementation of division by functional iteration can provide the lowest latency for typical multiplier latencies. Variable latency algorithms show promise for simultaneously minimizing average latency while also minimizing area.
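
As an illustration of the digit recurrence class named in this abstract, the Python sketch below shows radix-2 restoring division on unsigned integers, producing one quotient bit per step. It is a software model of the textbook recurrence written for this example (the function name and the `n_bits` parameter are assumptions), not a description of any specific hardware design surveyed in the paper.

```python
def restoring_divide(dividend, divisor, n_bits):
    """Radix-2 restoring division of an n_bits-wide unsigned dividend.

    Returns (quotient, remainder) with
    dividend == quotient * divisor + remainder and 0 <= remainder < divisor.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")
    remainder = 0
    quotient = 0
    for i in reversed(range(n_bits)):
        # Shift the partial remainder left and bring down the next bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial              # subtraction succeeded: bit is 1
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1                 # "restore": keep old remainder, bit is 0
    return quotient, remainder

print(restoring_divide(200, 7, 8))
```

Each iteration needs only a shift, a subtraction and a selection, which is why this class is associated with small area; the price is a latency that grows linearly with the operand width.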

329 citations