Author

Paolo Montuschi

Other affiliations: Instituto Politécnico Nacional, University of Turin, University of Santiago de Compostela ...

Bio: Paolo Montuschi is an academic researcher from Polytechnic University of Turin. The author has contributed to research on topics including Adder and Radix. The author has an h-index of 24 and has co-authored 126 publications receiving 1987 citations. Previous affiliations of Paolo Montuschi include Instituto Politécnico Nacional and University of Turin.

Topics: Adder, Radix, Square root, Rounding, Token bus network ...

Papers published on a yearly basis (chart omitted): 1988 to 2005 and 2007 to 2023.

Papers

Journal Article•DOI•

Design and Analysis of Approximate Compressors for Multiplication

Amir Momeni¹, Jie Han², Paolo Montuschi³, Fabrizio Lombardi¹•Institutions (3)

Northeastern University¹, University of Alberta², Polytechnic University of Turin³

01 Apr 2015-IEEE Transactions on Computers

TL;DR: The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio.

Abstract: Inexact (or approximate) computing is an attractive paradigm for digital processing at nanometric scales. Inexact computing is particularly interesting for computer arithmetic designs. This paper deals with the analysis and design of two new approximate 4-2 compressors for utilization in a multiplier. These designs rely on different features of compression, such that imprecision in computation (as measured by the error rate and the so-called normalized error distance) can meet with respect to circuit-based figures of merit of a design (number of transistors, delay and power consumption). Four different schemes for utilizing the proposed approximate compressors are proposed and analyzed for a Dadda multiplier. Extensive simulation results are provided and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).
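To make the terms in this abstract concrete, the sketch below models an exact 4-2 compressor as two chained full adders and measures the error rate of an approximate variant by exhaustive enumeration. The approximate variant (`toy_approx_compressor`, which simply drops the carry-out) is a hypothetical illustration and is not one of the designs proposed in the paper; all function names are assumptions made for this example.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor built from two chained full adders.

    Satisfies x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)
    sum_bit, carry = full_adder(s1, x4, cin)
    return sum_bit, carry, cout


def toy_approx_compressor(x1, x2, x3, x4, cin):
    """Hypothetical approximation (not from the paper): cout is forced to 0."""
    sum_bit, carry, _ = exact_compressor_4_2(x1, x2, x3, x4, cin)
    return sum_bit, carry, 0


def output_value(sum_bit, carry, cout):
    """Arithmetic value encoded by the three compressor outputs."""
    return sum_bit + 2 * (carry + cout)


def error_rate(compressor):
    """Fraction of the 32 input patterns whose output value is wrong."""
    patterns = list(product((0, 1), repeat=5))
    wrong = sum(output_value(*compressor(*bits)) != sum(bits) for bits in patterns)
    return wrong / len(patterns)
```

Under these assumptions, `error_rate(exact_compressor_4_2)` is 0.0, while the toy variant is wrong on every pattern where the first full adder produces a carry, giving an error rate of 0.5.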

447 citations

Proceedings Article•DOI•

A New Family of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez¹, Elisardo Antelo¹, Paolo Montuschi²•Institutions (2)

University of Santiago de Compostela¹, Polytechnic University of Turin²

25 Jun 2007

TL;DR: Two novel architectures for parallel decimal multipliers are introduced based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits and three schemes for fast and efficient generation of partial products in parallel are presented.

Abstract: This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.
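As a small illustration of the BCD-4221 coding mentioned above, the sketch below encodes a decimal digit with bit weights 4, 2, 2, 1. Because the weights sum to 9, every 4-bit pattern decodes to a valid digit, and inverting all bits yields the nine's complement. The greedy encoder is an assumption made for this example (the 4221 code is not unique), and it is not the recoder described in the paper.

```python
WEIGHTS_4221 = (4, 2, 2, 1)


def decode_4221(bits):
    """Decimal value of a 4-bit pattern under weights 4, 2, 2, 1."""
    return sum(w * b for w, b in zip(WEIGHTS_4221, bits))


def encode_4221(digit):
    """Greedy 4221 encoding of a digit 0..9 (one of several valid codes)."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be in 0..9")
    bits = []
    for w in WEIGHTS_4221:
        if digit >= w:
            bits.append(1)
            digit -= w
        else:
            bits.append(0)
    return tuple(bits)


def nines_complement(bits):
    """Invert every bit; under 4221 weights this maps d to 9 - d."""
    return tuple(1 - b for b in bits)
```

For example, `encode_4221(7)` gives `(1, 1, 0, 1)`, and inverting it gives `(0, 0, 1, 0)`, which decodes to 2.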

130 citations

Journal Article•DOI•

Design and Evaluation of Approximate Logarithmic Multipliers for Low Power Error-Tolerant Applications

Weiqiang Liu¹, Jiahua Xu¹, Danye Wang¹, Chenghua Wang¹, Paolo Montuschi², Fabrizio Lombardi³•Institutions (3)

Nanjing University of Aeronautics and Astronautics¹, Polytechnic University of Turin², Northeastern University³

05 Feb 2018-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: The designs of both non-iterative and iterative approximate logarithmic multipliers (ALMs) are studied to further reduce power consumption and improve performance and it is found that the proposed approximate LMs with an appropriate number of inexact bits achieve higher accuracy and lower power consumption than conventional LMs using exact units.

Abstract: In this paper, the designs of both non-iterative and iterative approximate logarithmic multipliers (ALMs) are studied to further reduce power consumption and improve performance. Non-iterative ALMs, that use three inexact mantissa adders, are presented. The proposed iterative ALMs (IALMs) use a set-one adder in both mantissa adders during an iteration; they also use lower-part-or adders and approximate mirror adders for the final addition. Error analysis and simulation results are also provided; it is found that the proposed approximate LMs with an appropriate number of inexact bits achieve higher accuracy and lower power consumption than conventional LMs using exact units. Compared with conventional LMs with exact units, the normalized mean error distance of 16-bit approximate LMs is decreased by up to 18% and the power-delay product has a reduction of up to 37%. The proposed approximate LMs are also compared with previous approximate multipliers; it is found that the proposed approximate LMs are best suitable for applications allowing larger errors, but requiring lower energy consumption. Approximate Booth multipliers fit applications with less stringent power requirements, but also requiring smaller errors. Case studies for error-tolerant computing applications are provided.
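For readers unfamiliar with logarithmic multiplication, the sketch below shows the classic Mitchell approximation, which replaces log2(1 + x) with x so that a product becomes an addition of approximate logarithms. The abstract does not say which baseline the paper treats as a "conventional LM", so using Mitchell's method here is an assumption, and the function name is made up for this example. The inexact mantissa adders studied in the paper are not modeled.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for positive integers via Mitchell's log approximation.

    Write a = 2**k1 * (1 + x1) and b = 2**k2 * (1 + x2) with 0 <= x < 1,
    approximate log2(1 + x) by x, add the logs, and take the antilog
    with the same linear approximation.
    """
    if a <= 0 or b <= 0:
        raise ValueError("operands must be positive integers")
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    f1, f2 = a - (1 << k1), b - (1 << k2)   # fractional parts scaled by 2**k
    scale = 1 << (k1 + k2)                   # 2**(k1 + k2)
    frac_sum = (f1 << k2) + (f2 << k1)       # (x1 + x2) * scale
    if frac_sum < scale:
        # x1 + x2 < 1: product ~ 2**(k1+k2) * (1 + x1 + x2)
        return scale + frac_sum
    # x1 + x2 >= 1: product ~ 2**(k1+k2+1) * (x1 + x2)
    return 2 * frac_sum
```

The approximation never overestimates, and its relative error is at most 1/9 (about 11.1%), reached for example at `mitchell_multiply(3, 3)`, which returns 8 instead of 9. Powers of two are multiplied exactly.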

109 citations

Journal Article•DOI•

Modern Computer Arithmetic

Paolo Montuschi¹, Jean-Michel Muller²•Institutions (2)

Polytechnic University of Turin¹, Centre national de la recherche scientifique²

01 Sep 2016-IEEE Computer

TL;DR: The authors introduce and prove new algorithms for dividing and square-rooting floating-point expansions, as well as for “normalizing” such expansions, and propose several approximate restoring-divider designs.

Abstract: A 2009 IEEE Transactions on Computers (TC) guest editorial called computer arithmetic “the mother of all computer research and application topics.” Today, one might question what computer arithmetic still offers in terms of advancing scientific research; after all, multiplication and addition haven’t changed. The answer is surprisingly easy: new architectures, processors, problems, application domains, and so forth all require computations and are open to new challenges for computer arithmetic. Big data crunching, exascale computing, low-power constraints, and decimal precision are just a few domains in which advances are implicitly pushing for rapid, deep reshaping of the traditional computer-arithmetic framework. TC (www.computer.org/web/tc) has long published regular submissions as well as special sections on this topic, including one scheduled for 2017. Here, we focus on three recently published papers. In “Parallel Reproducible Summation,” James Demmel and Hong Diep Nguyen (IEEE Trans. Computers, vol. 64, no. 7, 2015, pp. 2060–2070) address result reproducibility in cases where it’s a requirement. They present a technique for floating-point reproducible addition that doesn’t depend on the order in which operations are performed, which makes it appropriate for massively parallel environments. Mioara Joldeş and her colleagues deal with manipulation of floating-point expansions in “Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions” (IEEE Trans. Computers, vol. 65, no. 4, 2016, pp. 1197–1210). Such expansions, which are unevaluated sums of a few floating-point numbers, might be used when one temporarily needs to represent numerical values with a higher precision than that offered by the available floating-point format. The authors introduce and prove new algorithms for dividing and square-rooting floating-point expansions, as well as for “normalizing” such expansions. In “On the Design of Approximate Restoring Dividers for Error-Tolerant Applications” (IEEE Trans. Computers, vol. 65, no. 8, 2016, pp. 2522–2533), Linbin Chen and his colleagues propose several approximate restoring-divider designs. Their simulation results show that, compared with nonrestoring division schemes, their designs had superior delay, power dissipation, circuit complexity, and error tolerance. Most striking, the approximate designs offer better error tolerance “for quotient-oriented applications (image processing) than remainder-oriented applications (modulo operations).”
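The idea of a floating-point expansion as an unevaluated sum can be illustrated with the well-known TwoSum error-free transformation, which turns a rounded sum into a two-term expansion holding the exact result. This is a minimal sketch, assuming IEEE 754 double precision with round-to-nearest and no overflow; it is a standard building block and is not taken from the papers discussed in the abstract.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    b_virtual = s - a          # the part of b that made it into s
    a_virtual = s - b_virtual  # the part of a that made it into s
    err = (a - a_virtual) + (b - b_virtual)
    return s, err


def exact_value(terms):
    """Exact rational value of an expansion (an unevaluated sum of floats)."""
    return sum((Fraction(t) for t in terms), Fraction(0))
```

For instance, `two_sum(1.0, 1e-20)` returns `(1.0, 1e-20)`: the small term is lost in the rounded sum but preserved in the second component, so the pair represents the sum with more precision than a single double can.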

95 citations

Journal Article•DOI•

Improved Design of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez¹, Elisardo Antelo², Paolo Montuschi•Institutions (2)

École normale supérieure de Lyon¹, University of Santiago de Compostela²

01 May 2010-IEEE Transactions on Computers

TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

93 citations

Cited by

DOI•

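On the Cleanliness Requirements of the International Technology Roadmap for Semiconductors 2003: Cleanliness Required for Silicon Wafer Surfaces and the Ambient Environment, and the Current State of Analysis Methods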

飯田裕幸, 竹田菊男, 藤本武利

20 Sep 2004

1,387 citations

Book•

Digital arithmetic

Milo D. Ercegovac, Tomás Lang

24 Jun 2003

TL;DR: In Digital Arithmetic, two of the field's leading experts deliver a unified treatment of digital arithmetic, tying underlying theory to design practice in a technology-independent manner and guiding readers to develop sound solutions, avoid known mistakes, and repeat successful design decisions.

Abstract: Digital arithmetic plays an important role in the design of general-purpose digital processors and of embedded systems for signal processing, graphics, and communications. In spite of a mature body of knowledge in digital arithmetic, each new generation of processors or digital systems creates new arithmetic design problems. Designers, researchers, and graduate students will find solid solutions to these problems in this comprehensive, state-of-the-art exposition of digital arithmetic. Ercegovac and Lang, two of the field's leading experts, deliver a unified treatment of digital arithmetic, tying underlying theory to design practice in a technology-independent manner. They consistently use an algorithmic approach in defining arithmetic operations, illustrate concepts with examples of designs at the logic level, and discuss cost/performance characteristics throughout. Students and practicing designers alike will find Digital Arithmetic a definitive reference and a consistent teaching tool for developing a deep understanding of the "arithmetic style" of algorithms and designs. Guides readers to develop sound solutions, avoid known mistakes, and repeat successful design decisions. Presents comprehensive coverage from fundamental theories to current research trends. Written in a clear and engaging style by two masters of the field. Concludes each chapter with in-depth discussions of the key literature. Includes a full set of over 250 exercises, an on-line appendix with solutions to one-third of the exercises and 600 lecture slides.

742 citations

Journal Article•DOI•

A systematic review of augmented reality applications in maintenance

Riccardo Palmarini¹, John Ahmet Erkoyuncu¹, Rajkumar Roy¹, Hosein Torabmostaedi•Institutions (1)

Cranfield University¹

01 Feb 2018-Robotics and Computer-integrated Manufacturing

TL;DR: The results indicate a high fragmentation among hardware, software and AR solutions which lead to a high complexity for selecting and developing AR systems.

Abstract: Augmented Reality (AR) technologies for supporting maintenance operations have been an academic research topic for around 50 years now. In the last decade, major progress has been made and AR technology is getting closer to being implemented in industry. In this paper, the advantages and disadvantages of AR have been explored and quantified in terms of Key Performance Indicators (KPI) for industrial maintenance. Unfortunately, some technical issues still prevent AR from being suitable for industrial applications. This paper aims to show, through the results of a systematic literature review, the current state of the art of AR in maintenance and the most relevant technical limitations. The analysis included filtering from a large number of publications to 30 primary studies published between 1997 and 2017. The results indicate a high fragmentation among hardware, software and AR solutions, which leads to a high complexity for selecting and developing AR systems. The results of the study show the areas where AR technology still lacks maturity. Future research directions encompassing hardware, tracking and user-AR interaction in industrial maintenance are also proposed.

479 citations

Journal Article•DOI•

The State of the Art in Flow Visualization: Dense and Texture-Based Techniques

Robert S. Laramee¹, Helwig Hauser¹, Helmut Doleisch¹, Benjamin Vrolijk², Frits H. Post², Daniel Weiskopf³•Institutions (3)

VRVis¹, Delft University of Technology², University of Stuttgart³

01 Jun 2004-Computer Graphics Forum

TL;DR: Dense, texture‐based flow visualization techniques are discussed, which attempt to provide a complete, dense representation of the flow field with high spatio‐temporal coherency.

Abstract: Flow visualization has been a very attractive component of scientific visualization research for a long time. Usually very large multivariate datasets require processing. These datasets often consist of a large number of sample locations and several time steps. The steadily increasing performance of computers has recently become a driving factor for a reemergence in flow visualization research, especially in texture-based techniques. In this paper, dense, texture-based flow visualization techniques are discussed. This class of techniques attempts to provide a complete, dense representation of the flow field with high spatio-temporal coherency. An attempt of categorizing closely related solutions is incorporated and presented. Fundamentals are shortly addressed as well as advantages and disadvantages of the methods.

392 citations

Journal Article•DOI•

Division algorithms and implementations

S.F. Obermann¹, Michael J. Flynn¹•Institutions (1)

Stanford University¹

01 Aug 1997-IEEE Transactions on Computers

TL;DR: A taxonomy of division algorithms is presented which classifies the algorithms based upon their hardware implementations and impact on system design, finding that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable.

Abstract: Many algorithms have been developed for implementing division in hardware. These algorithms differ in many aspects, including quotient convergence rate, fundamental hardware primitives, and mathematical formulations. The paper presents a taxonomy of division algorithms which classifies the algorithms based upon their hardware implementations and impact on system design. Division algorithms can be divided into five classes: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. Many practical division algorithms are hybrids of several of these classes. These algorithms are explained and compared. It is found that, for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable. An implementation of division by functional iteration can provide the lowest latency for typical multiplier latencies. Variable latency algorithms show promise for simultaneously minimizing average latency while also minimizing area.
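Of the five classes named in this abstract, digit recurrence is the simplest to show in code. The sketch below is the textbook radix-2 restoring division, which produces one quotient bit per iteration using only shift, subtract, and compare. It is offered as a generic illustration of the digit-recurrence class under the assumption of unsigned operands; the function name and the `n_bits` parameter are made up for this example, and it is not an algorithm taken from the paper.

```python
def restoring_divide(dividend, divisor, n_bits):
    """Radix-2 restoring division of unsigned integers.

    Processes the dividend from its most significant bit, retiring one
    quotient bit per step. Returns (quotient, remainder).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")
    quotient = 0
    remainder = 0
    for i in range(n_bits - 1, -1, -1):
        # Shift the partial remainder left and bring down the next bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            # Subtraction succeeded: keep it and set this quotient bit.
            remainder = trial
            quotient |= 1 << i
        # Otherwise "restore": the partial remainder is left unchanged.
    return quotient, remainder
```

The loop count equals the operand width, which reflects the linear convergence of radix-2 digit recurrence: each step needs only a subtractor and a comparison, at the cost of one iteration per quotient bit.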

329 citations
