
Showing papers by "José Monteiro" published in 2012


Journal ArticleDOI
TL;DR: This article introduces an exact common subexpression elimination (CSE) algorithm that formalizes the minimization of the number of operations as a 0-1 integer linear programming problem, and introduces a CSE heuristic algorithm that iteratively finds the most common 2-term subexpressions with the minimum conflicts among the expressions.
Abstract: This article addresses the problem of finding the fewest addition and subtraction operations in the multiplication of a constant matrix with an input vector, a fundamental operation in many linear digital signal processing transforms. We first introduce an exact common subexpression elimination (CSE) algorithm that formalizes the minimization of the number of operations as a 0-1 integer linear programming problem. Since there are still instances that the proposed exact algorithm cannot handle due to the NP-completeness of the problem, we also introduce a CSE heuristic algorithm that iteratively finds the most common 2-term subexpressions with the minimum conflicts among the expressions. Furthermore, since the main drawback of CSE algorithms is their dependency on a particular number representation, we propose a hybrid algorithm that initially finds promising realizations of linear transforms using a numerical difference method, and then applies the proposed CSE algorithm to exploit the common subexpressions iteratively. The experimental results on a comprehensive set of instances indicate that the proposed approximate algorithms find results competitive with those of the exact CSE algorithm and obtain better solutions than prominent, previously proposed heuristics. It is also observed that our solutions yield significant area reductions in the design of linear transforms after circuit synthesis, compared to direct realizations of linear transforms.
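
The sketch below (Python, illustrative only; it is not the paper's exact or heuristic algorithm) shows the effect such a CSE step aims for: the constants 45 and 13 are hypothetical, and both contain the 2-term subexpression 5x = (x << 2) + x, so sharing it reduces the add/subtract count of the matrix-vector product from 12 to 8 operations.

# Illustrative CSE example for a constant matrix-vector product
# y1 = 45*x1 + 13*x2, y2 = 13*x1 + 45*x2 (constants chosen for illustration).
def cmvm_direct(x1, x2):
    # Each product is built independently from the binary representations
    # 45 = 101101 (3 adds) and 13 = 1101 (2 adds), plus 1 add per output:
    # 12 add/sub operations in total.
    y1 = ((x1 << 5) + (x1 << 3) + (x1 << 2) + x1) + ((x2 << 3) + (x2 << 2) + x2)
    y2 = ((x1 << 3) + (x1 << 2) + x1) + ((x2 << 5) + (x2 << 3) + (x2 << 2) + x2)
    return y1, y2

def cmvm_with_cse(x1, x2):
    # The 2-term subexpressions t1 = 5*x1 and t2 = 5*x2 (1 add each) are
    # shared: 45*x = (t << 3) + t and 13*x = (x << 3) + t, giving 8
    # add/sub operations in total.
    t1 = (x1 << 2) + x1
    t2 = (x2 << 2) + x2
    y1 = ((t1 << 3) + t1) + ((x2 << 3) + t2)
    y2 = ((x1 << 3) + t1) + ((t2 << 3) + t2)
    return y1, y2

assert cmvm_direct(7, 9) == cmvm_with_cse(7, 9) == (45 * 7 + 13 * 9, 13 * 7 + 45 * 9)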

21 citations


Proceedings ArticleDOI
12 Mar 2012
TL;DR: This paper introduces a high-level synthesis algorithm that optimizes the area of the MCM operation and, consequently, of the FIR filter design, on field programmable gate arrays (FPGAs) by taking into account the implementation cost of each addition and subtraction operation in terms of the number of fundamental building blocks of FPGAs.
Abstract: The multiple constant multiplications (MCM) operation, which realizes the multiplication of a set of constants by a variable, has a significant impact on the complexity and performance of digital finite impulse response (FIR) filters. Over the years, many high-level algorithms and design methods have been proposed for the efficient implementation of the MCM operation using only addition, subtraction, and shift operations. The main contribution of this paper is the introduction of a high-level synthesis algorithm that optimizes the area of the MCM operation, and consequently of the FIR filter design, on field-programmable gate arrays (FPGAs) by taking into account the implementation cost of each addition and subtraction operation in terms of the number of fundamental building blocks of FPGAs. The experimental results show that the solutions of the proposed algorithm yield less complex FIR filters on FPGAs than those whose MCM part is implemented using prominent MCM algorithms and design methods.
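
As a rough illustration of why operation count alone is not the right FPGA metric (the cost model below is a hypothetical proxy, not the paper's), assume an adder producing c*x for a B-bit input needs about B + bit_length(c) LUTs; two 2-adder realizations of 45x then differ slightly in area.

def adder_luts(c, input_width=16):
    # Hypothetical proxy: roughly one LUT per result bit of the adder
    # that produces the intermediate value c*x for a B-bit input x.
    return input_width + c.bit_length()

# Two realizations of 45*x with the same operation count (2 adders each):
#   via 5x: 5x = (x << 2) + x,  45x = (5x << 3) + 5x
#   via 9x: 9x = (x << 3) + x,  45x = (9x << 2) + 9x
via_5 = adder_luts(5) + adder_luts(45)   # 19 + 22 = 41 LUTs
via_9 = adder_luts(9) + adder_luts(45)   # 20 + 22 = 42 LUTs

# Sanity check that both decompositions compute 45*x.
x = 7
assert (((x << 2) + x) << 3) + ((x << 2) + x) == 45 * x
assert (((x << 3) + x) << 2) + ((x << 3) + x) == 45 * x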

10 citations


Proceedings ArticleDOI
05 Nov 2012
TL;DR: An exact algorithm is presented that formalizes the MTCM problem as a 0-1 integer linear programming (ILP) problem when constants are defined under a number representation, and a local search method that includes an efficient MCM algorithm is introduced.
Abstract: The multiple constant multiplications (MCM) problem, which is defined as finding the minimum number of addition and subtraction operations required for the multiplication of multiple constants by an input variable, has been the subject of great interest, since the complexity of many digital signal processing (DSP) systems is dominated by an MCM operation. This paper introduces a variant of the MCM problem, called the multiple tunable constant multiplications (MTCM) problem, where each constant is not fixed as in the MCM problem, but can be selected from a set of possible constants. We present an exact algorithm that formalizes the MTCM problem as a 0-1 integer linear programming (ILP) problem when constants are defined under a number representation. We also introduce a local search method for the MTCM problem that includes an efficient MCM algorithm. Furthermore, we show that these techniques can be used to solve various optimization problems in finite impulse response (FIR) filter design and we apply them to one of these problems. Experimental results clearly show the efficiency of the proposed methods when compared to prominent algorithms designed for the MCM problem.
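
To make the "tunable" aspect concrete, the sketch below (illustrative only; it is neither the 0-1 ILP formulation nor the paper's local search, and it ignores sharing between the chosen constants) picks one constant from each candidate set so that a simple per-constant adder-cost proxy, the signed-digit weight, is minimized.

from itertools import product

def naf_weight(c):
    # Number of nonzero digits in the non-adjacent (canonical signed digit)
    # form of c; weight - 1 is a rough adder-cost proxy for a single constant.
    w = 0
    while c:
        if c & 1:
            w += 1
            c -= 2 - (c & 3)   # consume a +1 or -1 digit
        c >>= 1
    return w

def best_tunable_choice(candidate_sets):
    # Brute-force selection over all combinations of candidate constants.
    return min(product(*candidate_sets),
               key=lambda choice: sum(naf_weight(c) - 1 for c in choice))

# e.g. each coefficient may be rounded up or down (hypothetical candidate sets)
print(best_tunable_choice([[23, 24], [43, 44, 45], [90, 91]]))   # -> (24, 44, 90)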

7 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the high-level algorithms obtain better solutions, in terms of gate-level area, than prominent algorithms designed to minimize the number of operations, and that their solutions lead to less complex digit-serial MCM designs.

7 citations


Proceedings ArticleDOI
01 Dec 2012
TL;DR: A design implementing a quaternary low-power high-speed look-up table based on a voltage-mode structure and using only standard CMOS technology is presented.
Abstract: Interconnect has become preponderant in many aspects of circuit design, namely delay, power, and area. This is particularly true for FPGAs, where interconnect is often the most limiting factor. Quaternary logic offers a means to reduce interconnect, since each circuit wire can, in principle, carry the same information as two binary wires. We have proposed in [1] a design implementing a quaternary low-power, high-speed look-up table. The main features of this circuit are that it is based on a voltage-mode structure and uses only standard CMOS technology. In this paper we present the design of a prototype implementation and experimental results. These results are discussed and conclusions are drawn that provide further guidelines for improvement.
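
The sketch below is only meant to make the interconnect argument concrete (the function and encoding are hypothetical, not the circuit of [1]): a single quaternary wire carries two bits, so a 2-input quaternary LUT holds 4 x 4 = 16 entries and can stand in for a pair of binary LUTs fed by four binary wires.

def pack(b1, b0):
    # Two binary wires -> one quaternary digit (0..3) on a single wire.
    return (b1 << 1) | b0

def quaternary_lut(qa, qb):
    # Hypothetical 16-entry table: output digit = (qa + qb) mod 4,
    # i.e. a 2-bit addition without carry, returned on one quaternary wire.
    table = [[(a + b) % 4 for b in range(4)] for a in range(4)]
    return table[qa][qb]

a1, a0, b1, b0 = 1, 0, 1, 1              # a = 2, b = 3 on four binary wires
q = quaternary_lut(pack(a1, a0), pack(b1, b0))
assert q == (2 + 3) % 4                  # the 2-bit result travels on one wire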

5 citations


Proceedings ArticleDOI
01 Dec 2012
TL;DR: This paper presents an area- and power-efficient multiplication part for a radix-2 FFT (Fast Fourier Transform) architecture, based on the decomposition of the real and imaginary coefficients of the twiddle factors into less complex ones, so that the multiplication part of the butterfly can be implemented with less area, which in turn reduces its power consumption.
Abstract: This paper presents an area- and power-efficient multiplication part for a radix-2 FFT (Fast Fourier Transform) architecture. The butterfly plays a central role in the FFT computation, and the multiplication part dominates its complexity. It consists of the product of complex data and complex coefficients, named twiddle factors. The proposed strategy consists of the decomposition of the real and imaginary coefficients of the twiddle factors into less complex ones, so that the multiplication part of the butterfly can be implemented with less area, which leads to a reduction in its power consumption. The strategy also includes the use of Constant Matrix Multiplication (CMM) and gate-level approaches on the decomposed coefficients. A control unit is responsible for selecting the correct constant to be used after the decomposition. The proposed architectures were synthesized with the Synopsys Design Compiler using UMC 130 nm technology. The results show that reductions of 10% in area and 8% in power could be achieved on average, when compared with state-of-the-art solutions.
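
For reference, the snippet below shows the standard radix-2 decimation-in-time butterfly and where the constant twiddle-factor coefficients enter the complex product; it does not reproduce the paper's decomposition of those coefficients, which is the actual contribution.

import cmath
import math

def radix2_butterfly(a, b, k, N):
    # Twiddle factor W_N^k = exp(-2j*pi*k/N); its real and imaginary parts
    # are the constant coefficients that multiply the complex data b.
    wr = math.cos(2 * math.pi * k / N)
    wi = -math.sin(2 * math.pi * k / N)
    # Complex product (wr + j*wi) * (br + j*bi) written out in real arithmetic,
    # i.e. the constant multiplications the decomposition targets.
    tr = wr * b.real - wi * b.imag
    ti = wr * b.imag + wi * b.real
    t = complex(tr, ti)
    return a + t, a - t

# Quick check against the library's complex arithmetic.
a, b, k, N = complex(1, 2), complex(3, -1), 3, 16
w = cmath.exp(-2j * math.pi * k / N)
X, Y = radix2_butterfly(a, b, k, N)
assert abs(X - (a + w * b)) < 1e-12 and abs(Y - (a - w * b)) < 1e-12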

5 citations


Proceedings ArticleDOI
12 Nov 2012
TL;DR: This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning; the technique transforms the body of a Megablock into an acyclic dataflow graph that can be fully pipelined and is based on the atomic execution of loop iterations.
Abstract: Dynamic partitioning is a promising technique where computations are transparently moved from a General Purpose Processor (GPP) to a coprocessor during application execution. To be effective, the mapping of computations to the coprocessor needs to consider aggressive optimizations. One of these optimizations is loop pipelining, a technique that has been extensively studied and is known to allow substantial performance improvements. This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning. The technique transforms the body of a Megablock into an acyclic dataflow graph which can be fully pipelined, and is based on the atomic execution of loop iterations. For a set of 9 benchmarks without memory operations, we generated pipelined hardware versions of the loops and estimate that the presented loop pipelining technique increases the average speedup of non-pipelined coprocessor-accelerated designs from 1.6× to 2.2×. For a larger set of 61 benchmarks, which include memory operations, the technique achieves a speedup increase from 2.5× to 5.6×.
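
As a back-of-the-envelope illustration of where such gains come from (this simple model is not the paper's estimator), a fully pipelined datapath of depth S with an initiation interval of one cycle completes N atomic iterations in about S + N - 1 cycles instead of N * S.

def cycles(num_iterations, depth, pipelined):
    # Idealized cycle counts: non-pipelined execution restarts the whole
    # datapath every iteration; fully pipelined execution overlaps iterations.
    if pipelined:
        return depth + num_iterations - 1
    return num_iterations * depth

n, s = 100, 4
print(cycles(n, s, False) / cycles(n, s, True))   # about 3.9x from pipelining alone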

2 citations


Journal ArticleDOI
TL;DR: This paper proposes a method for determining the exact conditions for worst-case switching activity in a small circuit area during a short time interval, and shows how this method can be combined with partitioning to allow for accurate full-circuit verification.
Abstract: Relentless advances in IC technologies have fueled steady increases in fabricated component density and operating frequencies. As feature sizes decrease to nanometer scales, an increase in switching activity per unit of area and time is observed. When extreme switching activity occurs in a small region of an integrated circuit, malfunctions may be triggered that compromise its behavior. This can be either a consequence of a decrease in bias levels in the power grid caused by IR-drop, or of unexpected glitching on gate outputs caused by ground bounce. For proper circuit verification, both conditions have to be accurately estimated and accounted for. Achieving this in an accurate manner for a large circuit is a very challenging problem. In this paper we propose and compare methods for the identification of the conditions leading to extreme switching activity in integrated circuits. Our approach is based on both spatial and temporal partitioning, which are used to address the accuracy and computational requirements. We propose a method for determining the exact conditions for worst-case switching activity in a small circuit area during a short time interval. We then show how this method can be combined with partitioning to allow for accurate full-circuit verification.
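
The quantity being maximized can be illustrated with the following sketch (a simplification: it merely sweeps windows over given toggle times, whereas the paper determines the input conditions that provoke the worst case):

def window_activity(toggle_times, region_gates, t_start, t_len):
    # Transitions of gates in the chosen spatial region whose toggle time
    # falls inside the time window [t_start, t_start + t_len).
    return sum(1
               for gate in region_gates
               for t in toggle_times.get(gate, [])
               if t_start <= t < t_start + t_len)

def peak_activity(toggle_times, region_gates, t_len, horizon):
    # Exhaustive sweep of window start times for one simulated input sequence.
    return max(window_activity(toggle_times, region_gates, t, t_len)
               for t in range(horizon - t_len + 1))

# Hypothetical toggle times (e.g. in picoseconds) for three gates.
toggles = {"g1": [120, 180], "g2": [130], "g3": [900]}
print(peak_activity(toggles, {"g1", "g2"}, t_len=100, horizon=1000))   # -> 3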