Showing papers on "Logarithmic number system" published in 2009

Open Access

Journal Article•DOI•

A System-on-a-Chip Implementation for Embedded Real-Time Model Predictive Control

Panagiotis D. Vouzis¹, Mayuresh V. Kothare¹, Leonidas Bleris², Mark G. Arnold¹•Institutions (2)

14 Apr 2009-IEEE Transactions on Control Systems Technology

TL;DR: The proposed MPC architecture is implemented by means of a hardware description language and then prototyped and emulated on a field-programmable gate array and yields a small-in-size and energy-efficient implementation that is capable of solving the aforementioned problems on the order of milliseconds.

Abstract: This paper presents a hardware architecture for embedded real-time model predictive control (MPC). The computational cost of an MPC problem, which relies on the solution of an optimization problem at every time step, is dominated by operations on real matrices. In order to design an efficient and low-cost application-specific processor, we analyze the computational cost of MPC, and we propose a limited-resource host processor to be connected with an application-specific matrix coprocessor. The coprocessor uses a 16-b logarithmic number system arithmetic unit, which is designed using cotransformation, to carry out the required arithmetic operations. The proposed architecture is implemented by means of a hardware description language and then prototyped and emulated on a field-programmable gate array. Results on computation time and architecture area are presented and analyzed, and the functionality of the proposed architecture is verified using two case studies: a linear problem of a rotating antenna and a nonlinear glucose-regulation problem. The proposed MPC architecture yields a small-in-size and energy-efficient implementation that is capable of solving the aforementioned problems on the order of milliseconds, and we compare its performance and area requirements with other MPC designs that have appeared in the literature.

95 citations

Journal Article•DOI•

A Fast Hardware Approach for Approximate, Efficient Logarithm and Antilogarithm Computations

Somnath Paul¹, Nikhil Jayakumar², Sunil P. Khatri¹•Institutions (2)

Texas A&M University¹, Texas Instruments²

01 Feb 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The novelty of the approach lies in the fact that it performs interpolation efficiently, without the need to perform multiplication or division, and the method performs both the log() and antilog() operation using the same hardware architecture.

Abstract: The realization of functions such as log() and antilog() in hardware is of considerable relevance, due to their importance in several computing applications. In this paper, we present an approach to compute log() and antilog() in hardware. Our approach is based on a table lookup, followed by an interpolation step. The interpolation step is implemented in combinational logic, in a field-programmable gate array (FPGA), resulting in an area-efficient, fast design. The novelty of our approach lies in the fact that we perform interpolation efficiently, without the need to perform multiplication or division, and our method performs both the log() and antilog() operation using the same hardware architecture. We compare our work with existing methods, and show that our approach results in significantly lower memory resource utilization, for the same approximation errors. Also our method scales very well with an increase in the required accuracy, compared to existing techniques.

89 citations
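
The abstract above describes computing log() by a table lookup followed by an interpolation step. The following is a minimal software sketch of that general idea only, assuming a 16-segment table over [1, 2) and ordinary linear interpolation. Unlike the paper's hardware, this sketch does use a multiplication in the interpolation step; the names `TABLE_BITS`, `LOG_TABLE`, and `approx_log2` are illustrative and do not come from the paper.

```python
import math

TABLE_BITS = 4  # assumed table size: 2**4 = 16 segments over [1, 2)

# log2(1 + i/16) for i = 0..16; the extra entry covers the right edge.
LOG_TABLE = [math.log2(1 + i / (1 << TABLE_BITS))
             for i in range((1 << TABLE_BITS) + 1)]


def approx_log2(x: float) -> float:
    """Approximate log2(x) for x > 0 by table lookup plus linear interpolation."""
    if x <= 0:
        raise ValueError("x must be positive")
    # x = mantissa * 2**exponent with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(x)
    m = mantissa * 2.0 - 1.0   # fraction of the normalized mantissa, in [0, 1)
    e = exponent - 1           # integer part of log2(x)
    scaled = m * (1 << TABLE_BITS)
    index = int(scaled)        # which table segment the fraction falls in
    frac = scaled - index      # position inside that segment, in [0, 1)
    lo, hi = LOG_TABLE[index], LOG_TABLE[index + 1]
    # Linear interpolation between neighbouring table entries
    # (this multiply is what the paper's method avoids).
    return e + lo + frac * (hi - lo)
```

Increasing `TABLE_BITS` trades table memory for accuracy, which is the memory-versus-error trade-off the abstract compares against existing methods.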

Journal Article•DOI•

A Lower Error and ROM-Free Logarithmic Converter for Digital Signal Processing Applications

Tso-Bing Juang¹, Sheng-Hung Chen¹, Huang-Jia Cheng¹•Institutions (1)

National Pingtung Institute of Commerce¹

08 Dec 2009-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: A lower error and ROM-free logarithmic converter that reduces the overhead of computation-intensive operations for real-time digital-signal-processing applications and outperforms previously proposed one-region and two-region conversion methods.

Abstract: In this brief, we propose a lower error and ROM-free logarithmic converter. The proposed converter can lead to area-efficient hardware implementation as it avoids the need for a ROM by employing simple computation units for logarithmic approximation. Our proposed logarithmic conversion algorithm partitions the exact logarithmic curve into two symmetric regions such that the slopes in the two regions that are used for logarithmic approximation are inversed. Simulation results show that the proposed algorithm achieves an error range and percentage error range of only 0.045 and 3.339%, respectively, which outperforms previously proposed one-region and two-region conversion methods. We have implemented the proposed logarithmic converter using 0.13-μm CMOS technology, and the latency is 2.8 ns. The proposed converter can be used to reduce the overhead of computation-intensive operations for real-time digital-signal-processing applications.

64 citations

Journal Article•DOI•

An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Power 3-D Graphics SoC

Byeong-Gyu Nam¹, Hoi-Jun Yoo¹•Institutions (1)

KAIST¹

02 May 2009-IEEE Journal of Solid-State Circuits

TL;DR: A 4-way 32-bit stream processor core developed for handheld low-power 3-D graphics systems achieves single-cycle throughput for all of its operations except matrix-vector multiplication, which takes 2 cycles per result, down from 4 cycles in the conventional approach.

Abstract: A low-power and high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting logarithmic arithmetic and the proposed adaptive number conversion scheme, a 4-way arithmetic unit achieves single-cycle throughput for all these operations except matrix-vector multiplication, which takes 2 cycles per result, down from 4 cycles in the conventional approach. The processor, featuring this functional unit and several proposed architectural schemes including embedded register index calculations, functional unit reconfiguration, and operand forwarding in the logarithmic domain, achieves a 19.1% cycle count reduction for the OpenGL transformation and lighting (TnL) operation compared with the latest work. The proposed stream processor core is integrated into a 3-D graphics SoC as a vertex shader to show its effectiveness. The entire SoC is fabricated into a test chip using 1-poly 6-metal 0.18 μm CMOS technology. The 17.2 mm² chip contains 1.57 M transistors and 29 kB SRAM. The stream processor core takes 9.7 mm² and dissipates 86.8 mW at a 200 MHz operating frequency. It shows a peak performance of 141 Mvertices/s for geometry transformation (TFM) and achieves a 17.5% performance improvement and 44.7% and 39.4% power and area reductions for the TFM compared with the latest work. For power management of the SoC, the chip is divided into triple power domains separately controlled by dynamic voltage and frequency scaling (DVFS). With this scheme, it shows 52.4 mW power consumption at 60 fps, a 50.5% power reduction from the latest work.

50 citations

Journal Article•DOI•

A truly two-dimensional systolic array FPGA implementation of QR decomposition

Xiaojun Wang, Miriam Leeser¹•Institutions (1)

Northeastern University¹

29 Oct 2009-ACM Transactions on Embedded Computing Systems

TL;DR: A two-dimensional systolic array QR decomposition is implemented on a Xilinx Virtex5 FPGA using the Givens rotation algorithm, which uses straightforward floating-point divide and square root implementations, which makes it easier to be used within a larger system.

Abstract: We have implemented a two-dimensional systolic array QR decomposition on a Xilinx Virtex5 FPGA using the Givens rotation algorithm. QR decomposition is a key step in many DSP applications including sonar beamforming, channel equalization, and 3G wireless communication. Compared to previous work that implements Givens rotations using a one-dimensional systolic array, our implementation uses a truly two-dimensional systolic array architecture. As a result, latency scales well for larger matrices. In addition, prior work avoids divide and square root operations in the Givens rotation algorithm by using special operations such as CORDIC or special number systems such as the logarithmic number system (LNS). In contrast, our design uses straightforward floating-point divide and square root implementations, which makes it easier to be used within a larger system. In our design, the input matrix size can be configured at compile time to many different sizes, making it easily scalable to future large FPGAs or over multiple FPGAs. The QR module is fully pipelined with a throughput of over 130 MHz for the IEEE single-precision floating-point format. The peak performance for a 12 × 12 input matrix is approximately 35 GFLOPs.

34 citations
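
The abstract above computes QR decomposition with Givens rotations, using plain floating-point divide and square root. As a rough illustration of where those two operations appear, here is a minimal sequential sketch in software. It is not the paper's two-dimensional systolic array, and the function name `givens_qr` and the list-of-lists matrix layout are assumptions made for this example.

```python
import math


def givens_qr(a):
    """Return (q, r) with a = q * r, computed by Givens rotations.

    `a` is an m x n matrix given as a list of rows (m >= n assumed).
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero the entries below the diagonal of column j, bottom-up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # already zero, no rotation needed
            h = math.hypot(x, y)   # the square root
            c, s = x / h, y / h    # the two divides
            # Rotate rows i-1 and i of R so that r[i][j] becomes zero.
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            # Accumulate the transposed rotation into columns i-1 and i of Q.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

Each rotation needs one square root and two divides, which is the cost that the CORDIC and LNS approaches mentioned in the abstract try to avoid.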

Proceedings Article•DOI•

Random forest-LNS architecture and vision

Hassab Elgawi Osman¹•Institutions (1)

Tokyo Institute of Technology¹

23 Jun 2009

TL;DR: Utilization of a bag of covariance matrices as object descriptor improves the object recognition accuracy while speeding up the learning process, and an efficient architecture for a generic object recognition system based on an ensemble classifier in an FPGA environment is described.

Abstract: We describe an efficient architecture for a generic object recognition system based on an ensemble classifier in a Field Programmable Gate Array (FPGA) environment. Utilization of a bag of covariance matrices as object descriptor improves the object recognition accuracy while speeding up the learning process. We extend this technique and present its hardware architecture, as well as an object classifier based on an on-line variant of random forest (RF) implemented using the Logarithmic Number System (LNS). First, we describe the algorithm and architecture of our model, which comprises several computation modules. We then test and verify the model's functionality using numerical simulation in the GRAZ02 dataset domain. It has been shown that the proposed system gains strong performance over floating-point and fixed-point precisions, even when only 10% of the training examples are used, and is reasonably power efficient.

7 citations

Journal Article•DOI•

Error analysis of LNS addition/subtraction with direct-computation implementation

Chichyang Chen¹•Institutions (1)

Feng Chia University¹

12 Jun 2009-IET Computers and Digital Techniques

TL;DR: The formulas for the maximum allowable errors of the exponent and logarithm computations within the LNS unit that has better or equal precision performance than that of a comparable IEEE FLP unit have been derived and can be used in the design of the LNS addition/subtraction unit with the direct-computation implementation method.

Abstract: Logarithmic number system (LNS) arithmetic is more efficient than floating-point (FLP) arithmetic in some complex function computation. However, computation of the $\log_2(1 \pm 2^{-v})$ function in large word-length LNS addition/subtraction requires a large amount of hardware. Direct computation of the $\log_2(1 \pm 2^{-v})$ function is a promising method for the practical implementation of large word-length LNS arithmetic. The two most important operations in this method are the exponent and logarithm computations. The authors analysed the precision requirement in computing the exponential and logarithmic functions for the direct computation of LNS addition/subtraction. The formulas for the maximum allowable errors of the exponent and logarithm computations within the LNS unit that has better or equal precision performance than that of a comparable IEEE FLP unit have been derived. The simulation results show that these estimation formulas for the two maximum errors are correct and thus can be used in the design of the LNS addition/subtraction unit with the direct-computation implementation method.

3 citations
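
The function named in the abstract above arises because an LNS stores a value as its base-2 logarithm: adding or subtracting two stored values means evaluating the logarithm of one plus or minus a power of two, where v is the non-negative difference of the two logarithms. The sketch below shows that identity in double-precision software, assuming positive operands only (no sign bit, no zero encoding). It is not the paper's direct-computation hardware, and `lns_add` and `lns_sub` are illustrative names.

```python
import math


def lns_add(a: float, b: float) -> float:
    """Return log2(2**a + 2**b), given the logarithms a and b of two positive values."""
    hi, lo = max(a, b), min(a, b)
    v = hi - lo                              # v >= 0
    # One exponent computation (2**-v) and one logarithm computation.
    return hi + math.log2(1.0 + 2.0 ** -v)


def lns_sub(a: float, b: float) -> float:
    """Return log2(2**a - 2**b) for a > b, so that the result is positive."""
    if a <= b:
        raise ValueError("requires a > b in this positive-only sketch")
    v = a - b                                # v > 0
    return a + math.log2(1.0 - 2.0 ** -v)
```

The exponent step `2.0 ** -v` and the logarithm step `math.log2(...)` correspond to the two operations whose allowable errors the abstract analyses.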

Proceedings Article•DOI•

Design and implementation of double base integer encoder in the flash ADC

Minh Son Nguyen¹, Jong-Soo Kim¹, Insoo Kim², Kyusun Choi²•Institutions (2)

University of Ulsan¹, Pennsylvania State University²

06 May 2009

TL;DR: The Constraint algorithm is suggested to solve the fan-in problem of the Greedy algorithm in designing the encoder circuit of the flash ADC and shows better performance in terms of layout area, power consumption, and operation speed.

Abstract: The DBNR (Double Base Number Representation) has been known to represent the Multidimensional Logarithmic Number System for implementing the multiplier-accumulator architecture of DSP (Digital Signal Processing). This paper also uses the DBNR to improve the bottleneck of DSP arithmetic circuits with the flash ADC (Analog-to-Digital Converter). The Constraint algorithm is suggested to solve the fan-in problem of the Greedy algorithm in designing the encoder circuit of the flash ADC. The Constraint algorithm shows better performance in terms of layout area, power consumption, and operation speed, compared with the FAT tree encoder, which is known as the fastest encoder circuit yielding binary output.

2 citations

Proceedings Article•DOI•

A hardware algorithm for fast digit on-line logarithmic computation with exponential convergence rate

Rui-Lin Chen¹, Chichyang Chen²•Institutions (2)

Fortune Institute of Technology¹, Feng Chia University²

06 May 2009

TL;DR: The high throughput of the computing system can be attained with the use of digit pipelining in the design of the hardware architecture because the latency of the pipeline is short and the convergence rate of the algorithm is exponential.

Abstract: In this research, a hardware algorithm for digit on-line logarithmic computation is proposed. This algorithm is based on a fast digit-parallel logarithmic algorithm that was proposed previously. The drawback of the previous algorithm is that the computation cannot be digit pipelined with other computations. Our new algorithm will generate the partial logarithmic result after only some input digits of the operand are available. Thus, the high throughput of the computing system can be attained with the use of digit pipelining in the design of the hardware architecture. Furthermore, the latency of the pipeline is short because the convergence rate of the algorithm is exponential. For example, when the word length of the operand is 24, the number of pipeline stages is only four. Based on our proposed digit on-line method, we have designed the architecture of a 24-bit logarithmic unit. The exhaustive test of the 24-bit unit shows that our algorithm and error analysis are correct.

1 citation

Proceedings Article•DOI•

Pipelined Computation of Very Large Word-Length LNS Addition/Subtraction Computation with Exponential Convergence Rate

Rui-Lin Chen¹, Chichyang Chen²•Institutions (2)

Fortune Institute of Technology¹, Feng Chia University²

14 Dec 2009

TL;DR: With this algorithm, the convergence rate of LNS addition/subtraction unit can be exponential and all the possible cases are thoroughly tested by simulations and thus the correctness of the algorithm is proved.

Abstract: Very large word-length logarithmic number system (LNS) addition/subtraction requires a lot of hardware and long pipeline latency. In this paper, we propose an algorithm that utilizes two novel methods to solve these problems. With this algorithm, the convergence rate of the LNS addition/subtraction unit can be exponential. All the possible cases are thoroughly tested by simulations, and thus we have proved the correctness of the algorithm.

1 citation

Proceedings Article•DOI•

Software implementation of LNS arithmetic in an ARM embedded system

Chichyang Chen¹, Li-Wei Liu¹, Jun-Wen Jou¹•Institutions (1)

Feng Chia University¹

25 May 2009

TL;DR: The proposed software-implemented 32-bit LNS arithmetic implementation approach is very efficient for computing complex arithmetic functions in an ARM embedded system.

Abstract: Logarithmic number system (LNS) arithmetic is a good alternative to floating-point arithmetic. We have implemented 32-bit LNS arithmetic by using assembly and C languages on an ARM processor. Compared to FLP arithmetic, the proposed software-implemented LNS arithmetic can achieve a speedup factor of 9.12/13.45 in multiplication/division, with only about 34% speed degradation in addition/subtraction. For the $A^B$ function, the proposed LNS arithmetic is 91.06 times faster than the FLP arithmetic. We conclude that our proposed software LNS arithmetic implementation approach is very efficient for computing complex arithmetic functions in an ARM embedded system.
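
The speedups reported above follow from how an LNS represents numbers: a value is stored as its logarithm, so multiplication and division become addition and subtraction, and raising a value to a power becomes a single multiplication. The sketch below illustrates this with Python floats, assuming positive values and base 2. It is not the paper's 32-bit fixed-point ARM implementation, and all function names are illustrative.

```python
import math


def to_lns(x: float) -> float:
    """Encode a positive real value as its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return math.log2(x)


def from_lns(l: float) -> float:
    """Decode an LNS value back to an ordinary real number."""
    return 2.0 ** l


def lns_mul(a: float, b: float) -> float:
    """Multiply two LNS values: the logarithms add."""
    return a + b


def lns_div(a: float, b: float) -> float:
    """Divide two LNS values: the logarithms subtract."""
    return a - b


def lns_pow(a: float, exponent: float) -> float:
    """Raise an LNS value to an ordinary real exponent: one multiplication."""
    return a * exponent
```

Addition and subtraction are the expensive operations in this representation, which is consistent with the slowdown the abstract reports for them.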

Proceedings Article•DOI•

Implementation of Digital Electronic Arithmetic and its application

Khader Mohammad¹, Sos S. Agaian¹, Fred Hudson¹•Institutions (1)

University of Texas at San Antonio¹

11 Oct 2009

TL;DR: A hardware implementation of the parametric image-processing framework that accurately processes images and speeds up addition, subtraction, and multiplication, together with the design of arithmetic circuits (parallel counters, adders, and multipliers) based on two high-performance threshold logic gate implementations.

Abstract: The Parameterized Digital Electronic Arithmetic (PDEA) model replaces linear operations with non-linear ones. In this paper we introduce a hardware implementation of the parametric image-processing framework that will accurately process images and speed up computation for addition, subtraction, and multiplication. In particular, the paper presents the design of arithmetic circuits including parallel counters, adders, and multipliers based on two high-performance threshold logic gate implementations that we have developed. We will also explore new microprocessor architectures to take advantage of this arithmetic. The experiments executed have shown that the algorithm provides faster and better enhancements than those described in the literature. Its potential applications include computer graphics, digital signal processing, and other multimedia applications.
