Showing papers on "Gate count published in 2010"

Open Access

Book Chapter•DOI•

A new combinational logic minimization technique with applications to cryptology

Joan Boyar¹, René Peralta²•Institutions (2)

University of Southern Denmark¹, National Institute of Standards and Technology²

20 May 2010

TL;DR: The result is, as far as the authors know, the circuit with the smallest gate count yet constructed for this function, and it is experimentally verified that the second step of the technique yields significant improvements over conventional methods when applied to randomly chosen linear transformations.

Abstract: A new technique for combinational logic optimization is described. The technique is a two-step process. In the first step, the non-linearity of a circuit – as measured by the number of non-linear gates it contains – is reduced. The second step reduces the number of gates in the linear components of the already reduced circuit. The technique can be applied to arbitrary combinational logic problems, and often yields improvements even after optimization by standard methods has been performed. In this paper we show the results of our technique when applied to the S-box of the Advanced Encryption Standard (AES [6]). This is an experimental proof of concept, as opposed to a full-fledged circuit optimization effort. Nevertheless the result is, as far as we know, the circuit with the smallest gate count yet constructed for this function. We have also used the technique to improve the performance (in software) of several candidates to the Cryptographic Hash Algorithm Competition. Finally, we have experimentally verified that the second step of our technique yields significant improvements over conventional methods when applied to randomly chosen linear transformations.

148 citations
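
Illustrative sketch (not the authors' algorithm): the second step described in the abstract above reduces the number of gates in the linear (XOR-only) parts of a circuit. The Python below only shows why sharing helps. It compares a naive XOR count against a simple greedy heuristic that repeatedly factors out the most frequent pair of signals. The function names and the example outputs are assumptions made for illustration; the paper's own heuristic is not reproduced here.

```python
from itertools import combinations


def naive_xor_count(rows):
    """XOR gates needed when every output is computed independently."""
    return sum(max(len(r) - 1, 0) for r in rows)


def greedy_pair_sharing(rows):
    """Count XOR gates when the most frequent signal pair is shared first.

    Each row is the set of signal indices XORed together to form one output.
    This is a simple cancellation-free greedy baseline, used here only to
    illustrate gate sharing in linear circuits.
    """
    rows = [set(r) for r in rows]
    next_var = max((v for r in rows for v in r), default=-1) + 1
    gates = 0
    while any(len(r) > 1 for r in rows):
        # Count how many outputs contain each pair of signals.
        counts = {}
        for r in rows:
            for pair in combinations(sorted(r), 2):
                counts[pair] = counts.get(pair, 0) + 1
        (a, b), _ = max(counts.items(), key=lambda kv: kv[1])
        # Spend one XOR gate on a ^ b and reuse the result everywhere.
        for r in rows:
            if a in r and b in r:
                r -= {a, b}
                r.add(next_var)
        next_var += 1
        gates += 1
    return gates


# Hypothetical outputs: y0 = x0^x1^x2, y1 = x0^x1^x3, y2 = x0^x1^x2^x3
example = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2, 3}]
print(naive_xor_count(example), greedy_pair_sharing(example))
```

On this small example the shared circuit needs fewer XOR gates than the naive one because the term x0^x1 is computed once and reused.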

Journal Article•DOI•

An Optimized Majority Logic Synthesis Methodology for Quantum-Dot Cellular Automata

Kun Kong, Yun Shang, Ruqian Lu

01 Mar 2010-IEEE Transactions on Nanotechnology

TL;DR: It is proved that the proposed method provides a minimal majority expression and an optimal QCA layout for any given three-variable Boolean function and removes all the redundancies that are produced in the process of converting a decomposed network into a majority network.

Abstract: Quantum-dot cellular automata (QCA) has been widely considered as a replacement candidate for complementary metal-oxide semiconductor (CMOS). The fundamental logic device in QCA is the majority gate. In this paper, we propose an efficient methodology for majority logic synthesis of arbitrary Boolean functions. We prove that our method provides a minimal majority expression and an optimal QCA layout for any given three-variable Boolean function. In order to obtain high-quality decomposed Boolean networks, we introduce a new decomposition scheme that can decompose all Boolean networks efficiently. Furthermore, our method removes all the redundancies that are produced in the process of converting a decomposed network into a majority network; existing methods do not consider these redundancies. We have built a majority logic synthesis tool based on our method and several existing logic synthesis tools. Experiments with 40 multiple-output benchmarks indicate that, compared to existing methods, 37 benchmarks are optimized by our method, with reductions of up to 31.6%, 78.2%, 75.5%, and 83.3% in level count, gate count, gate input count, and inverter count, respectively, and average reductions of 4.7%, 14.5%, 13.3%, and 26.4%, respectively. We have also implemented the QCA layouts of 10 benchmarks by using our method. Results indicate that, compared to existing methods, reductions of up to 33.3%, 76.7%, and 75.5% in delay, cell count, and area, respectively, are possible, with average reductions of 8.1%, 28.9%, and 29.0%, respectively.

106 citations
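
Illustrative sketch (not from the paper): the abstract above names the three-input majority gate as the fundamental QCA logic device. The Python below shows the standard identity that fixing one majority input to 0 or 1 yields AND or OR. The function names are chosen here for illustration only.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a, b):
    """AND built from a majority gate with one input tied to 0."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR built from a majority gate with one input tied to 1."""
    return majority(a, b, 1)


# Exhaustively check both identities over all input combinations.
for a, b in product((0, 1), repeat=2):
    assert and_gate(a, b) == (a & b)
    assert or_gate(a, b) == (a | b)
```

Together with inverters, these two derived gates are enough to express any Boolean function, which is why majority-based synthesis counts majority gates and inverters separately.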

Posted Content•

A new combinational logic minimization technique with applications to cryptology

Joan Boyar¹, René Peralta²•Institutions (2)

University of Southern Denmark¹, National Institute of Standards and Technology²

01 Jan 2010

TL;DR: In this article, a two-step technique for combinational logic optimization is described, where the first step reduces the nonlinearity of a circuit, as measured by the number of non-linear gates it contains, and the second step reduces the number of gates in the linear components of the already reduced circuit.

Abstract: A new technique for combinational logic optimization is described. The technique is a two-step process. In the first step, the nonlinearity of a circuit – as measured by the number of non-linear gates it contains – is reduced. The second step reduces the number of gates in the linear components of the already reduced circuit. The technique can be applied to arbitrary combinational logic problems, and often yields improvements even after optimization by standard methods has been performed. In this paper we show the results of our technique when applied to the S-box of the Advanced Encryption Standard (AES [5]). This is an experimental proof of concept, as opposed to a full-fledged circuit optimization effort. Nevertheless the result is, as far as we know, the circuit with the smallest gate count yet constructed for this function. We have also used the technique to improve the performance (in software) of several candidates to the Cryptographic Hash Algorithm Competition. Finally, we have experimentally verified that the second step of our technique yields significant improvements over conventional methods when applied to randomly chosen linear transformations.

48 citations

Journal Article•DOI•

Bandwidth Adaptive Hardware Architecture of K-Means Clustering for Video Analysis

Tse-Wei Chen¹, Shao-Yi Chien¹•Institutions (1)

National Taiwan University¹

01 Jun 2010-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Experiments show that the proposed bandwidth adaptive hardware architecture of K-Means clustering can be used in applications such as image segmentation, and that it achieves a maximum clock speed of 400 MHz with a gate count of 440 K in TSMC 90-nm technology.

Abstract: K-Means is a clustering algorithm that is widely applied in many fields, including pattern classification and multimedia analysis. Due to real-time requirements and computational-cost constraints in embedded systems, it is necessary to accelerate the K-Means algorithm by hardware implementations in SoC environments, where the bandwidth of the system bus is strictly limited. In this paper, a bandwidth adaptive hardware architecture of K-Means clustering is proposed. Experiments show that the proposed hardware can be used in applications such as image segmentation, and that it achieves a maximum clock speed of 400 MHz with a gate count of 440 K in TSMC 90-nm technology. Moreover, the throughput of the proposed hardware reaches 16 dimensions/cycle, and it can deal with feature vectors of different dimensions, using five parallel modes to utilize the input bandwidth efficiently.

45 citations

Journal Article•DOI•

Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder

Md. Saiful Islam, Zerina Begum

19 Aug 2010-arXiv: Hardware Architecture

TL;DR: This paper presents a new 4*4 parity preserving reversible logic gate, IG, which can be used to synthesize any arbitrary Boolean function and makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs.

Abstract: Reversible logic is emerging as an important research area with applications in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used as the fundamental building block in the design of other arithmetic logic circuits. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than its existing counterparts.

39 citations
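
Illustrative sketch (not from the paper): the abstract above relies on two properties, reversibility and parity preservation. The IG gate itself is not defined in the abstract, so the Python below checks both properties on two well-known 3*3 gates instead, Fredkin and Toffoli. The helper names are assumptions made for this example.

```python
from itertools import product


def fredkin(a, b, c):
    """Controlled swap: b and c are exchanged when a is 1."""
    return (a, c, b) if a else (a, b, c)


def toffoli(a, b, c):
    """Controlled-controlled NOT: c is flipped when a and b are both 1."""
    return (a, b, c ^ (a & b))


def is_reversible(gate, n):
    """A gate is reversible when it maps the 2**n inputs to 2**n distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n


def is_parity_preserving(gate, n):
    """Parity preserving: XOR of the inputs equals XOR of the outputs, for every input."""
    return all(
        sum(bits) % 2 == sum(gate(*bits)) % 2
        for bits in product((0, 1), repeat=n)
    )


for name, gate in (("Fredkin", fredkin), ("Toffoli", toffoli)):
    print(name, is_reversible(gate, 3), is_parity_preserving(gate, 3))
```

Both gates are reversible, but only Fredkin preserves parity. In a circuit built solely from parity preserving gates, a single-signal fault changes the output parity and can therefore be detected at the primary outputs.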

Patent•

Signal processing block for a receiver in wireless communication

Dimpesh Patel¹, Glenn Gulak¹, Mahdi Shabany¹•Institutions (1)

MaxLinear¹

24 May 2010

TL;DR: In this article, a pipelined CORDIC architecture is proposed to improve throughput and resource utilization, while reducing the gate count, by integrating the benefits of multi-dimensional annihilation capability of Householder reflections plus the low-complexity nature of the conventional 2D Givens rotations.

Abstract: A QRD processor for computing input signals in a receiver for wireless communication relies upon a combination of multi-dimensional Givens Rotations, Householder Reflections and conventional two-dimensional (2D) Givens Rotations, for computing the QRD of matrices. The proposed technique integrates the benefits of multi-dimensional annihilation capability of Householder reflections plus the low-complexity nature of the conventional 2D Givens rotations. Such integration increases throughput and reduces the hardware complexity, by first decreasing the number of rotation operations required and then by enabling their parallel execution. A pipelined architecture is presented (290) that uses un-rolled pipelined CORDIC processors (245a to 245d) iteratively to improve throughput and resource utilization, while reducing the gate count.

25 citations

Proceedings Article•DOI•

Synthesis of Reversible Circuits with No Ancilla Bits for Large Reversible Functions Specified with Bit Equations

Nouarddin Alhagi¹, Maher Hawash¹, Marek Perkowski¹•Institutions (1)

Portland State University¹

26 May 2010

TL;DR: A new algorithm, MP (multiple pass), for synthesizing large reversible binary circuits without ancilla bits is presented; it allows synthesis of large-scale reversible circuits (30 bits), which is not possible with existing algorithms.

Abstract: This paper presents a new algorithm, MP (multiple pass), to synthesize large reversible binary circuits without ancilla bits. The MMD algorithm requires storing a truth table (or a Reed-Muller (RM) transform) as a 2^n vector for a reversible function of n variables. This representation prohibits synthesis of large functions. In MP, however, we do not store such an exponentially growing data structure. The values of minterms are calculated in MP dynamically, one by one, from a set of logic equations that specify the reversible circuit to be designed. This allows for synthesis of large-scale reversible circuits (30 bits), which is not possible with existing algorithms. In addition, our unique multipass approach, in which the circuit is synthesized with various, yet specific, minterm orders, yields an optimal solution. The algorithm returns a description of the optimal circuit with respect to gate count or quantum cost. Although the synthesis process is relatively slow, the solution is found in real time for smaller circuits of 8 bits or fewer.

20 citations

Journal Article•DOI•

A Fully Integrated Built-In Self-Test Σ-Δ ADC Based on the Modified Controlled Sine-Wave Fitting Procedure

Hao-Chiao Hong¹, Fang-Yi Su¹, Shao-Feng Hung¹•Institutions (1)

National Chiao Tung University¹

01 Sep 2010-IEEE Transactions on Instrumentation and Measurement

TL;DR: This paper demonstrates the first fully integrated built-in self-test (BIST) Σ-Δ analog-to-digital converter (ADC) chip to the best of the authors' knowledge.

Abstract: This paper demonstrates the first fully integrated built-in self-test (BIST) Σ-Δ analog-to-digital converter (ADC) chip to the best of our knowledge. The ADC under test (AUT) comprises a second-order design-for-digital-testability Σ-Δ modulator and a decimation filter. The purely digital BIST circuitry conducts single-tone tests for the signal-to-noise-and-distortion ratio (SNDR), the dynamic range, the offset, and the gain error of the AUT. The BIST design is based on the proposed modified controlled sine-wave fitting procedure to address the component overload issues, reduce the setup parameter numbers, and eliminate the need for parallel multipliers. The total gate count of the whole BIST circuitry is only 13 300. The hardware overhead is much less than the BIST design using the traditional fast Fourier transform (FFT) analysis. Measurement results show that the peak SNDR results of the proposed BIST design and the conventional FFT analysis are 75.5 and 75.3 dB, respectively. The subtle SNDR difference is already within analog test uncertainty. The BIST Σ-Δ ADC achieves a digital test bandwidth higher than 17 kHz, very close to the rated 20-kHz bandwidth of the AUT.

19 citations

Proceedings Article•DOI•

A General Method of Constructing the Reversible Full-Adder

Lihui Ni¹, Zhijin Guan¹, Wenying Zhu¹•Institutions (1)

Nantong University¹

02 Apr 2010

TL;DR: According to the approach, one can realize a variety of reversible full-adders flexibly with only two reversible gates and two garbage outputs, which have improvements in the gate count and garbage count and can reduce the cost of network.

Abstract: Reversible gates, which are attracting increasing attention, have been widely used in low-power CMOS design, optical computing and quantum computing. Much of the existing literature presents only methods for constructing certain specific reversible full-adders, whereas we propose a general approach to constructing the reversible full-adder. With this approach, we can flexibly realize a variety of reversible full-adders with only two reversible gates and two garbage outputs, which improves the gate count and garbage count and can reduce the cost of the network.

18 citations

Proceedings Article•DOI•

Using a pipelined S-box in compact AES hardware implementations

Cheng Wang¹, Howard M. Heys¹•Institutions (1)

St. John's University¹

20 Jun 2010

TL;DR: A new compact AES encryption hardware core with 128-bit keys is proposed, and the results indicate that it is reasonable to consider using pipelined S-boxes in AES hardware implementations targeted at applications requiring low area and moderate speed.

Abstract: Pipelined S-boxes are usually used in high speed hardware implementations of the Advanced Encryption Standard (AES), and not typically found in compact implementations because of the extra complexity added by the pipeline registers. In this paper, the area and speed performance of applying a pipelined S-box to compact AES hardware implementations is examined. A new compact AES encryption hardware core with 128-bit keys is proposed. The proposed design employs a single 4-stage pipelined S-box that is shared by the data path operation and the key expansion operation. Compared with the previous smallest encryption-only ASIC implementation of AES, it achieves an increase in throughput of 2.1 times while maintaining a similar gate count. This result indicates that it is reasonable to consider using pipelined S-boxes in AES hardware implementations targeted at applications requiring low area and moderate speed.

17 citations

Proceedings Article•DOI•

High parallel variation Banyan network based permutation network for reconfigurable LDPC decoder

Xiao Peng¹, Zhixiang Chen¹, Xiongxin Zhao¹, Fumiaki Maehara¹, Satoshi Goto¹•Institutions (1)

Waseda University¹

07 Jul 2010

TL;DR: A variation Banyan network (VBN) based permutation network architecture for reconfigurable QC-LDPC decoders is proposed, together with a control signal generating algorithm for cyclic shift; a bypass network provides a nonblocking scheme for any input number and shift number.

Abstract: The permutation network plays an important role in the reconfigurable QC-LDPC decoder for most modern wireless communication systems with multiple code rates and various code lengths. In this paper, we propose a variation Banyan network (VBN) based permutation network architecture for reconfigurable QC-LDPC decoders and give the control signal generating algorithm for cyclic shift. By introducing a bypass network, we put forward a nonblocking scheme for any input number and shift number. In addition, an optimized VBN is proposed for the WiMAX and WiFi standards, which can shift at most 4 groups of input data and greatly reduces the hardware complexity. The synthesis results using 90 nm technology demonstrate that the proposed permutation network can be implemented with a gate count of 18.3k and a frequency of 600 MHz.

Proceedings Article•DOI•

Reversible cryptographic hardware with optimized quantum cost and delay

Anindita Banerjee¹•Institutions (1)

Jaypee Institute of Information Technology¹

01 Dec 2010

TL;DR: Novel designs for the reversible ALU of a cryptoprocessor, implemented in a standard gate library, are proposed, and the quantum costs reported are better than the lower bounds reported in the literature.

Abstract: Reversible logic is a good candidate for defending against power analysis attacks, as it ideally does not dissipate any heat, and reversible logic is today an emerging research area. Different designs for reversible cryptographic hardware have been proposed in the literature, but they have been implemented using complex gate libraries; furthermore, theorems have been proposed that define a lower limit on the implementation cost, that is, the quantum cost. We have proposed novel designs for the reversible ALU of a cryptoprocessor, which have been implemented in a standard gate library, and the quantum costs reported here are better than the lower bounds reported in the literature. Further, we have calculated the delay of the proposed designs. We have verified that our proposed designs are minimal with respect to gate count, which is the circuit cost, by simulating them in RevKit. This is the first time that optimization algorithms for quantum cost and delay have been applied to improve the cost metric in reversible ALU design.

Proceedings Article•DOI•

Counter designs in quantum-dot cellular automata

Kun Kong, Yun Shang, Ruqian Lu

01 Aug 2010

TL;DR: Comparisons indicate that, by applying the method presented, the hardware requirements (i.e., complexity and area) for QCA n-bit counter can be greatly reduced.

Abstract: In this paper, we present a new method for the design of an n-bit synchronous binary up counter in quantum-dot cellular automata (QCA). This method is based on the JK flip-flop, which almost always produces the simplest combinational logic in traditional sequential circuits. We implement a new QCA architecture for the JK flip-flop. Compared to the existing QCA JK flip-flop, the majority gate count, cell count, clock cycle count, and area of our QCA JK flip-flop are reduced by 57.1%, 88.1%, 55.6%, and 92.0%, respectively. Based on our QCA JK flip-flop, a method of extending state cells is proposed to design the QCA layout of the n-bit counter, such that the clock cycle count between any two state cells becomes 1. This feature ensures that one count takes just one clock cycle; in existing methods, however, one count takes n−1 clock cycles. Comparisons indicate that, by applying our method, the hardware requirements (i.e., complexity and area) for the QCA n-bit counter can be greatly reduced.
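
Illustrative sketch (not the paper's QCA layout): the abstract above builds an n-bit synchronous up counter from JK flip-flops. The Python below is a plain behavioural model of the textbook version of that circuit, in which each stage has J = K = the AND of all lower-order bits. The function names are assumptions made for this example, and nothing here models QCA cells or clock zones.

```python
def jk_next(q, j, k):
    """JK flip-flop characteristic equation: Q+ = J AND (NOT Q) OR (NOT K) AND Q."""
    return (j & (q ^ 1)) | ((k ^ 1) & q)


def counter_step(state):
    """Advance a synchronous up counter by one clock edge.

    state[0] is the least significant bit. Every flip-flop sees the same
    clock; stage i toggles only when all lower-order bits are 1.
    """
    next_state = []
    enable = 1  # J and K of the least significant stage are tied to 1
    for q in state:
        next_state.append(jk_next(q, enable, enable))
        enable &= q  # uses the current (pre-edge) value of this stage
    return next_state


def to_int(state):
    """Interpret the bit list as an unsigned integer."""
    return sum(q << i for i, q in enumerate(state))


state = [0, 0, 0]
sequence = []
for _ in range(8):
    state = counter_step(state)
    sequence.append(to_int(state))
print(sequence)  # a 3-bit counter cycles through 1..7 and wraps to 0
```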

Proceedings Article•DOI•

Reversible Multiplier Circuit

Anindita Banerjee¹, Anirban Pathak¹•Institutions (1)

Jaypee Institute of Information Technology¹

19 Nov 2010

TL;DR: It is shown that the HNG, TSG and MKG gates proposed for designing a component of the multiplier circuit (the full adder) are neither unique nor special, and that many such gates can be proposed that also perform all Boolean operations.

Abstract: Multiplier circuits play an important role in reversible computation, which is helpful in diverse areas such as low power CMOS design, optical computing, DNA computing and bioinformatics. We have proposed a reversible multiplier circuit design in the NCT gate library, which is based on generating all partial products in one step and then summing them using a binary tree network. The proposed reversible multiplier design has two components: a reversible partial product generation circuit and a reversible parallel adder circuit. Our design has a minimum number of garbage bits, gate count, and quantum cost. We have shown that the HNG, TSG and MKG gates proposed for designing a component of the multiplier circuit (the full adder) are neither unique nor special, and that many such gates can be proposed that also perform all Boolean operations. As an example, three such new gates are presented here.

Journal Article•DOI•

Real time fractal image coder based on characteristic vector matching

Shadrokh Samavi¹, Mehdi Habibi², Shahram Shirani³, N. Rowshanbin¹•Institutions (3)

Isfahan University of Technology¹, University of Isfahan², McMaster University³

01 Nov 2010-Image and Vision Computing

TL;DR: A classification scheme is presented which allows the hardware implementation of the fractal coder based on binary classification of domain and range blocks which increases the processing speed and reduces the power consumption while the qualities of the reconstructed images are comparable with those of the available software techniques.

Proceedings Article•DOI•

LDPC decoder area, timing, and energy models for early quantitative hardware cost estimates

Matthias Korb¹, Tobias G. Noll¹•Institutions (1)

RWTH Aachen University¹

01 Sep 2010

TL;DR: Generic silicon area, iteration period, and energy cost models of high-throughput LDPC decoders are derived and can be used for a fair benchmarking of the implemented decoder.

Abstract: System specification of SoCs needs to be supported by quantitative cost models to avoid wrong decisions in this early design phase. For less complex logic structures, such as FIR filters, generic cost models can be derived easily because they are based on a simple gate count. For LDPC decoders, the influence of the global interconnect between the two basic components of such a decoder complicates the derivation of general cost models. This might be the reason why no accurate cost models are known from the literature yet. In this paper, generic silicon area, iteration period, and energy cost models of high-throughput LDPC decoders are derived. These models not only allow for a decoding-performance vs. hardware-cost trade-off analysis during system specification but can also be used later on to choose a suitable architecture for a certain specification. Finally, these models can be used for a fair benchmarking of the implemented decoder.

Journal Article•DOI•

Genetic algorithm for test pattern generator design

T. Garbolino¹, Gregor Papa•Institutions (1)

Silesian University of Technology¹

01 Apr 2010-Applied Intelligence

TL;DR: Results of benchmark experiments and comparison with similar studies demonstrate the efficiency of the proposed evolutionary approach, which reduces the gate count of a built-in self-test structure by concurrent optimization of multiple parameters that influence the final solution.

Abstract: The paper describes an approach for the generation of a deterministic test pattern generator logic, which is composed of D-type and T-type flip-flops. This approach employs a genetic algorithm that searches for an acceptable practical solution in a large space of possible implementations. In contrast to conventional approaches the proposed one reduces the gate count of a built-in self-test structure by concurrent optimization of multiple parameters that influence the final solution. The optimization includes the search for: the optimal combination of register cells type; the presence of inverters at inputs and outputs; the test patterns order in the generated test sequence; and the bit order of test patterns. Results of benchmark experiments and comparison with similar studies demonstrate the efficiency of the proposed evolutionary approach.

Posted Content•

Variable Block Carry Skip Logic using Reversible Gates

Md. Rafiqul Islam, Md. Saiful Islam, Muhammad Rezaul Karim, Abdullah Al Mahmud, Hafiz Md. Hasan Babu

19 Aug 2010-arXiv: Hardware Architecture

TL;DR: In this paper, a generalized k*k reversible gate family is proposed and a 3*3 gate of the family is discussed, with which Inverter, AND, OR, NAND, NOR, and EXOR gates can be realized.

Abstract: Reversible circuits have applications in digital signal processing, computer graphics, quantum computation and cryptography. In this paper, a generalized k*k reversible gate family is proposed and a 3*3 gate of the family is discussed. Inverter, AND, OR, NAND, NOR, and EXOR gates can be realized by this gate. Implementation of a full-adder circuit using two such 3*3 gates is given. This full-adder circuit contains only two reversible gates and produces no extra garbage outputs. The proposed full-adder circuit is efficient in terms of gate count, garbage outputs and quantum cost. A 4-bit carry skip adder is designed using this full-adder circuit and a variable block carry skip adder is discussed. The equations required to evaluate these adders are presented.

Proceedings Article•DOI•

Configurable Pipelined Gabor Filter implementation for fingerprint image enhancement

Junbao Liu¹, Shuai Wang¹, Yi Li¹, Jun Han¹, Xiaoyang Zeng¹•Institutions (1)

Fudan University¹

13 Dec 2010

TL;DR: A novel Gabor filter hardware scheme for the fingerprint image enhancement is presented that uses accurate local frequency and orientation to generate the corresponding convolution kernel and thus achieve a better enhancement effect.

Abstract: In this paper, a novel Gabor filter hardware scheme for fingerprint image enhancement is presented. For each pixel of the image, we use accurate local frequency and orientation to generate the corresponding convolution kernel and thus achieve a better enhancement effect. Compared to previous works, our design yields a higher throughput owing to pipelining techniques. Moreover, the proposed design can be reconfigured to fulfill different requirements. Evaluation results demonstrate that, when the convolution kernel size is 11×11, our design can achieve 2 MPixels/s at 250 MHz, and the equivalent gate count is 63.8k at the SMIC 0.13 µm worst process corner. It is therefore very suitable for embedded fingerprint recognition systems.

Proceedings Article•DOI•

Scalable FFT processor for MIMO-OFDM based SDR systems

Gijung Yang¹, Yunho Jung¹•Institutions (1)

Korea Aerospace University¹

05 May 2010

TL;DR: An area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems by reducing the required number of nontrivial multipliers with mixed-radix (MR) and multi-path delay commutator (MDC) architecture.

Abstract: In this paper, an area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems. The proposed scalable FFT processor can support the variable length of 64, 128, 512, 1024 and 2048. By reducing the required number of nontrivial multipliers with mixed-radix (MR) and multi-path delay commutator (MDC) architecture, the complexity of the proposed FFT processor is dramatically decreased. The proposed FFT processor was designed in hardware description language (HDL) and synthesized to gate-level circuits using 0.13um CMOS standard cell library. With the proposed architecture, the gate count for the processor is 46K and the size of memory is 90Kbits, which are reduced by 59% and 39%, respectively, compared with those of the 4-channel radix-2 single-path delay feedback (R2SDF) FFT processor. Also, compared with 4-channel radix-2 MDC (R2MDC) FFT processor, it is confirmed that the gate count and memory size are reduced by 16.4% and 26.8%, respectively.

Journal Article•DOI•

Fiber remote configuration for an optically reconfigurable gate array with four configuration contexts

Yumiko Ueno¹, Minoru Watanabe¹•Institutions (1)

Shizuoka University¹

01 Dec 2010-Optics Communications

TL;DR: In this paper, a new remotely reconfigurable gate array architecture with four configuration contexts, which enables remote reconfiguration using optical fiber networks, is presented, together with a discussion of the availability of this architecture and of plans based on the experimental results.

Proceedings Article•DOI•

Design of FFT processor for IEEE802.16m MIMO-OFDM systems

Youn Ok Park¹, Jong-Won Park²•Institutions (2)

Electronics and Telecommunications Research Institute¹, Chungnam National University²

23 Dec 2010

TL;DR: An area-efficient FFT processor is proposed for IEEE 802.16m mobile WiMAX systems and can support the variable length of 512, 1024, 2048 and 4096 by reducing the required number of non-trivial multipliers with mixed-radix (MR) and multi-path delay commutator (MDC) architecture.

Abstract: In this paper, an area-efficient FFT processor is proposed for IEEE 802.16m mobile WiMAX systems. The proposed scalable FFT processor can support the variable length of 512, 1024, 2048 and 4096. By reducing the required number of non-trivial multipliers with mixed-radix (MR) and multi-path delay commutator (MDC) architecture, the complexity of the proposed FFT processor is dramatically decreased without sacrificing system throughput. The proposed FFT processor was designed in hardware description language (HDL) and synthesized to gate-level circuits using 0.18um CMOS standard cell library. With the proposed architecture, the gate count for the processor is 49K and the size of memory is 96Kbits, which are reduced by 12% and 26%, respectively, compared with those of the 4-channel radix-2 MDC (R2MDC) FFT processor.

Proceedings Article•DOI•

Implementation of High Efficient CAVLC Encoder for H.264/AVC

Yong-Jun Kim¹, Kyu-Yeul Wang¹, Sang-Seol Lee¹, Byung-Soo Kim¹, Bo-Keun Choi¹, Duck-Jin Chung¹•Institutions (1)

Inha University¹

17 Sep 2010

TL;DR: The design and VLSI implementation of a highly efficient Context-based Adaptive Variable Length Coding (CAVLC) encoder, which adopts a modified VLC look-up table technique and parallel processing and is suitable for real-time video applications, are presented.

Abstract: This paper proposes the design and VLSI implement of high efficient Context-based Adaptive Variable Length Coding (CAVLC) encoder which adopted a modified Variable Length Coding (VLC) look up table technique and parallel processing. The proposed CAVLC encoder used upper and under buffer as input buffer to perform zigzag scanning with both way ordering. Because of this, the proposed CAVLC encoder can be read and write concurrently. Moreover, we design the CAVLC encoder procedure with parallel processing which uses two generators for information signals and control signals to operate CAVLC modules such as a coeff_token (TotalCeff and TrailingOnes) module, a level module, a total_zeros module, and a run_before module. The proposed CAVLC is prototyped in Verilog-HDL, implemented and synthesized with megnachip 0.18 µm CMOS tech. The synthesis result shows that the gate count is about 12K with the clock constraint of 140Mhz. The proposed CAVLC encoder is suitable for real-time video applications.

Proceedings Article•DOI•

Comparative analysis of parallel SAD calculation hardware architectures for H.264/AVC video coding

Claudio Machado Diniz¹, Guilherme Correa¹, Altamiro Amadeu Susin¹, Sergio Bampi¹•Institutions (1)

Universidade Federal do Rio Grande do Sul¹

01 Feb 2010

TL;DR: A comparative analysis of nine hardware architecture alternatives for the SAD calculation processing unit, varying the parallelism level (4, 8 and 16 samples in parallel) and the number of pipeline stages, shows that versions with fewer pipeline stages achieved lower energy consumption and higher throughput than deeper pipeline versions.

Abstract: Sum of Absolute Differences (SAD) is a low-complexity distortion metric widely employed in the mode decision stage of real-time video encoders. In H.264/AVC encoding, the state-of-the-art video coding standard, motion estimation accounts for most of the computational complexity, most of it coming from the SAD calculation for all the candidate blocks. Considering an H.264/AVC motion estimation hardware architecture [5], SAD calculation accounts for 79% of the total gate count. Therefore, focusing on SAD hardware design space exploration can result in important area, power and performance improvements for H.264/AVC ASIC video encoders, but such exploration was not investigated in previous works. To address this question, this work first presents a comparative analysis of nine hardware architecture alternatives for the SAD calculation processing unit, varying the parallelism level (4, 8 and 16 samples in parallel) and the number of pipeline stages. The comparison is presented in terms of total gate count, processing cycles, throughput, power and energy consumption. Results show that versions with fewer pipeline stages achieved lower energy consumption and higher throughput (operating at a restricted clock frequency) when compared with deeper pipeline versions. These analyses are useful for selecting the best SAD architectural alternative to fit each application requirement, from low-power mobile to high-resolution H.264/AVC on-chip video encoders.
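
For readers unfamiliar with the metric, SAD can be sketched in software as follows. This is a minimal illustration of the definition only, not the hardware architectures compared in the paper; the function names `sad` and `best_candidate` and the list-of-rows block representation are assumptions made for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks.

    Blocks are given as lists of rows of integer sample values
    (a representation assumed for this sketch).
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
    return total


def best_candidate(current, candidates):
    """Return (index, sad) of the candidate block with the lowest SAD."""
    scores = [sad(current, c) for c in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In a motion estimation search, `best_candidate` would be evaluated over every candidate block in the search window, which is why the per-block SAD cost dominates the computation.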

Proceedings Article•DOI•

Digital Signal processing — from simulation to silicon

C.R.S. Fludger, J. C. Geyer, T. Duthel, C. Schulien

04 Nov 2010

TL;DR: This tutorial outlines the development of an ASIC for a CP-QPSK transponder and outlines the pitfalls and challenges.

Abstract: Optical communications systems at 10, 40 and 100G now use digital signal processing to enhance the signal tolerance against channel impairments such as chromatic dispersion and PMD. The power of digital signal processing is driven by developments in CMOS integration, which push the limits of gate count, feature sizes and development budgets. But what steps are involved in developing a full custom ASIC? What are the pitfalls and challenges? This tutorial outlines the development of an ASIC for a CP-QPSK transponder.

Journal Article•DOI•

High-speed low-complexity architecture for Reed-Solomon decoders

Yung-Kuei Lu¹, Ming-Der Shieh¹•Institutions (1)

National Cheng Kung University¹

01 Jul 2010-IEICE Transactions on Information and Systems

TL;DR: Analytical results show that the proposed architecture has the smallest critical path delay, latency, and area-time complexity in comparison with similar studies.

Abstract: This paper presents a high-speed, low-complexity VLSI architecture based on the modified Euclidean (ME) algorithm for Reed-Solomon decoders. The low-complexity feature of the proposed architecture is obtained by reformulating the error locator and error evaluator polynomials to remove redundant information in the ME algorithm proposed by Truong. This increases the hardware utilization of the processing elements used to solve the key equation and reduces hardware by 30.4%. The proposed architecture retains the high-speed feature of Truong's ME algorithm with a reduced latency, achieved by changing the initial settings of the design. Analytical results show that the proposed architecture has the smallest critical path delay, latency, and area-time complexity in comparison with similar studies. An example RS(255, 239) decoder design, implemented using the TSMC 0.18 µm process, can reach a throughput rate of 3 Gbps at an operating frequency of 375 MHz with a total gate count of 27,271.
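
As a quick sanity check on the figures quoted in this abstract, dividing throughput by clock frequency gives the number of bits processed per clock cycle. The helper name `units_per_cycle` is hypothetical and used only for this sketch.

```python
def units_per_cycle(throughput_per_second, clock_hz):
    """Average number of units (bits, symbols, ...) processed per clock cycle."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return throughput_per_second / clock_hz


# Figures quoted in the abstract: 3 Gbps at 375 MHz.
rs_bits_per_cycle = units_per_cycle(3e9, 375e6)
```

The result is 8 bits per cycle. RS(255, 239) symbols are 8 bits wide, so the quoted figures are consistent with one symbol per clock, although the abstract does not state this explicitly.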

Proceedings Article•DOI•

A low energy high speed Reed-Solomon decoder using Decomposed Inversionless Berlekamp-Massey Algorithm

Hazem A. Ahmed¹, Hamed Salah¹, Tallal Elshabrawy¹, Hossam A. H. Fahmy²•Institutions (2)

German University in Cairo¹, Cairo University²

01 Nov 2010

TL;DR: An area efficient, low energy, high speed pipelined architecture for a Reed-Solomon decoder based on Decomposed Inversionless Berlekamp-Massey Algorithm, where the error locator and evaluator polynomial can be computed serially.

Abstract: This paper proposes an area-efficient, low-energy, high-speed pipelined architecture for a Reed-Solomon decoder based on the Decomposed Inversionless Berlekamp-Massey Algorithm, where the error locator and evaluator polynomials can be computed serially. In the proposed architecture, a new scheduling of t Finite Field Multipliers (FFMs) is used to calculate the error locator and evaluator polynomials to achieve a good balance between area, latency, and throughput. This architecture is tested in two different decoders. The first is a pipelined two-parallel decoder, in which two-parallel syndrome computation and two-parallel Chien search are used. The second is a conventional pipelined decoder, in which conventional syndrome computation and Chien search are used. Both decoders have been implemented using IBM 0.13 µm CMOS standard cells. The two-parallel RS(255, 239) decoder has a gate count of 37.6K and an area of 1.18 mm². Simulation results show that this approach can work successfully at a data rate of 7.4 Gbps with a power dissipation of 50 mW. The conventional RS(255, 239) decoder has a gate count of 30.7K and an area of 0.99 mm². Simulation results show that this approach can work successfully at a data rate of 4.85 Gbps with a power dissipation of 29.28 mW.

Proceedings Article•DOI•

Area efficient-high throughput sub-pipelined design of the AES in CMOS 180nm

Abdallah Y. Alma'aitah¹, Zine-Eddine Abid²•Institutions (2)

Queen's University¹, Higher Colleges of Technology²

01 Dec 2010-Intelligent Decision Technologies

TL;DR: A modified sub-pipelined structure is proposed targeting high speed and low power-delay product of the compact AES design with on-the-fly key expansion unit, by adding 25.8% in hardware complexity to the existing ASIC designs.

Abstract: In this paper, an efficient hardware implementation of one of the most popular encryption algorithms, the Advanced Encryption Standard (AES), is presented. A modified sub-pipelined structure is proposed, targeting high speed and a low power-delay product for the compact AES design with an on-the-fly key expansion unit. By adding 25.8% in hardware complexity to the existing ASIC designs, the throughput is increased by more than 158% with a better overall power-delay product. Compared to other compact AES implementations, the proposed structure can reach up to 6 Gbit/s with a gate count of about 13k.

Proceedings Article•DOI•

A high throughput VLSI design with hybrid memory architecture for H.264/AVC CABAC decoder

Yuan-Hsin Liao¹, Gwo-Long Li¹, Tian-Sheuan Chang¹•Institutions (1)

National Chiao Tung University¹

03 Aug 2010

TL;DR: A high throughput context-based adaptive binary arithmetic coding (CABAC) decoding design with hybrid memory architecture for H.264/AVC is presented and an efficient mathematical transform method is proposed to further decrease the critical path of two-symbol binary arithmetic decoding procedure.

Abstract: A high-throughput context-based adaptive binary arithmetic coding (CABAC) decoding design with hybrid memory architecture for H.264/AVC is presented in this paper. To accelerate decoding while taking hardware cost into consideration, a new hybrid-memory two-symbol parallel decoding technique is proposed. In addition, an efficient mathematical transform method is proposed to further shorten the critical path of the two-symbol binary arithmetic decoding procedure. The proposed architecture is implemented in UMC 90 nm technology, and experimental results show that our proposal can operate at 264 MHz with a 42.37k gate count and a throughput of 483.1 Mbins/sec, which surpasses previous designs with a 48.6% hardware cost saving.

Journal Article•DOI•

Generic Permutation Network for QC-LDPC Decoder

Xiao Peng¹, Xiongxin Zhao¹, Zhixiang Chen¹, Fumiaki Maehara¹, Satoshi Goto¹•Institutions (1)

Waseda University¹

01 Dec 2010-IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

TL;DR: The generic permutation network (GPN) for the reconfigurable QC-LDPC decoder removes the restriction on the number of inputs, such as powers of 2 or other limited values, and can be optimized for any application on demand.

Abstract: The permutation network plays an important role in the reconfigurable QC-LDPC decoder for most modern wireless communication systems with multiple code rates and various code lengths. This paper presents the generic permutation network (GPN) for the reconfigurable QC-LDPC decoder. Compared with conventional permutation networks, this proposal removes the restriction on the number of inputs, such as powers of 2 or other limited values, and can be optimized for any application on demand. Moreover, the proposed scheme greatly reduces latency because it has fewer stages and an efficient control-signal generation algorithm. In addition, the proposed network possesses a high degree of parallelism, which enables several groups of data to be cyclically shifted simultaneously. Synthesis results using 90 nm technology demonstrate that this architecture can be implemented with a gate count of 18.3k for the WiMAX standard at a frequency of 600 MHz and 10.9k for the WiFi standard at a frequency of 800 MHz.
