Showing papers on "Gate count" published in 2013

Journal Article

A High-Speed Low-Complexity Modified Radix-2^5 FFT Processor for High Rate WPAN Applications

Taesang Cho¹, Hanho Lee¹

01 Jan 2013-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel modified radix-2^5 FFT algorithm is proposed that reduces hardware complexity by cutting the number of complex multiplications and the size of the twiddle factor memory.

Abstract: This paper presents a high-speed, low-complexity modified radix-2^5 512-point fast Fourier transform (FFT) processor that uses an eight-data-path pipelined approach for high-rate wireless personal area network (WPAN) applications. A novel modified radix-2^5 FFT algorithm that reduces the hardware complexity is proposed; it reduces the number of complex multiplications and the size of the twiddle factor memory. The processor also uses a complex constant multiplier instead of a complex Booth multiplier. The proposed FFT processor achieves a signal-to-quantization-noise ratio of 35 dB at a 12-bit internal word length. The processor has been designed and implemented in 90-nm CMOS technology with a 1.2 V supply voltage. The results demonstrate that its total gate count is 290K and that its highest throughput rate reaches 2.5 GS/s at 310 MHz while requiring much less hardware complexity.
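For orientation, the sketch below is a plain recursive radix-2 decimation-in-time FFT, not the modified radix-2^5 algorithm of the paper. A 512-point transform (512 = 2^9) needs nine radix-2 stages, each multiplying by twiddle factors W_n^k; according to the abstract, the radix-2^5 reformulation reduces the number of these complex multiplications and the twiddle factor memory, which this sketch does not attempt. The function names `fft_radix2` and `dft` are illustrative.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_n^k times odd term
        out[k] = even[k] + t            # butterfly, upper output
        out[k + n // 2] = even[k] - t   # butterfly, lower output
    return out

def dft(x):
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

signal = [complex(i % 7, (3 * i) % 5) for i in range(64)]
err = max(abs(a - b) for a, b in zip(fft_radix2(signal), dft(signal)))
print(f"max |FFT - DFT| = {err:.2e}")
```

Calling `fft_radix2` on a 512-sample list runs the nine stages recursively; a hardware pipeline unrolls the same butterflies into fixed stages.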

63 citations

Patent

Circuit and system of a low density one-time programmable memory

Shine C. Chung

06 Feb 2013

TL;DR: A low-density One-Time Programmable (OTP) memory is disclosed that achieves a low gate count and low peripheral-circuit overhead to save cost.

Abstract: A low-density One-Time Programmable (OTP) memory is disclosed that achieves a low gate count and low overhead in the peripheral circuits to save cost. A maximum-length Linear Feedback Shift Register (LFSR) can be used to generate 2^n − 1 addresses from an n-bit address. Each register in the address generator can have two latches. Each latch has two cross-coupled inverters whose two outputs are coupled to the drains of two MOS input devices, respectively. The inputs of the latch are coupled to the gates of the MOS input devices, respectively. The sources of the MOS input devices are coupled to the drains of at least one MOS device, whose gate(s) are coupled to a clock signal and whose source(s) are coupled to a supply voltage. The two latches can be connected in series, with the outputs of the first latch coupled to the inputs of the second latch.
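A maximum-length n-bit LFSR steps through every nonzero state exactly once, which is why it can replace a binary counter as an address generator. The sketch below is a software model of a Fibonacci LFSR; the tap sets (x^4 + x^3 + 1 and x^3 + x^2 + 1) are illustrative primitive polynomials chosen here, not values from the patent, and the latch-level circuit described above is not modeled.

```python
def lfsr_states(n_bits, taps, seed=1):
    """Fibonacci LFSR model: shift left and feed back the XOR of the tapped bits.

    `taps` are the exponents of the feedback polynomial, e.g. (4, 3) for
    x^4 + x^3 + 1; tap t reads bit t-1 of the current state.
    """
    mask = (1 << n_bits) - 1
    start = seed & mask
    state = start
    states = []
    for _ in range(1 << n_bits):          # a period can never exceed 2^n states
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
        if state == start:                # back to the seed: one full period
            break
    return states

addresses = lfsr_states(4, (4, 3))
print(len(addresses), [format(a, "04b") for a in addresses[:5]])
# 15 ['0001', '0010', '0100', '1001', '0011']
```

With a primitive polynomial the 4-bit model visits all 15 nonzero addresses before repeating; the all-zero state is the one address a maximal-length LFSR never produces.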

41 citations

Journal Article

A Low-Latency Low-Power QR-Decomposition ASIC Implementation in 0.13 µm CMOS

Mahdi Shabany¹, Dimpesh Patel², P.G. Gulak²

Sharif University of Technology¹, University of Toronto²

24 Jan 2013-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: A semi-pipelined, semi-iterative architecture is presented for the QRD core. It uses 2-D, Householder 3-D and 4-D/2-D configurable CORDIC processors that perform the maximum possible number of vectoring and rotation operations within the given number of cycles, while minimizing gate count and maximizing resource utilization.

Abstract: This paper presents a hybrid QR decomposition (QRD) design that reduces the number of computations and increases their execution parallelism by using a unique combination of multi-dimensional Givens rotations, Householder transformations and conventional 2-D Givens rotations. A semi-pipelined, semi-iterative architecture is presented for the QRD core; it uses new 2-D, Householder 3-D and 4-D/2-D configurable CORDIC processors that perform the maximum possible number of vectoring and rotation operations within the given number of cycles, while minimizing gate count and maximizing resource utilization. Test results for the 0.3 mm² QRD chip, fabricated in 0.13 µm 1P8M CMOS technology, demonstrate that the proposed design for 4×4 complex matrices attains the lowest reported processing latency of 40 clock cycles (144 ns) at 278 MHz and dissipates 48.2 mW at a 1.3 V supply and 25°C. It outperforms all previously published QRD designs by offering the highest QR processing efficiency.
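The conventional 2-D Givens rotation that the hybrid QRD builds on can be sketched in a few lines. Each rotation zeroes one sub-diagonal element; in a CORDIC-based core, computing the rotation corresponds to vectoring mode and applying it to the rest of the two rows corresponds to rotation mode. This is a minimal floating-point sketch for real matrices that assumes nothing about the paper's multi-dimensional or Householder CORDIC units; `givens_qr` and `matmul` are hypothetical helper names.

```python
import math

def givens_qr(a):
    """QR decomposition of a real m x n matrix (list of rows) by Givens rotations.

    Returns (q, r) with a == q @ r, q orthogonal and r upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):          # zero column j from the bottom up
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)               # "vectoring": magnitude of (x, y)
            c, s = x / h, y / h
            for k in range(n):                 # "rotation": apply G to rows i-1, i
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k], r[i][k] = c * u + s * v, -s * u + c * v
            for k in range(m):                 # accumulate q <- q * G^T
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1], q[k][i] = c * u + s * v, -s * u + c * v
    return q, r

def matmul(p, t):
    return [[sum(p[i][k] * t[k][j] for k in range(len(t))) for j in range(len(t[0]))]
            for i in range(len(p))]

A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]]
Q, R = givens_qr(A)
```

A 4×4 complex QRD as in the paper needs complex-valued rotations as well; this real-valued version only shows the zeroing pattern.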

39 citations

Proceedings Article

A reconfigurable inverse transform architecture design for HEVC decoder

Pai-Tse Chiang¹, Tian-Sheuan Chang¹

National Chiao Tung University¹

19 May 2013

TL;DR: A reconfigurable hardware design is presented that supports HEVC (High Efficiency Video Coding) inverse transform sizes from 4×4 to 32×32 and needs only about 133.8K gates.

Abstract: In this paper, we present a reconfigurable hardware design that supports inverse transform sizes from 4×4 to 32×32 in HEVC (High Efficiency Video Coding). We explore the coefficient properties of the various inverse transforms so that a base inverse transform unit can be reconfigured or refined to generate the other transform sizes. The implementation in 90 nm technology supports 3840×2160@30fps processing and needs only about 133.8K gates, a 53% gate-count saving compared with previous work.
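As a point of reference for the coefficient properties mentioned above, the HEVC 4-point inverse transform can be computed with an even/odd partial butterfly instead of a full 4×4 matrix product. The sketch below compares both forms on the standard 4-point coefficients (64, 83, 36). The rounding shift and clipping stages of a real decoder are omitted, and this is not presented as the authors' reconfigurable datapath.

```python
# HEVC 4-point transform matrix; each row is one basis function.
C4 = [[64, 64, 64, 64],
      [83, 36, -36, -83],
      [64, -64, -64, 64],
      [36, -83, 83, -36]]

def inverse4_direct(y):
    """x = C4^T y computed as a full matrix-vector product."""
    return [sum(C4[k][n] * y[k] for k in range(4)) for n in range(4)]

def inverse4_butterfly(y):
    """Same result via the even/odd partial butterfly."""
    e0 = 64 * y[0] + 64 * y[2]      # even part: coefficients of rows 0 and 2
    e1 = 64 * y[0] - 64 * y[2]
    o0 = 83 * y[1] + 36 * y[3]      # odd part: coefficients of rows 1 and 3
    o1 = 36 * y[1] - 83 * y[3]
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]

print(inverse4_butterfly([10, -3, 5, 7]))
```

In HEVC the even-indexed rows of each larger core transform contain the next-smaller transform, which is the kind of property a reconfigurable design can exploit to share one base unit across sizes.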

30 citations

Journal Article

Design of a compact reversible fault tolerant field programmable gate array: A novel approach in reversible logic synthesis

Md. Shamsujjoha¹, Hafiz Md. Hasan Babu¹, Lafifa Jamal¹

University of Dhaka¹

01 Jun 2013-Microelectronics Journal

TL;DR: The comparative results show that the proposed design of the FPGA is much better in terms of gate count, garbage outputs, quantum cost, delay, and hardware complexity than the existing approaches.

15 citations

Journal Article

An Improved Direct Digital Synthesizer Using Hybrid Wave Pipelining and CORDIC algorithm for Software Defined Radio

M. Madheswaran, T. Menakadevi¹

Adhiyamaan College of Engineering¹

01 Jun 2013-Circuits Systems and Signal Processing

TL;DR: This paper presents an improved Direct Digital Synthesizer (DDS) for Software Defined Radio (SDR) that uses the Hybrid Wave Pipelining (HWP) technique and the COordinate Rotation DIgital Computer (CORDIC) algorithm.

Abstract: This paper presents an improved Direct Digital Synthesizer (DDS) for Software Defined Radio (SDR) that uses the Hybrid Wave Pipelining (HWP) technique and the COordinate Rotation DIgital Computer (CORDIC) algorithm. The hybrid wave pipelining technique is adopted to achieve high throughput; HWP speeds up circuits without inserting storage elements. The CORDIC algorithm performs the phase-to-amplitude conversion dynamically instead of relying on static Read Only Memory (ROM) addressing. The proposed DDS architecture achieves a frequency resolution of 0.023 Hz and a phase resolution of 0.088 degree at a maximum operating frequency of 199.288 MHz. The spectral purity of the proposed design has been improved to 114 dBc, with a throughput of 94 %. The DDS, using hybrid wave pipelining with the CORDIC approach, is designed and implemented on a Xilinx Spartan 3 (XC3S400-5PQ208) Field Programmable Gate Array (FPGA) with a speed grade of −5. Compared with the conventional pipelined Read Only Memory Look Up Table (ROMLUT) DDS method, the proposed design reduces the gate count from 49.4 % to 18.2 %. Compared with the conventional DDS, throughput improves from 78 % to 94 % and total power is reduced by 55 %. The improved DDS architecture is compared with several existing DDS architectures and is found to outperform them, making it suitable for software defined radios.
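A DDS of this kind consists of a phase accumulator followed by a phase-to-amplitude converter. The sketch below models that pipeline in software, with a rotation-mode CORDIC in place of a ROM lookup table. The accumulator width, phase truncation width and iteration count are illustrative assumptions, not the paper's parameters, and wave pipelining is a circuit technique that a software model cannot show. For any such DDS, the frequency resolution is f_clk / 2^N for an N-bit accumulator.

```python
import math

ITER = 16                                        # CORDIC iterations (assumed)
ANGLES = [math.atan(2.0 ** -i) for i in range(ITER)]
GAIN = 1.0
for _i in range(ITER):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * _i))    # 1/K, pre-scales the start vector

def cordic_sin(theta):
    """Rotation-mode CORDIC; converges for |theta| up to about 1.74 rad."""
    x, y, z = GAIN, 0.0, theta
    for i in range(ITER):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y

def dds_samples(tuning_word, acc_bits=16, phase_bits=12, n_samples=32):
    """Phase accumulator + CORDIC phase-to-amplitude conversion (no ROM LUT)."""
    acc, out = 0, []
    for _ in range(n_samples):
        phase = acc >> (acc_bits - phase_bits)          # keep the top phase_bits
        angle = 2.0 * math.pi * phase / (1 << phase_bits)
        if angle >= math.pi:
            angle -= 2.0 * math.pi                      # map to [-pi, pi)
        sign = 1.0
        if angle > math.pi / 2:                         # fold into CORDIC range:
            angle, sign = angle - math.pi, -1.0         # sin(a) = -sin(a - pi)
        elif angle < -math.pi / 2:
            angle, sign = angle + math.pi, -1.0
        out.append(sign * cordic_sin(angle))
        acc = (acc + tuning_word) % (1 << acc_bits)     # wraps like hardware
    return out

def frequency_resolution(f_clk, acc_bits):
    return f_clk / (1 << acc_bits)
```

The output frequency is tuning_word × f_clk / 2^acc_bits, so widening the accumulator refines the frequency step without changing the converter.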

14 citations

Journal Article

High Throughput Stochastic Log-MAP Turbo-Decoder Based on Low Bits Computation

Jienan Chen¹, Jianhao Hu¹

University of Electronic Science and Technology of China¹

20 Aug 2013-IEEE Signal Processing Letters

TL;DR: The report shows that the proposed decoder can achieve 4.0 Gbps with 7.1M gates to decode a 2048-length, R = 1/3 turbo code, with a bit error rate (BER) of 10^-5 at Eb/N0 = 1.25 dB.

Abstract: In this letter, we propose a high-throughput stochastic Low Bits Computation (LBC) turbo decoder. We represent the signal by a 3-bit-wide stochastic stream, which significantly improves the accuracy of stochastic computation. We designed and synthesized the decoder in 90 nm CMOS technology. The report shows that the proposed decoder can achieve 4.0 Gbps with 7.1M gates to decode a 2048-length, R = 1/3 turbo code, with a bit error rate (BER) of 10^-5 at Eb/N0 = 1.25 dB.

13 citations

Journal Article

Efficient quantum computing between remote qubits in linear nearest neighbor architectures

Preethika Kumar¹

Wichita State University¹

01 Apr 2013-Quantum Information Processing

TL;DR: A new scheme is proposed for implementing gate operations between remote qubits in linear nearest neighbor (LNN) architectures without requiring the qubits to be adjacent; it relies on a new two-control, one-target controlled-unitary gate operation, referred to as the C2(−I) gate.

Abstract: We propose a new scheme for implementing gate operations between remote qubits in linear nearest neighbor (LNN) architectures, one that does not require qubits to be adjacent to each other in order to perform a gate operation between them. The key feature of our scheme is a new two-control, one-target controlled-unitary gate operation, which we refer to as the C2(−I) gate. The gate operation can be implemented easily in a single step, requiring only a single control parameter of the system Hamiltonian. Using the C2(−I) gate, we show how to implement CNOT gate operations between remote qubits that do not have any direct coupling between them, along an LNN array. Since this is achieved without requiring swap operations or additional ancilla qubits in the circuit, the quantum cost of our circuit can be more than 50 % lower than those using conventional swap methods. All CNOT gate operations between remote qubits can be achieved with fidelity greater than 99.5 %.

12 citations

Journal Article

Design and Implementation of Low-Power Hardware Architecture With Single-Cycle Divider for On-Line Clustering Algorithm

Tse-Wei Chen¹, Makoto Ikeda¹

University of Tokyo¹

23 May 2013-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: The high-throughput parallel divider in the moving-average engine reduces the computational time of one division to a single clock cycle and computes cumulative moving averages with no precision loss.

Abstract: A dual-stage hardware architecture that supports two kinds of moving averages for the on-line clustering algorithm is proposed. Its architectural design differs from previous works, which focus on the iterative clustering algorithm. The system includes a set of memories that operate in ping-pong mode, so that the Manhattan distances can be computed while the centroids are updated. The high-throughput parallel divider in the moving-average engine is a new solution that reduces the computational time of one division operation to a single clock cycle and calculates cumulative moving averages with no precision loss. Two hardware examples show the robustness of the proposed architecture, and the architectural analysis is performed in 90 nm CMOS technology. In the first example, this work has the smallest gate count and the lowest normalized power consumption among previous works. In the second example, the architecture is compared with related works that implement the Self-Organizing Map (SOM) algorithm. The proposed work offers high flexibility in parameter combinations and achieves high performance for color quantization in a single iteration. The functionality of the proposed system is also verified with a background subtraction application.
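The benefit of dividing an exact running sum, rather than updating the mean incrementally, shows up clearly with integer arithmetic. In the sketch below, `cma_running_sum` keeps the exact sum and performs one integer division per new sample, while `cma_incremental` applies m += (x − m) / n with truncation at every step, so its rounding errors accumulate. Both function names are hypothetical, and this only illustrates the precision argument; the abstract does not describe how the authors' single-cycle parallel divider is built.

```python
def cma_running_sum(samples):
    """Keep the exact integer sum; one integer division per new sample."""
    total, out = 0, []
    for n, x in enumerate(samples, start=1):
        total += x
        out.append(total // n)
    return out

def cma_incremental(samples):
    """m += (x - m) // n: the truncation error of every step is carried forward."""
    m, out = 0, []
    for n, x in enumerate(samples, start=1):
        m += (x - m) // n
        out.append(m)
    return out

data = [1, 1, 1, 1, 5, 5, 5, 5]          # exact mean after 8 samples is 3
print(cma_running_sum(data))              # [1, 1, 1, 1, 1, 2, 2, 3]
print(cma_incremental(data))              # [1, 1, 1, 1, 1, 1, 1, 1]
```

The incremental form never leaves 1 because each (x − m) // n truncates to zero, whereas the running-sum form only rounds once per output.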

9 citations

Proceedings Article

An approach to constructing reversible multi-qubit benchmarks with provably minimal implementations

Jerzy Jegier, Pawel Kerntopf¹, Marek Szyprowski¹

Warsaw University of Technology¹

01 Aug 2013

TL;DR: It is shown how to extrapolate 3- and 4-variable reversible functions implemented by gate-count-minimal circuits with regular structure to construct sequences of reversible functions of an arbitrary number of variables.

Abstract: This paper reports on a method for constructing new, difficult benchmarks for reversible logic synthesis. It is shown how to extrapolate 3- and 4-variable reversible functions implemented by gate-count-minimal circuits with regular structure. In this way, sequences of reversible functions of an arbitrary number of variables have been constructed, together with minimal circuits implementing them. For two example sequences of functions, we applied the synthesis tool Revkit with different synthesis algorithms. The outcome shows a large gap between the circuits synthesized by the tool and those proved minimal by construction.
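Checking a candidate reversible circuit usually means computing the permutation it implements and counting its gates. The sketch below simulates circuits built from NOT, CNOT and Toffoli gates (multiple-control Toffoli gates with 0, 1 or 2 controls) on n lines. It is a hypothetical helper for illustration, not part of Revkit, and the example circuits are chosen here.

```python
def apply_gate(state, gate):
    """gate = (controls, target): flip the target bit if every control bit is 1."""
    controls, target = gate
    if all((state >> c) & 1 for c in controls):
        state ^= 1 << target
    return state

def truth_table(circuit, n_lines):
    """Output value of the circuit for every input value 0 .. 2^n - 1."""
    table = []
    for x in range(1 << n_lines):
        s = x
        for gate in circuit:
            s = apply_gate(s, gate)
        table.append(s)
    return table

def is_reversible(table):
    """A reversible function is a permutation of its input values."""
    return sorted(table) == list(range(len(table)))

swap = [((0,), 1), ((1,), 0), ((0,), 1)]   # three CNOTs swap lines 0 and 1
toffoli = [((0, 1), 2)]                    # flips line 2 when lines 0 and 1 are 1
print(truth_table(swap, 2), len(swap), "gates")
```

The gate count of a circuit is simply `len(circuit)`; proving minimality, as the paper does for its constructed sequences, requires comparing against all circuits with fewer gates, which this helper does not attempt.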

8 citations

Proceedings Article

Embedded tutorial: Applications of reversible logic in cryptography and coding theory

Kamalika Datta¹, Indranil Sengupta²

Indian Institute of Engineering Science and Technology, Shibpur¹, Indian Institute of Technology Kharagpur²

07 Mar 2013

TL;DR: To evaluate the goodness of a synthesized netlist, various metrics such as gate count, quantum cost, and equivalent transistor cost have been considered by various researchers.

Abstract: Summary form only given, as follows. With the increasing emphasis on low-power design and quantum computation, research activities in the area of reversible logic synthesis and testing have gained momentum over the last couple of decades. It is expected that reversible logic will provide a viable alternative for building ultra-low-power circuits and systems in the not-too-distant future. In classical works on the synthesis of reversible circuits, gate libraries comprising standard reversible gates like NOT, CNOT, TOFFOLI, FREDKIN, etc. are considered. To evaluate the goodness of a synthesized netlist, researchers have considered various metrics such as gate count, quantum cost, and equivalent transistor cost. The reported synthesis approaches can be broadly categorized into three groups: (a) exact synthesis approaches, which try to obtain optimal reversible gate netlists but can be used for small circuits only, (b) heuristic approaches, which use domain knowledge to reduce the complexity of the search and can be used for somewhat larger circuits, and (c) approaches that rely on higher-level functional representations like binary decision diagrams (BDD) or exclusive sum-of-products (ESOP). The last approach scales to larger circuits (with 200 inputs or more); however, the synthesized netlist is not optimal, and various rule-based heuristics have been proposed to minimize the cost. Some works also report techniques for implementing sequential circuits with reversible properties, which will be useful for building complex systems containing finite-state machines. Various transformations carried out as part of cryptographic algorithms are inherently reversible.
For instance, any block cipher that uses a key K to transform a plaintext P into a ciphertext C during encryption must be reversible, because decryption will be doing just the reverse (C to P). Also, in standard symmetric block ciphers like DES or AES, there is a combinational block called substitution box or S-box which is also reversible in nature. In AES, the S-box has 8 inputs and 8 outputs, and implements a one-to-one onto mapping. The same reversibility requirements hold for stream ciphers and public-key ciphers like RSA. Although not much work has been carried out in the area of reversible implementations of cryptographic algorithms, this can be a very good area for future research. Similar considerations hold for various coding and decoding techniques used in communication, which are also inherently reversible in nature. Some examples of such coding/decoding are Manchester, Differential Manchester, Bipolar AMI, 4B/5B, 8B/10B, Hamming error correcting code, etc. All these techniques can potentially be implemented using reversible logic circuits. Specific case studies of some of the areas as mentioned will be reported, with synthesis results.
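Manchester coding, listed above as inherently reversible, illustrates the point: each data bit maps to a unique pair of half-bit symbols, so the decoder can invert the encoding exactly. The sketch below is a plain software model using the IEEE 802.3 convention (0 as a high-to-low transition, 1 as low-to-high); the function names are illustrative, and it says nothing about how a reversible-logic circuit for the code would be synthesized.

```python
def manchester_encode(bits):
    """IEEE 802.3 convention: 0 -> (1, 0) high-to-low, 1 -> (0, 1) low-to-high."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out

def manchester_decode(symbols):
    """Exact inverse of manchester_encode; rejects pairs that are not valid codes."""
    if len(symbols) % 2:
        raise ValueError("odd number of half-bit symbols")
    bits = []
    for i in range(0, len(symbols), 2):
        pair = (symbols[i], symbols[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            raise ValueError(f"invalid Manchester pair at index {i}: {pair}")
    return bits

print(manchester_encode([1, 0, 1, 1]))   # [0, 1, 1, 0, 0, 1, 0, 1]
```

Because no two data patterns share a code word, decoding every valid code word recovers the original bits, which is the property the tutorial relies on.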

Proceedings Article

Programmable lowpower implementation of the HEVC Adaptive Loop Filter

Ilkka Hautala¹, Jani Boutellier¹, Jari Hannuksela¹

University of Oulu¹

26 May 2013

TL;DR: This paper presents a programmable application specific instruction processor for the Adaptive Loop Filter, and to the authors' best knowledge this is the first programmable solution for ALF on embedded devices.

Abstract: The Adaptive Loop Filter (ALF) is a filter in the High Efficiency Video Coding (HEVC) standard that improves both subjective and objective image quality. The ALF has been shown to be computationally complex, and its complexity has been reduced during the HEVC development process. In the HEVC Test Model HM-7.0, the ALF is a 9×7 cross plus 3×3 square-shaped filter. This paper presents a programmable application-specific instruction processor for the ALF. The proposed processor processes 1920×1080p luminance frames at 30 frames per second when operated at a clock frequency of 311 MHz. Low power consumption and a low gate count make the proposed processor suitable for embedded devices. The processor program code is written in pure C, which allows versatile use of the circuit and updates to the filter functionality without modifying the processor design. To the authors' best knowledge, this is the first programmable solution for ALF on embedded devices.
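The filter shape described above can be enumerated directly. The sketch below builds the tap offsets of a 9×7 cross combined with a 3×3 square, assuming the 9-tap arm is horizontal and the 7-tap arm vertical, and applies an arbitrary set of integer weights with rounding and border clamping. The weights, shift and border handling are illustrative assumptions, not HM-7.0 values or the authors' processor code.

```python
def alf_shape_offsets():
    """(dx, dy) taps of a 9x7 cross plus a 3x3 square, centre included once."""
    offsets = set()
    for dx in range(-4, 5):          # 9-tap horizontal arm (assumed orientation)
        offsets.add((dx, 0))
    for dy in range(-3, 4):          # 7-tap vertical arm
        offsets.add((0, dy))
    for dy in range(-1, 2):          # 3x3 square around the centre
        for dx in range(-1, 2):
            offsets.add((dx, dy))
    return sorted(offsets)

def filter_pixel(img, x, y, weights, shift):
    """Integer FIR at (x, y): weighted sum, rounding offset, then right shift."""
    h, w = len(img), len(img[0])
    acc = 0
    for (dx, dy), c in weights.items():
        xx = min(max(x + dx, 0), w - 1)      # clamp at picture borders (assumption)
        yy = min(max(y + dy, 0), h - 1)
        acc += c * img[yy][xx]
    return (acc + (1 << (shift - 1))) >> shift

taps = alf_shape_offsets()
print(len(taps), "taps")    # 9 + 6 + 4 = 19 distinct positions
```

The cross contributes 15 distinct positions and the square adds only its four corners, so the shape has 19 taps under this orientation assumption.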

Journal Article

Design of Low-Area HEVC Core Transform Architecture

Seung-Mok Han, Woo-Jin Nam, Seongsoo Lee¹

Soongsil University¹

30 Jun 2013

TL;DR: In this paper, the core transform architecture is implemented with only adders and shifters instead of area-consuming multipliers, which can process from to blocks with common hardware by reusing processing elements.

Abstract: This paper proposes and implements an core transform architecture, one of the major processes in the HEVC video compression standard. The proposed core transform architecture is implemented with only adders and shifters instead of area-consuming multipliers. Its shifters are implemented with wires and multiplexers, which significantly reduces chip area. Also, it can process from to blocks with common hardware by reusing processing elements. Implemented in 0.13 µm technology, the core transform architecture can process a block with 2-D transform in 130 cycles, and its gate count is 101,015 gates.
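Multiplying by a fixed transform coefficient with adders and shifters amounts to summing shifted copies of the input, one per set bit of the coefficient; in hardware the shifts are just wiring. The sketch below applies that plain binary decomposition to the HEVC 4-point and 8-point coefficients (64, 83, 36, 89, 75, 50, 18). The helper names are hypothetical, and a real design would typically share intermediate terms or use signed-digit forms to save further adders.

```python
def shift_terms(c):
    """Bit positions set in a positive constant c, so that c == sum(1 << k)."""
    return [k for k in range(c.bit_length()) if (c >> k) & 1]

def mul_shift_add(x, c):
    """x * c using only shifts and additions (len(shift_terms(c)) - 1 adders)."""
    return sum(x << k for k in shift_terms(c))

HEVC_COEFFS = [64, 83, 36, 89, 75, 50, 18]
for c in HEVC_COEFFS:
    terms = shift_terms(c)
    print(c, "=", " + ".join(f"(x<<{k})" for k in terms), f"({len(terms) - 1} adders)")
```

For example, 83 decomposes into four shifted terms (shifts 0, 1, 4 and 6), so a multiplier-free stage needs three adders for it, while 64 needs none.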

Journal Article

A Very Compact AES-SPIHT Selective Encryption Computer Architecture Design with Improved S-Box

Jia Hao Kong¹, Li-Minn Ang², Kah Phooi Seng²

University of Nottingham¹, Edith Cowan University²

31 Jul 2013-The Journal of Engineering

TL;DR: This paper presents a minimized S-box, with two separate proposals that improve the overall gate count, and a selective encryption architecture (SEA) that incorporates the compact instruction set architecture (CISA) as part of the encryption core, together with the set partitioning in hierarchical trees (SPIHT) algorithm, as a complete selective encryption system.

Abstract: The “S-box” algorithm is a key component of the Advanced Encryption Standard (AES) due to its nonlinear property. Various implementation approaches have been researched and discussed to meet stringent application goals (such as low power, high throughput, or low area), but the ultimate goal for many researchers is a compact, small hardware footprint for the S-box circuit. In this paper, we present our version of a minimized S-box, with two separate proposals that improve the overall gate count. The compact S-box is combined with a compact and optimum processor architecture tailored specifically for the AES, namely the compact instruction set architecture (CISA). To further justify and strengthen the purpose of the compact crypto-processor’s application, we also present a selective encryption architecture (SEA) that incorporates the CISA as part of the encryption core, together with the set partitioning in hierarchical trees (SPIHT) algorithm, as a complete selective encryption system.
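For reference, the AES S-box itself is defined in FIPS-197 as the multiplicative inverse in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1, followed by a fixed affine transformation. The sketch below computes it directly from that definition; it is not the authors' minimized circuit, which the abstract does not detail. Compact hardware S-boxes commonly restructure the inversion (for example with composite-field arithmetic) rather than store a 256-entry table.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a):
    """Multiplicative inverse as a^254 (the group has order 255); 0 maps to 0."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def rotl8(v, k):
    return ((v << k) | (v >> (8 - k))) & 0xFF

def sbox(x):
    """Inversion followed by the FIPS-197 affine transform with constant 0x63."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))   # 0x63 0x7c 0xed
```

The S-box is a bijection on 256 values, which is also why the tutorial abstract above can describe it as reversible.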

Journal Article

A Dual-Mode Deblocking Filter Design for HEVC and H.264/AVC

Muchen Li¹, Jinjia Zhou, Dajiang Zhou¹, Xiao Peng¹, Satoshi Goto¹

Waseda University¹

01 Jun 2013-IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

TL;DR: A novel dual-mode deblocking filter architecture is introduced that supports both the HEVC and H.264/AVC standards and uses 30% fewer gates in the filter part than dedicated designs.

Abstract: As the successor video compression standard to H.264/AVC, High Efficiency Video Coding (HEVC) will play an important role in the video coding area. In the deblocking filter part, HEVC inherits the basic properties of H.264/AVC and adds some new features. Based on this variation, this paper introduces a novel dual-mode deblocking filter architecture that supports both the HEVC and H.264/AVC standards. For the HEVC standard, the proposed symmetric unified-cross unit (SUCU) based filtering scheme greatly reduces the design complexity; as a result, processing a 16×16 block takes 24 clock cycles. For the H.264/AVC standard, a 16×16 macroblock (MB) takes 48 clock cycles. In synthesis, the proposed architecture occupies 41.6K equivalent gates at 200 MHz in the SMIC 65 nm library, which satisfies the throughput requirement of super hi-vision (SHV) at 60 fps. With the filter reusing scheme, the universal design for the two standards uses 30% fewer gates in the filter part than dedicated designs. In addition, the total power consumption can be reduced by 57.2% with a skipping mode when the edges need not be filtered.

Journal Article

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder

Hongkyun Jung, Kwangki Ryoo

31 May 2013-The Journal of the Korean Institute of Information and Communication Engineering

TL;DR: An intra prediction hardware architecture is proposed to reduce computational complexity of intra prediction in HEVC decoder and adopts a fast smoothing decision algorithm and a fast algorithm to generate coefficients of a filter.

Abstract: In this paper, an intra prediction hardware architecture is proposed to reduce the computational complexity of intra prediction in the HEVC decoder. The architecture uses shared operation units and common operation units and adopts a fast smoothing decision algorithm and a fast algorithm to generate filter coefficients. The shared operation unit shares adders that process common equations, removing computational redundancy; it also computes the average value in DC mode to reduce the number of execution cycles in that mode. To reduce the number of operation units, the common operation unit uses a single operation unit to generate both predicted and filtered pixels in all prediction modes. To reduce processing time and operators, the decision algorithm uses only bit comparators, and the fast algorithm uses a LUT instead of multipliers. The proposed architecture uses four shared operation units and eight common operation units, which reduces the execution cycles of intra prediction. The architecture is synthesized in TSMC 0.13 µm CMOS technology; its gate count and maximum operating frequency are 40.5K and 164 MHz, respectively. Measured with data extracted from HM 7.1, the architecture needs about 93.7% fewer execution cycles than the previous design.
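The DC-mode average mentioned above is a good example of an operation that reduces to adders and a shift. In HEVC, the DC value of an N×N block is the rounded mean of the N reference samples above and the N to the left, computed as (sum + N) >> (log2 N + 1). The sketch below shows only that step; the boundary smoothing HEVC applies to small luma DC blocks and the authors' shared-adder structure are not modeled, and `dc_value` is an illustrative name.

```python
def dc_value(top, left):
    """HEVC-style DC prediction value for an N x N block (N a power of two)."""
    n = len(top)
    if len(left) != n or n & (n - 1):
        raise ValueError("need N top and N left samples, N a power of two")
    k = n.bit_length() - 1                      # log2(N)
    # 2N samples: adding N before shifting by log2(2N) rounds to nearest
    return (sum(top) + sum(left) + n) >> (k + 1)

print(dc_value([10] * 4, [20] * 4))   # 15
```

Because the divisor 2N is always a power of two, the division becomes a right shift, so only the adder tree has to be shared between modes.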

Proceedings Article

Low power multi-lane MIPI CSI-2 receiver design and hardware implementations

Yueh-Chuan Lu, Zong-Yi Chen¹, Pao-Chi Chang

National Central University¹

03 Jun 2013

TL;DR: A low-power multi-lane Mobile Industry Processor Interface (MIPI) Camera Serial Interface 2 (CSI-2) receiver architecture is proposed that adopts an 8-byte parallel CSI protocol layer for hardware implementation and reduces the measured on-chip logic power consumption by 37% to 43%.

Abstract: This paper proposes a low-power multi-lane Mobile Industry Processor Interface (MIPI) Camera Serial Interface 2 (CSI-2) receiver architecture that adopts an 8-byte parallel CSI protocol layer for hardware implementation. The proposed scheme supports 4 data lanes at 1 Gb/s per lane, i.e., a maximum data rate of 4 Gb/s, at 62.5 MHz, which extends the time available for logic operations from 8 ns (125 MHz) to 16 ns (62.5 MHz) without throughput degradation. Therefore, the supply voltage (1.2 V) can be reduced, and so can the power consumption. The proposed architecture is implemented in 0.13 μm CMOS technology with a total gate count of 32.7K. It not only reduces the operating clock rate but also reduces the measured on-chip logic power consumption by 37% to 43%.
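The clock rates quoted above follow from simple bandwidth arithmetic: the aggregate lane rate divided by the datapath width in bits. The sketch below checks it. The 4-byte datapath used for the 125 MHz comparison point is an assumption chosen to match the 8 ns figure, not something the abstract states, and `required_clock_hz` is an illustrative name.

```python
def required_clock_hz(lanes, lane_rate_bps, datapath_bytes):
    """Clock at which datapath_bytes per cycle carry the aggregate lane rate."""
    return lanes * lane_rate_bps / (8 * datapath_bytes)

f8 = required_clock_hz(4, 1e9, 8)    # 8-byte protocol layer, as in the paper
f4 = required_clock_hz(4, 1e9, 4)    # hypothetical 4-byte comparison point
print(f8 / 1e6, "MHz ->", 1e9 / f8, "ns per cycle")   # 62.5 MHz -> 16.0 ns
print(f4 / 1e6, "MHz ->", 1e9 / f4, "ns per cycle")   # 125.0 MHz -> 8.0 ns
```

Doubling the datapath width halves the clock for the same 4 Gb/s, which is the slack that allows the lower supply voltage.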

Journal Article

A New High Radix-2^r (r ≥ 8) Multibit Recoding Algorithm for Large Operand Size (N ≥ 32) Multipliers

A. K. Oudjida, Nicolas Chaillet, M. L. Berrandjia, Ahmed Liacha

01 Apr 2013-Journal of Low Power Electronics

TL;DR: A new recursive recoding algorithm is proposed that shortens the critical path of the multiplier, reduces the hardware complexity of the partial-product generators, and provides an optimal space/time partitioning of the multiplier architecture for any operand size N.

Abstract: This paper addresses the problem of multiplication with large operand sizes (N ≥ 32). We propose a new recursive recoding algorithm that shortens the critical path of the multiplier and also reduces the hardware complexity of the partial-product generators. The new recoding algorithm provides an optimal space/time partitioning of the multiplier architecture for any operand size N. As a result, the critical path is drastically reduced to 33 N / 2 - 3 with no area overhead, compared with the modified Booth algorithm, whose critical path is N/2 adder stages. For instance, only 7 adder stages are needed for a 64-bit two's complement multiplier. Compared with reference algorithms for N = 64, significant gain ratios of 1.62, 1.71 and 2.64 are obtained in terms of multiply time, energy consumption per multiply operation, and total gate count, respectively.
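The modified Booth (radix-4) baseline named above can be sketched compactly: overlapping 3-bit groups of the multiplier are recoded into digits in {−2, −1, 0, 1, 2}, giving N/2 partial products for an N-bit operand. The paper's radix-2^r (r ≥ 8) recursive recoding is not reproduced here; `booth_radix4_digits` and `booth_multiply` are illustrative names.

```python
def booth_radix4_digits(y, n_bits):
    """Modified Booth recoding of an n-bit two's-complement multiplier.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] (with y[-1] = 0), so each digit is in
    {-2, -1, 0, 1, 2}; returns n_bits // 2 digits, least significant first.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    u = y & ((1 << n_bits) - 1)        # two's-complement bit pattern of y
    digits, prev = [], 0
    for i in range(0, n_bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                      # the top bit of this group overlaps the next
    return digits

def booth_multiply(x, y, n_bits):
    """Sum of n_bits/2 partial products d_i * x * 4^i (each 0, +-x or +-2x, shifted)."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n_bits)))

print(booth_radix4_digits(127, 8), booth_multiply(-37, 127, 8))
```

For a 64-bit operand this baseline yields 32 partial products, which is the N/2 figure the abstract compares against.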

Journal Article

Reversible logic synthesis by quantum rotation gates

Afshin Abdollahi¹, Mehdi Saeedi¹, Massoud Pedram¹

University of Southern California¹

01 Sep 2013-Quantum Information & Computation

TL;DR: A rotation-based synthesis framework for reversible logic that constructs intermediate quantum states that may be in superposition and combines techniques from reversible Boolean logic and quantum computation is proposed.

Abstract: A rotation-based synthesis framework for reversible logic is proposed. We develop a canonical representation based on binary decision diagrams and introduce operators to manipulate the developed representation model. Furthermore, a recursive functional bidecomposition approach is proposed to automatically synthesize a given function. While Boolean reversible logic is particularly addressed, our framework constructs intermediate quantum states that may be in superposition, hence we combine techniques from reversible Boolean logic and quantum computation. The proposed approach results in quadratic gate count for multiple-control Toffoli gates without ancillae, linear depth for quantum carry-ripple adder, and quasilinear size for quantum multiplexer.

Book Chapter

Design of Novel Algorithm and Architecture for Gaussian Based Color Image Enhancement System for Real Time Applications

M. C. Hanumantharaju¹, M. Ravishankar¹, D. R. Rameshbabu¹

Dayananda Sagar College of Engineering¹

18 Jan 2013

TL;DR: Color image enhancement is achieved by first convolving the original image with a Gaussian kernel, which acts as a smoothing point spread function, and then applying logarithm-domain processing and gain/offset corrections to enhance the pixels and translate them into the display range of 0 to 255.

Abstract: This paper presents the development of a new algorithm for a Gaussian-based color image enhancement system. The algorithm has been mapped onto an architecture suitable for FPGA/ASIC implementation. Color image enhancement is achieved by first convolving the original image with a Gaussian kernel, since the Gaussian distribution is a point spread function that smooths the image. Logarithm-domain processing and gain/offset corrections are then employed to enhance the pixels and translate them into the display range of 0 to 255. The proposed algorithm not only provides better dynamic range compression and color rendition but also achieves color constancy in an image. The design exploits high degrees of pipelining and parallel processing to achieve real-time performance. The design has been realized in RTL-compliant Verilog and fits into a single FPGA with a gate count utilization of 321,804. The proposed method is implemented on a Xilinx Virtex-II Pro XC2VP40-7FF1148 FPGA device and can process high-resolution color motion pictures of up to 1600×1200 pixels at the real-time video rate of 116 frames per second. This shows that the proposed design works not only for still images but also for high-resolution video sequences.
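The processing chain described above (Gaussian smoothing, log-domain processing, then a gain/offset mapping into 0 to 255) can be sketched for one color channel in pure Python. The sketch assumes that the log-domain step is the difference between the log of each pixel and the log of its Gaussian-smoothed surround, and that the gain/offset correction is a linear min-max stretch; the paper's exact formulation, kernel size and constants are not given in the abstract, and the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]          # normalized so a flat image stays flat

def blur(img, kernel):
    """Separable Gaussian convolution with clamped borders; img is a list of rows."""
    r = len(kernel) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(kernel[j + r] * row[min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for row in img]
    return [[sum(kernel[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def enhance_channel(img, sigma=2.0, radius=4):
    """Log-domain ratio of each pixel to its Gaussian surround, stretched to 0..255."""
    smooth = blur(img, gaussian_kernel(sigma, radius))
    logr = [[math.log(p + 1.0) - math.log(s + 1.0) for p, s in zip(rp, rs)]
            for rp, rs in zip(img, smooth)]
    lo = min(min(row) for row in logr)
    hi = max(max(row) for row in logr)
    if hi - lo < 1e-9:                      # flat result: nothing to stretch
        return [[0] * len(row) for row in logr]
    gain, offset = 255.0 / (hi - lo), -lo   # gain/offset correction
    return [[round(gain * (v + offset)) for v in row] for row in logr]
```

A color image would be processed by calling `enhance_channel` on the R, G and B planes separately, which is again an assumption of this sketch.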

Journal Article

FPGA Implementation of an LFSR based Pseudorandom Pattern Generator for MEMS Testing

Md. Fokhrul Islam, M. A. Ali, Burhanuddin Yeop Majlis

23 Aug 2013-International Journal of Computer Applications

TL;DR: This paper presents the FPGA implementation of an LFSR-based pseudorandom pattern generator that offers high speed and low power consumption and is especially suited to processing environments where uniformly distributed random numbers are required.

Abstract: Strides in programmable logic density, speed and hardware description languages (HDLs) have empowered engineers to implement high-performance digital functionality within field programmable gate arrays (FPGAs). The linear feedback shift register (LFSR) has become one of the central elements used in testing and self-testing of contemporary complex electronic systems like processors, controllers and integrated circuits (ICs). This paper presents the FPGA implementation of an LFSR-based pseudorandom pattern generator. This LFSR offers high speed and low power consumption, and it is especially suited to processing environments where uniformly distributed random numbers are required. A typical application of the pattern generator considered in this work is the testing of micro-electro-mechanical systems (MEMS), where low power consumption is required. Very high speed integrated circuit HDL (VHDL) was used to implement the LFSR on the FPGA, and a VHDL testbench was used to verify the correctness of the design. The compiled VHDL code was synthesized to the gate level. Area and timing optimizations were performed to achieve a very low gate count of 436 and to increase the design speed to 178 MHz. Mentor Graphics and Xilinx ISE 6 electronic design automation (EDA) tools and a DIGILENT D2SB PROTO BOARD were used for the overall FPGA implementation process.

Proceedings Article•DOI•

Embedded 8-bit AES in wireless Bluetooth application

Chi-Wu Huang¹, Shao-Wei Kuo¹, Chi-Jeng Chang¹•Institutions (1)

National Taiwan Normal University¹

04 Jul 2013

TL;DR: An 8-bit AES direct FPGA hardware implementation of CFB/OFB operations without using Block RAM (BRAM) is presented, which is the smallest gate count for an 8-bit ASIC implementation ever proposed.

Abstract: This paper presents an 8-bit AES direct FPGA hardware implementation of CFB/OFB operations without using Block RAM (BRAM). The 8-bit AES core is then embedded via a microcontroller that interfaces with a Bluetooth wireless link to perform encryption or decryption. Two such embedded systems were configured together to test AES image encryption and decryption over wireless communication, achieving a baud rate of 0.23 megabits per second (Mbps). CFB/OFB operations have two advantages over ECB operation: one is a low-area circuit design, and the other is the complete hiding of input patterns in plain images with identical colors. Although the CFB/OFB implementation without BRAM has a slightly larger slice area than the implementation with RAM, the non-BRAM ASIC implementation requires only 2.2K gates when synthesized in 0.18 μm technology, which is the smallest gate count for an 8-bit ASIC implementation ever proposed.
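The point that CFB/OFB hide repeated plaintext blocks, unlike ECB, can be illustrated with a short Python sketch of the two modes. To stay self-contained it uses a truncated SHA-256 as a stand-in for the AES-128 forward function (an assumption, and not a secure block cipher); the function names are hypothetical. Both modes only ever call the cipher in the encryption direction, which is one reason they can be cheaper in hardware than modes that also need the inverse cipher.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key, block):
    """Stand-in for the AES-128 forward function (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def ofb(key, iv, data):
    """OFB: O_i = E_K(O_{i-1}) with O_0 = IV; C_i = P_i XOR O_i (same routine decrypts)."""
    out, o = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        o = toy_block_encrypt(key, o)
        out += xor(data[i:i + BLOCK], o)
    return bytes(out)


def cfb_encrypt(key, iv, plaintext):
    """Full-block CFB: C_i = P_i XOR E_K(C_{i-1}) with C_0 = IV."""
    out, prev = bytearray(), iv
    for i in range(0, len(plaintext), BLOCK):
        c = xor(plaintext[i:i + BLOCK], toy_block_encrypt(key, prev))
        out += c
        prev = c
    return bytes(out)


def cfb_decrypt(key, iv, ciphertext):
    """Full-block CFB decryption: P_i = C_i XOR E_K(C_{i-1}), still forward-only."""
    out, prev = bytearray(), iv
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out += xor(c, toy_block_encrypt(key, prev))
        prev = c
    return bytes(out)
```

Encrypting four identical all-zero blocks yields four different ciphertext blocks in both modes, whereas ECB would repeat the same ciphertext block four times.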

Proceedings Article•DOI•

A BCH decoding architecture with mixed parallelization degrees for flash controller applications

Jens Spinner, Jurgen Freudenberger, Christoph Baumhof, Axel Mehnert, Richard Willems

01 Sep 2013

TL;DR: A new BCH decoding architecture is presented that combines different parallelization degrees for the Berlekamp-Massey algorithm and the Chien search, which significantly reduces the number of required multipliers.

Abstract: Error correction coding (ECC) has become one of the most important tasks of flash memory controllers. The gate count of ECC hardware takes up a significant share of the overall SoC logic. Scaling ECC strength to growing error correction requirements has become increasingly difficult under cost and area limitations. In this work, a new BCH decoding architecture is presented that combines different parallelization degrees for the Berlekamp-Massey algorithm and the Chien search. This approach significantly reduces the number of required multipliers, while the average decoding speed remains equal to that of a fully parallel implementation.
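To show what a parallelization degree means for the Chien search stage, here is a toy Python sketch over GF(2^4) (code length 15, primitive polynomial x^4 + x + 1). These parameters and the names are assumptions chosen for brevity: flash-controller BCH codes use much larger fields, and the sketch models neither the Berlekamp-Massey side nor the authors' mixed-degree scheduling. Testing p positions per iteration models p evaluator copies, cutting the search from n to ceil(n/p) cycles at the cost of roughly p times as many constant multipliers.

```python
PRIM = 0b10011  # x^4 + x + 1, primitive over GF(2)
N = 15          # code length 2^4 - 1

# Antilog (EXP) and log tables for GF(16); EXP is doubled to skip a modulo in gf_mul.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
v = 1
for i in range(N):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[k] * x^k) over GF(16) with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def chien_search(locator, parallel=1):
    """Return (error positions i with locator(alpha^-i) == 0, iterations used).

    `parallel` positions are tested per iteration, modelling a Chien search
    unit with that many evaluator copies (one iteration ~ one clock cycle).
    """
    positions, cycles = [], 0
    for start in range(0, N, parallel):
        cycles += 1
        for i in range(start, min(start + parallel, N)):
            if poly_eval(locator, EXP[(N - i) % N]) == 0:
                positions.append(i)
    return positions, cycles
```

For the locator (1 + α^2 x)(1 + α^7 x), i.e. `[1, EXP[2] ^ EXP[7], EXP[9]]`, `chien_search(loc, parallel=1)` returns `([2, 7], 15)` and `parallel=4` returns `([2, 7], 4)`.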

Journal Article•DOI•

New Cost-Effective Simplified Euclid's Algorithm for Reed-Solomon Decoders

Jaehyun Baek¹, Myung Hoon Sunwoo²•Institutions (2)

Samsung¹, Ajou University²

01 May 2013

TL;DR: Using new initial conditions and polynomials, the proposed SE algorithm can significantly reduce computational complexity compared with the existing ME and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, since it has the fewest coefficients in the new initial conditions.

Abstract: This paper proposes a cost-effective simplified Euclid's (SE) algorithm for Reed-Solomon decoders that can replace the existing modified Euclid's (ME) algorithm. Using new initial conditions and polynomials, the proposed SE algorithm can significantly reduce computational complexity compared with the existing ME and reformulated inversionless Berlekamp-Massey (RiBM) algorithms, since it has the fewest coefficients in the new initial conditions. Thus, the proposed SE architecture, consisting of only 3t basic cells, where t is the error correction capability, has the smallest area among the existing key solver blocks. In addition, the SE architecture requires a latency of only 2t clock cycles to solve the key equation, with no initial latency. The proposed RS decoder has been synthesized using the Samsung 0.18 μm cell library, and the gate count of the RS decoder, excluding FIFO memory, is only 40,136 for the (255, 239, 8) RS code.
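The cell-count and latency figures follow directly from the code parameters. For an RS(n, k) code with n - k = 2t parity symbols, the short Python helper below (a hypothetical name, used only to make the arithmetic explicit) derives t, the 3t basic cells and the 2t-cycle key-equation latency stated in the abstract; for the (255, 239) code it gives t = 8, 24 cells and 16 cycles.

```python
def rs_key_solver_params(n, k):
    """Derive t, SE basic-cell count (3t) and key-equation latency (2t) for RS(n, k)."""
    if (n - k) % 2:
        raise ValueError("this sketch assumes n - k = 2t parity symbols")
    t = (n - k) // 2  # symbol-error correction capability
    return {"t": t, "basic_cells": 3 * t, "latency_cycles": 2 * t}
```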

Proceedings Article•DOI•

A 6.72-Gb/s, 8pJ/bit/iteration WPAN LDPC decoder in 65nm CMOS

Zhixiang Chen¹, Xiao Peng¹, Xiongxin Zhao¹, Leona Okamura¹, Dajiang Zhou¹, Satoshi Goto¹•Institutions (1)

Waseda University¹

29 Apr 2013

TL;DR: An LDPC decoder in 65 nm CMOS targeting WPAN (IEEE 802.15.3c) is presented with measurement results, and a modified-PCM-based message permutation strategy with a compatible data flow is proposed to solve the network problem raised by highly parallel LDPC decoding.

Abstract: An LDPC decoder in 65 nm CMOS targeting WPAN (IEEE 802.15.3c) is presented with measurement results. A modified-PCM-based message permutation strategy with a compatible data flow is proposed to solve the network problem raised by highly parallel LDPC decoding. Compared with the state of the art, the decoder chip achieves 17.7%, 33.5% and 49% improvements in chip density, gate count and energy efficiency, respectively.

Book Chapter•DOI•

An Intra Prediction Hardware Architecture with Low Computational Complexity for HEVC Decoder

Hongkyun Jung¹, Kwangki Ryoo¹•Institutions (1)

Hanbat National University¹

01 Jan 2013

TL;DR: A hardware architecture with a shared operation unit, a common operation unit and a fast smoothing decision algorithm is proposed to reduce the computational complexity of intra prediction in an HEVC decoder; the decision algorithm uses only bit-comparators instead of arithmetic operators.

Abstract: In this paper, a hardware architecture with a shared operation unit, a common operation unit and a fast smoothing decision algorithm is proposed to reduce the computational complexity of intra prediction in an HEVC decoder. The shared operation unit shares the adders that compute common operations in the smoothing equations, removing computational redundancy, and pre-computes the mean value of the reference pixels to remove an idle cycle in DC mode. The common operation unit uses a single operation unit to generate and filter predicted pixels in all prediction modes, reducing the number of operation units needed for each mode. The decision algorithm uses only bit-comparators instead of arithmetic operators. The architecture is synthesized using TSMC 0.13 μm CMOS technology. Its gate count and maximum operating frequency are 40.5 K and 164 MHz, respectively. The architecture processes one 4 × 4 PU in one cycle, about 93.7% fewer cycles than the previous design.
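The adder-sharing idea can be made concrete, assuming the smoothing equation is HEVC's [1, 2, 1]/4 reference-sample filter and DC mode uses HEVC's rounded mean of the top and left reference samples. In the Python sketch below (function names are hypothetical), each neighbouring pair sum p[i] + p[i+1] is computed once and reused by the two filtered outputs that need it, which is one way common terms in the smoothing equations can be shared. The sketch omits HEVC's DC edge filtering and the strong (bilinear) smoothing option.

```python
def smooth_direct(ref):
    """[1, 2, 1]/4 smoothing with rounding; the two end samples stay unfiltered."""
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2
    return out


def smooth_shared(ref):
    """Same filter, computing each neighbour-pair sum once and reusing it twice."""
    pair = [ref[i] + ref[i + 1] for i in range(len(ref) - 1)]
    out = list(ref)
    for i in range(1, len(ref) - 1):
        # p[i-1] + 2*p[i] + p[i+1] == pair[i-1] + pair[i]
        out[i] = (pair[i - 1] + pair[i] + 2) >> 2
    return out


def dc_value(top, left):
    """DC prediction for an N x N block: rounded mean of the 2N reference samples."""
    n = len(top)
    shift = n.bit_length()  # equals log2(N) + 1 for power-of-two N
    return (sum(top) + sum(left) + n) >> shift
```

Because p[i-1] + 2p[i] + p[i+1] equals pair[i-1] + pair[i], each pairwise sum feeds two outputs, so the shared version produces identical results with fewer additions per output.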

A Low Power Multi-Bit Flip-Flop Design for Counter Measure Circuit in Cryptographic Applications

M Jagadeeswari

01 Jan 2013

TL;DR: A combination table that stores the flip-flops that can be merged is introduced in the counter measure circuit to reduce both the power and the area of the clock power circuit.

Abstract: Clock power is the major source of dynamic power in VLSI circuits. The multi-bit flip-flop technique is one of the techniques used to reduce clock power. The power reduction is achieved by merging flip-flops based on certain timing constraints. The proposed work introduces a combination table that stores the flip-flops that can be merged. Differential Power Analysis (DPA) is a big threat to crypto chips since it can efficiently disclose the secret key. A counter measure circuit based on a self-generated true random sequence can reduce the differential power attack. The multi-bit flip-flop technique is introduced in the counter measure circuit to reduce both power and area. The experimental results show that merging the flip-flops reduces dynamic power by about 27.27% and total power by about 12.59%, and that the total gate count is reduced from 7709 to 7389. Keywords: merging; combination table; multi-bit flip-flop; differential power analysis; LFSR; counter measure circuit
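For reference, the reported gate-count change corresponds to a relative reduction of about 4.15%, worked out from the abstract's own figures:

$$\frac{7709 - 7389}{7709} = \frac{320}{7709} \approx 0.0415 = 4.15\%$$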

Proceedings Article•DOI•

Randomness analysis on Grain-128 stream cipher

Kamaruzzaman Seman, Nurzi Juana Mohd Zaizi

30 Sep 2013

TL;DR: The randomness of the Grain-128 stream cipher algorithm is analyzed using the NIST Statistical Test Suite, and the algorithm is found not to be random at the 1% significance level.

Abstract: In this work, the randomness of the Grain-128 stream cipher algorithm is analyzed using the NIST Statistical Test Suite, which is applied to determine the randomness of this algorithm. Grain-128 is based on an LFSR, an NLFSR and a Boolean function, and is suitable for environments with limited resources such as gate count, power consumption and chip area. It uses a 128-bit key and a 96-bit initial value (IV). Based on the results of our analysis, we found that this algorithm is not random at the 1% significance level.
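The NIST suite comprises many tests, and the abstract does not say which ones Grain-128 failed. As a sketch of how a single test reaches a verdict at the 1% level, here is the simplest one, the frequency (monobit) test from NIST SP 800-22, in Python; the function names are hypothetical. A sequence passes when its P-value is at least 0.01.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test on a string of '0'/'1' characters."""
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)  # +1 for each 1, -1 for each 0
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def is_random(bits, alpha=0.01):
    """Pass if the P-value is at least the significance level (1% by default)."""
    return monobit_p_value(bits) >= alpha
```

On the 10-bit example sequence 1011010101 from SP 800-22, `monobit_p_value` returns about 0.527089, so that sequence passes, while a run of 100 ones fails.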

Journal Article•DOI•

A low-power oriented architecture for H.264 variable block size motion estimation based on a resource sharing scheme

Majdi Elhaji¹, Abdelkrim Zitouni¹, Samy Meftali², Jean-Luc Dekeyser², Rached Tourki¹•Institutions (2)

University of Monastir¹, University of Lille²

01 Sep 2013-Integration

TL;DR: A flexible VLSI architecture for full-search VBSME (FSVBSME) is proposed that partitions the source frames into sixteen 4x4 sub-blocks and uses an MVP scheme, offering higher processing speed, lower power consumption, lower latency and lower gate count complexity.
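The MVP scheme is not described in detail here. A common resource-sharing idea in VBSME hardware, shown as an assumption rather than the authors' design, is to compute the sixteen 4x4 SADs of a 16x16 macroblock once per candidate motion vector and obtain all 41 H.264 partition SADs by addition. The Python sketch below (hypothetical names; partition labels are width x height) does this for one candidate; a full search would repeat it over every candidate in the search window and keep the minimum for each partition.

```python
def sad4x4_grid(cur, ref):
    """SADs of the sixteen 4x4 sub-blocks of a 16x16 block, as a 4x4 grid [row][col]."""
    grid = [[0] * 4 for _ in range(4)]
    for by in range(4):
        for bx in range(4):
            s = 0
            for y in range(4 * by, 4 * by + 4):
                for x in range(4 * bx, 4 * bx + 4):
                    s += abs(cur[y][x] - ref[y][x])
            grid[by][bx] = s
    return grid


def merged_sads(g):
    """Build the larger H.264 partitions by adding 4x4 SADs (no extra pixel work)."""
    s8x8 = [[g[2*y][2*x] + g[2*y][2*x + 1] + g[2*y + 1][2*x] + g[2*y + 1][2*x + 1]
             for x in range(2)] for y in range(2)]
    return {
        "4x4": [v for row in g for v in row],                                     # 16
        "8x4": [g[y][2*x] + g[y][2*x + 1] for y in range(4) for x in range(2)],   # 8
        "4x8": [g[2*y][x] + g[2*y + 1][x] for y in range(2) for x in range(4)],   # 8
        "8x8": [v for row in s8x8 for v in row],                                  # 4
        "16x8": [s8x8[y][0] + s8x8[y][1] for y in range(2)],                      # 2
        "8x16": [s8x8[0][x] + s8x8[1][x] for x in range(2)],                      # 2
        "16x16": [sum(sum(row) for row in s8x8)],                                 # 1
    }
```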

Journal Article•DOI•

A Low-Overhead Interference Canceller for High-Mobility STBC-OFDM Systems

Hsiao-Yun Chen¹, Wei-Kai Chang¹, Shyh-Jye Jou¹•Institutions (1)

National Chiao Tung University¹

18 Mar 2013-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: The proposed low-overhead space-time block code (STBC) interference canceller, combined with the two-stage channel estimator, can be applied to wireless metropolitan area networks (WMAN) such as IEEE 802.16e systems.

Abstract: This paper proposes a low-overhead space-time block code (STBC) interference canceller for high-mobility STBC orthogonal frequency division multiplexing (STBC-OFDM) systems. Combined with the two-stage channel estimator, the proposed STBC interference canceller can be applied to wireless metropolitan area networks (WMAN) such as IEEE 802.16e systems. At a vehicle speed of 240 km/h with 16-quadrature amplitude modulation (16-QAM), the bit error rate (BER) is improved about tenfold compared with using the two-stage channel estimator alone. The proposed design is implemented in 90 nm CMOS technology. Its gate count is 109.3 K, and its power dissipation is 1.45 mW at an 83.3 MHz operating frequency with a 1 V power supply. However, up to 61% of the hardware can be reused from the existing two-stage channel estimator design. After reuse, the proposed STBC interference canceller requires only 42.2 K gates, a 4.9% overhead relative to the two-stage channel estimator.
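To illustrate why high mobility calls for an interference canceller, here is a noise-free Python sketch assuming the classic two-antenna Alamouti code on one subcarrier (an assumption; it is not the authors' canceller). When the channel is identical in the two STBC slots, the conventional combiner recovers each symbol exactly; when the channel changes between slots, as it does at high vehicle speeds, residual interference from the other symbol remains. The function name and the use of the two-slot average as the channel estimate are illustrative choices.

```python
def alamouti_rx(h_a, h_b, s1, s2):
    """Noise-free 2x1 Alamouti on one subcarrier; the channel may change between slots.

    h_a, h_b: (h1, h2) complex channel gains in slot a and slot b.
    Returns normalised estimates of s1 and s2 from the conventional combiner,
    which assumes a static channel (here the average of h_a and h_b).
    """
    y1 = h_a[0] * s1 + h_a[1] * s2                                 # slot a sends (s1, s2)
    y2 = -h_b[0] * s2.conjugate() + h_b[1] * s1.conjugate()       # slot b sends (-s2*, s1*)
    h1 = (h_a[0] + h_b[0]) / 2
    h2 = (h_a[1] + h_b[1]) / 2
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * y1 + h2 * y2.conjugate()) / gain
    s2_hat = (h2.conjugate() * y1 - h1 * y2.conjugate()) / gain
    return s1_hat, s2_hat
```

With h = (1, 0.5 + 0.5j) in both slots and symbols s1 = 1 + 1j, s2 = -1 + 1j, the estimates match the transmitted symbols; changing the slot-b channel to (0.6 + 0.3j, 0.2 + 0.8j) leaves the estimate of s1 off by roughly 0.3, which is the kind of interference the proposed canceller targets.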
