scispace - formally typeset

Showing papers on "Gate count published in 2015"


Journal Article
TL;DR: This paper presents the first silicon-proven stochastic LDPC decoder to support multiple code rates for IEEE 802.15.3c applications and achieves over 90% reduction of routing wires, 73.8% and 11.5% enhancement of hardware and energy efficiency, respectively.
Abstract: This paper presents the first silicon-proven stochastic LDPC decoder to support multiple code rates for IEEE 802.15.3c applications. The critical path is improved by a reconfigurable stochastic check node unit (CNU) and variable node unit (VNU); therefore, a high throughput scheme can be realized with 768 MHz clock frequency. To achieve higher hardware and energy efficiency, the reduced complexity architecture of tracking forecast memory is experimentally investigated to implement the variable node units for IEEE 802.15.3c applications. Based on the properties of parity check matrices and stochastic arithmetic, the optimized routing networks with re-permutation techniques are adopted to enhance chip utilization. Considering the measurement uncertainties, a delay-lock loop with isolated power domain and a test environment consisting of an encoder, an AWGN generator and bypass circuits are also designed for inner clock and information generation. With these features, our proposed fully parallel LDPC decoder chip fabricated in 90-nm CMOS process with 760.3 K gate count can achieve 7.92 Gb/s data rate and power consumption of 437.2 mW under 1.2 V supply voltage. Compared to the state-of-the-art IEEE 802.15.3c LDPC decoder chips, our proposed chip achieves over 90% reduction of routing wires, 73.8% and 11.5% enhancement of hardware and energy efficiency, respectively.

51 citations
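In a stochastic decoder, each message probability is carried as a random bit stream whose fraction of 1s equals that probability, so a parity check reduces to an XOR gate. The minimal Python sketch below assumes independent Bernoulli streams and a degree-3 check node. It shows that XOR-ing two streams reproduces the soft parity rule p1(1 - p2) + p2(1 - p1). The tracking-forecast-memory variable nodes and the routing optimizations described in the paper are not modeled.

```python
import random

def bernoulli_stream(p: float, length: int, rng: random.Random) -> list[int]:
    """Encode probability p as a bit stream whose expected fraction of 1s is p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def check_node(a: list[int], b: list[int]) -> list[int]:
    """Stochastic check-node output for a degree-3 check: bitwise XOR of two inputs."""
    return [x ^ y for x, y in zip(a, b)]

def stream_probability(bits: list[int]) -> float:
    """Decode a stream back to a probability by counting 1s."""
    return sum(bits) / len(bits)

rng = random.Random(2015)
p1, p2, n = 0.8, 0.3, 100_000
out = check_node(bernoulli_stream(p1, n, rng), bernoulli_stream(p2, n, rng))
expected = p1 * (1 - p2) + p2 * (1 - p1)  # probability that exactly one input is 1
print(f"stream estimate {stream_probability(out):.3f}, exact {expected:.2f}")
```

Longer streams make the estimate more accurate but take more clock cycles. This is the usual trade-off between throughput and accuracy in stochastic decoding.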


Journal Article
TL;DR: This paper presents two low-resource implementations of a 1,024-bit Rabin encryption variant called WIPR, in embedded software and in hardware, and shows that the main performance bottleneck of the system is not the encryption time but rather the air interface.
Abstract: Passive radio-frequency identification (RFID) tags have long been thought to be too weak to implement public-key cryptography: It is commonly assumed that the power consumption, gate count and computation time of full-strength encryption exceed the capabilities of RFID tags. In this paper, we demonstrate that these assumptions are incorrect. We present two low-resource implementations of a 1,024-bit Rabin encryption variant called WIPR: in embedded software and in hardware. Our experiments with the software implementation show that the main performance bottleneck of the system is not the encryption time but rather the air interface and that the reader's implementation of the electronic product code Class-1 Generation-2 RFID standard has a crucial effect on the system's overall performance. Next, using a highly optimized hardware implementation, we investigate the trade-offs between speed, area and power consumption to derive a practical working point for a hardware implementation of WIPR. Our recommended implementation has a data-path area of 4,184 gate equivalents, an encryption time of 180 ms and an average power consumption of 11 μW, well within the established operating envelope for passive RFID tags.

41 citations
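Rabin encryption is a single modular squaring, which makes it attractive for very constrained hardware. The sketch below is textbook Rabin with toy primes, not the WIPR variant described in the paper, and it is insecure at this size. Encryption computes c = m² mod n. The key holder, knowing p and q with p ≡ q ≡ 3 (mod 4), recovers the four square roots of c with the Chinese remainder theorem.

```python
def rabin_encrypt(m: int, n: int) -> int:
    """Textbook Rabin encryption: one modular squaring."""
    return pow(m, 2, n)

def rabin_decrypt(c: int, p: int, q: int) -> set[int]:
    """Return the four square roots of c modulo n = p * q (needs p % 4 == q % 4 == 3)."""
    n = p * q
    mp = pow(c, (p + 1) // 4, p)     # a square root of c modulo p
    mq = pow(c, (q + 1) // 4, q)     # a square root of c modulo q
    inv_q = pow(q, -1, p)            # q^-1 mod p (Python 3.8+)
    inv_p = pow(p, -1, q)            # p^-1 mod q
    roots = set()
    for a in (mp, -mp):
        for b in (mq, -mq):
            # Chinese remainder theorem: x = a (mod p) and x = b (mod q).
            roots.add((a * q * inv_q + b * p * inv_p) % n)
    return roots

p, q = 7, 11                          # toy primes, both 3 mod 4 (insecure size)
c = rabin_encrypt(20, p * q)
print(c, sorted(rabin_decrypt(c, p, q)))  # 15 [13, 20, 57, 64]
```

The encrypting side, such as a tag, only needs the squaring. Real deployments add redundancy to the message so that the decryptor can pick the correct root out of the four.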


Book Chapter
08 Apr 2015
TL;DR: The method combines circuit simulation with formal verification in order to detect functional inequivalence between the parent and its offspring, and it enabled a 34% reduction in gate count even when the optimizer was executed for only 15 min.
Abstract: A new approach to the evolutionary optimization of large digital circuits is introduced in this paper. In contrast with evolutionary circuit design, the goal of evolutionary circuit optimization is to minimize the number of gates (or other non-functional parameters) of an already functional circuit. The method combines circuit simulation with formal verification in order to detect functional inequivalence between the parent and its offspring. An extensive set of 100 benchmark circuits is used to evaluate the performance of the method as well as the evolutionary approach used. Moreover, the role of neutral mutations in the context of evolutionary optimization is investigated. On average, the method enabled a 34% reduction in gate count even when the optimizer was executed for only 15 min.

34 citations
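The acceptance rule described above can be shown on a tiny netlist. An offspring survives only if it is functionally equivalent to the original, and its active gate count is no larger. Using "no larger" rather than "smaller" lets neutral mutations through. The Python sketch below is a hypothetical (1+λ) loop in that spirit. The netlist encoding, mutation operator and parameters are assumptions. The paper pairs simulation with formal verification, whereas this sketch checks equivalence by exhaustive truth-table simulation, which only scales to small input counts.

```python
import random

OPS = ("AND", "OR", "XOR", "NAND")

def simulate(n_inputs, gates, outputs):
    """Bit-parallel simulation: each signal is a 2**n_inputs-bit truth table."""
    rows = 1 << n_inputs
    mask = (1 << rows) - 1
    values = [sum(1 << r for r in range(rows) if (r >> i) & 1) for i in range(n_inputs)]
    for op, a, b in gates:
        x, y = values[a], values[b]
        if op == "AND":
            v = x & y
        elif op == "OR":
            v = x | y
        elif op == "XOR":
            v = x ^ y
        else:  # NAND
            v = ~(x & y) & mask
        values.append(v)
    return tuple(values[o] for o in outputs)

def active_gate_count(n_inputs, gates, outputs):
    """Count gates reachable from the outputs; unreachable gates cost nothing."""
    seen, stack = set(), [o for o in outputs if o >= n_inputs]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            _, a, b = gates[node - n_inputs]
            stack.extend(s for s in (a, b) if s >= n_inputs)
    return len(seen)

def mutate(n_inputs, gates, outputs, rng):
    """Change one gene: a gate's function, one of its inputs, or an output pointer."""
    gates, outputs = list(gates), list(outputs)
    if rng.random() < 0.1:
        outputs[rng.randrange(len(outputs))] = rng.randrange(n_inputs + len(gates))
    else:
        g = rng.randrange(len(gates))
        op, a, b = gates[g]
        node = n_inputs + g              # inputs may only come from earlier nodes
        gene = rng.randrange(3)
        if gene == 0:
            op = rng.choice(OPS)
        elif gene == 1:
            a = rng.randrange(node)
        else:
            b = rng.randrange(node)
        gates[g] = (op, a, b)
    return gates, outputs

def optimize(n_inputs, gates, outputs, generations, lam, rng):
    """(1+lambda) search that only accepts functionally equivalent offspring."""
    target = simulate(n_inputs, gates, outputs)
    best, best_cost = (gates, outputs), active_gate_count(n_inputs, gates, outputs)
    for _ in range(generations):
        parent = best
        for _ in range(lam):
            child = mutate(n_inputs, *parent, rng)
            if simulate(n_inputs, *child) != target:
                continue                 # inequivalent offspring are discarded
            cost = active_gate_count(n_inputs, *child)
            if cost <= best_cost:        # '<=' lets neutral mutations through
                best, best_cost = child, cost
    return best, best_cost

# Hypothetical start: XOR built from four NANDs plus two unused spare gates.
gates0 = [("NAND", 0, 1), ("NAND", 0, 2), ("NAND", 1, 2), ("NAND", 3, 4),
          ("AND", 0, 0), ("AND", 1, 1)]
outputs0 = [5]
(best_gates, best_outputs), cost = optimize(2, gates0, outputs0, 2000, 4, random.Random(1))
print("active gates:", active_gate_count(2, gates0, outputs0), "->", cost)
```

The unused spare gates matter here. Their genes can drift freely under neutral mutations, so a cheaper equivalent structure can form before an output pointer is redirected to it.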


Journal Article
TL;DR: The architectural details, design examples and complexity comparisons show that the SPA-MCDM-FB is easy to design and offers substantial savings in gate count, number of variable multipliers and group delay over other filter banks.
Abstract: This paper presents a design of linear-phase, low-complexity, reconfigurable digital filter bank that offers independent and complete control over the bandwidth as well as the center frequency of all subbands. The proposed filter bank is designed by integrating spectral parameter approximation (SPA) technique with the modified coefficient decimation method (MCDM), referred to as SPA-MCDM-FB. The architectural details, design examples and complexity comparisons show that the SPA-MCDM-FB is easy to design and offers substantial savings in gate count, number of variable multipliers and group delay over other filter banks. Moreover, these savings increase further with the increase in the filter-bank resolution (i.e., number of subbands). The SPA-MCDM-FB is then combined with the upper confidence bound (UCB)-based decision-making algorithm to search the vacant band(s) of any desired bandwidth for spectrum-sensing application in cognitive radio (CR). The simulation results verify that the proposed scheme offers superior performance [i.e., improved utilization of vacant subband(s)] and requires a lower gate count compared to uniform-filter-bank and UCB-algorithm-based schemes. Furthermore, the functionality and advantages of the SPA-MCDM-FB are also verified for the channelization operation in CR supporting multiple communication standards.

20 citations
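Coefficient decimation, the building block behind MCDM, keeps every M-th coefficient of a prototype filter and zeroes the rest. This multiplies the impulse response by a periodic comb. As a result, the new frequency response is the average of M copies of the original, shifted by multiples of 2π/M. The NumPy sketch below checks that identity numerically using random stand-in coefficients. The "modified" CDM variant and the SPA tuning used in SPA-MCDM-FB are not reproduced.

```python
import numpy as np

def coefficient_decimate(h: np.ndarray, M: int) -> np.ndarray:
    """Keep every M-th filter coefficient and replace the others with zeros."""
    h_dec = np.zeros_like(h)
    h_dec[::M] = h[::M]
    return h_dec

rng = np.random.default_rng(0)
M, N = 3, 48                       # decimation factor; DFT length (a multiple of M)
h = rng.standard_normal(16)        # stand-in prototype coefficients (hypothetical)
H = np.fft.fft(h, N)
H_dec = np.fft.fft(coefficient_decimate(h, M), N)
# Predicted response: the average of M copies of H shifted by N/M bins (2*pi/M rad).
H_pred = sum(np.roll(H, m * N // M) for m in range(M)) / M
print(np.allclose(H_dec, H_pred))  # True
```

Applied to a lowpass prototype, the shifted copies become extra passbands. This is why one set of multipliers can serve several subbands.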


Proceedings Article
21 Jul 2015
TL;DR: The design successfully integrates the SVD module with the QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework and significantly outperforms previous designs.
Abstract: Precoding is an effective scheme for pre-compensating wireless channel impairments, and the singular value decomposition (SVD) scheme is a popular choice. This paper presents a unified high-throughput SVD/QRD precoder chip design for MIMO OFDM systems. A hardware-implementation-friendly Givens Rotation (GR) based SVD computing scheme is developed first. It starts with a bi-diagonalization phase followed by an iterative diagonalization phase consisting of successive nullification sweeps. A convergence detection mechanism is employed to terminate the computations once the required precision is achieved. The design successfully integrates the SVD module (for precoding) with the QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework. The design features a two-level pipelined, fully parallel architecture, and CORDIC processors are employed to implement the GR modules efficiently. Various design optimization techniques are applied to reduce the circuit complexity and the power consumption. The implementation using TSMC 90nm process technology indicates a throughput rate of 35.75M SVDs per second when operating at 143MHz. Both the throughput rate and the gate count efficiency of the proposed design significantly outperform previous designs.

14 citations
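The Givens-rotation building block shared by the SVD and QRD modules nullifies one matrix entry at a time with a 2×2 rotation. The NumPy sketch below uses it to compute a real-valued QR decomposition. It assumes floating-point arithmetic, whereas the chip computes and applies each rotation with CORDIC processors on complex MIMO channel matrices. The bi-diagonalization and iterative SVD sweeps are not shown.

```python
import numpy as np

def givens(a: float, b: float) -> tuple[float, float]:
    """Return (c, s) so that [[c, s], [-s, c]] @ [a, b] == [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def givens_qr(A: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """QR decomposition by successive Givens rotations (real-valued sketch)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):          # null R[i, j] from the bottom up
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T   # keep A == Q @ R
    return Q, R

A = np.random.default_rng(0).standard_normal((4, 3))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0.0))  # True True
```

Each rotation touches only two rows. This locality is what lets hardware designs pipeline many rotations in parallel.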


Journal Article
01 Jan 2015
TL;DR: An algorithm is proposed that synthesizes reversible functions from their positive-polarity Reed-Muller (PPRM) expansion and uses the Hamming distance to select a suitable transformation path.
Abstract: In this paper, we present an efficient reversible logic synthesis algorithm that uses Toffoli and mixed-polarity Toffoli gates. The algorithm synthesizes reversible functions from their positive-polarity Reed-Muller (PPRM) expansion and uses the Hamming distance (HD) to select a suitable transformation path. Once a transformation path is defined, suitable gates for substitution are selected through a gate matching factor, and reduction is performed. The algorithm does not generate any extra lines, thus keeping the synthesized function in its simplest form. The algorithm targets the efficient synthesis of three-variable reversible functions into a cascade of Toffoli and mixed-polarity Toffoli gates in terms of quantum cost and gate count. Experimental results show that the proposed algorithm is efficient for realizing all three-variable reversible functions.

11 citations
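The PPRM expansion that this algorithm starts from writes each output bit as an XOR of AND-terms over uncomplemented variables. One standard way to compute it from a truth table is the GF(2) Möbius transform. The Python sketch below does this for a single output bit and applies it to the target output of a Toffoli gate. Only the expansion step is shown. The Hamming-distance path selection and the gate matching factor are not modeled.

```python
def pprm_coefficients(truth_table: list[int]) -> list[int]:
    """Positive-polarity Reed-Muller coefficients via the GF(2) Moebius transform.

    truth_table[r] is f(x) where input bit i of row r is (r >> i) & 1.
    A coefficient of 1 at index s means the product of the variables whose bits
    are set in s appears in the XOR sum (index 0 is the constant term).
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1
    if len(a) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    for i in range(n):
        step = 1 << i
        for r in range(len(a)):
            if r & step:
                a[r] ^= a[r ^ step]
    return a

def monomials(coeffs: list[int]) -> list[str]:
    """Render nonzero coefficients as products of x0, x1, ... ('1' = constant)."""
    n = len(coeffs).bit_length() - 1
    terms = []
    for s, bit in enumerate(coeffs):
        if bit:
            names = [f"x{i}" for i in range(n) if (s >> i) & 1]
            terms.append("".join(names) or "1")
    return terms

# Output bit 2 of a Toffoli gate (controls x0, x1; target x2) over rows r = 0..7.
toffoli_target = [((r ^ 4) if (r & 3) == 3 else r) >> 2 & 1 for r in range(8)]
print(" XOR ".join(monomials(pprm_coefficients(toffoli_target))))  # x0x1 XOR x2
```

For a reversible function with several outputs, the same transform is applied to each output bit separately.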


Proceedings Article
01 Nov 2015
TL;DR: This paper presents design techniques that minimize the complexity of a 4×4-bit unsigned reversible multiplier in terms of the number of gates, the number of garbage outputs, the number of constant inputs and the total quantum cost.
Abstract: Reversible digital multipliers theoretically do not dissipate energy but are very complex to design. To reduce the gate count, many proposed optimizations employ large reversible gates, which, however, reduce neither the circuit complexity nor its quantum cost. In this paper, we present design techniques that minimize the complexity of a 4×4-bit unsigned reversible multiplier in terms of the number of gates, the number of garbage outputs, the number of constant inputs and the total quantum cost.

8 citations


Journal Article
TL;DR: This brief proposes a dynamic error-compensation circuit for a fixed-width squarer based on the Booth-folding technique that achieves the best tradeoff between area efficiency, cost, and accuracy.
Abstract: This brief proposes a dynamic error-compensation circuit for a fixed-width squarer based on the Booth-folding technique. According to the expected value of the partial product through the Booth encoder, a closed form of the compensated value can be derived, including column information that can be used to improve accuracy. The proposed compensation circuit was derived using a mathematical probability model, which means that it is easily implemented for bit lengths of 32, 64, and longer. Implemented using the Taiwan Semiconductor Manufacturing Company Ltd. 0.18-μm CMOS process, the proposed 32-bit squarer achieved an operation frequency of 50 MHz and a gate count of 3.7K. Compared with previous solutions, the proposed squarer achieves the best tradeoff between area efficiency, cost, and accuracy.

8 citations


Patent
31 Aug 2015
TL;DR: In this paper, the authors describe integrated circuits such as microcontrollers with a low energy accelerator processor circuit or other application-specific integrated processor circuit, including a load store circuit operative to perform load and store operations associated with at least one register and a low gate count shift circuit that selectively shifts the data of the register by only an integer number of bits less than the register data width, without using a barrel shifter, for low-power operation to support vector operations for FFT or filtering functions.
Abstract: Described examples include integrated circuits such as microcontrollers with a low energy accelerator processor circuit or other application-specific integrated processor circuit, including a load store circuit operative to perform load and store operations associated with at least one register and a low gate count shift circuit that selectively shifts the data of the register by only an integer number of bits less than the register data width, without using a barrel shifter, for low-power operation to support vector operations for FFT or filtering functions.

7 citations


Book Chapter
TL;DR: In this paper, the authors present a quantum circuit representation consisting of qubit initialisations (I), a network of controlled-NOT gates (C) and measurements with respect to different bases (M).
Abstract: We present a quantum circuit representation consisting entirely of qubit initialisations (I), a network of controlled-NOT gates (C) and measurements with respect to different bases (M). The ICM representation is useful for optimisation of quantum circuits that include teleportation, which is required for fault-tolerant, error corrected quantum computation. The non-deterministic nature of teleportation necessitates the conditional introduction of corrective quantum gates and additional ancillae during circuit execution. Therefore, the standard optimisation objectives, gate count and number of wires, are not well-defined for general teleportation-based circuits. The transformation of a circuit into the ICM representation provides a canonical form for an exact fault-tolerant, error corrected circuit needed for optimisation prior to the final implementation in a realistic hardware model.

7 citations


Book Chapter
19 Oct 2015
TL;DR: The paper reports fast constant-time Intel SSSE3 and ARM NEON SIMD WHIRLBOB implementations that keep full MiniBoxes in registers and access them via SIMD shuffles. It also reports an FPGA implementation that requires 4,946 logic units for a single round, which compares favorably to the 7,972 required for Keccak/Keyak on the same target platform.
Abstract: WHIRLBOB, also known as STRIBOBr2, is an AEAD (Authenticated Encryption with Associated Data) algorithm derived from STRIBOBr1 and the Whirlpool hash algorithm. WHIRLBOB/STRIBOBr2 is a second-round candidate in the CAESAR competition. As with STRIBOBr1, the reduced-size Sponge design has a strong provable security link with a standardized hash algorithm. The new design utilizes only the LPS or ρ component of Whirlpool in flexibly domain-separated BLNK Sponge mode. The number of rounds is increased from 10 to 12 as a countermeasure against Rebound Distinguishing attacks. The 8×8-bit S-box used by Whirlpool and WHIRLBOB is constructed from 4×4-bit “MiniBoxes”. We report on fast constant-time Intel SSSE3 and ARM NEON SIMD WHIRLBOB implementations that keep full MiniBoxes in registers and access them via SIMD shuffles. This is an efficient countermeasure against AES-style cache-timing side-channel attacks. Another main advantage of WHIRLBOB over STRIBOBr1 (and most other AEADs) is its greatly reduced implementation footprint on lightweight platforms. On many lower-end microcontrollers, the total software footprint of π+BLNK = WHIRLBOB AEAD is less than half a kilobyte. We also report an FPGA implementation that requires 4,946 logic units for a single round of WHIRLBOB, which compares favorably to the 7,972 required for Keccak/Keyak on the same target platform. The relatively small S-box gate count also enables efficient 64-bit bitsliced straight-line implementations. We finally present some discussion and analysis of the relationships between WHIRLBOB, Whirlpool, the Russian GOST Streebog hash, and the recent draft Russian Encryption Standard Kuznyechik.

Journal Article
TL;DR: In this article, it was shown that using a combination of the standard techniques presented by Barenco et al. (Phys Rev A 52(5):3457, 1995), one can create an n-qubit version of the Toffoli gate using fewer gates and the same number of ancilla qubits as recent work using computer optimization.
Abstract: In this paper, we show that a qudit scheme for constructing a controlled-Toffoli, proposed by Ralph et al. (Phys Rev A 75:022313, 2007), can be adapted to qubits. While this scheme requires more gates than standard schemes for creating large controlled gates, we show that with simple adaptations it is directly equivalent to the standard scheme in the literature. This scheme is the most gate-efficient way of creating large controlled unitaries currently known; however, it is expensive in terms of the number of ancilla qubits used. We go on to show that using a combination of the standard techniques presented by Barenco et al. (Phys Rev A 52(5):3457, 1995), we can create an n-qubit version of the Toffoli gate using fewer gates and the same number of ancilla qubits as recent work using computer optimization. This would be useful in any quantum computing architecture where gates are cheap but qubit initialization is expensive.
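A common way to build an n-control Toffoli from ordinary Toffoli gates is to accumulate partial ANDs into clean ancillas and then uncompute them. The Python sketch below simulates that compute-uncompute ladder on classical basis states. It assumes n - 2 ancillas initialised to 0 and uses 2n - 3 Toffolis. This is one textbook variant and not necessarily the exact gate sequence counted in the paper. Because it works on basis states only, it does not track quantum phases.

```python
from itertools import product

def toffoli(state: list[int], c1: int, c2: int, t: int) -> None:
    """Toffoli (CCNOT) acting on a classical basis state, in place."""
    state[t] ^= state[c1] & state[c2]

def multi_controlled_not(controls, target, ancillas, state) -> int:
    """Flip target iff all controls are 1, borrowing len(controls) - 2 ancillas at 0.

    Returns the number of Toffoli gates applied, which is 2n - 3 for n controls.
    """
    n = len(controls)
    if n < 2 or len(ancillas) < n - 2:
        raise ValueError("need at least 2 controls and n - 2 ancillas")
    if n == 2:
        toffoli(state, controls[0], controls[1], target)
        return 1
    ladder = [(controls[0], controls[1], ancillas[0])]
    for k in range(2, n - 1):                 # ancilla k-1 ends up holding c0 AND ... AND ck
        ladder.append((ancillas[k - 2], controls[k], ancillas[k - 1]))
    for gate in ladder:                       # compute the partial ANDs
        toffoli(state, *gate)
    toffoli(state, ancillas[n - 3], controls[n - 1], target)
    for gate in reversed(ladder):             # uncompute: ancillas return to 0
        toffoli(state, *gate)
    return 2 * len(ladder) + 1

n = 5
controls, target, ancillas = list(range(n)), n, list(range(n + 1, 2 * n - 1))
ok = True
for bits in product((0, 1), repeat=n + 1):
    state = list(bits) + [0] * (n - 2)
    gate_count = multi_controlled_not(controls, target, ancillas, state)
    ok &= state[target] == bits[n] ^ all(bits[:n])
    ok &= state[:n] == list(bits[:n]) and state[n + 1:] == [0] * (n - 2)
print(ok, gate_count)  # True 7
```

The uncompute pass restores every ancilla to 0. This matters because ancillas are assumed expensive to initialise, so they can then be reused.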

Proceedings Article
24 May 2015
TL;DR: A scheme that computes the CE signals to be transmitted based on box-constrained regression (coordinate-descent), with an O(2MK) complexity per iteration per user symbol is proposed, with a very high throughput of 500 Msamples/sec.
Abstract: This study describes a high throughput constant envelope (CE) pre-coder for Massive MIMO systems. A large number of antennas (M), in the order of 100s, serve a relatively small number of users (K) simultaneously. The stringent amplitude constraint (only phase changes) in the CE scheme is motivated by the use of highly power-efficient non-linear RF power amplifiers. We propose a scheme that computes the CE signals to be transmitted based on box-constrained regression (coordinate-descent), with an O(2MK) complexity per iteration per user symbol. A highly scalable systolic architecture is implemented, where M Processing Elements (PEs) perform the pre-coding for a system with up to K=16 users. This systolic architecture results in a very high throughput of 500 Msamples/sec (at 500 MHz clock rate) with a gate count of 14K per PE in 65nm technology.
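The coordinate-descent idea behind this precoder can be illustrated with a per-antenna update. Holding the other antennas fixed, the phase of antenna i is set to the value that best reduces the residual ||s - Hx||². Each such update costs O(K), so one sweep over all M antennas costs O(MK). The NumPy sketch below assumes a phase-only projection with amplitude 1/√M. It swaps this in for the paper's box-constrained regression formulation, so it is an illustration of the technique and not the authors' algorithm. The systolic mapping is not modeled.

```python
import numpy as np

def ce_precode(H, s, sweeps, rng):
    """Phase-only coordinate descent for constant-envelope precoding.

    Minimizes ||s - H @ x||^2 over x with |x_i| = 1/sqrt(M) for every antenna i.
    H has shape (K users, M antennas); each single-antenna update costs O(K).
    """
    _, M = H.shape
    amp = 1.0 / np.sqrt(M)
    x = amp * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, M))
    e = s - H @ x                                  # current residual
    history = [np.linalg.norm(e) ** 2]
    for _ in range(sweeps):
        for i in range(M):
            h = H[:, i]
            e_i = e + h * x[i]                     # residual without antenna i
            x[i] = amp * np.exp(1j * np.angle(np.vdot(h, e_i)))  # best phase
            e = e_i - h * x[i]
        history.append(np.linalg.norm(e) ** 2)
    return x, history

rng = np.random.default_rng(7)
K, M = 4, 64
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
x, history = ce_precode(H, s, sweeps=10, rng=rng)
print([round(v, 4) for v in history[:3]], "...", round(history[-1], 6))
```

Each phase update is an exact minimizer for its coordinate, so the residual never increases from one sweep to the next. Every transmitted sample keeps the same magnitude by construction.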

Journal Article
TL;DR: The proposed Inventive0 gate is shown to be a more efficient and optimized approach than existing designs in terms of gate count, garbage outputs and constant inputs, and the resulting designs are compact, fast and low power.
Abstract: A large amount of research is currently being conducted in the field of reversible logic, which offers low heat dissipation and low power consumption; these properties are the main motivation for applying reversible logic in digital VLSI circuit design. This paper introduces a reversible gate named the ‘Inventive0 gate’. The novel gate is used to synthesize efficient adder modules with minimal garbage outputs and gate count. The Inventive0 gate is capable of implementing 4-bit ripple carry adders and carry skip adders. The Inventive0 gate is shown to be a much more efficient and optimized approach than existing designs in terms of gate count, garbage outputs and constant inputs. In addition, some popular existing reversible gates are implemented at the MOS transistor level with a minimal MOS transistor count; these implementations are completely reversible in behaviour, supporting precise forward and backward computation. The lower architectural complexity shows that the novel designs are compact, fast and low power.

Proceedings Article
21 Jul 2015
TL;DR: A high-efficiency two-parallel Reed-Solomon decoder based on the compensated simplified reformulated inversionless Berlekamp-Massey (CS-RiBM) algorithm, which meets the demands of next generation short-reach optical systems.
Abstract: This paper presents a high-efficiency two-parallel Reed-Solomon (RS) decoder based on the compensated simplified reformulated inversionless Berlekamp-Massey (CS-RiBM) algorithm. To achieve high speed and low hardware complexity, the key equation solver (KES) block is designed with pipelining and folding. With the TSMC 90nm process, simulation results reveal that the proposed 16-channel architecture can operate at up to 625MHz and achieve a throughput rate of 156 Gbps with a total gate count of 269,000. The area of the proposed decoder is at least 35.6% smaller than that of comparable designs in the same technology, which meets the demands of next-generation short-reach optical systems.

Proceedings Article
04 Apr 2015
TL;DR: A new 5*5 parity preserving reversible gate is proposed in this paper, named as P2RG, which is better in terms of gate count, garbage outputs, constant inputs and area than the existing similitudes.
Abstract: Modern VLSI circuit design is governed by the low power consumption requirements of ICs. Reversible logic has gained importance because no information bits are lost during computation, which results in low power dissipation. Moreover, reversible circuits need to be converted into fault-tolerant reversible circuits so that the occurrence of errors can be detected, and the parity-preserving property can be used for this purpose. A new 5×5 parity-preserving reversible gate, named P2RG, is proposed in this paper. The most significant aspect of this work is that it can work both as a full adder and as a full subtractor using only one P2RG and one Fredkin gate. The proposed design is better than existing counterparts in terms of gate count, garbage outputs, constant inputs and area. Thus, this paper provides a starting point for designing more complex systems that execute more complicated operations using parity-preserving reversible logic.
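A gate is reversible when its truth table is a bijection. It is parity-preserving when every output pattern has the same parity as its input pattern. Both properties can be checked exhaustively for small gates. The abstract does not give P2RG's mapping, so the Python sketch below demonstrates the check on the Fredkin gate, which the abstract pairs with P2RG, and on the Toffoli gate for contrast. A 5-input function for P2RG could be passed in the same way if its definition is known.

```python
from itertools import product

def check_gate(gate, width: int) -> tuple[bool, bool]:
    """Exhaustively test a gate: (is it reversible?, does it preserve parity?)."""
    inputs = list(product((0, 1), repeat=width))
    outputs = [tuple(gate(*bits)) for bits in inputs]
    reversible = len(set(outputs)) == len(inputs)   # bijection on 2**width patterns
    parity_preserving = all(sum(i) % 2 == sum(o) % 2 for i, o in zip(inputs, outputs))
    return reversible, parity_preserving

def fredkin(a, b, c):
    """Controlled swap: exchange b and c when a == 1."""
    return (a, c, b) if a else (a, b, c)

def toffoli(a, b, c):
    """Controlled-controlled NOT: flip c when a == b == 1."""
    return (a, b, c ^ (a & b))

print("Fredkin:", check_gate(fredkin, 3))   # Fredkin: (True, True)
print("Toffoli:", check_gate(toffoli, 3))   # Toffoli: (True, False)
```

The Toffoli gate fails the parity test because input (1, 1, 0) maps to (1, 1, 1). This is why fault-tolerant designs favor parity-preserving gates such as Fredkin.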

01 Jan 2015
TL;DR: In this article, a high throughput constant envelope (CE) pre-coder for massive MIMO systems is proposed, where a large number of antennas (M) serve a relatively small number of users (K) simultaneously.
Abstract: This study describes a high throughput constant envelope (CE) pre-coder for Massive MIMO systems. A large number of antennas (M), in the order of 100s, serve a relatively small number of users (K) simultaneously. The stringent amplitude constraint (only phase changes) in the CE scheme is motivated by the use of highly power-efficient non-linear RF power amplifiers. We propose a scheme that computes the CE signals to be transmitted based on box-constrained regression (coordinate descent), with an O(2MK) complexity per iteration per user symbol. A highly scalable systolic architecture is implemented, where M Processing Elements (PEs) perform the pre-coding for a system with up to K = 16 users. This systolic architecture results in a very high throughput of 500 Msamples/sec (at 500 MHz clock rate) with a gate count of 14K per PE in 65 nm technology.

Journal Article
TL;DR: A reversible carry look-ahead adder and an array multiplier are presented that need fewer garbage outputs, constant inputs and gates than previously existing designs, and that improve power and area compared to conventional adders and multipliers.
Abstract: Reversible logic is one of the promising research areas in low-power applications such as quantum computing, optical information processing and low-power CMOS design. In this paper we present a reversible carry look-ahead adder and an array multiplier. The circuits are designed so that they need fewer garbage outputs, constant inputs and gates than previously existing designs. We also achieve better power and area figures than conventional adders and multipliers. The implemented designs are simulated using NC Launch and synthesized using RTL Compiler. Keywords: Reversible, Garbage constant, Garbage output. 1. INTRODUCTION Power dissipation is one of the important problems faced nowadays in VLSI design [4]. A combinational circuit dissipates kT ln 2 [1] joules of heat for every bit of information that is lost, irrespective of the technology used, where k is Boltzmann's constant and T is the temperature. Heat dissipation reduces the life span of circuits. Information is lost when the input bits cannot be recovered from the output vectors. Reversible gates naturally avoid this heat, since input vectors can be uniquely recovered from the output vectors; that is, there is a one-to-one correspondence between input vectors and output vectors. Each output of a reversible gate is used only once; that is, a reversible circuit is feedback-free. Some of the terms related to reversible logic are [2, 3].
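The heat argument in this introduction rests on Landauer's bound: at least kT ln 2 joules are dissipated for each erased bit. The short Python calculation below evaluates the bound at an assumed room temperature of 300 K, using the exact SI value of Boltzmann's constant. It shows the scale of the per-bit floor that reversible designs aim to avoid. It is not a figure measured in this paper.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # exact SI value since the 2019 redefinition

def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat dissipated when one bit of information is erased: k*T*ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)

print(f"{landauer_limit_joules(300.0):.3e} J per bit at 300 K")  # 2.871e-21 J per bit at 300 K
```

Practical CMOS gates dissipate far more than this per switching event. The bound therefore matters as a fundamental limit rather than as today's dominant loss.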

Proceedings Article
26 Jun 2015
TL;DR: This paper presents the implementation of the CORDIC algorithm on a configurable architecture to port various transforms that form the heart of image and signal processing applications, and shows the benefits of the architecture in terms of configurability and cycle timing with a reduced gate count implementation.
Abstract: The CORDIC (COordinate Rotation DIgital Computer) [1] algorithm provides an efficient and accurate platform to compute various trigonometric, linear and non-linear functions using only shift-add operations. This paper presents the implementation of the CORDIC algorithm on a configurable architecture to port various transforms that form the heart of image and signal processing applications. We demonstrate this through the mapping of trigonometric, hyperbolic, logarithm and exponential CORDIC functions onto the proposed architecture. Furthermore, we show the benefits of the architecture in terms of configurability and cycle timing with a reduced gate count implementation.
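In rotation mode, CORDIC drives a residual angle z toward zero through a fixed sequence of micro-rotations by atan(2^-i). Each micro-rotation needs only shifts, additions and a small angle table. The Python sketch below models that datapath in 30-bit fixed point to compute sine and cosine. The word length and iteration count are assumptions. Only the circular (trigonometric) mode is shown, not the hyperbolic, logarithm or exponential modes mapped in the paper.

```python
import math

FRAC_BITS = 30                     # fixed-point format: value = integer / 2**30
SCALE = 1 << FRAC_BITS
ITERATIONS = 30

# Precomputed tables that a hardware CORDIC would hold in ROM.
ANGLES = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]
GAIN = math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))
X0 = round(GAIN * SCALE)           # pre-scaling cancels the CORDIC gain

def cordic_sin_cos(theta: float) -> tuple[float, float]:
    """Rotation-mode CORDIC using only shifts and adds; valid for |theta| <= pi/2."""
    x, y, z = X0, 0, round(theta * SCALE)
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2**-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:        # rotate clockwise by atan(2**-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / SCALE, x / SCALE

s, c = cordic_sin_cos(math.pi / 6)
print(f"sin {s:.6f}, cos {c:.6f}")
```

Each iteration adds roughly one bit of accuracy. Choosing the iteration count is therefore a direct trade between cycle count and precision.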

Proceedings Article
01 Jun 2015
TL;DR: This is the first attempt in the literature to implement a Berger check logic using reversible gates and also a Berger Check prediction circuit to check all unidirectional and multiple errors for all reversible Adders/Subtractors.
Abstract: To continue the revolution in computer hardware performance, we need to reduce the energy dissipated in each logic operation. Energy dissipation can be reduced by preventing information loss, which is achieved by designing circuits with reversible logic gates. Reversible logic has wide applications in quantum computing, nanotechnology and many other fields. Fault-tolerant logic has become a very important technique in present-day electronics for minimizing errors in a circuit. Fault-tolerant gates with error detection capability eliminate the need for external hardware to test the circuit. In this paper, we propose a reversible fault-tolerant adder/subtractor, as well as a Berger check prediction circuit that detects all unidirectional and multiple errors for all reversible adders/subtractors. This is the first attempt in the literature to implement Berger check logic using reversible gates. The performance parameters of the reversible Berger check logic and the fault-tolerant reversible logic are compared and analyzed. Although the fault-tolerant logic does not require any external hardware, its quantum cost is 25% higher than that of the Berger check logic, it requires 75% more garbage outputs and 25% more ancilla inputs, and its gate count and circuit delay are 45% and 30% higher, respectively, than those of the proposed Berger check circuit. Hence, the Berger check circuit is found to be more efficient in terms of quantum cost, delay and gate count.
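A Berger code appends the binary count of 0s in the data word. A unidirectional error flips only 1s to 0s, or only 0s to 1s. Any such error changes the zero count and the stored check value in opposite directions, so it can never go unnoticed. The Python sketch below is a software model of that property, checked exhaustively for 4-bit data. It is not the reversible-gate circuit proposed in the paper.

```python
from itertools import combinations, product

def berger_check_bits(data: list[int]) -> list[int]:
    """Berger check symbol: number of 0s in the data, as a binary word (MSB first)."""
    width = len(data).bit_length()          # enough bits to count 0..len(data)
    zeros = data.count(0)
    return [(zeros >> i) & 1 for i in reversed(range(width))]

def is_valid(codeword: list[int], n_data: int) -> bool:
    """A codeword is valid when its check part matches the recomputed check symbol."""
    return berger_check_bits(codeword[:n_data]) == codeword[n_data:]

def unidirectional_errors(word: list[int], from_bit: int):
    """Yield every word obtained by flipping a nonempty set of bits equal to from_bit."""
    positions = [i for i, b in enumerate(word) if b == from_bit]
    for r in range(1, len(positions) + 1):
        for subset in combinations(positions, r):
            yield [b ^ 1 if i in subset else b for i, b in enumerate(word)]

n = 4
all_detected = True
for data in product((0, 1), repeat=n):
    word = list(data) + berger_check_bits(list(data))
    for from_bit in (0, 1):                 # all 1->0 errors, then all 0->1 errors
        for corrupted in unidirectional_errors(word, from_bit):
            all_detected &= not is_valid(corrupted, n)
print("every unidirectional error detected:", all_detected)
```

The check symbol costs only about log2(n + 1) extra bits. Unlike simple parity, it also catches multiple-bit errors as long as they all flip in the same direction.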

Journal Article
01 Feb 2015
TL;DR: The reuse-oriented Boolean simplification (ROBS) technique is proposed to overcome the intrinsically unbalanced computation load between FM0 and Manchester encoding, improving the hardware utilization rate (HUR) from 50% to 90%.
Abstract: Dedicated Short-Range Communication (DSRC) is an emerging standard that brings vehicular communication into the modern automotive industry. The DSRC standard generally applies FM0 and Manchester codes to achieve DC balance, enhancing signal reliability. However, the intrinsically unbalanced computation load between FM0 and Manchester encoding leads to VLSI architectures with poor hardware utilization. In this paper, the reuse-oriented Boolean simplification (ROBS) technique is proposed to overcome this problem. The ROBS technique constructs a balance-type architecture that improves the hardware utilization rate (HUR) from 50% to 90%. How clock skew affects the balance-type architecture is also discussed. This work is realized in 0.18-μm 1P6M CMOS technology with a cell-based design flow. The gate count is 25.61, normalized to a 2-input NAND gate. Power consumption is reported separately for FM0 encoding and for Manchester encoding. The encoding capability is up to 27 Mbps, which fully supports the DSRC standards of America, Europe and Japan.
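Both line codes produce two half-bit levels per data bit. The Python sketch below models them under stated assumptions. Manchester is computed as X XOR CLK, with CLK assumed high in the first half of each bit. FM0 inverts the level at every bit boundary and adds a mid-bit inversion for a 0. The initial line level is an assumed parameter. The ROBS hardware-sharing technique itself is not modeled.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Manchester as X XOR CLK, with CLK = 1 in the first half of each bit (assumed)."""
    out = []
    for b in bits:
        out += [b ^ 1, b ^ 0]          # two half-bit levels per data bit
    return out

def fm0_encode(bits: list[int], initial_level: int = 1) -> list[int]:
    """FM0: invert the level at every bit boundary; a 0 adds a mid-bit inversion."""
    out, level = [], initial_level
    for b in bits:
        first = level ^ 1              # boundary transition, always present
        second = first if b == 1 else first ^ 1
        out += [first, second]
        level = second
    return out

data = [1, 0, 1, 1, 0, 0, 1]
print("Manchester:", manchester_encode(data))
print("FM0:       ", fm0_encode(data))
```

In Manchester encoding, the output depends only on the current bit. In FM0 encoding, the output also depends on the previous level. This difference is the source of the unbalanced computation load that the paper's architecture addresses.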

Proceedings Article
12 Dec 2015
TL;DR: A new lightweight stream cipher, SVH, is proposed, based on dual pseudo-random transformation and output feedback, that can achieve sufficient security margin against known attacks, and compares favourably to other hardware oriented stream ciphers like Grain.
Abstract: A new lightweight stream cipher, SVH, is proposed. The design targets hardware environments where gate count, power consumption and memory are very limited. It is based on a dual pseudo-random transformation and output feedback. The key size is 64 bits, and SVH can achieve a sufficient security margin against known attacks such as linear cryptanalysis, differential cryptanalysis and impossible differential cryptanalysis. The hardware implementation of SVH requires around 1,171 GE, which is comparable with the 1,458 GE hardware implementation of Grain. The software implementation of SVH on an 8-bit microcontroller runs at about 19.55 Mb/s, and its efficiency is 30 times that of Grain in an RFID environment. Its hardware complexity and throughput compare favourably to other hardware-oriented stream ciphers such as Grain.

31 Aug 2015
TL;DR: Since there are different cost considerations, such as garbage outputs, gate count and quantum cost, methods for specific cost reductions may be established.
Abstract: Nowadays, reversible computing is a fascinating research area for curtailing power dissipation compared with conventional computing. In conventional computing, logic circuits dissipate more power by losing bits of information. Reversible computing avoids losing information by mapping each input vector to a unique output vector with the same number of bits, and thus decreases power dissipation. Since there are different cost considerations, such as garbage outputs, gate count and quantum cost, methods for specific cost reductions may be established.

Proceedings Article
19 Mar 2015
TL;DR: This paper proposes a methodology to generate the multiple test patterns varying in single bit position for built-in-self-test (BIST) using Gray counter and Decoder to improve correlation between the subsequent test vectors.
Abstract: This paper proposes a methodology to generate multiple test patterns that vary in a single bit position for built-in self-test (BIST). Traditional patterns generated using linear feedback shift registers lack correlation between consecutive test vectors. To improve the correlation between subsequent test vectors, the patterns are produced using a Gray counter and a decoder. Area optimization is achieved by reducing the total gate count needed to implement the design. To optimize power, the number of toggles between subsequent test vectors is curtailed. The generated test patterns have the advantage of a minimum-transition sequence. Simulation results on a multiplier circuit show a reduction of 54% in area overhead and 12% in power overhead compared to pattern generation using a reconfigurable Johnson counter and LFSR. 100% fault coverage is achieved while generating patterns using the Gray counter, decoder and accumulator architecture. The time required is the same as for generating patterns with the existing methodology. The methodology for producing the test vectors for BIST is coded in VHDL, and simulations were performed with ModelSim 10.0b. The area utilization and power reports were obtained with Xilinx ISE 9.1.
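The low-toggle property comes from Gray coding: consecutive codes differ in exactly one bit. The Python sketch below counts total bit flips for a 4-bit binary counter and for a 4-bit Gray counter. The flip count serves as an assumed proxy for test-mode switching power. The decoder and accumulator stages of the proposed generator are not modeled.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of i: consecutive codes differ in exactly one bit."""
    return i ^ (i >> 1)

def toggles(patterns: list[int]) -> int:
    """Total bit flips between consecutive test vectors (a proxy for test power)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))

width = 4
binary_patterns = list(range(1 << width))
gray_patterns = [gray(i) for i in binary_patterns]
print(toggles(binary_patterns), toggles(gray_patterns))  # 26 15
```

Both sequences visit all 16 patterns. The Gray sequence does so with the minimum possible 15 flips, one per step.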

Proceedings Article
01 Nov 2015
TL;DR: This paper presents the implementation of DCT and CORDIC on a novel configurable architecture ported onto a state of the art FPGA, which uses only shifts and adds to perform multiplication, thereby reducing the gate count.
Abstract: Discrete Cosine Transform (DCT) operations, used in compression algorithms, have great significance in image and signal processing applications, where the cosine computation forms an integral part. The CORDIC (COordinate Rotation DIgital Computer) algorithm provides a simple and accurate platform to compute various trigonometric, linear and non-linear functions using only shift-add operations. Owing to the inherently repetitive nature of the DCT and CORDIC functions, they lend themselves to efficient hardware implementations. This paper presents the implementation of the DCT and CORDIC on a novel configurable architecture ported onto a state-of-the-art FPGA. The proposed architecture uses only shifts and adds to perform multiplication, thereby reducing the gate count. The design takes 192 clock cycles and 336 clock cycles per image block to compute cosine using CORDIC and the DCT, respectively. The L2 norm of the hardware-reconstructed image is 15.77 at 84.37% compression on a 128×128 image, and the design computes cosine (CORDIC) with an accuracy of up to 98%.
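Multiplying by a fixed coefficient with only shifts and adds is commonly done by recoding the constant into canonical signed-digit (CSD) form. In CSD, each nonzero digit is +1 or -1 and no two nonzero digits are adjacent, which minimizes the number of adders. The Python sketch below applies this to an 8-bit fixed-point DCT coefficient, round(cos(π/16)·256). It is a generic multiplierless technique and not necessarily how the proposed architecture decomposes its constants. The 8-bit scaling is an assumption.

```python
import math

def csd_terms(constant: int) -> list[tuple[int, int]]:
    """Canonical signed-digit recoding: constant == sum(sign * 2**shift for sign, shift)."""
    terms, shift, c = [], 0, constant
    while c:
        if c & 1:
            digit = 2 - (c & 3)        # +1 when c % 4 == 1, -1 when c % 4 == 3
            terms.append((digit, shift))
            c -= digit                 # c is now a multiple of 4, so the next digit is 0
        c >>= 1
        shift += 1
    return terms

def multiply_by_constant(x: int, terms: list[tuple[int, int]]) -> int:
    """Multiply using only shifts and additions/subtractions (a multiplierless datapath)."""
    acc = 0
    for digit, shift in terms:
        acc = acc + (x << shift) if digit > 0 else acc - (x << shift)
    return acc

c = round(math.cos(math.pi / 16) * 256)    # 8-bit fixed-point DCT coefficient
terms = csd_terms(c)
print(c, terms)                            # 251 [(-1, 0), (-1, 2), (1, 8)]
print(multiply_by_constant(100, terms))    # 25100
```

In binary, 251 has seven 1 bits, which would take seven shifted terms. In CSD form it needs only three (256 - 4 - 1).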

Proceedings ArticleDOI
28 Dec 2015
TL;DR: It is shown that starting from simple RTL architecture parameters, e.g., gate count, and using performance figures taken from the ITRS, it is possible to determine the (space-time-energy) performance limits of a CMOS implementation of any logic architecture.
Abstract: In this paper, the effects of the expected physical limits of CMOS technology on the performance of small-scale systems-on-chip (SoCs) are described. The exponential progress of CMOS technology has entered its saturation phase, which could be called a third phase of Moore's law. In this third phase of development, the peak performance of SoCs is no longer the main concern. Instead, we could see an explosion of creative small-size applications: bionic appendages, smartphones with smart sensors, networks of tiny sensors, and a host of other applications we have yet to imagine. The International Technology Roadmap for Semiconductors (ITRS) lists expected performance parameters for CMOS technology up to the year 2028. In this paper, it is shown that starting from simple RTL architecture parameters, e.g., gate count, and using performance figures taken from the ITRS, we are able to determine the (space-time-energy) performance limits of a CMOS implementation of any logic architecture. As a practical example, a study of the performance limits of a novel digital receiver suitable for use in different capillary networks of Internet-of-Things applications is described. The design space exploration technique described in this paper can be used to find the performance limits of a wide range of smart object applications.
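The step from gate count to space-time-energy limits can be sketched as a first-order model. The formulas below (area scales with gate count, clock period with logic depth, energy per cycle with gate count times switching activity) are a generic assumption, not necessarily the paper's exact model, and the `TechNode` values used in the example are placeholders rather than ITRS figures; real per-node numbers would have to be taken from the roadmap tables.

```python
from dataclasses import dataclass


@dataclass
class TechNode:
    """Per-gate figures for one technology node (placeholders, not ITRS data)."""
    area_per_gate_um2: float  # layout area of one equivalent gate
    gate_delay_ps: float      # delay of one equivalent gate
    switch_energy_fj: float   # energy of one gate output transition


def performance_limits(gate_count, logic_depth, activity, tech):
    """First-order space-time-energy bounds derived from RTL-level parameters."""
    area_mm2 = gate_count * tech.area_per_gate_um2 * 1e-6                # um^2 -> mm^2
    min_period_ns = logic_depth * tech.gate_delay_ps * 1e-3             # ps -> ns
    max_freq_mhz = 1e3 / min_period_ns
    energy_per_cycle_pj = gate_count * activity * tech.switch_energy_fj * 1e-3  # fJ -> pJ
    power_mw = energy_per_cycle_pj * max_freq_mhz * 1e-3                # pJ * MHz = uW -> mW
    return {
        "area_mm2": area_mm2,
        "max_freq_mhz": max_freq_mhz,
        "energy_per_cycle_pj": energy_per_cycle_pj,
        "power_mw": power_mw,
    }


# Hypothetical node and design, for illustration only.
node = TechNode(area_per_gate_um2=1.0, gate_delay_ps=10.0, switch_energy_fj=0.1)
limits = performance_limits(gate_count=1_000_000, logic_depth=20, activity=0.1, tech=node)
```

With these placeholder values the model gives about 1 mm², 5 GHz, 10 pJ per cycle and 50 mW; substituting roadmap figures for successive years would show how the limits move as scaling saturates.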

Proceedings ArticleDOI
01 Jun 2015
TL;DR: A BDD-based synthesis technique combined with an evolutionary computation method is explored, demonstrating that this approach reduces the gate count and quantum cost at the cost of an increase in the number of lines.
Abstract: Reversible computing is an emerging and promising technique due to its wide applications in quantum, optical and DNA computing, among others. Reversible circuit synthesis is a main focus for researchers, as conventional synthesis techniques are not suitable for reversible circuits. Our work focuses on BDD-based synthesis because, unlike other reversible synthesis methods, it can realize circuits for large Boolean functions. Existing BDD-based synthesis techniques rely on positive [1] and negative [2] controlled Toffoli gates. In this work, we explore a BDD-based synthesis technique combined with an evolutionary computation method, using a gate library of Fredkin and elementary CNOT gates. Experimental results demonstrate that this approach reduces the gate count and quantum cost at the cost of an increase in the number of lines.
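One reason Fredkin gates fit BDD-based synthesis is that each BDD node is a Shannon decomposition, i.e. a 2:1 multiplexer, and a Fredkin (controlled-swap) gate realizes exactly that. The sketch below is an illustration under assumptions, not the paper's evolutionary method: it simulates a cascade of Fredkin and CNOT gates, checks that the circuit is a bijection, and reports gate count and quantum cost using the commonly quoted costs (CNOT = 1, Fredkin = 5). Copying the multiplexer output onto a fresh constant-0 line with a CNOT shows where the extra lines come from.

```python
from itertools import product

QUANTUM_COST = {"CNOT": 1, "FREDKIN": 5}  # commonly used quantum-cost figures


def apply(gate, bits):
    """Apply one reversible gate to a tuple of line values."""
    kind, lines = gate
    b = list(bits)
    if kind == "CNOT":
        control, target = lines
        b[target] ^= b[control]
    elif kind == "FREDKIN":
        control, s1, s2 = lines
        if b[control]:              # controlled swap
            b[s1], b[s2] = b[s2], b[s1]
    else:
        raise ValueError(f"unknown gate {kind}")
    return tuple(b)


def simulate(circuit, bits):
    for gate in circuit:
        bits = apply(gate, bits)
    return bits


def is_reversible(circuit, n_lines):
    """A circuit is reversible iff it maps the 2**n input patterns to distinct outputs."""
    outputs = {simulate(circuit, x) for x in product((0, 1), repeat=n_lines)}
    return len(outputs) == 2 ** n_lines


def cost(circuit):
    """(gate count, quantum cost) of a cascade."""
    return len(circuit), sum(QUANTUM_COST[kind] for kind, _ in circuit)


# Lines: 0 = select s, 1 = a, 2 = b, 3 = constant 0.
# Fredkin puts (s ? b : a) on line 1, i.e. one BDD node; CNOT copies it to line 3.
MUX_NODE = [("FREDKIN", (0, 1, 2)), ("CNOT", (1, 3))]
```

Here `cost(MUX_NODE)` is `(2, 6)` and the circuit needs four lines for a three-input function, a small-scale view of the gate-count and quantum-cost versus line-count trade-off reported above.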

Proceedings ArticleDOI
21 Jul 2015
TL;DR: This work designs and implements a distributed video decoder mainly composed of low-density parity-check accumulate (LDPCA) decoding, correlation noise modeling, soft input computation, and side information creation.
Abstract: Distributed video coding (DVC), based on the Slepian-Wolf and/or Wyner-Ziv theorems, was proposed for scenarios with lightweight encoding and heavyweight decoding. We design and implement a distributed video decoder mainly composed of low-density parity-check accumulate (LDPCA) decoding, correlation noise modeling, soft input computation, and side information creation. Our proposed DVC decoder architecture, implemented in TSMC 90nm GUTM process technology, can decode QCIF video at 30 fps. The maximum operating frequency of the chip is 100 MHz, the chip area is 4.67 mm², and the gate count is 690K.
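The real-time claim can be checked with simple arithmetic: at 100 MHz and 30 frames per second of QCIF (176×144) video, the chip has a fixed budget of clock cycles per frame and per pixel for all decoder stages combined. This is a back-of-the-envelope sketch that assumes one luma sample per pixel and counts every frame; in practice only Wyner-Ziv frames pass through LDPCA decoding, so the per-pixel figure is a conservative bound.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144


def cycle_budget(clock_hz, width, height, fps):
    """Clock cycles available per frame and per pixel for real-time decoding."""
    cycles_per_frame = clock_hz / fps
    cycles_per_pixel = cycles_per_frame / (width * height)
    return cycles_per_frame, cycles_per_pixel


per_frame, per_pixel = cycle_budget(100e6, QCIF_WIDTH, QCIF_HEIGHT, 30)
```

This gives roughly 3.33 million cycles per frame, or about 131 cycles per pixel, within which LDPCA decoding, noise modeling, soft-input computation and side-information creation must all finish.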

01 Jan 2015
TL;DR: The idea of this project was to create a microprocessor as a building block in VHDL that can later easily be included in a larger design.
Abstract: Coarse-grained arrays (CGAs) with run-time reconfigurability play an important role in accelerating reconfigurable computing applications. It is challenging to design on-chip communication networks (OCNs) for such CGAs with dynamic run-time reconfigurability while satisfying the tight power and area budgets of an embedded system. This project presents a silicon-proven design of a circuit-switched OCN fabric with a dynamic path-setup scheme capable of supporting an embedded coarse-grained processor array. The paper involves designing and simulating a RISC core processor. A Reduced Instruction Set Computer (RISC) is a microprocessor designed to execute a small set of instructions, with the aim of increasing overall processor speed while reducing area (gate count) and power consumption. The RISC architecture follows the philosophy that one instruction should be completed every cycle. This work presents the design and implementation of a 32-bit RISC soft-core processor intended as an introduction to computer architecture and considered an effective aid to understanding how computers work. The idea of this project was to create a microprocessor as a building block in VHDL that can later easily be included in a larger design. The processor receives data from a serial UART device, which can reduce the transmission cost. It will be useful in systems where a problem is easy to solve in software; however, at a high level of complexity it is easier to implement the function with reduced power consumption and area (gate count). In this project, Xilinx ISE 12.3i is used for logical verification and synthesis.
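The "one instruction per cycle" philosophy can be illustrated with a behavioral model of a single-cycle core. The four-instruction ISA below (`ADD`, `ADDI`, `BNE`, `HALT`), the 8-entry register file with `r0` hardwired to zero, and the tuple encoding are all hypothetical, not the paper's 32-bit instruction set; the point is only that each call to `step` fetches, decodes and executes exactly one instruction, so the cycle count equals the instruction count.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA (not the paper's): each instruction is (op, a, b, c).
#   ADD  ra, rb, rc   -> ra = rb + rc
#   ADDI ra, rb, imm  -> ra = rb + imm
#   BNE  ra, rb, tgt  -> if ra != rb: pc = tgt
#   HALT
MASK32 = 0xFFFFFFFF  # 32-bit datapath


@dataclass
class Cpu:
    regs: list = field(default_factory=lambda: [0] * 8)  # r0 is hardwired to zero
    pc: int = 0
    cycles: int = 0


def step(cpu, program):
    """Fetch, decode and execute exactly one instruction in one cycle."""
    op, a, b, c = program[cpu.pc]
    cpu.cycles += 1
    next_pc = cpu.pc + 1
    r = cpu.regs
    if op == "ADD":
        r[a] = (r[b] + r[c]) & MASK32
    elif op == "ADDI":
        r[a] = (r[b] + c) & MASK32
    elif op == "BNE":
        if r[a] != r[b]:
            next_pc = c
    elif op == "HALT":
        return False
    else:
        raise ValueError(f"unknown opcode {op}")
    r[0] = 0  # writes to r0 are discarded
    cpu.pc = next_pc
    return True


def run(program):
    cpu = Cpu()
    while step(cpu, program):
        pass
    return cpu


# Sum 1..5 into r1.
PROGRAM = [
    ("ADDI", 1, 0, 0),  # r1 = 0 (sum)
    ("ADDI", 2, 0, 1),  # r2 = 1 (i)
    ("ADDI", 3, 0, 6),  # r3 = 6 (loop limit)
    ("ADD", 1, 1, 2),   # sum += i
    ("ADDI", 2, 2, 1),  # i += 1
    ("BNE", 2, 3, 3),   # loop while i != 6
    ("HALT", 0, 0, 0),
]
```

Running `run(PROGRAM)` leaves 15 in `r1` after 19 cycles, one per executed instruction including `HALT`. A VHDL implementation would map the same fetch/decode/execute structure onto a program counter, instruction memory and ALU.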

Posted Content
TL;DR: In this article, the Inventive0 gate is proposed to synthesize efficient adder modules with minimum garbage outputs and gate count; it can implement a 4-bit ripple carry adder and a carry skip adder.
Abstract: A large amount of research is currently under way in the field of reversible logic, which offers low heat dissipation and low power consumption, the main factors motivating the use of reversible logic in digital VLSI circuit design. This paper introduces a reversible gate named the Inventive0 gate, which is used to synthesize efficient adder modules with minimum garbage outputs and gate count. The Inventive0 gate is capable of implementing a 4-bit ripple carry adder and a carry skip adder. It is shown that the Inventive0 gate is a much more efficient and optimized approach than existing designs in terms of gate count, garbage outputs and constant inputs. In addition, some popular reversible gates are implemented at the MOS transistor level, designed for minimum MOS transistor count, and are completely reversible in behavior, supporting precise forward and backward computation. The lower architectural complexity shows that the novel designs are compact, fast and low power.
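The three metrics used for comparison (gate count, garbage outputs, constant inputs) can be made concrete with a reversible ripple carry adder built from a well-known gate. The abstract does not define the Inventive0 gate, so the sketch below deliberately substitutes the Peres gate, P(A, B, C) = (A, A⊕B, AB⊕C), as a stand-in baseline; two Peres gates form a full adder with one constant input and two garbage outputs. It models logic behavior only, not the transistor-level design.

```python
def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def full_adder(a, b, cin):
    """Reversible full adder from two Peres gates (stand-in, not Inventive0)."""
    g1, p, ab = peres(a, b, 0)        # constant input 0; g1 = A is garbage
    g2, s, cout = peres(p, cin, ab)   # s = A^B^Cin, cout = AB ^ (A^B)Cin; g2 is garbage
    return s, cout, (g1, g2)


def ripple_carry_adder(x, y, width=4, cin=0):
    """Add two width-bit numbers; return (sum, carry_out, reversible-cost metrics)."""
    carry, total, garbage = cin, 0, []
    for i in range(width):
        s, carry, g = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        garbage.extend(g)
    metrics = {
        "gates": 2 * width,
        "constant_inputs": width,
        "garbage_outputs": len(garbage),
    }
    return total, carry, metrics
```

For the 4-bit case this baseline uses 8 gates, 4 constant inputs and 8 garbage outputs; a proposed gate is judged more efficient when it lowers these counts for the same adder.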