Showing papers on "Gate count published in 2002"


Journal ArticleDOI
TL;DR: The key ideas applied to the design of Reed-Solomon (RS) decoder blocks in these devices, especially those for achieving high throughput and reducing complexity and power, are presented.
Abstract: Two standard forward error correction (FEC) devices for 10- and 40-Gb/s optical systems are presented. The first FEC device includes RS(255, 239) FEC, BCH(4359, 4320) FEC, and standard compliant framing and performance monitoring functions. It can support a single 10-Gb/s channel or four asynchronous 2.5-Gb/s channels. The second FEC device implements RS(255, 239) FEC at a data rate of 40 Gb/s. This paper presents the key ideas applied to the design of Reed-Solomon (RS) decoder blocks in these devices, especially those for achieving high throughput and reducing complexity and power. Implemented in a 1.5-V, 0.16-µm CMOS technology, the RS decoder in the 10-Gb/s, quad 2.5-Gb/s device has a core gate count of 424 K and consumes 343 mW; the 40-Gb/s RS decoder has a core gate count of 364 K and an estimated power consumption of 360 mW. The 40-Gb/s RS FEC is the highest throughput implementation reported to date.

114 citations
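
The RS(255, 239) code named in the abstract above fixes the basic parameters any such decoder has to handle. The short sketch below works them out from the standard Reed-Solomon relations; the function name `rs_parameters` is hypothetical and does not come from the paper.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Basic parameters of an RS(n, k) code, using standard textbook relations."""
    parity = n - k          # parity symbols appended to each codeword
    t = parity // 2         # symbol errors correctable per codeword
    rate = k / n            # fraction of each codeword that carries data
    overhead = parity / k   # redundancy relative to the payload
    return {"parity": parity, "t": t, "rate": rate, "overhead": overhead}


params = rs_parameters(255, 239)
print(params["parity"], params["t"])        # 16 8
print(round(params["rate"], 4))             # 0.9373
print(round(params["overhead"] * 100, 2))   # 6.69 (percent)
```

So each 255-symbol codeword carries 16 parity symbols and can correct up to 8 symbol errors, at a redundancy of roughly 6.7% over the payload.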


Patent
Tao Pi1, Patrick J. Crotty1
10 Sep 2002
TL;DR: A low-voltage lookup table (LUT) for a field programmable gate array (FPGA), designed to operate reliably at low voltage levels, is described.
Abstract: A lookup table (LUT) for a field programmable gate array (FPGA) is designed to operate reliably at low voltage levels. The low-voltage LUT uses CMOS pass gates instead of unpaired N-channel transistors to select one memory cell output as the LUT output signal. Therefore, no voltage drop occurs across the pass gates. While this modification significantly increases the overall gate count of the LUT, this disadvantage can be mitigated by removing the half-latches required in current designs, and by removing initialization circuitry made unnecessary by the modification. Some embodiments include a decoder that decreases the number of pass gates between the memory cells and the output terminal, at the cost of an increased delay on the input paths that traverse the decoder.

108 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: The use of the gated clock approach to reduce power consumption is analyzed and compared; implementing the three gated clock strategies also leads to the design with the smallest gate count.
Abstract: In this paper the use of the gated clock approach to reduce power consumption is analyzed and compared. The approach has been implemented following three different strategies that allow it to be used efficiently under different design conditions. To verify the strength of the approach, it has been implemented during the design of a programmable interrupt controller (PIC). The results show a 2× reduction in the average power consumption through the use of the three strategies. Moreover, the results have also been compared with those obtained through an automatic implementation of one of the gated clock strategies allowed by Synopsys's power compiler. In this second case only about 25% of power consumption is saved. It is worth noting that implementation of the three gated clock strategies also leads to a design with the smallest gate count.

46 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: Two efficient architectures for digit-serial normal basis (DSNB) multipliers over GF(2^m) are presented and are compared with the existing ones in terms of gate and time complexities.
Abstract: In this article, two efficient architectures for digit-serial normal basis (DSNB) multipliers over GF(2^m) are presented. These two structures have the same gate count and time complexity. A straightforward implementation leaves gate redundancy in both of them. An algorithm which can considerably reduce the redundancy is also developed. Moreover, the proposed architectures are compared with the existing ones in terms of gate and time complexities.

35 citations


Proceedings ArticleDOI
07 Nov 2002
TL;DR: The VLSI design of a reconfigurable multimode Reed-Solomon (RS) codec for various high-speed communication systems, suitable for multi-mode systems such as xDSL and cable modem systems, is presented.
Abstract: This paper presents the VLSI design of a reconfigurable multimode Reed-Solomon (RS) codec for various high-speed communication systems. Our decoder design is based on the Euclidean algorithm such that the datapath units are regular and simple. With its ability to support a variety of (n, k, t) RS specifications (0 ≤ t ≤ 8) and (0 ≤ n ≤ 255), this RS codec design is suitable for multi-mode systems such as the xDSL and the cable modem systems. The chip operates at a clock frequency of 100 MHz and has a data processing rate of 800 Mbits/s in 0.35 µm CMOS technology at the supply voltage of 3.3 V. The total gate count is 34,647 gates and the core size is only 1,578 × 1,560 µm².

24 citations


Patent
18 Oct 2002
TL;DR: A modular Galois-field subfield-power integrated inverter-multiplier circuit that can be used to perform Galois-field division over GF(245) is described.
Abstract: A modular Galois-field subfield-power integrated inverter-multiplier circuit that may be used to perform Galois-field division over GF(245). The integrated inverter-multiplier circuit combines subfield-power and parallel multiplication and inversion operations performed therein. The circuit is modular, has a relatively low gate count, and is easily pipelined because it does not use random logic. The circuit implements mathematical calculations known as “Galois-field arithmetic” that are required for a variety of digital signaling and processing applications such as Reed-Solomon and Bose-Chaudhuri-Hochquenghem (BCH) error-correction coding systems. Galois-field division is particularly difficult, typically requiring either a great deal of time or highly complex circuits, or both. The circuit uses a unique combination of subfield and power inversion techniques to carry out multiplicative inversion. Furthermore, the circuit uniquely implements Galois-field division by carrying out the multiplicative inversion and the multiplication simultaneously and in parallel. This substantially increases computation speed. The modularity and pipelineability of the present invention also make system design easier, increase the speed, and reduce the gate count of an integrated circuit embodying the inverter-multiplier circuit.

17 citations
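
The idea of combining a power operation with inversion in a subfield can be illustrated in software. The sketch below is a sequential model only, not the parallel circuit the patent describes. It assumes a 256-element field GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (a common choice for Reed-Solomon symbols); both the field size and the polynomial are assumptions chosen for illustration, and all function names are hypothetical. It relies on the fact that b^17 always lies in the 16-element subfield, so the expensive inversion only has to be done there.

```python
POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:       # degree reached 8: reduce by the field polynomial
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def gf_inv(b: int) -> int:
    """Invert b via the subfield: b^-1 = b^16 * (b^17)^-1."""
    if b == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    b16 = gf_pow(b, 16)
    b17 = gf_mul(b16, b)            # lies in the 16-element subfield
    sub_inv = gf_pow(b17, 14)       # subfield inverse, since x^15 = 1 there
    return gf_mul(b16, sub_inv)


def gf_div(a: int, b: int) -> int:
    """Divide a by b as a multiplication by the inverse of b."""
    return gf_mul(a, gf_inv(b))
```

For example, `gf_div(gf_mul(7, 200), 200)` returns `7`, and `gf_mul(b, gf_inv(b))` is `1` for every nonzero `b`.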


Book ChapterDOI
28 Nov 2002
TL;DR: In this paper, the authors describe the efficient implementation of Maximum Distance Separable (MDS) mappings and Substitution-boxes (S-boxes) in gate-level hardware for application to Substitution Permutation Network (SPN) block cipher design.
Abstract: This paper describes the efficient implementation of Maximum Distance Separable (MDS) mappings and Substitution-boxes (S-boxes) in gate-level hardware for application to Substitution-Permutation Network (SPN) block cipher design. Different implementations of parameterized MDS mappings and S-boxes are evaluated using gate count as the space complexity measure and gate levels traversed as the time complexity measure. On this basis, a method to optimize MDS codes for hardware is introduced by considering the complexity analysis of bit parallel multipliers. We also provide a general architecture to implement any invertible S-box which has low space and time complexities. As an example, two efficient implementations of Rijndael, the Advanced Encryption Standard (AES), are considered to examine the different tradeoffs between speed and time.

15 citations


Patent
13 May 2002
TL;DR: Processors and processing methods are disclosed for high-speed comparison of two or more linear symbol or character sequences, such as biological nucleic acid or protein sequences; they suit recursive techniques such as the Smith-Waterman algorithm and are optimized for minimum gate count.
Abstract: Improved processors and processing methods are disclosed for high-speed computerized comparison analysis of two or more linear symbol or character sequences, such as biological nucleic acid sequences, protein sequences, or other long linear arrays of characters. These improved processors and processing methods, which are suitable for use with recursive analytical techniques such as the Smith-Waterman algorithm, and the like, are optimized for minimum gate count and maximum clock cycle computing efficiency. This is done by interleaving multiple linear sequence comparison operations per processor, which optimizes use of the processor's resources. In use, a plurality of such processors are embedded in high-density integrated circuit chips, and run synchronously to efficiently analyze long sequences. Such processor designs and methods exceed the performance of currently available designs, and facilitate higher dimensional sequence comparison analysis between three or more linear sequences.

11 citations
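
For readers unfamiliar with the Smith-Waterman algorithm mentioned in the abstract above, the following is a minimal software sketch of its scoring recurrence with a linear gap penalty. It does not model the patented processor or its interleaving scheme; the function name and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]              # first column is always zero
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap          # gap inserted in b
            left = curr[j - 1] + gap    # gap inserted in a
            cell = max(0, diag, up, left)   # local alignment never goes below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))        # 8
print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 6
print(smith_waterman_score("AAAA", "TTTT"))        # 0
```

Each cell depends only on its upper, left, and upper-left neighbours, which is the regular data dependency that makes the recurrence a natural fit for arrays of small processing elements.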


Proceedings ArticleDOI
12 Aug 2002
TL;DR: This work proposes a method to do area estimation that makes use of the concept of Boolean networks and introduces an invariant area complexity measure which captures the gate-count requirement of a design.
Abstract: Early power estimation requires one to estimate the area (gate count) of a design from a high-level description. We propose a method to do this that makes use of the concept of Boolean networks (BN) and introduces an invariant area complexity measure which captures the gate-count requirement of a design. The method can be adapted to be used at different points on the area/delay tradeoff curve, with different synthesizer/mapper tools, and different target gate libraries. The area model is experimentally verified and tested using a number of ISCAS and MCNC benchmark circuits and two different target cell libraries, on two different synthesis systems.

11 citations


Proceedings Article
01 Dec 2002
TL;DR: A high speed Reed-Solomon decoder chip for optical communications that features a high speed and area-efficient key equation solver using a novel inversionless decomposed architecture for Euclidean algorithm is presented.
Abstract: In this paper, a high speed Reed-Solomon (RS) decoder chip for optical communications is presented. It mainly contains one (255,239) RS decoder with 4K-bit embedded memory. Due to the operation speed limitation in I/O pad, a Delay Lock Loop (DLL) circuit is also included to generate internal high-speed clock. The RS decoder features a high speed and area-efficient key equation solver using a novel inversionless decomposed architecture for Euclidean algorithm. The test chip is implemented by 0.35µm CMOS SPQM standard cells with chip area of 2.61mm × 2.62mm. The RS decoder has the gate count of 12.4K. Test results show the proposed chip can support 2.35-Gbps data rate while operating at 294MHz with the supply voltage of 3.3V.

8 citations
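
The data rate and clock frequency quoted in the abstract above are consistent with the decoder handling one 8-bit symbol per clock cycle. That per-cycle figure is an inference from the numbers, not a statement in the abstract; the helper name below is hypothetical.

```python
def throughput_gbps(clock_mhz: float, bits_per_cycle: int) -> float:
    """Data rate in Gbit/s for a datapath moving bits_per_cycle bits each clock."""
    return clock_mhz * 1e6 * bits_per_cycle / 1e9


# Assumption: one 8-bit RS symbol is processed per clock cycle.
rate = throughput_gbps(294, 8)
print(round(rate, 3))  # 2.352
```

A result of about 2.352 Gbit/s matches the reported 2.35-Gbps figure at 294 MHz.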


Proceedings ArticleDOI
07 Aug 2002
TL;DR: This paper proposes an embedded DSP core for communication applications, targeting demodulation/synchronization operations, that contains distinctive instructions and special function blocks such as a dual MAC, a sub-word multiplier, a dedicated FIR filter and a multi-level slicer.
Abstract: This paper proposes an embedded DSP core for communication applications, targeting demodulation/synchronization operations. Besides providing a basic instruction set, similar to current 16-bit DSP processors, it contains distinctive instructions and special function blocks such as a dual MAC, a sub-word multiplier, a dedicated FIR filter and a multi-level slicer, which make this DSP processor more efficient for several communication tasks. Also, the entire architecture is parameterized such that it can be embedded in a variety of applications. In the design of the chip, we adopt Gray-coded addressing to lower switching activity and pipeline register sharing to reduce the number of pipeline registers, and the architecture as a whole is designed to reduce power dissipation. The DSP chip is implemented in synthesizable Verilog code with the TSMC 0.35 µm SPQM cell library. The equivalent gate count of the core without memory is about 50 k. The chip area is 4.10 mm × 4.10 mm (with on-chip memory).
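
The benefit of Gray-coded addressing for switching activity can be seen by counting bit toggles over a run of sequential addresses. The sketch below is a generic illustration, not the paper's implementation; the function names and the 16-address range are assumptions.

```python
def to_gray(n: int) -> int:
    """Convert a binary number to its reflected Gray code."""
    return n ^ (n >> 1)


def count_toggles(values: list[int]) -> int:
    """Total number of bit flips between consecutive values."""
    return sum(bin(x ^ y).count("1") for x, y in zip(values, values[1:]))


addresses = list(range(16))   # sequential 4-bit addresses 0..15
binary_toggles = count_toggles(addresses)
gray_toggles = count_toggles([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)  # 26 15
```

With Gray coding every address increment flips exactly one bit, so stepping through 16 addresses costs 15 toggles instead of 26 with plain binary counting.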

Proceedings ArticleDOI
07 Aug 2002
TL;DR: The synthesis results show that the MME module is capable of reducing the execution cycle counts by 69% to 81% for several multimedia critical loops, with an average speedup of 3.41, while the gate count overhead is kept under 11%.
Abstract: This paper presents a cost-effective multimedia extension to the ARM7 architecture (v4T), a low-cost embedded microprocessor. A multimedia extension (MME) module has been constructed as a functional unit for the ARM7 processor core. It provides common media operations such as saturating addition and multiply-accumulation, etc. All operations of the MME module perform subword parallelism on byte or half-word entities. The SIMD-styled operations allow a higher processing throughput. The module has been successfully integrated with a synthesizable ARM7 core. The synthesis results show that the MME module is capable of reducing the execution cycle counts by 69% to 81% for several multimedia critical loops, with an average speedup of 3.41, while the gate count overhead is kept under 11%.
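
The reported cycle-count reductions can be converted into per-loop speedups with a one-line relation. The sketch below assumes the clock frequency is unchanged by the extension, so that speedup equals the ratio of cycle counts; the function name is hypothetical.

```python
def speedup_from_cycle_reduction(reduction: float) -> float:
    """Speedup when the cycle count drops by the given fraction (same clock assumed)."""
    return 1.0 / (1.0 - reduction)


low = speedup_from_cycle_reduction(0.69)    # about 3.23
high = speedup_from_cycle_reduction(0.81)   # about 5.26
print(round(low, 2), round(high, 2))
```

Under that assumption, reductions of 69% to 81% correspond to speedups of roughly 3.2 to 5.3 on the individual loops, a range that contains the reported average of 3.41.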

Journal Article
TL;DR: Simulation results show that the proposed ADC BIST scheme can detect not only catastrophic faults but also some parametric faults, and the total gate count of the proposed BIST circuit is about 150.
Abstract: As integrated circuit fabrication techniques advance, a complex system can be integrated on a single chip: namely, a system-on-a-chip (SOC). A SOC consists of many intellectual property (IP) building blocks, including analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), which should provide a built-in self-test (BIST) scheme to minimize the testing cost. Due to the analog nature of ADCs and DACs, digital BIST schemes are not applicable. This paper proposes a simple ADC BIST scheme based on a ramp test. The proposed BIST scheme is verified by simulation with a 6-bit pipelined ADC. Simulation results show that the proposed ADC BIST scheme can detect not only catastrophic faults but also some parametric faults. The total gate count of the proposed BIST circuit is about 150.

Patent
17 Jul 2002
TL;DR: A FIR filter for a Gigabit transceiver is described in which data words are represented in three bits: SIGN representing word sign, SHIFT representing requirement for a shift operation, and ZERO indicating whether the word is zero.
Abstract: A FIR filter in a Gigabit transceiver in which data words are represented in three bits: SIGN representing word sign, SHIFT representing requirement for a shift operation, and ZERO indicating whether the word is zero. An AND gate ANDs an input coefficient and the ZERO bit, an XOR gate XORs the SIGN bit and the output of the AND gate, and a multiplier left-shifts the coefficient using the SHIFT bit and the output of the XOR gate. The circuit has a very low gate count.
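
The effect of the three-bit representation can be modelled behaviourally: a full multiplier is replaced by gating, an optional one-bit shift, and an optional negation. The sketch below is a software model only, not the gate-level circuit of the patent. It assumes the data words take the five values -2, -1, 0, 1, 2 and that ZERO is an active-high flag; both are assumptions, as are the function names.

```python
def encode_symbol(symbol: int) -> tuple[int, int, int]:
    """Encode a symbol in {-2, -1, 0, 1, 2} as (SIGN, SHIFT, ZERO) bits (assumed mapping)."""
    if symbol not in (-2, -1, 0, 1, 2):
        raise ValueError("symbol outside the assumed five-level alphabet")
    sign = 1 if symbol < 0 else 0
    shift = 1 if abs(symbol) == 2 else 0
    zero = 1 if symbol == 0 else 0
    return sign, shift, zero


def multiply_by_symbol(coeff: int, sign: int, shift: int, zero: int) -> int:
    """Compute coeff * symbol using only gating, a shift and a negation."""
    gated = 0 if zero else coeff         # gating step: a zero word forces a zero product
    scaled = gated << shift              # SHIFT doubles the coefficient when set
    return -scaled if sign else scaled   # SIGN selects the negated result


def fir_output(coeffs: list[int], symbols: list[int]) -> int:
    """One FIR output sample: sum of coefficient-times-symbol products."""
    return sum(multiply_by_symbol(c, *encode_symbol(s))
               for c, s in zip(coeffs, symbols))


print(fir_output([3, -5, 7], [2, -1, 0]))  # 11
```

Because every product reduces to a pass, a one-bit shift, or a negation, no general multiplier is needed per tap, which is in line with the low gate count the abstract claims.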

Proceedings ArticleDOI
07 Aug 2002
TL;DR: Adaptive sample rate converters for the multi-standard mobile transceiver have been designed and the complexity analysis includes computational complexity, gate count, area and power consumption estimation for the chosen platform.
Abstract: Adaptive sample rate converters (SRC) for the multi-standard mobile transceiver have been designed. GSM, UMTS and HIPERLAN2 standards have been chosen to establish the requirements for the SRCs. In the transmitter, the output signal of the data modulator is sampled at the symbol or chip frequency specified by the standards and needs to be converted into a common updating frequency of the D/A-converter. Pulse shaping and emission mask requirements are taken into account. In the receiver, A/D-conversion is also performed at a common fixed clock rate. Thus the SRC is needed to convert the A/D-converter sampling rate into a symbol or chip frequency that will be applied for further baseband signal processing. Channelization requirements defined by the standards are used as criteria for the performance analysis in the receiver. Based on the performance analysis, the complexity analysis is covered. It includes computational complexity, gate count, area and power consumption estimation for the chosen platform.

Proceedings ArticleDOI
A. Peczalski1
09 Mar 2002
TL;DR: Military and aerospace systems increasingly require miniaturization and low power, which can only be achieved by mixed-mode combinations of RF, analog and digital circuits on the same chip; Silicon-on-Insulator (SOI) on a high-resistivity substrate is presented as a technology that can meet these requirements.
Abstract: Increasing numbers of military and aerospace systems require miniaturization and low power, which can only be achieved by employing mixed-mode combinations of RF, analog and digital circuits on the same chip. GPS receivers are an excellent example of a subsystem required to fit in small munitions or a soldier's watch. At the same time, the requirements for jamming and spoofing resistance and encryption decoding increase the gate count to 20-30 Mgates. The analog section, similar in complexity, may require a 16-bit analog-to-digital converter for digital beam forming. The demanding RF front-end section includes low-noise amplifiers with a noise figure below 2 dB and an associated gain of 30-40 dB. Such extreme mixed-mode requirements can only be met with a specialized technology such as Silicon-on-Insulator (SOI) on a high-resistivity substrate. Such a substrate provides excellent isolation and 10 dB lower noise than bulk CMOS.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A new effective silicon implementation of a Reed Solomon engine is presented and results in terms of gate count, throughput and latency show a competitive advantage when compared to existing Reed Solomon engines, as well as allowing a wider programmability.
Abstract: A new effective silicon implementation of a Reed Solomon engine is presented. By a further optimization of the modified Berlekamp Massey algorithm presented by Jeng and Truong (1999), the number of Galois Field (GF) multipliers involved in the calculation of the errata locator polynomial can be shown to be a linear function of the number of parity symbols. The use of a circular structure in the calculation of the discrepancy makes the calculation itself independent of the number of iterations involved in the algorithm. New variables are introduced in the error magnitude calculation in order to use hardware resources already present, thus minimizing the number of logic gates. Along with the codeword length and the number of parity bytes, programmability involves GF primitive polynomials and the code generator polynomial. Results in terms of gate count, throughput and latency show a competitive advantage when compared to existing Reed Solomon engines, as well as allowing a wider programmability.