Showing papers on "Gate count published in 2000"


Journal ArticleDOI
TL;DR: This design combines the techniques of fast direct two-dimensional DCT algorithm, the bit level adder-based distributed arithmetic, and common subexpression sharing to reduce the hardware cost and enhance the computing speed.
Abstract: This paper presents a cost-effective processor core design that features the simplest hardware and is suitable for discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) operations in H.263 and digital cameras. This design combines the techniques of fast direct two-dimensional DCT algorithm, the bit level adder-based distributed arithmetic, and common subexpression sharing to reduce the hardware cost and enhance the computing speed. The resulting architecture is very simple and regular such that it can be easily scaled for higher throughput rate requirements. The DCT design has been implemented in 0.6 μm SPDM CMOS technology and costs only 1493 gates, or 0.78 mm². The proposed design can meet real-time DCT/IDCT requirements of the H.263 codec system for QCIF image frame size at 10 frames/s with 4:2:0 color format. Moreover, the proposed design still possesses additional computing power for other operations when operating at 33 MHz.

87 citations
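
A rough sense of the real-time requirement quoted in the abstract above can be had from a back-of-the-envelope block count. The sketch below is not taken from the paper: it assumes the standard QCIF luma resolution of 176x144, 8x8 transform blocks, 4:2:0 chroma subsampling (each chroma plane halved in both dimensions), and one transform per block per frame. All variable names are illustrative.

```python
# Back-of-the-envelope workload for QCIF, 4:2:0, 10 frames/s at a 33 MHz clock.
# Assumption: one 8x8 transform per block per frame (an encoder that also
# reconstructs frames would need roughly twice as many transforms).
QCIF_WIDTH, QCIF_HEIGHT = 176, 144
BLOCK = 8
FPS = 10
CLOCK_HZ = 33_000_000


def blocks(width, height, block=BLOCK):
    """Number of block x block tiles covering a width x height plane."""
    return (width // block) * (height // block)


luma_blocks = blocks(QCIF_WIDTH, QCIF_HEIGHT)
# Two chroma planes, each subsampled by 2 horizontally and vertically.
chroma_blocks = 2 * blocks(QCIF_WIDTH // 2, QCIF_HEIGHT // 2)
blocks_per_second = (luma_blocks + chroma_blocks) * FPS
cycles_per_block = CLOCK_HZ / blocks_per_second

print(luma_blocks, chroma_blocks, blocks_per_second, round(cycles_per_block))
```

Under these assumptions the clock budget is several thousand cycles per block, which is consistent with the abstract's remark that computing power is left over at 33 MHz.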


Posted Content
TL;DR: This paper describes various techniques to reduce the number of logic gates needed to implement the DES S-boxes in bitslice software, and achieves an improvement over the previous best result.
Abstract: This paper describes various techniques to reduce the number of logic gates needed to implement the DES S-boxes in bitslice software. Using standard logic gates, an average of 56 gates per S-box was achieved, while an average of 51 was produced when non-standard gates were utilized. This is an improvement over the previous best result, which used an average of 61 non-standard gates.

38 citations
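
Why gate count matters for the bitslice entry above: in bitslice software each logic gate becomes one bitwise instruction applied to a whole machine word, with bit i of every word belonging to an independent instance i. Fewer gates therefore means fewer instructions per batch. The sketch below illustrates the idea only; `toy_function` is a hypothetical two-gate Boolean function, not one of the DES S-boxes, and the helper names are assumptions made for this example.

```python
# Bitslice evaluation sketch: one bitwise operation acts as one logic gate
# applied to many independent instances at once.
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def toy_function(a, b, c):
    """Hypothetical 3-input function (a AND b) XOR c, costing two gates."""
    return ((a & b) ^ c) & MASK


def pack(bits):
    """Pack 0/1 values so that instance i occupies bit i of the word."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


def unpack(word, count):
    """Recover the per-instance output bits from a packed word."""
    return [(word >> i) & 1 for i in range(count)]


# Eight instances covering every combination of the three inputs.
inputs = [((i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(8)]
a = pack([x for x, _, _ in inputs])
b = pack([y for _, y, _ in inputs])
c = pack([z for _, _, z in inputs])

bitsliced = unpack(toy_function(a, b, c), len(inputs))
one_at_a_time = [(x & y) ^ z for x, y, z in inputs]
print(bitsliced == one_at_a_time)
```

Both paths compute the same outputs, but the bitsliced path uses two word-wide operations for all instances instead of two operations per instance.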


Patent
Paul L. Opsahl
22 Feb 2000
TL;DR: In this article, a polynomial expansion of the z-transform characterization of an nth order differentiator component's output is used to implement a differentiator having reduced gates, where the connection of the inputs and outputs is dependent on the expansion.
Abstract: A polynomial expansion of the z-transform characterization of an nth order differentiator component's output is utilized to implement a differentiator having reduced gates. The differentiator component comprises at least one adder and a plurality of latches, both having inputs and outputs. The connection of the inputs and outputs is dependent on a polynomial expansion of the z-transform characterization of the differentiator component's output. A method of reducing gates in an nth order differentiator component includes characterizing the differentiator component's output by a z-transform. A polynomial expansion of the z-transform characterization is used to implement a differentiator. A differentiator that is implemented based on a polynomial expansion utilizes fewer gates to achieve the same mathematical function.

13 citations
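
The mathematical identity behind the differentiator entry above can be checked numerically. An nth order differentiator built as a cascade of first differences has transfer function (1 - z^-1)^n, and its binomial expansion gives a single set of tap coefficients. The sketch below only demonstrates that the two forms produce the same output; it does not model the patent's adder and latch wiring, and the function names are assumptions made for this example.

```python
from math import comb


def cascaded_diff(samples, order):
    """Apply the first difference y[t] = x[t] - x[t-1] `order` times,
    starting each stage from a zero initial state."""
    current = list(samples)
    for _ in range(order):
        previous = 0
        stage_output = []
        for value in current:
            stage_output.append(value - previous)
            previous = value
        current = stage_output
    return current


def expansion_coefficients(order):
    """Coefficients of (1 - z^-1)^order, lowest delay first."""
    return [(-1) ** k * comb(order, k) for k in range(order + 1)]


def expanded_diff(samples, order):
    """Evaluate the expanded polynomial directly as one tapped delay line."""
    coeffs = expansion_coefficients(order)
    return [
        sum(c * samples[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
        for t in range(len(samples))
    ]


signal = [1, 4, 9, 16, 25, 36, 49]
print(expansion_coefficients(3))
print(cascaded_diff(signal, 3) == expanded_diff(signal, 3))
```

Whether the expanded form actually saves gates depends on the hardware mapping described in the patent, which this numerical check does not capture.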


Journal ArticleDOI
TL;DR: An automated design validation scheme for gate-level combinational and sequential circuits that borrows methods from simulation and test generation for physical faults, and verifies a circuit with respect to a modeled set of design errors is investigated.
Abstract: We investigate an automated design validation scheme for gate-level combinational and sequential circuits that borrows methods from simulation and test generation for physical faults, and verifies a circuit with respect to a modeled set of design errors. The error models used in prior research are examined and reduced to five types: gate substitution errors (GSEs), gate count errors (GCEs), input count errors (ICEs), wrong input errors (WIEs), and latch count errors (LCEs). Conditions are derived for a gate to be testable for GSEs, which lead to small, complete test sets for GSEs; near-minimal test sets are also derived for GCEs. We analyze undetectability in design errors and relate it to single stuck-line (SSL) redundancy. We show how to map all the foregoing error types into SSL faults, and describe an extensive set of experiments to evaluate the proposed method. These experiments demonstrate that high coverage of the modeled errors can be achieved with small test sets obtained with standard test generation and simulation tools for physical faults.

12 citations


Dissertation
01 Jan 2000
TL;DR: The purpose of this thesis is to develop this framework for orientation and photogrammetry systems for FPGAs by tailoring the algorithms, architectures, and precisions to fit into an FPGA.
Abstract: There is great demand today for real-time computer vision systems, with applications including image enhancement, target detection and surveillance, autonomous navigation, and scene reconstruction. These operations generally require extensive computing power; when multiple conventional processors and custom gate arrays are inappropriate, due to either excessive cost or risk, a class of devices known as Field-Programmable Gate Arrays (FPGAs) can be employed. FPGAs offer the flexibility of a programmable solution and nearly the performance of a custom gate array. When implementing a custom algorithm in an FPGA, one must be more efficient than with a gate array technology. By tailoring the algorithms, architectures, and precisions, the gate count of an algorithm may be sufficiently reduced to fit into an FPGA. The challenge is to perform this customization of the algorithm, while still maintaining the required performance. The techniques required to perform algorithmic optimization for FPGAs are scattered across many fields; what is currently lacking is a framework for utilizing all these well known and developing techniques. The purpose of this thesis is to develop this framework for orientation and photogrammetry systems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

10 citations


Journal ArticleDOI
TL;DR: A new very large scale integration architecture for cost-effective morphological filters and its design and chip implementation is presented and it can reduce the hardware cost by using a feedback loop path and a decoder/encoder pair comparator.
Abstract: This paper proposes a new very large scale integration architecture for cost-effective morphological filters and presents its design and chip implementation. The proposed architecture can reduce the hardware cost by using a feedback loop path and a decoder/encoder pair comparator. The feedback loop path can reuse partial results to reduce the number of add/subtract units. The decoder/encoder pair comparator using a modified decoding function can reduce the gate count and propagation delay, especially when the size of morphological operations increases. We used the 0.8-μm SOG cell library (KG60 K) and the total number of gates is only 2667. The proposed morphological filter chip has been fabricated and runs at 30 MHz, which meets the real-time image processing requirement of the ITU-R BT.601 standard.

8 citations


Journal ArticleDOI
01 Nov 2000
TL;DR: The authors describe the application of Dong's Code to the implementation of a checkbit prediction scheme for concurrent error detection (CED) in VLSI processors, and show a reduction in the gate count required for checkbit prediction hardware for the ALU of 27% and 11% relative to Berger Code and Bose-Lin Code, respectively.
Abstract: The authors describe the application of Dong's Code to the implementation of a checkbit prediction scheme for concurrent error detection (CED) in VLSI processors. Checkbit prediction is the only method which will permit the detection of both data transfer and data processing errors. Dong's Code has the advantage that its error detection capability is a function of the number of checkbits used, independent of the number of databits being processed; that is, the error detection capability of the code can be made application specific. The applicability of the scheme for implementing a CED test strategy in VLSI circuits is demonstrated by integrating this test method into a 32 bit RISC processor. The impact of the test scheme on the design is subsequently analysed in terms of area overheads and effect on performance. A comparison is made with two self-testing ALUs, one using Berger Code and the other Bose-Lin Code; Dong's Code shows a reduction in the gate count required for checkbit prediction hardware for the ALU of 27% and 11%, respectively. When Dong's Code was used for CED in the 32 bit RISC processor, the area overhead incurred amounted to 55.5%, which is much less than duplication.

6 citations


Patent
31 Jan 2000
TL;DR: In this article, a block-oriented pixel filtering method was proposed to reduce the number of hardware multipliers required for an image processing operation by increasing the speed of the pixel filter and rearranging the math operations.
Abstract: A method and apparatus for block-oriented pixel filtering reduces the number of hardware multipliers required for an image processing operation by increasing the speed of the pixel filter and rearranging the math operations. A sorter is employed in the line buffers so that defined groups of input pixel components are provided to the multipliers of the pixel filter. An accumulator is employed to receive products from the multipliers and assemble output pixels. The savings in gate count from reducing the number of multipliers is greater than additional costs, if any, of the sorter and other logic. The method and apparatus of the invention also simplify the addressing logic for the provision of scaling coefficients during an image processing operation.

4 citations


Proceedings ArticleDOI
01 Dec 2000
TL;DR: The Reed-Solomon decoder features an area-efficient key equation solver using a novel decomposed Euclidean algorithm that can run up to 87MHz.
Abstract: A (204, 188) Reed-Solomon decoder for DVB application is presented. The RS decoder features an area-efficient key equation solver using a novel decomposed Euclidean algorithm. We implement the RS decoder using 0.35 μm CMOS IP4M standard cells, where the total gate count is about 16K to 17K. Test results show that the RS decoder chip can run up to 87 MHz.

2 citations


Proceedings ArticleDOI
01 Jan 2000
TL;DR: A survey of tools and techniques used for verification of complex multi-million gate application specific integrated circuits (ASICs) that are used for designing communications networks such as switch/routers is presented.
Abstract: In this paper, we present a survey of tools and techniques used for verification of complex multi-million gate application specific integrated circuits (ASICs). We are particularly interested in the verification of ASICs that are used for designing communications networks such as switch/routers. With the tremendous increase in gate count, verification of ASICs has become a major challenge and one of the greatest design concerns. A new VLSI design methodology that takes into account verification issues in the early phase of design and utilizes state-of-the-art techniques has become a necessity.

2 citations


Book ChapterDOI
01 Jan 2000
TL;DR: The HTMT system concept and the requirements it places on the cryogenic processing unit, including the feasibility and path to high gate count, high clock rate SCE chips at an integration level of >100 kgates/cm² for the processors, cryogenic RAM, and inter-processor network are addressed.
Abstract: A Petaflops computer represents a thousand-fold improvement over today’s largest massively-parallel-processor machines, which are susceptible to fundamental time-of-flight and power dissipation limits. Ultra-low power and ultra-high speed single-flux-quantum electronics is an enabling technology solution for near-term petaflops computing. The proposed Hybrid Technology Multi-Threaded (HTMT) petaflops-scale computer architecture includes thousands of superconductor computational modules operating at 100 GHz with an I/O throughput of 40 Petabit/s. This presents challenges in integration level of superconductor ICs, RAM size and access time, chip-to-chip and out-of-dewar I/O, modular packaging, power supply, and power dissipation. The HTMT system concept and the requirements it places on the cryogenic processing unit are described. The feasibility and path to high gate count, high clock rate SCE chips at an integration level of >100 kgates/cm² for the processors, cryogenic RAM, and inter-processor network are addressed. Compact, optimized system-level packaging is necessary to achieve the computational density and interconnect bandwidth. Modular packaging and automated circuit testing are required manufacturing costs. Critical technology challenges that exist for packaging, testing, and the I/O datalink are discussed.

Proceedings ArticleDOI
01 Jan 2000
TL;DR: In this paper, the feasibility of bi-directional gating in a semiconductor optical amplifier was investigated and two 10 Gb/s signals were simultaneously gated in an SOA with less than 2.5 dB of penalty.
Abstract: Increase in optical transmission capacity pushes the throughput requirements of optical cross-connects to be used in all-optical networks. A method, based on the connection symmetry of the network, which reduces the gate count in an OXC has previously been proposed. In order to obtain the gate reduction it is required that the switching elements in the OXC can operate in a bi-directional configuration. Semiconductor optical amplifiers are key elements for fast loss-less optical switching systems. In this paper we report the feasibility of bi-directional gating in a semiconductor optical amplifier. Two 10 Gb/s signals were simultaneously gated bi-directionally in an SOA with less than 2.5 dB of penalty.