Topic

Gate count

About: Gate count is a research topic. Over its lifetime, 1,020 publications have been published on this topic, receiving 13,535 citations.


Papers
Journal Article
TL;DR: A new VLSI architecture design for an H.264/AVC CABAC decoder is presented, which optimizes both decode decision and decode bypass engines for high throughput, and improves context model allocation for efficient external memory access.
Abstract: Context-based adaptive binary arithmetic coding (CABAC) is the major entropy-coding algorithm employed in H.264/AVC. In this paper, we present a new VLSI architecture design for an H.264/AVC CABAC decoder, which optimizes both the decode decision and decode bypass engines for high throughput, and improves context model allocation for efficient external memory access. Based on the fact that the most probable symbol (MPS) branch is much simpler than the least probable symbol (LPS) branch, a newly organized decode decision engine consisting of two serially concatenated MPS branches and one LPS branch is proposed to achieve better parallelism at lower timing path cost. A look-ahead context index (ctxIdx) calculation mechanism is designed to provide the context model for the second MPS branch. A head-zero detector is proposed to improve the performance of the decode bypass engine according to UEGk encoding features. In addition, to lower the frequency of memory access, we reorganize the context models in external memory and use three circular buffers to cache the context models, neighboring information, and bit stream, respectively. A pre-fetching mechanism with a prediction scheme is adopted to load the corresponding content into a circular buffer to hide external memory latency. Experimental results show that our design can operate at 250 MHz with a 20.71k gate count in SMIC18 silicon technology, and that it achieves an average data decoding rate of 1.5 bins/cycle.
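The abstract does not spell out how the head-zero detector works. One plausible reading, offered here only as an assumption, is that it locates the first 0 bin that terminates the run of 1s in a k-th order Exp-Golomb (EGk) suffix, so the suffix length is known at once instead of being discovered bin by bin. The Python sketch below illustrates that idea in software; the function names `egk_encode` and `egk_decode` are hypothetical and the code is not taken from the paper.

```python
def egk_encode(value: int, k: int) -> list[int]:
    """Encode a non-negative integer as a k-th order Exp-Golomb bin string."""
    bins = []
    # Unary-style prefix: emit a 1 and grow k while the value does not fit.
    while value >= (1 << k):
        bins.append(1)
        value -= 1 << k
        k += 1
    bins.append(0)  # terminating zero of the prefix
    # Fixed-length remainder, most significant bit first.
    for i in range(k - 1, -1, -1):
        bins.append((value >> i) & 1)
    return bins


def egk_decode(bins: list[int], k: int) -> tuple[int, int]:
    """Decode one EGk value; return (value, number of bins consumed)."""
    # "Head-zero detection" (assumed meaning): find the first 0 in one step.
    n = bins.index(0)
    # n leading ones contribute 2^k + 2^(k+1) + ... + 2^(k+n-1).
    base = ((1 << n) - 1) << k
    width = k + n          # remainder length is known once n is known
    start = n + 1
    remainder = 0
    for b in bins[start:start + width]:
        remainder = (remainder << 1) | b
    return base + remainder, start + width


if __name__ == "__main__":
    code = egk_encode(5, 0)
    print(code, egk_decode(code, 0))
```

In hardware, the `bins.index(0)` step would correspond to a priority encoder over a window of buffered bypass bins; the sequential loop in a naive decoder is what such a detector would avoid.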

1 citation

Proceedings Article
01 Dec 2006
TL;DR: The implementation results show that the proposed subword parallel architecture saves 48% of hardware cost in terms of gate count for morphological operations and more than 99% of hardware cost for CCL compared with other works, while retaining comparable programmability and processing speed.
Abstract: Connected component labeling (CCL) and morphological operations are two widely used techniques in vision automation and pattern analysis. Several hardware architectures have been proposed for these two operations in the literature. However, most of them have two drawbacks: inefficient hardware cost and poor bus bandwidth utilization. This paper applies subword-level parallelism to the design of hardware architectures for these two techniques. The implementation results show that the proposed subword parallel architecture saves 48% of hardware cost in terms of gate count for morphological operations and more than 99% of hardware cost in terms of gate count for CCL compared with other works, while retaining comparable programmability and processing speed. In addition, this architecture achieves better bus bandwidth utilization than other works.
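As a rough software analogue of subword-level parallelism for binary morphology, the sketch below packs one image row into a single integer so that every pixel is processed by the same shift and bitwise operation at once. It is an illustration under stated assumptions, not the architecture from the paper: the 1x3 structuring element, the zero (background) border, and the function names `erode_row_3` and `dilate_row_3` are all chosen here for the example.

```python
def erode_row_3(word: int, width: int) -> int:
    """Horizontal binary erosion of a packed row with a 1x3 structuring element.

    Each bit of `word` is one pixel; pixels outside the row count as 0.
    """
    mask = (1 << width) - 1
    left = (word << 1) & mask   # each position sees its lower-bit neighbor
    right = word >> 1           # each position sees its higher-bit neighbor
    # A pixel survives only if it and both neighbors are set.
    return word & left & right


def dilate_row_3(word: int, width: int) -> int:
    """Horizontal binary dilation of a packed row with a 1x3 structuring element."""
    mask = (1 << width) - 1
    left = (word << 1) & mask
    right = word >> 1
    # A pixel is set if it or either neighbor is set.
    return word | left | right


if __name__ == "__main__":
    row = 0b0111110
    print(format(erode_row_3(row, 7), "07b"))         # 0011100
    print(format(dilate_row_3(0b0001000, 7), "07b"))  # 0011100
```

The same three operations handle a row of any width, which is the appeal of the approach: the work per row does not grow with a per-pixel loop.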

1 citation

Proceedings Article
03 Aug 2010
TL;DR: In this paper, a fully digital frequency-tracking clock and data recovery circuit is presented. It is implemented in a commercial USB 2.0 transceiver platform and adopts a new clock phase selection feedback approach rather than the conventional feed-forward blind oversampling methodology, allowing for frequency-tracking capability.
Abstract: In this paper, a fully digital frequency-tracking clock and data recovery circuit is presented. The circuit is implemented in a commercial USB 2.0 transceiver platform, and adopts a new clock phase selection feedback approach rather than the conventional feed-forward blind oversampling methodology. This allows for frequency-tracking capability. In spite of the feedback loop used, the circuit is implemented in an entirely digital design flow, allowing for a minimized gate count and power consumption. Formal analysis of the circuit paths allows for ease-of-reuse and retargeting, and is also presented in detail.

1 citation

Proceedings Article
09 Jul 2007
TL;DR: An efficient SIMD architecture with parallel memory is proposed for 2D cosine transforms of multiple video standards, and application-specific instructions are presented to accelerate the transform kernels, such as butterfly and rotate operations with scaling, rounding, and clipping.
Abstract: This paper proposes an efficient SIMD architecture with parallel memory for 2D cosine transforms of multiple video standards. A novel parallel memory scheme is employed to provide conflict-free parallel access in both horizontal and vertical directions with the successive or even/odd mode, as well as to eliminate data permutation and matrix transposition. Furthermore, application-specific instructions are presented to accelerate the transform kernels, such as butterfly and rotate operations with scaling, rounding, and clipping. The simulation results show that the proposed architecture achieves significant performance improvement at a low hardware cost of 3.2 K equivalent gate count for the parallel memory subsystem (not including SRAMs) and 19.8 K for the arithmetic units at 250 MHz in a 0.18 µm process.

1 citation

Book Chapter
01 Jan 2021
TL;DR: In this paper, the authors propose a highly parallel two-dimensional (2D) HEVC transform hardware architecture, implemented in 32-nm VLSI technology, which allows very high-resolution and high-frame-rate video coding by way of very fast transform operations.
Abstract: This paper proposes a highly parallel two-dimensional (2D) HEVC transform hardware architecture, implemented in 32-nm VLSI technology. The design allows very high-resolution and high-frame-rate video coding by way of very fast HEVC transform operations. It is based on a split architecture, where each transform type and size is separated into its own core, which enables pixel-level parallelism in the 2D parallel and folded structures. This work also implements the full specification of the HEVC transform for both the DCT and DST transforms, with performance, power, and area analyses for the two structures. Results show a very significant speedup over existing unified architectures, with only a relatively modest increase in total gate count. The design is suitable for applications that require very high video resolution and frame rate.

Network Information
Related Topics (5)
CMOS: 81.3K papers, 1.1M citations, 84% related
Electronic circuit: 114.2K papers, 971.5K citations, 81% related
Integrated circuit: 82.7K papers, 1M citations, 80% related
Transistor: 138K papers, 1.4M citations, 79% related
Decoding methods: 65.7K papers, 900K citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    6
2022    19
2021    51
2020    47
2019    38
2018    47