Topic

Gate count

About: Gate count is a research topic. Over its lifetime, 1,020 publications have been published within this topic, receiving 13,535 citations.


Papers
Posted Content
14 Feb 2023
TL;DR: The authors propose an algorithm for synthesizing a sequence of $\pi/4$ Pauli rotations with a minimal number of Hadamard gates and, based on this result, present an algorithm that optimally minimizes the number of Hadamard gates lying between the first and the last $T$ gate of the circuit.
Abstract: The Clifford$+T$ gate set is commonly used to perform universal quantum computation. In such a setup, the $T$ gate is typically much more expensive to implement in a fault-tolerant way than Clifford gates. To improve the feasibility of fault-tolerant quantum computing, it is therefore crucial to minimize the number of $T$ gates. Many algorithms, yielding effective results, have been designed to address this problem. It has been demonstrated that a pre-processing step that reduces the number of Hadamard gates in the circuit can help to exploit the full potential of these algorithms and thereby lead to a substantial $T$-count reduction. Moreover, minimizing the number of Hadamard gates also limits the number of additional qubits and operations resulting from the gadgetization of Hadamard gates, a procedure used by some compilers to further reduce the number of $T$ gates. In this work we tackle the Hadamard gate reduction problem and propose an algorithm for synthesizing a sequence of $\pi/4$ Pauli rotations with a minimal number of Hadamard gates. Based on this result, we present an algorithm which optimally minimizes the number of Hadamard gates lying between the first and the last $T$ gate of the circuit.
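The quantities this abstract refers to ($T$-count, Hadamard count, and the number of Hadamard gates lying between the first and the last $T$ gate) can be illustrated with a minimal sketch. It only measures these counts on a given circuit; it is not the synthesis algorithm of the paper. The circuit format (a list of `(name, qubits)` tuples in application order), the gate names `"T"`, `"Tdg"`, `"H"`, `"CNOT"`, and all function names are assumptions made for illustration, and "between" is taken here simply as position in the gate list.

```python
from typing import List, Tuple

# Hypothetical circuit format: (gate name, qubit indices), in application order.
Gate = Tuple[str, Tuple[int, ...]]

T_GATES = ("T", "Tdg")  # assumed names for T and T-dagger


def t_count(circuit: List[Gate]) -> int:
    """Number of T and T-dagger gates in the circuit."""
    return sum(1 for name, _ in circuit if name in T_GATES)


def h_count(circuit: List[Gate]) -> int:
    """Number of Hadamard gates in the circuit."""
    return sum(1 for name, _ in circuit if name == "H")


def h_count_between_t(circuit: List[Gate]) -> int:
    """Hadamard gates located between the first and the last T gate."""
    positions = [i for i, (name, _) in enumerate(circuit) if name in T_GATES]
    if not positions:
        return 0
    first, last = positions[0], positions[-1]
    return sum(1 for name, _ in circuit[first:last + 1] if name == "H")


example = [
    ("H", (0,)),
    ("T", (0,)),
    ("CNOT", (0, 1)),
    ("H", (1,)),
    ("Tdg", (1,)),
    ("H", (0,)),
]

print(t_count(example), h_count(example), h_count_between_t(example))
```

In this example only the Hadamard gate on qubit 1 falls between the first and the last $T$ gate; the leading and trailing Hadamard gates are outside that window.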
Journal Article
TL;DR: An optimized residual data decoder architecture is proposed to improve the performance in H.264/AVC and the execution cycle of the proposed architecture is about 88.5% less than that of the existing designs.
Abstract: In this paper, an optimized residual data decoder architecture is proposed to improve performance in H.264/AVC. The proposed architecture integrates a parallel inverse transform architecture and a parallel inverse quantization architecture, with common operation units that apply new inverse quantization equations. Because the equations contain no division, they reduce the execution time and the number of operations of the inverse quantization process. The common operation unit implements the equations with a multiplier and a left shifter. The inverse quantization architecture, with four common operation units, reduces the execution of inverse quantization to one cycle. The inverse transform architecture consists of eight inverse transform operation units and likewise reduces the execution of the inverse transform to one cycle. Because the inverse quantization and inverse transform operations run concurrently, inverse transform and inverse quantization for one block together take one cycle. The proposed architecture is synthesized using Magnachip 0.18 µm CMOS technology. The gate count and the critical path delay of the architecture are 21.9k and 5.5 ns, respectively. The architecture achieves a throughput of 2.89 Gpixels/s at the maximum clock frequency of 181 MHz. Measured using data extracted from JM 9.4, the execution cycle count of the proposed architecture is about 88.5% lower than that of the existing designs.
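The reported figures can be cross-checked with a short calculation. This is a sketch under one assumption that the abstract does not state: that a "block" is a 4x4 block of 16 pixels. The function and constant names are illustrative only.

```python
def throughput_pixels_per_sec(clock_hz: float, blocks_per_cycle: float,
                              pixels_per_block: int) -> float:
    """Pixels processed per second for a given clock and block rate."""
    return clock_hz * blocks_per_cycle * pixels_per_block


CLOCK_HZ = 181e6            # maximum clock frequency from the abstract
CRITICAL_PATH_S = 5.5e-9    # critical path delay from the abstract
PIXELS_PER_BLOCK = 4 * 4    # assumption: one block is 4x4 pixels

# Upper bound on the clock implied by the critical path delay.
max_clock_hz = 1.0 / CRITICAL_PATH_S

# One block per cycle, as stated in the abstract.
gpix = throughput_pixels_per_sec(CLOCK_HZ, 1, PIXELS_PER_BLOCK) / 1e9

print(f"clock bound: {max_clock_hz / 1e6:.1f} MHz")
print(f"throughput:  {gpix:.3f} Gpixels/s")
```

Under that assumption, one block per cycle at 181 MHz gives about 2.896 Gpixels/s, in line with the reported 2.89 Gpixels/s, and a 5.5 ns critical path allows a clock of roughly 181.8 MHz, in line with the reported 181 MHz.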
Journal Article
TL;DR: The authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained.
Abstract: Variable block size motion estimation, developed in the H.264/AVC video coding standard, is an efficient approach to reducing temporal redundancy. The intensive computational complexity of the variable block size technique makes a hardwired accelerator essential for real-time applications. The Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform their counterparts, especially when the impact of supporting the variable block size technique is considered. In this paper, the authors apply architecture-level and circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while maintaining the other metrics of the original architectures, namely latency, memory bandwidth, and hardware utilization. Experiments demonstrate that, at a 110.8 MHz operating frequency, the proposed approaches save 14.7% and 18.0% of the gate count of the original Propagate Partial SAD and SAD Tree architectures, respectively. With TSMC 0.18 µm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves a 231.6 MHz operating frequency at a cost of 84.1k gates. Correspondingly, the maximum operating frequency of the optimized SAD Tree architecture is improved to 204.8 MHz, almost twice that of the original, while its hardware overhead is only 88.5k gates.
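The sum of absolute differences (SAD) and its reuse across variable block sizes can be sketched in software. This is not the Propagate Partial SAD or SAD Tree hardware dataflow from the paper; it only illustrates, assuming a 16x16 macroblock split into 4x4 sub-blocks, how the SADs of larger partitions can be obtained by adding already computed 4x4 SADs. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values


def sad_4x4_grid(cur: Block, ref: Block) -> List[List[int]]:
    """SAD of every 4x4 sub-block of a 16x16 macroblock, as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def partition_sad(grid: List[List[int]], top: int, left: int,
                  height: int, width: int) -> int:
    """SAD of a partition covering height x width sub-blocks of the grid.

    (top, left) is the grid position of the partition's first 4x4 sub-block.
    """
    return sum(grid[y][x]
               for y in range(top, top + height)
               for x in range(left, left + width))


# Toy data: every pixel differs by 3 between current and reference block.
cur = [[10] * 16 for _ in range(16)]
ref = [[7] * 16 for _ in range(16)]

grid = sad_4x4_grid(cur, ref)
sad_4x4 = grid[0][0]                          # one 4x4 block
sad_8x8 = partition_sad(grid, 0, 0, 2, 2)     # top-left 8x8 partition
sad_16x8 = partition_sad(grid, 0, 0, 2, 4)    # top 16x8 partition
sad_16x16 = partition_sad(grid, 0, 0, 4, 4)   # whole macroblock

print(sad_4x4, sad_8x8, sad_16x8, sad_16x16)
```

The pixel differences are computed once, and each larger partition needs only additions of the smaller results.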
Book Chapter
01 Jan 2020
TL;DR: A PFSCL gate is a single-level topology of source-coupled transistor pairs in which a parallel combination of N input transistors is coupled to a single transistor that is driven by the gate output as a feedback connection.
Abstract: A PFSCL gate is a single-level topology of source-coupled transistor pairs wherein a parallel combination of N-input transistors is coupled to a single transistor which is being driven by the gate output as a feedback connection.
01 Jul 2008
TL;DR: An efficient culling scheme for low power 3D graphics processors that consists of the selection and back-face culling in the geometry engine and the elimination of pixels outside in the rasterizer engine is proposed.
Abstract: Portable devices increasingly run applications that use 3D graphics, such as 3D games and 3D navigation. These devices require small area and low power consumption. We propose an efficient culling scheme for low power 3D graphics processors. The proposed culling scheme consists of the selection and back-face culling in the geometry engine and the elimination of pixels outside in the rasterizer engine. The new scheme reduces both the hardware complexity and the number of operation cycles of culling operations. We design a 3D graphics pipeline in Verilog-HDL according to the proposed scheme and verify it on an FPGA prototyping board. The latency of the proposed architecture is reduced by 15 cycles and the gate count of the synthesized result is reduced by 8%.
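Back-face culling in general can be sketched with a signed-area test on a projected triangle. This is the textbook test, not the scheme proposed in the paper, whose details the abstract does not give. The sketch assumes 2D screen coordinates with the y axis pointing up and counter-clockwise winding for front-facing triangles; the function names are illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]  # projected (x, y) screen coordinates


def signed_area2(v0: Point, v1: Point, v2: Point) -> float:
    """Twice the signed area of triangle (v0, v1, v2).

    Positive for counter-clockwise winding, negative for clockwise
    (with the y axis pointing up).
    """
    return ((v1[0] - v0[0]) * (v2[1] - v0[1])
            - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def is_back_facing(v0: Point, v1: Point, v2: Point) -> bool:
    """True if the triangle can be culled.

    Assumption: counter-clockwise triangles face the viewer; clockwise
    and degenerate (zero-area) triangles are culled.
    """
    return signed_area2(v0, v1, v2) <= 0


front = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # counter-clockwise
back = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))   # same triangle, clockwise

print(is_back_facing(*front), is_back_facing(*back))
```

Culled triangles need no further processing in later pipeline stages, which is where the savings in operation cycles come from in general.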

Network Information
Related Topics (5)
CMOS: 81.3K papers, 1.1M citations, 84% related
Electronic circuit: 114.2K papers, 971.5K citations, 81% related
Integrated circuit: 82.7K papers, 1M citations, 80% related
Transistor: 138K papers, 1.4M citations, 79% related
Decoding methods: 65.7K papers, 900K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  6
2022  19
2021  51
2020  47
2019  38
2018  47