Topic
Gate count
About: Gate count is a research topic. Over its lifetime, 1,020 publications have been published within this topic, receiving 13,535 citations.
Papers
07 Dec 2020
TL;DR: In this paper, a scalable and high-throughput pipeline FFT processor architecture is proposed, where four FFT points are processed every clock cycle to achieve high throughput at a reasonable operating frequency.
Abstract: In this paper, we propose a scalable, high-throughput pipeline FFT processor architecture and evaluate its variations. To achieve high throughput at a reasonable operating frequency, the proposed architecture uses radix-4, where four points are processed every clock cycle. Like IP core generators, our architecture can be reconfigured by changing the number of FFT stages to support various numbers of FFT points. Our architecture is based on fixed-point arithmetic to reduce complexity but might be extended to support a floating-point implementation to preserve a high dynamic range. The proposed architecture achieves a throughput, in samples per second, of four times the operating frequency. For example, a throughput of 492 Msamples/s can be achieved when the operating frequency is 123 MHz, which may be reasonable performance for a 5G OFDM implementation. In this case, the gate count of our 4K-point FFT is 443,419, excluding SRAMs for pipeline buffers.
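The throughput figure in this abstract follows from simple arithmetic, which the short Python sketch below reproduces. It is only an illustration: the function names are hypothetical, the no-stall pipeline is an idealization, and the stage count assumes a pure radix-4 decomposition with "4K" read as 4096 points, neither of which the abstract states explicitly.

```python
def throughput_msps(clock_mhz, points_per_cycle=4):
    """Return sustained throughput in Msamples/s.

    Assumes the pipeline accepts `points_per_cycle` samples on every
    clock cycle with no stalls (an idealization for illustration).
    """
    return clock_mhz * points_per_cycle


def radix4_stages(n_points):
    """Return the number of radix-4 stages for an n_points FFT.

    Assumes a pure radix-4 decomposition, so n_points must be a
    power of 4; any other size raises ValueError.
    """
    if n_points < 4:
        raise ValueError("n_points must be at least 4")
    stages = 0
    n = n_points
    while n > 1:
        if n % 4 != 0:
            raise ValueError("n_points must be a power of 4")
        n //= 4
        stages += 1
    return stages


if __name__ == "__main__":
    # 123 MHz clock, four points per cycle (figures from the abstract).
    print(throughput_msps(123))   # 492
    # "4K-point" taken as 4096 points (assumption).
    print(radix4_stages(4096))    # 6
```

Under these assumptions, changing the number of supported FFT points by a factor of four corresponds to adding or removing one stage, which is one plausible reading of the reconfigurability the abstract describes.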
01 Oct 2018
TL;DR: Simulation results show that, compared with state-of-the-art bit-serial and conventional parallel FFT processors, the proposed technique is superior in terms of silicon area, power consumption, and dynamic energy use, owing to variable-precision arithmetic.
Abstract: In this paper, a new approach is proposed for designing an ultra-low-power FFT (Fast Fourier Transform) system suitable for use in sensors powered by energy harvesting. A bit-serial architecture is adopted to reduce the power consumption of the butterfly operation. Simulation results show that, compared with state-of-the-art bit-serial and conventional parallel FFT processors, the proposed technique is superior in terms of silicon area, power consumption, and dynamic energy use, owing to variable-precision arithmetic. A sample design of a 64-point FFT shows that the implementation can save about 40% of the area and 36% of the leakage power compared with a conventional parallel counterpart, achieving significant power benefits in the low-sample-rate, low-voltage domain. The dynamic variation of the arithmetic precision can be achieved through a simple modification of the controller, with a hardware area overhead of 10% in gate count.
01 Jan 2012
TL;DR: The comparison between Spartan-3A devices shows the same numbers for four-input LUTs, occupied slices, bonded IOBs, and total equivalent gate count, but their average connection and maximum pin delays differ.
Abstract: This paper presents the design and implementation of Braun's multipliers, written in Very High Speed Integrated Circuit Hardware Description Language (VHDL), on Spartan-3A field-programmable gate array (FPGA) devices, including the XC3S50A (package: tq144, speed grade: -5), XC3S200A (package: ft256, speed grade: -5), XC3S400A (package: fg400, speed grade: -5), and XC3S700A (package: fg484, speed grade: -5). Resource utilization is obtained for 4×4, 6×6, 8×8 and 12×12 Braun's multipliers. The comparison between Spartan-3A devices shows the same numbers for four-input LUTs, occupied slices, bonded IOBs, and total equivalent gate count, but their average connection and maximum pin delays differ. For average connection and maximum pin delays, all devices show behaviour that is comparable to some extent.
TL;DR: A hardware design of rate control for real-time video encoding is proposed, in which a quadratic rate-distortion model with high computational complexity is not used when quantization parameter (QP) values are decided; instead, average complexity weight values of frames are used to calculate the QP.
Abstract: In this paper, a hardware design of rate control (RC) for real-time video encoding is proposed. In the proposed method, a quadratic rate-distortion model with high computational complexity is not used when quantization parameter (QP) values are decided. Instead, for low computational complexity, average complexity weight values of frames are used to calculate the QP. For high-speed, low-complexity prediction, the MAD is predicted based on the coded basic unit, using spatial and temporal correlation in sequences. The rate control is implemented in hardware for fast QP decision. The execution cycles and gate count of the proposed architecture were reduced by about 65% and 85%, respectively, compared with those of the previous architecture. The proposed RC was implemented in Verilog HDL and synthesized with a UMC standard cell library. The synthesis result shows that the gate count of the architecture is about 191k at a 108 MHz clock frequency.
01 Feb 2020
TL;DR: The designed architecture is based on a low-complexity algorithm developed to reduce the DMM-1 computational effort and to avoid the use of memory, and the results surpass published works in terms of throughput and power dissipation.
Abstract: This paper presents a low-power and high-performance hardware design for the Depth Modeling Mode 1 (DMM-1) of the 3D-High Efficiency Video Coding (3D-HEVC). The designed architecture is based on a low-complexity algorithm developed to reduce the DMM-1 computational effort and to avoid the use of memory. The architecture was described in VHDL, and the ASIC synthesis was performed for the TSMC 40 nm technology. The synthesis results showed a gate count of 1,283k gates and a power dissipation of 51.36 mW when running at 82 MHz. The designed architecture is capable of processing 3D 1080p videos with up to 11 views at 30 frames per second. The results surpass published works in terms of throughput and power dissipation.