Showing papers in "Vlsi Design in 2016"


Journal ArticleDOI
TL;DR: A Shared Reed-Muller Decision Diagram (SRMDD) based on fixed polarity AND-XOR decomposition is introduced to represent multioutput Boolean functions, and the proposed thermal-aware synthesis is validated by obtaining the absolute temperature of the synthesized circuits using the HotSpot tool.
Abstract: The increased number of complex functional units exerts high power-density within a very-large-scale integration (VLSI) chip, which results in overheating. Power-density translates directly into temperature, which reduces the yield of the circuit. An adverse effect of power-density reduction is the increase in area, so there is a trade-off between area and power-density. In this paper, we introduce a Shared Reed-Muller Decision Diagram (SRMDD) based on fixed polarity AND-XOR decomposition to represent multioutput Boolean functions. By recursively applying transformations and reductions, we obtained a compact SRMDD. A heuristic based on Genetic Algorithm (GA) increases the sharing of product terms by judicious choice of polarity of input variables in SRMDD expansion, and a suitable area and power-density trade-off has been enumerated. This is the first effort to incorporate power-density as a measure of temperature estimation in the AND-XOR expansion process. The results of logic synthesis are incorporated with physical design in the CADENCE digital synthesis tool to obtain the floor-plan silicon area and power profile. The proposed thermal-aware synthesis has been validated by obtaining the absolute temperature of the synthesized circuits using the HotSpot tool. We have experimented with 29 benchmark circuits. The minimized AND-XOR circuit realization shows average savings of up to 15.23% in silicon area and up to 17.02% in temperature over sum-of-product (SOP) based logic minimization.

11 citations
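
As a rough illustration of the fixed polarity AND-XOR expansion named in the abstract above, the Python sketch below computes fixed-polarity Reed-Muller coefficients for a single-output truth table and picks the polarity with the fewest product terms. It is not the paper's method: exhaustive search stands in for the Genetic Algorithm, no decision diagram is built, and sharing across multiple outputs is not modeled. The function names `reed_muller_coeffs` and `best_polarity` are hypothetical.

```python
def reed_muller_coeffs(truth_table, polarity):
    """Fixed-polarity Reed-Muller coefficients of a single-output function.

    truth_table[i] is f(x) where bit k of i is the value of variable x_k.
    Bit k of `polarity` set means x_k appears only in complemented form.
    Coefficient j multiplies the product of literals whose bits are set in j.
    """
    n = len(truth_table).bit_length() - 1
    # Re-express f over the chosen literals: g(y) = f(y XOR polarity).
    coeffs = [truth_table[i ^ polarity] for i in range(len(truth_table))]
    # Butterfly transform over GF(2) (positive-polarity Reed-Muller of g).
    for bit in range(n):
        step = 1 << bit
        for i in range(len(coeffs)):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
    return coeffs


def best_polarity(truth_table):
    """Exhaustively pick the polarity with the fewest product terms."""
    n = len(truth_table).bit_length() - 1
    return min(range(1 << n),
               key=lambda p: sum(reed_muller_coeffs(truth_table, p)))


# Two-input OR: positive polarity needs x0 XOR x1 XOR x0*x1 (3 terms),
# while complementing both inputs gives 1 XOR x0'*x1' (2 terms).
or_table = [0, 1, 1, 1]
positive = reed_muller_coeffs(or_table, 0b00)
negative = reed_muller_coeffs(or_table, 0b11)
```

Exhaustive search is only practical for a handful of inputs, since the number of polarities doubles with each variable; that growth is presumably why the paper resorts to a heuristic.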


Journal ArticleDOI
TL;DR: Comparative analysis of different types of adders in Synopsys Design Compiler using different standard cell libraries at 32/28 nm, with architectures derived for carry-select adder (CSA), Common Boolean Logic (CBL) based adders, ripple carry adder, and Carry Look-Ahead Adder for 8-, 16-, 32-, and 64-bit lengths.
Abstract: Addition is an arithmetic function that usually affects the overall performance of digital systems. Adders are most widely used in applications such as multipliers and DSP (e.g., FFT, FIR, and IIR). In digital adders, the speed of addition is constrained by the time required to propagate a carry through the adder. Various techniques have been proposed to design fast adders. We have derived architectures for carry-select adder (CSA), Common Boolean Logic (CBL) based adders, ripple carry adder (RCA), and Carry Look-Ahead Adder (CLA) for 8-, 16-, 32-, and 64-bit lengths. In this work we have done a comparative analysis of different types of adders in Synopsys Design Compiler using different standard cell libraries at 32/28 nm. Also, the designs are analyzed for stuck-at faults (s-a-0, s-a-1) using Synopsys TetraMAX.

11 citations
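
To make the carry-propagation constraint mentioned in the abstract above concrete, here is a minimal bit-level Python sketch of a ripple carry adder and a carry look-ahead adder. It models logic behavior only and says nothing about the paper's actual architectures, cell libraries, or timing; the function names are hypothetical.

```python
def ripple_carry_add(a, b, width):
    """Add bit by bit; each stage waits for the previous stage's carry."""
    carry = 0
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_lookahead_add(a, b, width):
    """Compute every carry directly from generate/propagate signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    carries = [0]  # carry-in of stage 0 is assumed to be 0
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ...
        c, prod = 0, 1
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        carries.append(c)
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


rca = ripple_carry_add(200, 100, 8)
cla = carry_lookahead_add(200, 100, 8)
```

In hardware the look-ahead terms are evaluated in parallel, which shortens the critical path at the cost of extra gates; the inner loop here only mirrors the flattened logic expression.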


Journal ArticleDOI
TL;DR: A novel peak-statistical algorithm and judgment logic (PSJ) is proposed for multifrequency signal application of the Autogain Control Loop (AGC) in a hearing aid SoC, and a low-power circuit topology and noise-optimizing technique are adopted to improve the signal-to-noise ratio (SNR).
Abstract: A novel peak-statistical algorithm and judgment logic (PSJ) for multifrequency signal application of the Autogain Control Loop (AGC) in a hearing aid SoC is proposed in this paper. Under multifrequency signal conditions, it tracks amplitude changes and gathers statistics on them. A judgment is then made and the circuit gain is controlled precisely. The AGC circuit is implemented with 0.13 μm 1P8M CMOS mixed-signal technology. Meanwhile, a low-power circuit topology and noise-optimizing technique are adopted to improve the signal-to-noise ratio (SNR) of our circuit. Under a 1 V voltage supply, the peak SNR achieves 69.2 dB and total harmonic distortion (THD) is 65.3 dB with 89 μW power consumption.

7 citations


Journal ArticleDOI
TL;DR: A three-input exclusive-OR (XOR) gate, a vital element in digital system design, is chosen to elaborate the approach and it is found that the proposed XOR gate performs best in terms of most of the performance parameters.
Abstract: This paper presents a new proposal for three-input logic function implementation in MOS current mode logic (MCML) style. The conventional realization of such logic employs three levels of stacked source-coupled transistor pairs, which restricts the minimum power supply and results in increased static power. The new proposal presents a circuit element named the quad-tail cell, which reduces the number of stacked source-coupled transistor levels by two. A three-input exclusive-OR (XOR) gate, a vital element in digital system design, is chosen to elaborate the approach. Its behavior is analyzed and SPICE simulations using TSMC 180 nm CMOS technology parameters are included to support the theoretical concept. The performance of the proposed circuit is compared with its counterparts based on CMOS complementary pass transistor logic, conventional MCML, cascading of existing two-input triple-tail XOR cells, and applying the triple-tail concept in conventional MCML topology. It is found that the proposed XOR gate performs best in terms of most of the performance parameters. The sensitivity of the proposed XOR gate towards process variation shows a variation of 1.54 between the best and worst case. As an extension, a realization of a 4:1 multiplexer has also been included.

6 citations


Journal ArticleDOI
TL;DR: The result shows that FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it promising to expand digital signal processing capability on a General Purpose Processor.
Abstract: This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) in a single core. A processor with FuMicro microarchitecture can alternate between in-order superscalar and VLIW modes, using the same pipeline and the same Instruction Set Architecture (ISA). A small modification to the compiler is made to expand the register file in VLIW mode. The decision to switch modes is made by software and does not need extra hardware. VLIW code can be exploited in the form of library functions, and users are exposed only to superscalar mode; by this means, we can provide users with a convenient development environment. FuMicro could serve as a universal microarchitecture, for it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with ARM ISA. This architecture is evaluated on gem5, which is a cycle-accurate microarchitecture simulation platform. By adopting FuMicro microarchitecture, the performance can be improved by an average of 10%, with the best performance improvement being 47.3%, compared with that under pure in-order superscalar mode. The result shows that FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it promising to expand digital signal processing capability on a General Purpose Processor.

4 citations


Journal ArticleDOI
TL;DR: In this work, the offset mismatch and gain mismatch are calibrated by an adaptive statistics calibration algorithm based on LMS iteration; the timing mismatch is estimated by performing the correlation calculation of the outputs of subchannels and corrected by an improved fractional delay filter based on Farrow structure.
Abstract: A low complexity all-digital background calibration technique based on statistics is proposed. The basic idea of the statistics calibration technique is that the average output energy of each channel of a time-interleaved ADC (TIADC) should ideally be the same, since each channel samples the same input signal, and therefore the energy deviation directly reflects the mismatch errors of the channels. In this work, the offset mismatch and gain mismatch are calibrated by an adaptive statistics calibration algorithm based on LMS iteration; the timing mismatch is estimated by performing the correlation calculation of the outputs of subchannels and corrected by an improved fractional delay filter based on Farrow structure. Applied to a four-channel 12-bit 400 MHz TIADC, simulation results show that, with calibration, the SNDR rises from 22.5 dB to 71.8 dB and ENOB rises from 3.4 bits to 11.6 bits for a 164.6 MHz sinusoidal input. Compared with traditional methods, the proposed schemes are more feasible to implement and consume fewer hardware resources.

4 citations
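
The idea in the abstract above, that channel energy deviations reveal mismatch, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions and not the paper's algorithm: channel 0 is taken as the reference, each channel is modeled as `gain * input + offset`, a simple LMS-style running update is used, and timing mismatch (the Farrow filter part) is left out entirely. All names, the step size `mu`, and the test signal are hypothetical.

```python
import math


def simulate_tiadc(gains, offsets, samples_per_channel, freq=0.0731):
    """Split one sine wave across channels with gain and offset mismatch."""
    num_ch = len(gains)
    return [
        [gains[k] * math.sin(2 * math.pi * freq * (num_ch * m + k)) + offsets[k]
         for m in range(samples_per_channel)]
        for k in range(num_ch)
    ]


def calibrate_offset_gain(channels, mu=1e-3):
    """Estimate per-channel offsets and gain corrections iteratively."""
    num_ch = len(channels)
    offsets = [0.0] * num_ch   # running estimate of each channel's DC offset
    weights = [1.0] * num_ch   # gain-correction factor per channel
    ref_power = 0.0            # running mean power of reference channel 0
    for samples in zip(*channels):
        for k, y in enumerate(samples):
            offsets[k] += mu * (y - offsets[k])
            z = y - offsets[k]
            if k == 0:
                ref_power += mu * (z * z - ref_power)
            else:
                corrected = weights[k] * z
                # Push the corrected power toward the reference power.
                weights[k] += mu * (ref_power - corrected * corrected)
    return offsets, weights


true_gains = [1.00, 1.05, 0.97, 1.02]
true_offsets = [0.01, 0.02, -0.03, 0.01]
channels = simulate_tiadc(true_gains, true_offsets, 40000)
est_offsets, weights = calibrate_offset_gain(channels)
```

After convergence each `weights[k]` should approximate `true_gains[0] / true_gains[k]`, so multiplying the offset-corrected samples by it equalizes the channel energies.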


Journal ArticleDOI
TL;DR: Experimental results show that the proposed test architecture is highly scalable in terms of hardware overhead and performance overhead, which makes it applicable to many-cores with more than a thousand processing cores.
Abstract: More pronounced aging effects, more frequent early-life failures, and incomplete testing and verification processes due to time-to-market pressure in new fabrication technologies impose reliability challenges on forthcoming systems. A promising solution to these reliability challenges is self-test and self-reconfiguration with no or limited external control. In this work, a scalable self-test mechanism for periodic online testing of many-core processors is proposed. This test mechanism facilitates autonomous detection and omission of faulty cores and makes graceful degradation of the many-core architecture possible. Several test components are incorporated in the many-core architecture that distribute test stimuli, suspend normal operation of individual processing cores, apply tests, and detect faulty cores. Testing is performed concurrently with normal system operation without any noticeable downtime at the application level. Experimental results show that the proposed test architecture is highly scalable in terms of hardware overhead and performance overhead, which makes it applicable to many-cores with more than a thousand processing cores.

3 citations


Journal ArticleDOI
TL;DR: The introduction of segmentation of the CMPs’ processor pool ensures a better efficiency, in determining the inconsistencies, by reducing the number of computation steps in the verification logic.
Abstract: This work reports an effective design of a cache system for Chip Multiprocessors (CMPs). It introduces built-in logic for verification of cache coherence in CMPs realizing a directory based protocol. It is developed around the cellular automata (CA) machine, invented by John von Neumann in the 1950s. A special class of CA referred to as single length cycle 2-attractor cellular automata (TACA) has been employed to detect the inconsistencies in cache line states of processors’ private caches. The TACA module captures the coherence status of the CMPs’ cache system and memorizes any inconsistent recording of the cache line states during the processors’ reference to a memory block. Theory has been developed to empower a TACA to analyse the cache state updates and then to settle to an attractor state indicating a quick decision on a faulty recording of cache line status. The introduction of segmentation of the CMPs’ processor pool ensures a better efficiency, in determining the inconsistencies, by reducing the number of computation steps in the verification logic. The hardware requirement for the verification logic points to the fact that the overhead of the proposed coherence verification module is much lower than that of conventional verification units and is insignificant with respect to the cost involved in the CMPs’ cache system.

3 citations


Journal ArticleDOI
TL;DR: An efficient Finite State Machine (FSM) based reconfigurable architecture for fingerprint recognition is proposed, and performance parameters such as Total Success Rate (TSR), False Acceptance Rate (FAR), and False Rejection Rate (FRR) are computed using fusion scores with a correlation matching technique for the FVC2004 DB3 Database.
Abstract: Fingerprint identification is an efficient biometric technique to authenticate human beings in real-time Big Data Analytics. In this paper, we propose an efficient Finite State Machine (FSM) based reconfigurable architecture for fingerprint recognition. The fingerprint image is resized, and Compound Linear Binary Pattern (CLBP) is applied to the fingerprint, followed by a histogram to obtain histogram CLBP features. Discrete Wavelet Transform (DWT) Level 2 features are obtained by the same methodology. The novel matching score of CLBP is computed using histogram CLBP features of the test image and fingerprint images in the database. Similarly, the DWT matching score is computed using DWT features of the test image and fingerprint images in the database. Further, the matching scores of CLBP and DWT are fused with an arithmetic equation using an improvement factor. The performance parameters such as Total Success Rate (TSR), False Acceptance Rate (FAR), and False Rejection Rate (FRR) are computed using fusion scores with a correlation matching technique for the FVC2004 DB3 Database. The proposed fusion based VLSI architecture is synthesized on a Virtex xc5vlx30T-3 FPGA board using a Finite State Machine, resulting in optimized parameters.

3 citations


Journal ArticleDOI
TL;DR: The main motivation behind this work is to propose an efficient, reusable, and automated framework for modeling and verification of image signal processor (ISP) designs; the framework shows better results, with significant improvement observed in product verification time, verification cost, and quality of the designs.
Abstract: In the VLSI industry, image signal processing algorithms are developed and evaluated using software models before implementation of RTL and firmware. After the finalization of the algorithm, software models are used as a golden reference model for the image signal processor (ISP) RTL and firmware development. In this paper, we describe a unified and modular modeling framework of image signal processing algorithms used for different applications such as ISP algorithm development, reference for hardware (HW) implementation, reference for firmware (FW) implementation, and bit-true certification. A Universal Verification Methodology (UVM) based functional verification framework of image signal processors using software reference models is described. Further, IP-XACT based tools for automatic generation of functional verification environment files and model map files are described. The proposed framework is developed both with host interface and with core using a virtual register interface (VRI) approach. This modeling and functional verification framework is used in real-time image signal processing applications including cellphones, smart cameras, and image compression. The main motivation behind this work is to propose an efficient, reusable, and automated framework for modeling and verification of image signal processor (ISP) designs. The proposed framework shows better results, and significant improvement is observed in product verification time, verification cost, and quality of the designs.

2 citations


Journal ArticleDOI
TL;DR: An improved energy-efficient capacitor switching scheme of SAR ADC is proposed for implantable bioelectronic applications, with sequence initialization, novel logic control, and capacitive subconversion, and 97.6% switching energy is reduced compared to the traditional structure.
Abstract: Low-power analog-to-digital converter (ADC) is a crucial part of wearable or implantable bioelectronics. In order to reduce the power of successive-approximation-register (SAR) ADC, an improved energy-efficient capacitor switching scheme of SAR ADC is proposed for implantable bioelectronic applications. With sequence initialization, novel logic control, and capacitive subconversion, 97.6% switching energy is reduced compared to the traditional structure. Moreover, thanks to the top-plate sampling and capacitive subconversion, 87% input-capacitance reduction can be achieved over the conventional structure. A 10-bit SAR ADC with this proposed switching scheme is realized in 65 nm CMOS. With 1.514 kHz differential sinusoidal input signals sampled at 50 kS/s, the ADC achieves an SNDR of 61.4 dB and only consumes power of 450 nW. The area of this SAR ADC IP core is only 136 μm × 176 μm, making it also area-efficient and very suitable for biomedical electronics application.

Journal ArticleDOI
Zhang Weigong, Li Chao, Qiu Keni, Zhang Shaonan, Chen Xianglong
TL;DR: This paper proposes a novel time synchronization method, which can meet the requirement of time precision in UM-BUS, and shows that the synchronous precision can achieve a bias less than 20 ns.
Abstract: UM-BUS is a novel dynamically reconfigurable high-speed serial bus for embedded systems. It can achieve fault tolerance by detecting the channel status in real time and reconfiguring dynamically at run-time. The bus supports direct interconnections between up to eight master nodes and multiple slave nodes. In order to solve the time synchronization problem among master nodes, this paper proposes a novel time synchronization method, which can meet the requirement of time precision in UM-BUS. In this proposed method, time is first broadcast through time broadcast packets. Then, the transmission delay and time deviations via three handshakes during link self-checking and channel detection can be worked out referring to the IEEE 1588 protocol. Thereby, each node calibrates its own time according to the broadcast time. The proposed method has been proved to meet the requirement of real-time time synchronization. The experimental results show that the synchronous precision can achieve a bias less than 20 ns.
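
For readers unfamiliar with the IEEE 1588 calculation the abstract refers to, the Python sketch below shows the standard delay request-response arithmetic: four timestamps yield the one-way path delay and the clock offset, assuming the path is symmetric. It does not reproduce UM-BUS's own three-handshake procedure, which the abstract does not detail; the function name and the example timestamps are hypothetical.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588-style offset and delay from four timestamps.

    t1: master sends sync            (master clock)
    t2: slave receives sync          (slave clock)
    t3: slave sends delay request    (slave clock)
    t4: master receives the request  (master clock)
    Assumes the forward and return path delays are equal.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    return offset, delay


# Example in nanoseconds: slave runs 150 ns ahead, path delay is 40 ns.
offset_ns, delay_ns = ptp_offset_and_delay(t1=1000, t2=1190, t3=2000, t4=1890)
corrected_slave_time = 2500 - offset_ns  # slave subtracts the offset
```

Any asymmetry between the two directions shows up directly as offset error, so a sub-20 ns result like the one reported would depend on well-matched link delays.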