
Showing papers by "Yu Hu" published in 2011


Proceedings ArticleDOI
05 Sep 2011
TL;DR: This work shows that a large portion (40%-60% for the circuits in the authors' experiments) of the total used LUT configuration bits are don't care bits, and proposes to decide the logic values of don't care bits such that soft errors are reduced.
Abstract: SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to Single Event Upsets (SEUs). We show that a large portion (40%-60% for the circuits in our experiments) of the total used LUT configuration bits are don't care bits, and propose to decide the logic values of the don't care bits such that soft errors are reduced. Our approaches are efficient and do not change LUT-level placement and routing. Therefore, they are suitable for design closure. For the ten largest combinational MCNC benchmark circuits mapped to 6-LUTs, our approaches obtain 20% chip-level Mean Time To Failure (MTTF) improvements compared to the baseline mapped by the Berkeley ABC mapper. They obtain 3× larger chip-level MTTF improvements and are 128× faster when compared to the existing best in-place IPD algorithm.
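One way to see why so many LUT configuration bits end up as don't cares is to exhaustively simulate a LUT's fan-in cone and check which of its 2^k addresses can ever be selected; bits at unreachable addresses are satisfiability don't cares. The sketch below illustrates only that counting step under a toy netlist representation (the function names and data structures are assumptions, not the paper's tooling); the paper's actual assignment of don't-care values for SEU mitigation is not reproduced.

```python
from itertools import product

def reachable_addresses(cone_inputs, lut_pin_funcs):
    """Exhaustively simulate the fan-in cone of a LUT and collect the LUT
    addresses (input patterns) that can actually occur.  `lut_pin_funcs`
    maps each LUT pin to a Boolean function of the cone's primary inputs
    (a hypothetical representation, chosen only for this illustration)."""
    seen = set()
    for assignment in product([0, 1], repeat=len(cone_inputs)):
        env = dict(zip(cone_inputs, assignment))
        seen.add(tuple(f(env) for f in lut_pin_funcs))
    return seen

def dont_care_fraction(k, seen_addresses):
    # Configuration bits whose address can never be selected are don't cares.
    return 1.0 - len(seen_addresses) / float(2 ** k)

# Toy case: a 6-LUT whose six pins are all derived from just three primary
# inputs, so at most 8 of its 64 configuration bits are ever addressed.
cone = ["a", "b", "c"]
pins = [
    lambda e: e["a"],
    lambda e: e["b"],
    lambda e: e["c"],
    lambda e: e["a"] & e["b"],
    lambda e: e["b"] | e["c"],
    lambda e: e["a"] ^ e["c"],
]
print(f"don't-care bits: {dont_care_fraction(6, reachable_addresses(cone, pins)):.0%}")
```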

15 citations


Journal ArticleDOI
TL;DR: This paper is the first in-depth study on FPGA retiming for SET mitigation and increases mean-time-to-failure (MTTF) by 78% for variational SETs with a 10-min runtime limit while preserving the clock frequency on ISCAS89 benchmark circuits.
Abstract: For anti-fuse or flash-memory-based field-programmable gate arrays (FPGAs), single-event transient (SET)-induced faults are significantly more pronounced than single-event upsets (SEUs). While most existing work studies SEU, this paper proposes a retiming algorithm for mitigating variational SETs (i.e., SETs with different durations and strengths). Considering the reshaping effect of an SET pulse caused by broadening and attenuation during its propagation, SET-aware retiming (SaR) redistributes combinational paths via post layout retiming and minimizes the possibility that an SET pulse is latched. The SaR problem is formulated as an integer linear programming (ILP) problem and solved efficiently by a progressive ILP approach. In contrast to existing SET-mitigation techniques, the proposed SaR does not change the FPGA architecture or the layout of an FPGA application. Instead, it reconfigures the connection between a flip-flop and an LUT within a programmable logic block. Experimental results show that SaR increases mean-time-to-failure (MTTF) by 78% for variational SETs with a 10-min runtime limit while preserving the clock frequency on ISCAS89 benchmark circuits. To the best of our knowledge, this paper is the first in-depth study on FPGA retiming for SET mitigation.
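To make the retiming formulation concrete, the sketch below applies the classical Leiserson-Saxe legality constraint (the retimed register count on every edge must stay non-negative) to a toy netlist and brute-forces the integer lags instead of solving an ILP. The SET-latching cost is only an illustrative proxy, and the node names, delays, and weighting are assumptions rather than the paper's actual objective or its progressive ILP strategy.

```python
from itertools import product

# Tiny netlist as a register-weighted graph: edge (u, v, w) carries w flip-flops.
edges = [("a", "b", 1), ("b", "c", 0), ("c", "d", 1), ("a", "d", 0)]
nodes = ["a", "b", "c", "d"]
depth = {"a": 2, "b": 3, "c": 1, "d": 2}  # toy combinational delay of each node

def retimed_weight(w, r, u, v):
    # Leiserson-Saxe: registers left on edge (u, v) after retiming with lags r.
    return w + r[v] - r[u]

def legal(r):
    return all(retimed_weight(w, r, u, v) >= 0 for u, v, w in edges)

def set_latch_proxy(r):
    # Proxy cost: prefer placing registers behind long combinational paths,
    # where an SET pulse has more chance to attenuate before being latched.
    # (Assumption for illustration; not the paper's objective function.)
    return sum(retimed_weight(w, r, u, v) / (1 + depth[u]) for u, v, w in edges)

# Brute-force the small lag space instead of calling an ILP solver.
best = min(
    (dict(zip(nodes, lags))
     for lags in product(range(-1, 2), repeat=len(nodes))
     if legal(dict(zip(nodes, lags)))),
    key=set_latch_proxy,
)
print("chosen lags:", best, "proxy cost:", round(set_latch_proxy(best), 3))
```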

14 citations


Proceedings ArticleDOI
Xiaoyu Shi, Dahua Zeng, Yu Hu, Guohui Lin, Osmar R. Zaïane
14 Mar 2011
TL;DR: This paper presents an efficient algorithm to detect global topological similarity between two circuits and applies it in an incremental design flow, IDUCS, which adds only a plugin for circuit similarity detection and therefore preserves the "push-button" feature, significantly reducing the engineering complexity of incremental tasks.
Abstract: This paper presents an efficient algorithm to detect the global topological similarity between two circuits. By applying the proposed circuit similarity algorithm in an incremental design flow, IDUCS (incremental design using circuit similarity), the design and optimization effort in the previous design iterations is automatically captured and can be used to guide the next design iteration. IDUCS is able to identify the similarity between the original netlist and the modified one with aggressive resynthesis, which might destroy the naming and local structures of the original netlist. This is superior to the existing design preservation approaches such as naming and local topological matching. Furthermore, IDUCS simply inserts a plugin for circuit similarity detection, and therefore preserves the “push-button” feature, significantly simplifying the engineering complexity of incremental tasks. As a case study, we perform the proposed IDUCS process to generate the placement for a logically resynthesized netlist based on the placement of the original netlist and the circuit similarity between the original and the modified logic-level netlists. The experimental results show our IDUCS-based placement is 28X faster than versatile place and route (VPR) with comparable wire length and estimated critical delay.
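The abstract does not spell out the similarity algorithm itself, so the sketch below uses a generic Weisfeiler-Lehman-style neighbourhood hashing (an assumption, not IDUCS) to show how gates can be matched between an original and a renamed, resynthesized netlist purely from global topology rather than from names.

```python
import hashlib
from collections import defaultdict

def wl_signatures(netlist, rounds=3):
    """Weisfeiler-Lehman-style refinement: each gate's signature is iteratively
    rehashed from its own type and its fan-in gates' signatures.
    `netlist` maps gate name -> (gate type, list of fan-in gate names)."""
    sig = {g: netlist[g][0] for g in netlist}
    for _ in range(rounds):
        new = {}
        for g, (gtype, fanins) in netlist.items():
            neigh = sorted(sig[f] for f in fanins if f in sig)
            new[g] = hashlib.sha1((gtype + "|" + ",".join(neigh)).encode()).hexdigest()[:12]
        sig = new
    return sig

def match_by_signature(net_a, net_b):
    """Pair up gates whose topological signatures coincide in both netlists."""
    index = defaultdict(list)
    for g, s in wl_signatures(net_b).items():
        index[s].append(g)
    return {g: index[s] for g, s in wl_signatures(net_a).items() if s in index}

# Toy use: the resynthesized netlist renamed every gate, but the topology survives.
original = {"g1": ("AND", []), "g2": ("OR", []), "g3": ("XOR", ["g1", "g2"])}
modified = {"u7": ("AND", []), "u8": ("OR", []), "u9": ("XOR", ["u7", "u8"])}
print(match_by_signature(original, modified))
```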

11 citations


Proceedings ArticleDOI
05 Sep 2011
TL;DR: This paper proposes an FPGA-based framework for massive-scale grid-based multi-agent simulation and achieves a speedup of 290x with two million agents, compared to the C implementation.
Abstract: Multi-agent simulation (MAS) is a widely used paradigm for modeling and simulating real-world complex systems, ranging from ant colony foraging to online trading. The performance of existing MAS software, however, suffers when simulating massive-scale multi-agent systems on traditional serial processors. In this paper, we propose an FPGA-based framework for massive-scale grid-based MAS. Memory interleaving, parallel task partitioning, and a computing pipeline are adopted to improve system throughput. A classical MAS benchmark, Conway's Game of Life, is used as a case study to illustrate how to map grid-based models to our MAS framework. We implemented it on a Xilinx Virtex-5 FPGA board and achieved a speedup of 290x with two million agents, compared to the C implementation.
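As a point of reference for the case study, here is a plain software version of one synchronous update of Conway's Game of Life, the grid-based model the FPGA framework accelerates; the toroidal wrap-around and the Python form are illustrative choices, and none of the FPGA-side memory interleaving or pipelining is shown.

```python
def life_step(grid):
    """One synchronous update of Conway's Game of Life; every cell is a
    grid-based agent whose next state depends only on its 8 neighbours."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt

# A glider on a 5x5 toroidal grid, advanced one generation.
g = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
print(life_step(g))
```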

9 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: This work proposes a cross-layer optimized placement and routing algorithm to reduce the soft error rate by incorporating the application-level and physical-level factors together, and shows that it can reduce the SER by 14% with no area or performance overhead.
Abstract: As FPGA feature sizes shrink to nanometers, soft errors increasingly become an important concern for SRAM-based FPGAs. Without considering the application-level impact, existing reliability-oriented placement and routing approaches analyze the soft error rate (SER) only at the physical level, consequently completing the design with suboptimal soft error mitigation. Our analysis shows that the statistical variation of the application-level factor is significant. Hence, in this work, we first propose a cube-based analysis to efficiently and accurately evaluate the application-level factor. We then propose a cross-layer optimized placement and routing algorithm to reduce the SER by incorporating the application-level and physical-level factors together. Experimental results show that the average difference of the application-level factor between our cube-based method and Monte Carlo golden simulation is less than 0.01. Moreover, compared with the baseline VPR placement and routing technique, the cross-layer optimized placement and routing algorithm can reduce the SER by 14% with no area or performance overhead.
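The cube-based analysis itself is not reproduced here, but the Monte Carlo "golden" estimate it is compared against can be sketched directly: the application-level factor of a node is approximated as the probability that flipping that node's value on a random input vector changes a primary output. The callback interface and the toy circuit below are assumptions made only for illustration.

```python
import random

def app_level_factor(eval_outputs, n_inputs, trials=10000, seed=1):
    """Monte Carlo estimate of the application-level (logic masking) factor:
    the probability that an error at one circuit node propagates to an output.
    `eval_outputs(vector, flipped)` must return the circuit's output tuple,
    optionally with the chosen node's value inverted (assumed interface)."""
    rng = random.Random(seed)
    propagated = sum(
        eval_outputs(v, flipped=False) != eval_outputs(v, flipped=True)
        for v in ([rng.randint(0, 1) for _ in range(n_inputs)] for _ in range(trials))
    )
    return propagated / trials

# Toy circuit out = (a AND b) OR c, with the error injected on the AND node:
# the error is masked whenever c = 1, so the factor comes out near 0.5.
def toy(v, flipped):
    a, b, c = v
    and_node = (a & b) ^ (1 if flipped else 0)
    return (and_node | c,)

print(f"application-level factor ~ {app_level_factor(toy, 3):.3f}")
```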

7 citations


Patent
16 Nov 2011
TL;DR: In this paper, the authors propose a fault diagnosis system for combinational logic faults that can diagnose faults exhibiting a plurality of arbitrary fault models, without any area or wiring cost, without loading new diagnosis vectors, and without changing the traditional combinational logic fault diagnosis flow.
Abstract: The invention relates to a fault diagnosis system and a fault diagnosis method. The fault diagnosis method is used for diagnosing fault positions in a digital integrated circuit and comprises the following steps: step 1: for each failing vector, establishing a fault tuple equivalent tree capable of explaining that failing vector; step 2: marking the latent faults in the fault tuple equivalent trees; step 3: according to the marking results of the latent faults in the fault tuple equivalent trees, selecting the most probable fault occurrence position from the latent faults and adding it to the final candidate fault position set; and step 4: deleting from the fault tuple equivalent trees the fault tuples that are equivalent to, or can be explained by, the faults in the final candidate fault position set. The system and method can diagnose combinational logic faults exhibiting a plurality of arbitrary fault models without any area or wiring cost, without loading new diagnosis vectors, and without changing the traditional combinational logic fault diagnosis flow.
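Steps 3 and 4 amount to an iterative pick-and-prune over the failing vectors; the sketch below shows that loop as a simple greedy cover, with `explains` standing in for the information held in the fault tuple equivalent trees. This is an illustrative analogue of the claimed procedure, not the patented method itself.

```python
def greedy_candidates(failing_vectors, explains):
    """Simplified analogue of steps 3-4: repeatedly pick the suspect fault that
    explains the most still-unexplained failing vectors, then drop the vectors
    it explains.  `explains` maps fault -> set of failing vectors it can explain
    (a hypothetical precomputed structure standing in for the equivalent trees)."""
    remaining = set(failing_vectors)
    chosen = []
    while remaining:
        best = max(explains, key=lambda f: len(explains[f] & remaining), default=None)
        if best is None or not (explains[best] & remaining):
            break  # some observed failures cannot be explained by any suspect
        chosen.append(best)
        remaining -= explains[best]
    return chosen

explains = {"f1": {"v1", "v2"}, "f2": {"v2", "v3"}, "f3": {"v3"}}
print(greedy_candidates(["v1", "v2", "v3"], explains))
```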

6 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: A substantial-impact-filter based method to tolerate voltage emergencies, including a metric, the intermittent vulnerability factor for intermittent timing faults (IVFitf), to quantitatively estimate the vulnerability of microprocessor structures (load/store queue and register file) to voltage emergencies.
Abstract: Supply voltage fluctuation caused by inductive noise has become a critical problem in microprocessor design. A voltage emergency occurs when the supply voltage variation exceeds the acceptable voltage margin, jeopardizing microprocessor reliability. Existing techniques assume that all voltage emergencies lead to incorrect program execution and conservatively activate rollbacks or flushes to recover, and consequently incur high performance overhead. We observe that not all voltage emergencies result in externally visible errors, which can be exploited to avoid unnecessary protection. In this paper, we propose a substantial-impact-filter based method to tolerate voltage emergencies, comprising three key techniques: 1) analyze the architecture-level masking of voltage emergencies during program execution; 2) propose a metric, the intermittent vulnerability factor for intermittent timing faults (IVFitf), to quantitatively estimate the vulnerability of microprocessor structures (load/store queue and register file) to voltage emergencies; 3) propose a substantial-impact-filter based method to handle voltage emergencies. Experimental results demonstrate that our approach gains back nearly 57% of the performance loss compared with the once-occur-then-rollback approach.
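The abstract does not define IVFitf precisely, so the sketch below computes an AVF-style occupancy ratio as a plausible stand-in: the fraction of entry-cycles in which a structure holds state whose corruption by a voltage emergency would become architecturally visible. The interval trace and entry count are made-up illustration data, not measurements from the paper.

```python
def ivf(vulnerable_intervals, num_entries, total_cycles):
    """AVF-style estimate (an assumption, not necessarily the paper's exact
    IVFitf definition): the fraction of entry-cycles during which a structure
    (e.g. load/store queue or register file) holds state whose corruption by a
    voltage emergency would become architecturally visible."""
    vulnerable = sum(end - start for start, end in vulnerable_intervals)
    return vulnerable / (num_entries * total_cycles)

# Toy register file with 4 entries traced over 1000 cycles; each interval is
# a (start, end) window in which some entry held a value still to be read.
intervals = [(0, 120), (300, 450), (450, 700), (900, 1000)]
print(f"IVF ~ {ivf(intervals, 4, 1000):.3f}")
```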

6 citations


Book
20 Jun 2011
TL;DR: This paper surveys the electrical and layout perspectives of SiP: it first introduces package technologies, and then presents the SiP design flow and design exploration.
Abstract: The unquenched thirst for higher levels of electronic system integration and higher performance goals has produced a plethora of design and business challenges that threaten the success enjoyed so far as modeled by Moore's law. To tackle these challenges and meet the design needs of consumer electronics products such as cell phones, audio/video players, and digital cameras, which are composed of a number of different technologies, vertical system integration has emerged as a required technology to reduce the system board space and height in addition to the overall time-to-market and design cost. System-in-package (SiP) is a system integration technology that achieves the aforementioned needs in a scalable and cost-effective way, where multiple dies, passive components, and discrete devices are assembled, often vertically, in a package. This paper surveys the electrical and layout perspectives of SiP. It first introduces package technologies, and then presents the SiP design flow and design exploration. Finally, the paper discusses details of beyond-die signal and power integrity and physical implementation such as I/O (input/output cell) placement and routing for the redistribution layer, escape, and substrate.

5 citations


Proceedings ArticleDOI
20 Nov 2011
TL;DR: The proposed technique replaces not-fully-occupied LUTs with corresponding functional equivalent classes, which improves reliability while preserving the functionality of the design.
Abstract: As FPGA feature sizes shrink to nanometers, SRAM-based FPGAs become more vulnerable to soft errors. During logic synthesis, the reliability of the design can be improved by introducing logic masking effects. In this work, we observe that there are many not-fully-occupied look-up tables (LUTs) after logic synthesis. Hence, we propose a functional-equivalent-class based soft error mitigation scheme to exploit free LUT entries in the circuit. The proposed technique replaces not-fully-occupied LUTs with corresponding functional equivalent classes, which improves reliability while preserving the functionality of the design. Experimental results show that, compared with the baseline ABC mapper, the proposed technique can reduce the soft error rate by 21%, with only a 4.25% increase in critical-path delay.
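To illustrate what filling free LUT entries with a functionally equivalent alternative can buy, the sketch below brute-forces all fillings of the unused half of a small 4-LUT (its fourth pin is tied off) and picks the one that masks the most single-input errors. The 4-input size, the single-pin error model, and the uniform weighting are simplifications made for illustration, not the paper's selection procedure.

```python
from itertools import product

def masked_events(table, k):
    """Count (input pattern, single-pin error) pairs for which an erroneous
    value on one LUT input pin does not change the LUT output (logic masking)."""
    return sum(
        table[addr] == table[addr ^ (1 << pin)]
        for addr in range(2 ** k) for pin in range(k)
    )

# A 4-LUT implementing f(a, b, c) = (a AND b) OR c on pins 0-2; pin 3 is tied
# to 0, so the 8 entries with pin 3 = 1 are free, and any filling of them keeps
# the implemented function unchanged (a functionally equivalent alternative).
k = 4
used_half = [((addr & 1) & ((addr >> 1) & 1)) | ((addr >> 2) & 1) for addr in range(8)]
best = max(
    (used_half + list(fill) for fill in product([0, 1], repeat=8)),
    key=lambda table: masked_events(table, k),
)
print("chosen free-half filling:", best[8:])
print("masked single-pin errors:", masked_events(best, k), "of", 2 ** k * k)
```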

5 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: This work evaluates the likelihood that a suspect fault is the actual fault using both the new metric and the traditional metric, explanation capability, and shows that 98.8% of the top-ranked suspect faults hit the actual faults.
Abstract: With the exponential growth in the number of transistors, not only may test data volume and test application time increase, but multiple faults may also exist in one chip. Test compaction has been a de facto design-for-testability technique to reduce the test cost. However, the compacted test responses make multiple-fault diagnosis rather difficult. When there is no space compactor, the most likely suspect fault is considered to be the one producing failing responses most similar to the failing responses observed from the automatic test equipment. But when a compactor exists, those suspect faults may no longer have the same high likelihood of being the actual faults. To address this problem, we introduce a novel metric, explanation necessity. By using both the new metric and the traditional metric, explanation capability, we evaluate the likelihood that a suspect fault is the actual fault. For ISCAS'89 and ITC'99 benchmark circuits equipped with extreme space compactors, experimental results show that 98.8% of the top-ranked suspect faults hit the actual faults, outperforming a previous work by 11.3%.
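The sketch below captures only the traditional side of the story: suspects ranked by how well their simulated (compacted) failing responses explain the observed ones, with an arbitrary penalty for over-prediction. The new "explanation necessity" metric is not defined in the abstract and is deliberately not reproduced; all names and weights here are illustrative assumptions.

```python
def rank_suspects(observed_fail, simulated_fail):
    """Rank suspect faults by the traditional 'explanation capability' idea:
    how many observed failing (compacted) responses a fault's simulated
    responses reproduce.  The paper's 'explanation necessity' metric is not
    reproduced here."""
    scores = {}
    for fault, sim in simulated_fail.items():
        explained = len(observed_fail & sim)
        extra = len(sim - observed_fail)            # mismatches count against a suspect
        scores[fault] = explained - 0.5 * extra     # weighting is an illustrative choice
    return sorted(scores, key=scores.get, reverse=True)

observed = {"pat3", "pat7", "pat9"}
simulated = {"sa0@n12": {"pat3", "pat7"}, "sa1@n40": {"pat3", "pat7", "pat9", "pat11"}}
print(rank_suspects(observed, simulated))
```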

4 citations


Journal ArticleDOI
TL;DR: In this paper, a permanent fault recovery approach using a domain partition model is proposed to improve the reliability of FPGA-based reconfigurable systems; it increases MTTF by up to 18.87%.
Abstract: Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems due to their reconfiguration ability. However, with shrinking device feature sizes and increasing die area, today's FPGAs can be deeply affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, a permanent fault recovery approach using a domain partition model is proposed in this paper. In the proposed approach, the fault-tolerant FPGA recovers from faults by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance reliability, a technical procedure is also presented in which the set of vectors is heuristically filtered so that the corresponding small overlaps can be merged into big ones. Experimental results demonstrate the effectiveness of the proposed approach through its application to several benchmark circuits. Compared with previous approaches, the proposed approach increases MTTF by up to 18.87%.
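The reliability gain of keeping several overlapping alternative configurations can be illustrated with a small Monte Carlo MTTF estimate: permanent faults arrive over time, and the system survives as long as at least one stored configuration avoids every faulty resource. The exponential fault model, resource counts, and configuration sets below are illustrative assumptions, not the paper's domain partition model.

```python
import random

def mttf_with_alternatives(configs, n_resources, fault_rate, trials=2000, seed=7):
    """Monte Carlo MTTF estimate for a reconfigurable FPGA holding several
    alternative configurations: after each permanent fault, the system reloads
    any stored configuration that avoids every faulty resource, and fails when
    none remains.  `configs` lists, per configuration, the set of resources it
    uses; the exponential fault-arrival model is an illustrative assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t, faulty = 0.0, set()
        while any(not (c & faulty) for c in configs):
            t += rng.expovariate(fault_rate * n_resources)   # next permanent fault
            faulty.add(rng.randrange(n_resources))
        total += t
    return total / trials

# Two alternative placements of the same design on 20 logic resources.
configs = [set(range(0, 12)), set(range(8, 20))]
print("estimated MTTF:", round(mttf_with_alternatives(configs, 20, 0.01), 1))
```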

Journal ArticleDOI
TL;DR: This paper proposes a capture-power-aware test compression scheme that is able to keep capture power under a safe limit with low test compression ratio loss; experimental results on benchmark circuits validate the effectiveness of the proposed solution.

Proceedings ArticleDOI
27 Jun 2011
TL;DR: This paper presents a transparent dynamic binding (TDB) mechanism that reduces the global master-slave consistency maintenance to the scale of the private caches and satisfies the objective of private cache consistency, therefore providing excellent scalability and flexibility.
Abstract: Aggressive technology scaling makes chip multiprocessors increasingly error-prone. Core-level fault-tolerant approaches bind two cores to implement redundant execution and error detection. However, as more cores are integrated into one chip, existing static and dynamic binding schemes suffer from scalability problems when the violation effects caused by external write operations are considered. In this paper, we present a transparent dynamic binding (TDB) mechanism to address this issue. Borrowing from static binding schemes, we use the private caches to hold identical data blocks, thereby reducing the global master-slave consistency maintenance to the scale of the private caches. With our fault-tolerant cache coherence protocol, TDB satisfies the objective of private cache consistency and therefore provides excellent scalability and flexibility. Experimental results show that, for a set of parallel workloads, the overall performance of our TDB scheme is very close to that of baseline fault-tolerant systems, outperforming dynamic core coupling by 9.2%, 10.4%, 18%, and 37.1% when considering 4, 8, 16, and 32 cores, respectively.

Journal ArticleDOI
TL;DR: Experimental results confirm that the proposed approach can significantly reduce scan shift power with low wire length overhead; the wire length overhead can be further reduced by the proposed distance of EWTM (DEWTM) metric.
Abstract: Test power of VLSI systems has become a challenging issue. The scan shift power dominates the average test power and restricts the clock frequency of the shift phase, leading to excessive thermal accumulation and long test time. This paper proposes a scan chain design technique to solve these problems. Based on the weighted transition metric (WTM), the proposed extended WTM (EWTM), which guides the scan chain design algorithm, estimates the scan shift power in both the shift-in and shift-out phases. Moreover, the wire length overhead of the proposed scan chain design can be reduced by the proposed distance of EWTM (DEWTM) metric. Experimental results confirm that the proposed approach can significantly reduce scan shift power with low wire length overhead.
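The weighted transition metric the paper builds on has a simple form: each transition between adjacent scan bits is weighted by how far it still travels along the chain. The sketch below implements that formula; the `ewtm_sketch` helper merely hints at covering shift-out as well and is an assumption, not the paper's exact EWTM or DEWTM definitions.

```python
def wtm(scan_bits):
    """Weighted transition metric for one scan-in vector: each transition
    between adjacent bits is weighted by how far it still has to travel
    through the scan chain, so early transitions cost more shift power."""
    L = len(scan_bits)
    return sum((scan_bits[i] ^ scan_bits[i + 1]) * (L - 1 - i) for i in range(L - 1))

# The paper's EWTM additionally accounts for the shift-out of captured
# responses; summing the two WTM terms is shown here only as a plausible
# approximation of that idea, not its exact definition.
def ewtm_sketch(scan_in_bits, captured_bits):
    return wtm(scan_in_bits) + wtm(captured_bits)

print(wtm([0, 1, 1, 0, 1]))   # transitions at positions 0-1, 2-3, 3-4
```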

Proceedings ArticleDOI
01 Oct 2011
TL;DR: A novel method to build reconfigurable architectures based on graph mining, which aims to extract the common subgraphs among different benchmarks; a tool flow is proposed to convert benchmarks to data flow graphs and extract the common subgraphs.
Abstract: In this paper, we present a novel method to build reconfigurable architectures. Because an RTL description of a circuit can be converted to a data flow graph (DFG), our method is based on graph mining which aims to extract the common subgraphs among different benchmarks. A tool flow is proposed to convert benchmarks to data flow graphs and extract the common subgraphs. Benchmarks in the field of Error Checking and Correcting (ECC) are selected in the experiment to demonstrate that our method is correct and practical.
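Full frequent-subgraph mining (e.g. gSpan) is beyond a short sketch, so the toy below approximates the idea by collecting operator-label sequences along fixed-length paths of two data-flow graphs and intersecting them; shared sequences hint at common subgraphs. The DFG encoding and path length are illustrative assumptions, not the paper's tool flow.

```python
def labeled_paths(dfg, length=3):
    """Collect operator-label sequences along directed paths of a fixed length
    in a data-flow graph; sequences shared by two DFGs hint at common subgraphs.
    (A toy stand-in for proper frequent-subgraph mining such as gSpan.)"""
    def walk(node, path):
        path = path + (dfg[node][0],)
        if len(path) == length:
            yield path
            return
        for succ in dfg[node][1]:
            yield from walk(succ, path)
    pats = set()
    for n in dfg:
        pats.update(walk(n, ()))
    return pats

# Two tiny DFGs: node -> (operator, successors). Both contain XOR -> AND -> XOR.
dfg_a = {"n1": ("XOR", ["n2"]), "n2": ("AND", ["n3"]), "n3": ("XOR", [])}
dfg_b = {"m1": ("XOR", ["m2"]), "m2": ("AND", ["m3", "m4"]),
         "m3": ("XOR", []), "m4": ("OR", [])}
print(labeled_paths(dfg_a) & labeled_paths(dfg_b))
```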

Journal ArticleDOI
TL;DR: A modular design methodology and scalable design-for-testability (DFT) structure are used to achieve low test power; at the same time, an improved test pattern generation method is studied to reduce test power further.
Abstract: This paper describes the low-power test challenges and features of a multi-core processor, Godson-T, which contains 16 identical cores. As silicon design technology scales to ultra-deep submicron and even nanometer dimensions, the complexity and cost of testing grow, and the test power of such designs becomes a critical concern, especially for multicore processors. In this paper, we use a modular design methodology and a scalable design-for-testability (DFT) structure to achieve low test power; at the same time, an improved test pattern generation method is studied to reduce test power further. The experimental results from the real chip show that the test power and test time are well balanced while achieving acceptable test coverage and cost.
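The abstract does not detail the improved test pattern generation method, but a common way to cut scan shift power during pattern generation is X-filling; the sketch below shows adjacent fill, offered purely as background context rather than as the Godson-T flow.

```python
def adjacent_fill(test_cube):
    """Fill don't-care bits ('X') with the last specified value seen, so the
    shifted-in pattern has few transitions and therefore low scan shift power.
    Adjacent fill is a common low-power X-filling heuristic; the abstract does
    not say it is the method used for Godson-T, so treat this as context only."""
    filled, last = [], '0'
    for bit in test_cube:
        if bit != 'X':
            last = bit
        filled.append(last)
    return ''.join(filled)

print(adjacent_fill("1XX0XXX1X"))   # -> "111000011"
```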