
Showing papers by "Yu Hu" published in 2011


Proceedings ArticleDOI
05 Sep 2011
TL;DR: This work shows that a large portion (40%-60% for the circuits in the authors' experiments) of the total used LUT configuration bits are don't care bits, and proposes to decide the logic values of don't care bits such that soft errors are reduced.
Abstract: SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to Single Event Upsets (SEUs). We show that a large portion (40%-60% for the circuits in our experiments) of the total used LUT configuration bits are don't care bits, and propose to decide the logic values of the don't care bits such that soft errors are reduced. Our approaches are efficient and do not change LUT-level placement and routing. Therefore, they are suitable for design closure. For the ten largest combinational MCNC benchmark circuits mapped to 6-LUTs, our approaches obtain 20% chip-level Mean Time To Failure (MTTF) improvements compared to the baseline mapped by the Berkeley ABC mapper. They obtain 3× larger chip-level MTTF improvements and are 128× faster when compared to the existing best in-place IPD algorithm.
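One way to see why so many LUT configuration bits end up as don't cares is to exhaustively simulate a LUT's fan-in cone and check which of its 2^k addresses can ever be selected; bits at unreachable addresses are satisfiability don't cares. The sketch below illustrates only that counting step under a toy netlist representation (the function names and data structures are assumptions, not the paper's tooling); the paper's actual assignment of don't-care values for SEU mitigation is not reproduced.

```python
from itertools import product

def reachable_addresses(cone_inputs, lut_pin_funcs):
    """Exhaustively simulate the fan-in cone of a LUT and collect the LUT
    addresses (input patterns) that can actually occur.  `lut_pin_funcs`
    maps each LUT pin to a Boolean function of the cone's primary inputs
    (a hypothetical representation, chosen only for this illustration)."""
    seen = set()
    for assignment in product([0, 1], repeat=len(cone_inputs)):
        env = dict(zip(cone_inputs, assignment))
        seen.add(tuple(f(env) for f in lut_pin_funcs))
    return seen

def dont_care_fraction(k, seen_addresses):
    # Configuration bits whose address can never be selected are don't cares.
    return 1.0 - len(seen_addresses) / float(2 ** k)

# Toy case: a 6-LUT whose six pins are all derived from just three primary
# inputs, so at most 8 of its 64 configuration bits are ever addressed.
cone = ["a", "b", "c"]
pins = [
    lambda e: e["a"],
    lambda e: e["b"],
    lambda e: e["c"],
    lambda e: e["a"] & e["b"],
    lambda e: e["b"] | e["c"],
    lambda e: e["a"] ^ e["c"],
]
print(f"don't-care bits: {dont_care_fraction(6, reachable_addresses(cone, pins)):.0%}")
```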

15 citations


Journal ArticleDOI
TL;DR: This paper is the first in-depth study on FPGA retiming for SET mitigation and increases mean-time-to-failure (MTTF) by 78% for variational SETs with a 10-min runtime limit while preserving the clock frequency on ISCAS89 benchmark circuits.
Abstract: For anti-fuse or flash-memory-based field-programmable gate arrays (FPGAs), single-event transient (SET)-induced faults are significantly more pronounced than single-event upsets (SEUs). While most existing work studies SEU, this paper proposes a retiming algorithm for mitigating variational SETs (i.e., SETs with different durations and strengths). Considering the reshaping effect of an SET pulse caused by broadening and attenuation during its propagation, SET-aware retiming (SaR) redistributes combinational paths via post layout retiming and minimizes the possibility that an SET pulse is latched. The SaR problem is formulated as an integer linear programming (ILP) problem and solved efficiently by a progressive ILP approach. In contrast to existing SET-mitigation techniques, the proposed SaR does not change the FPGA architecture or the layout of an FPGA application. Instead, it reconfigures the connection between a flip-flop and an LUT within a programmable logic block. Experimental results show that SaR increases mean-time-to-failure (MTTF) by 78% for variational SETs with a 10-min runtime limit while preserving the clock frequency on ISCAS89 benchmark circuits. To the best of our knowledge, this paper is the first in-depth study on FPGA retiming for SET mitigation.
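To make the retiming formulation concrete, the sketch below applies the classical Leiserson-Saxe legality constraint (the retimed register count on every edge must stay non-negative) to a toy netlist and brute-forces the integer lags instead of solving an ILP. The SET-latching cost is only an illustrative proxy, and the node names, delays, and weighting are assumptions rather than the paper's actual objective or its progressive ILP strategy.

```python
from itertools import product

# Tiny netlist as a register-weighted graph: edge (u, v, w) carries w flip-flops.
edges = [("a", "b", 1), ("b", "c", 0), ("c", "d", 1), ("a", "d", 0)]
nodes = ["a", "b", "c", "d"]
depth = {"a": 2, "b": 3, "c": 1, "d": 2}  # toy combinational delay of each node

def retimed_weight(w, r, u, v):
    # Leiserson-Saxe: registers left on edge (u, v) after retiming with lags r.
    return w + r[v] - r[u]

def legal(r):
    return all(retimed_weight(w, r, u, v) >= 0 for u, v, w in edges)

def set_latch_proxy(r):
    # Proxy cost: prefer placing registers behind long combinational paths,
    # where an SET pulse has more chance to attenuate before being latched.
    # (Assumption for illustration; not the paper's objective function.)
    return sum(retimed_weight(w, r, u, v) / (1 + depth[u]) for u, v, w in edges)

# Brute-force the small lag space instead of calling an ILP solver.
best = min(
    (dict(zip(nodes, lags))
     for lags in product(range(-1, 2), repeat=len(nodes))
     if legal(dict(zip(nodes, lags)))),
    key=set_latch_proxy,
)
print("chosen lags:", best, "proxy cost:", round(set_latch_proxy(best), 3))
```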

14 citations


Proceedings ArticleDOI
Xiaoyu Shi, Dahua Zeng, Yu Hu, Guohui Lin, Osmar R. Zaïane
14 Mar 2011
TL;DR: This paper presents an efficient algorithm to detect global topological similarity between two circuits and applies it in an incremental design flow, IDUCS, which adds only a plugin for circuit similarity detection and therefore preserves the "push-button" feature, significantly reducing the engineering complexity of incremental tasks.
Abstract: This paper presents an efficient algorithm to detect the global topological similarity between two circuits. By applying the proposed circuit similarity algorithm in an incremental design flow, IDUCS (incremental design using circuit similarity), the design and optimization effort in the previous design iterations is automatically captured and can be used to guide the next design iteration. IDUCS is able to identify the similarity between the original netlist and the modified one with aggressive resynthesis, which might destroy the naming and local structures of the original netlist. This is superior to the existing design preservation approaches such as naming and local topological matching. Furthermore, IDUCS simply inserts a plugin for circuit similarity detection, and therefore preserves the “push-button” feature, significantly simplifying the engineering complexity of incremental tasks. As a case study, we perform the proposed IDUCS process to generate the placement for a logically resynthesized netlist based on the placement of the original netlist and the circuit similarity between the original and the modified logic-level netlists. The experimental results show our IDUCS-based placement is 28X faster than versatile place and route (VPR) with comparable wire length and estimated critical delay.
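The abstract does not spell out the similarity algorithm itself, so the sketch below uses a generic Weisfeiler-Lehman-style neighbourhood hashing (an assumption, not IDUCS) to show how gates can be matched between an original and a renamed, resynthesized netlist purely from global topology rather than from names.

```python
import hashlib
from collections import defaultdict

def wl_signatures(netlist, rounds=3):
    """Weisfeiler-Lehman-style refinement: each gate's signature is iteratively
    rehashed from its own type and its fan-in gates' signatures.
    `netlist` maps gate name -> (gate type, list of fan-in gate names)."""
    sig = {g: netlist[g][0] for g in netlist}
    for _ in range(rounds):
        new = {}
        for g, (gtype, fanins) in netlist.items():
            neigh = sorted(sig[f] for f in fanins if f in sig)
            new[g] = hashlib.sha1((gtype + "|" + ",".join(neigh)).encode()).hexdigest()[:12]
        sig = new
    return sig

def match_by_signature(net_a, net_b):
    """Pair up gates whose topological signatures coincide in both netlists."""
    index = defaultdict(list)
    for g, s in wl_signatures(net_b).items():
        index[s].append(g)
    return {g: index[s] for g, s in wl_signatures(net_a).items() if s in index}

# Toy use: the resynthesized netlist renamed every gate, but the topology survives.
original = {"g1": ("AND", []), "g2": ("OR", []), "g3": ("XOR", ["g1", "g2"])}
modified = {"u7": ("AND", []), "u8": ("OR", []), "u9": ("XOR", ["u7", "u8"])}
print(match_by_signature(original, modified))
```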

11 citations


Proceedings ArticleDOI
05 Sep 2011
TL;DR: This paper proposes an FPGA-based framework for massive-scale grid-based multi-agent simulation and achieves a speedup of 290x with two million agents, compared to the C implementation.
Abstract: Multi-agent simulation (MAS) is a widely used paradigm for modeling and simulating real-world complex systems, ranging from ant colony foraging to online trading. The performance of existing MAS software, however, suffers when simulating massive-scale multi-agent systems on traditional serial processors. In this paper, we propose an FPGA-based framework for massive-scale grid-based MAS. Memory interleaving, parallel task partitioning, and a computing pipeline are adopted to improve system throughput. A classical MAS benchmark, Conway's Game of Life, is used as a case study to illustrate how to map grid-based models to our MAS framework. We implemented it on a Xilinx Virtex-5 FPGA board and achieved a speedup of 290x with two million agents, compared to the C implementation.
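As a point of reference for the case study, here is a plain software version of one synchronous update of Conway's Game of Life, the grid-based model the FPGA framework accelerates; the toroidal wrap-around and the Python form are illustrative choices, and none of the FPGA-side memory interleaving or pipelining is shown.

```python
def life_step(grid):
    """One synchronous update of Conway's Game of Life; every cell is a
    grid-based agent whose next state depends only on its 8 neighbours."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt

# A glider on a 5x5 toroidal grid, advanced one generation.
g = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
print(life_step(g))
```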

9 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: This work proposes a cross-layer optimized placement and routing algorithm to reduce the soft error rate by incorporating the application-level and physical-level factors together, and shows that it can reduce the SER by 14% with no area or performance overhead.
Abstract: As FPGA feature sizes shrink to nanometers, soft errors increasingly become an important concern for SRAM-based FPGAs. Without considering the application-level impact, existing reliability-oriented placement and routing approaches analyze the soft error rate (SER) only at the physical level, consequently completing the design with suboptimal soft error mitigation. Our analysis shows that the statistical variation of the application-level factor is significant. Hence, in this work, we first propose a cube-based analysis to efficiently and accurately evaluate the application-level factor. We then propose a cross-layer optimized placement and routing algorithm to reduce the SER by incorporating the application-level and physical-level factors together. Experimental results show that the average difference of the application-level factor between our cube-based method and Monte Carlo golden simulation is less than 0.01. Moreover, compared with the baseline VPR placement and routing technique, the cross-layer optimized placement and routing algorithm can reduce the SER by 14% with no area or performance overhead.
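The cube-based analysis itself is not reproduced here, but the Monte Carlo "golden" estimate it is compared against can be sketched directly: the application-level factor of a node is approximated as the probability that flipping that node's value on a random input vector changes a primary output. The callback interface and the toy circuit below are assumptions made only for illustration.

```python
import random

def app_level_factor(eval_outputs, n_inputs, trials=10000, seed=1):
    """Monte Carlo estimate of the application-level (logic masking) factor:
    the probability that an error at one circuit node propagates to an output.
    `eval_outputs(vector, flipped)` must return the circuit's output tuple,
    optionally with the chosen node's value inverted (assumed interface)."""
    rng = random.Random(seed)
    propagated = sum(
        eval_outputs(v, flipped=False) != eval_outputs(v, flipped=True)
        for v in ([rng.randint(0, 1) for _ in range(n_inputs)] for _ in range(trials))
    )
    return propagated / trials

# Toy circuit out = (a AND b) OR c, with the error injected on the AND node:
# the error is masked whenever c = 1, so the factor comes out near 0.5.
def toy(v, flipped):
    a, b, c = v
    and_node = (a & b) ^ (1 if flipped else 0)
    return (and_node | c,)

print(f"application-level factor ~ {app_level_factor(toy, 3):.3f}")
```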

7 citations


Patent
16 Nov 2011
TL;DR: In this paper, the authors propose a fault diagnosis system for combinational logic faults that can diagnose faults exhibiting a plurality of arbitrary fault models, without any area or wiring cost, without loading new diagnosis vectors, and without changing the traditional combinational logic fault diagnosis flow.
Abstract: The invention relates to a fault diagnosis system and a fault diagnosis method. The fault diagnosis method is used for diagnosing fault positions in a digital integrated circuit and comprises the following steps: step 1: for each failing vector, establishing a fault tuple equivalent tree capable of explaining that failing vector; step 2: marking the latent faults in the fault tuple equivalent trees; step 3: according to the marking results of the latent faults in the fault tuple equivalent trees, selecting the most probable fault occurrence position from the latent faults and adding it to the final candidate fault position set; and step 4: deleting from the fault tuple equivalent trees the fault tuples that are equivalent to, or can be explained by, the faults in the final candidate fault position set. The system and method can diagnose combinational logic faults exhibiting a plurality of arbitrary fault models without any area or wiring cost, without loading new diagnosis vectors, and without changing the traditional combinational logic fault diagnosis flow.
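Steps 3 and 4 amount to an iterative pick-and-prune over the failing vectors; the sketch below shows that loop as a simple greedy cover, with `explains` standing in for the information held in the fault tuple equivalent trees. This is an illustrative analogue of the claimed procedure, not the patented method itself.

```python
def greedy_candidates(failing_vectors, explains):
    """Simplified analogue of steps 3-4: repeatedly pick the suspect fault that
    explains the most still-unexplained failing vectors, then drop the vectors
    it explains.  `explains` maps fault -> set of failing vectors it can explain
    (a hypothetical precomputed structure standing in for the equivalent trees)."""
    remaining = set(failing_vectors)
    chosen = []
    while remaining:
        best = max(explains, key=lambda f: len(explains[f] & remaining), default=None)
        if best is None or not (explains[best] & remaining):
            break  # some observed failures cannot be explained by any suspect
        chosen.append(best)
        remaining -= explains[best]
    return chosen

explains = {"f1": {"v1", "v2"}, "f2": {"v2", "v3"}, "f3": {"v3"}}
print(greedy_candidates(["v1", "v2", "v3"], explains))
```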

6 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: A substantial-impact-filter based method to tolerate voltage emergencies, including a metric, the intermittent vulnerability factor for intermittent timing faults (IVFitf), to quantitatively estimate the vulnerability of microprocessor structures (load/store queue and register file) to voltage emergencies.
Abstract: Supply voltage fluctuation caused by inductive noise has become a critical problem in microprocessor design. A voltage emergency occurs when the supply voltage variation exceeds the acceptable voltage margin, jeopardizing microprocessor reliability. Existing techniques assume that all voltage emergencies lead to incorrect program execution and conservatively activate rollbacks or flushes to recover, and consequently incur high performance overhead. We observe that not all voltage emergencies result in externally visible errors, which can be exploited to avoid unnecessary protection. In this paper, we propose a substantial-impact-filter based method to tolerate voltage emergencies, comprising three key techniques: 1) analyze the architecture-level masking of voltage emergencies during program execution; 2) propose a metric, the intermittent vulnerability factor for intermittent timing faults (IVFitf), to quantitatively estimate the vulnerability of microprocessor structures (load/store queue and register file) to voltage emergencies; 3) propose a substantial-impact-filter based method to handle voltage emergencies. Experimental results demonstrate that our approach gains back nearly 57% of the performance loss compared with the once-occur-then-rollback approach.
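The abstract does not define IVFitf precisely, so the sketch below computes an AVF-style occupancy ratio as a plausible stand-in: the fraction of entry-cycles in which a structure holds state whose corruption by a voltage emergency would become architecturally visible. The interval trace and entry count are made-up illustration data, not measurements from the paper.

```python
def ivf(vulnerable_intervals, num_entries, total_cycles):
    """AVF-style estimate (an assumption, not necessarily the paper's exact
    IVFitf definition): the fraction of entry-cycles during which a structure
    (e.g. load/store queue or register file) holds state whose corruption by a
    voltage emergency would become architecturally visible."""
    vulnerable = sum(end - start for start, end in vulnerable_intervals)
    return vulnerable / (num_entries * total_cycles)

# Toy register file with 4 entries traced over 1000 cycles; each interval is
# a (start, end) window in which some entry held a value still to be read.
intervals = [(0, 120), (300, 450), (450, 700), (900, 1000)]
print(f"IVF ~ {ivf(intervals, 4, 1000):.3f}")
```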

6 citations


Book
20 Jun 2011
TL;DR: This paper surveys the electrical and layout perspectives of SiP: it first introduces package technologies, and then presents the SiP design flow and design exploration.
Abstract: The unquenched thirst for higher levels of electronic system integration and higher performance goals has produced a plethora of design and business challenges that threaten the success enjoyed so far as modeled by Moore's law. To tackle these challenges and meet the design needs of consumer electronics products such as cell phones, audio/video players, and digital cameras, which are composed of a number of different technologies, vertical system integration has emerged as a required technology to reduce the system board space and height in addition to the overall time-to-market and design cost. System-in-package (SiP) is a system integration technology that achieves the aforementioned needs in a scalable and cost-effective way, where multiple dies, passive components, and discrete devices are assembled, often vertically, in a package. This paper surveys the electrical and layout perspectives of SiP. It first introduces package technologies, and then presents the SiP design flow and design exploration. Finally, the paper discusses details of beyond-die signal and power integrity and physical implementation such as I/O (input/output cell) placement and routing for the redistribution layer, escape, and substrate.

5 citations


Proceedings ArticleDOI
20 Nov 2011
TL;DR: The proposed technique replaces not-fully-occupied LUTs with corresponding functional equivalent classes, which improves reliability while preserving the functionality of the design.
Abstract: As FPGA feature sizes shrink to nanometers, SRAM-based FPGAs become more vulnerable to soft errors. During logic synthesis, the reliability of the design can be improved by introducing logic masking effects. In this work, we observe that there are many not-fully-occupied look-up tables (LUTs) after logic synthesis. Hence, we propose a functional-equivalent-class based soft error mitigation scheme to exploit free LUT entries in the circuit. The proposed technique replaces not-fully-occupied LUTs with corresponding functional equivalent classes, which improves reliability while preserving the functionality of the design. Experimental results show that, compared with the baseline ABC mapper, the proposed technique can reduce the soft error rate by 21%, with only a 4.25% increase in critical-path delay.
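To illustrate what filling free LUT entries with a functionally equivalent alternative can buy, the sketch below brute-forces all fillings of the unused half of a small 4-LUT (its fourth pin is tied off) and picks the one that masks the most single-input errors. The 4-input size, the single-pin error model, and the uniform weighting are simplifications made for illustration, not the paper's selection procedure.

```python
from itertools import product

def masked_events(table, k):
    """Count (input pattern, single-pin error) pairs for which an erroneous
    value on one LUT input pin does not change the LUT output (logic masking)."""
    return sum(
        table[addr] == table[addr ^ (1 << pin)]
        for addr in range(2 ** k) for pin in range(k)
    )

# A 4-LUT implementing f(a, b, c) = (a AND b) OR c on pins 0-2; pin 3 is tied
# to 0, so the 8 entries with pin 3 = 1 are free, and any filling of them keeps
# the implemented function unchanged (a functionally equivalent alternative).
k = 4
used_half = [((addr & 1) & ((addr >> 1) & 1)) | ((addr >> 2) & 1) for addr in range(8)]
best = max(
    (used_half + list(fill) for fill in product([0, 1], repeat=8)),
    key=lambda table: masked_events(table, k),
)
print("chosen free-half filling:", best[8:])
print("masked single-pin errors:", masked_events(best, k), "of", 2 ** k * k)
```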

5 citations


Proceedings ArticleDOI
14 Mar 2011
TL;DR: This work evaluates the likelihood that a suspect fault is the actual fault using both the new metric and the traditional metric, explanation capability, and shows that 98.8% of the top-ranked suspect faults hit the actual faults.
Abstract: With the exponential growth in the number of transistors, not only may test data volume and test application time increase, but multiple faults may also exist in one chip. Test compaction has been a de facto design-for-testability technique to reduce the test cost. However, the compacted test responses make multiple-fault diagnosis rather difficult. When there is no space compactor, the most likely suspect fault is considered to be the one producing failing responses most similar to the failing responses observed from the automatic test equipment. But when a compactor exists, those suspect faults may no longer have the same high likelihood of being the actual faults. To address this problem, we introduce a novel metric, explanation necessity. By using both the new metric and the traditional metric, explanation capability, we evaluate the likelihood that a suspect fault is the actual fault. For ISCAS'89 and ITC'99 benchmark circuits equipped with extreme space compactors, experimental results show that 98.8% of the top-ranked suspect faults hit the actual faults, outperforming a previous work by 11.3%.
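The sketch below captures only the traditional side of the story: suspects ranked by how well their simulated (compacted) failing responses explain the observed ones, with an arbitrary penalty for over-prediction. The new "explanation necessity" metric is not defined in the abstract and is deliberately not reproduced; all names and weights here are illustrative assumptions.

```python
def rank_suspects(observed_fail, simulated_fail):
    """Rank suspect faults by the traditional 'explanation capability' idea:
    how many observed failing (compacted) responses a fault's simulated
    responses reproduce.  The paper's 'explanation necessity' metric is not
    reproduced here."""
    scores = {}
    for fault, sim in simulated_fail.items():
        explained = len(observed_fail & sim)
        extra = len(sim - observed_fail)            # mismatches count against a suspect
        scores[fault] = explained - 0.5 * extra     # weighting is an illustrative choice
    return sorted(scores, key=scores.get, reverse=True)

observed = {"pat3", "pat7", "pat9"}
simulated = {"sa0@n12": {"pat3", "pat7"}, "sa1@n40": {"pat3", "pat7", "pat9", "pat11"}}
print(rank_suspects(observed, simulated))
```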

4 citations


Journal ArticleDOI
TL;DR: In this paper, a permanent fault recovery approach using a domain partition model is proposed to improve the reliability of FPGA-based reconfigurable systems; it increases MTTF by up to 18.87%.
Abstract: Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems due to their reconfiguration ability. However, with shrinking device feature sizes and increasing die area, today's FPGAs can be deeply affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, a permanent fault recovery approach using a domain partition model is proposed in this paper. In the proposed approach, the fault-tolerant FPGA recovers from faults by reloading a proper configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance reliability, a technical procedure is also presented in which the set of vectors is heuristically filtered so that the corresponding small overlaps can be merged into big ones. Experimental results demonstrate the effectiveness of the proposed approach through its application to several benchmark circuits. Compared with previous approaches, the proposed approach increases MTTF by up to 18.87%.
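The reliability gain of keeping several overlapping alternative configurations can be illustrated with a small Monte Carlo MTTF estimate: permanent faults arrive over time, and the system survives as long as at least one stored configuration avoids every faulty resource. The exponential fault model, resource counts, and configuration sets below are illustrative assumptions, not the paper's domain partition model.

```python
import random

def mttf_with_alternatives(configs, n_resources, fault_rate, trials=2000, seed=7):
    """Monte Carlo MTTF estimate for a reconfigurable FPGA holding several
    alternative configurations: after each permanent fault, the system reloads
    any stored configuration that avoids every faulty resource, and fails when
    none remains.  `configs` lists, per configuration, the set of resources it
    uses; the exponential fault-arrival model is an illustrative assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t, faulty = 0.0, set()
        while any(not (c & faulty) for c in configs):
            t += rng.expovariate(fault_rate * n_resources)   # next permanent fault
            faulty.add(rng.randrange(n_resources))
        total += t
    return total / trials

# Two alternative placements of the same design on 20 logic resources.
configs = [set(range(0, 12)), set(range(8, 20))]
print("estimated MTTF:", round(mttf_with_alternatives(configs, 20, 0.01), 1))
```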

Journal ArticleDOI
TL;DR: This paper proposes a capture-power-aware test compression scheme that is able to keep capture power under a safe limit with low test compression ratio loss; experimental results on benchmark circuits validate the effectiveness of the proposed solution.

Proceedings ArticleDOI
27 Jun 2011
TL;DR: This paper presents a transparent dynamic binding (TDB) mechanism that reduces the global master-slave consistency maintenance to the scale of the private caches and satisfies the objective of private cache consistency, therefore providing excellent scalability and flexibility.
Abstract: Aggressive technology scaling makes chip multiprocessors increasingly error-prone. Core-level fault-tolerant approaches bind two cores to implement redundant execution and error detection. However, as more cores are integrated into one chip, existing static and dynamic binding schemes suffer from scalability problems when the violation effects caused by external write operations are considered. In this paper, we present a transparent dynamic binding (TDB) mechanism to address this issue. Borrowing from static binding schemes, we use the private caches to hold identical data blocks, thereby reducing the global master-slave consistency maintenance to the scale of the private caches. With our fault-tolerant cache coherence protocol, TDB satisfies the objective of private cache consistency and therefore provides excellent scalability and flexibility. Experimental results show that, for a set of parallel workloads, the overall performance of our TDB scheme is very close to that of baseline fault-tolerant systems, outperforming dynamic core coupling by 9.2%, 10.4%, 18%, and 37.1% when considering 4, 8, 16, and 32 cores, respectively.

Journal ArticleDOI
TL;DR: Experimental results confirm that the proposed approach can significantly reduce scan shift power with low wire length overhead; the wire length overhead can be further reduced by the proposed distance of EWTM (DEWTM) metric.
Abstract: Test power of VLSI systems has become a challenging issue. The scan shift power dominates the average test power and restricts the clock frequency of the shift phase, leading to excessive thermal accumulation and long test time. This paper proposes a scan chain design technique to solve these problems. Based on the weighted transition metric (WTM), the proposed extended WTM (EWTM), which guides the scan chain design algorithm, estimates the scan shift power in both the shift-in and shift-out phases. Moreover, the wire length overhead of the proposed scan chain design can be reduced by the proposed distance of EWTM (DEWTM) metric. Experimental results confirm that the proposed approach can significantly reduce scan shift power with low wire length overhead.
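The weighted transition metric the paper builds on has a simple form: each transition between adjacent scan bits is weighted by how far it still travels along the chain. The sketch below implements that formula; the `ewtm_sketch` helper merely hints at covering shift-out as well and is an assumption, not the paper's exact EWTM or DEWTM definitions.

```python
def wtm(scan_bits):
    """Weighted transition metric for one scan-in vector: each transition
    between adjacent bits is weighted by how far it still has to travel
    through the scan chain, so early transitions cost more shift power."""
    L = len(scan_bits)
    return sum((scan_bits[i] ^ scan_bits[i + 1]) * (L - 1 - i) for i in range(L - 1))

# The paper's EWTM additionally accounts for the shift-out of captured
# responses; summing the two WTM terms is shown here only as a plausible
# approximation of that idea, not its exact definition.
def ewtm_sketch(scan_in_bits, captured_bits):
    return wtm(scan_in_bits) + wtm(captured_bits)

print(wtm([0, 1, 1, 0, 1]))   # transitions at positions 0-1, 2-3, 3-4
```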

Proceedings ArticleDOI
01 Oct 2011
TL;DR: A novel method to build reconfigurable architectures based on graph mining, which aims to extract the common subgraphs among different benchmarks; a tool flow is proposed to convert benchmarks to data flow graphs and extract the common subgraphs.
Abstract: In this paper, we present a novel method to build reconfigurable architectures. Because an RTL description of a circuit can be converted to a data flow graph (DFG), our method is based on graph mining which aims to extract the common subgraphs among different benchmarks. A tool flow is proposed to convert benchmarks to data flow graphs and extract the common subgraphs. Benchmarks in the field of Error Checking and Correcting (ECC) are selected in the experiment to demonstrate that our method is correct and practical.
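Full frequent-subgraph mining (e.g. gSpan) is beyond a short sketch, so the toy below approximates the idea by collecting operator-label sequences along fixed-length paths of two data-flow graphs and intersecting them; shared sequences hint at common subgraphs. The DFG encoding and path length are illustrative assumptions, not the paper's tool flow.

```python
def labeled_paths(dfg, length=3):
    """Collect operator-label sequences along directed paths of a fixed length
    in a data-flow graph; sequences shared by two DFGs hint at common subgraphs.
    (A toy stand-in for proper frequent-subgraph mining such as gSpan.)"""
    def walk(node, path):
        path = path + (dfg[node][0],)
        if len(path) == length:
            yield path
            return
        for succ in dfg[node][1]:
            yield from walk(succ, path)
    pats = set()
    for n in dfg:
        pats.update(walk(n, ()))
    return pats

# Two tiny DFGs: node -> (operator, successors). Both contain XOR -> AND -> XOR.
dfg_a = {"n1": ("XOR", ["n2"]), "n2": ("AND", ["n3"]), "n3": ("XOR", [])}
dfg_b = {"m1": ("XOR", ["m2"]), "m2": ("AND", ["m3", "m4"]),
         "m3": ("XOR", []), "m4": ("OR", [])}
print(labeled_paths(dfg_a) & labeled_paths(dfg_b))
```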

Journal ArticleDOI
TL;DR: A modular design methodology and scalable design-for-testability (DFT) structure are used to achieve low test power; at the same time, an improved test pattern generation method is studied to reduce test power further.
Abstract: This paper describes the low-power test challenges and features of a multi-core processor, Godson-T, which contains 16 identical cores. As silicon design technology scales to ultra-deep submicron and even nanometer dimensions, the complexity and cost of testing grow, and the test power of such designs becomes a critical concern, especially for multicore processors. In this paper, we use a modular design methodology and a scalable design-for-testability (DFT) structure to achieve low test power; at the same time, an improved test pattern generation method is studied to reduce test power further. The experimental results from the real chip show that the test power and test time are well balanced while achieving acceptable test coverage and cost.
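The abstract does not detail the improved test pattern generation method, but a common way to cut scan shift power during pattern generation is X-filling; the sketch below shows adjacent fill, offered purely as background context rather than as the Godson-T flow.

```python
def adjacent_fill(test_cube):
    """Fill don't-care bits ('X') with the last specified value seen, so the
    shifted-in pattern has few transitions and therefore low scan shift power.
    Adjacent fill is a common low-power X-filling heuristic; the abstract does
    not say it is the method used for Godson-T, so treat this as context only."""
    filled, last = [], '0'
    for bit in test_cube:
        if bit != 'X':
            last = bit
        filled.append(last)
    return ''.join(filled)

print(adjacent_fill("1XX0XXX1X"))   # -> "111000011"
```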