Author

Timothy Tsai

Other affiliations: Alcatel-Lucent, Bell Labs, Sun Microsystems
Bio: Timothy Tsai is an academic researcher from Nvidia. The author has contributed to research in topics: Fault injection & Fault tolerance. The author has an h-index of 24 and has co-authored 49 publications receiving 2,632 citations. Previous affiliations of Timothy Tsai include Alcatel-Lucent & Bell Labs.

Papers
Journal ArticleDOI
TL;DR: This work uses hardware methods to evaluate low-level error detection and masking mechanisms, and software methods to test higher-level mechanisms, as complementary ways of evaluating the dependability of computer systems.
Abstract: Fault injection is important for evaluating the dependability of computer systems. Researchers and engineers have created many novel methods to inject faults, which can be implemented in both hardware and software. The contrast between the hardware and software methods lies mainly in the fault injection points they can access, the cost, and the level of perturbation. Hardware methods can inject faults into chip pins and internal components, such as combinational circuits and registers that are not software-addressable. On the other hand, software methods are convenient for directly producing changes at the software-state level. Thus, we use hardware methods to evaluate low-level error detection and masking mechanisms, and software methods to test higher-level mechanisms. Software methods are less expensive, but they also incur a higher perturbation overhead because they execute software on the target system.
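The software end of this trade-off is easy to illustrate. Below is a minimal sketch of software-implemented fault injection, not taken from the paper: a fault-free golden run is compared against a run in which one randomly chosen bit of the workload's data is flipped, which is enough to classify the outcome as masked or as silent data corruption. The workload and all names are invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define N 1024

/* Simple workload: find the maximum element of a buffer. Many bit flips in
 * non-maximal elements do not change the result, i.e. they are masked. */
static uint32_t max_of(const uint32_t *buf, size_t n) {
    uint32_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] > m) m = buf[i];
    return m;
}

int main(void) {
    uint32_t data[N];
    for (int i = 0; i < N; i++) data[i] = (uint32_t)i;

    uint32_t golden = max_of(data, N);        /* fault-free reference run */

    srand((unsigned)time(NULL));
    size_t word = (size_t)rand() % N;         /* random injection point ... */
    int bit = rand() % 32;
    data[word] ^= 1u << bit;                  /* ... single-bit flip, SWIFI-style */

    uint32_t faulty = max_of(data, N);
    printf("flipped bit %d of word %zu: %s\n", bit, word,
           faulty == golden ? "fault was masked" : "silent data corruption");
    return 0;
}
```

A real campaign would repeat this over many injection points and also detect crashes and hangs, not just wrong results.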

876 citations

Proceedings ArticleDOI
12 Nov 2017
TL;DR: It is found that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design, and two efficient protection techniques are proposed.
Abstract: Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems. Traditional methods for building resilient systems, e.g., Triple Modular Redundancy (TMR), are agnostic of the DNN algorithm and the DNN accelerator's architecture. Hence, these traditional resilience approaches incur high overheads, which makes them challenging to deploy. In this paper, we experimentally evaluate the resilience characteristics of DNN systems (i.e., DNN software running on specialized accelerators). We find that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design. Based on our observations, we propose two efficient protection techniques for DNN systems.
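A toy illustration of the value- and data-type dependence the paper reports, in plain C rather than on an accelerator: flipping a low mantissa bit of one float32 weight barely perturbs a neuron's dot product, while flipping a high exponent bit can change it by orders of magnitude. The weights, inputs, and bit positions below are made up.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* One "neuron": a small dot product of weights and inputs. */
static float dot(const float *w, const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) acc += w[i] * x[i];
    return acc;
}

/* Flip one bit of a float32 value by reinterpreting its bits. */
static float flip_bit(float v, int bit) {
    uint32_t u;
    memcpy(&u, &v, sizeof u);
    u ^= 1u << bit;
    memcpy(&v, &u, sizeof v);
    return v;
}

int main(void) {
    float w[4] = {0.12f, -0.53f, 0.80f, 0.31f};
    float x[4] = {1.0f, 2.0f, -1.0f, 0.5f};
    float golden = dot(w, x, 4);

    int bits[] = {0, 10, 23, 30};   /* low mantissa ... high exponent */
    for (int i = 0; i < 4; i++) {
        float w_faulty[4];
        memcpy(w_faulty, w, sizeof w);
        w_faulty[0] = flip_bit(w[0], bits[i]);
        printf("bit %2d flipped: golden=%g faulty=%g\n",
               bits[i], golden, dot(w_faulty, x, 4));
    }
    return 0;
}
```

Roughly speaking, this is also why data type matters: a narrow fixed-point weight has no exponent bits, so a single flip can only move the value by a bounded amount.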

414 citations

Proceedings Article
18 Jun 2000
TL;DR: Two new methods to detect and handle buffer overflow vulnerabilities in process stacks are presented that work with any existing pre-compiled executable and can be used transparently per-process as well as on a system-wide basis.
Abstract: The exploitation of buffer overflow vulnerabilities in process stacks constitutes a significant portion of security attacks. We present two new methods to detect and handle such attacks. In contrast to previous work, the new methods work with any existing pre-compiled executable and can be used transparently per-process as well as on a system-wide basis. The first method intercepts all calls to library functions known to be vulnerable. A substitute version of the corresponding function implements the original functionality, but in a manner that ensures that any buffer overflows are contained within the current stack frame. The second method uses binary modification of the process memory to force verification of critical elements of stacks before use. We have implemented both methods on Linux as dynamically loadable libraries and shown that both libraries detect several known attacks. The performance overhead of these libraries ranges from negligible to 15%.
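The first method (intercepting vulnerable library calls) is commonly realized as an LD_PRELOAD interposer. The sketch below is not the paper's implementation: it assumes Linux, glibc, and frame pointers, and uses a single crude bound (the caller's frame pointer), where a more faithful interposer would walk the frame-pointer chain to find the frame that actually contains the destination buffer. Build it as a shared object (e.g., gcc -shared -fPIC -o wrap.so wrap.c -ldl) and load it with LD_PRELOAD.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Interposed strcpy: the call is forwarded to the real strcpy only if the
 * copy cannot run past the caller's frame pointer, i.e. the overflow stays
 * contained within the current stack frame. */
char *strcpy(char *dest, const char *src) {
    static char *(*real_strcpy)(char *, const char *);
    if (!real_strcpy)
        real_strcpy = (char *(*)(char *, const char *))dlsym(RTLD_NEXT, "strcpy");

    char *frame = (char *)__builtin_frame_address(1);  /* caller's frame (heuristic) */
    if (dest < frame) {                      /* dest looks like a stack buffer */
        size_t room = (size_t)(frame - dest);
        if (strlen(src) >= room) {           /* would clobber saved FP / return address */
            fprintf(stderr, "strcpy: contained stack overflow, aborting\n");
            abort();
        }
    }
    return real_strcpy(dest, src);
}
```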

373 citations

Proceedings ArticleDOI
Siva Kumar Sastry Hari, Timothy Tsai, Mark Stephenson, Stephen W. Keckler, Joel Emer
24 Apr 2017
TL;DR: This paper presents an error injection-based methodology and tool called SASSIFI to study the soft error resilience of massively parallel applications running on state-of-the-art NVIDIA GPUs.
Abstract: As GPUs become more pervasive in both scalable high-performance computing systems and safety-critical embedded systems, evaluating and analyzing their resilience to soft errors caused by high-energy particle strikes will grow increasingly important. GPU designers must develop tools and techniques to understand the effect of these soft errors on applications. This paper presents an error injection-based methodology and tool called SASSIFI to study the soft error resilience of massively parallel applications running on state-of-the-art NVIDIA GPUs. Our approach uses a low-level assembly-language instrumentation tool called SASSI to profile and inject errors. SASSI provides efficiency by allowing instrumentation code to execute entirely on the GPU and provides the ability to inject into different architecture-visible state. For example, SASSIFI can inject errors in general-purpose registers, GPU memory, condition code registers, and predicate registers. SASSIFI can also inject errors into addresses and register indices. In this paper, we describe the SASSIFI tool, its capabilities, and present experiments to illustrate some of the analyses SASSIFI can be used to perform.
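SASSIFI itself instruments SASS instructions on the GPU via SASSI; the sketch below only mirrors the overall profile-then-inject flow on an invented instruction trace in plain C, so the trace, types, and selection policy are illustrative rather than the tool's.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    const char *opcode;   /* e.g. integer add, floating-point multiply */
    uint32_t dest_value;  /* value written to the destination register */
} DynInst;

int main(void) {
    DynInst trace[] = {
        {"IADD", 7}, {"FMUL", 0x3f800000}, {"IADD", 42},
        {"LDG",  100}, {"FMUL", 0x40490fdb}, {"IADD", 13},
    };
    size_t total = sizeof trace / sizeof trace[0];

    /* Pass 1: profiling - count dynamic instructions (per opcode group if
     * desired) so that injections can be distributed uniformly over them. */
    printf("profiled %zu dynamic instructions\n", total);

    /* Pass 2: injection - pick one dynamic instruction uniformly at random
     * and flip one bit in its destination register value. */
    srand((unsigned)time(NULL));
    size_t victim = (size_t)rand() % total;
    int bit = rand() % 32;
    uint32_t before = trace[victim].dest_value;
    trace[victim].dest_value ^= 1u << bit;

    printf("injected into %s (inst %zu): 0x%08x -> 0x%08x (bit %d)\n",
           trace[victim].opcode, victim, before, trace[victim].dest_value, bit);
    return 0;
}
```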

117 citations

Proceedings ArticleDOI
25 Jun 1996
TL;DR: The benchmark shows that Prototype B suffers fewer catastrophic incidents than Prototype A under the same workload conditions and fault injection method; however, Prototype B also suffers more performance degradation in the presence of faults, which might be an important concern for time-critical applications.
Abstract: This paper presents a benchmark for dependable systems. The benchmark consists of two metrics, the number of catastrophic incidents and performance degradation, which are obtained by a tool that (1) generates synthetic workloads that produce a high level of CPU, memory, and I/O activity and (2) injects CPU, memory, and I/O faults according to an injection strategy. The benchmark has been installed on two TMR-based prototype machines: TMR Prototype A and TMR Prototype B. An implementation for a third prototype, which is based on a duplex architecture, is in progress. The results demonstrate the utility of the benchmark in comparing the system-level fault tolerance of these machines and in providing insight into their design. In particular, the benchmark shows that Prototype B suffers fewer catastrophic incidents than Prototype A under the same workload conditions and fault injection method. However, Prototype B also suffers more performance degradation in the presence of faults, which might be an important concern for time-critical applications.
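A rough sketch of how the two metrics could be computed, not the paper's benchmark or fault model: a synthetic workload is run repeatedly with one injected fault per run, runs that end with a wrong result are counted as catastrophic incidents (a real benchmark would also catch crashes and hangs), and the extra run time relative to a fault-free run gives the performance degradation. On a real fault-tolerant machine the degradation comes from detection and recovery, which this toy does not model.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define RUNS 100
#define N    (1 << 18)

static uint32_t buf[N];

/* Synthetic workload: fill and checksum a buffer; optionally inject one
 * single-bit memory fault before the checksum pass. */
static uint64_t workload(int inject) {
    for (size_t i = 0; i < N; i++) buf[i] = (uint32_t)i * 2654435761u;
    if (inject)
        buf[(size_t)rand() % N] ^= 1u << (rand() % 32);
    uint64_t sum = 0;
    for (size_t i = 0; i < N; i++) sum += buf[i];
    return sum;
}

int main(void) {
    srand((unsigned)time(NULL));

    clock_t t0 = clock();
    uint64_t golden = workload(0);                 /* fault-free reference run */
    double base = (double)(clock() - t0);

    int catastrophic = 0;
    double faulty_time = 0.0;
    for (int r = 0; r < RUNS; r++) {
        clock_t t = clock();
        if (workload(1) != golden)                 /* wrong final result stands in */
            catastrophic++;                        /* for a catastrophic incident  */
        faulty_time += (double)(clock() - t);
    }

    printf("catastrophic incidents: %d out of %d runs\n", catastrophic, RUNS);
    if (base > 0.0)
        printf("performance degradation: %.1f%%\n",
               100.0 * (faulty_time / RUNS - base) / base);
    return 0;
}
```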

105 citations


Cited by
Proceedings Article
01 Jan 2005
TL;DR: TaintCheck, as presented in this paper, performs dynamic taint analysis by rewriting binaries at run time; it reliably detects most types of exploits and produced no false positives for any of the many different programs that were tested.
Abstract: Software vulnerabilities have had a devastating effect on the Internet. Worms such as CodeRed and Slammer can compromise hundreds of thousands of hosts within hours or even minutes, and cause millions of dollars of damage [26, 43]. To successfully combat these fast automatic Internet attacks, we need fast automatic attack detection and filtering mechanisms. In this paper we propose dynamic taint analysis for automatic detection of overwrite attacks, which include most types of exploits. This approach does not need source code or special compilation for the monitored program, and hence works on commodity software. To demonstrate this idea, we have implemented TaintCheck, a mechanism that can perform dynamic taint analysis by performing binary rewriting at run time. We show that TaintCheck reliably detects most types of exploits. We found that TaintCheck produced no false positives for any of the many different programs that we tested. Further, we describe how TaintCheck could improve automatic signature generation.
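TaintCheck does this bookkeeping by rewriting binaries at run time; the toy below hand-inlines the same idea in plain C: a shadow byte per memory byte marks untrusted data, the mark propagates on data movement, and using a marked value as control data raises an alert. The memory layout and function names are invented for illustration.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256
static uint8_t mem[MEM_SIZE];
static uint8_t shadow[MEM_SIZE];     /* 1 = tainted, 0 = clean */

/* Data arriving from an untrusted source (e.g. the network) is tainted. */
static void taint_input(size_t addr, const uint8_t *data, size_t len) {
    memcpy(&mem[addr], data, len);
    memset(&shadow[addr], 1, len);
}

/* Taint propagates whenever data is copied. */
static void copy(size_t dst, size_t src, size_t len) {
    memcpy(&mem[dst], &mem[src], len);
    memcpy(&shadow[dst], &shadow[src], len);
}

/* Check at a security-sensitive use: a tainted jump target means an
 * overwrite attack redirected control data. */
static void use_as_jump_target(size_t addr) {
    if (shadow[addr]) {
        printf("ALERT: tainted value used as control data at %zu\n", addr);
        return;
    }
    printf("jump target at %zu is clean\n", addr);
}

int main(void) {
    uint8_t packet[4] = {0xde, 0xad, 0xbe, 0xef};
    taint_input(0, packet, sizeof packet);   /* "network" bytes arrive */
    copy(64, 0, 4);                          /* attacker data overwrites a pointer */
    use_as_jump_target(64);                  /* detected: overwrite attack */
    return 0;
}
```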

1,557 citations

Proceedings ArticleDOI
07 Oct 2004
TL;DR: This work presents a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead and is transparent to users or application programmers.
Abstract: We present a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead. Dynamic information flow tracking protects programs against malicious software attacks by identifying spurious information flows from untrusted I/O and restricting the usage of the spurious information. Every security attack to take control of a program needs to transfer the program's control to malevolent code. In our approach, the operating system identifies a set of input channels as spurious, and the processor tracks all information flows from those inputs. A broad range of attacks are effectively defeated by checking the use of the spurious values as instructions and pointers. Our protection is transparent to users or application programmers; the executables can be used without any modification. Also, our scheme only incurs, on average, a memory overhead of 1.4% and a performance overhead of 1.1%.
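A toy software model of the mechanism, not the proposed hardware: each register value carries a one-bit tag, the tag is set when data enters from a channel the OS marked as spurious, propagates through ALU operations, and is checked when the value is used as a jump target. Register widths, addresses, and names are invented.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct { uint32_t value; int tag; } Reg;   /* tag = 1: from untrusted I/O */

/* Tags are set at the I/O boundary, based on which channel the OS marked. */
static Reg load_from_channel(uint32_t value, int spurious) {
    Reg r = { value, spurious };
    return r;
}

/* Tags propagate through ALU operations. */
static Reg alu_add(Reg a, Reg b) {
    Reg r = { a.value + b.value, a.tag | b.tag };
    return r;
}

/* The processor traps when a tagged value is used as control data. */
static void jump_indirect(Reg target) {
    if (target.tag) {
        printf("TRAP: spurious value 0x%08x used as jump target\n", target.value);
        return;
    }
    printf("jump to 0x%08x allowed\n", target.value);
}

int main(void) {
    Reg base   = load_from_channel(0x08048000u, 0);  /* trusted code address */
    Reg offset = load_from_channel(0x41414141u, 1);  /* attacker-controlled input */

    jump_indirect(base);                  /* clean: allowed */
    jump_indirect(alu_add(base, offset)); /* derived from spurious input: trapped */
    return 0;
}
```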

811 citations

Proceedings ArticleDOI
27 Oct 2003
TL;DR: A new, general approach for safeguarding systems against any type of code-injection attack creates process-specific randomized instruction sets for the system executing potentially vulnerable software; it can serve as a low-overhead protection mechanism and can easily complement other mechanisms.
Abstract: We describe a new, general approach for safeguarding systems against any type of code-injection attack. We apply Kerckhoffs' principle, by creating process-specific randomized instruction sets (e.g., machine instructions) of the system executing potentially vulnerable software. An attacker who does not know the key to the randomization algorithm will inject code that is invalid for that randomized processor, causing a runtime exception. To determine the difficulty of integrating support for the proposed mechanism in the operating system, we modified the Linux kernel, the GNU binutils tools, and the bochs-x86 emulator. Although the performance penalty is significant, our prototype demonstrates the feasibility of the approach, and should be directly usable on a suitably modified processor (e.g., the Transmeta Crusoe). Our approach is equally applicable against code-injecting attacks in scripting and interpreted languages, e.g., web-based SQL injection. We demonstrate this by modifying the Perl interpreter to permit randomized script execution. The performance penalty in this case is minimal. Where our proposed approach is feasible (i.e., in an emulated environment, in the presence of programmable or specialized hardware, or in interpreted languages), it can serve as a low-overhead protection mechanism, and can easily complement other mechanisms.
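The core idea can be shown with a toy byte-code interpreter, which is not the paper's x86 or Perl setting: legitimate code is XOR-encoded with a per-process key at load time and decoded at fetch, so injected code that was never encoded decodes to invalid opcodes and raises a runtime exception. Opcodes, key, and the VM below are invented.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

enum { OP_PUSH1 = 0x01, OP_ADD = 0x02, OP_PRINT = 0x03, OP_HALT = 0x04 };

/* Toy VM: every fetched byte is decoded with the per-process key. */
static void run(const uint8_t *text, size_t len, uint8_t key) {
    int stack[16], sp = 0;
    for (size_t pc = 0; pc < len; ) {
        uint8_t op = text[pc++] ^ key;            /* decode at fetch */
        switch (op) {
        case OP_PUSH1: stack[sp++] = text[pc++] ^ key; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
        case OP_HALT:  return;
        default:
            printf("runtime exception: invalid opcode 0x%02x at %zu\n", op, pc - 1);
            return;                               /* injected code dies here */
        }
    }
}

int main(void) {
    const uint8_t key = 0xA5;                     /* per-process random key */
    uint8_t program[] = { OP_PUSH1, 2, OP_PUSH1, 40, OP_ADD, OP_PRINT, OP_HALT };

    uint8_t text[32];
    for (size_t i = 0; i < sizeof program; i++)
        text[i] = program[i] ^ key;               /* loader encodes legitimate code */
    run(text, sizeof program, key);               /* prints 42 */

    uint8_t injected[] = { OP_PUSH1, 99, OP_PRINT, OP_HALT };  /* never encoded */
    memcpy(text, injected, sizeof injected);      /* attacker overwrites code */
    run(text, sizeof injected, key);              /* decodes to garbage: exception */
    return 0;
}
```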

779 citations

Proceedings Article
10 Jun 2002
TL;DR: This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.
Abstract: Cyclone is a safe dialect of C. It has been designed from the ground up to prevent the buffer overflows, format string attacks, and memory management errors that are common in C programs, while retaining C’s syntax and semantics. This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.
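Part of how Cyclone prevents buffer overflows is with pointers that carry bounds information and are checked at run time; the sketch below approximates that idea in plain C rather than Cyclone syntax, with an invented struct and helper, not Cyclone's actual runtime.

```c
#include <stdio.h>
#include <stdlib.h>

/* A "fat" pointer: the bounds travel with the pointer itself. */
typedef struct {
    char  *base;    /* start of the underlying buffer */
    size_t len;     /* number of valid elements */
} FatPtr;

/* Every access goes through a bounds check instead of raw indexing. */
static char fat_get(FatPtr p, size_t i) {
    if (i >= p.len) {
        fprintf(stderr, "bounds error: index %zu >= %zu\n", i, p.len);
        exit(1);
    }
    return p.base[i];
}

int main(void) {
    char buf[8] = "cyclone";
    FatPtr p = { buf, sizeof buf };

    printf("%c\n", fat_get(p, 0));   /* ok: 'c' */
    printf("%c\n", fat_get(p, 32));  /* caught instead of silently overflowing */
    return 0;
}
```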

777 citations