Author

Jeng-Hau Lin

Other affiliations: National Taiwan University
Bio: Jeng-Hau Lin is an academic researcher from the University of California, San Diego. The author has contributed to research in topics: Matrix exponential & Matrix decomposition. The author has an h-index of 8, co-authored 19 publications receiving 475 citations. Previous affiliations of Jeng-Hau Lin include National Taiwan University.

Papers
Proceedings ArticleDOI
22 Feb 2017
TL;DR: The design of a BNN accelerator is presented that is synthesized from C++ to FPGA-targeted Verilog and outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.
Abstract: Convolutional neural networks (CNN) are the current state-of-the-art for many computer vision tasks. CNNs outperform older methods in accuracy, but require vast amounts of computation and memory. As a result, existing CNN applications are typically run on clusters of CPUs or GPUs. Studies into the FPGA acceleration of CNN workloads have achieved reductions in power and energy consumption. However, large GPUs outperform modern FPGAs in throughput, and the existence of compatible deep learning frameworks gives GPUs a significant advantage in programmability. Recent research in machine learning demonstrates the potential of very low precision CNNs -- i.e., CNNs with binarized weights and activations. Such binarized neural networks (BNNs) appear well suited for FPGA implementation, as their dominant computations are bitwise logic operations and their memory requirements are reduced. A combination of low-precision networks and high-level design methodology may help address the performance and productivity gap between FPGAs and GPUs. In this paper, we present the design of a BNN accelerator that is synthesized from C++ to FPGA-targeted Verilog. The accelerator outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.

379 citations
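The abstract's claim that a BNN's dominant computations are bitwise logic has a compact arithmetic core: with weights and activations constrained to {-1, +1} and packed into machine words, a dot product reduces to XNOR plus popcount. The sketch below (plain Python, not the paper's accelerator code; the names and the 8-bit width are illustrative) shows the identity.

```python
# Minimal sketch: a binarized dot product via XNOR + popcount.
# Encoding bit 1 -> +1 and bit 0 -> -1, the dot product of two
# N-element {-1,+1} vectors is 2 * popcount(XNOR(a, w)) - N,
# which is why BNN inference is dominated by bitwise logic.

N = 8                       # illustrative word width
MASK = (1 << N) - 1

def encode(vec):
    """Pack a {-1,+1} vector into an integer bit mask (1 -> +1)."""
    bits = 0
    for i, v in enumerate(vec):
        if v > 0:
            bits |= 1 << i
    return bits

def bin_dot(a_bits, w_bits):
    """Binarized dot product using XNOR and popcount."""
    matches = (~(a_bits ^ w_bits)) & MASK   # XNOR: 1 where signs agree
    return 2 * bin(matches).count("1") - N  # #agree - #disagree

a = [+1, -1, +1, +1, -1, -1, +1, -1]
w = [+1, +1, -1, +1, -1, +1, +1, -1]
assert bin_dot(encode(a), encode(w)) == sum(x * y for x, y in zip(a, w))
```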

Journal ArticleDOI
TL;DR: In this paper, a fast methodology that employs only two anti-polarity one-bit data patterns instead of the pseudo-random bit sequence as input sources to simulate the worst-case eye diagram was proposed.
Abstract: As signal speeds through interconnections increase toward the multigigabit range, the effect of lossy transmission lines on the signal quality of printed circuit boards becomes a critical issue. To evaluate the eye diagram, and thus the signal integrity, of modern digital systems, this paper proposes a fast methodology that employs only two anti-polarity one-bit data patterns, instead of a pseudo-random bit sequence, as input sources to simulate the worst-case eye diagram. Analytic expressions are derived for the impulse response of lossy transmission lines due to skin-effect loss, while the Kramers-Kronig relations are employed to deal with the noncausality problem related to dielectric loss. Two design graphs that rapidly predict eye-diagram characteristics versus conductive and dielectric losses are then constructed, from which the maximum usable length of a transmission line under a given signal specification can be easily obtained. Finally, time-domain simulations and experiments verify the accuracy of the proposed approach.

40 citations
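The paper's key claim is that the worst-case eye can be found from single-bit responses rather than a long pseudo-random sequence. A closely related, standard way to see why is peak-distortion analysis: the response to one isolated bit bounds how much every other bit can close the eye. The sketch below illustrates that idea in plain Python; it is not the authors' exact method, and the pulse shape, `worst_case_eye`, and all parameters are illustrative.

```python
import numpy as np

# Minimal sketch (not the paper's exact method): estimate the worst-case
# vertical eye opening from a single-bit pulse response. Each neighboring
# bit can close the eye by at most |p(t0 - k*T)|, so summing those tails
# gives the worst case without simulating a pseudo-random bit sequence.

def worst_case_eye(pulse, samples_per_bit, t0):
    """Worst-case eye opening at sampling index t0 (peak-distortion bound)."""
    main = pulse[t0]
    isi = 0.0
    k = 1
    while t0 - k * samples_per_bit >= 0 or t0 + k * samples_per_bit < len(pulse):
        for idx in (t0 - k * samples_per_bit, t0 + k * samples_per_bit):
            if 0 <= idx < len(pulse):
                isi += abs(pulse[idx])
        k += 1
    return main - isi  # eye is open if this is positive

# Toy lossy-line pulse response: a dispersed, attenuated bit.
spb = 32                                              # samples per bit
t = np.arange(0, 8 * spb)
pulse = np.exp(-((t - 2 * spb) / (0.9 * spb)) ** 2)   # broadened pulse
print(worst_case_eye(pulse, spb, t0=2 * spb))
```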

Proceedings ArticleDOI
13 Nov 2017
TL;DR: This paper outlines the assessment methodology and the use of a cross-layer evaluation approach that extracts hardware-level errors from twenty different operating conditions and then injects those errors back into the software layer, in an attempt to answer the second question posed above.
Abstract: As a problem solving method, neural networks have shown broad applicability in areas ranging from medical applications to speech recognition and natural language processing. This success has even led to implementations of neural network algorithms in hardware. In this paper, we explore two questions: (a) to what extent do microelectronic variations affect the quality of results produced by neural networks; and (b) does the answer to the first question represent an opportunity to optimize the implementation of neural network algorithms? Regarding the first question, variations are now increasingly common in aggressive process nodes and typically manifest as an increased frequency of timing errors. Combating variations - due to process and/or operating conditions - usually results in increased guardbands in circuit and architectural design, thus reducing the gains from process technology advances. Given the inherent resilience of neural networks due to the adaptation of their learning parameters, one would expect the quality of results produced by neural networks to be relatively insensitive to the rising timing error rates caused by increased variations. On the contrary, using two frequently used neural networks (MLP and CNN), our results show that variations can significantly affect inference accuracy. This paper outlines our assessment methodology and use of a cross-layer evaluation approach that extracts hardware-level errors from twenty different operating conditions and then injects those errors back into the software layer in an attempt to answer the second question posed above.

33 citations
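The cross-layer flow described above has two halves: extracting error profiles from hardware and injecting them into the software layer. The sketch below illustrates only the injection half, with timing errors crudely modeled as random bit flips in fixed-point weights; the function, bit width, and error rates are hypothetical stand-ins, not the paper's measured error profiles.

```python
import numpy as np

# Minimal sketch of the software side of a cross-layer error-injection flow.
# Hardware timing errors are modeled as random bit flips in the two's-
# complement representation of fixed-point weights, at a per-bit error rate.

def inject_bit_errors(weights, bit_error_rate, rng, bits=16):
    """Flip each bit of a fixed-point weight tensor with probability BER."""
    scale = 2 ** (bits - 1)
    q = np.clip(np.round(weights * scale), -scale, scale - 1).astype(np.int64)
    q = q & ((1 << bits) - 1)                   # two's-complement bit view
    flips = rng.random((q.size, bits)) < bit_error_rate
    masks = (flips * (1 << np.arange(bits))).sum(axis=1).astype(np.int64)
    q = (q.ravel() ^ masks).reshape(q.shape)    # apply the bit flips
    q = np.where(q >= scale, q - 2 * scale, q)  # back to signed integers
    return q.astype(np.float64) / scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(4, 4))
for ber in (1e-4, 1e-2):                        # hypothetical operating points
    w_err = inject_bit_errors(w, ber, rng)
    print(ber, np.abs(w_err - w).max())         # worst-case weight corruption
```

In a full evaluation one would run the perturbed network on a test set and compare inference accuracy against the error-free baseline for each operating condition.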

Proceedings ArticleDOI
01 Jun 2014
TL;DR: MATEX overcomes the stiffness hurdle of previous matrix exponential-based circuit simulators via a rational Krylov subspace method, which leads to larger step sizes with smaller Krylov subspace bases and greatly accelerates the whole computation.
Abstract: We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX utilizes a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of the current sources, in order to reduce computational overhead. These subtasks are then distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, the adaptive time stepping solver runs without extra matrix re-factorizations. MATEX overcomes the stiffness hurdle of previous matrix exponential-based circuit simulators via a rational Krylov subspace method, which leads to larger step sizes with smaller Krylov subspace bases and greatly accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time stepping methods, e.g., achieving around 13X speedup over the trapezoidal framework with fixed time step on the IBM power grid benchmarks.

26 citations
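The core of MATEX is advancing the circuit state with a matrix exponential evaluated in a small Krylov subspace, so no dense exponential of the full system matrix is ever formed. The sketch below shows the idea with an ordinary (polynomial) Arnoldi-based approximation of exp(h*A)v; the paper's contribution is a rational Krylov variant that handles stiff PDN systems with far smaller subspaces, and the toy matrix here merely stands in for a PDN's state matrix.

```python
import numpy as np
from scipy.linalg import expm

# Minimal sketch of the matrix-exponential time step: advance x' = A x by
# approximating exp(h*A) @ v in the m-dimensional Krylov subspace K_m(A, v).
# Only matrix-vector products with A are needed; expm is applied to a tiny
# m x m Hessenberg matrix instead of the full n x n system.

def expm_krylov(A, v, h, m=20):
    """Approximate exp(h*A) @ v with an m-step Arnoldi process."""
    n = len(v)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # happy breakdown: subspace is exact
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * V[:, :m] @ (expm(h * H[:m, :m]) @ e1)

# Toy stable system standing in for a PDN state matrix.
rng = np.random.default_rng(1)
n = 200
A = -np.eye(n) + 0.1 * rng.normal(size=(n, n)) / np.sqrt(n)
x0 = rng.normal(size=n)
print(np.linalg.norm(expm_krylov(A, x0, h=0.5) - expm(0.5 * A) @ x0))
```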


Cited by
Journal ArticleDOI
Y.L. Kuo, M.L. Liou
01 Jun 1977
Computer-Aided Analysis of Electronic Circuits: Algorithms and Computational Techniques

621 citations

Proceedings ArticleDOI
02 Jun 2018
TL;DR: This paper describes the NPU architecture for Project Brainwave, a production-scale system for real-time AI, which achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art GPUs on large RNNs at a batch size of 1.
Abstract: Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models, aka "real-time AI". The growing demand for computationally expensive, state-of-the-art DNNs, coupled with diminishing performance gains of general-purpose architectures, has fueled an explosion of specialized Neural Processing Units (NPUs). NPUs for interactive services should satisfy two requirements: (1) execution of DNN models with low latency, high throughput, and high efficiency, and (2) flexibility to accommodate evolving state-of-the-art models (e.g., RNNs, CNNs, MLPs) without costly silicon updates. This paper describes the NPU architecture for Project Brainwave, a production-scale system for real-time AI. The Brainwave NPU achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art GPUs on large RNNs at a batch size of 1. The NPU attains this performance using a single-threaded SIMD ISA paired with a distributed microarchitecture capable of dispatching over 7M operations from a single instruction. The spatially distributed microarchitecture, scaled up to 96,000 multiply-accumulate units, is supported by hierarchical instruction decoders and schedulers coupled with thousands of independently addressable high-bandwidth on-chip memories, and can transparently exploit many levels of fine-grain SIMD parallelism. When targeting an FPGA, microarchitectural parameters such as native datapaths and numerical precision can be "synthesis specialized" to models at compile time, enabling atypically high FPGA performance competitive with hardened NPUs. When running on an Intel Stratix 10 280 FPGA, the Brainwave NPU achieves performance ranging from ten to over thirty-five teraflops, with no batching, on large, memory-intensive RNNs.

498 citations
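The Brainwave abstract centers on a single-threaded instruction stream whose individual instructions fan out to tens of thousands of multiply-accumulate units. The toy model below (plain Python, not Brainwave's ISA; `TILE` and `mv_instruction` are invented for illustration) shows how one matrix-vector macro-instruction decomposes into a grid of independent tile computations, which is the sense in which a single instruction can dispatch millions of operations.

```python
import numpy as np

# Toy model of the "one instruction, millions of ops" idea: a single
# matrix-vector macro-instruction is decomposed across a grid of tile
# engines, each owning one block of the matrix and a local accumulator,
# so the control stream stays single-threaded while the datapath is wide.

TILE = 64                                  # hypothetical native tile size

def mv_instruction(W, x):
    """Execute one 'matvec' macro-instruction over a grid of tiles."""
    rows, cols = W.shape
    y = np.zeros(rows)
    for r in range(0, rows, TILE):         # each (r, c) pair is a tile engine
        for c in range(0, cols, TILE):
            y[r:r + TILE] += W[r:r + TILE, c:c + TILE] @ x[c:c + TILE]
    return y

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 256))
x = rng.normal(size=256)
assert np.allclose(mv_instruction(W, x), W @ x)
# Even this small 256x256 case is 65,536 MACs from one instruction; at
# Brainwave's reported scale one instruction can dispatch over 7M operations.
```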

Journal ArticleDOI
TL;DR: A comprehensive survey of algorithms proposed for binary neural networks is presented, mainly categorized into native solutions that directly conduct binarization and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error.

346 citations
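One concrete instance of the survey's "optimized" category is choosing the binarization to minimize quantization error: with B = sign(W), the scale alpha minimizing ||W - alpha*B||^2 is the mean absolute weight (the XNOR-Net scaling). A minimal sketch, with an illustrative random weight matrix:

```python
import numpy as np

# Minimal sketch of error-minimizing binarization: B = sign(W) with a
# scale alpha = mean(|W|), the closed-form minimizer of ||W - alpha*B||^2.

def binarize(W):
    B = np.sign(W)
    B[B == 0] = 1.0                        # resolve sign(0) arbitrarily
    alpha = np.abs(W).mean()
    return alpha, B

rng = np.random.default_rng(3)
W = rng.normal(0, 0.1, size=(3, 3))
alpha, B = binarize(W)
naive_err = np.linalg.norm(W - np.sign(W))   # unscaled binarization
scaled_err = np.linalg.norm(W - alpha * B)   # scaled binarization
print(scaled_err < naive_err)                # True: scaling shrinks the error
```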

Journal ArticleDOI
TL;DR: A survey of two types of network compression, pruning and quantization, is provided; it compares current techniques, analyzes their strengths and weaknesses, provides guidance for compressing networks, and discusses possible future compression techniques.

266 citations
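For concreteness, the two compression families this survey covers can be sketched in a few lines: magnitude pruning zeroes the smallest weights, and uniform quantization snaps the survivors to a small set of levels. The thresholds, bit width, and helper names below are illustrative, not from the survey.

```python
import numpy as np

# Minimal sketch of the survey's two compression families: magnitude
# pruning (zero out the smallest weights) followed by uniform symmetric
# quantization of the surviving weights.

def prune(W, sparsity=0.5):
    """Zero the fraction `sparsity` of weights with smallest magnitude."""
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W).ravel())[k]
    return np.where(np.abs(W) < thresh, 0.0, W)

def quantize(W, bits=4):
    """Uniform symmetric quantization to at most 2^bits - 1 levels."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

rng = np.random.default_rng(4)
W = rng.normal(size=(8, 8))
Wc = quantize(prune(W, 0.5), bits=4)
print((Wc == 0).mean(), np.unique(Wc).size)  # sparsity and level count
```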