
Author

Jiao-Wei Huang

Bio: Jiao-Wei Huang is an academic researcher from National Taiwan University. The author has contributed to research on topics including low-power electronics and non-uniform memory access. The author has an h-index of 2 and has co-authored 3 publications receiving 31 citations.

Papers


Proceedings Article•DOI•

Improved weight assignment for logic switching activity during at-speed test pattern generation


Meng-Fan Wu¹, H.-C. Pan¹, T.-H. Wang¹, Jiao-Wei Huang¹, Kun-Han Tsai², Wu-Tung Cheng²

National Taiwan University¹, Mentor Graphics²

18 Jan 2010

TL;DR: In this article, a new weight assignment scheme for logic switching activity was proposed, which enhances the IR-drop assessment capability of the existing weighted switching activity (WSA) model by including the power grid network structure information.


Abstract: For two-pattern at-speed scan testing, the excessive power supply noise at the launch cycle may cause the circuit under test to malfunction, leading to yield loss. This paper proposes a new weight assignment scheme for logic switching activity; it enhances the IR-drop assessment capability of the existing weighted switching activity (WSA) model. By including the power grid network structure information, the proposed weight assignment better reflects the regional IR-drop impact of each switching event. For ATPG, such comprehensive information is crucial in determining whether a switching event burdens the IR-drop effect. Simulation results show that, compared with previous weight assignment schemes, the estimated regional IR-drop profiles better correlate with those generated by commercial tools.


21 citations

Proceedings Article•DOI•

Hierarchical memory scheduling for multimedia MPSoCs


Ye-Jyun Lin¹, Chia-Lin Yang¹, Tay-Jyi Lin², Jiao-Wei Huang¹, Naehyuck Chang³

National Taiwan University¹, National Chung Cheng University², Seoul National University³

07 Nov 2010

TL;DR: The experimental results show that the proposed scheduling policy improves system throughput by 21% compared to FR-FCFS (first-ready first-come-first-serve) on an MPSoC for mobile phones with QoS guarantee.


Abstract: Optimizing memory system performance is critical for delivering high system performance for multimedia applications since they are usually memory intensive. As the number of IP cores in a multimedia MPSoC (Multi-Processor System-on-Chip) continues to increase, system performance will eventually be limited by the memory system. In this paper, we tackle the memory performance issue of multimedia MPSoCs through intelligent memory access scheduling. We observe that since memory resources are shared by all processing elements in an MPSoC, interference among requests from different IP cores causes not only delay in memory accesses but also unfair DRAM accesses among IPs. Traditional memory scheduling policies that only emphasize maximizing memory system throughput do not take this interference into account. Therefore, in this paper, we propose a hierarchical memory scheduling policy to minimize interference among requests. The experimental results show that the proposed scheduling policy improves system throughput by 21% compared to FR-FCFS (first-ready first-come-first-serve) on an MPSoC for mobile phones with QoS guarantee.


8 citations

Proceedings Article•DOI•

Memory access aware power gating for MPSoCs


Ye-Jyun Lin¹, Chia-Lin Yang¹, Jiao-Wei Huang¹, Naehyuck Chang²

National Taiwan University¹, Seoul National University²

09 Mar 2012

TL;DR: A run-time mechanism is proposed that predicts the memory stall cycles of an individual IP and makes the power gating decision based on the predicted memory latency and its break-even time, so that a power-gated IP can be woken up in advance to avoid performance degradation.


Abstract: As technology continues to scale, reducing leakage is critical to achieving energy efficiency. Power gating can potentially save a significant part of leakage, but it incurs both energy and performance penalties. Therefore, power gating decisions need to be made carefully. In current low-power SoC design, an IP core is power gated when it is not operating. In this paper, we explore the IP idle time due to memory accesses for further leakage reduction. In MPSoCs, due to contention among concurrent memory accesses from different IP cores, memory stall cycles vary significantly, ranging from 10 to 600 cycles according to our experiments. We propose a run-time mechanism that predicts the memory stall cycles of an individual IP and makes the power gating decision based on the predicted memory latency and its break-even time. With the predicted memory latency, a power-gated IP can be woken up in advance to avoid performance degradation. The experimental results show that our power management mechanism can achieve 25.3% leakage energy saving within a 4% performance penalty.
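The decision rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `GatingDecision` type, and the `wakeup_latency_cycles` parameter are hypothetical, and the sketch simply assumes that an IP is gated only when the predicted stall exceeds the break-even time, with wake-up started early enough to hide the wake-up latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatingDecision:
    gate: bool                        # True if the IP should be power gated
    wake_after_cycles: Optional[int]  # cycles into the stall at which wake-up starts


def decide_power_gating(predicted_stall_cycles: int,
                        break_even_cycles: int,
                        wakeup_latency_cycles: int) -> GatingDecision:
    """Hypothetical rule: gate only if the predicted stall outlasts break-even."""
    # A stall no longer than the break-even time cannot recover the energy
    # spent on switching the IP off and on, so the IP stays powered.
    if predicted_stall_cycles <= break_even_cycles:
        return GatingDecision(gate=False, wake_after_cycles=None)
    # Start waking the IP early so it is ready when the memory data returns
    # (assumed wake-up latency; clamped at zero if it exceeds the stall).
    wake_after = max(0, predicted_stall_cycles - wakeup_latency_cycles)
    return GatingDecision(gate=True, wake_after_cycles=wake_after)
```

For example, with an assumed break-even time of 100 cycles and a wake-up latency of 20 cycles, a predicted 600-cycle stall would be gated with wake-up starting 580 cycles in, while a predicted 10-cycle stall would not be gated.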


2 citations

Cited by


Proceedings Article•DOI•

Parallel application memory scheduling


Eiman Ebrahimi¹, Rustam Miftakhutdinov¹, Chris Fallin², Chang Joo Lee³, José A. Joao¹, Onur Mutlu², Yale N. Patt¹

University of Texas at Austin¹, Carnegie Mellon University², Intel³

03 Dec 2011

TL;DR: This paper proposes a memory scheduling algorithm designed specifically for parallel applications, targeting two common synchronization primitives that cause inter-dependence of threads: locks and barriers, and shows that it speeds up a set of memory-intensive parallel applications by 12.6% compared to the best previous memory scheduling technique.


Abstract: A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This inter-thread memory system interference can significantly degrade parallel application performance. Better memory request scheduling may mitigate such performance degradation. However, previously proposed memory scheduling algorithms for CMPs are designed for multi-programmed workloads where each core runs an independent application, and thus do not take into account the inter-dependent nature of threads in a parallel application. In this paper, we propose a memory scheduling algorithm designed specifically for parallel applications. Our approach has two main components, targeting two common synchronization primitives that cause inter-dependence of threads: locks and barriers. First, the runtime system estimates threads holding the locks that cause the most serialization as the set of limiter threads, which are prioritized by the memory scheduler. Second, the memory scheduler shuffles thread priorities to reduce the time threads take to reach the barrier. We show that our memory scheduler speeds up a set of memory-intensive parallel applications by 12.6% compared to the best previous memory scheduling technique.


147 citations

Proceedings Article•DOI•

GemDroid: a framework to evaluate mobile platforms


Nachiappan Chidambaram Nachiappan¹, Praveen Yedlapalli¹, Niranjan Soundararajan², Mahmut Kandemir¹, Anand Sivasubramaniam¹, Chita R. Das¹

Pennsylvania State University¹, Intel²

16 Jun 2014

TL;DR: GemDroid is designed by integrating the Android open-source emulator for facilitating execution of mobile applications, the GEM5 core simulator for analyzing the CPU and memory centric designs, and models for several IPs to collectively study their impact on system-level performance and power.


Abstract: As the demand for feature-rich mobile systems such as smartphones and tablets has outpaced other computing systems and is expected to continue at a faster rate, it is projected that SoCs with tens of cores and hundreds of IPs (or accelerators) will be designed to provide an unprecedented level of features and functionality in the future. Design of such mobile systems with required QoS and power budgets along with other design constraints will be a daunting task for computer architects since any ad hoc, piecemeal solution is unlikely to result in an optimal design. This requires early exploration of the complete design space to understand the system-level design trade-offs. To the best of our knowledge, there is no such publicly available tool to conduct a holistic evaluation of mobile platforms consisting of cores, IPs and system software. This paper presents GemDroid, a comprehensive simulation infrastructure to address these concerns. GemDroid has been designed by integrating the Android open-source emulator for facilitating execution of mobile applications, the GEM5 core simulator for analyzing the CPU and memory centric designs, and models for several IPs to collectively study their impact on system-level performance and power. Analyzing a spectrum of applications with GemDroid, we observed that the memory subsystem is a vital cog in the mobile platform because it needs to handle both core and IP traffic, which have very different characteristics. Consequently, we present a heterogeneous memory controller (HMC) design, where we divide the memory physically into two address regions, where the first region with one memory controller (MC) handles core-specific application data and the second region with another MC handles all IP related data. The proposed modifications to the memory controller design result in an average 25% reduction in execution time for CPU bound applications, up to 11% reduction in frame drops, and on average 17% reduction in CPU busy time for on-screen (IP bound) applications.


35 citations

Proceedings Article•DOI•

VIP: virtualizing IP chains on handheld platforms


Nachiappan Chidambaram Nachiappan¹, Haibo Zhang¹, Jihyun Ryoo¹, Niranjan Soundararajan², Anand Sivasubramaniam¹, Mahmut Kandemir¹, Ravi Iyer², Chita R. Das¹

Pennsylvania State University¹, Intel²

13 Jun 2015

TL;DR: This paper proposes a novel IP virtualization framework (VIP), involving three key ideas that allow several IPs to be chained together and made to appear to the software as a single device, thereby allowing better energy saving and utilization opportunities.


Abstract: Energy-efficient user-interactive and display-oriented applications on handhelds rely heavily on multiple accelerators (termed IP cores) to meet their periodic frame processing needs. Further, these platforms are starting to host multiple applications concurrently on the multiple CPU cores. Unfortunately, today's hardware exposes an interface that forces the host software (Android drivers) to treat each IP core as an isolated device. Consequently, the host CPU has to get involved in (i) processing each frame, (ii) scheduling frames to ensure timely progress through the IP cores to meet their QoS needs, and (iii) explicitly moving data from one IP core to the next, with main memory serving as the common staging area. We show in this paper through measurements on a Nexus 7 platform that the frequent invocation of the CPU for processing these frames and the involvement of main memory as a data flow conduit are serious limitations. Instead, we propose a novel IP virtualization framework (VIP), involving three key ideas that allow several IPs to be chained together and made to appear to the software as a single device. First, chaining of IPs avoids data transfer through the memory system, enhancing the throughput of flows through the IPs. Second, by using a burst-mode, the CPU can initiate the processing of several frames through the virtual IP chain, without getting involved (and interrupted) for each frame, thereby allowing better energy saving and utilization opportunities. Removing the CPU from this loop requires alternate orchestration of frame flows to ensure QoS guarantees for each frame of each application. Our third enhancement in VIP creates several virtual paths, one for each flow, through these IP chains with the hardware scheduling the frames to enforce QoS guarantees despite any contention for resources along the way. Our experimental evaluations demonstrate the effectiveness of VIP on energy consumption and QoS for multiple applications.


30 citations

Proceedings Article•DOI•

Power supply noise control in pseudo functional test


Tengteng Zhang¹, Duncan M. Walker¹

Texas A&M University¹

29 Apr 2013

TL;DR: A simulation-based X-filling method, Bit-Flip, is proposed to maximize the power supply noise during PKLPG test and demonstrates that the method can significantly increase effective WSA while limiting the fill rate.


Abstract: Pseudo functional K Longest Path Per Gate (KLPG) test (PKLPG) is proposed to generate delay tests that test the longest paths while having power supply noise similar to that seen during normal functional operation. Our experimental results show that PKLPG is more vulnerable to under-testing than traditional two-cycle transition fault test. In this work, a simulation-based X-filling method, Bit-Flip, is proposed to maximize the power supply noise during PKLPG test. Given a set of partially-specified scan patterns, random filling is done and then an iterative procedure is invoked to flip some of the filled bits, to increase the effective weighted switching activity (WSA). Experimental results on both compacted and uncompacted test patterns are presented. The results demonstrate that our method can significantly increase effective WSA while limiting the fill rate.


23 citations

Journal Article•DOI•

Design for Testability Support for Launch and Capture Power Reduction in Launch-Off-Shift and Launch-Off-Capture Testing


Samah Mohamed Saeed¹, Ozgur Sinanoglu²

New York University¹, New York University Abu Dhabi²

01 Mar 2014 - IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The proposed DfT support enables a design partitioning approach, where any given set of patterns, generated in a power-unaware manner, can be utilized to test the design regions one at a time, reducing both launch and capture power in a design-flow-compatible manner.


Abstract: At-speed or even faster-than-at-speed testing of VLSI circuits aims for high-quality screening of the circuits by targeting performance-related faults. On one hand, a compact test set with highly effective patterns, each detecting multiple delay faults, is desirable for lower test costs. On the other hand, such patterns increase switching activity during launch and capture operations. Patterns optimized for quality and cost may thus end up violating peak-power constraints, resulting in yield loss, while pattern generation under low switching activity constraints may lead to loss in test quality and/or pattern count inflation. In this paper, we propose design for testability (DfT) support for enabling the use of a set of patterns optimized for cost and quality as is, yet in a low power manner; we develop three different DfT mechanisms, one for launch-off shift, one for launch-off capture, and one for mixed at-speed testing. The proposed DfT support enables a design partitioning approach, where any given set of patterns, generated in a power-unaware manner, can be utilized to test the design regions one at a time, reducing both launch and capture power in a design-flow-compatible manner. This way, the test pattern count and quality of the optimized test set can be preserved, while lowering the launch/capture power.


20 citations
