Author

Mary Kiemb

Bio: Mary Kiemb is an academic researcher from Seoul National University whose research covers design space exploration and microarchitecture. The author has an h-index of 4 and has co-authored 6 publications, which have received 237 citations.

Papers

Posted Content

Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization

Yoonjin Kim¹, Mary Kiemb¹, Chulsoo Park¹, Jinyong Jung¹, Kiyoung Choi¹

¹Seoul National University

25 Oct 2007 • arXiv: Hardware Architecture

TL;DR: The authors propose a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization, which together reduce hardware cost and delay without performance degradation for some application domains.

Abstract: Coarse-grained reconfigurable architectures aim to achieve both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that have long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, than existing reconfigurable architectures.

91 citations
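
The core argument of this abstract is that an expensive functional unit does not need a private copy in every processing element as long as, in every cycle, no row of the array requests more of those units than the row provides. A minimal sketch of that feasibility check follows, assuming a hypothetical schedule format of `(cycle, row, column, op)` tuples, sharing along rows, and a toy additive area model; `min_shared_units`, `area`, the schedule and all numbers are illustrative, not taken from the paper.

```python
from collections import Counter


def min_shared_units(schedule, op="mul"):
    """Smallest number of shared `op` units per row that never causes a stall.

    `schedule` is a list of (cycle, row, column, op) tuples (hypothetical
    format). Sharing is stall-free when, in every cycle, the number of `op`
    operations issued in a row does not exceed the units attached to that row.
    """
    demand = Counter((cycle, row) for cycle, row, _col, kind in schedule if kind == op)
    return max(demand.values(), default=0)


def area(rows, cols, units_per_row, pe_area, unit_area):
    """Toy area model (assumption): every PE keeps its cheap datapath, while the
    costly unit is instantiated `units_per_row` times per row instead of per PE."""
    return rows * cols * pe_area + rows * units_per_row * unit_area


# Hypothetical kernel schedule on a 4x4 array.
example_schedule = [
    (0, 0, 0, "mul"), (0, 0, 1, "add"), (0, 0, 2, "mul"),
    (1, 0, 3, "mul"), (1, 1, 0, "mul"), (2, 1, 1, "add"),
]

if __name__ == "__main__":
    k = min_shared_units(example_schedule)
    print("shared multipliers per row:", k)
    print("area, one multiplier per PE:", area(4, 4, 4, 1.0, 3.0))
    print("area, shared multipliers:   ", area(4, 4, k, 1.0, 3.0))
```

In this hypothetical schedule, row 0 issues two multiplications in cycle 0, so two shared multipliers per row avoid stalls, and the toy area falls from 64 units (one multiplier in each of the 16 PEs) to 40. Pipelining the shared unit, the other half of the abstract's argument, is not modelled here.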

Proceedings Article

Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization

Yoonjin Kim¹, Mary Kiemb¹, Chulsoo Park¹, Jinyong Jung¹, Kiyoung Choi¹

¹Seoul National University

07 Mar 2005

TL;DR: A reconfigurable array architecture template and a design space exploration flow for domain-specific optimization are proposed. Experimental results show that this approach is much more efficient, in both performance and area, than existing reconfigurable array architectures.

Abstract: Coarse-grained reconfigurable architectures aim to achieve both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that have long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, than existing reconfigurable architectures.

86 citations

Proceedings Article

A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures

Minwook Ahn¹, Jonghee W. Yoon¹, Yunheung Paek¹, Yoonjin Kim¹, Mary Kiemb¹, Kiyoung Choi¹

¹Seoul National University

06 Mar 2006

TL;DR: This work investigates the problem of automatically mapping applications onto a coarse-grained reconfigurable architecture, formalizes the mapping problem and shows that it is NP-complete, and proposes an efficient algorithm to solve it.

Abstract: In this work, we investigate the problem of automatically mapping applications onto a coarse-grained reconfigurable architecture and propose an efficient algorithm to solve the problem. We formalize the mapping problem and show that it is NP-complete. To solve the problem within a reasonable amount of time, we divide it into three subproblems: covering, partitioning and layout. Our empirical results demonstrate that our technique produces performance nearly as good as hand-optimized outputs for many kernels.

50 citations
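
The spatial mapping paper reports that the mapping problem is NP-complete and splits it into covering, partitioning and layout. Without reproducing those three phases, the following sketch shows why even a simplified placement step is a search problem. It assumes a hypothetical model in which each operation occupies its own processing element on a 4-neighbour mesh and every data-flow edge must join adjacent elements; `place` is an illustrative backtracking search, exponential in the worst case, and not the authors' algorithm.

```python
from itertools import product


def place(nodes, edges, rows, cols):
    """Assign each DFG node to a distinct PE of a rows x cols mesh so that
    every edge joins two PEs at Manhattan distance 1 (hypothetical model).
    Returns a dict node -> (row, col), or None if no such placement exists."""
    pes = list(product(range(rows), range(cols)))
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    assignment, used = {}, set()

    def fits(node, pe):
        # Every already-placed neighbour of `node` must sit next to `pe`.
        return all(
            abs(pe[0] - assignment[m][0]) + abs(pe[1] - assignment[m][1]) == 1
            for m in adj[node] if m in assignment
        )

    def search(i):
        if i == len(nodes):
            return True
        node = nodes[i]
        for pe in pes:
            if pe not in used and fits(node, pe):
                assignment[node] = pe
                used.add(pe)
                if search(i + 1):
                    return True
                # Undo this choice and try the next PE (backtracking).
                del assignment[node]
                used.remove(pe)
        return False

    return dict(assignment) if search(0) else None


if __name__ == "__main__":
    print(place(["a", "b", "c"], [("a", "b"), ("b", "c")], 1, 3))
    print(place(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], 2, 2))
```

A three-operation chain fits on a 1×3 row, whereas a triangle of mutually dependent operations has no valid placement under this model, because a mesh contains no odd-length cycles.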

Efficient Design Space Exploration for Domain-Specific Optimization of Coarse-Grained Reconfigurable Architecture

Yoonjin Kim, Mary Kiemb, Kiyoung Choi

01 May 2005

9 citations

Proceedings Article

Memory and architecture exploration with thread shifting for multithreaded processors in embedded systems

Mary Kiemb¹, Kiyoung Choi¹

¹Seoul National University

22 Sep 2004

TL;DR: The authors propose a design space exploration algorithm that considers both memory configuration and multithreaded architecture, together with a thread shifting technique that shifts threads at compile time to minimize cache conflicts.

Abstract: In embedded multithreaded architectures, the performance improvement over the base single-threaded architecture is highly dependent on the characteristics of the application and on the memory configuration. When the application is well parallelized, multithreading may perform well even with a small cache, since the memory access latency can be hidden. However, if there are complicated dependencies between threads, they cause frequent cache conflicts, and performance may not improve. For that reason, both the processor architecture and the memory configuration should be customized to obtain an optimal embedded multithreaded system. We suggest a design space exploration algorithm that considers both memory configuration and multithreaded architecture, and a thread shifting technique that shifts threads at compile time to minimize cache conflicts.

2 citations
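
The thread shifting idea can be illustrated with a small trace-driven model. The sketch below assumes a shared direct-mapped cache, that thread `t` issues its `i`-th memory access in cycle `shifts[t] + i`, and that simultaneous accesses are served in thread order; `count_misses`, `best_shift` and the address traces are hypothetical stand-ins for the paper's compile-time analysis, which the abstract does not detail.

```python
def count_misses(traces, shifts, num_sets=2, line_bytes=16):
    """Misses in a shared direct-mapped cache when thread t issues its i-th
    access in cycle shifts[t] + i (same-cycle accesses served in thread order)."""
    events = sorted(
        (shifts[t] + i, t, addr)
        for t, trace in enumerate(traces)
        for i, addr in enumerate(trace)
    )
    tags = [None] * num_sets  # tag currently held by each cache set
    misses = 0
    for _cycle, _thread, addr in events:
        block = addr // line_bytes
        index, tag = block % num_sets, block // num_sets
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


def best_shift(traces, max_shift, **cache):
    """Delay the second of two threads by 0..max_shift cycles and return the
    (shift, misses) pair with the fewest misses, preferring the smaller shift."""
    candidates = [(s, count_misses(traces, [0, s], **cache)) for s in range(max_shift + 1)]
    return min(candidates, key=lambda c: (c[1], c[0]))


# Hypothetical traces: both threads alternate between cache set 0 and set 1,
# but with different tags, so they evict each other when they hit a set together.
thread_a = [0, 0, 16, 16, 0, 0, 16, 16]
thread_b = [32, 32, 48, 48, 32, 32, 48, 48]

if __name__ == "__main__":
    for s in range(3):
        print(s, count_misses([thread_a, thread_b], [0, s]))
    print(best_shift([thread_a, thread_b], 2))
```

With both threads unshifted, each access evicts the other thread's line and all 16 accesses miss; delaying the second thread by one or two cycles halves that count. The model ignores the cost of the delay itself, so a larger `max_shift` can favour fully serialising the threads; an actual exploration would weigh misses against lost overlap.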

Cited by

Proceedings Article

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Hyunchul Park¹, Kevin Fan¹, Scott Mahlke¹, Taewook Oh², Hee-Seok Kim², Hong-seok Kim²

¹University of Michigan, ²Samsung

25 Oct 2008

TL;DR: Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.

Abstract: Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files, often organized as a two-dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently map software implementations of compute-intensive loops onto the array. Traditional schedulers focus on the placement of operations in time and space. With CGRAs, the challenge of placement is compounded by the need to explicitly route operands from producers to consumers. To systematically attack this problem, we take an edge-centric approach to modulo scheduling that focuses on the routing problem as its primary objective. With edge-centric modulo scheduling (EMS), placement is a by-product of the routing process, and the schedule is developed by routing each edge in the dataflow graph. Routing cost metrics provide the scheduler with a global perspective to guide selection. Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.

196 citations
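
In edge-centric modulo scheduling, as the EMS abstract puts it, placement is a by-product of routing. The sketch below routes a single data-flow edge with plain breadth-first search, whereas EMS guides selection with routing cost metrics, so it illustrates the idea rather than reimplementing EMS. It assumes a hypothetical resource model: PEs on a mesh, each holding one value per cycle; a value may stay in place or move to a 4-neighbour each cycle; `busy` is a modulo reservation table of `(pe, cycle % ii)` slots; and `capable` is the set of PEs that can execute the consumer. Routes are capped at `ii - 1` cycles so that a route never collides with its own modulo slots. `route_edge` and its parameters are illustrative names.

```python
from collections import deque


def route_edge(src, t0, capable, rows, cols, ii, busy):
    """Breadth-first routing of one DFG edge in a modulo-scheduled mesh.

    src, t0 : producer PE (row, col) and its issue cycle
    capable : set of PEs able to execute the consumer
    busy    : set of reserved (pe, cycle % ii) slots
    Returns (consumer_slot, route), where route lists the intermediate
    (pe, cycle) slots that carry the value, or None if no capable PE is
    reachable within ii - 1 cycles.
    """
    horizon = t0 + ii - 1  # longer routes could reuse their own modulo slots
    start = (src, t0)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        (r, c), t = node
        if t >= horizon:
            continue
        # Stay in place or move to one of the four mesh neighbours.
        for pe in ((r, c), (r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (pe, t + 1)
            if not (0 <= pe[0] < rows and 0 <= pe[1] < cols):
                continue
            if nxt in prev or (pe, (t + 1) % ii) in busy:
                continue
            prev[nxt] = node
            if pe in capable:
                # The consumer is placed here; walk back to collect routing slots.
                route, cur = [], node
                while cur != start:
                    route.append(cur)
                    cur = prev[cur]
                return nxt, route[::-1]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    print(route_edge((0, 0), 0, {(1, 1)}, 2, 2, 4, {((0, 0), 0)}))
```

On a 2×2 mesh where only PE (1, 1) can execute the consumer, the value produced at (0, 0) in cycle 0 is routed through (1, 0) and the consumer lands at (1, 1) in cycle 2. If (1, 0) and (0, 1) are already reserved in modulo slot 1, the search holds the value at (0, 0) for one cycle and the consumer slips to cycle 3.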

Proceedings Article

EPIMap: using epimorphism to map applications on CGRAs

Mahdi Hamzeh¹, Aviral Shrivastava¹, Sarma Vrudhula¹

¹Arizona State University

03 Jun 2012

TL;DR: Experimental results on 14 important kernels extracted from well-known benchmark programs show that EPIMap can improve the performance of the kernels on a CGRA by more than 2.8X on average compared to EMS, one of the best existing mapping algorithms.

Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are an attractive platform that promises both high performance and high power efficiency. One of the primary challenges in using CGRAs is developing compilers that can automatically and efficiently map applications to the CGRA. To this end, this paper makes several contributions: i) Using Re-computation for Resource Limitations: for the first time in CGRA compilers, we propose re-computation as a solution to the resource limitation problem. This extends the solution space and enables better mappings. ii) General Problem Formulation: a precise and general formulation of the application mapping problem on a CGRA is presented, and its computational complexity is established. iii) Extracting an Efficient Heuristic: using the insights from the problem formulation, we design an effective global heuristic called EPIMap. EPIMap transforms the input specification (a directed graph) into an epimorphic equivalent graph that satisfies the necessary conditions for mapping onto a CGRA, reducing the search space. Experimental results on 14 important kernels extracted from well-known benchmark programs show that EPIMap can improve the performance of the kernels on a CGRA by more than 2.8X on average compared to EMS, one of the best existing mapping algorithms. EPIMap achieved the theoretical best performance for 9 of the 14 benchmarks, while EMS did not achieve it for any of them. EPIMap achieves better mappings at an acceptable increase in compilation time.

125 citations
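
Re-computation, the first contribution listed in the EPIMap abstract, trades extra operations for fewer routing demands. A minimal illustration follows, assuming that the resource limit is a cap `max_fanout` on how many consumers a single PE can feed directly (for example, its mesh neighbours): an over-limit operation is duplicated, each copy reads the same inputs, and the consumers are split among the copies. `split_fanout` and the `n#i` naming of copies are hypothetical; this is a single transformation pass, not EPIMap's epimorphism-based heuristic.

```python
def split_fanout(edges, max_fanout):
    """Re-computation sketch: every node feeding more than `max_fanout`
    consumers is replaced by copies n#0, n#1, ...; each copy reads the same
    inputs as n and feeds at most `max_fanout` of n's consumers."""
    succs = {}
    for a, b in edges:
        succs.setdefault(a, []).append(b)
    owner = {}   # (node, consumer) -> copy of node that feeds this consumer
    copies = {}  # node -> list of its copies (split nodes only)
    for n, outs in succs.items():
        if len(outs) > max_fanout:
            copies[n] = []
            for i in range(0, len(outs), max_fanout):
                name = f"{n}#{i // max_fanout}"
                copies[n].append(name)
                for b in outs[i:i + max_fanout]:
                    owner[(n, b)] = name
    new_edges = []
    for a, b in edges:
        src = owner.get((a, b), a)
        # Every copy of a split consumer still needs each original input.
        for dst in copies.get(b, [b]):
            new_edges.append((src, dst))
    return new_edges


if __name__ == "__main__":
    graph = [("in", "x")] + [("x", f"c{i}") for i in range(6)]
    print(split_fanout(graph, 3))
```

Duplicating `x` raises the fanout of its own producer (`in` now feeds two copies), so a complete flow would re-check the limit after each transformation.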

Book Chapter

Coarse-Grained Reconfigurable Array Architectures

Bjorn De Sutter¹, Praveen Raghavan², Andy Lambrechts²

¹Ghent University, ²IMEC

01 Jan 2013

TL;DR: The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.

Abstract: Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.

67 citations

Proceedings Article

Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures

Taewook Oh¹, Bernhard Egger¹, Hyunchul Park², Scott Mahlke²

¹Samsung, ²University of Michigan

19 Jun 2009

TL;DR: A recurrence cycle-aware scheduling technique for CGRAs is introduced; it achieves better-quality schedules than schedulers based on simulated annealing while running 170 times faster.

Abstract: In high-end embedded systems, coarse-grained reconfigurable architectures (CGRAs) continue to replace traditional ASIC designs. CGRAs offer high performance at low power consumption, yet provide flexibility through programmability. In this paper we introduce a recurrence cycle-aware scheduling technique for CGRAs. Our modulo scheduler groups operations belonging to a recurrence cycle into a clustered node and then computes a scheduling order for those clustered nodes. Deadlocks that arise when two or more recurrence cycles depend on each other are resolved by heuristics that favor recurrence cycles with long recurrence delays. With previous work, one had to sacrifice either fast compilation to obtain good-quality results, or vice versa; the proposed recurrence cycle-aware scheduling technique removes this trade-off. We have implemented the proposed method in our in-house CGRA chip and compiler solution and show that the technique achieves better-quality schedules than schedulers based on simulated annealing while running 170 times faster.

54 citations
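
Recurrence cycles matter to a modulo scheduler because they bound the initiation interval (II): in standard modulo-scheduling terms, a cycle in the data-flow graph whose operations have total latency L and which spans a total iteration distance D forces II >= ceil(L / D), the so-called RecMII. The sketch below enumerates the recurrence cycles of a small loop body and computes that bound. It is background for the clustering step described in this abstract, not the authors' scheduler; `simple_cycles`, `rec_mii` and the example graph with its latencies are hypothetical.

```python
from math import ceil


def simple_cycles(succ):
    """Yield each elementary cycle of a small directed graph once, as a node
    list starting at its smallest node (plain DFS; fine for loop-body DFGs)."""
    for start in sorted(succ):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in succ.get(node, []):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def rec_mii(edges):
    """edges maps (u, v) -> (latency, iteration_distance).
    Returns the max over recurrence cycles of ceil(latency / distance), or 1."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    bound = 1
    for cycle in simple_cycles(succ):
        pairs = list(zip(cycle, cycle[1:] + cycle[:1]))
        latency = sum(edges[p][0] for p in pairs)
        distance = sum(edges[p][1] for p in pairs)
        if distance == 0:
            raise ValueError(f"cycle {cycle} has zero iteration distance")
        bound = max(bound, ceil(latency / distance))
    return bound


# Hypothetical loop body: load a -> multiply b -> add c, where c feeds b and
# itself in the next iteration (distance 1).
loop = {
    ("a", "b"): (2, 0),
    ("b", "c"): (3, 0),
    ("c", "b"): (1, 1),
    ("c", "c"): (1, 1),
}

if __name__ == "__main__":
    print(rec_mii(loop))
```

In the example, the multiply-add recurrence through `b` and `c` has latency 4 and distance 1, so no schedule can start iterations more often than every 4 cycles, while the accumulator self-loop alone would allow an interval of 1.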

Proceedings Article

Power-conscious configuration cache structure and code mapping for coarse-grained reconfigurable architecture

Yoonjin Kim¹, Il-hyun Park¹, Kiyoung Choi¹, Yunheung Paek¹

¹Seoul National University

04 Oct 2006

TL;DR: This paper shows how power is consumed in a typical coarse-grained reconfigurable architecture and suggests a power-conscious configuration cache structure and code mapping technique, which reduce power consumption without performance degradation.

Abstract: Coarse-grained reconfigurable architecture aims to achieve both performance and flexibility. However, power consumption is equally important if the reconfigurable architecture is to be used as a competitive processing core in embedded systems. In this paper, we show how power is consumed in a typical coarse-grained reconfigurable architecture. Based on the power breakdown data, we suggest a power-conscious configuration cache structure and code mapping technique that reduce power consumption without performance degradation. Experimental results show that the proposed approach substantially reduces power consumption even with a reduced configuration cache size.

54 citations
