Papers by Kemal Ebcioglu published in 2002

Journal Article

Unroll-based copy elimination for enhanced pipeline scheduling

Suhyun Kim¹, Soo-Mook Moon¹, Jinpyo Park¹, Kemal Ebcioglu²

Seoul National University¹, IBM²

01 Sep 2002, IEEE Transactions on Computers

TL;DR: A code transformation technique based on loop unrolling that makes coalescible many renaming copy instructions which interferences would otherwise keep from being coalesced. This enables EPS to avoid a serious slowdown from latency handling and resource pressure while keeping its variable II and other advantages.

Abstract: Enhanced pipeline scheduling (EPS) is a software pipelining technique that can achieve a variable initiation interval (II) for loops with control flow through its code motion pipelining. EPS, however, leaves behind many renaming copy instructions that cannot be coalesced due to interferences. These copies take resources and, more seriously, they may cause a stall if they rename a multi-latency instruction whose latency is longer than the II aimed for by EPS. This paper proposes a code transformation technique based on loop unrolling which makes those copies coalescible. Two unique features of the technique are its method of determining the precise unroll amount, based on the idea of extended live ranges, and its insertion of special bookkeeping copies at loop exits. The proposed technique enables EPS to avoid a serious slowdown from latency handling and resource pressure, while keeping its variable II and other advantages. In fact, renaming through copies, followed by unroll-based copy elimination, is EPS's solution to the cross-iteration register overwrite problem in software pipelining. It works for loops with the arbitrary control flow that EPS must deal with, as well as for straight-line loops. Our empirical study performed on a VLIW testbed with a two-cycle load latency shows that 86 percent of the otherwise uncoalescible copies in innermost loops become coalescible when unrolled 2.2 times on average. In addition, it is demonstrated that the unroll amount obtained is precise and the most efficient. The unrolled version of the VLIW code includes fewer no-op VLIWs caused by stalls, improving performance by a geometric mean of 18 percent on a 16-ALU machine.
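
As a rough illustration of why unrolling can remove renaming copies, the sketch below uses the classic modulo-variable-expansion rule (Lam's technique), not the extended-live-range method this paper proposes. Under that simplified model, a value whose register stays live for L cycles in a loop that starts a new iteration every II cycles has about ceil(L / II) instances in flight at once, so the body must be unrolled at least that many times for each instance to get its own register instead of a copy. The function name and its input format (a map from value name to lifetime in cycles) are hypothetical, and loops with control flow and loop-exit bookkeeping copies are not modeled.

```python
def unroll_factor(lifetimes, ii):
    """Minimum unroll factor under the modulo-variable-expansion rule.

    lifetimes: hypothetical input mapping each renamed value to the number of
               cycles its register stays live (definition to last use).
    ii:        initiation interval in cycles.
    """
    if ii <= 0:
        raise ValueError("initiation interval must be positive")
    # A value live for L cycles overlaps ceil(L / II) iterations; each overlapping
    # instance needs its own register if no renaming copy is to remain.
    per_value = [(lt + ii - 1) // ii for lt in lifetimes.values()]
    return max([1] + per_value)


# A 2-cycle load scheduled at II = 1 keeps two results in flight at once.
print(unroll_factor({"t_load": 2, "i_next": 1}, ii=1))  # 2
print(unroll_factor({"x": 5, "y": 3}, ii=2))            # 3
```

For instance, with the two-cycle load latency of the paper's testbed and an II of 1, this simplified model asks for an unroll factor of 2; the paper's own unroll amount is derived differently and is reported as 2.2 on average over its benchmarks.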

8 citations

Book Chapter

A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Krishnan K. Kailas¹, Manoj Franklin², Kemal Ebcioglu¹

IBM¹, University of Maryland, College Park²

27 Aug 2002

TL;DR: This work presents a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction.

Abstract: In clustered instruction-level parallel (ILP) processors, the function units are partitioned, and resources such as the register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme uses a small Caching Register Buffer (CRB), attached to the traditional partitioned local register file, to store copies of remote registers. We present an efficient code generation algorithm to schedule sendb operations on the fly. Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as a partitioned register file with infinite non-architected register space for keeping remote registers.
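
The CRB idea can be pictured as a tiny cache of remote register values placed beside the local register file. The sketch below is a toy model under stated assumptions, not the paper's design: it assumes a fully associative buffer with LRU replacement, counts every miss as one inter-cluster transfer, invalidates a copy when its home register is written, and does not model the windowed organization or the compiler's scheduling of sendb. The class and function names are hypothetical. The second function illustrates only the broadcast idea: requests to ship the same register to several clusters collapse into one send with a set of destinations.

```python
from collections import OrderedDict


class CachingRegisterBuffer:
    """Toy model of a small buffer caching copies of remote registers.

    Assumptions (not from the paper): fully associative, LRU replacement,
    and every miss costs one inter-cluster transfer.
    """

    def __init__(self, entries=4):
        self.entries = entries
        self.slots = OrderedDict()  # remote register name -> cached value
        self.misses = 0             # number of inter-cluster transfers needed

    def read(self, reg, fetch_remote):
        if reg in self.slots:
            self.slots.move_to_end(reg)  # hit: mark as most recently used
            return self.slots[reg]
        self.misses += 1
        value = fetch_remote(reg)        # stands in for an inter-cluster transfer
        self.slots[reg] = value
        if len(self.slots) > self.entries:
            self.slots.popitem(last=False)  # evict the least recently used copy
        return value

    def invalidate(self, reg):
        # A write to reg in its home cluster makes any cached copy stale.
        self.slots.pop(reg, None)


def combine_into_broadcasts(transfers):
    """Group (register, destination_cluster) requests into one send per register."""
    groups = {}
    for reg, dest in transfers:
        groups.setdefault(reg, set()).add(dest)
    return groups


remote = {"r1": 10, "r2": 20, "r3": 30}
crb = CachingRegisterBuffer(entries=4)
for reg in ["r1", "r2", "r1", "r1", "r3"]:
    crb.read(reg, remote.__getitem__)
print(crb.misses)  # 3: repeated reads of r1 hit in the buffer
print(combine_into_broadcasts([("r1", 1), ("r1", 2), ("r2", 3)]))
```

Repeated reads of the same remote register hit in the buffer, which is the temporal locality the CRB relies on. The paper's result that four entries suffice concerns its own windowed design and benchmarks, not this model.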

8 citations

Proceedings Article

Proceedings of the 16th international conference on Supercomputing

Kemal Ebcioglu¹, Keshav Pingali², Alexandru Nicolau³

IBM¹, Cornell University², University of California³

22 Jun 2002

TL;DR: An excellent technical program consisting of 31 papers selected from 144 submissions; the technical areas represented by these papers range from applications all the way to low-power architectures.

Abstract: On behalf of the ICS'02 Organizing Committee, we are pleased to welcome you to New York City for the 16th ACM International Conference on Supercomputing. Over the years, ICS has built a tradition of bringing you the very best in research and experience in the exciting area of high-performance computing. This year is no exception. We have an excellent technical program consisting of 31 papers selected from 144 submissions; the technical areas represented by these papers range from applications all the way to low-power architectures. Three outstanding speakers have kindly consented to give keynote addresses: Tetsuya Sato (Director-General, Earth Simulator Center, Japan), Alfred Z. Spector (Vice President, Services and Software, IBM Research), and David Kuck (Intel Fellow, Director, KAI Software Lab). Rounding off the technical program are three panel discussions, five tutorials, and four workshops on exciting current topics in the field of high-performance computing, such as self-healing and adaptive systems.

3 citations