Showing papers by "Kemal Ebcioglu" published in 2002


Journal ArticleDOI
TL;DR: A code transformation technique based on loop unrolling that makes coalescible many renaming copy instructions that otherwise cannot be coalesced due to interferences, enabling EPS to avoid a serious slowdown from latency handling and resource pressure while keeping its variable II and other advantages.
Abstract: Enhanced pipeline scheduling (EPS) is a software pipelining technique which can achieve a variable initiation interval (II) for loops with control flow via its code motion pipelining. EPS, however, leaves behind many renaming copy instructions that cannot be coalesced due to interferences. These copies take resources and, more seriously, they may cause a stall if they rename a multilatency instruction whose latency is longer than the II aimed for by EPS. This paper proposes a code transformation technique based on loop unrolling which makes those copies coalescible. Two unique features of the technique are its method of determining the precise unroll amount, based on an idea of extended live ranges, and its insertion of special bookkeeping copies at loop exits. The proposed technique enables EPS to avoid a serious slowdown from latency handling and resource pressure, while keeping its variable II and other advantages. In fact, renaming through copies, followed by unroll-based copy elimination, is EPS's solution to the cross-iteration register overwrite problem in software pipelining. It works for loops with arbitrary control flow that EPS must deal with, as well as for straight-line loops. Our empirical study performed on a VLIW testbed with a two-cycle load latency shows that 86 percent of the otherwise uncoalescible copies in innermost loops become coalescible when unrolled 2.2 times on average. In addition, it is demonstrated that the unroll amount obtained is precise and the most efficient. The unrolled version of the VLIW code includes fewer no-op VLIWs caused by stalls, improving the performance by a geometric mean of 18 percent on a 16-ALU machine.

8 citations
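
The idea of removing a renaming copy by unrolling, as described in the abstract above, can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the paper's algorithm: it uses a straight-line loop, a fixed unroll factor of 2, and plain Python variables (`r1`, `r2`) standing in for registers. The function names are hypothetical. The paper's method of choosing the unroll amount from extended live ranges is not modeled here.

```python
def pipelined_with_copy(a):
    """Software-pipelined sum: the load for iteration i+1 is issued in iteration i.

    The loaded value is renamed into r2 so it does not overwrite r1 while
    r1 is still live, and a copy r1 = r2 restores the naming each iteration.
    Returns (sum, number of renaming copies executed).
    """
    if not a:
        return 0, 0
    total, copies = 0, 0
    r1 = a[0]                    # prologue: first load
    for i in range(len(a) - 1):
        r2 = a[i + 1]            # early load for the next iteration
        total += r1              # use of the current value
        r1 = r2                  # renaming copy
        copies += 1
    total += r1                  # epilogue: last use
    return total, copies


def unrolled_without_copy(a):
    """Same loop unrolled twice: r1 and r2 swap roles, so no copy is needed."""
    if not a:
        return 0, 0
    total = 0
    n = len(a)
    r1 = a[0]                    # prologue: first load
    i = 0
    while i + 2 <= n - 1:        # two original iterations per trip
        r2 = a[i + 1]
        total += r1
        r1 = a[i + 2]            # second copy of the body loads into r1
        total += r2
        i += 2
    # Exit code must know which register holds the live value.
    if i < n - 1:                # one original iteration left over
        r2 = a[i + 1]
        total += r1
        total += r2
    else:
        total += r1
    return total, 0


print(pipelined_with_copy([1, 2, 3, 4, 5]))
print(unrolled_without_copy([1, 2, 3, 4, 5]))
```

Both versions compute the same sum; the unrolled one executes no copies, at the cost of a larger loop body and exit code that depends on where the loop was left.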


Book ChapterDOI
27 Aug 2002
TL;DR: This work presents a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction.
Abstract: In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers. We present an efficient code generation algorithm to schedule sendb operations on-the-fly. Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as that of a partitioned register file with infinite non-architected register space for keeping remote registers.

8 citations
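
The abstract above describes a small buffer that caches copies of remote registers, filled by a broadcast operation. A toy model of that idea might look like the following. Everything beyond the abstract is an assumption made for illustration: the class and function names, the register name `c0.r5`, the FIFO replacement policy, and the way communication operations are counted. The paper's windowed CRB organization and its code generation algorithm are not modeled, and the sketch ignores how stale copies are invalidated.

```python
from collections import OrderedDict


class CachingRegisterBuffer:
    """Toy CRB holding copies of a few remote registers.

    FIFO replacement is an assumption of this sketch, not taken from the paper.
    """

    def __init__(self, entries=4):
        self.entries = entries
        self.slots = OrderedDict()      # remote register name -> cached value

    def fill(self, reg, value):
        # Evict the oldest entry only when a new register needs a slot.
        if reg not in self.slots and len(self.slots) >= self.entries:
            self.slots.popitem(last=False)
        self.slots[reg] = value

    def lookup(self, reg):
        # Returns the cached value, or None when the register is not cached.
        return self.slots.get(reg)


def sendb(reg, value, crbs):
    """Hypothetical model of a broadcast: one operation fills every listed CRB."""
    for crb in crbs:
        crb.fill(reg, value)
    return 1                            # counted as a single communication op


# Three consumer clusters receive the value with one broadcast instead of
# three point-to-point transfers; later reads are served from the local CRB.
consumers = [CachingRegisterBuffer() for _ in range(3)]
ops = sendb("c0.r5", 42, consumers)
print(ops, [crb.lookup("c0.r5") for crb in consumers])
```

Repeated reads of `c0.r5` in a consumer cluster then cost no further inter-cluster traffic in this model, which is the temporal locality the abstract refers to.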


Proceedings Article
22 Jun 2002
TL;DR: An excellent technical program consisting of 31 papers selected from 144 submissions; the technical areas represented by these papers range from applications all the way to low-power architectures.
Abstract: On behalf of the ICS'02 Organizing Committee, we are pleased to welcome you to New York City for the 16th ACM International Conference on Supercomputing. Over the years, ICS has built a tradition of bringing you the very best in research and experience in the exciting area of high-performance computing. This year is no exception. We have an excellent technical program consisting of 31 papers selected from 144 submissions; the technical areas represented by these papers range from applications all the way to low-power architectures. Three outstanding speakers have kindly consented to give keynote addresses: Tetsuya Sato (Director-General, Earth Simulator Center, Japan), Alfred Z. Spector (Vice President, Services and Software, IBM Research), and David Kuck (Intel Fellow, Director, KAI Software Lab). Rounding off the technical program are three panel discussions, five tutorials, and four workshops on exciting current topics in the field of high-performance computing, such as self-healing and adaptive systems.

3 citations