Home
/
Authors
/
Kemal Ebcioglu

Author

Kemal Ebcioglu

Other affiliations: University at Buffalo

Bio: Kemal Ebcioglu is an academic researcher from IBM. The author has contributed to research in topics: Very long instruction word & Compiler. The author has an hindex of 28, co-authored 68 publications receiving 4605 citations. Previous affiliations of Kemal Ebcioglu include University at Buffalo.

Papers published on a yearly basis

2023
2022
2021
2016
2014
2012
2007
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987
1986
1984

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

X10: an object-oriented approach to non-uniform cluster computing

[...]

Philippe Charles¹, Christian Grothoff², Vijay Saraswat¹, Christopher Michael Donawa¹, Allan H. Kielstra¹, Kemal Ebcioglu¹, Christoph von Praun¹, Vivek Sarkar¹ - Show less +4 more•Institutions (2)

IBM¹, University of California, Los Angeles²

12 Oct 2005

TL;DR: A modern object-oriented programming language, X10, is designed for high performance, high productivity programming of NUCC systems and an overview of the X10 programming model and language, experience with the reference implementation, and results from some initial productivity comparisons between the X 10 and Java™ languages are presented.

...read moreread less

Abstract: It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes.We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places}; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.

...read moreread less

1,469 citations

Proceedings Article•DOI•

DAISY: dynamic compilation for 100% architectural compatibility

[...]

Kemal Ebcioglu¹, Erik R. Altman¹•Institutions (1)

IBM¹

01 May 1997

TL;DR: The architectural requirements for such a VLIW, to deal with issues including self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O are discussed.

...read moreread less

Abstract: Although VLIW architectures offer the advantages of simplicity of design and high issue rates, a major impediment to their use is that they are not compatible with the existing software base. We describe new simple hardware features for a VLIW machine we call DAISY (DynamicallyArchitectedInstructionSet fromYorktown). DAISY is specifically intended to emulate existing architectures, so that all existing software for an old architecture (including operating system kernel code) runs without changes on the VLIW. Each time a new fragment of code is executed for the first time, the code is translated to VLIW primitives, parallelized and saved in a portion of main memory not visible to the old architecture, by a Virtual Machine Monitor (software) residing in read only memory. Subsequent executions of the same fragment do not require a translation (unless cast out). We discuss the architectural requirements for such a VLIW, to deal with issues including self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O. We have implemented the dynamic parallelization algorithms for the PowerPC architecture. The initial results show high degrees of instruction level parallelism with reasonable translation overhead and memory usage.

...read moreread less

406 citations

Journal Article•DOI•

Programming by sketching for bit-streaming programs

[...]

Armando Solar-Lezama¹, Rodric Rabbah², Rastislav Bodik¹, Kemal Ebcioglu³•Institutions (3)

University of California, Berkeley¹, Massachusetts Institute of Technology², IBM³

12 Jun 2005

TL;DR: StreamBit is developed as a sketching methodology for the important class of bit-streaming programs (e.g., coding and cryptography), which allows a programmer to write clean and portable reference code, and then obtain a high-quality implementation by simply sketching the outlines of the desired implementation.

...read moreread less

Abstract: This paper introduces the concept of programming with sketches, an approach for the rapid development of high-performance applications. This approach allows a programmer to write clean and portable reference code, and then obtain a high-quality implementation by simply sketching the outlines of the desired implementation. Subsequently, a compiler automatically fills in the missing details while also ensuring that a completed sketch is faithful to the input reference code. In this paper, we develop StreamBit as a sketching methodology for the important class of bit-streaming programs (e.g., coding and cryptography).A sketch is a partial specification of the implementation, and as such, it affords several benefits to programmer in terms of productivity and code robustness. First, a sketch is easier to write compared to a complete implementation. Second, sketching allows the programmer to focus on exploiting algorithmic properties rather than on orchestrating low-level details. Third, a sketch-aware compiler rejects "buggy" sketches, thus improving reliability while allowing the programmer to quickly evaluate sophisticated implementation ideas.We evaluated the productivity and performance benefits of our programming methodology in a user-study, where a group of novice StreamBit programmers competed with a group of experienced C programmers on implementing a cipher. We learned that, given the same time budget, the ciphers developed in StreamBit ran 2.5x faster than ciphers coded in C. We also produced implementations of DES and Serpent that were competitive with hand optimized implementations available in the public domain.

...read moreread less

286 citations

Journal Article•DOI•

Dynamic binary translation and optimization

[...]

Kemal Ebcioglu¹, Erik R. Altman¹, Michael K. Gschwind¹, Sumedh W. Sathaye¹•Institutions (1)

IBM¹

01 Jun 2001-IEEE Transactions on Computers

TL;DR: Different design trade-offs in the DAISY system and their impact on final system performance are reported, and the results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.

...read moreread less

Abstract: We describe a VLIW architecture designed specifically as a target for dynamic compilation of an existing instruction set architecture. This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY (Dynamically Architected Instruction Set from Yorktown). The dynamic compiler exploits runtime profile information to optimize translations so as to extract instruction level parallelism. This paper reports different design trade-offs in the DAISY system and their impact on final system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.

...read moreread less

152 citations

Proceedings Article•

An Expert System for Harmonizing Four-Part Chorales

[...]

Kemal Ebcioglu

01 Jan 1986

TL;DR: The economy and elegance of the formal representation underlying these musical styles (which are not in the least less respectable than traditional styles of music), may often have an aesthetic appeal in and of themselves as discussed by the authors.

...read moreread less

Abstract: Quite a few trends in algorithmic composition today are based on a streamlined formalism, for example, in the form of random generation of note attributes using elegant statistical distributions (Xenakis 1971), terse and powerful formal grammars (Jones 1981), elegant mathematical models (Kendall 1981; Vaggione 1984), or generalizations of serial composition procedures (Laske 1981). The economy and elegance of the formal representation underlying these musical styles (which are not in the least less respectable than traditional styles of music), may often have an aesthetic appeal in and of themselves. On the other hand, traditional music and most of modern music, which are usually composed without a computer, do not seem to permit such economical representations. In the traditional style, the typical basic training the composer has to go through in harmony, strict counterpoint, fugue,

...read moreread less

149 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

LLVM: a compilation framework for lifelong program analysis & transformation

[...]

Chris Lattner¹, Vikram Adve¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

...read moreread less

Abstract: We describe LLVM (low level virtual machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in static single assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.

...read moreread less

4,841 citations

The Landscape of Parallel Computing Research: A View from Berkeley

[...]

Krste Asanovic, Ras Bodik, Bryan Catanzaro, Joseph Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Plishker, John Shalf, Samuel Williams, Katherine Yelick - Show less +7 more

18 Dec 2006

TL;DR: The parallel landscape is frame with seven questions, and the following are recommended to explore the design space rapidly: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS each development dollar.

...read moreread less

Abstract: Author(s): Asanovic, K; Bodik, R; Catanzaro, B; Gebis, J; Husbands, P; Keutzer, K; Patterson, D; Plishker, W; Shalf, J; Williams, SW | Abstract: The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work from 2 or 8 processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar. • Instead of traditional benchmarks, use 13 “Dwarfs” to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.) • “Autotuners” should play a larger role than conventional compilers in translating parallel programs. • To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications. • To be successful, programming models should be independent of the number of processors. • To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism. 1 The Landscape of Parallel Computing Research: A View From Berkeley • Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters. • Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines. • To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost. Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.

...read moreread less

2,262 citations

Journal Article•

Julia: A Fresh Approach to Numerical Computing

[...]

Stefan Karpinski, Viral B. Shah, Jeffrey Werner Bezanson, Alan Edelman

01 Feb 2017-Siam Journal on Control and Optimization

TL;DR: The Julia programming language and its design is introduced---a dance between specialization and abstraction, which recognizes what remains the same after computation, and which is best left untouched as they have been built by the experts.

...read moreread less

1,730 citations

Proceedings Article•DOI•

Evaluating MapReduce for Multi-core and Multiprocessor Systems

[...]

C. Ranger¹, R. Raghuraman¹, A. Penmetsa¹, Gary Bradski¹, Christos Kozyrakis¹ - Show less +1 more•Institutions (1)

Stanford University¹

10 Feb 2007

TL;DR: It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.

...read moreread less

Abstract: This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as P-threads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code

...read moreread less

1,058 citations

Journal Article•DOI•

Software pipelining: an effective scheduling technique for VLIW machines

[...]

Monica S. Lam¹•Institutions (1)

Carnegie Mellon University¹

01 Jun 1988

TL;DR: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors, and proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block.

...read moreread less

Abstract: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.This paper extends previous results of software pipelining in two ways: First, this paper shows that by using an improved algorithm, near-optimal performance can be obtained without specialized hardware. Second, we propose a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block. With this scheme, all innermost loops, including those containing conditional statements, can be software pipelined. It also diminishes the start-up cost of loops with small number of iterations. Hierarchical reduction complements the software pipelining technique, permitting a consistent performance improvement be obtained.The techniques proposed have been validated by an implementation of a compiler for Warp, a systolic array consisting of 10 VLIW processors. This compiler has been used for developing a large number of applications in the areas of image, signal and scientific processing.

...read moreread less

936 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse