Author

James D. Stevens

University of Illinois at Urbana–Champaign

Bio: James D. Stevens is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Load balancing (computing) & Job scheduler. The author has an h-index of 1 and has co-authored 4 publications receiving 651 citations.

Papers

Proceedings Article•DOI•

Wire routing by optimizing channel assignment within large apertures

Akihiro Hashimoto¹, James D. Stevens¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Jun 1971

TL;DR: This paper introduces a new wire routing method for two-layer printed circuit boards; the method is based on a newly developed channel assignment algorithm and requires many via holes.

Abstract: The purpose of this paper is to introduce a new wire routing method for two layer printed circuit boards. This technique has been developed at the University of Illinois Center for Advanced Computation and has been programmed in ALGOL for a B5500 computer. The routing method is based on the newly developed channel assignment algorithm and requires many via holes. The primary goals of the method are short execution time and high wireability. Actual design specifications for ILLIAC IV Control Unit boards have been used to test the feasibility of the routing technique. Tests have shown that this algorithm is very fast and can handle large boards.

655 citations

Journal Article•DOI•

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

James D. Stevens¹, Andreas Klöckner¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

03 Jun 2020-International Journal of High Performance Computing Applications

TL;DR: This work presents an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion.

Abstract: The ability to model, analyze, and predict execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, develo...

1 citation

Posted Content•

A Unified, Hardware-Fitted, Cross-GPU Performance Model

James D. Stevens, Andreas Klöckner

18 Apr 2016-arXiv: Performance

TL;DR: A mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms ('kernels') expressed in the Loopy programming system is presented, and these counts are applied in a simple, linear model of kernel run time.

Abstract: We present a mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms ('kernels') expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run time. We use a series of 'performance-instructive' kernels to fit the parameters of a unified model to the performance characteristics of GPU hardware from multiple hardware generations and vendors. We evaluate the predictive power of the model on a broad array of computational kernels relevant to scientific computing. In terms of the geometric mean, our simple, vendor- and GPU-type-independent model achieves relative accuracy comparable to that of previously published work using hardware-specific models.
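
The fitting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Loopy: it assumes each kernel is summarized by a short vector of operation counts, fits one cost weight per count by ordinary least squares, and reports the geometric mean of the relative errors. The feature names, the synthetic numbers, and every function name below are assumptions made for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_model(counts, times):
    """Least-squares weights w such that times[i] ~ sum_j w[j] * counts[i][j]."""
    k = len(counts[0])
    # Normal equations: (A^T A) w = A^T t.
    ata = [[sum(row[p] * row[q] for row in counts) for q in range(k)]
           for p in range(k)]
    atb = [sum(row[p] * t for row, t in zip(counts, times)) for p in range(k)]
    return solve_linear_system(ata, atb)


def predict(weights, row):
    """Predicted run time of one kernel from its operation counts."""
    return sum(w * c for w, c in zip(weights, row))


def geometric_mean_relative_error(weights, counts, times, floor=1e-12):
    """Geometric mean of |predicted - measured| / measured.

    The floor (an assumption of this sketch) keeps log() defined when a
    prediction happens to be exact.
    """
    errors = [max(abs(predict(weights, row) - t) / t, floor)
              for row, t in zip(counts, times)]
    return math.exp(sum(math.log(e) for e in errors) / len(errors))


# Hypothetical features per kernel: [arithmetic ops, memory loads, 1 (launch)].
# Counts are in arbitrary units; times are synthetic, not measurements.
TRUE_WEIGHTS = [0.5, 0.25, 0.1]
COUNTS = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0], [4.0, 4.0, 1.0],
          [8.0, 2.0, 1.0], [3.0, 6.0, 1.0]]
TIMES = [predict(TRUE_WEIGHTS, row) for row in COUNTS]

weights = fit_linear_model(COUNTS, TIMES)
```

Because the synthetic times are generated from the same linear form, the fit recovers the weights almost exactly; with real measurements the residual would reflect whatever the chosen counts fail to explain.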

1 citation

Journal Article•DOI•

A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling

James D. Stevens¹, Andreas Klöckner¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

21 Apr 2019-arXiv: Performance

TL;DR: In this paper, the authors present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs.

Abstract: The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance, parallel applications. In today's increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. Our approach empowers a user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define a model and customize the calibration process, making it as simple or complex as desired, and as application-targeted or general as desired. To evaluate our approach, we demonstrate both linear and nonlinear models; each example models execution times for multiple variants of a particular computation: two matrix multiplication variants, four Discontinuous Galerkin (DG) differentiation operation variants, and two 2-D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. We view this customizable approach as a response to a central question in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use, an attribute we believe precludes manual collection of kernel or hardware statistics.

Cited by

Journal Article•DOI•

The NP-completeness column: An ongoing guide

David S. Johnson¹•Institutions (1)

Bell Labs¹

01 Dec 1986-Journal of Algorithms

TL;DR: This is the fourteenth edition of a quarterly column that provides continuing coverage of new developments in the theory of NP-completeness. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson.

857 citations

Journal Article•DOI•

Efficient Algorithms for Channel Routing

Takeshi Yoshimura¹, Ernest S. Kuh¹•Institutions (1)

University of California, Berkeley¹

01 Jan 1982-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: Two new algorithms merge nets instead of assigning horizontal tracks to individual nets to route a specified net list between two rows of terminals across a two-layer channel in the layout design of LSI chips.

Abstract: In the layout design of LSI chips, channel routing is one of the key problems. The problem is to route a specified net list between two rows of terminals across a two-layer channel. Nets are routed with horizontal segments on one layer and vertical segments on the other. Connections between two layers are made through via holes. Two new algorithms are proposed. These algorithms merge nets instead of assigning horizontal tracks to individual nets. The algorithms were coded in Fortran and implemented on a VAX 11/780 computer. Experimental results are quite encouraging. Both programs generated optimal solutions in 6 out of 8 cases, using examples in previously published papers. The computation times of the algorithms for a typical channel (300 terminals, 70 nets) are 1.0 and 2.1 s, respectively.

539 citations

Proceedings Article•DOI•

A “DOGLEG” channel router

David N. Deutsch¹•Institutions (1)

Bell Labs¹

28 Jun 1976

TL;DR: The routing algorithm presented here was developed as part of LTX, a computer-aided design system for integrated circuit layout and was implemented on an HP-2100 minicomputer.

Abstract: This paper presents an algorithm for interconnecting two sets of terminals across an intervening channel. It is assumed that the routing is done on two distinct levels with all horizontal paths being assigned to one level and all vertical paths to the other. Connections between the levels are made through contact windows. A single net may result in many horizontal and vertical segments. Experimental results indicate that this algorithm is very successful in routing channels that contain severe constraints. Usually, the routing is accomplished within one track of the mathematical lower bound. The routing algorithm presented here was developed as part of LTX, a computer-aided design system for integrated circuit layout and was implemented on an HP-2100 minicomputer. A typical channel (300 terminals, 100 nets) can be routed in less than 5 seconds. Routing results are presented both for polycell chips under development at Bell Laboratories and for examples that exist in the published literature. For the latter, reductions of 10% in the wiring area were typical.

364 citations

Proceedings Article•DOI•

REAL: A Program for REgister ALlocation

Fadi J. Kurdahi¹, Alice C. Parker¹•Institutions (1)

University of Southern California¹

01 Oct 1987

TL;DR: The REAL REgister ALlocation program uses a track assignment algorithm taken from channel routing, called the Left Edge algorithm. It is optimal for non-pipelined designs with no conditional branches, is thought to be optimal for designs with conditional branches as well, and will be used to process designs output from MAHA and Sehwa.

Abstract: This paper describes the REAL REgister ALlocation program. REAL uses a track assignment algorithm taken from channel routing called the Left Edge algorithm. REAL is optimal for non-pipelined designs with no conditional branches. It is thought that REAL is also optimal for designs with conditional branches, pipelined or not. Experimental results are included in the report, which illustrate the optimal solutions found by REAL. REAL is part of the ADAM Advanced Design AutoMation system, and will be used to process designs output from MAHA and Sehwa.
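
The Left Edge algorithm named in this abstract can be sketched as follows. This is a minimal illustration of the classic track-by-track formulation, not the REAL program itself: each value lifetime is treated as a closed interval, each track stands for a register, and no vertical constraints are modeled. The function name, the interval encoding, and the sample lifetimes are assumptions.

```python
def left_edge(intervals):
    """Assign each (start, end) interval to a track index.

    Intervals are sorted by left edge; each track is then filled greedily
    with the next interval that starts after the last one placed on it.
    Intervals that merely touch at an endpoint are treated as conflicting
    (an assumption of this sketch).
    """
    remaining = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    track_of = [None] * len(intervals)
    track = 0
    while remaining:
        last_end = None
        deferred = []
        for i in remaining:
            start, end = intervals[i]
            if last_end is None or start > last_end:
                # Fits on the current track: place it and advance the edge.
                track_of[i] = track
                last_end = end
            else:
                # Overlaps the current track: retry on a later track.
                deferred.append(i)
        remaining = deferred
        track += 1
    return track_of


# Hypothetical value lifetimes, as (first use, last use) control steps.
lifetimes = [(1, 4), (2, 6), (5, 8), (7, 9), (3, 5)]
assignment = left_edge(lifetimes)
registers_used = max(assignment) + 1
```

For plain intervals like these, the number of tracks used equals the largest number of lifetimes alive at the same time, which is why the method is attractive for register allocation.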

324 citations

Proceedings Article•DOI•

A "Greedy" Channel Router

Ronald L. Rivest¹, Charles M. Fiduccia²•Institutions (2)

Massachusetts Institute of Technology¹, General Electric²

01 Jan 1982

TL;DR: A new "greedy" channel router that always succeeds, usually using no more than one track more than required by channel density; in rare cases it may be forced to make a few connections "off the end" of the channel.

Abstract: We present a new, "greedy", channel-router that is quick, simple, and highly effective. It always succeeds, usually using no more than one track more than required by channel density. (It may be forced in rare cases to make a few connections "off the end" of the channel, in order to succeed.) It assumes that all pins and wiring lie on a common grid, and that vertical wires are on one layer, horizontal on another. The greedy router wires up the channel in a left-to-right, column-by-column manner, wiring each column completely before starting the next. Within each column the router tries to maximize the utility of the wiring produced, using simple, "greedy" heuristics. It may place a net on more than one track for a few columns, and "collapse" the net to a single track later on, using a vertical jog. It may also use a jog to move a net to a track closer to its pin in some future column. The router may occasionally add a new track to the channel, to avoid "getting stuck".
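
The channel density that this abstract uses as its yardstick can be computed with a short sketch. It assumes the common encoding of a channel as two rows of pins, one net id per column and 0 for an empty position, and it counts for each column the nets whose leftmost-to-rightmost pin span covers that column. Nets confined to a single column are ignored, on the assumption that they need only a vertical wire. The function name and the sample channel are illustrative and not taken from the paper.

```python
def channel_density(top, bottom):
    """Return the maximum number of nets crossing any column of the channel.

    top and bottom list the net id at each column on the two channel sides;
    0 means no pin at that position.
    """
    if len(top) != len(bottom):
        raise ValueError("top and bottom rows must have the same length")

    # Record the leftmost and rightmost column of every net.
    span = {}
    for col, pins in enumerate(zip(top, bottom)):
        for net in pins:
            if net != 0:
                lo, hi = span.get(net, (col, col))
                span[net] = (min(lo, col), max(hi, col))

    density = 0
    for col in range(len(top)):
        # A net needs a horizontal track at this column if its span covers
        # the column and extends over more than one column.
        crossing = sum(1 for lo, hi in span.values()
                       if lo < hi and lo <= col <= hi)
        density = max(density, crossing)
    return density


# Hypothetical five-column channel with three nets.
top_pins = [1, 2, 0, 3, 1]
bottom_pins = [2, 0, 3, 1, 0]
density = channel_density(top_pins, bottom_pins)
```

Density is a lower bound on the number of horizontal tracks when each track segment carries one net, so a router that finishes within one track of it is close to the best possible for that channel.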

291 citations
