scispace - formally typeset
Search or ask a question
Author

James D. Stevens

Bio: James D. Stevens is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Load balancing (computing) & Job scheduler. The author has an hindex of 1, co-authored 4 publications receiving 651 citations.

Papers
More filters
Proceedings ArticleDOI
28 Jun 1971
TL;DR: The purpose of this paper is to introduce a new wire routing method for two layer printed circuit boards based on the newly developed channel assignment algorithm and requires many via holes.
Abstract: The purpose of this paper is to introduce a new wire routing method for two layer printed circuit boards. This technique has been developed at the University of Illinois Center for Advanced Computation and has been programmed in ALGOL for a B5500 computer. The routing method is based on the newly developed channel assignment algorithm and requires many via holes. The primary goals of the method are short execution time and high wireability. Actual design specifications for ILLIAC IV Control Unit boards have been used to test the feasibility of the routing technique. Tests have shown that this algorithm is very fast and can handle large boards.

655 citations

Journal ArticleDOI
TL;DR: This work presents an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion.
Abstract: The ability to model, analyze, and predict execution time of computations is an important building block that supports numerous efforts, such as load balancing, benchmarking, job scheduling, develo...

1 citations

Posted Content
TL;DR: A mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms (`kernels') expressed in the Loopy programming system are presented, and these counts are applied in a simple, linear model of kernel run time.
Abstract: We present a mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms (`kernels') expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run time. We use a series of `performance-instructive' kernels to fit the parameters of a unified model to the performance characteristics of GPU hardware from multiple hardware generations and vendors. We evaluate the predictive power of the model on a broad array of computational kernels relevant to scientific computing. In terms of the geometric mean, our simple, vendor- and GPU-type-independent model achieves relative accuracy comparable to that of previously published work using hardware specific models.

1 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs.
Abstract: The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance, parallel applications. In today's increasingly heterogeneous computing environment, this task must be accomplished efficiently across multiple architectures, including massively parallel coprocessors like GPUs. To address this challenge, we present an approach for constructing customizable, cross-machine performance models for GPU kernels, including a mechanism to automatically and symbolically gather performance-relevant kernel operation counts, a tool for formulating mathematical models using these counts, and a customizable parameterized collection of benchmark kernels used to calibrate models to GPUs in a black-box fashion. Our approach empowers a user to manage trade-offs between model accuracy, evaluation speed, and generalizability. A user can define a model and customize the calibration process, making it as simple or complex as desired, and as application-targeted or general as desired. To evaluate our approach, we demonstrate both linear and nonlinear models; each example models execution times for multiple variants of a particular computation: two matrix multiplication variants, four Discontinuous Galerkin (DG) differentiation operation variants, and two 2-D five-point finite difference stencil variants. For each variant, we present accuracy results on GPUs from multiple vendors and hardware generations. We view this customizable approach as a response to a central question in GPU performance modeling: how can we model GPU performance in a cost-explanatory fashion while maintaining accuracy, evaluation speed, portability, and ease of use, an attribute we believe precludes manual collection of kernel or hardware statistics.

Cited by
More filters
Journal ArticleDOI
David S. Johnson1
TL;DR: This is the fourteenth edition of a quarterly column that provides continuing coverage of new developments in the theory of NP-completeness, and readers who have results they would like mentioned (NP-hardness, PSPACE- hardness, polynomialtime-solvability, etc.), or open problems they wouldlike publicized, should send them to David S. Johnson.

857 citations

Journal ArticleDOI
TL;DR: Two new algorithms merge nets instead of assigning horizontal tracks to individual nets to route a specified net list between two rows of terminals across a two-layer channel in the layout design of LSI chips.
Abstract: In the layout design of LSI chips, channel routing is one of the key problems. The problem is to route a specified net list between two rows of terminals across a two-layer channel. Nets are routed with horizontal segments on one layer and vertical segments on the other. Connections between two layers are made through via holes. Two new algorithms are proposed. These algorithms merge nets instead of assigning horizontal tracks to individual nets. The algorithms were coded in Fortran and implemented on a VAX 11/780 computer. Experimental results are quite encouraging. Both programs generated optimal solutions in 6 out of 8 cases, using examples in previously published papers. The computation times of the algorithms for a typical channel (300 terminals, 70 nets) are 1.0 and 2.1 s, respectively.

539 citations

Proceedings ArticleDOI
David N. Deutsch1
28 Jun 1976
TL;DR: The routing algorithm presented here was developed as part of LTX, a computer-aided design system for integrated circuit layout and was implemented on an HP-2100 minicomputer.
Abstract: This paper presents an algorithm for interconnecting two sets of terminals across an intervening channel. It is assumed that the routing is done on two distinct levels with all horizontal paths being assigned to one level and all vertical paths to the other. Connections between the levels are made through contact windows. A single net may result in many horizontal and vertical segments. Experimental results indicate that this algorithm is very successful in routing channels that contain severe constraints. Usually, the routing is accomplished within one track of the mathematical lower bound. The routing algorithm presented here was developed as part of LTX, a computer-aided design system for integrated circuit layout and was implemented on an HP-2100 minicomputer. A typical channel (300 terminals, 100 nets) can be routed in less than 5 seconds. Routing results are presented both for polycell chips under development at Bell Laboratories and for examples that exist in the published literature. For the latter, reductions of 10% in the wiring area were typical.

364 citations

Proceedings ArticleDOI
01 Oct 1987
TL;DR: The REAL REgister ALlocation program uses a track assignment algorithm taken from channel routing called the Left Edge algorithm to process designs output from MAHA and Sehwa, and is thought to be optimal for non-pipelined designs with no conditional branches.
Abstract: This paper describes the REAL REgister ALlocation program. REAL uses a track assignment algorithm taken from channel routing called the Left Edge algorithm. REAL is optimal for non-pipelined designs with no conditional branches. It is thought that REAL is also optimal for designs with conditional branches, pipelined or not. Experimental results are included in the report, which illustrate the optimal solutions found by REAL. REAL is part of the ADAM Advanced Design AutoMation system, and will be used to process designs output from MAHA and Sehwa.

324 citations

Proceedings ArticleDOI
01 Jan 1982
TL;DR: A new, “greedy”, channel-router that always succeeds, usually using no more than one track more than required by channel density, and may be forced in rare cases to make a few connections "off the end” of the channel.
Abstract: We present a new, "greedy", channel-router that is quick, simple, and highly effective. It always succeeds, usually using no more than one track more than required by channel density. (It may be forced in rare cases to make a few connections "off the end" of the channel, in order to succeed.) It assumes that all pins and wiring lie on a common grid, and that vertical wires are on one layer, horizontal on another. The greedy router wires up the channel in a left-to-right, column-by-column manner, wiring each column completely before starting the next. Within each column the router tries to maximize the utility of the wiring produced, using simple, "greedy" heuristics. It may place a net on more than one track for a few columns, and "collapse" the net to a single track later on, using a vertical jog. It may also use a jog to move a net to a track closer to its pin in some future column. The router may occasionally add a new track to the channel, to avoid "getting stuck".

291 citations