Home
/
Authors
/
David Tarjan

Author

David Tarjan

Other affiliations: University of Virginia

Bio: David Tarjan is an academic researcher from Nvidia. The author has contributed to research in topics: Thread (computing) & Register file. The author has an hindex of 23, co-authored 37 publications receiving 6771 citations. Previous affiliations of David Tarjan include University of Virginia.

Topics: Thread (computing), Register file, Cache, Memory data register, Stack register ...read more

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Rodinia: A benchmark suite for heterogeneous computing

[...]

Shuai Che¹, Michael Boyer¹, Jiayuan Meng¹, David Tarjan¹, Jeremy W. Sheaffer¹, Sang-Ha Lee¹, Kevin Skadron¹ - Show less +3 more•Institutions (1)

University of Virginia¹

04 Oct 2009

TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

...read moreread less

Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

...read moreread less

2,697 citations

Proceedings Article•DOI•

Temperature-aware microarchitecture

[...]

Kevin Skadron¹, Mircea R. Stan¹, Wei Huang¹, Sivakumar Velusamy¹, Karthik Sankaranarayanan¹, David Tarjan¹ - Show less +2 more•Institutions (1)

University of Virginia¹

01 May 2003

TL;DR: HotSpot is described, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package that shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.

...read moreread less

Abstract: With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package's capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical for architectural studies.This paper describes HotSpot, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for dynamic thermal management (DTM): "temperature-tracking" frequency scaling, localized toggling, and migrating computation to spare hardware units. Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.

...read moreread less

1,252 citations

Journal Article•DOI•

Temperature-aware microarchitecture: Modeling and implementation

[...]

Kevin Skadron¹, Mircea R. Stan¹, Karthik Sankaranarayanan¹, Wei Huang¹, Sivakumar Velusamy¹, David Tarjan¹ - Show less +2 more•Institutions (1)

University of Virginia¹

01 Mar 2004-ACM Transactions on Architecture and Code Optimization

TL;DR: HotSpot is described, an accurate yet fast and practical model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package that shows that power metrics are poor predictors of temperature, that sensor imprecision has a substantial impact on the performance of DTM, and that the inclusion of lateral resistances for thermal diffusion is important for accuracy.

...read moreread less

Abstract: With cooling costs rising exponentially, designing cooling solutions for worst-case power dissipation is prohibitively expensive. Chips that can autonomously modify their execution and power-dissipation characteristics permit the use of lower-cost cooling solutions while still guaranteeing safe temperature regulation. Evaluating techniques for this dynamic thermal management (DTM), however, requires a thermal model that is practical for architectural studies.This paper describes HotSpot, an accurate yet fast and practical model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for DTM: "temperature-tracking" frequency scaling, "migrating computation" to spare hardware units, and a "hybrid" policy that combines fetch gating with dynamic voltage scaling. The latter two achieve their performance advantage by exploiting instruction-level parallelism, showing the importance of microarchitecture research in helping control the growth of cooling costs.Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, that sensor imprecision has a substantial impact on the performance of DTM, and that the inclusion of lateral resistances for thermal diffusion is important for accuracy.

...read moreread less

786 citations

Journal Article•DOI•

A performance study of general-purpose applications on graphics processors using CUDA

[...]

Shuai Che¹, Michael Boyer¹, Jiayuan Meng¹, David Tarjan¹, Jeremy W. Sheaffer¹, Kevin Skadron¹ - Show less +2 more•Institutions (1)

University of Virginia¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper uses NVIDIA's C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU.

...read moreread less

660 citations

Proceedings Article•DOI•

Dynamic warp subdivision for integrated branch and memory divergence tolerance

[...]

Jiayuan Meng¹, David Tarjan¹, Kevin Skadron¹•Institutions (1)

University of Virginia¹

19 Jun 2010

TL;DR: This paper introduces dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space, and improves latency hiding and memory level parallelism (MLP).

...read moreread less

Abstract: SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) are stalled due to long latency memory accesses. The resulting idle cycles are extremely costly. Multi-threading can hide latencies by interleaving the execution of multiple warps, but deep multi-threading using many warps dramatically increases the cost of the register files (multi-threading depth x SIMD width), and cache contention can make performance worse. Instead, intra-warp latency hiding should first be exploited. This allows threads that are ready but stalled by SIMD restrictions to use these idle cycles and reduces the need for multi-threading among warps. This paper introduces dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space. Independent scheduling entities allow divergent branch paths to interleave their execution, and allow threads that hit to run ahead. The result is improved latency hiding and memory level parallelism (MLP). We evaluate the technique on a coherent cache hierarchy with private L1 caches and a shared L2 cache. With an area overhead of less than 1%, experiments with eight data-parallel benchmarks show our technique improves performance on average by 1.7X.

...read moreread less

300 citations

1
2
3
4
…
5
6
7
8

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Rodinia: A benchmark suite for heterogeneous computing

[...]

Shuai Che¹, Michael Boyer¹, Jiayuan Meng¹, David Tarjan¹, Jeremy W. Sheaffer¹, Sang-Ha Lee¹, Kevin Skadron¹ - Show less +3 more•Institutions (1)

University of Virginia¹

04 Oct 2009

...read moreread less

2,697 citations

Proceedings Article•DOI•

Temperature-aware microarchitecture

[...]

Kevin Skadron¹, Mircea R. Stan¹, Wei Huang¹, Sivakumar Velusamy¹, Karthik Sankaranarayanan¹, David Tarjan¹ - Show less +2 more•Institutions (1)

University of Virginia¹

01 May 2003

...read moreread less

1,252 citations

Journal Article•DOI•

HotSpot: a compact thermal modeling methodology for early-stage VLSI design

[...]

Wei Huang¹, S. Ghosh¹, Sivakumar Velusamy¹, Karthik Sankaranarayanan¹, Kevin Skadron¹, Mircea R. Stan¹ - Show less +2 more•Institutions (1)

University of Virginia¹

01 May 2006-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The HotSpot compact thermal modeling approach is especially well suited for preregister transfer level (RTL) and presynthesis thermal analysis and is able to provide detailed static and transient temperature information across the die and the package, as it is also computationally efficient.

...read moreread less

Abstract: This paper presents HotSpot-a modeling methodology for developing compact thermal models based on the popular stacked-layer packaging scheme in modern very large-scale integration systems. In addition to modeling silicon and packaging layers, HotSpot includes a high-level on-chip interconnect self-heating power and thermal model such that the thermal impacts on interconnects can also be considered during early design stages. The HotSpot compact thermal modeling approach is especially well suited for preregister transfer level (RTL) and presynthesis thermal analysis and is able to provide detailed static and transient temperature information across the die and the package, as it is also computationally efficient.

...read moreread less

985 citations

Journal Article•DOI•

Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey

[...]

Longlong Jing¹, Yingli Tian¹•Institutions (1)

City University of New York¹

01 Nov 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos as a subset of unsupervised learning methods to learn general image and video features from large-scale unlabeled data without using any human-annotated labels is provided.

...read moreread less

Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid extensive cost of collecting and annotating large-scale datasets, as a subset of unsupervised learning methods, self-supervised learning methods are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed followed by the commonly used datasets for images, videos, audios, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. At last, this paper is concluded and lists a set of promising future directions for self-supervised visual feature learning.

...read moreread less

876 citations

Journal Article•DOI•

Temperature-aware microarchitecture: Modeling and implementation

[...]