Home
/
Authors
/
Daniel Reiter Horn

Author

Daniel Reiter Horn

Bio: Daniel Reiter Horn is an academic researcher from Stanford University. The author has contributed to research in topics: General-purpose computing on graphics processing units & Metaverse. The author has an hindex of 7, co-authored 7 publications receiving 2260 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Brook for GPUs: stream computing on graphics hardware

[...]

Ian Buck¹, Tim Foley¹, Daniel Reiter Horn¹, Jeremy Sugerman¹, Kayvon Fatahalian¹, Mike Houston¹, Pat Hanrahan¹ - Show less +3 more•Institutions (1)

Stanford University¹

01 Aug 2004

TL;DR: This paper presents Brook for GPUs, a system for general-purpose computation on programmable graphics hardware that abstracts and virtualizes many aspects of graphics hardware, and presents an analysis of the effectiveness of the GPU as a compute engine compared to the CPU.

...read moreread less

Abstract: In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.

...read moreread less

1,288 citations

Proceedings Article•DOI•

Sequoia: programming the memory hierarchy

[...]

Kayvon Fatahalian¹, Daniel Reiter Horn¹, Timothy James Knight¹, Larkhoon Leem¹, Mike Houston¹, Ji Young Park¹, Mattan Erez¹, Manman Ren¹, Alex Aiken¹, William J. Dally¹, Pat Hanrahan¹ - Show less +7 more•Institutions (1)

Stanford University¹

11 Nov 2006

TL;DR: This work has implemented a complete programming system, including a compiler and runtime systems for cell processor-based blade systems and distributed memory clusters, and demonstrates efficient performance running Sequoia programs on both of these platforms.

...read moreread less

Abstract: We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.

...read moreread less

482 citations

Proceedings Article•DOI•

Interactive k-d tree GPU raytracing

[...]

Daniel Reiter Horn¹, Jeremy Sugerman¹, Mike Houston¹, Pat Hanrahan¹•Institutions (1)

Stanford University¹

30 Apr 2007

TL;DR: This work ports Foley et al.'s kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs' branching and looping abilities, and introduces three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart.

...read moreread less

Abstract: Over the past few years, the powerful computation rates and high memory bandwidth of GPUs have attracted efforts to run raytracing on GPUs. Our work extends Foley et al.'s GPU k-d tree research. We port their kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs' branching and looping abilities. We introduce three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart. Our optimized implementation achieves 15 - 18 million primary rays per second and 16 - 27 million shadow rays per second on our test scenes.Our system also takes advantage of GPUs' strengths at rasterization and shading to offer a mode where rasterization replaces eye ray scene intersection, and primary hits and local shading are produced with standard Direct3D code. For 1024x1024 renderings of our scenes with shadows and Phong shading, we achieve 12-18 frames per second. Finally, we investigate the efficiency of our implementation relative to the computational resources of our GPUs and also compare it against conventional CPUs and the Cell processor, which both have been shown to raytrace well.

...read moreread less

269 citations

Proceedings Article•DOI•

ClawHMMER: A Streaming HMMer-Search Implementatio

[...]

Daniel Reiter Horn¹, Mike Houston¹, Pat Hanrahan¹•Institutions (1)

Stanford University¹

12 Nov 2005

TL;DR: This work presents a streaming algorithm for evaluating an HMM’s Viterbi probability and refine it for the specific HMM used in biological sequence search and demonstrates that this streaming algorithm on graphics processors can outperform available CPU implementations.

...read moreread less

Abstract: The proliferation of biological sequence data has motivated the need for an extremely fast probabilistic sequence search. One method for performing this search involves evaluating the Viterbi probability of a hidden Markov model (HMM) of a desired sequence family for each sequence in a protein database. However, one of the difficulties with current implementations is the time required to search large databases. Many current and upcoming architectures offering large amounts of compute power are designed with data-parallel execution and streaming in mind. We present a streaming algorithm for evaluating an HMMs Viterbi probability and refine it for the specific HMM used in biological sequence search. We implement our streaming algorithm in the Brook language, allowing us to execute the algorithm on graphics processors. We demonstrate that this streaming algorithm on graphics processors can outperform available CPU implementations. We also demonstrate this implementation running on a 16 node graphics cluster.

...read moreread less

112 citations

Proceedings Article•DOI•

LightShop: interactive light field manipulation and rendering

[...]

Daniel Reiter Horn¹, Billy Chen²•Institutions (2)

Stanford University¹, Microsoft²

30 Apr 2007

TL;DR: LightShop is a system that allows a user to interactively manipulate, composite and render multiple light fields, and shows applications in digital photography and demonstrates how to integrate light fields into a modern space-flight game using LightShop.

...read moreread less

Abstract: Light fields can be used to represent an object's appearance with a high degree of realism. However, unlike their geometric counterparts, these image-based representations lack user control for manipulating them. We present a system that allows a user to interactively manipulate, composite and render multiple light fields. LightShop is a modular system consisting of three parts: 1) a set of functions that allow a user to model a scene containing multiple light fields, 2) a ray-shading language that describes how an image should be constructed from a set of light fields, and 3) a real-time light field rendering system in OpenGL that can plug into existing 3D engines as a GLSL shader.We show applications in digital photography and we demonstrate how to integrate light fields into a modern space-flight game using LightShop.

...read moreread less

111 citations

Cited by

PDF

Open Access

More filters

Book•

Accelerated Profile HMM Searches

[...]

Sean R. Eddy¹•Institutions (1)

Janelia Farm Research Campus¹

01 May 2015

TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.

...read moreread less

Abstract: Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

...read moreread less

4,492 citations

Proceedings Article•DOI•

Scalable parallel programming with CUDA

[...]

John R. Nickolls¹, Ian Buck¹, Michael Garland¹, Kevin Skadron²•Institutions (2)

Nvidia¹, University of Virginia²

11 Aug 2008

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.

...read moreread less

Abstract: The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

...read moreread less

2,216 citations

Journal Article•DOI•

A Survey of General-Purpose Computation on Graphics Hardware

[...]

John D. Owens¹, David Luebke², Naga K. Govindaraju³, Mark J. Harris², Jens Krüger⁴, Aaron Lefohn, Timothy John Purcell² - Show less +3 more•Institutions (4)

University of California, Davis¹, Nvidia², Microsoft³, Technische Universität München⁴

01 Mar 2007-Computer Graphics Forum

TL;DR: This report describes, summarize, and analyzes the latest research in mapping general‐purpose computation to graphics hardware.

...read moreread less

Abstract: The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware. This survey should be of particular interest to researchers who are interested in using the latest GPGPU applications in their systems of interest.

...read moreread less

1,998 citations

Journal Article•DOI•

Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born

[...]

Andreas W. Götz¹, Mark J. Williamson¹, Dong Xu¹, Duncan Poole², Scott M. Le Grand², Ross C. Walker¹ - Show less +2 more•Institutions (2)

University of California, San Diego¹, Nvidia²

15 Apr 2012-Journal of Chemical Theory and Computation

TL;DR: An implementation of generalized Born implicit solvent all-atom classical molecular dynamics within the AMBER program package that runs entirely on CUDA enabled NVIDIA graphics processing units (GPUs) and shows performance that is on par with, and in some cases exceeds, that of traditional supercomputers.

...read moreread less

Abstract: We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison to simulations on conventional CPU clusters. The implementation supports three different precision models in which the contributions to the forces are calculated in single precision floating point arithmetic but accumulated in double precision (SPDP), or everything is computed in single precision (SPSP) or double precision (DPDP). In addition to performance, we have focused on understanding the implications of the different precision models on the outcome of implicit solvent MD simulations. We show results for a range of tests including the accuracy of single point force evaluations and energy conservation as well as structural properties pertainining to protein dynamics. The numerical noise due to rounding errors within the SPSP precision model is sufficiently large to lead to an accumulation of errors which can result in unphysical trajectories for long time scale simulations. We recommend the use of the mixed-precision SPDP model since the numerical results obtained are comparable with those of the full double precision DPDP model and the reference double precision CPU implementation but at significantly reduced computational cost. Our implementation provides performance for GB simulations on a single desktop that is on par with, and in some cases exceeds, that of traditional supercomputers.

...read moreread less

1,645 citations

GPU Computing

[...]

John D. Owens¹, Mike Houston, David Luebke, Simon Green, John E. Stone, James C. Phillips - Show less +2 more•Institutions (1)

University of California, Davis¹

01 May 2008

TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.

...read moreread less

Abstract: The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

...read moreread less

1,570 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse