Author

Donald Nguyen

Other affiliations: International Council for the Exploration of the Sea

Bio: Donald Nguyen is an academic researcher from the University of Texas at Austin. The author has contributed to research on topics including data structures and solvers. The author has an h-index of 15 and has co-authored 25 publications receiving 1,321 citations. Previous affiliations of Donald Nguyen include the International Council for the Exploration of the Sea.

Papers

Proceedings Article

A lightweight infrastructure for graph analytics

Donald Nguyen¹, Andrew Lenharth¹, Keshav Pingali¹

University of Texas at Austin¹

03 Nov 2013

TL;DR: This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.

Abstract: Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.

541 citations

Journal Article

The tao of parallelism in algorithms

Keshav Pingali¹, Donald Nguyen¹, Milind Kulkarni², Martin Burtscher³, M. Amber Hassaan¹, Rashid Kaleem¹, Tsung-Hsien Lee¹, Andrew Lenharth¹, Roman Manevich¹, Mario Méndez-Lojo¹, Dimitrios Prountzos¹, Xin Sui¹

University of Texas at Austin¹, Purdue University², Texas State University³

04 Jun 2011

TL;DR: It is suggested that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.

Abstract: For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets. To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context. These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.

380 citations
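
As a rough illustration of the operator formulation described in the abstract above, the following Python sketch expresses single-source shortest paths as an operator applied repeatedly to nodes where work remains (called active nodes here), drawn from a worklist. The choice of shortest paths as the example, the function name `sssp`, the adjacency-list representation, and the sequential FIFO worklist are all assumptions made for illustration and are not taken from the listing. A parallel runtime would apply the operator to many active nodes at once; this sketch does not attempt that.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

# Hypothetical graph type: node -> list of (neighbor, edge weight).
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def sssp(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Shortest-path distances from `source`, written as operator applications.

    Assumes non-negative edge weights and that every node appears as a key.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    worklist = deque([source])  # nodes where the operator still has work to do

    while worklist:
        node = worklist.popleft()  # pick one active node
        # The operator reads and writes only the node and its out-neighbors.
        for neighbor, weight in graph[node]:
            candidate = dist[node] + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                worklist.append(neighbor)  # the neighbor becomes active
    return dist
```

The point of the sketch is that the algorithm is stated as "what the operator does to the data structure around one node", with the order of processing left to the worklist policy.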

Proceedings Article

Machine learning-based prefetch optimization for data center applications

Shih-Wei Liao¹, Tzu-Han Hung², Donald Nguyen³, Chin-Yen Chou¹, Chia-Heng Tu¹, Hucheng Zhou⁴

National Taiwan University¹, Princeton University², University of Texas at Austin³, Tsinghua University⁴

14 Nov 2009

TL;DR: A tuning framework is developed which attempts to predict the optimal configuration based on hardware performance counters and achieves performance within 1% of the best performance of any single configuration for the same set of applications.

Abstract: Performance tuning for data centers is essential and complicated. It is important since a data center comprises thousands of machines and thus a single-digit performance improvement can significantly reduce cost and power consumption. Unfortunately, it is extremely difficult as data centers are dynamic environments where applications are frequently released and servers are continually upgraded. In this paper, we study the effectiveness of different processor prefetch configurations, which can greatly influence the performance of the memory system and the overall data center. We observe a wide performance gap when comparing the worst and best configurations, from 1.4% to 75.1%, for 11 important data center applications. We then develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. The framework achieves performance within 1% of the best performance of any single configuration for the same set of applications.

81 citations

Proceedings Article

Structure-driven optimizations for amorphous data-parallel programs

Mario Méndez-Lojo¹, Donald Nguyen¹, Dimitrios Prountzos¹, Xin Sui¹, M. Amber Hassaan¹, Milind Kulkarni², Martin Burtscher¹, Keshav Pingali¹

University of Texas at Austin¹, Purdue University²

09 Jan 2010

TL;DR: This paper shows that many irregular algorithms have structure that can be exploited, presents three key optimizations that use this algorithmic structure to reduce speculative overheads, and describes their implementation in the Galois system along with experimental results that demonstrate their benefits.

Abstract: Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs. However, the overhead of optimistic parallel execution can be substantial. In this paper, we show that many irregular algorithms have structure that can be exploited and present three key optimizations that take advantage of algorithmic structure to reduce speculative overheads. We describe the implementation of these optimizations in the Galois system and present experimental results to demonstrate their benefits. To the best of our knowledge, this is the first system to exploit algorithmic structure to optimize the execution of irregular programs.

59 citations

Journal Article

Parallel graph analytics

Andrew Lenharth¹, Donald Nguyen¹, Keshav Pingali¹

University of Texas at Austin¹

26 Apr 2016 - Communications of the ACM

TL;DR: Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.

Abstract: Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.

48 citations

Cited by

Journal Article

The Web of Human Sexual Contacts

Fredrik Liljeros¹, Christofer Edling¹, Luís A. Nunes Amaral², H. Eugene Stanley², Yvonne Åberg¹

Stockholm University¹, Boston University²

25 Jun 2001 - arXiv: Statistical Mechanics

TL;DR: In this article, the authors analyze data on the sexual behavior of a random sample of individuals, and find that the cumulative distributions of the number of sexual partners during the twelve months prior to the survey decays as a power law with similar exponents for females and males.

Abstract: Many "real-world" networks are clearly defined while most "social" networks are to some extent subjective. Indeed, the accuracy of empirically-determined social networks is a question of some concern because individuals may have distinct perceptions of what constitutes a social link. One unambiguous type of connection is sexual contact. Here we analyze data on the sexual behavior of a random sample of individuals, and find that the cumulative distributions of the number of sexual partners during the twelve months prior to the survey decays as a power law with similar exponents $\alpha \approx 2.4$ for females and males. The scale-free nature of the web of human sexual contacts suggests that strategic interventions aimed at preventing the spread of sexually-transmitted diseases may be the most efficient approach.

1,476 citations

What is Twitter

Rizal Setya Perdana

01 Jan 2013

1,098 citations

Proceedings Article

Ligra: a lightweight graph processing framework for shared memory

Julian Shun¹, Guy E. Blelloch¹

Carnegie Mellon University¹

23 Feb 2013

TL;DR: This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

816 citations
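
The abstract above describes a framework built around two routines, one mapping over edges and one mapping over vertices, each applicable to a subset of the vertices. The Python sketch below shows how BFS might look in that style. It is a sequential toy under stated assumptions: the names `edge_map`, `vertex_map`, and `bfs`, their signatures, and the dictionary-based graph are hypothetical stand-ins rather than the framework's actual API, and the sketch does not implement parallelism or the density-adaptive behavior the abstract mentions.

```python
from typing import Callable, Dict, List, Set

# Hypothetical graph type: vertex id -> list of out-neighbors.
Graph = Dict[int, List[int]]


def edge_map(graph: Graph, frontier: Set[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool]) -> Set[int]:
    """Apply `update` to edges leaving `frontier`; return the next frontier."""
    next_frontier: Set[int] = set()
    for src in sorted(frontier):  # sorted only to keep the sketch deterministic
        for dst in graph[src]:
            if cond(dst) and update(src, dst):
                next_frontier.add(dst)
    return next_frontier


def vertex_map(vertices: Set[int], keep: Callable[[int], bool]) -> Set[int]:
    """Apply `keep` to each vertex; return the subset for which it is true."""
    return {v for v in vertices if keep(v)}


def bfs(graph: Graph, root: int) -> Dict[int, int]:
    """Return a BFS parent map; unreachable vertices keep parent -1."""
    parent = {v: -1 for v in graph}
    parent[root] = root

    def update(src: int, dst: int) -> bool:
        if parent[dst] == -1:
            parent[dst] = src  # first visitor claims the vertex
            return True
        return False

    def cond(dst: int) -> bool:
        return parent[dst] == -1  # skip vertices that are already visited

    frontier = {root}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond)
    return parent
```

In this form the traversal loop is only a few lines, since the per-edge logic lives in `update` and `cond`. A parallel version would need an atomic compare-and-swap in `update` so that exactly one visitor claims each vertex.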

Journal Article

Trends in big data analytics

Karthik Kambatla¹, Giorgios Kollias², Vipin Kumar³, Ananth Grama¹

Purdue University¹, IBM², University of Minnesota³

01 Jul 2014 - Journal of Parallel and Distributed Computing

TL;DR: An overview of the state-of-the-art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics are provided.

699 citations

Journal Article

The Art of Multiprocessor Programming

D.M. Hutton

17 Oct 2008 - Kybernetes

590 citations
