scispace - formally typeset
Author

Donald Nguyen

Bio: Donald Nguyen is an academic researcher from the University of Texas at Austin. The author has contributed to research in the topics Data structure and Solver. The author has an h-index of 15 and has co-authored 25 publications receiving 1,321 citations. Previous affiliations of Donald Nguyen include the International Council for the Exploration of the Sea.

Papers
Proceedings Article
03 Nov 2013
TL;DR: This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.
Abstract: Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.
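The point that scheduling policy matters can be illustrated with a small sequential sketch. It is not the Galois API: `FifoWorklist`, `PriorityWorklist` and `sssp` are hypothetical names, and the sketch leaves out the fine-grain parallel and speculative execution the abstract describes. It only runs the same operator (relax the out-edges of an active node) under two pluggable scheduling policies, to show that the schedule, and not just the operator, determines how much work is done.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, edge_weight)]
Graph = Dict[int, List[Tuple[int, int]]]


class FifoWorklist:
    """Hypothetical policy: process active nodes in arrival order."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def push(self, node: int, priority: float) -> None:
        # The priority is stored only so that stale entries can be skipped.
        self._items.append((node, priority))

    def pop(self) -> Tuple[int, float]:
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


class PriorityWorklist:
    """Hypothetical policy: process the smallest tentative distance first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int]] = []

    def push(self, node: int, priority: float) -> None:
        heapq.heappush(self._heap, (priority, node))

    def pop(self) -> Tuple[int, float]:
        priority, node = heapq.heappop(self._heap)
        return node, priority

    def __len__(self) -> int:
        return len(self._heap)


def sssp(graph: Graph, source: int, worklist) -> Tuple[Dict[int, float], int]:
    """Worklist-driven SSSP; returns (distances, number of operator applications)."""
    dist: Dict[int, float] = {n: float("inf") for n in graph}
    dist[source] = 0
    worklist.push(source, 0)
    applications = 0
    while len(worklist):
        u, priority = worklist.pop()
        if priority > dist[u]:
            continue  # stale task: u was improved after this entry was pushed
        applications += 1
        # Operator: relax every outgoing edge of the active node u.
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                worklist.push(v, dist[v])  # v becomes a new active node
    return dist, applications


if __name__ == "__main__":
    g = {0: [(1, 10), (2, 1)], 1: [(3, 1)], 2: [(1, 1)], 3: []}
    print(sssp(g, 0, FifoWorklist()))      # ({0: 0, 1: 2, 2: 1, 3: 3}, 6)
    print(sssp(g, 0, PriorityWorklist()))  # ({0: 0, 1: 2, 2: 1, 3: 3}, 4)
```

Both policies compute the same distances; on this toy graph the FIFO order relaxes node 1 and node 3 twice because it processes them before their shortest paths are known.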

541 citations

Journal Article
04 Jun 2011
TL;DR: It is suggested that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
Abstract: For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets. To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context. These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
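In the operator formulation, an algorithm is described by an operator applied at active nodes of a data structure, and each application (an activity) reads and writes some neighborhood of that node. Amorphous data-parallelism is parallelism among activities whose neighborhoods do not overlap, and which activities those are is often known only at run time. The sketch below assumes, for illustration only, that an activity's neighborhood is its node plus the node's immediate neighbors (real neighborhoods depend on the operator), and greedily picks one round of mutually non-conflicting activities. `neighborhood` and `independent_activities` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Set

# Undirected graph as adjacency sets.
Graph = Dict[int, Set[int]]


def neighborhood(graph: Graph, node: int) -> Set[int]:
    """Assumed neighborhood of an activity: the active node and its neighbors."""
    return {node} | graph[node]


def independent_activities(graph: Graph, active: Iterable[int]) -> List[int]:
    """Greedily choose active nodes whose neighborhoods are pairwise disjoint.

    The chosen activities could run in parallel without conflicting;
    the remaining ones would have to wait for a later round.
    """
    claimed: Set[int] = set()
    chosen: List[int] = []
    for node in active:
        nbhd = neighborhood(graph, node)
        if claimed.isdisjoint(nbhd):
            chosen.append(node)
            claimed |= nbhd
    return chosen


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(independent_activities(path, [0, 1, 2, 3, 4]))  # [0, 3]
```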

380 citations

Proceedings Article
14 Nov 2009
TL;DR: A tuning framework is developed which attempts to predict the optimal configuration based on hardware performance counters and achieves performance within 1% of the best performance of any single configuration for the same set of applications.
Abstract: Performance tuning for data centers is essential and complicated. It is important since a data center comprises thousands of machines and thus a single-digit performance improvement can significantly reduce cost and power consumption. Unfortunately, it is also extremely difficult, as data centers are dynamic environments where applications are frequently released and servers are continually upgraded. In this paper, we study the effectiveness of different processor prefetch configurations, which can greatly influence the performance of the memory system and the overall data center. We observe a wide performance gap when comparing the worst and best configurations, from 1.4% to 75.1%, for 11 important data center applications. We then develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. The framework achieves performance within 1% of the best performance of any single configuration for the same set of applications.
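The abstract does not say which prediction model the framework uses. The sketch below swaps in a simple 1-nearest-neighbor classifier over z-score-normalized counter vectors, purely as an assumption: each profiled application contributes its counter readings and the prefetch configuration that was best for it, and a new application receives the configuration of its closest profiled neighbor. The function name, the counter semantics and the configuration names in the example are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# (hardware performance-counter vector, best prefetch configuration): hypothetical format
Sample = Tuple[Sequence[float], str]


def _normalize(vec: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    """Z-score each counter so that counters with large ranges do not dominate."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(vec, means, stds)]


def predict_config(training: List[Sample], counters: Sequence[float]) -> str:
    """Return the best configuration of the nearest profiled application."""
    columns = list(zip(*(vec for vec, _ in training)))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    target = _normalize(counters, means, stds)
    nearest = min(
        training,
        key=lambda sample: math.dist(_normalize(sample[0], means, stds), target),
    )
    return nearest[1]


if __name__ == "__main__":
    # Hypothetical counters: [cache misses per kilo-instruction, memory bandwidth use %]
    training = [
        ([2.0, 10.0], "all_prefetchers_on"),
        ([30.0, 85.0], "hw_prefetcher_off"),
        ([15.0, 40.0], "adjacent_line_only"),
    ]
    print(predict_config(training, [28.0, 80.0]))  # hw_prefetcher_off
```

A nearest-neighbor rule is only one option; any classifier mapping counter profiles to configurations would fit the same workflow.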

81 citations

Proceedings Article
09 Jan 2010
TL;DR: This paper shows that many irregular algorithms have structure that can be exploited and presents three key optimizations that take advantage of algorithmic structure to reduce speculative overheads and describes the implementation of these optimizations in the Galois system and presents experimental results to demonstrate their benefits.
Abstract: Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs. However, the overhead of optimistic parallel execution can be substantial. In this paper, we show that many irregular algorithms have structure that can be exploited and present three key optimizations that take advantage of algorithmic structure to reduce speculative overheads. We describe the implementation of these optimizations in the Galois system and present experimental results to demonstrate their benefits. To the best of our knowledge, this is the first system to exploit algorithmic structure to optimize the execution of irregular programs.

59 citations

Journal Article
TL;DR: Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.
Abstract: Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.

48 citations


Cited by
Journal Article
TL;DR: In this article, the authors analyze data on the sexual behavior of a random sample of individuals and find that the cumulative distributions of the number of sexual partners during the twelve months prior to the survey decay as a power law with similar exponents for females and males.
Abstract: Many "real-world" networks are clearly defined while most "social" networks are to some extent subjective. Indeed, the accuracy of empirically-determined social networks is a question of some concern because individuals may have distinct perceptions of what constitutes a social link. One unambiguous type of connection is sexual contact. Here we analyze data on the sexual behavior of a random sample of individuals, and find that the cumulative distributions of the number of sexual partners during the twelve months prior to the survey decay as a power law with similar exponents α ≈ 2.4 for females and males. The scale-free nature of the web of human sexual contacts suggests that strategic interventions aimed at preventing the spread of sexually-transmitted diseases may be the most efficient approach.

1,476 citations

01 Jan 2013

1,098 citations

Proceedings Article
23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework, specific to shared-memory parallel/multicore machines, that makes graph traversal algorithms easy to write; the resulting implementations are significantly more efficient than previously reported results obtained with graph frameworks on machines with many more cores.
Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework that is specific to shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
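The edge-mapping routine and its adaptation to the density of the vertex subset can be sketched sequentially as below. The names `edge_map` and `bfs_levels`, the symmetric-graph assumption, and the switch rule (use a dense, pull-based traversal when the frontier plus its out-degrees exceeds 5% of all edges) are assumptions made for illustration, not the framework's actual API. Only the edge map is shown; mapping over a vertex subset is an ordinary loop over the set.

```python
from typing import Callable, Dict, List, Set

# Adjacency lists; assumed symmetric, so out-neighbors double as in-neighbors.
Graph = Dict[int, List[int]]


def edge_map(graph: Graph, frontier: Set[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool],
             dense_threshold: float = 0.05) -> Set[int]:
    """Apply update(u, v) over edges leaving the frontier; return the new frontier."""
    num_edges = sum(len(nbrs) for nbrs in graph.values())
    work = len(frontier) + sum(len(graph[u]) for u in frontier)
    next_frontier: Set[int] = set()
    if work > dense_threshold * num_edges:
        # Dense (pull) mode: each vertex scans its in-neighbors and stops
        # as soon as cond(v) turns false.
        for v in graph:
            for u in graph[v]:
                if not cond(v):
                    break
                if u in frontier and update(u, v):
                    next_frontier.add(v)
    else:
        # Sparse (push) mode: only the frontier's out-edges are visited.
        for u in frontier:
            for v in graph[u]:
                if cond(v) and update(u, v):
                    next_frontier.add(v)
    return next_frontier


def bfs_levels(graph: Graph, root: int, dense_threshold: float = 0.05) -> Dict[int, int]:
    """Breadth-first search built on edge_map; -1 marks unreachable vertices."""
    level = {v: -1 for v in graph}
    level[root] = 0

    def update(u: int, v: int) -> bool:
        # Sequential sketch; a parallel version would need an atomic update here.
        if level[v] == -1:
            level[v] = level[u] + 1
            return True
        return False

    def cond(v: int) -> bool:
        return level[v] == -1

    frontier = {root}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond, dense_threshold)
    return level


if __name__ == "__main__":
    g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
    print(bfs_levels(g, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: -1}
```

The push step touches only the frontier's edges, which is cheap when the frontier is small; the pull step lets each unvisited vertex stop scanning once it has been reached, which saves work when the frontier is large.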

816 citations

Journal Article
TL;DR: An overview of the state of the art is provided, focusing on emerging trends to highlight the hardware, software, and application landscape of big-data analytics.

699 citations

Journal Article

590 citations