Author

Darren Erik Vengroff

Bio: Darren Erik Vengroff is an academic researcher from Brown University. The author has contributed to research on spatial analysis and information systems, has an h-index of 7, and has co-authored 8 publications that have received 694 citations. Previous affiliations of Darren Erik Vengroff include Duke University.

Papers

Proceedings Article•DOI•

External-memory graph algorithms

Yi-Jen Chiang¹, Michael T. Goodrich², Edward F. Grove³, Roberto Tamassia¹, Darren Erik Vengroff¹, Jeffrey Scott Vitter³•Institutions (3)

Brown University¹, Johns Hopkins University², Duke University³

22 Jan 1995

TL;DR: This paper presents a collection of new techniques for designing and analyzing external-memory algorithms for graph problems and illustrates how these techniques can be applied to a wide variety of specific problems.

Abstract: We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include: Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components. PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms. Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list ranking algorithm. Department of Computer Science, Box 1910, Brown University, Providence, RI 02912-1910. Supported in part by the National Science Foundation, by the U.S. Army Research Office, and by the Advanced Research

304 citations

Proceedings Article•

Indexing for Data Models with Constraints and Classes.

Paris C. Kanellakis¹, Sridhar Ramaswamy¹, Darren Erik Vengroff², Jeffrey Scott Vitter²•Institutions (2)

Brown University¹, Duke University²

01 Jan 1993

TL;DR: In this article, a semi-dynamic data structure for indexing in constraint data models is presented, which has optimal worst-case space of O(n/B) pages and optimal query I/O time O(log_B n + t/B), where t is the size of the output of a query.

Abstract: We examine I/O-efficient data structures that provide indexing support for new data models. The database languages of these models include concepts from constraint programming (e.g., relational tuples are generalized to conjunctions of constraints) and from object-oriented programming (e.g., objects are organized in class hierarchies). Let $n$ be the size of the database, $c$ the number of classes, $B$ the secondary storage page size, and $t$ the size of the output of a query. Indexing by one attribute in the constraint data model (for a fairly general type of constraints) is equivalent to external dynamic interval management, which is a special case of external dynamic 2-dimensional range searching. We present a semi-dynamic data structure for this problem which has optimal worst-case space $O(n/B)$ pages and optimal query I/O time $O(\log_B n + t/B)$ and has $O(\log_B n + (\log^2_B n) / B)$ amortized insert I/O time. If the order of the insertions is random then the expected number of I/O operations needed to perform insertions is reduced to $O(\log_B n)$. Indexing by one attribute and by class name in an object-oriented model, where objects are organized as a forest hierarchy of classes, is also a special case of external dynamic 2-dimensional range searching. Based on this observation we first identify a simple algorithm with good worst-case performance for the class indexing problem. Using the forest structure of the class hierarchy and techniques from the constraint indexing problem, we improve its query I/O time from $O(\log_2 c \log_B n +t/B)$ to $O(\log_B n + t/B + \log_2 B)$.
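To get a feel for the improvement in class-indexing query cost stated above, the two bounds can be evaluated for concrete parameter values. The following is only a rough sketch: it drops the constant factors hidden by the O-notation, and the function names and the sample values of n, c, B, and t are illustrative assumptions, not figures from the paper.

```python
import math

def simple_class_query_ios(n, c, B, t):
    """Leading terms of O(log_2 c * log_B n + t/B), constants ignored."""
    return math.log2(c) * math.log(n, B) + t / B

def improved_class_query_ios(n, c, B, t):
    """Leading terms of O(log_B n + t/B + log_2 B), constants ignored.

    Note that c does not appear in this bound; it is kept as a parameter
    only so both functions share one signature.
    """
    return math.log(n, B) + t / B + math.log2(B)

# Hypothetical workload (assumed values, chosen only for illustration):
# a billion items, 1024 classes, 1024 items per page, 10,000 results.
n, c, B, t = 10**9, 1024, 1024, 10**4

simple = simple_class_query_ios(n, c, B, t)
improved = improved_class_query_ios(n, c, B, t)
print(f"simple bound:   ~{simple:.1f} I/Os")
print(f"improved bound: ~{improved:.1f} I/Os")
```

Because the improved bound has no multiplicative dependence on the number of classes, the gap widens as c grows while the additive log_2 B term stays fixed.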

147 citations

Journal Article•DOI•

External-Memory Algorithms for Processing Line Segments in Geographic Information Systems

Lars Arge¹, Darren Erik Vengroff, Jeffrey Scott Vitter²•Institutions (2)

Aarhus University¹, Purdue University²

12 Jan 1996-BRICS Report Series

TL;DR: This paper develops efficient new external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red-blue line segment intersection reporting, and general line segment intersection reporting.

Abstract: In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this paper we develop efficient new external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red-blue line segment intersection reporting, and general line segment intersection reporting. In GIS systems, the first three problems are useful for rendering and modeling, and the latter two are frequently used for overlaying maps and extracting information from them.

60 citations

Journal Article•DOI•

Indexing for Data Models with Constraints and Classes

Paris C. Kanellakis¹, Sridhar Ramaswamy¹, Darren Erik Vengroff², Jeffrey Scott Vitter²•Institutions (2)

Brown University¹, Duke University²

01 Jan 1994

TL;DR: This work identifies a simple algorithm with good worst-case performance for the class indexing problem and improves its query I/O time from $O(\log_2 c \log_B n + t/B)$ to $O(\log_B n + t/B + \log_2 B)$.

52 citations

Journal Article•DOI•

External-Memory Algorithms for Processing Line Segments in Geographic Information Systems

Lars Arge¹, Darren Erik Vengroff, Jeffrey Scott Vitter²•Institutions (2)

Aarhus University¹, Purdue University²

01 Jan 2007-Algorithmica

TL;DR: This paper develops efficient external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red-blue line segment intersection reporting, and general line segment intersection reporting.

Abstract: In the design of algorithms for large-scale applications it is essential to consider the problem of minimizing I/O communication. Geographical information systems (GIS) are good examples of such large-scale applications as they frequently handle huge amounts of spatial data. In this paper we develop efficient external-memory algorithms for a number of important problems involving line segments in the plane, including trapezoid decomposition, batched planar point location, triangulation, red-blue line segment intersection reporting, and general line segment intersection reporting. In GIS systems the first three problems are useful for rendering and modeling, and the latter two are frequently used for overlaying maps and extracting information from them.

44 citations

Cited by

Journal Article•DOI•

Multidimensional access methods

Volker Gaede¹, Oliver Günther²•Institutions (2)

Imperial College London¹, Humboldt University of Berlin²

01 Jun 1998-ACM Computing Surveys

TL;DR: This survey presents the class of point access methods, which are used to search sets of points in two or more dimensions, and concludes with a discussion of theoretical and experimental results concerning the relative performance of various approaches.

Abstract: Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More than ten years of spatial database research have resulted in a great variety of multidimensional access methods to support such operations. We give an overview of that work. After a brief survey of spatial data management in general, we first present the class of point access methods, which are used to search sets of points in two or more dimensions. The second part of the paper is devoted to spatial access methods to handle extended objects, such as rectangles or polyhedra. We conclude with a discussion of theoretical and experimental results concerning the relative performance of various approaches.
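The two search operations defined in this abstract can be made concrete with a brute-force baseline. The sketch below assumes that objects are axis-aligned rectangles and answers both queries by a linear scan; the `Rect` type and the function names are hypothetical, and the scan is exactly the cost that the surveyed access methods are designed to avoid.

```python
from typing import List, NamedTuple

class Rect(NamedTuple):
    """Axis-aligned rectangle with closed boundaries (illustrative type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def contains_point(r: Rect, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside or on the boundary of r."""
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax

def overlaps(a: Rect, b: Rect) -> bool:
    """True if a and b share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)

def point_query(objects: List[Rect], x: float, y: float) -> List[Rect]:
    """Point query: all objects that contain the given search point."""
    return [r for r in objects if contains_point(r, x, y)]

def region_query(objects: List[Rect], region: Rect) -> List[Rect]:
    """Region query: all objects that overlap the given search region."""
    return [r for r in objects if overlaps(r, region)]

# Small made-up data set for illustration.
objects = [Rect(0, 0, 2, 2), Rect(1, 1, 3, 3), Rect(5, 5, 6, 6)]
print(point_query(objects, 1.5, 1.5))
print(region_query(objects, Rect(2.5, 2.5, 5.5, 5.5)))
```

Each query here touches every object, so the cost grows linearly with the size of the data set regardless of how few objects match.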

1,758 citations

Journal Article•DOI•

Constraint logic programming: A survey

Joxan Jaffar, Michael J. Maher

01 May 1994-Journal of Logic Programming

TL;DR: This survey of CLP gives a systematic description of the major trends in terms of common fundamental concepts; its three main parts cover the theory, implementation issues, and programming for applications.

Abstract: Constraint Logic Programming (CLP) is a merger of two declarative paradigms: constraint solving and logic programming. Although a relatively new field, CLP has progressed in several quite different directions. In particular, the early fundamental concepts have been adapted to better serve in different areas of applications. In this survey of CLP, a primary goal is to give a systematic description of the major trends in terms of common fundamental concepts. The three main parts cover the theory, implementation issues, and programming for applications.

1,571 citations

Report•DOI•

Large-scale Graph Computation on Just a PC

Aapo Kyrola

01 May 2014

TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.

Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure, Partitioned Adjacency Lists, which we use to design an online graph database, GraphChi-DB. We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that, with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

907 citations

Proceedings Article•DOI•

GraphChi: large-scale graph computation on just a PC

Aapo Kyrola¹, Guy E. Blelloch¹, Carlos Guestrin²•Institutions (2)

Carnegie Mellon University¹, University of Washington²

08 Oct 2012

TL;DR: GraphChi is a disk-based system for computing efficiently on graphs with billions of edges, using a well-known method to break large graphs into small parts and a novel parallel sliding windows method.

Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

874 citations

Journal Article•DOI•

External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter¹•Institutions (1)

Duke University¹

01 Jun 2001-ACM Computing Surveys

TL;DR: This article surveys the state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce I/O costs.

Abstract: Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically.
Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
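The merging paradigm for sorting mentioned in this abstract can be illustrated with a toy external merge sort. This is a simplified sketch rather than any algorithm from the survey: it assumes integer keys, uses temporary text files to stand in for disk-resident runs, and merges all runs in a single pass, whereas a real external-memory sort would bound the merge fan-in by the available memory and read and write whole blocks. The names `external_merge_sort` and `memory_limit` are hypothetical.

```python
import heapq
import tempfile
from itertools import islice

def _write_run(sorted_items):
    """Write one sorted run to a temporary file, one integer per line."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{x}\n" for x in sorted_items)
    f.seek(0)
    return f

def _read_run(f):
    """Stream integers back from a run file without loading it all."""
    for line in f:
        yield int(line)

def external_merge_sort(items, memory_limit):
    """Yield the integers in `items` in sorted order.

    At most `memory_limit` items are sorted in memory at a time
    (run formation); the sorted runs are then merged lazily.
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    it = iter(items)
    runs = []
    try:
        # Phase 1: run formation. Sort memory-sized chunks, spill each to disk.
        while True:
            chunk = sorted(islice(it, memory_limit))
            if not chunk:
                break
            runs.append(_write_run(chunk))
        # Phase 2: merge. heapq.merge keeps only one item per run in memory.
        yield from heapq.merge(*(_read_run(f) for f in runs))
    finally:
        for f in runs:
            f.close()

print(list(external_merge_sort([5, 3, 9, 1, 4, 8, 2], memory_limit=3)))
```

Because the function is a generator, the merged output can be consumed one item at a time, which matches the streaming access pattern that makes merging attractive when the data does not fit in internal memory.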

751 citations
