Home
/
Authors
/
Darren Erik Vengroff

Author

Darren Erik Vengroff

Other affiliations: University of Delaware

Bio: Darren Erik Vengroff is an academic researcher from Duke University. The author has contributed to research in topics: Range searching & Context (language use). The author has an hindex of 6, co-authored 6 publications receiving 682 citations. Previous affiliations of Darren Erik Vengroff include University of Delaware.

Papers

PDF

Open Access

More filters

External-Memory Graph Algorithms

[...]

Yi-Feng Chian, Michael T. Goodrich, Edward Grove¹, Roberto Tamassia, Darren Erik Vengroff, Jeffrey Scott Vitter - Show less +2 more•Institutions (1)

Max Planck Society¹

01 Jan 1995

313 citations

Proceedings Article•DOI•

External-memory computational geometry

[...]

Michael T. Goodrich¹, Jyh-Jong Tsay, Darren Erik Vengroff, Jeffrey Scott Vitter•Institutions (1)

Johns Hopkins University¹

03 Nov 1993

TL;DR: New techniques for designing efficient algorithms for computational geometry problems that are too large to be solved in internal memory are given and these algorithms are the first known optimal algorithms for a wide range of two-level and hierarchical multilevel memory models, including parallel models.

...read moreread less

Abstract: In this paper we give new techniques for designing efficient algorithms for computational geometry problems that are too large to be solved in internal memory. We use these techniques to develop optimal and practical algorithms for a number of important large-scale problems. We discuss our algorithms primarily in the context of single processor/single disk machines, a domain in which they are not only the first known optimal results but also of tremendous practical value. Our methods also produce the first known optimal algorithms for a wide range of two-level and hierarchical multilevel memory models, including parallel models. The algorithms are optimal both in terms of I/O cost and internal computation. >

...read moreread less

217 citations

Proceedings Article•DOI•

Efficient 3-D range searching in external memory

[...]

Darren Erik Vengroff¹, Jeffrey Scott Vitter¹•Institutions (1)

Duke University¹

01 Jul 1996

TL;DR: The authors' is the first 3-D range search data structure that simultaneously achieves both a base-B logarithmic search overhead and a fully blocked output component (namely, K/B), and provides three dimensional results comparable to those provided by [8, 10] for the two dimensional case.

...read moreread less

Abstract: We present a new approach to designing data structures for the important problem of externalmemory range searching in two and three dimensions. We construct data structures for answering range queries in O ( (log log log~ N) logB N + K/B) 1/0 operations, where N is the number of points in the data structure, B is the 1/0 block size, and K is the number of points in the answer to the query. We base our data structures on the novel concept of B-approximate boundaries, which are manifolds that partition space into regions based on the output size of queries at points within the space. Our data structures answer a longstanding open problem by providing three dimensional results comparable to those provided by [8, 10] for the two dimensional case, though completely new techniques are used. Ours is the first 3-D range search data structure that simultaneously achieves both a base-B logarithmic search overhead (namely, (log log log~ N) logB N) and a fully blocked output component (namely, K/B). This gives us an *Supported in part by the U.S. Army Research Office under grant DAAH04–93–G–O076 and by the National Science Foundation under grant DMR–9217290. Portions of this work were conducted while visiting the University of Michigan. +Supported in part by the National Science Foundation under grant CCR–9522047, and by the U.S. Army Research Office under grant DAAH04-93-G-O076. Permission to make digitellhard copies of all or pan of thk material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice ia given that copyright ia by permission of the ACM, Inc. To copy olherwiae, to reoublish. to ooat on servers or to radktribute to tists, requiraa specific perrsr fee. STOC’96, Philadelphia PA, USA 01996 ACM ()-89791-785-5/96/05. .$3.50 Jeffrey Scott Vittert Duke University

...read moreread less

68 citations

Proceedings Article•DOI•

Indexing for data models with constraints and classes (extended abstract)

[...]

Paris C. Kanellakis, Sridhar Ramaswamy, Darren Erik Vengroff, Jeffrey Scott Vitter

01 Aug 1993

TL;DR: A simple algorithm with good worst-case performance for the class indexing problem is identified and its query I/O time is improved using the forest structure of the class hierarchy and techniques from the constraintindexing problem.

...read moreread less

Abstract: We examine I/O-efficient data structures that provide indexing support for new data models. The database languages of these models include concepts from constraint programming (e.g., relational tuples are generalized to conjunctions of constraints) and from object-oriented programming (e.g., objects are organized in class hierarchies). Let n be the size of the database, c the number of classes, B the secondary storage page size, and t the size of the output of a query. Indexing by one attribute in the constraint data model (for a fairly general type of constraints) is equivalent to external dynamic interval management, which is a special case of external dynamic 2-dimensional range searching. We present a semi-dynamic data structure for this problem which has optimal worst-case space O(n/B) pages and optimal query I/O time O(logBn+t/B) and has O(logBn+(log2Bn)/B) amortized insert I/O time. If the order of the insertions is random then the expected number of I/O operations needed to perform insertions is reduced to O(logBn). Indexing by one attribute and by class name in an object-oriented model, where objects are organized as a forest hierarchy of classes, is also a special case of external dynamic 2-dimensional range searching. Based on this observation we first identify a simple algorithm with good worst-case performance for the class indexing problem. Using the forest structure of the class hierarchy and techniques from the constraint indexing problem, we improve its query I/O time from O(log2c logBn + t/B) to O(logB + log2B).

...read moreread less

68 citations

Proceedings Article•

External-Memory Computational Geometry (Preliminary Version)

[...]

Michael T. Goodrich¹, Jyh-Jong Tsay², Darren Erik Vengroff³, Jeffrey Scott Vitter³•Institutions (3)

Johns Hopkins University¹, National Chung Cheng University², Duke University³

01 Jan 1993

TL;DR: New techniques for designing optimal algorithms for computational geometry problems that are too large to be solved in internal memory are given and these algorithms are optimal both in terms of I/O cost and internal computation.

...read moreread less

Abstract: In this paper we give new techniques for designing e cient algorithms for computational geometry problems that are too large to be solved in internal memory. We use these techniques to develop optimal and practical algorithms for a number of important largescale problems. We discuss our algorithms primarily in the context of single processor/single disk machines, a domain in which they are not only the rst known optimal results but also of tremendous practical value. Our methods also produce the rst known optimal algorithms for a wide range of two-level and hierarchical multilevel memory models, including parallel models. The algorithms are optimal both in terms of I/O cost and internal computation.

...read moreread less

13 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Multidimensional access methods

[...]

Volker Gaede¹, Oliver Günther²•Institutions (2)

Imperial College London¹, Humboldt State University²

01 Jun 1998-ACM Computing Surveys

TL;DR: The class of point access methods, which are used to search sets of points in two or more dimensions, are presented and a discussion of theoretical and experimental results concerning the relative performance of various approaches are discussed.

...read moreread less

Abstract: Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More than ten years of spatial database research have resulted in a great variety of multidimensional access methods to support such operations. We give an overview of that work. After a brief survey of spatial data management in general, we first present the class of point access methods, which are used to search sets of points in two or more dimensions. The second part of the paper is devoted to spatial access methods to handle extended objects, such as rectangles or polyhedra. We conclude with a discussion of theoretical and experimental results concerning the relative performance of various approaches.

...read moreread less

1,758 citations

Report•DOI•

Large-scale Graph Computation on Just a PC

[...]

Aapo Kyrola

01 May 2014

TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which is used to design an online graph database graphChi-DB.

...read moreread less

Abstract: : Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which we use to design an online graph database GraphChi-DB.We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that with only fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

...read moreread less

907 citations

Journal Article•DOI•

External memory algorithms and data structures: dealing with massive data

[...]

Jeffrey Scott Vitter¹•Institutions (1)

Duke University¹

01 Jun 2001-ACM Computing Surveys

TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.

...read moreread less

Abstract: Data sets in large applications are often too massive to fit completely inside the computers internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.

...read moreread less

751 citations

Journal Article•DOI•

A Survey on PageRank Computing

[...]

Pavel Berkhin¹•Institutions (1)

Yahoo!¹

01 Jan 2005-Internet Mathematics

TL;DR: The theoretical foundations of the PageRank formulation are examined, the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability.

...read moreread less

Abstract: This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRan...

...read moreread less

479 citations

Proceedings Article•DOI•

On indexing mobile objects

[...]

George Kollios¹, Dimitrios Gunopulos¹, Vassilis J. Tsotras¹•Institutions (1)

University of California, Riverside¹

01 May 1999

TL;DR: A lower bound on the number of I/O’s needed to answer the d-dimensional problem is given and a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time is given.

...read moreread less

Abstract: We show how to index mobile objects in one and two dimensions using efficient dynamic external memory data structures. The problem is motivated by real life applications in traffic monitoring, intelligent navigation and mobile communications domains. For the l-dimensional case, we give (i) a dynamic, external memory algorithm with guaranteed worst case performance and linear space and (ii) a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time. We also give an algorithm with guaranteed logarithmic query time for a restricted version of the problem. We present extensions of our techniques to two dimensions. In addition we give a lower bound on the number of I/O’s needed to answer the d-dimensional problem. Initial experimental results and comparisons to traditional indexing approaches are also included.

...read moreread less

413 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115

Collapse