
Showing papers by "Joseph M. Hellerstein published in 1997"


Proceedings ArticleDOI
01 Jun 1997
TL;DR: In this article, the authors propose an online aggregation interface that allows users to both observe the progress of their aggregation queries and control execution on the fly, and present a suite of techniques that extend a database system to meet these requirements.
Abstract: Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.

1,109 citations



Proceedings ArticleDOI
01 Jun 1997
TL;DR: This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read isolation outside the context of B-trees.
Abstract: This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read. The algorithms are developed in the context of the Generalized Search Tree (GiST) data structure, an index structure supporting an extensible set of queries and data types. Although developed in a GiST context, the algorithms are generally applicable to many tree-based access methods. The concurrency control protocol is based on an extension of the link technique originally developed for B-trees, and completely avoids holding node locks during I/Os. Repeatable read isolation is achieved with a novel combination of predicate locks and two-phase locking of data records. To our knowledge, this is the first time that isolation issues have been addressed outside the context of B-trees. A discussion of the fundamental structural differences between B-trees and more general tree structures like GiSTs explains why the algorithms developed here deviate from their B-tree counterparts. An implementation of GiSTs emulating B-trees in DB2/Common Server is underway.
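The right-link idea the abstract alludes to can be sketched in a few lines. This is a toy, single-threaded illustration with hypothetical names (`Node`, `search_with_links`); the real protocol also involves latches, predicate checks, and a global NSN counter, all of which are elided here.

```python
class Node:
    """Toy tree node: `nsn` is a node sequence number advanced when
    the node splits, and `right` links to the split-off sibling."""
    def __init__(self, nsn, keys, right=None):
        self.nsn = nsn
        self.keys = set(keys)
        self.right = right

def search_with_links(node, key, expected_nsn):
    # We reached `node` via a parent entry read when the sequence
    # number was `expected_nsn`. A larger NSN on the node means it
    # split after that read; the moved entries are reachable through
    # the right-link chain, so we chase it instead of re-latching.
    while node is not None:
        if key in node.keys:
            return True
        if node.nsn > expected_nsn and node.right is not None:
            node = node.right
        else:
            return False

sibling = Node(nsn=6, keys={7, 9})
split_node = Node(nsn=6, keys={1, 3}, right=sibling)
# Key 9 moved to the sibling during a split we did not observe:
assert search_with_links(split_node, 9, expected_nsn=4)
```

The payoff, as the abstract notes, is that a searcher never has to hold node locks across I/Os: a concurrent split is detected after the fact and repaired by following links.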

169 citations


Proceedings ArticleDOI
01 Jun 1997
TL;DR: NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW), shows that parallel sorting on a NOW is competitive with sorting on the large-scale SMPs that have traditionally held the performance records.
Abstract: We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive with sorting on the large-scale SMPs that have traditionally held the performance records. On a 64-node cluster, we sort 6.0 GB in just under one minute, while a 32-node cluster finishes the Datamation benchmark in 2.41 seconds. Our implementations can be applied to a variety of disk, memory, and processor configurations; we highlight salient issues for tuning each component of the system. We evaluate the use of commodity operating systems and hardware for parallel sorting. We find existing OS primitives for memory management and file access adequate. Due to aggregate communication and disk bandwidth requirements, the bottleneck of our system is the workstation I/O bus.
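The high-level structure of a distributed sort like this (range-partition, exchange, local sort) can be sketched in a single process. This toy, hypothetically named `now_sort_sketch` only models the data movement; it ignores the disk, memory, and network tuning that the paper is actually about.

```python
import random

def now_sort_sketch(per_node_data, num_nodes, max_key):
    """Single-process sketch: each node range-partitions its records,
    'sends' each one to the node owning its key range, and every node
    then sorts what it received."""
    inboxes = [[] for _ in range(num_nodes)]
    width = max(1, (max_key + num_nodes) // num_nodes)
    for local_records in per_node_data:
        for key in local_records:
            inboxes[min(key // width, num_nodes - 1)].append(key)
    # Concatenating the nodes' sorted runs in node order is globally sorted,
    # because node i owns a key range strictly below node i+1's range.
    return [sorted(box) for box in inboxes]

random.seed(1)
nodes = [[random.randrange(1000) for _ in range(250)] for _ in range(4)]
runs = now_sort_sketch(nodes, num_nodes=4, max_key=999)
merged = [k for run in runs for k in run]
assert merged == sorted(k for node in nodes for k in node)
```

In the real system the "inbox" traffic is what saturates the workstation I/O bus, which is the bottleneck the abstract identifies.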

165 citations


Proceedings ArticleDOI
01 May 1997
TL;DR: This paper defines a framework for measuring the efficiency of an indexing scheme for a workload, based on two characterizations: storage redundancy and access overhead.
Abstract: We consider the problem of indexing general database workloads (combinations of data sets and sets of potential queries). We define a framework for measuring the efficiency of an indexing scheme for a workload based on two characterizations: storage redundancy (how many times each item in the data set is stored), and access overhead (how many times more blocks than necessary does a query retrieve). Using this framework we present some initial results, showing upper and lower bounds and trade-offs between them in the case of multi-dimensional range queries and set queries.
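The two measures defined in the abstract are simple enough to compute for a concrete layout. The sketch below follows the definitions as stated there (copies stored per item; blocks touched versus the minimum needed); the function names and the toy block layout are illustrative.

```python
import math

def storage_redundancy(blocks, data_items):
    """How many times, on average, each item in the data set is stored."""
    stored_copies = sum(len(b) for b in blocks)
    return stored_copies / len(data_items)

def access_overhead(blocks, query_items, block_size):
    """Blocks actually touched vs. the ceil(|answer| / B) lower bound."""
    touched = sum(1 for b in blocks if any(x in query_items for x in b))
    ideal = math.ceil(len(query_items) / block_size)
    return touched / ideal

# A non-redundant layout of 8 items into blocks of size 2:
blocks = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
print(storage_redundancy(blocks, range(8)))           # 1.0: each item stored once
print(access_overhead(blocks, {1, 2}, block_size=2))  # 2.0: two blocks touched for one block's worth of answers
```

The trade-off the paper studies is already visible here: storing items in more than one block (redundancy above 1) is the only way to drive the access overhead of every query in the workload toward 1.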

153 citations


01 Jan 1997
TL;DR: This paper describes the RD-Tree, an index structure for set-valued attributes that is an adaptation of the R-Tree that exploits a natural analogy between spatial objects and sets.
Abstract: The implementation of complex types in Object-Relational database systems requires the development of efficient access methods. In this paper we describe the RD-Tree, an index structure for set-valued attributes. The RD-Tree is an adaptation of the R-Tree that exploits a natural analogy between spatial objects and sets. A particular engineering difficulty arises in representing the keys in an RD-Tree. We propose several different representations, and describe the tradeoffs of using each. An implementation and validation of this work is underway in the SHORE object repository.
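The spatial analogy at the heart of the RD-Tree can be shown in miniature: where an R-Tree entry carries a bounding box, an RD-Tree entry carries a "bounding set", the union of all sets below it, which prunes containment searches. This is a hypothetical two-level sketch, not the paper's structure or key representations.

```python
class RDNode:
    """Toy RD-Tree node. An inner node's key is the union of the sets
    beneath it: the set-valued analogue of an R-Tree bounding box."""
    def __init__(self, children=None, data=None):
        self.children = children or []
        self.data = data
        if data is not None:
            self.key = frozenset(data)
        else:
            self.key = frozenset().union(*(c.key for c in self.children))

def search_supersets(node, query):
    """Return the leaf sets that contain `query` (containment search)."""
    if not set(query) <= node.key:  # bounding set prunes this subtree
        return []
    if node.data is not None:
        return [node.key]
    results = []
    for child in node.children:
        results.extend(search_supersets(child, query))
    return results

leaf_a = RDNode(data={1, 2, 3})
leaf_b = RDNode(data={2, 4})
root = RDNode(children=[leaf_a, leaf_b])
assert search_supersets(root, {2, 3}) == [frozenset({1, 2, 3})]
```

The engineering difficulty the abstract raises is exactly these keys: near the root the unions grow large, so a compact (possibly lossy) representation is needed, and the paper's contribution includes weighing those representations.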

52 citations



Journal Article
TL;DR: It is argued that online processing for large queries requires redesigning major portions of a database system, and a mass-market approach for designing and measuring data-intensive processing is proposed.
Abstract: The term "online" has become an all-too-common addendum to database system names of the day. In this article we reexamine the notion of processing queries online. We distinguish between online processing and preprocessing, and argue that online processing for large queries requires redesigning major portions of a database system. We highlight pressing applications for truly online processing, and sketch ongoing research in these applications at Berkeley. We also outline basic techniques for running long queries online. We close by reevaluating the typical measurements of cost/performance for online systems, and propose a mass-market approach for designing and measuring data-intensive processing.

16 citations



Proceedings Article
01 Jan 1997
TL;DR: This paper proposes changing the black-box model to one of a "crystal ball", in which users are given feedback on their queries as they run, so that they can predict the utility of their query results, control the behavior of the queries on the fly, and better understand the operation of the system.
Abstract: Information Systems (both databases and text-search programs) are typically architected as "black boxes": a user submits a request, the system performs an unknown sequence of operations, and after some time an answer set is returned. Two trends are conspiring to make such architectures undesirable. First, users of these systems are often quite naive, and unsure of what they are doing. Second, the queries submitted to these systems are taking increasing amounts of time to complete. These trends together lead to a frustrating experience for users: they are unsure if their inputs are appropriate, and the cost of an inappropriate input is often a long wait followed by a useless or misleading result. In this paper we propose changing the black-box model to one of a "crystal ball", in which users are given feedback on their queries as they run, so that they can predict the utility of their query results, control the behavior of the queries on the fly, and better understand the operation of the system. We highlight some initial work in this vein, and describe opportunities for similar efforts in new applications.

1 citation