Author

Bruce G. Lindsay

Other affiliations: GlobalFoundries
Bio: Bruce G. Lindsay is an academic researcher from IBM. The author has contributed to research in topics: Database design & Relational database management system. The author has an h-index of 48 and has co-authored 123 publications receiving 8,925 citations. Previous affiliations of Bruce G. Lindsay include GlobalFoundries.


Papers
Journal ArticleDOI
Chandrasekaran Mohan, Don Haderle, Bruce G. Lindsay, Hamid Pirahesh, Peter Schwarz
TL;DR: ARIES, as discussed by the authors, is a transaction recovery method applicable not only to database management systems but also to persistent object-oriented languages, recoverable file systems and transaction-based operating systems.
Abstract: DB2™, IMS, and Tandem™ systems. ARIES is applicable not only to database management systems but also to persistent object-oriented languages, recoverable file systems and transaction-based operating systems. ARIES has been implemented, to varying degrees, in IBM's OS/2™ Extended Edition Database Manager, DB2, Workstation Data Save Facility/VM, Starburst and QuickSilver, and in the University of Wisconsin's EXODUS and Gamma database machine.

1,083 citations

Journal ArticleDOI
TL;DR: The recovery subsystem of an experimental data management system is described and evaluated and the DO-UNDO-REDO protocol allows new recoverable types and operations to be added to the recovery system.
Abstract: The recovery subsystem of an experimental data management system is described and evaluated. The transaction concept allows application programs to commit, abort, or partially undo their effects. The DO-UNDO-REDO protocol allows new recoverable types and operations to be added to the recovery system. Application programs can record data in the transaction log to facilitate application-specific recovery. Transaction undo and redo are based on records kept in a transaction log. The checkpoint mechanism is based on differential files (shadows). The recovery log is recorded on disk rather than tape.

575 citations
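
The DO-UNDO-REDO idea in the abstract above can be illustrated with a minimal sketch. This is not the system's actual implementation: the names `RecoveryLog`, `LogRecord`, and `add` are hypothetical, the log lives in memory rather than on disk, and checkpoints are omitted. It only shows how a new recoverable operation can be added by supplying a redo function and an undo function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class LogRecord:
    txn: str                          # transaction that performed the operation
    redo: Callable[[State], None]     # re-applies the operation
    undo: Callable[[State], None]     # reverses the operation


@dataclass
class RecoveryLog:
    records: List[LogRecord] = field(default_factory=list)

    def do(self, state: State, txn: str, redo, undo) -> None:
        # DO: write the log record first, then apply the operation.
        self.records.append(LogRecord(txn, redo, undo))
        redo(state)

    def undo_txn(self, state: State, txn: str) -> None:
        # UNDO: reverse the transaction's operations, newest first,
        # then drop its records from this (simplified) log.
        for rec in reversed(self.records):
            if rec.txn == txn:
                rec.undo(state)
        self.records = [r for r in self.records if r.txn != txn]

    def redo_all(self, state: State) -> None:
        # REDO: rebuild state by replaying the surviving log in order.
        for rec in self.records:
            rec.redo(state)


def add(key: str, delta: int):
    """A hypothetical recoverable operation: increment a counter."""
    def redo(s: State) -> None:
        s[key] = s.get(key, 0) + delta

    def undo(s: State) -> None:
        s[key] = s.get(key, 0) - delta

    return redo, undo


state: State = {}
log = RecoveryLog()
log.do(state, "t1", *add("x", 5))
log.do(state, "t2", *add("x", 3))
log.undo_txn(state, "t2")     # abort t2; t1's effect remains

recovered: State = {}
log.redo_all(recovered)       # replay the log into an empty state
```

Because each operation carries its own undo and redo functions, the log itself needs no knowledge of the data types involved, which is the extensibility property the abstract describes.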

Journal ArticleDOI
01 Sep 2001
TL;DR: The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit, and that having the relational engine use what is called an “outer union plan” to generate the content of an XML document is the superior approach.
Abstract: XML is rapidly emerging as a standard for exchanging business data on the World Wide Web. For the foreseeable future, however, most business data will continue to be stored in relational database systems. Consequently, if XML is to fulfill its potential, some mechanism is needed to publish relational data as XML documents. Towards that goal, one of the major challenges is finding a way to efficiently structure and tag data from one or more tables as a hierarchical XML document. Different alternatives are possible depending on when this processing takes place and how much of it is done inside the relational engine. In this paper, we characterize and study the performance of these alternatives. Among other things, we explore the use of new scalar and aggregate functions in SQL for constructing complex XML documents directly in the relational engine. We also explore different execution plans for generating the content of an XML document. The results of an experimental study show that constructing XML documents inside the relational engine can have a significant performance benefit. Our results also show the superiority of having the relational engine use what we call an “outer union plan” to generate the content of an XML document.

365 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: New algorithms for computing approximate quantiles of large datasets in a single pass are presented, and the main memory requirements are smaller than those reported earlier by an order of magnitude.
Abstract: We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter. We present the algorithms, their theoretical analysis and simulation results on different datasets.

340 citations

Journal ArticleDOI
TL;DR: This paper concentrates primarily on the description of the R* commit protocols, Presumed Abort (PA) and Presumed Commit (PC), which are extensions of the well-known, two-phase (2P) commit protocol.
Abstract: This paper deals with the transaction management aspects of the R* distributed database system. It concentrates primarily on the description of the R* commit protocols, Presumed Abort (PA) and Presumed Commit (PC). PA and PC are extensions of the well-known, two-phase (2P) commit protocol. PA is optimized for read-only transactions and a class of multisite update transactions, and PC is optimized for other classes of multisite update transactions. The optimizations result in reduced intersite message traffic and log writes, and, consequently, a better response time. The paper also discusses R*'s approach toward distributed deadlock detection and resolution.

318 citations


Cited by
Proceedings Article
07 Sep 1999
TL;DR: Experimental results indicate that the novel scheme for approximate similarity search based on hashing scales well even for a relatively large number of dimensions, and that the method gives an improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Abstract: The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the “curse of dimensionality.” That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points from the database so as to ensure that the probability of collision is much higher for objects that are close to each other than for those that are far apart. We provide experimental evidence that our method gives significant improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition. Experimental results also indicate that our scheme scales well even for a relatively large number of dimensions (more than 50).
Supported by NAVY N00014-96-1-1221 grant and NSF Grant IIS-9811904. Supported by Stanford Graduate Fellowship and NSF NYI Award CCR-9357849. Supported by ARO MURI Grant DAAH04-96-1-0007, NSF Grant IIS-9811904, and NSF Young Investigator Award CCR-9357849, with matching funds from IBM, Mitsubishi, Schlumberger Foundation, Shell Foundation, and Xerox Corporation. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

3,705 citations
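
The hashing idea in the abstract above (nearby points should collide more often than distant ones) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes points are binary vectors compared by Hamming distance and uses bit sampling as the hash family; the class `BitSamplingLSH` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    def __init__(self, dim, bits_per_table, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table hashes a point by reading a fixed random subset of its bits.
        # Points that differ in few bits are likely to agree on the sampled ones.
        self.projections = [rng.sample(range(dim), bits_per_table)
                            for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, point, projection):
        return tuple(point[i] for i in projection)

    def add(self, point):
        index = len(self.points)
        self.points.append(tuple(point))
        for projection, table in zip(self.projections, self.tables):
            table[self._key(point, projection)].append(index)

    def query(self, point):
        # Collect every stored point that collides with the query in any table,
        # then return the closest candidate (or None if nothing collided).
        candidates = set()
        for projection, table in zip(self.projections, self.tables):
            candidates.update(table.get(self._key(point, projection), []))
        if not candidates:
            return None
        return min((self.points[i] for i in candidates),
                   key=lambda p: hamming(p, point))


index = BitSamplingLSH(dim=8, bits_per_table=3, num_tables=4)
zeros = (0,) * 8
ones = (1,) * 8
index.add(zeros)
index.add(ones)
nearest_to_zeros = index.query(zeros)
nearest_to_ones = index.query(ones)
```

Only the colliding candidates are compared against the query, so the search avoids a full linear scan; the result is approximate because a true near neighbor may fail to collide in every table.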

Proceedings ArticleDOI
03 Jun 2002
TL;DR: This overview paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.

2,933 citations

Proceedings ArticleDOI
03 Jun 2002
TL;DR: This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system, and proposes three order encoding methods that can be used to represent XML order in the relational data model, and also proposes algorithms for translating ordered XPath expressions into SQL using these encoding methods.
Abstract: XML is quickly becoming the de facto standard for data exchange over the Internet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred" XML documents into relations, and translate XML queries into SQL queries over these relations. However, a key issue with such an approach, which has largely been ignored in the research literature, is how (and whether) the ordered XML data model can be efficiently supported by the unordered relational data model. This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system. This is accomplished by encoding order as a data value. We propose three order encoding methods that can be used to represent XML order in the relational data model, and also propose algorithms for translating ordered XPath expressions into SQL using these encoding methods. Finally, we report the results of an experimental study that investigates the performance of the proposed order encoding methods on a workload of ordered XML queries and updates.

2,402 citations
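
The idea of "encoding order as a data value" in the abstract above can be sketched with a small example. This is an illustration only: it assumes one simple encoding (each node's position in document order stored in a `pos` column) and a hypothetical schema `node(pos, parent, tag, text)`; it does not reproduce the paper's three encodings or its XPath translation algorithms.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Flatten an XML document into rows, numbering nodes in document order."""
    rows = []
    counter = 0

    def visit(elem, parent_pos):
        nonlocal counter
        counter += 1
        pos = counter  # pre-order position: the order is now an ordinary value
        rows.append((pos, parent_pos, elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, pos)

    visit(ET.fromstring(xml_text), None)
    return rows


def load(xml_text):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (pos INTEGER PRIMARY KEY, parent INTEGER, "
        "tag TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", shred(xml_text))
    return conn


def nth_child(conn, parent_pos, tag, n):
    """Rough SQL analogue of the ordered XPath step tag[n] under a parent."""
    row = conn.execute(
        "SELECT text FROM node WHERE parent = ? AND tag = ? "
        "ORDER BY pos LIMIT 1 OFFSET ?",
        (parent_pos, tag, n - 1),
    ).fetchone()
    return row[0] if row else None


conn = load("<list><item>a</item><item>b</item><item>c</item></list>")
second = nth_child(conn, 1, "item", 2)
following = [r[0] for r in conn.execute(
    "SELECT text FROM node WHERE parent = 1 AND pos > 3 ORDER BY pos")]
```

With this encoding, ordered queries reduce to comparisons and sorts on `pos`, but inserting a node in the middle of the document requires renumbering later nodes, which is the kind of query-versus-update trade-off an experimental comparison of encodings would need to measure.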

Book
01 Aug 1990
TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Abstract: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing. New in this edition: new chapters covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management; coverage of emerging topics such as data streams and cloud computing; and extensive revisions and updates based on years of class testing and feedback. Ancillary teaching materials are available.

2,395 citations

Proceedings ArticleDOI
TL;DR: The main algorithms used for color, texture, shape and sketch queries are presented, along with example query results and a discussion of future directions.
Abstract: In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ('Give me other images that contain a tumor with a texture like this one'), photo-journalism ('Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color, texture, shape and sketch query that we use, show example query results, and discuss future directions. © (1993) COPYRIGHT SPIE--The International Society for Optical Engineering.

2,127 citations