scispace - formally typeset
Author

Philip A. Bernstein

Bio: Philip A. Bernstein is an academic researcher from Microsoft. The author has contributed to research in topics: Database schema & Concurrency control. The author has an h-index of 72 and has co-authored 248 publications receiving 28365 citations. Previous affiliations of Philip A. Bernstein include Wang Institute of Graduate Studies & Harvard University.


Papers
Journal Article
TL;DR: This paper presents an overview of the SDD-1 design and its solutions to the above problems.
Abstract: The declining cost of computer hardware and the increasing data processing needs of geographically dispersed organizations have led to substantial interest in distributed data management. SDD-1 is a distributed database management system currently being developed by Computer Corporation of America. Users interact with SDD-1 precisely as if it were a nondistributed database system because SDD-1 handles all issues arising from the distribution of data. These issues include distributed concurrency control, distributed query processing, resiliency to component failure, and distributed directory management. This paper presents an overview of the SDD-1 design and its solutions to the above problems. This paper is the first of a series of companion papers on SDD-1 (Bernstein and Shipman [2], Bernstein et al. [4], and Hammer and Shipman [14]).

253 citations

Journal Article
TL;DR: The concurrency control strategy of SDD-1 guarantees database consistency in the face of distribution and replication of portions of data distributed throughout a network.
Abstract: This paper presents the concurrency control strategy of SDD-1. SDD-1, a System for Distributed Databases, is a prototype distributed database system being developed by Computer Corporation of America. In SDD-1, portions of data distributed throughout a network may be replicated at multiple sites. The SDD-1 concurrency control guarantees database consistency in the face of such distribution and replication. This paper is one of a series of companion papers on SDD-1 [4, 10, 12, 21].

249 citations

Journal Article
TL;DR: It is shown why locking mechanisms lead to correct operation, it is shown that two proposed mechanisms for distributed environments are special cases of locking, and a new version of locking is presented that allows more concurrency than past methods.
Abstract: An arbitrary interleaved execution of transactions in a database system can lead to an inconsistent database state. A number of synchronization mechanisms have been proposed to prevent such spurious behavior. To gain insight into these mechanisms, we analyze them in a simple centralized system that permits one read operation and one write operation per transaction. We show why locking mechanisms lead to correct operation, we show that two proposed mechanisms for distributed environments are special cases of locking, and we present a new version of locking that allows more concurrency than past methods. We also examine conflict graph analysis, the method used in the SDD-1 distributed database system, we prove its correctness, and we show that it can be used to substantially improve the performance of almost any synchronization mechanism.

248 citations
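The abstract's model (one read and one write per transaction) can be made concrete with the textbook conflict-serializability test: build a precedence graph from conflicting operations and check it for a cycle. This is a minimal Python sketch of that standard test, not the paper's locking mechanisms or its conflict graph analysis; the schedule encoding and function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical encoding: a schedule is a list of
# (transaction, operation, item) triples, operation being "r" or "w".

def precedence_graph(schedule):
    """Edge Ti -> Tj when an operation of Ti conflicts with a later
    operation of Tj: same item, different transactions, at least one write."""
    edges = defaultdict(set)
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges[ti].add(tj)
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a gray node again means a back edge, i.e. a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))


def conflict_serializable(schedule):
    """True iff the schedule is conflict-equivalent to some serial schedule."""
    return not has_cycle(precedence_graph(schedule))


# Each transaction reads x and then writes it, as in the abstract's model.
serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]

print(conflict_serializable(serial))       # True
print(conflict_serializable(lost_update))  # False
```

In the lost-update interleaving both transactions read x before either writes it, which produces edges T1 → T2 and T2 → T1; no serial order is equivalent, so one update is lost.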

Book Chapter
13 Sep 1978
TL;DR: This paper is intended to serve the two-fold purpose of introducing the main issues and theorems of formal database semantics to the uninitiated and clarifying the terminology of the field.
Abstract: Formal database semantics has concentrated on dependency constraints, such as functional and multivalued dependencies, and on normal forms for relations. Unfortunately, much of this work has been inaccessible to researchers outside this field, due to the unfamiliar formalism in which the work is couched. In addition, the lack of a single set of definitions has confused the relationships among certain results. This paper is intended to serve the two-fold purpose of introducing the main issues and theorems of formal database semantics to the uninitiated, and to clarify the terminology of the field.

246 citations
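Reasoning about functional dependencies, the first topic the abstract names, usually starts from attribute closure: a dependency X → Y follows from a set of dependencies exactly when Y is contained in the closure of X. A minimal Python sketch of the standard closure algorithm, assuming single-letter attribute names; the relation R(A, B, C, D) and its dependencies are hypothetical examples, not taken from the paper.

```python
# Hypothetical convention: attributes are single letters, so "AB" means {A, B}.

def fd(lhs, rhs):
    """Build the functional dependency lhs -> rhs as a pair of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Smallest superset of attrs closed under every dependency in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once every attribute of lhs is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds, lhs, rhs):
    """fds imply lhs -> rhs exactly when rhs lies inside closure(lhs)."""
    return frozenset(rhs) <= closure(lhs, fds)


def is_superkey(attrs, schema, fds):
    """attrs functionally determine every attribute of the schema."""
    return closure(attrs, fds) >= frozenset(schema)


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
F = [fd("A", "B"), fd("B", "C")]
print(sorted(closure("A", F)))       # ['A', 'B', 'C']
print(implies(F, "A", "C"))          # True (transitivity)
print(is_superkey("AD", "ABCD", F))  # True
```

The loop is the naive fixpoint version; it repeats passes until no dependency adds anything, which is enough for small examples like this one.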

Journal Article
TL;DR: This paper characterizes the queries for which full reducers exist, presents an efficient algorithm for constructing full reducers where they do exist, and considers a “natural” semijoin operator, a more powerful variant of the semijoin used in the SDD-1 distributed database system.
Abstract: A semijoin is a relational operator that is used to reduce the cost of processing queries in the SDD-1 distributed database system, the RAP database machine, and similar systems. Semijoin is used in these systems as part of a query pre-processing phase; its function is to “reduce” the database by delimiting those portions of the database that contain data relevant to the query. For some queries, there exist sequences of semijoins that “fully reduce” the database; those sequences delimit the exact portions of the database needed to answer the query in the sense that if any less data were delimited then the query would produce a different answer. Such sequences are called full reducers. This paper characterizes the queries for which full reducers exist and presents an efficient algorithm for constructing full reducers where they do exist. This paper extends the results of Bernstein and Chiu [J. Assoc. Comput. Mach., 28 (1981), pp. 25–40] by considering a more powerful semijoin operator. We consider “natural” ...

240 citations
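The semijoin R ⋉ S keeps the tuples of R that join with at least one tuple of S. As a small illustration of a full reducer, the Python sketch below handles only chain queries in which just adjacent relations share attributes; for those, a right-to-left pass of semijoins followed by a left-to-right pass leaves each relation holding exactly the tuples that take part in the full join. This is not the paper's general characterization or its “natural” semijoin; the Relation type and the sample data are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple      # attribute names, e.g. ("A", "B")
    rows: frozenset   # value tuples, ordered like attrs


def semijoin(r, s):
    """r ⋉ s: rows of r that agree with some row of s on all shared attributes."""
    common = [a for a in r.attrs if a in s.attrs]
    r_idx = [r.attrs.index(a) for a in common]
    s_idx = [s.attrs.index(a) for a in common]
    keys = {tuple(row[i] for i in s_idx) for row in s.rows}
    kept = frozenset(row for row in r.rows if tuple(row[i] for i in r_idx) in keys)
    return Relation(r.attrs, kept)


def fully_reduce_chain(rels):
    """Two-pass semijoin program for a chain query R1 - R2 - ... - Rn
    in which only adjacent relations share attributes."""
    rels = list(rels)
    for i in range(len(rels) - 2, -1, -1):   # right to left
        rels[i] = semijoin(rels[i], rels[i + 1])
    for i in range(1, len(rels)):            # left to right
        rels[i] = semijoin(rels[i], rels[i - 1])
    return rels


r1 = Relation(("A", "B"), frozenset({(1, 10), (2, 20)}))
r2 = Relation(("B", "C"), frozenset({(10, 100), (30, 300)}))
r3 = Relation(("C", "D"), frozenset({(100, "x"), (300, "y"), (400, "z")}))

# One semijoin alone is not enough: (30, 300) survives r2 ⋉ r3.
print(sorted(semijoin(r2, r3).rows))   # [(10, 100), (30, 300)]

for rel in fully_reduce_chain([r1, r2, r3]):
    print(rel.attrs, sorted(rel.rows))
# ('A', 'B') [(1, 10)]
# ('B', 'C') [(10, 100)]
# ('C', 'D') [(100, 'x')]
```

The second pass is what removes (30, 300) from r2: it only becomes dangling after r1 has been reduced and no longer contains a tuple with B = 30.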


Cited by
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Book Chapter
11 Nov 2007
TL;DR: The paper describes the extraction of the DBpedia datasets, how the resulting information is published on the Web for human and machine consumption, and how DBpedia could serve as a nucleus for an emerging Web of open data.
Abstract: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human and machine consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.

4,828 citations

Proceedings Article
14 Oct 2007
TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

4,349 citations
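Dynamo's object versioning is based on vector clocks: each version carries a map from node to counter, and two versions whose clocks are not ordered are kept as conflicting siblings for the application to reconcile. A minimal Python sketch of that comparison, assuming a plain dict representation; the node names, the merge-then-increment rule for the reconciled write, and the shopping-cart union are illustrative assumptions, not Dynamo's code or API.

```python
# Hypothetical representation: a vector clock is a dict {node_name: counter};
# a missing node counts as 0.

def increment(clock, node):
    """Return a copy of clock with node's counter advanced (a write at node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if a has seen every event recorded in b (componentwise a >= b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as 'equal', 'a_newer', 'b_newer' or 'concurrent'."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"   # neither dominates: conflicting siblings


def merge(a, b):
    """Componentwise maximum: a clock that descends from both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = increment({}, "Sx")   # {'Sx': 1}
v2 = increment(v1, "Sx")   # {'Sx': 2}
v3 = increment(v2, "Sy")   # written at Sy
v4 = increment(v2, "Sz")   # written concurrently at Sz

print(compare(v2, v1))     # a_newer: v1 can be discarded
print(compare(v3, v4))     # concurrent: both go back to the client

# Application-assisted resolution (illustrative): union two cart versions,
# then write back with a clock that descends from both siblings.
cart3, cart4 = {"book"}, {"pen"}
resolved_cart = cart3 | cart4
resolved_clock = increment(merge(v3, v4), "Sx")
print(compare(resolved_clock, v3), compare(resolved_clock, v4))  # a_newer a_newer
```

The store itself only decides which versions are obsolete; choosing how to combine concurrent versions (here, set union) is left to the application, which is the trade-off the abstract describes.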

Book
01 Jan 1996
TL;DR: This book familiarizes readers with important problems, algorithms, and impossibility results in the area, and teaches readers how to reason carefully about distributed algorithms: to model them formally, devise precise specifications for their required behavior, prove their correctness, and evaluate their performance with realistic measures.
Abstract: In Distributed Algorithms, Nancy Lynch provides a blueprint for designing, implementing, and analyzing distributed algorithms. She directs her book at a wide audience, including students, programmers, system designers, and researchers. Distributed Algorithms contains the most significant algorithms and impossibility results in the area, all in a simple automata-theoretic setting. The algorithms are proved correct, and their complexity is analyzed according to precisely defined complexity measures. The problems covered include resource allocation, communication, consensus among distributed processes, data consistency, deadlock detection, leader election, global snapshots, and many others. The material is organized according to the system model: first by the timing model and then by the interprocess communication mechanism. The material on system models is isolated in separate chapters for easy reference. The presentation is completely rigorous, yet is intuitive enough for immediate comprehension. This book familiarizes readers with important problems, algorithms, and impossibility results in the area: readers can then recognize the problems when they arise in practice, apply the algorithms to solve them, and use the impossibility results to determine whether problems are unsolvable. The book also provides readers with the basic mathematical tools for designing new algorithms and proving new impossibility results. In addition, it teaches readers how to reason carefully about distributed algorithms: to model them formally, devise precise specifications for their required behavior, prove their correctness, and evaluate their performance with realistic measures.
Table of Contents
1 Introduction
2 Modelling I: Synchronous Network Model
3 Leader Election in a Synchronous Ring
4 Algorithms in General Synchronous Networks
5 Distributed Consensus with Link Failures
6 Distributed Consensus with Process Failures
7 More Consensus Problems
8 Modelling II: Asynchronous System Model
9 Modelling III: Asynchronous Shared Memory Model
10 Mutual Exclusion
11 Resource Allocation
12 Consensus
13 Atomic Objects
14 Modelling IV: Asynchronous Network Model
15 Basic Asynchronous Network Algorithms
16 Synchronizers
17 Shared Memory versus Networks
18 Logical Time
19 Global Snapshots and Stable Properties
20 Network Resource Allocation
21 Asynchronous Networks with Process Failures
22 Data Link Protocols
23 Partially Synchronous System Models
24 Mutual Exclusion with Partial Synchrony
25 Consensus with Partial Synchrony

4,340 citations

Journal Article
01 Dec 2001
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers; it is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

3,693 citations
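In the taxonomy's terms, one of the simplest matchers is schema-level, element-level and language-based: it compares element names only. A minimal Python sketch of such a name matcher, assuming token-set (Jaccard) similarity over camelCase and snake_case tokens plus a small hand-written synonym table; the schemas, the synonyms, and the 0.5 threshold are hypothetical, and this is not a matcher described in the paper.

```python
import re

# Hypothetical synonym table mapping abbreviations to canonical tokens.
SYNONYMS = {"cust": "customer", "zip": "postal", "tel": "phone", "no": "number"}


def tokens(name):
    """Split camelCase / snake_case names into lowercase canonical tokens."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return frozenset(SYNONYMS.get(w.lower(), w.lower()) for w in words)


def name_similarity(a, b):
    """Jaccard similarity of the two names' token sets (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(source, target, threshold=0.5):
    """For each source element, keep its best-scoring target if above threshold."""
    result = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            result[s] = (best, score)
    return result


source = ["CustName", "ZipCode", "TelNo", "OrderDt"]
target = ["customer_name", "postal_code", "phone_number", "order_date"]
print(match(source, target))
# OrderDt stays unmatched: its best candidate, order_date, scores only 1/3.
```

Because it looks only at names, this matcher misses OrderDt/order_date without a "dt" synonym; a constraint-based matcher could add evidence from data types or keys, which is why the taxonomy treats the two kinds as complementary.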