Home
/
Authors
/
Philip A. Bernstein

Author

Philip A. Bernstein

Other affiliations: Wang Institute of Graduate Studies, Harvard University, University of Toronto

Bio: Philip A. Bernstein is an academic researcher from Microsoft. The author has contributed to research in topics: Database schema & Concurrency control. The author has an hindex of 72, co-authored 248 publications receiving 28365 citations. Previous affiliations of Philip A. Bernstein include Wang Institute of Graduate Studies & Harvard University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1994
1993
1991
1990
1989
1987
1986
1984
1983
1982
1981
1980
1979
1978
1977
1976
1975
1974
1954
1899

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Using Semi-Joins to Solve Relational Queries

[...]

Philip A. Bernstein¹, Dah-Ming W. Chiu¹•Institutions (1)

Harvard University¹

01 Jan 1981-Journal of the ACM

TL;DR: The exact class of relational queries that can be solved using semi-joins is shown and it is shown that queries outside of this class may not even be partially solvable using "short" semi-join programs.

...read moreread less

Abstract: The semi-join is a relational algebraic operation that selects a set of tuples in one relation that match one or more tuples of another relation on the joining domains. Semi-joins have been used as a basic ingredient in query processing strategies for a number of hardware and software database systems. However, not all queries can be solved entirely using semi-joins. In this paper the exact class of relational queries that can be solved using semi-joins is shown. It is also shown that queries outside of this class may not even be partially solvable using "short" semi-join programs. In addition, a linear-time membership test for this class is presented.

...read moreread less

468 citations

Journal Article•DOI•

Computational problems related to the design of normal form relational schemas

[...]

Catriel Beeri¹, Philip A. Bernstein²•Institutions (2)

Hebrew University of Jerusalem¹, Harvard University²

01 Mar 1979-ACM Transactions on Database Systems

TL;DR: It is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP-complete and are therefore probably not amenable to fast algorithmic solutions.

...read moreread less

Abstract: Problems related to functional dependencies and the algorithmic design of relational schemas are examined. Specifically, the following results are presented: (1) a tree model of derivations of functional dependencies from other functional dependencies; (2) a linear-time algorithm to test if a functional dependency is in the closure of a set of functional dependencies; (3) a quadratic-time implementation of Bernstein's third normal form schema synthesis algorithm.Furthermore, it is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP-complete and are therefore probably not amenable to fast algorithmic solutions.

...read moreread less

461 citations

Journal Article•DOI•

Synthesizing third normal form relations from functional dependencies

[...]

Philip A. Bernstein¹•Institutions (1)

University of Toronto¹

01 Dec 1976-ACM Transactions on Database Systems

TL;DR: An effective procedure for performing a relational scheme synthesis is presented and the schema that results from this procedure is proved to be in Codd's third normal form and to contain the fewest possible number of relations.

...read moreread less

Abstract: It has been proposed that the description of a relational database can be formulated as a set of functional relationships among database attributes. These functional relationships can then be used to synthesize algorithmically a relational scheme. It is the purpose of this paper to present an effective procedure for performing such a synthesis. The schema that results from this procedure is proved to be in Codd's third normal form and to contain the fewest possible number of relations. Problems with earlier attempts to construct such a procedure are also discussed.

...read moreread less

450 citations

Proceedings Article•

Data Management for Peer-to-Peer Computing: A Vision

[...]

Philip A. Bernstein, Fausto Giunchiglia, Anastasios Kementsietsidis, John Mylopoulos, Luciano Serafini, Ilya Zaihrayeu - Show less +2 more

01 Jan 2002

TL;DR: This work motivates special database problems introduced by peer-to-peer computing and proposes the Local Relational Model (LRM) to solve some of them and summarizes a formalization of LRM.

...read moreread less

Abstract: We motivate special database problems introduced by peer-to-peer computing and propose the Local Relational Model (LRM) to solve some of them As well, we summarize a formalization of LRM, present an architecture for a prototype implementation, and discuss open research questions

...read moreread less

419 citations

Proceedings Article•DOI•

Corpus-based schema matching

[...]

Jayant Madhavan¹, Philip A. Bernstein², AnHai Doan³, Alon Halevy¹•Institutions (3)

University of Washington¹, Microsoft², University of Illinois at Urbana–Champaign³

05 Apr 2005

TL;DR: In this article, a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better, and they show experimental results that demonstrate corpus-based matching outperforms direct matching in multiple domains.

...read moreread less

Abstract: Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly due to the lack of sufficient evidence in the schemas being matched. In this paper we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts and hence enables us to learn variations in the elements and their properties. We exploit such a corpus in two ways. First, we increase the evidence about each element being matched by including evidence from similar elements in the corpus. Second, we learn statistics about elements and their relationships and use them to infer constraints that we use to prune candidate mappings. We also describe how to use known mappings to learn the importance of domain and generic constraints. We present experimental results that demonstrate corpus-based matching outperforms direct matching (without the benefit of a corpus) in multiple domains.

...read moreread less

400 citations

1
2
3
4
5
6
…
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Book Chapter•DOI•

DBpedia: a nucleus for a web of open data

[...]

Sören Auer¹, Christian Bizer², Georgi Kobilarov², Jens Lehmann³, Richard Cyganiak², Zachary G. Ives¹ - Show less +2 more•Institutions (3)

University of Pennsylvania¹, Free University of Berlin², Leipzig University³

11 Nov 2007

TL;DR: The extraction of the DBpedia datasets is described, and how the resulting information is published on the Web for human-andmachine-consumption and how DBpedia could serve as a nucleus for an emerging Web of open data.

...read moreread less

Abstract: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human-andmachine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.

...read moreread less

4,828 citations

Proceedings Article•DOI•

Dynamo: amazon's highly available key-value store

[...]

Giuseppe deCandia¹, Deniz Hastorun¹, Madan Mohan Rao Jampani¹, Gunavardhan Kakulapati¹, Avinash Lakshman¹, Alex Pilchin¹, Swaminathan Sivasubramanian¹, Peter Sven Vosshall¹, Werner Vogels¹ - Show less +5 more•Institutions (1)

Amazon.com¹

14 Oct 2007

TL;DR: D Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience and makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

...read moreread less

Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

...read moreread less

4,349 citations

Book•

Distributed algorithms

[...]

Nancy Lynch

01 Jan 1996

TL;DR: This book familiarizes readers with important problems, algorithms, and impossibility results in the area, and teaches readers how to reason carefully about distributed algorithms-to model them formally, devise precise specifications for their required behavior, prove their correctness, and evaluate their performance with realistic measures.

...read moreread less

Abstract: In Distributed Algorithms, Nancy Lynch provides a blueprint for designing, implementing, and analyzing distributed algorithms. She directs her book at a wide audience, including students, programmers, system designers, and researchers. Distributed Algorithms contains the most significant algorithms and impossibility results in the area, all in a simple automata-theoretic setting. The algorithms are proved correct, and their complexity is analyzed according to precisely defined complexity measures. The problems covered include resource allocation, communication, consensus among distributed processes, data consistency, deadlock detection, leader election, global snapshots, and many others. The material is organized according to the system model-first by the timing model and then by the interprocess communication mechanism. The material on system models is isolated in separate chapters for easy reference. The presentation is completely rigorous, yet is intuitive enough for immediate comprehension. This book familiarizes readers with important problems, algorithms, and impossibility results in the area: readers can then recognize the problems when they arise in practice, apply the algorithms to solve them, and use the impossibility results to determine whether problems are unsolvable. The book also provides readers with the basic mathematical tools for designing new algorithms and proving new impossibility results. In addition, it teaches readers how to reason carefully about distributed algorithms-to model them formally, devise precise specifications for their required behavior, prove their correctness, and evaluate their performance with realistic measures. Table of Contents 1 Introduction 2 Modelling I; Synchronous Network Model 3 Leader Election in a Synchronous Ring 4 Algorithms in General Synchronous Networks 5 Distributed Consensus with Link Failures 6 Distributed Consensus with Process Failures 7 More Consensus Problems 8 Modelling II: Asynchronous System Model 9 Modelling III: Asynchronous Shared Memory Model 10 Mutual Exclusion 11 Resource Allocation 12 Consensus 13 Atomic Objects 14 Modelling IV: Asynchronous Network Model 15 Basic Asynchronous Network Algorithms 16 Synchronizers 17 Shared Memory versus Networks 18 Logical Time 19 Global Snapshots and Stable Properties 20 Network Resource Allocation 21 Asynchronous Networks with Process Failures 22 Data Link Protocols 23 Partially Synchronous System Models 24 Mutual Exclusion with Partial Synchrony 25 Consensus with Partial Synchrony

...read moreread less

4,340 citations

Journal Article•DOI•

A survey of approaches to automatic schema matching

[...]

Erhard Rahm¹, Philip A. Bernstein²•Institutions (2)

Leipzig University¹, Microsoft²

01 Dec 2001

TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element- level and structure- level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

...read moreread less

Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

...read moreread less

3,693 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse