Author

Avi Silberschatz

Other affiliations: Bell Labs, Alcatel-Lucent
Bio: Avi Silberschatz is an academic researcher from Yale University. The author has contributed to research in topics including database design and data analysis. The author has an h-index of 24 and has co-authored 45 publications receiving 3361 citations. Previous affiliations of Avi Silberschatz include Bell Labs and Alcatel-Lucent.

Papers
Journal Article
01 Aug 2009
TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Abstract: The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis. There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

955 citations

Posted Content
TL;DR: This article proposes research into several important new directions for database management systems, driven by the Internet and increasing amounts of scientific and sensor data.
Abstract: A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of most complex software systems. It recommends that database researchers increase focus on: integration of text, data, code, and streams; fusion of information from heterogeneous data sources; reasoning about uncertain data; unsupervised data mining for interesting correlations; information privacy; and self-adaptation and repair.

208 citations

Journal Article
01 Dec 1990
TL;DR: Achievements in database research underpin fundamental advances in communications systems, transportation and logistics, financial management, knowledge-based systems, accessibility to scientific literature, and a host of other civilian and defense applications.
Abstract: The history of database system research in the U.S. is one of exceptional productivity and startling economic impact. Barely twenty years old as a basic science research field, database research conducted with Federal support in the nation's universities and in its industrial research laboratories has fueled an information services industry estimated at $10 billion per year in the U.S. alone. This industry has grown at an average rate of 20 percent per year since 1965 and is continuing to expand at this rate. Achievements in database research underpin fundamental advances in communications systems, transportation and logistics, financial management, knowledge-based systems, accessibility to scientific literature, and a host of other civilian and defense applications. They also serve as the foundation for considerable progress in basic science in various fields ranging from computing to biology.

203 citations

Journal Article
TL;DR: The history of database system research in the US is one of exceptional productivity and startling economic impact. Barely twenty years old as a basic science research field, database research conducted with Federal support in the nation's universities and in its industrial research laboratories has fueled an information services industry estimated at $10 billion per year in the USA alone.
Abstract: The history of database system research in the US is one of exceptional productivity and startling economic impact. Barely twenty years old as a basic science research field, database research conducted with Federal support in the nation's universities and in its industrial research laboratories has fueled an information services industry estimated at $10 billion per year in the US alone. This industry has grown at an average rate of 20 percent per year since 1965 and is continuing to expand at this rate. Achievements in database research underpin fundamental advances in communications systems, transportation and logistics, financial management, knowledge-based systems, accessibility to scientific literature, and a host of other civilian and defense applications. They also serve as the foundation for considerable progress in basic science in various fields ranging from computing to biology.

197 citations

Proceedings Article
26 Mar 2000
TL;DR: Novel algorithms for discovering physical topology in heterogeneous (i.e., multi-vendor) IP networks are presented, which rely on standard SNMP MIB information that is widely supported by modern IP network elements and require no modifications to the operating system software running on elements or hosts.
Abstract: Knowledge of the up-to-date physical topology of an IP network is crucial to a number of critical network management tasks, including reactive and proactive resource management, event correlation, and root-cause analysis. Given the dynamic nature of today's IP networks, keeping track of topology information manually is a daunting (if not impossible) task. Thus, effective algorithms for automatically discovering physical network topology are necessary. Earlier work has typically concentrated on either: (a) discovering logical (i.e., layer-3) topology, which implies that the connectivity of all layer-2 elements (e.g., switches and bridges) is ignored; or (b) proprietary solutions targeting specific product families. In this paper, we present novel algorithms for discovering physical topology in heterogeneous (i.e., multi-vendor) IP networks. Our algorithms rely on standard SNMP MIB information that is widely supported by modern IP network elements and require no modifications to the operating system software running on elements or hosts. We have implemented the algorithms presented in this paper in the context of a topology discovery tool that has been tested on Lucent's own research network. The experimental results clearly validate our approach, demonstrating that our tool can consistently discover the accurate physical network topology in time that is roughly quadratic in the number of network elements.

183 citations


Cited by
Journal Article
TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations

Proceedings Article
11 Sep 1995
TL;DR: This paper presents an efficient algorithm for mining association rules that is fundamentally different from known algorithms and not only reduces the I/O overhead significantly but also has lower CPU overhead for most cases.
Abstract: Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared to previous algorithms, our algorithm not only reduces the I/O overhead significantly but also has lower CPU overhead for most cases. We have performed extensive experiments and compared the performance of our algorithm with one of the best existing algorithms. It was found that for large databases, the CPU overhead was reduced by as much as a factor of four and I/O was reduced by almost an order of magnitude. Hence this algorithm is especially suitable for very large size databases.

1,822 citations

Journal Article
TL;DR: The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Abstract: Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.

1,669 citations

Journal Article
TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Abstract: Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

1,427 citations

Proceedings Article
08 Oct 2012
TL;DR: This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty, critical to supporting external consistency and a variety of powerful features.
Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

1,366 citations