scispace - formally typeset
Search or ask a question
Author

Sanjay Agrawal

Bio: Sanjay Agrawal is an academic researcher from Microsoft. The author has contributed to research in topics: Database design & Database tuning. The author has an hindex of 29, co-authored 42 publications receiving 4102 citations.

Papers
More filters
Proceedings ArticleDOI
07 Aug 2002
TL;DR: DBXplorer, a system that enables keyword-based searches in relational databases using a commercial relational database and Web server and allows users to interact via a browser front-end is discussed.
Abstract: Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based searches in relational databases. DBXplorer has been implemented using a commercial relational database and Web server and allows users to interact via a browser front-end. We outline the challenges and discuss the implementation of our system, including results of extensive experimental evaluation.

818 citations

Proceedings Article
10 Sep 2000
TL;DR: This paper presents an end-to-end solution to the problem of selecting materialized views and indexes for SQL databases, and describes results of extensive experimental evaluation that demonstrate the effectiveness of the techniques.
Abstract: Automatically selecting an appropriate set of materialized views and indexes for SQL databases is a non-trivial task. A judicious choice must be cost-driven and influenced by the workload experienced by the system. Although there has been work in materialized view selection in the context of multidimensional (OLAP) databases, no past work has looked at the problem of building an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. In this paper, we present an end-to-end solution to the problem of selecting materialized views and indexes. We describe results of extensive experimental evaluation that demonstrate the effectiveness of our techniques. Our solution is implemented as part of a tuning wizard that ships with Microsoft SQL Server 2000.

690 citations

Proceedings ArticleDOI
13 Jun 2004
TL;DR: This paper presents novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account and implements it on Microsoft SQL Server.
Abstract: In addition to indexes and materialized views, horizontal and vertical partitioning are important aspects of physical design in a relational database system that significantly impact performance. Horizontal partitioning also provides manageability; database administrators often require indexes and their underlying tables partitioned identically so as to make common operations such as backup/restore easier. While partitioning is important, incorporating partitioning makes the problem of automating physical design much harder since: (a) The choices of partitioning can strongly interact with choices of indexes and materialized views. (b) A large new space of physical design alternatives must be considered. (c) Manageability requirements impose a new constraint on the problem. In this paper, we present novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account. We have implemented our techniques and evaluated it on Microsoft SQL Server. Our experiments highlight: (a) the importance of taking an integrated approach to automated physical design and (b) the scalability of our techniques.

447 citations

Proceedings Article
01 Jan 2003
TL;DR: The challenges and several approaches to enable ranking in databases, including adaptations of known techniques from information retrieval, are discussed and results of preliminary experiments are presented.
Abstract: Ranking and returning the most relevant results of a query is a popular paradigm in Information Retrieval. We discuss challenges and investigate several approaches to enable ranking in databases, including adaptations of known techniques from information retrieval. We present results of preliminary experiments.

279 citations

Book ChapterDOI
01 Aug 2004
TL;DR: This chapter provides an overview of Database Tuning Advisor's (DTA's) novel functionality, the rationale for its architecture, and demonstrates DTA's quality and scalability on large customer workloads.
Abstract: Publisher Summary This chapter provides an overview of Database Tuning Advisor's (DTA's) novel functionality, the rationale for its architecture, and demonstrates DTA's quality and scalability on large customer workloads The DTA is part of Microsoft SQL Server 2005 It is an automated physical database design tool that significantly advances the state-of-the-art in several ways First, the DTA is capable of providing an integrated physical design recommendation for horizontal partitioning, indexes, and materialized views Second, unlike today's physical design tools that focus solely on performance, the DTA also supports the capability for a database administrator (DBA) to specify manageability requirements while optimizing for performance Third, the DTA is able to scale to large databases and workloads using several novel techniques including: workload compression, reduced statistics creation, and exploiting test server to reduce load on production server Finally, the DTA greatly enhances scriptability and customization through the use of a public XML schema for input and output

262 citations


Cited by
More filters
Journal ArticleDOI
01 Dec 2001
TL;DR: The state of the art on the problem of answering queries using views is surveyed, the algorithms proposed to solve it are described, and the disparate works into a coherent framework are synthesized.
Abstract: The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.

1,642 citations

Journal ArticleDOI
31 Aug 2004
TL;DR: It is shown that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods, and an optimization algorithm is described that can compute efficiently most queries.
Abstract: We describe a system that supports arbitrarily complex SQL queries on probabilistic databases. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.

1,113 citations

Journal ArticleDOI
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.

912 citations

Book ChapterDOI
20 Aug 2002
TL;DR: It is proved that DISCOVER finds without redundancy all relevant candidate networks, whose size can be data bound, by exploiting the structure of the schema and the selection of the optimal execution plan (way to reuse common subexpressions) is NP-complete.
Abstract: DISCOVER operates on relational databases and facilitates information discovery on them by allowing its user to issue keyword queries without any knowledge of the database schema or of SQL. DISCOVER returns qualified joining networks of tuples, that is, sets of tuples that are associated because they join on their primary and foreign keys and collectively contain all the keywords of the query. DISCOVER proceeds in two steps. First the Candidate Network Generator generates all candidate networks of relations, that is, join expressions that generate the joining networks of tuples. Then the Plan Generator builds plans for the efficient evaluation of the set of candidate networks, exploiting the opportunities to reuse common subexpressions of the candidate networks. We prove that DISCOVER finds without redundancy all relevant candidate networks, whose size can be data bound, by exploiting the structure of the schema. We prove that the selection of the optimal execution plan (way to reuse common subexpressions) is NP-complete. We provide a greedy algorithm and we show that it provides near-optimal plan execution time cost. Our experimentation also provides hints on tuning the greedy algorithm.

892 citations

Proceedings ArticleDOI
09 Jun 2003
TL;DR: The XRANK system is presented, designed to handle the novel features of XML keyword search, which naturally generalizes a hyperlink based HTML search engine such as Google and can be used to query a mix of HTML and XML documents.
Abstract: We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.

857 citations