Author

Sanjay Agrawal

Bio: Sanjay Agrawal is an academic researcher from Microsoft. The author has contributed to research on database design and database tuning. The author has an h-index of 29 and has co-authored 42 publications receiving 4,102 citations.

Papers

Proceedings Article

DBXplorer: a system for keyword-based search over relational databases

Sanjay Agrawal¹, Surajit Chaudhuri¹, Gautam Das¹•Institutions (1)

Microsoft¹

07 Aug 2002

TL;DR: This paper discusses DBXplorer, a system that enables keyword-based search over relational databases; it is implemented using a commercial relational database and Web server, and users interact with it through a browser front-end.

Abstract: Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based searches in relational databases. DBXplorer has been implemented using a commercial relational database and Web server and allows users to interact via a browser front-end. We outline the challenges and discuss the implementation of our system, including results of extensive experimental evaluation.

818 citations

Proceedings Article

Automated Selection of Materialized Views and Indexes in SQL Databases

Sanjay Agrawal¹, Surajit Chaudhuri¹, Vivek Narasayya¹•Institutions (1)

Microsoft¹

10 Sep 2000

TL;DR: This paper presents an end-to-end solution to the problem of selecting materialized views and indexes for SQL databases, and describes results of extensive experimental evaluation that demonstrate the effectiveness of the techniques.

Abstract: Automatically selecting an appropriate set of materialized views and indexes for SQL databases is a non-trivial task. A judicious choice must be cost-driven and influenced by the workload experienced by the system. Although there has been work in materialized view selection in the context of multidimensional (OLAP) databases, no past work has looked at the problem of building an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. In this paper, we present an end-to-end solution to the problem of selecting materialized views and indexes. We describe results of extensive experimental evaluation that demonstrate the effectiveness of our techniques. Our solution is implemented as part of a tuning wizard that ships with Microsoft SQL Server 2000.

690 citations

Proceedings Article

Integrating vertical and horizontal partitioning into automated physical database design

Sanjay Agrawal¹, Vivek Narasayya¹, Beverly Yang²•Institutions (2)

Microsoft¹, Stanford University²

13 Jun 2004

TL;DR: This paper presents novel techniques for a scalable solution to the integrated physical design problem of choosing horizontal and vertical partitioning together with indexes and materialized views, taking both performance and manageability into account, and evaluates them on Microsoft SQL Server.

Abstract: In addition to indexes and materialized views, horizontal and vertical partitioning are important aspects of physical design in a relational database system that significantly impact performance. Horizontal partitioning also provides manageability; database administrators often require indexes and their underlying tables partitioned identically so as to make common operations such as backup/restore easier. While partitioning is important, incorporating partitioning makes the problem of automating physical design much harder since: (a) the choices of partitioning can strongly interact with choices of indexes and materialized views; (b) a large new space of physical design alternatives must be considered; and (c) manageability requirements impose a new constraint on the problem. In this paper, we present novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account. We have implemented our techniques and evaluated them on Microsoft SQL Server. Our experiments highlight: (a) the importance of taking an integrated approach to automated physical design and (b) the scalability of our techniques.

447 citations

Proceedings Article

Automated Ranking of Database Query Results

Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis

01 Jan 2003

TL;DR: The challenges and several approaches to enable ranking in databases, including adaptations of known techniques from information retrieval, are discussed and results of preliminary experiments are presented.

Abstract: Ranking and returning the most relevant results of a query is a popular paradigm in Information Retrieval. We discuss challenges and investigate several approaches to enable ranking in databases, including adaptations of known techniques from information retrieval. We present results of preliminary experiments.

279 citations

Book Chapter

Database Tuning Advisor for Microsoft SQL Server 2005

Sanjay Agrawal¹, Surajit Chaudhuri¹, Lubor Kollar¹, Arunprasad Marathe¹, Vivek Narasayya¹, Manoj Syamala¹•Institutions (1)

Microsoft¹

01 Aug 2004

TL;DR: This chapter provides an overview of Database Tuning Advisor's (DTA's) novel functionality and the rationale for its architecture, and demonstrates DTA's quality and scalability on large customer workloads.

Abstract: Publisher Summary: This chapter provides an overview of Database Tuning Advisor's (DTA's) novel functionality, the rationale for its architecture, and demonstrates DTA's quality and scalability on large customer workloads. The DTA is part of Microsoft SQL Server 2005. It is an automated physical database design tool that significantly advances the state-of-the-art in several ways. First, the DTA is capable of providing an integrated physical design recommendation for horizontal partitioning, indexes, and materialized views. Second, unlike today's physical design tools that focus solely on performance, the DTA also supports the capability for a database administrator (DBA) to specify manageability requirements while optimizing for performance. Third, the DTA is able to scale to large databases and workloads using several novel techniques including: workload compression, reduced statistics creation, and exploiting test server to reduce load on production server. Finally, the DTA greatly enhances scriptability and customization through the use of a public XML schema for input and output.

262 citations

Cited by

Journal Article

Answering queries using views: A survey

Alon Halevy¹•Institutions (1)

University of Washington¹

01 Dec 2001

TL;DR: This article surveys the state of the art on the problem of answering queries using views, describes the algorithms proposed to solve it, and synthesizes the disparate works into a coherent framework.

Abstract: The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.

1,642 citations

Journal Article

Efficient query evaluation on probabilistic databases

Nilesh Dalvi¹, Dan Suciu¹•Institutions (1)

University of Washington¹

31 Aug 2004

TL;DR: It is shown that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods, and an optimization algorithm is described that can efficiently compute most queries.

Abstract: We describe a system that supports arbitrarily complex SQL queries on probabilistic databases. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.

1,113 citations

Journal Article

YAGO: A Large Ontology from Wikipedia and WordNet

Fabian M. Suchanek¹, Gjergji Kasneci¹, Gerhard Weikum¹•Institutions (1)

Max Planck Society¹

01 Sep 2008 - Journal of Web Semantics

TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.

912 citations

Book Chapter

Discover: keyword search in relational databases

Vagelis Hristidis¹, Yannis Papakonstantinou¹•Institutions (1)

University of California, San Diego¹

20 Aug 2002

TL;DR: It is proved that DISCOVER finds, without redundancy, all relevant candidate networks, whose size can be data bound, by exploiting the structure of the schema, and that selecting the optimal execution plan (the way to reuse common subexpressions) is NP-complete.

Abstract: DISCOVER operates on relational databases and facilitates information discovery on them by allowing its user to issue keyword queries without any knowledge of the database schema or of SQL. DISCOVER returns qualified joining networks of tuples, that is, sets of tuples that are associated because they join on their primary and foreign keys and collectively contain all the keywords of the query. DISCOVER proceeds in two steps. First the Candidate Network Generator generates all candidate networks of relations, that is, join expressions that generate the joining networks of tuples. Then the Plan Generator builds plans for the efficient evaluation of the set of candidate networks, exploiting the opportunities to reuse common subexpressions of the candidate networks. We prove that DISCOVER finds without redundancy all relevant candidate networks, whose size can be data bound, by exploiting the structure of the schema. We prove that the selection of the optimal execution plan (way to reuse common subexpressions) is NP-complete. We provide a greedy algorithm and we show that it provides near-optimal plan execution time cost. Our experimentation also provides hints on tuning the greedy algorithm.

892 citations

Proceedings Article

XRANK: ranked keyword search over XML documents

Lin Guo¹, Feng Shao¹, Chavdar Botev¹, Jayavel Shanmugasundaram¹•Institutions (1)

Cornell University¹

09 Jun 2003

TL;DR: The XRANK system is presented, designed to handle the novel features of XML keyword search, which naturally generalizes a hyperlink based HTML search engine such as Google and can be used to query a mix of HTML and XML documents.

Abstract: We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.

857 citations
