scispace - formally typeset
Author

Andrey Gubichev

Other affiliations: Max Planck Society, Google
Bio: Andrey Gubichev is an academic researcher from Technische Universität München. The author has contributed to research in topics: RDF & Query optimization. The author has an h-index of 14 and has co-authored 19 publications receiving 1,070 citations. Previous affiliations of Andrey Gubichev include Max Planck Society & Google.

Papers
Journal ArticleDOI
01 Nov 2015
TL;DR: This paper introduces the Join Order Benchmark (JOB) and experimentally revisits the main components of the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries.
Abstract: Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality estimates.

449 citations
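The plan enumeration the abstract refers to can be illustrated with textbook exhaustive dynamic programming over table subsets. Below is a minimal sketch, assuming a simple C_out-style cost (sum of intermediate-result sizes) and externally supplied cardinality estimates; the function name and cost model are illustrative assumptions, not the paper's implementation:

```python
from itertools import combinations

def dp_join_order(tables, join_card):
    """Exhaustive dynamic programming over table subsets.

    tables: list of table names
    join_card: maps a frozenset of two or more tables to the estimated
               cardinality of joining exactly those tables
    Cost model: sum of intermediate-result sizes (a C_out-style cost);
    base tables cost nothing.
    Returns (cost, plan), where plan is a nested pair tree of table names.
    """
    best = {frozenset([t]): (0, t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            out = join_card[subset]
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    cost = best[left][0] + best[right][0] + out
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(tables)]

# Illustrative estimates: "A" and "C" share no join predicate, so their
# pairing is a huge cross product that the enumeration learns to avoid.
est = {
    frozenset({"A", "B"}): 50,
    frozenset({"B", "C"}): 10,
    frozenset({"A", "C"}): 1_000_000,
    frozenset({"A", "B", "C"}): 30,
}
cost, plan = dp_join_order(["A", "B", "C"], est)
# cost == 40; plan == ("A", ("B", "C")): join B with C first, then add A
```

Real optimizers restrict this enumeration to connected subgraphs of the join graph instead of pricing cross products explicitly; the simplification keeps the sketch short.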

Proceedings ArticleDOI
27 May 2015
TL;DR: This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver for a workload with complex graph dependencies.
Abstract: The Linked Data Benchmark Council (LDBC) has now been underway for two years and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point-driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects; we outline this methodology. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver for a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive workload in detail, illustrate it with some early results, and state the goals for the two other workloads.

262 citations

Proceedings ArticleDOI
26 Oct 2010
TL;DR: This paper presents a scalable sketch-based index structure that not only supports estimation of node distances but also computes the corresponding shortest paths themselves, leading to near-exact shortest-path approximations in real-world graphs.
Abstract: Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques - implemented within a fully functional RDF graph database system - over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.

172 citations
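The abstract's central idea, that materializing the actual landmark paths tightens distance estimates, can be sketched as follows. This is a minimal single-landmark illustration; the function names and the shortcutting rule are assumptions for illustration, not the paper's exact algorithm:

```python
from collections import deque

def bfs_tree(graph, root):
    """Parent pointers of a BFS tree from root (unweighted, undirected graph)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def tree_path(parent, u):
    """Vertices from u up to the BFS root (the landmark)."""
    path = []
    while u is not None:
        path.append(u)
        u = parent[u]
    return path

def estimate_path(sketches, u, v):
    """Concatenate u -> landmark and landmark -> v, shortcutting at the
    first vertex the two tree paths share. Keeping the paths (not just the
    distances) is what allows this tightening."""
    best = None
    for parent in sketches:
        if u not in parent or v not in parent:
            continue
        pu, pv = tree_path(parent, u), tree_path(parent, v)
        pos = {x: i for i, x in enumerate(pv)}
        for i, x in enumerate(pu):
            if x in pos:  # first common vertex, possibly before the landmark
                cand = pu[:i + 1] + pv[:pos[x]][::-1]
                break
        if best is None or len(cand) < len(best):
            best = cand
    return best  # list of vertices; len(best) - 1 is the distance estimate

# Landmark "a" is two hops from both "c" and "e"; the plain landmark
# estimate c -> a -> e has length 4, but shortcutting at the shared
# vertex "b" recovers an exact two-hop path:
graph = {"a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d", "b"]}
sketches = [bfs_tree(graph, "a")]
# estimate_path(sketches, "c", "e") == ["c", "b", "e"]
```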

Journal ArticleDOI
01 Oct 2018
TL;DR: This paper introduces the Join Order Benchmark, which works on real-life data riddled with correlations and comprises 113 complex join queries, and investigates plan enumeration techniques, comparing exhaustive dynamic programming with heuristic algorithms and finding that exhaustive enumeration improves performance despite the suboptimal cardinality estimates.
Abstract: Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark, which works on real-life data riddled with correlations and comprises 113 complex join queries. We experimentally revisit the main components in the classic query optimizer architecture using this complex, real-world data set and realistic multi-join queries. For this purpose, we describe cardinality-estimate injection and extraction techniques that allow us to compare the cardinality estimators of multiple industrial SQL implementations on equal footing, and to characterize the value of having perfect cardinality estimates. Our investigation shows that all industrial-strength cardinality estimators routinely produce large errors: though cardinality estimation using table samples solves the problem for single-table queries, there are still no techniques in industrial systems that can deal accurately with join-crossing correlated query predicates. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. We investigate plan enumeration techniques, comparing exhaustive dynamic programming with heuristic algorithms, and find that exhaustive enumeration improves performance despite the suboptimal cardinality estimates. Finally, we extend our investigation from main-memory-only processing to disk-based query processing. Here, we find that though accurate cardinality estimation should be the first priority, other aspects such as modeling random versus sequential I/O are also important for predicting query runtime.

105 citations
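The "large errors" reported in this line of work are commonly quantified with the q-error, the multiplicative factor by which an estimate deviates from the true cardinality. The metric choice is standard in this literature, though the abstract above does not name it; a minimal sketch:

```python
def q_error(estimate, actual):
    """Factor by which an estimate deviates from the true cardinality.
    1.0 is a perfect estimate; the metric treats over- and
    underestimation symmetrically."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("cardinalities must be positive")
    return max(estimate / actual, actual / estimate)

# An estimator that guesses 100 rows for a true result of 10,000 rows is
# off by the same factor as one guessing 1,000,000 rows:
# q_error(100, 10_000) == q_error(1_000_000, 10_000) == 100.0
```

The symmetry is the point: a relative-error metric would score a 100x underestimate as at most 100% error, hiding exactly the mistakes that derail join ordering.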

Proceedings Article
01 Jan 2017
TL;DR: Index-based join sampling is proposed: a novel cardinality estimation technique for main-memory databases that relies on sampling and existing index structures to obtain accurate estimates, significantly improving estimation as well as overall plan quality.
Abstract: After four decades of research, today’s database systems still suffer from poor query execution plans. Bad plans are usually caused by poor cardinality estimates, which have been called the “Achilles’ heel” of modern query optimizers. In this work we propose index-based join sampling, a novel cardinality estimation technique for main-memory databases that relies on sampling and existing index structures to obtain accurate estimates. Results on a real-world data set show that this approach significantly improves estimation as well as overall plan quality. The additional sampling effort is quite low and can be configured to match the desired application profile. The technique can be easily integrated into most systems.

102 citations
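The core scale-up step behind such a sampling-based estimator can be sketched as follows. The names, and the dict standing in for an index lookup, are assumptions for illustration rather than the paper's implementation (which also propagates samples through multi-way join trees):

```python
def estimate_join_size(r_sample, r_total, s_index, key):
    """Estimate |R join S| by probing an index on S's join column with a
    uniform sample of R and scaling the match count up to |R|.

    r_sample: list of row dicts sampled uniformly from R
    r_total:  |R|, the full size of R
    s_index:  dict mapping join-key -> number of matching S rows
              (a stand-in for an index lookup that returns a match count)
    key:      name of the join attribute
    """
    if not r_sample:
        return 0.0
    matches = sum(s_index.get(row[key], 0) for row in r_sample)
    return matches * r_total / len(r_sample)

# Two sampled R rows, |R| = 100: keys 1 and 2 match 3 and 0 S rows,
# so the estimate is (3 + 0) * 100 / 2 = 150 joined rows.
# estimate_join_size([{"k": 1}, {"k": 2}], 100, {1: 3, 2: 0}, "k") == 150.0
```

Because the index answers each probe in logarithmic time, the estimate stays cheap even when S is large, which is what makes the sampling effort "quite low" in practice.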


Cited by
Proceedings ArticleDOI
27 May 2018
TL;DR: This work describes Cypher 9, the first version of the language governed by the openCypher Implementers Group; it introduces the language by example and provides a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model.
Abstract: The Cypher property graph query language is an evolving language, originally designed and implemented as part of the Neo4j graph database, and currently used by several commercial database products and researchers. We describe Cypher 9, which is the first version of the language governed by the openCypher Implementers Group. We first introduce the language by example and describe its uses in industry. We then provide a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model and its ASCII-art graph pattern matching mechanism for expressing subgraphs of interest to an application. We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning it into a compositional language that supports graph projections and multiple named graphs.

353 citations

Posted Content
TL;DR: This work proposes a new exact method for shortest-path distance queries on large-scale networks that can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods.
Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Though this seems too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during the breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, pruning surprisingly reduces both the search space and the sizes of the labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously by exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with query times comparable to those of previous methods.

278 citations
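The pruned-BFS idea described above can be sketched as follows, omitting the paper's bit-parallel 32/64-way BFS optimization; in practice the vertex order should be by decreasing degree or centrality, which is what makes the pruning effective:

```python
from collections import deque

def build_labels(graph, order):
    """Pruned landmark labeling: run a BFS from each vertex in `order`,
    but prune any vertex whose distance the labels built so far can
    already answer. Returns the 2-hop labels and a distance-query function."""
    labels = {v: {} for v in graph}  # vertex -> {landmark: distance}

    def query(u, v):
        common = labels[u].keys() & labels[v].keys()
        return min((labels[u][w] + labels[v][w] for w in common),
                   default=float("inf"))

    for root in order:
        queue = deque([(root, 0)])
        seen = {root}
        while queue:
            u, d = queue.popleft()
            if query(root, u) <= d:
                continue  # pruned: existing labels already cover this pair
            labels[u][root] = d
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append((v, d + 1))
    return labels, query

# On a path graph 1-2-3-4, processing vertices in the order [2, 3, 1, 4]
# leaves vertex 2 with the single label {2: 0}: every later BFS prunes
# before labeling it, yet all pairwise distances remain answerable.
```

The pruning test is the whole trick: skipping a vertex also cuts off its entire unvisited subtree, so label sets stay far smaller than full BFS trees while queries remain exact.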

Proceedings ArticleDOI
22 Jun 2013
TL;DR: In this article, a new exact method for shortest-path distance queries on large-scale networks is proposed; the key ingredient is pruning during breadth-first searches.
Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Though this seems too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during the breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, pruning surprisingly reduces both the search space and the sizes of the labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously by exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with query times comparable to those of previous methods.

270 citations

Proceedings ArticleDOI
27 May 2015
TL;DR: This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver for a workload with complex graph dependencies.
Abstract: The Linked Data Benchmark Council (LDBC) has now been underway for two years and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point-driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects; we outline this methodology. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of the graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver for a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive workload in detail, illustrate it with some early results, and state the goals for the two other workloads.

262 citations

Proceedings ArticleDOI
01 Apr 2012
TL;DR: A systematic comparison of current graph database models is presented, covering general features (for data storing and querying), data modeling features (i.e., data structures, query languages, and integrity constraints), and the support for essential graph queries.
Abstract: The limitations of traditional databases, in particular the relational model, in covering the requirements of current applications have led to the development of new database technologies. Among them, graph databases are attracting the attention of the database community because, in trendy projects where a database is needed, the extraction of valuable information relies on processing the graph-like structure of the data. In this paper we present a systematic comparison of current graph database models. Our review includes general features (for data storing and querying), data modeling features (i.e., data structures, query languages, and integrity constraints), and the support for essential graph queries.

255 citations