Author

Tim Kraska

Other affiliations: University of Fribourg, University of Münster, ETH Zurich

Bio: Tim Kraska is an academic researcher at the Massachusetts Institute of Technology. He has contributed to research on cloud computing and query optimization. He has an h-index of 48 and has co-authored 209 publications that have received 8,995 citations. Previous affiliations include the University of Fribourg and the University of Münster.

Papers published on a yearly basis (chart, 2006 to 2023)

Papers

Proceedings Article•DOI•

The Case for Learned Index Structures

Tim Kraska¹, Alex Beutel², Ed H. Chi², Jeffrey Dean², Neoklis Polyzotis²

Massachusetts Institute of Technology¹, Google²

27 May 2018

TL;DR: The authors propose replacing traditional index structures with learned models, which can have significant advantages over traditional indexes. They theoretically analyze the conditions under which learned indexes outperform traditional index structures and describe the main challenges in designing them.

Abstract: Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that our learned indexes can have significant advantages over traditional indexes. More importantly, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work provides just a glimpse of what might be possible.

742 citations
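
The premise in the abstract above, that an index over a sorted array is a model mapping a key to a position, can be illustrated with a minimal Python sketch. This is not the design from the paper: it assumes a single least-squares linear model plus a bounded local search, and the names `fit_linear`, `LearnedIndex`, and `max_err` are hypothetical and chosen only for illustration. It also assumes a non-empty list of numeric keys.

```python
import bisect


def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    if var == 0:  # all keys identical: fall back to a constant model
        return 0.0, mean_pos
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_pos - slope * mean_key


class LearnedIndex:
    """Illustrative single-model index (assumption: one linear model, non-empty keys)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst-case distance between predicted and true position over stored keys.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # The model narrows the search to a window; binary search finishes it.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

Because the error bound is measured over the stored keys, every stored key is guaranteed to fall inside the search window; the closer the key distribution is to linear, the smaller that window becomes.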

Proceedings Article•DOI•

CrowdDB: answering queries with crowdsourcing

Michael J. Franklin¹, Donald Kossmann², Tim Kraska¹, Sukriti Ramesh², Reynold Xin¹

University of California, Berkeley¹, ETH Zurich²

12 Jun 2011

TL;DR: This paper describes the design of CrowdDB, which uses crowdsourced human input to process queries that neither database systems nor search engines can adequately answer. A major conceptual change is that the traditional closed-world assumption for query processing does not hold for human input. The paper also outlines important avenues for future work on crowdsourced query processing systems.

Abstract: Some queries cannot be answered by machines only. Processing such queries requires human input for providing information that is missing from the database, for performing computationally difficult functions, and for matching, ranking, or aggregating results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are also important differences. Conceptually, a major change is that the traditional closed-world assumption for query processing does not hold for human input. From an implementation perspective, human-oriented query operators are needed to solicit, integrate and cleanse crowdsourced data. Furthermore, performance and cost depend on a number of new factors including worker affinity, training, fatigue, motivation and location. We describe the design of CrowdDB, report on an initial set of experiments using Amazon Mechanical Turk, and outline important avenues for future work in the development of crowdsourced query processing systems.

688 citations

Journal Article•DOI•

CrowdER: crowdsourcing entity resolution

Jiannan Wang¹, Tim Kraska², Michael J. Franklin², Jianhua Feng¹

Tsinghua University¹, University of California, Berkeley²

01 Jul 2012

TL;DR: This work proposes a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. It also develops a novel two-tiered heuristic approach for creating batched tasks.

Abstract: Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate but expensive (and slow) way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers but even with batching, a human-only approach is infeasible for data sets of even moderate size, due to the large numbers of matches to be tested. Instead, we propose a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. We show that for such a hybrid system, generating the minimum number of verification tasks of a given size is NP-Hard, but we develop a novel two-tiered heuristic approach for creating batched tasks. We describe this method, and present the results of extensive experiments on real data sets using a popular crowdsourcing platform. The experiments show that our hybrid approach achieves both good efficiency and high accuracy compared to machine-only or human-only alternatives.

499 citations
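
The hybrid workflow described in the CrowdER abstract above (a coarse machine pass over all pairs, then human verification of only the likely matches, presented in batches) can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes token-level Jaccard similarity as the machine pass and naive fixed-size batching in place of the two-tiered heuristic, and the names `jaccard`, `candidate_pairs`, and `batch_pairs` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two record strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(records, threshold):
    """Machine pass: keep only the index pairs whose similarity meets the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(records[i], records[j]) >= threshold
    ]


def batch_pairs(pairs, batch_size):
    """Group candidate pairs into fixed-size verification tasks for human workers."""
    return [pairs[k:k + batch_size] for k in range(0, len(pairs), batch_size)]


records = [
    "iPad 2 16GB white",
    "iPad 2 16GB White WiFi",
    "Kindle Fire HD",
    "kindle fire hd 7",
]
pairs = candidate_pairs(records, threshold=0.5)
tasks = batch_pairs(pairs, batch_size=1)
```

Of the six possible pairs in this toy data set, only two pass the machine filter, so only those two would be sent for human verification.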

Posted Content•

CrowdER: Crowdsourcing Entity Resolution

Jiannan Wang¹, Tim Kraska², Michael J. Franklin², Jianhua Feng¹

Tsinghua University¹, University of California, Berkeley²

09 Aug 2012-arXiv: Databases

TL;DR: In this paper, a hybrid human-machine approach is proposed, in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs.

450 citations

Proceedings Article•

MLbase: A Distributed Machine-learning System

Tim Kraska¹, Ameet Talwalkar², John C. Duchi², Rean Griffith³, Michael J. Franklin², Michael I. Jordan²

Brown University¹, University of California, Berkeley², VMware³

01 Jan 2013

TL;DR: This work presents the vision for MLbase, a novel system harnessing the power of machine learning for both end-users and ML researchers, which provides a simple declarative way to specify ML tasks and a novel optimizer to select and dynamically adapt the choice of learning algorithm.

Abstract: Machine learning (ML) and statistical techniques are key to transforming big data into actionable knowledge. In spite of the modern primacy of data, the complexity of existing ML algorithms is often overwhelming: many users do not understand the trade-offs and challenges of parameterizing and choosing between different learning techniques. Furthermore, existing scalable systems that support machine learning are typically not accessible to ML researchers without a strong background in distributed systems and low-level primitives. In this work, we present our vision for MLbase, a novel system harnessing the power of machine learning for both end-users and ML researchers. MLbase provides (1) a simple declarative way to specify ML tasks, (2) a novel optimizer to select and dynamically adapt the choice of learning algorithm, (3) a set of high-level operators to enable ML researchers to scalably implement a wide range of ML methods without deep systems knowledge, and (4) a new run-time optimized for the data-access patterns of these high-level operators.

359 citations

Cited by

Journal Article•DOI•

I and i

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: The author reflects on the ethereal quality of i, the square root of minus one, which seemed an odd beast when first encountered at school: an intruder hovering on the edge of reality.

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

KEGG (Kyoto Encyclopedia of Genes and Genomes) [in Japanese] (Special feature: The present and future of genomic medicine, basic and clinical) (Databases)

光輝中尾, 實金久

01 Jan 2000

3,536 citations

A Break in the Clouds: Towards a Cloud Definition

Chris Rose

01 Jan 2011

2,037 citations

Journal Article•DOI•

Apache Spark: a unified engine for big data processing

Matei Zaharia¹, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave², Xiangrui Meng, Josh Rosen, Shivaram Venkataraman², Michael J. Franklin², Ali Ghodsi², Joseph E. Gonzalez², Scott Shenker², Ion Stoica²

Stanford University¹, University of California, Berkeley²

28 Oct 2016-Communications of The ACM

TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.

Abstract: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications

1,776 citations

Journal Article•

MLlib: machine learning in apache spark

Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks¹, Shivaram Venkataraman¹, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen², Doris Xin³, Reynold Xin, Michael J. Franklin¹, Reza Bosagh Zadeh⁴, Matei Zaharia⁵, Ameet Talwalkar⁶

University of California, Berkeley¹, Cloudera², Urbana University³, Stanford University⁴, Massachusetts Institute of Technology⁵, University of California, Los Angeles⁶

01 Jan 2016-Journal of Machine Learning Research

TL;DR: MLlib is an open-source distributed machine learning library for Apache Spark that provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.

Abstract: Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

1,551 citations
