Author

Shivaram Venkataraman

Bio: Shivaram Venkataraman is an academic researcher from the University of Wisconsin-Madison. The author has contributed to research in topics: Scheduling (computing) & Computer science. The author has an h-index of 31 and has co-authored 86 publications receiving 6,423 citations. Previous affiliations of Shivaram Venkataraman include Microsoft & the University of Illinois at Urbana–Champaign.


Papers
Journal ArticleDOI
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Abstract: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.

1,776 citations
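
As an illustration of the unified model this framework offers, the same DataFrame transformation can serve both a batch job and a streaming job. The PySpark sketch below is a minimal, hypothetical example; the input paths, schema, and data are invented.

```python
# Minimal PySpark sketch: one API for batch and streaming workloads.
# The file paths and data here are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified-demo").getOrCreate()

# Batch: read a static dataset and aggregate it.
batch_df = spark.read.json("events/")              # hypothetical path
batch_df.groupBy("user").count().show()

# Streaming: the same transformation over a live directory source.
stream_df = spark.readStream.schema(batch_df.schema).json("incoming/")
query = (stream_df.groupBy("user").count()
         .writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()  # runs until interrupted
```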

Journal Article
TL;DR: This paper presents MLlib, Spark's open-source distributed machine learning library, which provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives.
Abstract: Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

1,551 citations
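
To make the pipeline API concrete, here is a minimal sketch using Spark's Python ML package; the data, column names, and parameters are invented for illustration.

```python
# Minimal sketch of an end-to-end pipeline with Spark's ML library.
# Data, column names, and hyperparameters are invented placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 1.2, 0.7), (1.0, -0.3, 2.1), (0.0, 0.8, -1.4)],
    ["label", "f1", "f2"],
)

# Assemble raw columns into a feature vector, then fit a classifier;
# Pipeline chains the stages so the whole flow trains in one call.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[assembler, lr]).fit(df)

model.transform(df).select("label", "prediction").show()
```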

Proceedings ArticleDOI
15 Feb 2011
TL;DR: This paper presents Consistent and Durable Data Structures (CDDSs), building blocks for single-level data stores that, on current hardware, allow programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies.
Abstract: The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of "big data", and the subsequent emergence of frameworks such as memcached and NoSQL systems require us to rethink the design of data stores. To derive the maximum performance from these new memory technologies, this paper proposes the use of single-level data stores. For these systems, where no distinction is made between a volatile and a persistent copy of data, we present Consistent and Durable Data Structures (CDDSs) that, on current hardware, allow programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies. CDDSs use versioning to allow atomic updates without requiring logging. The same versioning scheme also enables rollback for failure recovery. When compared to a memory-backed Berkeley DB B-Tree, our prototype-based results show that a CDDS B-Tree can increase put and get throughput by 74% and 138%, respectively. When compared to Cassandra, a two-level data store, Tembo, a CDDS B-Tree-enabled distributed key-value system, increases throughput by up to 250%-286%.

403 citations
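
The versioning scheme at the heart of CDDSs can be sketched abstractly: each update writes a new version instead of mutating data in place, so updates are atomic without a separate log, and recovery simply discards versions newer than the last committed one. The Python below is a toy model of that idea, not the paper's NVM-based implementation.

```python
# Toy model of CDDS-style versioning: updates create new versions
# rather than overwriting, so no write-ahead log is needed and
# recovery can roll back by dropping uncommitted versions.
class VersionedStore:
    def __init__(self):
        self.entries = {}    # key -> list of (version, value)
        self.committed = 0   # highest durable version number

    def put(self, key, value):
        new_version = self.committed + 1
        self.entries.setdefault(key, []).append((new_version, value))
        # On real NVM, the new version would be flushed to persistent
        # memory before the version counter is bumped atomically.
        self.committed = new_version

    def get(self, key):
        # Read the newest version at or below the committed counter.
        live = [(v, val) for v, val in self.entries.get(key, [])
                if v <= self.committed]
        return live[-1][1] if live else None

    def recover(self):
        # Failure recovery: discard any version newer than `committed`.
        for key, versions in self.entries.items():
            self.entries[key] = [(v, val) for v, val in versions
                                 if v <= self.committed]

store = VersionedStore()
store.put("x", 1)
store.put("x", 2)
print(store.get("x"))  # 2
```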

Proceedings Article
16 Mar 2016
TL;DR: This paper presents Ernest, a performance prediction framework for large-scale analytics; an evaluation on Amazon EC2 using several workloads shows that the prediction error is low while the training overhead is less than 5% for long-running jobs.
Abstract: Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging, and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to addressing this challenge is the ability to predict the performance of applications under various resource configurations so that we can automatically choose the optimal configuration. Our insight is that a number of jobs have predictable structure in terms of computation and communication. Thus we can build performance models based on the behavior of the job on small samples of data and then predict its performance on larger datasets and cluster sizes. To minimize the time and resources spent in building a model, we use optimal experiment design, a statistical technique that allows us to collect as few training points as required. We have built Ernest, a performance prediction framework for large-scale analytics, and our evaluation on Amazon EC2 using several workloads shows that our prediction error is low while the training overhead is less than 5% for long-running jobs.

401 citations
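
To make the modeling approach concrete, the sketch below fits a small performance model of the kind the paper describes (a serial term, a computation term scaling with input per machine, and communication terms) using non-negative least squares. The exact feature set and all training data here are assumptions for illustration.

```python
# Sketch of an Ernest-style performance model: fit job runtime as a
# function of input scale and cluster size from a few small training
# runs, then extrapolate to larger configurations. Feature terms and
# data are illustrative assumptions, not the paper's exact setup.
import numpy as np
from scipy.optimize import nnls

def features(scale, machines):
    # [serial cost, parallel computation, tree aggregation, per-machine overhead]
    return [1.0, scale / machines, np.log2(machines), machines]

# Synthetic training runs: (input scale, machines, measured seconds).
runs = [(0.125, 2, 110.0), (0.25, 4, 120.0),
        (0.25, 8, 95.0),   (0.5, 16, 105.0)]

X = np.array([features(s, m) for s, m, _ in runs])
y = np.array([t for _, _, t in runs])

theta, _ = nnls(X, y)  # constrain coefficients to be non-negative

# Predict runtime of the full dataset (scale=1.0) on 64 machines.
pred = np.dot(features(1.0, 64), theta)
print(f"predicted runtime: {pred:.1f}s")
```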

Proceedings ArticleDOI
24 Sep 2017
TL;DR: This paper argues that stateless functions are a natural fit for data processing in future computing environments: given recent trends in network bandwidth and the advent of disaggregated storage, stateless functions offer a viable platform that eliminates cluster management overhead and fulfills the promise of elasticity.
Abstract: Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead and fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.

369 citations
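
The programming model the paper proposes is essentially a serverless map: serialize a plain Python function, ship it to stateless cloud functions, and collect results through shared storage. The sketch below follows the PyWren-style API; exact names and signatures may differ across versions, so treat it as illustrative.

```python
# Sketch of the PyWren-style programming model: an embarrassingly
# parallel map over stateless cloud functions. API names follow the
# project but may vary across versions; this is illustrative only.
import pywren

def simulate(seed):
    # Any self-contained, stateless function can be shipped; inputs
    # and outputs travel through disaggregated (object) storage.
    import random
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(100_000))
    return hits / 100_000 * 4  # Monte Carlo estimate of pi

pwex = pywren.default_executor()
futures = pwex.map(simulate, range(32))   # one invocation per seed
estimates = [f.result() for f in futures]
print(sum(estimates) / len(estimates))
```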


Cited by
Proceedings ArticleDOI
13 Aug 2016
TL;DR: This paper presents XGBoost, a scalable end-to-end tree boosting system; it proposes a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

14,872 citations
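
For readers unfamiliar with the system, training a model with the xgboost Python package looks roughly like the sketch below; the synthetic data and hyperparameter values are placeholders.

```python
# Minimal sketch of training a gradient-boosted tree model with the
# xgboost Python package. Data and hyperparameters are placeholders.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dtest = xgb.DMatrix(X[800:], label=y[800:])

params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.3}
model = xgb.train(params, dtrain, num_boost_round=50,
                  evals=[(dtest, "test")])

preds = model.predict(dtest)          # predicted probabilities
print("accuracy:", ((preds > 0.5) == y[800:]).mean())
```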

Proceedings ArticleDOI
24 Oct 2016
TL;DR: In this paper, the authors develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrate that they can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
Abstract: Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.

2,944 citations
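
The mechanism at the core of this work is differentially private SGD: clip each example's gradient to a fixed L2 norm, add Gaussian noise calibrated to that clipping bound, and only then update the parameters. The NumPy sketch below shows one such step on a toy linear model; the model and all constants are illustrative, and no privacy accounting is performed.

```python
# Minimal NumPy sketch of one DP-SGD step: per-example gradient
# clipping followed by Gaussian noise. Toy linear model; constants
# are illustrative and no privacy accountant is included.
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    n = len(X)
    grads = np.zeros_like(w)
    for xi, yi in zip(X, y):
        # Per-example gradient of squared error for a linear model.
        g = 2 * (xi @ w - yi) * xi
        # Clip each example's gradient to L2 norm <= clip_norm.
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)
        grads += g
    # Add Gaussian noise scaled to the clipping bound, then average.
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=w.shape)
    return w - lr * (grads + noise) / n

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 5))
true_w = np.arange(5.0)
y = X @ true_w + rng.normal(scale=0.1, size=256)

w = np.zeros(5)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print(w)  # noisy estimate of true_w
```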
