Author

Avinash Lakshman

Bio: Avinash Lakshman is an academic researcher from Facebook. The author has contributed to research in the topics Distributed data store and Scalability. The author has an h-index of 3 and has co-authored 3 publications receiving 2960 citations.

Papers

Journal Article•DOI•

Cassandra: a decentralized structured storage system

Avinash Lakshman¹, Prashant Malik¹•Institutions (1)

Facebook¹

14 Apr 2010-Operating Systems Review

TL;DR: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.

2,870 citations

Proceedings Article•DOI•

Cassandra: structured storage system on a P2P network

Avinash Lakshman¹, Prashant Malik¹•Institutions (1)

Facebook¹

10 Aug 2009

TL;DR: Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure.

Abstract: Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra has achieved several goals--scalability, high performance, high availability and applicability. In many ways Cassandra resembles a database and shares many design and implementation strategies with databases. Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format.

270 citations

Proceedings Article•DOI•

Cassandra: a structured storage system on a P2P network

Avinash Lakshman¹, Prashant Malik¹•Institutions (1)

Facebook¹

11 Aug 2009

TL;DR: Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure.

Abstract: Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters). At this scale, small and large components fail continuously; the way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra has achieved several goals -- scalability, high performance, high availability and applicability. In many ways Cassandra resembles a database and shares many design and implementation strategies with databases. Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format.

86 citations

Cited by

Proceedings Article•DOI•

Benchmarking cloud serving systems with YCSB

Brian F. Cooper¹, Adam Silberstein¹, Erwin Tam¹, Raghu Ramakrishnan¹, Russell Sears¹•Institutions (1)

Yahoo!¹

10 Jun 2010

TL;DR: This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems, and defines a core set of benchmarks and reports results for four widely used systems.

Abstract: While the use of MapReduce systems (such as Hadoop) for large scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for cloud serving use include BigTable, PNUTS, Cassandra, HBase, Azure, CouchDB, SimpleDB, Voldemort, and many others. Further, they are being applied to a diverse range of applications that differ considerably from traditional (e.g., TPC-C like) serving workloads. The number of emerging cloud serving systems and the wide range of proposed applications, coupled with a lack of apples-to-apples performance comparisons, makes it difficult to understand the tradeoffs between systems and the workloads for which they are suited. We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible--it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.

3,276 citations

Journal Article•DOI•

Big Data: A Survey

Min Chen¹, Shiwen Mao², Yunhao Liu³•Institutions (3)

Huazhong University of Science and Technology¹, Auburn University², Tsinghua University³

01 Apr 2014-Mobile Networks and Applications

TL;DR: The background and state-of-the-art of big data are reviewed, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid, as well as related technologies.

Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. Finally, we examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide a comprehensive overview and big picture to readers of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations

Proceedings Article•DOI•

ONOS: towards an open, distributed SDN OS

Pankaj Vishwanath Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi¹, Masayoshi Kobayashi¹, Toshio Koide¹, Bob Lantz, Brian O'Connor, Pavlin Radoslavov, William Snow, Guru Parulkar•Institutions (1)

NEC¹

22 Aug 2014

TL;DR: This work identifies additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.

Abstract: We present our experiences to date building ONOS (Open Network Operating System), an experimental distributed SDN control platform motivated by the performance, scalability, and availability requirements of large operator networks. We describe and evaluate two ONOS prototypes. The first version implemented core features: a distributed, but logically centralized, global network view; scale-out; and fault tolerance. The second version focused on improving performance. Based on experience with these prototypes, we identify additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.

1,137 citations

Journal Article•DOI•

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

Han Hu¹, Yonggang Wen², Tat-Seng Chua¹, Xuelong Li³•Institutions (3)

National University of Singapore¹, Nanyang Technological University², Chinese Academy of Sciences³

24 Jun 2014-IEEE Access

TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.

Abstract: Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.

1,002 citations

Proceedings Article•DOI•

Windows Azure Storage: a highly available cloud storage service with strong consistency

Brad Calder¹, Ju Wang¹, Aaron W. Ogus¹, Niranjan Nilakantan¹, Arild E. Skjolsvold¹, Sam McKelvie¹, Yikang Xu¹, Shashwat Srivastav¹, Jiesheng Wu¹, Huseyin Simitci¹, Jaidev Haridas¹, Chakravarthy Uddaraju¹, Hemal Khatri¹, Andrew James Edwards¹, Vaman Bedekar¹, Mainali Shane Kumar¹, Rafay Abbasi¹, Arpit Agarwal¹, Mian Fahim ul Haq¹, Muhammad Ikram ul Haq¹, Deepali Bhardwaj¹, Sowmya Dayanand¹, Anitha Adusumilli¹, Marvin McNett¹, Sriram Sankaran¹, Kavitha Manivannan¹, Leonidas Rigas¹•Institutions (1)

Microsoft¹

23 Oct 2011

TL;DR: The WAS architecture, global namespace, and data model are described, as well as its resource provisioning, load balancing, and replication systems.

Abstract: Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). In this paper, we describe the WAS architecture, global namespace, and data model, as well as its resource provisioning, load balancing, and replication systems.

871 citations
