Spanner: Google’s Globally Distributed Database

doi:10.1145/2491245

Home
/
Papers
/
Spanner: Google’s Globally Distributed Database

Journal Article•DOI•

Spanner: Google’s Globally Distributed Database

James C. Corbett¹, Jeffrey Dean¹, Michael James Boyer Epstein¹, Andrew Fikes¹, Christopher Frost¹, J. J. Furman¹, Sanjay Ghemawat¹, Andrey Gubarev¹, Christopher Heiser¹, Peter Hochschild¹, Wilson C. Hsieh¹, Sebastian Kanthak¹, Eugene Kogan¹, Hongyi Li¹, Alexander Lloyd¹, Sergey Melnik¹, David Mwaura¹, David Nagle¹, Sean Quinlan¹, Rajesh Rao¹, Lindsay Rolig¹, Yasushi Saito¹, Michal Piotr Szymaniak¹, Chris Jorgen Taylor¹, Ruth Wang¹, Dale Woodford¹ - Show less +22 more•Institutions (1)

Google¹

01 Aug 2013-ACM Transactions on Computer Systems (ACM)-Vol. 31, Iss: 3, pp 8

TL;DR: Spanner as mentioned in this paper is Google's scalable, multiversion, globally distributed, and synchronously replicated database, which is the first system to distribute data at global scale and support externally-consistent distributed transactions.

read less

Abstract: Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Google Earth Engine: Planetary-scale geospatial analysis for everyone

[...]

Noel Gorelick¹, M. Hancher¹, Mike J. Dixon¹, Simon Ilyushchenko¹, David Thau¹, Rebecca Moore¹ - Show less +2 more•Institutions (1)

Google¹

06 Jul 2017-Remote Sensing of Environment

TL;DR: Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis that brings Google's massive computational capabilities to bear on a variety of high-impact societal issues including deforestation, drought, disaster, disease, food security, water management, climate monitoring and environmental protection.

...read moreread less

6,262 citations

Proceedings Article•DOI•

Making Smart Contracts Smarter

[...]

Loi Luu¹, Duc-Hiep Chu¹, Hrishi Olickel², Prateek Saxena¹, Aquinas Hobor¹ - Show less +1 more•Institutions (2)

National University of Singapore¹, Yale-NUS College²

24 Oct 2016

TL;DR: This paper investigates the security of running smart contracts based on Ethereum in an open distributed network like those of cryptocurrencies, and proposes ways to enhance the operational semantics of Ethereum to make contracts less vulnerable.

...read moreread less

Abstract: Cryptocurrencies record transactions in a decentralized data structure called a blockchain. Two of the most popular cryptocurrencies, Bitcoin and Ethereum, support the feature to encode rules or scripts for processing transactions. This feature has evolved to give practical shape to the ideas of smart contracts, or full-fledged programs that are run on blockchains. Recently, Ethereum's smart contract system has seen steady adoption, supporting tens of thousands of contracts, holding millions dollars worth of virtual coins. In this paper, we investigate the security of running smart contracts based on Ethereum in an open distributed network like those of cryptocurrencies. We introduce several new security problems in which an adversary can manipulate smart contract execution to gain profit. These bugs suggest subtle gaps in the understanding of the distributed semantics of the underlying platform. As a refinement, we propose ways to enhance the operational semantics of Ethereum to make contracts less vulnerable. For developers writing contracts for the existing Ethereum system, we build a symbolic execution tool called Oyente to find potential security bugs. Among 19, 336 existing Ethereum contracts, Oyente flags 8, 833 of them as vulnerable, including the TheDAO bug which led to a 60 million US dollar loss in June 2016. We also discuss the severity of other attacks for several case studies which have source code available and confirm the attacks (which target only our accounts) in the main Ethereum network.

...read moreread less

1,232 citations

Proceedings Article•DOI•

A Secure Sharding Protocol For Open Blockchains

[...]

Loi Luu¹, Viswesh Narayanan¹, Chaodong Zheng¹, Kunal Baweja¹, Seth Gilbert¹, Prateek Saxena¹ - Show less +2 more•Institutions (1)

National University of Singapore¹

24 Oct 2016

TL;DR: ELASTICO is the first candidate for a secure sharding protocol with presence of byzantine adversaries, and scalability experiments on Amazon EC2 with up to $1, 600$ nodes confirm ELASTICO's theoretical scaling properties.

...read moreread less

Abstract: Cryptocurrencies, such as Bitcoin and 250 similar alt-coins, embody at their core a blockchain protocol --- a mechanism for a distributed network of computational nodes to periodically agree on a set of new transactions. Designing a secure blockchain protocol relies on an open challenge in security, that of designing a highly-scalable agreement protocol open to manipulation by byzantine or arbitrarily malicious nodes. Bitcoin's blockchain agreement protocol exhibits security, but does not scale: it processes 3--7 transactions per second at present, irrespective of the available computation capacity at hand. In this paper, we propose a new distributed agreement protocol for permission-less blockchains called ELASTICO. ELASTICO scales transaction rates almost linearly with available computation for mining: the more the computation power in the network, the higher the number of transaction blocks selected per unit time. ELASTICO is efficient in its network messages and tolerates byzantine adversaries of up to one-fourth of the total computational power. Technically, ELASTICO uniformly partitions or parallelizes the mining network (securely) into smaller committees, each of which processes a disjoint set of transactions (or "shards"). While sharding is common in non-byzantine settings, ELASTICO is the first candidate for a secure sharding protocol with presence of byzantine adversaries. Our scalability experiments on Amazon EC2 with up to $1, 600$ nodes confirm ELASTICO's theoretical scaling properties.

...read moreread less

1,036 citations

Book Chapter•DOI•

The Quest for Scalable Blockchain Fabric: Proof-of-Work vs. BFT Replication

[...]

Marko Vukolic¹•Institutions (1)

IBM¹

29 Oct 2015

TL;DR: In the early days of Bitcoin, the performance of its probabilistic proof-of-work (PoW) based consensus fabric, also known as blockchain, was not a major issue, and Bitcoin became a success story, despite its consensus latencies on the order of an hour and the theoretical peak throughput of only up to 7 transactions per second.

...read moreread less

Abstract: Bitcoin cryptocurrency demonstrated the utility of global consensus across thousands of nodes, changing the world of digital transactions forever. In the early days of Bitcoin, the performance of its probabilistic proof-of-work (PoW) based consensus fabric, also known as blockchain, was not a major issue. Bitcoin became a success story, despite its consensus latencies on the order of an hour and the theoretical peak throughput of only up to 7 transactions per second.

...read moreread less

956 citations

Proceedings Article•DOI•

OmniLedger: A Secure, Scale-Out, Decentralized Ledger via Sharding

[...]

Eleftherios Kokoris-Kogias¹, Philipp Jovanovic¹, Linus Gasser¹, Nicolas Gailly¹, Ewa Syta², Bryan Ford¹ - Show less +2 more•Institutions (2)

École Normale Supérieure¹, Trinity College, Dublin²

20 May 2018

TL;DR: OmniLedger ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards that process transactions, and by introducing an efficient cross-shard commit protocol that atomically handles transactions affecting multiple shards.

...read moreread less

Abstract: Designing a secure permissionless distributed ledger (blockchain) that performs on par with centralized payment processors, such as Visa, is a challenging task. Most existing distributed ledgers are unable to scale-out, i.e., to grow their total processing capacity with the number of validators; and those that do, compromise security or decentralization. We present OmniLedger, a novel scale-out distributed ledger that preserves longterm security under permissionless operation. It ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards that process transactions, and by introducing an efficient cross-shard commit protocol that atomically handles transactions affecting multiple shards. OmniLedger also optimizes performance via parallel intra-shard transaction processing, ledger pruning via collectively-signed state blocks, and low-latency "trust-but-verify" validation for low-value transactions. An evaluation of our experimental prototype shows that OmniLedger’s throughput scales linearly in the number of active validators, supporting Visa-level workloads and beyond, while confirming typical transactions in under two seconds.

...read moreread less

856 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

The Google file system

[...]

Sanjay Ghemawat¹, Howard Gobioff¹, Shun-Tak Albert Leung¹•Institutions (1)

Google¹

19 Oct 2003

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.

...read moreread less

Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

...read moreread less

5,429 citations

Journal Article•DOI•

Linearizability: a correctness condition for concurrent objects

[...]

Maurice Herlihy¹, Jeannette M. Wing¹•Institutions (1)

Carnegie Mellon University¹

01 Jul 1990-ACM Transactions on Programming Languages and Systems

TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.

...read moreread less

Abstract: A concurrent object is a data object shared by concurrent processes. Linearizability is a correctness condition for concurrent objects that exploits the semantics of abstract data types. It permits a high degree of concurrency, yet it permits programmers to specify and reason about concurrent objects using known techniques from the sequential domain. Linearizability provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response, implying that the meaning of a concurrent object's operations can be given by pre- and post-conditions. This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.

...read moreread less

3,396 citations

Journal Article•DOI•

Bigtable: A Distributed Storage System for Structured Data

[...]

Fay W. Chang¹, Jeffrey Dean¹, Sanjay Ghemawat¹, Wilson C. Hsieh¹, Deborah A. Wallach¹, Michael Burrows¹, Tushar Deepak Chandra¹, Andrew Fikes¹, Robert E. Gruber¹ - Show less +5 more•Institutions (1)

Google¹

01 Jun 2008-ACM Transactions on Computer Systems

TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.

...read moreread less

Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

...read moreread less

3,259 citations

Journal Article•DOI•

The part-time parliament

[...]

Leslie Lamport

01 May 1998-ACM Transactions on Computer Systems

TL;DR: The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.

...read moreread less

Abstract: Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.

...read moreread less

2,965 citations

Journal Article•DOI•

MapReduce: a flexible data processing tool

[...]

Jeffrey Dean¹, Sanjay Ghemawat¹•Institutions (1)

Google¹

01 Jan 2010-Communications of The ACM

TL;DR: MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

...read moreread less

Abstract: MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

...read moreread less