Open AccessProceedings Article
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
Matei Zaharia,Mosharaf Chowdhury,Tathagata Das,Ankur Dave,Justin Ma,Murphy McCauley,Michael J. Franklin,Scott Shenker,Ion Stoica +8 more
- pp 2-2
TLDR
Resilient Distributed Datasets is presented, a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner and is implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.Abstract:
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.read more
Citations
More filters
Proceedings ArticleDOI
TensorFlow: a system for large-scale machine learning
Martín Abadi,Paul Barham,Jianmin Chen,Zhifeng Chen,Andy Davis,Jeffrey Dean,Matthieu Devin,Sanjay Ghemawat,Geoffrey Irving,Michael Isard,Manjunath Kudlur,Josh Levenberg,Rajat Monga,Sherry Moore,Derek G. Murray,Benoit Steiner,Paul A. Tucker,Vijay K. Vasudevan,Pete Warden,Martin Wicke,Yuan Yu,Xiaoqiang Zheng +21 more
TL;DR: TensorFlow as mentioned in this paper is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Posted Content
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martín Abadi,Ashish Agarwal,Paul Barham,Eugene Brevdo,Zhifeng Chen,Craig Citro,Greg S. Corrado,Andy Davis,Jeffrey Dean,Matthieu Devin,Sanjay Ghemawat,Ian Goodfellow,Andrew Harp,Geoffrey Irving,Michael Isard,Yangqing Jia,Rafal Jozefowicz,Lukasz Kaiser,Manjunath Kudlur,Josh Levenberg,Dan Mané,Rajat Monga,Sherry Moore,Derek G. Murray,Chris Olah,Mike Schuster,Jonathon Shlens,Benoit Steiner,Ilya Sutskever,Kunal Talwar,Paul A. Tucker,Vincent Vanhoucke,Vijay K. Vasudevan,Fernanda B. Viégas,Oriol Vinyals,Pete Warden,Martin Wattenberg,Martin Wicke,Yuan Yu,Xiaoqiang Zheng +39 more
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Posted Content
TensorFlow: A system for large-scale machine learning
Martín Abadi,Paul Barham,Jianmin Chen,Zhifeng Chen,Andy Davis,Jeffrey Dean,Matthieu Devin,Sanjay Ghemawat,Geoffrey Irving,Michael Isard,Manjunath Kudlur,Josh Levenberg,Rajat Monga,Sherry Moore,Derek G. Murray,Benoit Steiner,Paul A. Tucker,Vijay K. Vasudevan,Pete Warden,Martin Wicke,Yuan Yu,Xiaoqiang Zheng +21 more
TL;DR: The TensorFlow dataflow model is described and the compelling performance that Tensor Flow achieves for several real-world applications is demonstrated.
Journal ArticleDOI
Big Data: A Survey
Min Chen,Shiwen Mao,Yunhao Liu +2 more
TL;DR: The background and state-of-the-art of big data are reviewed, including enterprise management, Internet of Things, online social networks, medial applications, collective intelligence, and smart grid, as well as related technologies.
Journal ArticleDOI
SCENIC: single-cell regulatory network inference and clustering.
Sara Aibar,Carmen Bravo González-Blas,Thomas Moerman,Vân Anh Huynh-Thu,Hana Imrichova,Gert Hulselmans,Florian Rambow,Jean-Christophe Marine,Pierre Geurts,Jan Aerts,Joost van den Oord,Zeynep Kalender Atak,Jasper Wouters,Stein Aerts +13 more
TL;DR: On a compendium of single-cell data from tumors and brain, it is demonstrated that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states.
References
More filters
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Book
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Journal ArticleDOI
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin,Lawrence Page +1 more
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Journal Article
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin,Lawrence Page +1 more
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.