Proceedings Article
PrIter: a distributed framework for prioritized iterative computations
Yanfeng Zhang, Qixin Gao, Lixin Gao, Cuirong Wang
- pp 13
TL;DR: This paper develops a distributed computing framework, PrIter, which supports the prioritized execution of iterative computations, and shows that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.
Abstract:
Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, and recommendation systems. These cloud applications typically involve data sets of massive scale, so fast convergence of the iterative computation on the massive data set is essential. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by supporting prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.
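The idea of prioritized iteration described in the abstract can be illustrated with a small single-machine sketch. This is not PrIter's API or implementation; it is a toy delta-accumulative PageRank in which each round updates only the `top_k` nodes with the largest pending change, on the assumption that those updates contribute most to convergence. The function name `prioritized_pagerank` and its parameters (`damping`, `top_k`, `tol`, `max_rounds`) are hypothetical and chosen for illustration only.

```python
import heapq


def prioritized_pagerank(graph, damping=0.8, top_k=2, tol=1e-6, max_rounds=10_000):
    """Toy prioritized PageRank (illustrative sketch, not PrIter itself).

    graph maps every node to the list of nodes it links to; every node
    must appear as a key. Ranks are unnormalized: they sum to roughly
    the number of nodes when there are no dangling nodes.
    """
    rank = {v: 0.0 for v in graph}
    # Pending change that has not yet been folded into each node's rank.
    delta = {v: 1.0 - damping for v in graph}
    rounds = 0
    while rounds < max_rounds:
        # Priority = size of the pending change; pick only the top_k nodes.
        active = heapq.nlargest(top_k, graph, key=lambda v: delta[v])
        if not active or delta[active[0]] < tol:
            break  # Largest pending change is negligible: converged.
        rounds += 1
        for v in active:
            d = delta[v]
            delta[v] = 0.0
            rank[v] += d  # Fold the pending change into the rank.
            out_links = graph[v]
            if out_links:
                share = damping * d / len(out_links)
                for u in out_links:
                    delta[u] += share  # Propagate the change to neighbors.
    return rank, rounds


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, used_rounds = prioritized_pagerank(toy_graph)
    print(ranks, used_rounds)
```

Setting `top_k` to the number of nodes makes every round touch all records, which corresponds to the non-prioritized baseline the abstract contrasts against; smaller values of `top_k` do less work per round at the cost of more rounds.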
Citations
Journal Article
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Yucheng Low, Danny Bickson, Joseph E. Gonzalez, Carlos Guestrin, Aapo Kyrola, Joseph M. Hellerstein
TL;DR: This paper extends the GraphLab framework from the shared-memory setting to the substantially more challenging distributed setting while preserving strong data consistency guarantees, with techniques to reduce network congestion and mitigate the effect of network latency.
Proceedings Article
Naiad: a timely dataflow system
TL;DR: It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
Posted Content
Distributed GraphLab: A Framework for Machine Learning in the Cloud
Yucheng Low, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein
TL;DR: This paper develops graph based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency, and introduces fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm.
Proceedings Article
WTF: the who to follow service at Twitter
TL;DR: An overview of the architecture of WTF is provided, and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.
Journal Article
Petuum: A New Platform for Distributed Machine Learning on Big Data
Eric P. Xing, Qirong Ho, Wei Dai, Jin-Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu
TL;DR: This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.