PrIter: a distributed framework for prioritized iterative computations

doi:10.1145/2038916.2038929

Proceedings Article · DOI

Yanfeng Zhang, +3 more

- pp 13

TLDR

This paper develops a distributed computing framework, PrIter, which supports the prioritized execution of iterative computations, and shows that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.

Abstract:

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale, so fast convergence of the iterative computation on such data sets is essential. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by supporting prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, which significantly improves the convergence speed of the iterative process. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.
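
Prioritized iteration can be illustrated with a delta-based (accumulative) PageRank, in which every node carries a pending change and that change doubles as its priority. The following is a minimal single-process sketch under that assumption; the function `prioritized_pagerank` and its parameters `d` (damping factor), `k` (nodes updated per pass) and `eps` (stopping threshold) are hypothetical names for illustration. It is not PrIter's API or its distributed implementation, only a small instance of "update the records that help convergence the most first".

```python
import heapq
from typing import Dict, List


def prioritized_pagerank(graph: Dict[int, List[int]], d: float = 0.8,
                         k: int = 2, eps: float = 1e-9) -> Dict[int, float]:
    # Hypothetical sketch, not PrIter's API. Assumes every node in `graph`
    # has at least one out-link and every link target is also a key.
    value = {n: 0.0 for n in graph}        # accumulated rank per node
    delta = {n: 1.0 - d for n in graph}    # pending change, used as priority
    while sum(delta.values()) > eps:
        # Select the k nodes whose pending change is largest.
        for n in heapq.nlargest(k, graph, key=lambda node: delta[node]):
            change, delta[n] = delta[n], 0.0
            value[n] += change
            # Push a damped share of the change to each out-neighbor.
            share = d * change / len(graph[n])
            for m in graph[n]:
                delta[m] += share
    return value


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    print(prioritized_pagerank(graph, d=0.8, k=2))
```

Setting `k = len(graph)` updates every node on every pass; a smaller `k` concentrates work on the nodes with the largest pending change. Each update removes a fraction (1 - d) of the processed change from the system, so the total pending change shrinks on every pass and the loop terminates at the same fixed point a plain power iteration would reach.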

Citations

Journal Article · DOI

Distributed GraphLab: a framework for machine learning and data mining in the cloud

Yucheng Low, +5 more

TL;DR: This paper extends the GraphLab framework from the shared-memory setting to the substantially more challenging distributed setting while preserving strong data consistency guarantees, and develops graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency.

Proceedings Article · DOI

Naiad: a timely dataflow system

Derek G. Murray, +5 more

TL;DR: It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.

Posted Content

Distributed GraphLab: A Framework for Machine Learning in the Cloud

Yucheng Low, +5 more

- 26 Apr 2012 -

arXiv: Databases

TL;DR: This paper develops graph based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency, and introduces fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm.

Proceedings Article · DOI

WTF: the who to follow service at Twitter

Pankaj Gupta, +5 more

TL;DR: An overview of the architecture of WTF is provided, and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.

Journal Article · DOI

Petuum: A New Platform for Distributed Machine Learning on Big Data

Eric P. Xing, +9 more

- 01 Jun 2015 -

IEEE Transactions on Big Data

TL;DR: This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.

References

Journal Article · DOI

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
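
The programming model summarized above can be sketched in a single process with the classic word-count example: a user-supplied map function emits key/value pairs, the runtime groups them by key, and a reduce function combines each group. The names `run_map_reduce`, `word_count_map` and `word_count_reduce` are hypothetical, and the sketch deliberately omits everything the runtime system adds on a real cluster (partitioning across machines, failure handling, scheduling); it is not the Hadoop or Google MapReduce API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def run_map_reduce(records: Iterable[str],
                   mapper: Callable[[str], Iterable[Pair]],
                   reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    # Hypothetical single-process driver; no parallelism or fault tolerance.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values emitted for the same key.
            groups[key].append(value)
    # Reduce phase: combine each key's values into one output value.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> Iterable[Pair]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(counts)
```

For example, `run_map_reduce(["the cat", "The dog"], word_count_map, word_count_reduce)` groups the two occurrences of "the" under one key before the reducer sums them.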

Journal Article · DOI

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008 -

Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Journal Article · DOI

The anatomy of a large-scale hypertextual Web search engine

Sergey Brin, +1 more

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Journal Article

The Anatomy of a Large-Scale Hypertextual Web Search Engine.

Sergey Brin, +1 more

- 01 Jan 1998 -

Computer Networks

TL;DR: Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

Proceedings Article

Spark: cluster computing with working sets

Matei Zaharia, +4 more

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.

Related Papers (5)

Pregel: a system for large-scale graph processing

Grzegorz Malewicz, +6 more

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008 -

Communications of The ACM

Spark: cluster computing with working sets

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

PowerGraph: distributed graph-parallel computation on natural graphs