Proceedings ArticleDOI

Apache Hadoop YARN: yet another resource negotiator

TL;DR
This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Abstract
The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN, viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, and Tez.
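The decoupling the abstract describes is embodied in a central ResourceManager and a per-application ApplicationMaster that negotiates resources and manages its own tasks. As a rough, illustrative sketch (not code from the paper; class name and constants are hypothetical), a minimal ApplicationMaster written against Hadoop's public AMRMClient API might register with the ResourceManager, request one container, and unregister; the container launch via NMClient and the task fault-tolerance that YARN delegates to the application are only indicated in comments.

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MinimalApplicationMaster {
  public static void main(String[] args) throws Exception {
    // The per-application master registers itself with the central ResourceManager.
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    // Ask the ResourceManager for one container (1 GB, 1 vcore); whatever
    // programming model runs inside that container is up to the application.
    Resource capability = Resource.newInstance(1024, 1);
    rm.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Heartbeat until the request is satisfied. Launching work in the granted
    // containers (via NMClient) and reacting to their failures is the
    // application's job: these are the scheduling functions YARN delegates
    // to per-application components.
    while (true) {
      AllocateResponse response = rm.allocate(0.0f);
      if (!response.getAllocatedContainers().isEmpty()) {
        for (Container c : response.getAllocatedContainers()) {
          System.out.println("Granted container " + c.getId() + " on " + c.getNodeId());
        }
        break;
      }
      Thread.sleep(1000);
    }

    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    rm.stop();
  }
}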


Citations
Proceedings ArticleDOI

Large-scale cluster management at Google with Borg

TL;DR: Presents a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
Proceedings ArticleDOI

Resource Management with Deep Reinforcement Learning

TL;DR: This work presents DeepRM, an example solution that translates the problem of packing tasks with multiple resource demands into a learning problem, and shows that it performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight.
Proceedings ArticleDOI

Storm@twitter

TL;DR: Describes the architecture of Storm and its methods for distributed scale-out and fault tolerance, explains how queries are executed in Storm, and presents some operational stories from running Storm at Twitter.
Journal ArticleDOI

State-of-the-art, challenges, and open issues in the integration of Internet of things and cloud computing

TL;DR: Presents a survey of integration components (Cloud platforms, Cloud infrastructures, and IoT middleware), surveys several integration proposals and data analytics techniques, and points out various challenges and open research issues.
Proceedings ArticleDOI

Twitter Heron: Stream Processing at Scale

TL;DR: Heron is now the de facto stream data processing engine inside Twitter; this paper presents the design and implementation of the new system and shares experiences from running Heron in production.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
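To make the programming model behind these two entries concrete, the canonical word-count job is sketched below against Hadoop's org.apache.hadoop.mapreduce API (the standard illustrative example, not code from the cited papers): map emits (word, 1) pairs, reduce sums the counts per word, and the runtime parallelizes the work, partitions intermediate data, and re-executes failed tasks.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation of map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}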
Proceedings ArticleDOI

The Hadoop Distributed File System

TL;DR: Describes the architecture of HDFS and reports on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
Proceedings Article

Spark: cluster computing with working sets

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Book

The Mythical Man-Month

TL;DR: The Mythical Man-Month, Addison-Wesley, 1975 (excerpted in Datamation, December 1974), gathers some of the published data about software engineering and mixes it with a good deal of the author's personal opinion.