Proceedings ArticleDOI

Apache Hadoop YARN: yet another resource negotiator

TL;DR
This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Abstract
The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN, viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, and Tez.
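The decoupling the abstract describes is embodied in a central ResourceManager and a per-application ApplicationMaster that negotiates resources and manages its own tasks. As a rough, illustrative sketch (not code from the paper; class name and constants are hypothetical), a minimal ApplicationMaster written against Hadoop's public AMRMClient API might register with the ResourceManager, request one container, and unregister; the container launch via NMClient and the task fault-tolerance that YARN delegates to the application are only indicated in comments.

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MinimalApplicationMaster {
  public static void main(String[] args) throws Exception {
    // The per-application master registers itself with the central ResourceManager.
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();
    rm.registerApplicationMaster("", 0, "");

    // Ask the ResourceManager for one container (1 GB, 1 vcore); whatever
    // programming model runs inside that container is up to the application.
    Resource capability = Resource.newInstance(1024, 1);
    rm.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Heartbeat until the request is satisfied. Launching work in the granted
    // containers (via NMClient) and reacting to their failures is the
    // application's job: these are the scheduling functions YARN delegates
    // to per-application components.
    while (true) {
      AllocateResponse response = rm.allocate(0.0f);
      if (!response.getAllocatedContainers().isEmpty()) {
        for (Container c : response.getAllocatedContainers()) {
          System.out.println("Granted container " + c.getId() + " on " + c.getNodeId());
        }
        break;
      }
      Thread.sleep(1000);
    }

    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    rm.stop();
  }
}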


Citations
Proceedings ArticleDOI

Large-scale cluster management at Google with Borg

TL;DR: Presents a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
Proceedings ArticleDOI

Resource Management with Deep Reinforcement Learning

TL;DR: This work presents DeepRM, an example solution that translates the problem of packing tasks with multiple resource demands into a learning problem, and shows that it performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight.
Proceedings ArticleDOI

Storm@twitter

TL;DR: Describes the architecture of Storm and its methods for distributed scale-out and fault tolerance, explains how queries are executed in Storm, and presents some operational stories from running Storm at Twitter.
Journal ArticleDOI

State-of-the-art, challenges, and open issues in the integration of Internet of things and cloud computing

TL;DR: Presents a survey of integration components (Cloud platforms, Cloud infrastructures, and IoT middleware), surveys several integration proposals and data analytics techniques, and points out various challenges and open research issues.
Proceedings ArticleDOI

Twitter Heron: Stream Processing at Scale

TL;DR: Heron is now the de facto stream data processing engine inside Twitter; this paper presents the design and implementation of the new system and shares experiences from running Heron in production.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
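To make the programming model behind these two entries concrete, the canonical word-count job is sketched below against Hadoop's org.apache.hadoop.mapreduce API (the standard illustrative example, not code from the cited papers): map emits (word, 1) pairs, reduce sums the counts per word, and the runtime parallelizes the work, partitions intermediate data, and re-executes failed tasks.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation of map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}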
Proceedings ArticleDOI

The Hadoop Distributed File System

TL;DR: Describes the architecture of HDFS and reports on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
Proceedings Article

Spark: cluster computing with working sets

TL;DR: Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Book

The Mythical Man-Month

TL;DR: The Mythical Man-Month, Addison-Wesley, 1975 (excerpted in Datamation, December 1974), gathers some of the published data about software engineering and mixes it with a good deal of the author's personal opinion.