Open Access, Proceedings Article

Twitter Heron: Stream Processing at Scale

TLDR
Heron is now the de facto stream data processing engine inside Twitter. This paper presents the design and implementation of Heron and shares experiences from running it in production.
Abstract
Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real-time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage -- all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now the de facto stream data processing engine inside Twitter, and in this paper we also share our experiences from running Heron in production. In this paper, we also provide empirical evidence demonstrating the efficiency and scalability of Heron.

Citations
Proceedings Article

Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming

TL;DR: Develops a streaming benchmark for three representative computation engines (Flink, Storm, and Spark Streaming) and compares their performance in terms of 99th percentile latency and throughput under various configurations.
Journal Article

Distributed data stream processing and edge computing

TL;DR: This work describes how existing solutions exploit resource elasticity features of cloud computing in stream processing and presents a gap analysis and future directions on stream processing on heterogeneous environments.
Journal Article

Samza: stateful scalable stream processing at LinkedIn

TL;DR: The experiments show that Samza handles state efficiently, improving latency and throughput by more than 100X compared to using a remote storage; provides recovery time independent of state size; scales performance linearly with number of containers; and supports reprocessing of the data stream quickly and with minimal interference on real-time traffic.
Proceedings Article

R-Storm: Resource-Aware Scheduling in Storm

TL;DR: R-Storm implements resource-aware scheduling within Storm. It can satisfy both soft and hard resource constraints and minimize network distance between components that communicate with each other, achieving 30-47% higher throughput and 69-350% better CPU utilization than default Storm.
Journal Article

A Serverless Real-Time Data Analytics Platform for Edge Computing

TL;DR: A novel approach implements cloud-supported, real-time data analytics in edge-computing applications based on real-life healthcare use case scenarios and discusses the main design requirements and challenges.
References
Proceedings Article

Apache Hadoop YARN: yet another resource negotiator

TL;DR: Summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Proceedings Article

Mesos: a platform for fine-grained resource sharing in the data center

TL;DR: The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
Proceedings Article

S4: Distributed Stream Computing Platform

TL;DR: The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
Proceedings Article

Storm@twitter

TL;DR: Describes the architecture of Storm, its methods for distributed scale-out and fault tolerance, and how queries are executed in Storm, and presents some operational stories based on running Storm at Twitter.