Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, Siddarth Taneja
- pp 239-250
TL;DR: This paper presents the design and implementation of Heron, now the de facto stream data processing engine inside Twitter, and shares experiences from running it in production.
Abstract:
Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real-time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage -- all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now the de facto stream data processing engine inside Twitter, and in this paper we also share our experiences from running Heron in production. In this paper, we also provide empirical evidence demonstrating the efficiency and scalability of Heron.
Citations
Proceedings Article
Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming
Sanket Chintapalli, Derek Dagit, Bobby Evans, Reza Farivar, Thomas Graves, Mark Holderbaugh, Zhuo Liu, Kyle Nusbaum, Kishorkumar Patil, Boyang Jerry Peng, Paul Poulosky
TL;DR: This work develops a streaming benchmark for three representative computation engines (Flink, Storm, and Spark Streaming) and compares their 99th-percentile latency and throughput across various configurations.
Journal Article
Distributed data stream processing and edge computing
TL;DR: This work describes how existing solutions exploit resource elasticity features of cloud computing in stream processing and presents a gap analysis and future directions on stream processing on heterogeneous environments.
Journal Article
Samza: stateful scalable stream processing at LinkedIn
Shadi A. Noghabi, Kartik Paramasivam, Yi Pan, Navina Ramesh, Jon Bringhurst, Indranil Gupta, Roy H. Campbell
TL;DR: The experiments show that Samza handles state efficiently, improving latency and throughput by more than 100X compared to using a remote storage; provides recovery time independent of state size; scales performance linearly with number of containers; and supports reprocessing of the data stream quickly and with minimal interference on real-time traffic.
Proceedings Article
R-Storm: Resource-Aware Scheduling in Storm
TL;DR: R-Storm implements resource-aware scheduling within Storm. It can satisfy both soft and hard resource constraints while minimizing network distance between components that communicate with each other, achieving 30-47% higher throughput and 69-350% better CPU utilization than default Storm.
Journal Article
A Serverless Real-Time Data Analytics Platform for Edge Computing
Stefan Nastic, Thomas Rausch, Ognjen Scekic, Schahram Dustdar, Marjan Gusev, Bojana Koteska, Magdalena Kostoska, Boro Jakimovski, Sasko Ristov, Radu Prodan
TL;DR: A novel approach implements cloud-supported, real-time data analytics in edge-computing applications based on real-life healthcare use case scenarios and discusses the main design requirements and challenges.
References
Proceedings Article
Apache Hadoop YARN: yet another resource negotiator
Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler
TL;DR: This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Proceedings Article
Mesos: a platform for fine-grained resource sharing in the data center
Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, Ion Stoica
TL;DR: The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
Proceedings Article
S4: Distributed Stream Computing Platform
TL;DR: The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
Proceedings Article
Storm@twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
TL;DR: This paper describes the architecture of Storm, its methods for distributed scale-out and fault-tolerance, and how queries are executed in Storm, and presents some operational stories based on running Storm at Twitter.