Proceedings ArticleDOI
Sparkle: optimizing spark for large memory machines and analytics
Mijung Kim, Jun Li, Haris Volos, Manish Marwah, Alexander Ulanov, Kimberly Keeton, Joseph Tucek, Lucy Cherkasova, Le Xu, Pradeep Fernando +9 more
- pp 656-656
TLDR
This work leverages Spark, an existing memory-centric data analytics framework with widespread adoption among data scientists, to bring the performance benefits of in-memory processing on scale-up servers to an increasingly common class of data analytics applications that process small to medium size datasets.
Abstract: Given the growing availability of affordable scale-up servers, our goal is to bring the performance benefits of in-memory processing on scale-up servers to an increasingly common class of data analytics applications that process small to medium size datasets (up to a few hundred GBs) that can easily fit in the memory of a typical scale-up server. To achieve this goal, we leverage Spark, an existing memory-centric data analytics framework with widespread adoption among data scientists. Bringing Spark's data analytics capabilities to a scale-up system requires rethinking the original design assumptions, which, although effective for a scale-out system, are a poor match to a scale-up system, resulting in unnecessary communication and memory inefficiencies.
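The abstract's point about scale-out assumptions causing unnecessary communication can be illustrated with a toy sketch (plain Python, not the paper's implementation; all names below are hypothetical): merging per-partition word counts through a serialized "shuffle", versus merging them directly when all partitions share one address space on a scale-up machine.

```python
import pickle
from collections import Counter

# Per-partition word lists, a stand-in for partitioned input data.
partitions = [["spark", "memory"], ["memory", "scaleup"], ["spark", "spark"]]

def shuffle_merge(parts):
    # Scale-out style: per-partition results are serialized (simulating a
    # network shuffle), then deserialized and merged on a reducer.
    blobs = [pickle.dumps(Counter(p)) for p in parts]
    total = Counter()
    for blob in blobs:
        total.update(pickle.loads(blob))
    return total

def shared_merge(parts):
    # Scale-up style: everything lives in one shared address space, so
    # per-partition results can be merged directly, with no serialization.
    total = Counter()
    for p in parts:
        total.update(Counter(p))
    return total

assert shuffle_merge(partitions) == shared_merge(partitions)
print(shared_merge(partitions)["spark"])  # 3
```

Both paths compute the same answer; the scale-up path simply skips the serialize/deserialize round trip that the scale-out design assumes.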
Citations
Proceedings Article
Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores.
TL;DR: This paper explores the design of disaggregating persistent memory (PM) and managing it remotely from compute servers, a model the authors call passive disaggregated persistent memory (pDPM), which significantly lowers monetary and energy costs and avoids scalability bottlenecks at storage servers.
Memory-Driven Computing.
TL;DR: This talk will discuss the technologies that comprise The Machine and their implications for systems software and application programs, as well as describe the work the team is doing at HPE to address some of these challenges and opportunities.
Journal ArticleDOI
A Survey on Spark Ecosystem for Big Data Processing.
TL;DR: A thorough review of optimization techniques for improving the generality and performance of Spark; it also introduces the Spark programming model and computing system and discusses their pros and cons.
Proceedings ArticleDOI
Characterizing the Scale-Up Performance of Microservices using TeaStore
TL;DR: A study of a publicly available microservice-based application on a state-of-the-art x86 server supporting 128 logical CPUs per socket highlights the significant performance opportunities that exist when the scaling properties of individual services and knowledge of the underlying processor topology are properly exploited.
References
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat +1 more
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
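The model's two phases can be sketched in a few lines of plain Python (a single-process illustration of the idea, not the distributed system; the function names are hypothetical): map emits key/value pairs, and reduce merges all values sharing a key.

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in a document.
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    # Shuffle + Reduce: group values by key, then sum each group.
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

docs = ["the quick fox", "the lazy dog", "the fox"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = reduce_phase(pairs)
print(counts["the"])  # 3
```

In the real system, map tasks run in parallel across the cluster and the runtime handles the shuffle, fault tolerance, and scheduling that this sketch omits.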
Journal ArticleDOI
Apache Spark: a unified engine for big data processing
Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph E. Gonzalez, Scott Shenker, Ion Stoica +13 more
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Proceedings ArticleDOI
Scaling Distributed Machine Learning with the Parameter Server
TL;DR: Views on newly identified challenges are shared, and application scenarios such as micro-blog data analysis and data processing for building next-generation search engines are covered.
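The parameter-server pattern itself can be sketched in a toy synchronous form (plain Python; a sketch of the idea, not the paper's system, and all names are hypothetical): workers pull the current weights, compute a gradient on their own data shard, and push the update back to the server.

```python
class ParameterServer:
    # Holds the shared model weights and applies pushed gradients.
    def __init__(self, dim, lr=0.01):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        return list(self.weights)

    def push(self, grad):
        # Gradient-descent update applied on behalf of a worker.
        for i, g in enumerate(grad):
            self.weights[i] -= self.lr * g

def worker_step(server, shard):
    w = server.pull()
    # Gradient of mean squared error for a 1-D linear model y = w0 * x.
    grad = [sum(2 * (w[0] * x - y) * x for x, y in shard) / len(shard)]
    server.push(grad)

server = ParameterServer(dim=1)
data = [(x, 3.0 * x) for x in range(1, 9)]  # points on the line y = 3x
shards = [data[:4], data[4:]]               # each worker owns one shard
for _ in range(200):
    for shard in shards:
        worker_step(server, shard)
print(round(server.weights[0], 2))  # 3.0
```

A production parameter server distributes the weights across many server nodes and lets workers push asynchronously; this sketch keeps only the pull/compute/push loop.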
Journal ArticleDOI
Region-based memory management
Mads Tofte, Jean-Pierre Talpin +1 more
TL;DR: A region-based dynamic semantics for a skeletal programming language extracted from Standard ML is defined, the inference system that specifies where regions can be allocated and de-allocated is presented, and a detailed proof that the system is sound with respect to a standard semantics is given.
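The core idea of a region is that every object allocated in it is released together when the region ends, rather than object by object. A toy Python sketch (hypothetical `Region` class; real region-based management is done by the compiler/runtime, not a library):

```python
class Region:
    # A toy region: objects allocated into it live until the region ends.
    def __init__(self):
        self._objects = []

    def alloc(self, value):
        self._objects.append(value)
        return value

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Deallocate the whole region in one shot.
        self._objects.clear()
        return False

with Region() as r:
    xs = [r.alloc(i * i) for i in range(5)]
    total = sum(xs)
print(total)  # 30
```

In Tofte and Talpin's system, the region inference analysis decides at compile time where such allocate/deallocate points go, with no programmer annotations or garbage collector.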
Proceedings ArticleDOI
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks
TL;DR: Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under commonly used resource schedulers.