Proceedings Article

Sparkle: Optimizing Spark for Large Memory Machines and Analytics

TLDR
This work leverages Spark, an existing memory-centric data analytics framework with widespread adoption among data scientists, to bring the performance benefits of in-memory processing on scale-up servers to an increasingly common class of data analytics applications that process small to medium-sized datasets.
Abstract
Given the growing availability of affordable scale-up servers, our goal is to bring the performance benefits of in-memory processing on scale-up servers to an increasingly common class of data analytics applications that process small to medium-sized datasets (up to a few hundred GB) that can easily fit in the memory of a typical scale-up server. To achieve this goal, we leverage Spark, an existing memory-centric data analytics framework with widespread adoption among data scientists. Bringing Spark's data analytics capabilities to a scale-up system requires rethinking the original design assumptions, which, although effective for a scale-out system, are a poor match for a scale-up system and result in unnecessary communication and memory inefficiencies.

