Journal Article
Apache Spark: a unified engine for big data processing
Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph E. Gonzalez, Scott Shenker, Ion Stoica
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Citations
Journal Article
Evaluating end-to-end optimization for data analytics applications in Weld
Shoumik Palkar, James J. Thomas, Deepak Narayanan, Pratiksha Thaker, Rahul Palamuttam, Parimajan Negi, Anil Shanbhag, Malte Schwarzkopf, Holger Pirk, Saman Amarasinghe, Samuel Madden, Matei Zaharia
TL;DR: With its optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization.
Journal Article
Random Sample Partition: A Distributed Data Model for Big Data Analysis
TL;DR: The Random Sample Partition (RSP) distributed data model is proposed to represent a big data set as a set of disjoint data blocks, called RSP blocks, which have a probability distribution similar to that of the entire data set.
Journal Article
Improved sqrt-cosine similarity measurement
Sahar Sohangir, Dingding Wang
TL;DR: The proposed improved sqrt-cosine similarity measure is applied to a variety of document-understanding tasks, such as text classification, clustering, and query search, and experimental results show that the proposed method is indeed effective.
Journal Article
Research and Analysis of an Enterprise E-Commerce Marketing System Under the Big Data Environment
Journal Article
Review and Classification of Bio-inspired Algorithms and Their Applications
TL;DR: This paper provides a systematic, pragmatic and comprehensive review of the latest developments in evolutionary-based bio-inspired algorithms, swarm intelligence-based bio-inspired algorithms, ecology-based bio-inspired algorithms and multi-objective bio-inspired algorithms.
References
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Proceedings Article
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica
TL;DR: This paper presents Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Journal Article
A bridging model for parallel computation
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model for parallel computation, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Proceedings Article
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
TL;DR: This paper presents a model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Proceedings Article
Dryad: distributed data-parallel programs from sequential building blocks
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.