Journal Article
Apache Spark: a unified engine for big data processing
Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph E. Gonzalez, Scott Shenker, Ion Stoica
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Citations
Posted Content
A Simple Deep Personalized Recommendation System
TL;DR: This work proposes a Simple Deep Personalized Recommendation System to compute travelers' conditional embeddings, which combines listing embeddings in a supervised structure to build short-term historical context to personalize recommendations for travelers.
Journal Article
Provisioning Input and Output Data Rates in Data Processing Frameworks
TL;DR: Experimental results show that the proposed tool can provision input and output (I/O) data rates among competing data processing applications and can guarantee each application an appropriate share of the I/O data rate.
Journal Article
Innovative research of dynamic monitoring system of mental health vocational students based on big data
Hongli Yang, Qinghui Liu
TL;DR: Against the background of big data, this article establishes mental health archives and a dynamic mental health monitoring system for vocational students, using the SCL-90 symptom self-rating scale to compare monitoring data across gender, only-child versus non-only-child status, urban versus rural origin, and other objective conditions.
Journal Article
SOUL: Scala Oversampling and Undersampling Library for imbalance classification
TL;DR: This work presents SOUL (Scala Oversampling and Undersampling Library), a software library for imbalanced classification that includes a large number of data preprocessing techniques, efficient implementations of these approaches, and a graphical environment for contrasting the output of the different preprocessing solutions.
Proceedings Article
Ensembled Outlier Detection using Multi-Variable Correlation in WSN through Unsupervised Learning Techniques.
TL;DR: A complete study of multi-variable outlier detection is carried out, showing that for correlated variables, multi-variable EOD achieves a very good detection rate with a very low false alarm rate.
References
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Proceedings Article
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica
TL;DR: This paper presents Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Journal Article
A bridging model for parallel computation
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model for parallel computation, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Proceedings Article
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
TL;DR: This paper presents a model for processing large graphs that has been designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Proceedings Article
Dryad: distributed data-parallel programs from sequential building blocks
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.