Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7304 publications have been published within this topic, receiving 63322 citations.


Papers
Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks and a comprehensive experimental study of these interfaces on a set of benchmarks.
Abstract: Many scientific data analytic applications need huge amounts of input, which can often consist of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of the hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing models, under the leadership of MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and PageRank Benchmark has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is needed or not, efficient or not, and whether it is the best solution to tackle future computational challenges.

23 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a model-generating tool called GenSPARK, which constructs a zonal model of an entire building by assembling the appropriate modules, and solves the set of equations resulting from this construction to obtain the air flow and temperature distribution in the building.

23 citations

Journal ArticleDOI
01 Feb 2001
TL;DR: In this article, a predictive procedure is described for determining the effective time period needed to complete the energy release by combustion from the moment of flame initiation by a spark to the completion of flame propagation in a spark ignition engine while using a number of gaseous fuels and some of their mixtures.
Abstract: A predictive procedure is described for determining the effective time period needed to complete the energy release by combustion from the moment of flame initiation by a spark to the completion of flame propagation in a spark ignition engine while using a number of gaseous fuels and some of their mixtures. These predicted values of the combustion period when used in a relatively simple modelling procedure can produce predicted values of key engine performance parameters that compare well with the corresponding experimentally obtained values.

23 citations

Proceedings ArticleDOI
17 Mar 2015
TL;DR: The aim of this work was to develop and compare recommendation systems, based on Hadoop and Spark, that use the item-based collaborative filtering algorithm with the Tanimoto coefficient as the similarity measure, which provides the most precise results for the available data.
Abstract: The aim of this work was to develop and compare recommendation systems which use the item-based collaborative filtering algorithm, based on Hadoop and Spark. Data for the research were gathered from a real social portal whose users can express their preferences regarding the applications on offer. The Hadoop version was implemented with the use of the Mahout library, which was an element of the Hadoop ecosystem. The authors' original solution was implemented with the use of the Apache Spark platform and the Scala programming language. The applied similarity measure was the Tanimoto coefficient, which provides the most precise results for the available data. The initial assumptions were confirmed, as the solution based on the Apache Spark platform turned out to be more efficient.

23 citations


Network Information
Related Topics (5)
Software
130.5K papers, 2M citations
76% related
Combustion
172.3K papers, 1.9M citations
72% related
Cluster analysis
146.5K papers, 2.9M citations
72% related
Cloud computing
156.4K papers, 1.9M citations
71% related
Hydrogen
132.2K papers, 2.5M citations
69% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683