scispace - formally typeset
Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.


Papers
Journal Article
Khaled Morsi, A. El-Desouky, B. Johnson, A. Mar, S. Lanka
TL;DR: This article discusses the prospects and potential of spark plasma extrusion, a process that allows extended geometries to be produced via electric-current processing, and demonstrates the feasibility of this processing approach, which has major implications for the spark plasma sintering field.

27 citations

Proceedings Article
17 Sep 2017
TL;DR: A solution based on distributed processing concepts generates a predictive map of air pollution for the next 24 hours from the monitoring stations of Tehran, the capital of Iran; the results show that the proposed approach achieves reasonable speed in processing big spatial data, along with horizontal scalability.
Abstract: Air pollution is one of the major environmental problems in industrial and populated cities. Predictive mapping of urban air pollution and sharing the generated maps with the public and city officials have positive impacts on society and the environment. This article presents a solution based on distributed processing concepts to generate a predictive map of air pollution for the next 24 hours. Apache Hadoop is used as the underlying framework to form a cluster of processing machines. To improve processing speed and provide the required machine learning functionality, Apache Spark is employed on the Hadoop cluster. The solution efficiently predicts air quality classes at the monitoring stations of Tehran, the capital of Iran, for the next 24 hours. The inverse distance weighting (IDW) method is then used to generate a predictive map of air quality classes for the whole city. The results showed that the proposed approach can achieve a reasonable speed in processing big spatial data, along with horizontal scalability.

27 citations
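The IDW step in the abstract above estimates a value at an unmeasured location as a weighted average of station values, with weights w_i = 1 / d(p, p_i)^power, so nearer stations count more. The paper does not state its power parameter, distance measure, or how class labels are encoded, so the sketch below is a minimal illustration under stated assumptions: planar (projected) coordinates, Euclidean distance, power = 2, and air quality classes encoded as numbers. The names `idw` and `idw_grid` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, value): projected station coordinates and a numeric class code (assumption)
Station = Tuple[float, float, float]


def idw(stations: Sequence[Station], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y); power=2 is an assumed default."""
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)  # Euclidean distance (assumption)
        if d == 0.0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no stations supplied")
    return num / den


def idw_grid(stations: Sequence[Station], xs: Sequence[float], ys: Sequence[float],
             power: float = 2.0) -> List[List[float]]:
    """Evaluate IDW over a rectangular grid: one row per y, one column per x."""
    return [[idw(stations, x, y, power) for x in xs] for y in ys]


if __name__ == "__main__":
    stations = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
    print(idw_grid(stations, [0.0, 1.0, 2.0], [0.0]))
```

Because the interpolated output is continuous, a map of discrete classes would still need a rounding or thresholding step; that choice is left open here.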

Proceedings Article
19 Apr 2017
TL;DR: This paper introduces new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters, together with an evaluation plan generator for complex matrix computations and a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment.
Abstract: The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains, ranging from business intelligence and bioinformatics to self-driving cars. These methods rely heavily on matrix computations, so it is critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced, as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation often serves as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.

26 citations
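The abstract above says the techniques estimate the sparsity of intermediate results to guide plan choices, but it does not give the estimator. The sketch below uses a common simple estimator instead, assuming nonzeros are spread uniformly and independently: for C = A·B with inner dimension k, a cell of C is nonzero with probability about 1 - (1 - d_A·d_B)^k, ignoring numerical cancellation. The helper `pick_chain_order` uses the estimated nonzero count of the intermediate as a crude stand-in for cost; a real optimizer would also weigh partitioning and communication. `MatrixMeta`, `estimate_product`, `estimate_sum`, and `pick_chain_order` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixMeta:
    """Shape and density (fraction of nonzero entries) of a matrix."""
    rows: int
    cols: int
    density: float

    @property
    def nnz(self) -> float:
        """Expected number of nonzero entries."""
        return self.rows * self.cols * self.density


def estimate_product(a: MatrixMeta, b: MatrixMeta) -> MatrixMeta:
    """Estimate the density of A @ B, assuming uniform, independent nonzeros."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions do not match")
    # Each output cell sums a.cols products; each product is nonzero with
    # probability a.density * b.density. Cancellation to zero is ignored.
    density = 1.0 - (1.0 - a.density * b.density) ** a.cols
    return MatrixMeta(a.rows, b.cols, density)


def estimate_sum(a: MatrixMeta, b: MatrixMeta) -> MatrixMeta:
    """Estimate the density of A + B under the same independence assumption."""
    if (a.rows, a.cols) != (b.rows, b.cols):
        raise ValueError("shapes do not match")
    density = 1.0 - (1.0 - a.density) * (1.0 - b.density)
    return MatrixMeta(a.rows, a.cols, density)


def pick_chain_order(a: MatrixMeta, b: MatrixMeta, c: MatrixMeta) -> str:
    """Choose the parenthesization of A @ B @ C with the smaller estimated intermediate."""
    ab = estimate_product(a, b)
    bc = estimate_product(b, c)
    return "(AB)C" if ab.nnz <= bc.nnz else "A(BC)"


if __name__ == "__main__":
    a = MatrixMeta(1000, 10, 1.0)
    b = MatrixMeta(10, 1000, 1.0)
    c = MatrixMeta(1000, 1, 1.0)
    print(pick_chain_order(a, b, c))
```

For dense inputs of these shapes, multiplying B·C first keeps the intermediate at 10 entries instead of a million, which is the kind of decision a sparsity-aware plan generator is meant to make automatically.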

Book Chapter
01 Jan 2019
TL;DR: This research focuses on selecting the parameters of the ALS algorithm that affect the performance of a robust recommender system (RS), and proposes a movie recommender system based on ALS using Apache Spark.
Abstract: Recently, building recommender systems has become a significant research area that attracts many scientists and researchers across the world. Recommender systems are used in a variety of areas, including music, movies, books, news, search queries, and commercial products. The collaborative filtering (CF) algorithm is one of the most popular and successful RS techniques; it aims to find users closely similar to the active user in order to recommend items. CF with the alternating least squares (ALS) algorithm is one of the most important techniques used for building a movie recommendation engine. ALS is a matrix-factorization model for CF that operates on the values of the user-item rating matrix. There is therefore a need to analyze the ALS algorithm under different parameter choices, which can eventually help in building an efficient movie recommender engine. In this paper, we propose a movie recommender system based on ALS using Apache Spark. This research focuses on selecting the ALS parameters that affect the performance of a robust RS. From the results, conclusions are drawn about how the choice of ALS parameters affects the performance of a movie recommender engine. The models are evaluated using different metrics, such as execution time, the root mean squared error (RMSE) of rating predictions, and the rank with which the best model was trained. The two best cases are chosen based on the best parameter selections from the experimental results, which can lead to good rating predictions for a movie recommender.

26 citations
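The abstract above tunes ALS parameters (rank, number of iterations, regularization) and scores models by RMSE. The sketch below shows what those parameters control, in plain Python rather than Spark: ALS factors the rating matrix into user and item vectors, and alternately holds one side fixed while solving a small ridge regression (x^T x + reg·I) for each vector on the other side. It is a minimal single-machine illustration, not the paper's code and not Spark MLlib's implementation, which distributes these updates across a cluster and differs in details such as how the regularization term is scaled. The names `solve`, `train_als`, `predict`, and `rmse` are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (user_id, item_id, rating)
Factors = Dict[int, List[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _half_step(fixed: Factors, groups: Dict[int, List[Tuple[int, float]]],
               rank: int, reg: float) -> Factors:
    """Hold one side's factors fixed; solve a ridge regression for each vector on the other side."""
    updated: Factors = {}
    for key, observed in groups.items():
        # Normal equations: (sum of v v^T + reg * I) x = sum of r * v
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in observed:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        updated[key] = solve(a, b)
    return updated


def train_als(ratings: Sequence[Rating], rank: int = 2, reg: float = 0.1,
              iterations: int = 10, seed: int = 0) -> Tuple[Factors, Factors]:
    """Alternate between solving all user factors and all item factors."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    rng = random.Random(seed)
    by_user: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    by_item: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    items: Factors = {i: [rng.uniform(0.0, 1.0) for _ in range(rank)] for i in by_item}
    users: Factors = {}
    for _ in range(iterations):
        users = _half_step(items, by_user, rank, reg)  # item factors fixed
        items = _half_step(users, by_item, rank, reg)  # user factors fixed
    return users, items


def predict(users: Factors, items: Factors, u: int, i: int) -> float:
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))


def rmse(users: Factors, items: Factors, ratings: Sequence[Rating]) -> float:
    """Root mean squared error of predicted vs. observed ratings."""
    sq = [(predict(users, items, u, i) - r) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    data = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0), (2, 1, 6.0)]
    for rank, reg in [(1, 0.001), (2, 0.1)]:
        users, items = train_als(data, rank=rank, reg=reg, iterations=20)
        print(rank, reg, rmse(users, items, data))
```

A parameter study like the one described would loop over rank, regularization, and iteration count in the same way, but compute RMSE on held-out ratings rather than the training set. In Spark itself the corresponding settings are `rank`, `maxIter`, and `regParam` on `pyspark.ml.recommendation.ALS`, with RMSE typically computed by `RegressionEvaluator(metricName="rmse")`.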


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683