Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7304 publications have appeared within this topic, receiving 63322 citations.


Papers
Journal ArticleDOI
TL;DR: Wang et al. propose an efficient rolling bearing fault diagnosis method based on Spark and an improved random forest (IRF) algorithm. The IRF removes the decision trees with low classification accuracy and those prone to repeated voting from the original RF, which yields faster diagnosis and higher classification accuracy.
Abstract: The random forest (RF) algorithm is a typical representative of ensemble learning, which is widely used in rolling bearing fault diagnosis. In order to solve the problems of slower diagnosis speed and repeated voting of traditional RF algorithm in rolling bearing fault diagnosis under the big data environment, an efficient rolling bearing fault diagnosis method based on Spark and improved random forest (IRF) algorithm is proposed. By eliminating the decision trees with low classification accuracy and those prone to repeated voting in the original RF, an improved RF with faster diagnosis speed and higher classification accuracy is constructed. For the massive rolling bearing vibration data, in order to improve the training speed and diagnosis speed of the rolling bearing fault diagnosis model, the IRF algorithm is parallelized on the Spark platform. First, an original RF model is obtained by training multiple decision trees in parallel. Second, the decision trees with low classification accuracy in the original RF model are filtered. Third, all path information of the reserved decision trees is obtained in parallel. Fourth, a decision tree similarity matrix is constructed in parallel to eliminate the decision trees which are prone to repeated voting. Finally, an IRF model which can diagnose rolling bearing faults quickly and effectively is obtained. A series of experiments are carried out to evaluate the effectiveness of the proposed rolling bearing fault diagnosis method based on Spark and IRF algorithm. The results show that the proposed method can not only achieve good fault diagnosis accuracy, but also have fast model training speed and fault diagnosis speed for large-scale rolling bearing datasets.
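The two pruning stages described in the abstract (filter by accuracy, then remove trees that would vote redundantly) can be sketched in a few lines. This is only an illustrative, single-machine sketch and not the authors' implementation: the tree representation (a set of root-to-leaf path strings), the Jaccard similarity measure, both thresholds, and the names `jaccard` and `prune_forest` are assumptions, and the Spark parallelization is omitted.

```python
# Hypothetical sketch of the two pruning stages; not the authors' code.

def jaccard(a, b):
    """Similarity of two trees, each given as a set of root-to-leaf paths."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prune_forest(trees, min_accuracy=0.8, max_similarity=0.7):
    """trees: list of dicts with keys 'name', 'accuracy', 'paths' (a set).

    Returns the names of the trees kept after both stages.
    The thresholds are illustrative assumptions.
    """
    # Stage 1: drop trees whose classification accuracy is too low.
    kept = [t for t in trees if t["accuracy"] >= min_accuracy]

    # Stage 2: build the pairwise similarity matrix of the surviving trees.
    n = len(kept)
    sim = [[jaccard(kept[i]["paths"], kept[j]["paths"]) for j in range(n)]
           for i in range(n)]

    # Visit trees from most to least accurate; keep a tree only if it is
    # not too similar to any tree that has already been selected.
    order = sorted(range(n), key=lambda i: kept[i]["accuracy"], reverse=True)
    selected = []
    for i in order:
        if all(sim[i][j] <= max_similarity for j in selected):
            selected.append(i)
    return [kept[i]["name"] for i in selected]


forest = [
    {"name": "t0", "accuracy": 0.95, "paths": {"a<1:L", "a>=1,b<2:L", "a>=1,b>=2:R"}},
    {"name": "t1", "accuracy": 0.90, "paths": {"a<1:L", "a>=1,b<2:L", "a>=1,b>=2:R"}},
    {"name": "t2", "accuracy": 0.85, "paths": {"c<3:L", "c>=3:R"}},
    {"name": "t3", "accuracy": 0.60, "paths": {"d<4:L", "d>=4:R"}},
]
print(prune_forest(forest))
```

In this toy forest, `t3` fails the accuracy filter and `t1` is dropped because its paths duplicate those of the more accurate `t0`.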

39 citations

Journal ArticleDOI
I. Crotty, J. Lamas Valverde, G. Laurenti, M. C. S. Williams, Antonino Zichichi
TL;DR: The resistive plate chamber (RPC) is an attractive candidate for muon trigger systems at future colliders, but rate problems make it unusable above 1 Hz/cm² in its present form. The authors investigated various materials and discovered a new mode of operation that allowed the RPC to operate at 150 Hz/cm².
Abstract: The good time and position resolution of the resistive plate chamber (RPC) make it an attractive candidate for muon trigger systems at future colliders. However, this device has severe rate problems that make it unusable above 1 Hz/cm² in its present form. We have investigated various materials and have also discovered a new mode of operation that allowed us to operate the RPC at 150 Hz/cm². We discuss further improvements that may extend operation to even higher rates. We also discuss spark formation and explain the cause for the abnormally late spark signals.

39 citations

Journal ArticleDOI
TL;DR: This work proposes a parallel array operator, based on a specific form of matrix multiplication, that computes a comprehensive data summarization matrix. The matrix benefits statistical models including PCA, linear regression, and variable selection. The work also introduces two specialized array operators, one for dense and one for sparse data sets.
Abstract: Data summarization is an essential mechanism to accelerate analytic algorithms on large data sets. On the other hand, array DBMSs enable scalable computation with large matrices. With that motivation in mind, we propose a parallel array operator, based on a specific form of matrix multiplication, that computes a comprehensive data summarization matrix. By deriving equivalent equations based on the summarization matrix, statistical methods are adapted to work in two phases: (1) Parallel summarization of the data set in one pass; (2) Iteration exploiting the summarization matrix in many intermediate computations. We prove our summarization matrix captures essential statistical properties of the data set and it allows iterative algorithms to work faster in main memory, by decreasing the number of times the data set is scanned, and by reducing the number of CPU operations. Specifically, we show our summarization matrix benefits statistical models, including PCA, linear regression, and variable selection. From a systems perspective, we carefully study the efficient computation of the summarization matrix on the SciDB parallel array DBMS and how to exploit it in the R language statistical system. To achieve best performance, we introduce two specialized array operators for dense and sparse data sets, respectively. We present an experimental evaluation comparing SciDB, R, a columnar DBMS (a fast SQL engine), and Spark (a popular Hadoop system). Our experiments show R working together with SciDB eliminates main memory and performance limitations from R. More importantly, our R+SciDB prototype is significantly faster and more scalable than Spark and the columnar DBMS.
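The two-phase idea (summarize the data set in one pass, then compute the model from the summary alone) can be illustrated with a small sketch. The abstract does not spell out the matrix, so the form used here is an assumption: each row is augmented with a leading 1 and the summary is the sum of outer products of the augmented rows. The names `summarize` and `simple_regression` are hypothetical, and the sketch runs on one machine in plain Python rather than on SciDB.

```python
# Hypothetical sketch; the matrix form is an assumption, not taken from the paper.

def summarize(rows):
    """Phase 1: one pass over the rows.

    Returns gamma = sum of z z^T, where z = [1, *row].
    gamma[0][0] is the row count, gamma[0][i] holds column sums, and the
    remaining entries hold sums of pairwise products.
    """
    k = len(rows[0]) + 1
    gamma = [[0.0] * k for _ in range(k)]
    for row in rows:
        z = [1.0, *row]
        for i in range(k):
            for j in range(k):
                gamma[i][j] += z[i] * z[j]
    return gamma


def simple_regression(gamma):
    """Phase 2: least-squares fit of y on x using only gamma.

    Assumes the summarized rows were (x, y) pairs.
    """
    n, sx, sy = gamma[0][0], gamma[0][1], gamma[0][2]
    sxx, sxy = gamma[1][1], gamma[1][2]
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 2x + 1
gamma = summarize(data)
slope, intercept = simple_regression(gamma)
print(slope, intercept)
```

Once `gamma` is built, the fit never touches `data` again, which is the sense in which a summary reduces the number of scans over the data set.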

39 citations

Proceedings ArticleDOI
16 May 2016
TL;DR: OptEx is the first work that analytically models job completion time on Spark, and it is shown experimentally that OptEx is able to correctly estimate the cost optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.
Abstract: We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied for estimating the cost optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.

39 citations

Patent
24 May 1946
TL;DR: This patent describes spark discharge devices that operate at high voltage levels and have a stable uniform pulsing periodicity of discharge. The principal object of the invention is to obtain a long operating life for spark Pulsing...
Abstract: 18 Claims. This invention relates to ionic discharge devices and more particularly to spark discharge devices operating at high voltage levels and having a stable uniform pulsing periodicity of discharge. The principal object of the invention is to obtain a long operating life for spark Pulsing...

39 citations


Network Information
Related Topics (5)
Software
130.5K papers, 2M citations
76% related
Combustion
172.3K papers, 1.9M citations
72% related
Cluster analysis
146.5K papers, 2.9M citations
72% related
Cloud computing
156.4K papers, 1.9M citations
71% related
Hydrogen
132.2K papers, 2.5M citations
69% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683