Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.

Papers

Journal Article

An Efficient Rolling Bearing Fault Diagnosis Method Based on Spark and Improved Random Forest Algorithm

Lanjun Wan¹, Kun Gong¹, Gen Zhang¹, Xinpan Yuan¹, Changyun Li¹, Xiaojun Deng¹

Hunan University of Technology¹

04 Mar 2021-IEEE Access

TL;DR: Wan et al. propose an efficient rolling bearing fault diagnosis method based on Spark and an improved random forest (IRF) algorithm. By eliminating the decision trees with low classification accuracy and those prone to repeated voting in the original RF, they construct an improved RF with faster diagnosis speed and higher classification accuracy.

Abstract: The random forest (RF) algorithm is a typical representative of ensemble learning, which is widely used in rolling bearing fault diagnosis. In order to solve the problems of slower diagnosis speed and repeated voting of traditional RF algorithm in rolling bearing fault diagnosis under the big data environment, an efficient rolling bearing fault diagnosis method based on Spark and improved random forest (IRF) algorithm is proposed. By eliminating the decision trees with low classification accuracy and those prone to repeated voting in the original RF, an improved RF with faster diagnosis speed and higher classification accuracy is constructed. For the massive rolling bearing vibration data, in order to improve the training speed and diagnosis speed of the rolling bearing fault diagnosis model, the IRF algorithm is parallelized on the Spark platform. First, an original RF model is obtained by training multiple decision trees in parallel. Second, the decision trees with low classification accuracy in the original RF model are filtered. Third, all path information of the reserved decision trees is obtained in parallel. Fourth, a decision tree similarity matrix is constructed in parallel to eliminate the decision trees which are prone to repeated voting. Finally, an IRF model which can diagnose rolling bearing faults quickly and effectively is obtained. A series of experiments are carried out to evaluate the effectiveness of the proposed rolling bearing fault diagnosis method based on Spark and IRF algorithm. The results show that the proposed method can not only achieve good fault diagnosis accuracy, but also have fast model training speed and fault diagnosis speed for large-scale rolling bearing datasets.
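The pruning steps in the abstract above can be sketched roughly as follows. This is a minimal single-machine illustration, not the authors' Spark implementation: it assumes each tree is represented only by its predictions on a validation set, and it substitutes prediction agreement for the path-based similarity the abstract mentions. The names `prune_forest`, `min_acc`, and `max_sim` are hypothetical.

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of validation samples that one tree classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def agreement(preds_a, preds_b):
    """Fraction of samples on which two trees vote identically.

    Assumption: stands in for the path-based similarity in the abstract.
    """
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def prune_forest(tree_preds, labels, min_acc=0.6, max_sim=0.9):
    """Return the ids of the trees kept after both filtering steps.

    tree_preds maps a tree id to its list of predictions on the validation set.
    min_acc and max_sim are hypothetical thresholds chosen for illustration.
    """
    # Step 1: drop trees whose validation accuracy is too low.
    acc = {t: accuracy(p, labels) for t, p in tree_preds.items()}
    kept = [t for t in tree_preds if acc[t] >= min_acc]

    # Step 2: pairwise similarity matrix over the surviving trees.
    sim = {
        frozenset((a, b)): agreement(tree_preds[a], tree_preds[b])
        for a, b in combinations(kept, 2)
    }

    # Step 3: visit trees from most to least accurate and keep a tree only
    # if it is not too similar to any tree that was already selected.
    selected = []
    for t in sorted(kept, key=lambda t: acc[t], reverse=True):
        if all(sim[frozenset((s, t))] <= max_sim for s in selected):
            selected.append(t)
    return selected
```

In a distributed setting the per-tree accuracy and the pairwise similarities are independent computations, which is presumably what makes the parallelization described in the abstract possible.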

39 citations

Journal Article

The non-spark mode and high rate operation of resistive parallel plate chambers

I. Crotty¹, J. Lamas Valverde¹, G. Laurenti¹, M. C. S. Williams¹, Antonino Zichichi¹

CERN¹

01 Jan 1994-Nuclear Instruments & Methods in Physics Research Section A-accelerators Spectrometers Detectors and Associated Equipment

TL;DR: The resistive plate chamber (RPC) is an attractive candidate for muon trigger systems at future colliders, but in its present form it is unusable above 1 Hz/cm². The authors investigated various materials and discovered a new mode of operation that allowed the RPC to operate at 150 Hz/cm².

Abstract: The good time and position resolution of the resistive plate chamber (RPC) make it an attractive candidate for muon trigger systems at future colliders. However, this device has severe rate problems that make it unusable above 1 Hz/cm² in its present form. We have investigated various materials and have also discovered a new mode of operation that allowed us to operate the RPC at 150 Hz/cm². We discuss further improvements that may extend operation to even higher rates. We also discuss spark formation and explain the cause for the abnormally late spark signals.

39 citations

Journal Article

The Gamma Matrix to Summarize Dense and Sparse Data Sets for Big Data Analytics

Carlos Ordonez¹, Yiqun Zhang¹, Wellington Cabrera¹

University of Houston¹

01 Jul 2016-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a parallel array operator, based on a specific form of matrix multiplication, that computes a comprehensive data summarization matrix. The matrix benefits statistical models, including PCA, linear regression, and variable selection, and the work introduces two specialized array operators for dense and sparse data sets, respectively.

Abstract: Data summarization is an essential mechanism to accelerate analytic algorithms on large data sets. On the other hand, array DBMSs enable scalable computation with large matrices. With that motivation in mind, we propose a parallel array operator, based on a specific form of matrix multiplication, that computes a comprehensive data summarization matrix. By deriving equivalent equations based on the summarization matrix, statistical methods are adapted to work in two phases: (1) Parallel summarization of the data set in one pass; (2) Iteration exploiting the summarization matrix in many intermediate computations. We prove our summarization matrix captures essential statistical properties of the data set and it allows iterative algorithms to work faster in main memory, by decreasing the number of times the data set is scanned, and by reducing the number of CPU operations. Specifically, we show our summarization matrix benefits statistical models, including PCA, linear regression, and variable selection. From a systems perspective, we carefully study the efficient computation of the summarization matrix on the SciDB parallel array DBMS and how to exploit it in the R language statistical system. To achieve best performance, we introduce two specialized array operators for dense and sparse data sets, respectively. We present an experimental evaluation comparing SciDB, R, a columnar DBMS (a fast SQL engine), and Spark (a popular Hadoop system). Our experiments show R working together with SciDB eliminates main memory and performance limitations from R. More importantly, our R+SciDB prototype is significantly faster and more scalable than Spark and the columnar DBMS.
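The two-phase idea in the abstract above (summarize once, then compute from the summary) can be illustrated with a small sketch. The layout used here is an assumption, not something the abstract spells out: each row x is augmented to z = [1, x_1, ..., x_d], and the summary is the sum of the outer products z zᵀ, so it holds the row count, the linear sums, and the quadratic sums. The function names are hypothetical, and this plain-Python loop stands in for the parallel array operator.

```python
def gamma_summary(rows):
    """Accumulate the sum of z * z^T over all rows in a single pass.

    Assumed layout: z = [1, x_1, ..., x_d], so entry [0][0] is the row count,
    row 0 holds the per-column sums, and the rest holds sums of products.
    """
    d = len(rows[0]) + 1
    gamma = [[0.0] * d for _ in range(d)]
    for x in rows:
        z = [1.0, *x]
        for i in range(d):
            for j in range(d):
                gamma[i][j] += z[i] * z[j]
    return gamma


def mean_and_variance(gamma):
    """Derive per-column mean and population variance from the summary alone."""
    n = gamma[0][0]
    cols = range(1, len(gamma))
    means = [gamma[0][j] / n for j in cols]
    variances = [gamma[j][j] / n - m * m for j, m in zip(cols, means)]
    return means, variances
```

Once the summary exists, statistics such as the mean and variance no longer require another scan of the data set, which matches the abstract's point about reducing the number of passes.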

39 citations

Proceedings Article

OptEx: a deadline-aware cost optimization model for spark

Subhajit Sidhanta¹, Wojciech Golab², Supratik Mukhopadhyay¹

Louisiana State University¹, University of Waterloo²

16 May 2016

TL;DR: OptEx is the first work that analytically models job completion time on Spark, and it is shown experimentally that OptEx is able to correctly estimate the cost optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.

Abstract: We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied for estimating the cost optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.

39 citations

Patent

Ionic discharge device

Depew Charles

24 May 1946

TL;DR: This patent describes an ionic discharge device with a stable, uniform pulsing periodicity of discharge. The principal object of the invention is to obtain a long operating life for spark Pulsing...

Abstract: 18 Claims. This invention relates to ionic discharge devices and more particularly to spark discharge devices operating at high voltage levels and having a stable uniform pulsing periodicity of discharge. The principal object of the invention is to obtain a long operating life for spark Pulsing...

39 citations

Performance

Metrics

Papers: 7,304

Citations: 74,604

No. of papers in the topic in previous years
Year	Papers
2022	10
2021	429
2020	525
2019	661
2018	758
2017	683
