Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.


Papers
Proceedings ArticleDOI
01 Aug 2015
TL;DR: An air quality prediction model is built with a parallelized random forest algorithm implemented in Spark on the basis of resilient distributed datasets and shared variables; the results demonstrate the effectiveness and scalability of the method when dealing with big data.
Abstract: Because particulate matter in the air can cause several kinds of respiratory and cardiovascular diseases, air quality prediction is attracting more and more attention. Knowing this information in advance is very important for protecting people from health problems. With the development of computer technology, the data we can collect is becoming increasingly fine-grained. Most importantly, it needs to be analyzed in real time. However, existing methods cannot meet the demand for real-time analysis. In this paper, we predict air quality based on a Spark implementation of the random forest algorithm. First, a distributed random forest algorithm is implemented using Spark on the basis of resilient distributed datasets and shared variables. Then, we build an air quality prediction model using the parallelized random forest algorithm. The proposed method is evaluated with real meteorological data obtained from Beijing. The experimental results show that the proposed method is fast in predicting the concentration level of PM2.5. The results also demonstrate the effectiveness and scalability of our method when dealing with big data.

22 citations

Patent
08 Oct 2004
TL;DR: A second spark tip is formed on the second conductive region (2), pointing toward the first region (1). The gap (d2) that remains between the first region's spark tip (11) and the second spark tip is greater than the gap between each spark tip and the opposite rim of the opposite region.
Abstract: A second spark tip is formed on the second conductive region (2), pointing toward the first region (1). The gap (d2) that remains between the first region's spark tip (11) and the second spark tip is greater than the gap (d0) between each spark tip and the opposite rim of the opposite region. Preferably both regions, including the spark tips, are structured by etching. The PCB may be coated with a non-conductive protective layer in the section adjacent to the spark tip and the opposite rim, while the spark path proper has no protective layer.

22 citations

Book ChapterDOI
23 Oct 2016
TL;DR: This work investigates the impact of the most important of the tunable Spark parameters on the application performance and offers a trial-and-error methodology for tuning parameters in arbitrary applications based on evidence from a very small number of experimental runs.
Abstract: Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster setting from developers. However, this comes at the expense of having over 150 configurable parameters, the impact of which cannot be exhaustively examined because the number of their combinations grows exponentially. In this work, we investigate the impact of the most important tunable Spark parameters on application performance and guide developers on how to change the default values. We conduct a series of experiments and offer a trial-and-error methodology for tuning parameters in arbitrary applications based on evidence from a very small number of experimental runs. We test our methodology in three case studies, where we achieve speedups of more than 10 times.

22 citations
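
The trial-and-error idea in the abstract above can be illustrated with a greedy, one-parameter-at-a-time search. This is only a sketch under stated assumptions, not the authors' methodology: `run_job` is a hypothetical stand-in for launching a Spark application and measuring its runtime, `fake_run_job` is an invented cost model used purely for demonstration, and the candidate values are arbitrary. The property names `spark.serializer` and `spark.shuffle.compress` are existing Spark configuration keys.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def tune_greedy(defaults: Config,
                candidates: Dict[str, List[str]],
                run_job: Callable[[Config], float]) -> Config:
    """Try candidate values one parameter at a time.

    A change is kept only if it lowers the measured runtime, so the
    number of runs is 1 + the total number of candidate values.
    """
    best = dict(defaults)
    best_time = run_job(best)
    for param, values in candidates.items():
        for value in values:
            trial = dict(best)
            trial[param] = value
            elapsed = run_job(trial)
            if elapsed < best_time:
                best, best_time = trial, elapsed
    return best


def fake_run_job(conf: Config) -> float:
    """Hypothetical cost model standing in for a real Spark run."""
    runtime = 100.0
    if conf["spark.serializer"].endswith("KryoSerializer"):
        runtime -= 20.0  # assumed benefit, for demonstration only
    if conf["spark.shuffle.compress"] == "false":
        runtime += 15.0  # assumed penalty, for demonstration only
    return runtime


DEFAULTS: Config = {
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.shuffle.compress": "true",
}
CANDIDATES: Dict[str, List[str]] = {
    "spark.serializer": ["org.apache.spark.serializer.KryoSerializer"],
    "spark.shuffle.compress": ["false"],
}

tuned = tune_greedy(DEFAULTS, CANDIDATES, fake_run_job)
```

In practice `run_job` would submit the application with the trial configuration and return the observed wall-clock time; the greedy order of parameters then matters, which is one reason such a search is a heuristic rather than an exhaustive method.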

Journal ArticleDOI
TL;DR: This paper proposes ACS, an approach to automatically configure Spark workloads. ACS uses random forest, an ensemble learning algorithm, to construct performance models as functions of Spark configuration parameters, and then uses a genetic algorithm to search for the optimum configuration, taking configurations and the corresponding performance predicted by the performance models as inputs.

22 citations
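
The model-then-search structure described in the summary above can be sketched as follows. This is a minimal illustration, not the ACS implementation: the analytic function `predicted_runtime` is a hypothetical stand-in for a trained random forest performance model, the configuration keys (`executor_cores`, `executor_memory_gb`, `shuffle_partitions`) are illustrative names rather than Spark properties, and the genetic algorithm settings are arbitrary assumptions.

```python
import random
from typing import Callable, Dict, List

Config = Dict[str, int]
Space = Dict[str, List[int]]


def predicted_runtime(conf: Config) -> float:
    """Hypothetical surrogate model (replaces a trained random forest)."""
    return (200.0 / conf["executor_cores"]
            + abs(conf["executor_memory_gb"] - 8) * 3.0
            + abs(conf["shuffle_partitions"] - 200) / 10.0)


def genetic_search(space: Space,
                   model: Callable[[Config], float],
                   start: Config,
                   generations: int = 30,
                   pop_size: int = 20,
                   mutation_rate: float = 0.2,
                   seed: int = 0) -> Config:
    """Search the configuration space for a low predicted runtime."""
    rng = random.Random(seed)
    keys = list(space)

    def random_config() -> Config:
        return {k: rng.choice(space[k]) for k in keys}

    # Seed the population with the starting configuration.
    population = [dict(start)] + [random_config() for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Selection with elitism: the better half survives unchanged.
        population.sort(key=model)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of two parents.
            child = {k: rng.choice((a[k], b[k])) for k in keys}
            # Mutation: occasionally resample a gene from the space.
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(space[k])
            children.append(child)
        population = elite + children
    return min(population, key=model)


SPACE: Space = {
    "executor_cores": [1, 2, 4, 8],
    "executor_memory_gb": [2, 4, 8, 16],
    "shuffle_partitions": [50, 100, 200, 400],
}
DEFAULT: Config = {"executor_cores": 1, "executor_memory_gb": 2,
                   "shuffle_partitions": 200}

best = genetic_search(SPACE, predicted_runtime, DEFAULT)
```

Because the elite half always survives, the best predicted runtime never gets worse from one generation to the next, so the result is at least as good as the starting configuration according to the model. How well that transfers to real runs depends entirely on the accuracy of the performance model.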

Proceedings ArticleDOI
23 Jul 2015
TL;DR: This paper designs and implements FLOWPROPHET, a general framework to predict traffic flows for DCFs, and demonstrates that it can achieve almost 100% accuracy in source, destination, and flow size predictions.
Abstract: Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad have extensive applications in big data and cloud computing, and send large numbers of flows into data center networks. In this paper, we design and implement FLOWPROPHET, a general framework to predict traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs, and gain a key insight: since application logic in DCFs is naturally expressed by directed acyclic graphs (DAGs), the DAG contains the time and data dependencies necessary for accurate flow prediction. Based on this insight, FLOWPROPHET extracts DAGs from user applications, and uses the time and data dependencies to calculate the flow information 4-tuple (source, destination, flow size, establish time) ahead of time for all flows. We also provide a generic programming interface to FLOWPROPHET, so that current and future DCFs can deploy FLOWPROPHET readily. We implement FLOWPROPHET on both Spark and Hadoop, and perform extensive evaluations on a testbed with 37 physical servers. Our implementation and experiments demonstrate that, ahead of time and at minimal cost, FLOWPROPHET can achieve almost 100% accuracy in source, destination, and flow size predictions. With accurate prediction from FLOWPROPHET, the job completion time of a Hadoop TeraSort benchmark is reduced by 12.52% on our cluster with a simple network scheduler.

22 citations
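
The idea of deriving the flow 4-tuple (source, destination, flow size, establish time) from a job DAG can be sketched as below. This is a heavily simplified illustration, not FLOWPROPHET's algorithm: it assumes one task per stage, a known host placement, known duration and output-size estimates, and that a flow is established when its upstream stage finishes. The `Stage` type and `predict_flows` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stage:
    host: str          # host where the stage runs (assumed known)
    duration: float    # estimated compute time in seconds (assumed)
    output_bytes: int  # estimated bytes sent to each child (assumed)


Flow = Tuple[str, str, int, float]


def predict_flows(stages: Dict[str, Stage],
                  edges: List[Tuple[str, str]]) -> List[Flow]:
    """Return one (source, destination, size, establish time) per DAG edge."""
    parents: Dict[str, List[str]] = {name: [] for name in stages}
    for src, dst in edges:
        parents[dst].append(src)

    finish: Dict[str, float] = {}

    def finish_time(name: str) -> float:
        # A stage starts once all of its parents have finished.
        if name not in finish:
            start = max((finish_time(p) for p in parents[name]), default=0.0)
            finish[name] = start + stages[name].duration
        return finish[name]

    return [(stages[s].host, stages[d].host,
             stages[s].output_bytes, finish_time(s))
            for s, d in edges]


# A two-mapper, one-reducer job with hypothetical placements and estimates.
STAGES = {
    "map1": Stage(host="h1", duration=2.0, output_bytes=100),
    "map2": Stage(host="h2", duration=3.0, output_bytes=200),
    "reduce": Stage(host="h3", duration=1.0, output_bytes=0),
}
EDGES = [("map1", "reduce"), ("map2", "reduce")]

flows = predict_flows(STAGES, EDGES)
```

The sketch ignores network transfer time when computing when a child stage can start; a fuller model would add it to the parent's finish time before scheduling the child.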


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683