Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.


Papers
Proceedings Article
Tatsuhiro Chiba, Tamiya Onodera
17 Apr 2016
TL;DR: This paper uses the TPC-H benchmark as an optimization case study, gathers logs from several perspectives (application, JVM, system utilization, and hardware events), and introduces several JVM and OS parameter optimization approaches for accelerating Spark performance.
Abstract: Besides being an in-memory-oriented computing framework, Spark runs on top of Java Virtual Machines (JVMs), so JVM parameters must be tuned to improve Spark application performance. Misconfigured parameters and settings degrade performance. For example, using Java heaps that are too large often causes a long garbage collection pause time, which accounts for over 10–20% of application execution time. Moreover, recent computing nodes have many cores with simultaneous multi-threading technology, and the processors on the node are connected via NUMA, so it is difficult to achieve the best performance without taking these hardware features into account. Thus, optimization across the full stack is also important. Not only JVM parameters but also OS parameters, Spark configuration, and application code based on CPU characteristics need to be optimized to take full advantage of the underlying computing resources. In this paper, we used the TPC-H benchmark as our optimization case study and gathered logs from many perspectives, such as application, JVM (e.g. GC and JIT), system utilization, and hardware events from a performance monitoring unit. We discuss current problems and introduce several JVM and OS parameter optimization approaches for accelerating Spark performance. As a result, our optimization exhibits a 30–40% increase in speed on average and is up to 5x faster than the naive configuration.
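As a rough illustration of the kind of tuning the abstract describes, the sketch below assembles a `spark-submit` command line that caps the executor heap and enables GC logging. The property names `spark.executor.memory`, `spark.executor.cores`, and `spark.executor.extraJavaOptions` are standard Spark configuration keys; the helper name, the default values, and the HotSpot-style GC flags are assumptions chosen for illustration, since the abstract does not list the paper's actual settings.

```python
# Hypothetical helper (not from the paper): builds spark-submit arguments.
# All default values below are illustrative assumptions, not recommendations.

def build_spark_submit_args(
    executor_memory="8g",          # assumed heap size; oversized heaps can lengthen GC pauses
    executor_cores=4,              # assumed core count per executor
    gc_flags=("-XX:+UseParallelGC", "-verbose:gc"),  # assumed HotSpot flags
):
    """Return a spark-submit argument list with JVM-related settings."""
    conf = {
        # The executor heap is set through spark.executor.memory,
        # not through -Xmx in extraJavaOptions.
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
        # Extra JVM options, here used for GC selection and GC logging.
        "spark.executor.extraJavaOptions": " ".join(gc_flags),
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(build_spark_submit_args()))
```

Comparing GC logs between a smaller and a larger `executor_memory` value is one simple way to check whether pause time is a significant share of run time on a given workload.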

51 citations

Journal Article
TL;DR: A distributed association rule mining algorithm on Spark, named Adaptive-Miner, which uses an adaptive approach for finding frequent patterns with higher accuracy and efficiency and is described as different from and better than state-of-the-art static association rule mining algorithms.
Abstract: Extraction of valuable data from extensive datasets is one of the most vital research problems. Association rule mining is one of the most widely used methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent itemsets) is the most crucial part of the association rule mining task. Many single-machine association rule mining algorithms exist, but the massive amount of data available these days exceeds the capacity of a single-machine algorithm. Therefore, to meet the demands of this ever-growing data, there is a need for a distributed association rule mining algorithm that can run on multiple machines. For these types of parallel/distributed applications, MapReduce is one of the best fault-tolerant frameworks. Hadoop is one of the most popular open-source software frameworks with a MapReduce-based approach for distributed storage and processing of large datasets using standalone clusters built from commodity hardware. But heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, Spark has attracted a lot of attention because of its built-in support for distributed computations. Therefore, we implemented a distributed association rule mining algorithm on Spark, named Adaptive-Miner, which uses an adaptive approach for finding frequent patterns with higher accuracy and efficiency. Adaptive-Miner uses an adaptive strategy based on the partial processing of datasets. Adaptive-Miner makes execution plans before every iteration and goes with the most suitable plan to minimize time and space complexity. Adaptive-Miner is a dynamic association rule mining algorithm that changes its approach based on the nature of the dataset. Therefore, it is different from and better than state-of-the-art static association rule mining algorithms. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Adaptive-Miner algorithm on Spark. Available: https://github.com/sanjaysinghrathi/Adaptive-Miner
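To make the frequent-itemset step mentioned in the abstract concrete, here is a minimal single-machine Apriori sketch in plain Python. It is not Adaptive-Miner and has no adaptive planning or Spark code; the function name and the toy transactions are assumptions for illustration. It does show why the algorithm is highly iterative: each level makes one full pass over the transactions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Plain Apriori: return {itemset: count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: another full pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    toy = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    for itemset, count in frequent_itemsets(toy, min_support=2).items():
        print(sorted(itemset), count)
```

On a disk-based MapReduce system each of these per-level passes re-reads the dataset, which is the I/O cost the abstract attributes to Hadoop.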

51 citations

Journal Article
01 Mar 2017-Energy
TL;DR: In this article, a self-optimization strategy for the lean-burn operation mode of a spark-ignition (SI) engine is presented, which aims at on-board combustion phase tuning to achieve high efficiency under a probability constraint on knocking events.

51 citations

Proceedings Article
18 Jun 2014
TL;DR: This paper develops a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost, and develops techniques based on matrix factorizations to contain epidemics of change in linear algebra.
Abstract: Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude.
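One way to picture the idea of keeping changes in factored form is a rank-1 update to one input of a matrix product. The sketch below is an assumption-laden illustration, not LINVIEW's implementation: it maintains the view C = A·B when A changes by the outer product u·vᵀ, using ΔC = u·(vᵀB), which costs O(n²) instead of the O(n³) of recomputing the product. The helper names and the toy matrices are hypothetical.

```python
def matmul(X, Y):
    """Full matrix product on lists of lists (used here only for re-evaluation)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def apply_rank1_delta(C, B, u, v):
    """Update the view C = A @ B in place after A changes by outer(u, v).

    Uses delta_C = u * (v^T B): first a vector-matrix product, then an
    outer-product update, so the full product is never recomputed.
    """
    # v^T B is a row vector with one entry per column of B.
    vB = [sum(v[k] * B[k][j] for k in range(len(v))) for j in range(len(B[0]))]
    for i, ui in enumerate(u):
        for j, w in enumerate(vB):
            C[i][j] += ui * w
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = matmul(A, B)

    # Hypothetical change: add 2 to A[0][1], expressed as outer(u, v).
    u, v = [1, 0], [0, 2]
    apply_rank1_delta(C, B, u, v)

    A_new = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
    print(C == matmul(A_new, B))
```

The same change applied without factoring would be a dense delta matrix, which is the spreading effect the abstract describes for chains of linear algebra operations.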

51 citations

Journal Article
TL;DR: In this paper, a technique based on Mie scattering interferometry (MSI) was used to size droplets in planar laser light scattering for the case of a scattering angle range close to 90°.
Abstract: A theoretical explanation is given of a technique based on Mie scattering interferometry (MSI), obtained by defocusing of the collecting optics, to size droplets. The originality of this study is the development of a droplet sizing method by planar laser light scattering for the case of a scattering angle range close to 90°. The feasibility of this method and its limitations are fully described. The dependence on intensity levels and refractive index variations can be neglected. After discussion of some practical details about particle size, imaging and camera constraints, the results obtained in the combustion chamber of a spark ignition (SI) engine, near the spark plug, prior to ignition and for different injection timings are described and discussed. It can be concluded that the implementation of the MSI method in this experimental set-up has been realized successfully to provide droplet distributions in an SI engine. To allow the easier use of the technique, image processing software will be developed in the Matlab environment.

50 citations


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683