Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.

Papers

Proceedings Article•DOI•

Workload characterization and optimization of TPC-H queries on Apache Spark

Tatsuhiro Chiba¹, Tamiya Onodera¹•Institutions (1)

IBM¹

17 Apr 2016

TL;DR: Using the TPC-H benchmark as a case study, this paper gathers logs from several perspectives (application, JVM, system utilization, and hardware events) and introduces several JVM and OS parameter optimization approaches for accelerating Spark performance.

Abstract: Besides being an in-memory-oriented computing framework, Spark runs on top of Java Virtual Machines (JVMs), so JVM parameters must be tuned to improve Spark application performance. Misconfigured parameters and settings degrade performance. For example, using Java heaps that are too large often causes a long garbage collection pause time, which accounts for over 10–20% of application execution time. Moreover, recent computing nodes have many cores with simultaneous multi-threading technology and the processors on the node are connected via NUMA, so it is difficult to achieve the best performance without taking these hardware features into account. Thus, optimization across the full stack is also important. Not only JVM parameters but also OS parameters, Spark configuration, and application code based on CPU characteristics need to be optimized to take full advantage of the underlying computing resources. In this paper, we used the TPC-H benchmark as our optimization case study and gathered logs from many perspectives, such as application, JVM (e.g. GC and JIT), system utilization, and hardware events from a performance monitoring unit. We discuss current problems and introduce several JVM and OS parameter optimization approaches for accelerating Spark performance. As a result, our optimization exhibits a 30–40% increase in speed on average and is up to 5x faster than the naive configuration.

51 citations

Journal Article•DOI•

Adaptive-Miner: an efficient distributed association rule mining algorithm on Spark

Sanjay Rathee¹, Arti Kashyap¹•Institutions (1)

Indian Institute of Technology Mandi¹

20 Feb 2018-Journal of Big Data

TL;DR: A distributed association rule mining algorithm on Spark named Adaptive-Miner, which uses an adaptive approach for finding frequent patterns with higher accuracy and efficiency and is described as different from and better than state-of-the-art static association rule mining algorithms.

Abstract: Extracting valuable data from extensive datasets is one of the most important research problems. Association rule mining is one of the most widely used methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent itemsets) is the most crucial part of the association rule mining task. Many single-machine association rule mining algorithms exist, but the massive amount of data available these days is beyond the capacity of a single-machine algorithm. Therefore, to meet the demands of this ever-growing data, there is a need for a distributed association rule mining algorithm that can run on multiple machines. For these types of parallel/distributed applications, MapReduce is one of the best fault-tolerant frameworks. Hadoop is one of the most popular open-source software frameworks with a MapReduce-based approach for distributed storage and processing of large datasets using standalone clusters built from commodity hardware. But heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, Spark has attracted a lot of attention because of its built-in support for distributed computations. Therefore, we implemented a distributed association rule mining algorithm on Spark named Adaptive-Miner, which uses an adaptive approach for finding frequent patterns with higher accuracy and efficiency. Adaptive-Miner uses an adaptive strategy based on the partial processing of datasets. Adaptive-Miner makes execution plans before every iteration and goes with the most suitable plan to minimize time and space complexity. Adaptive-Miner is a dynamic association rule mining algorithm which changes its approach based on the nature of the dataset. Therefore, it is different and better than state-of-the-art static association rule mining algorithms. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Adaptive-Miner algorithm on Spark. Available: https://github.com/sanjaysinghrathi/Adaptive-Miner
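
The abstract above describes Apriori as highly iterative, with one pass over the data per iteration. As a rough illustration of why, here is a minimal single-machine sketch of the classic level-wise frequent-itemset search. It is not Adaptive-Miner and not the authors' code; the function name `frequent_itemsets` and the absolute-count `min_support` parameter are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset (a frozenset) to the number
    of transactions containing it. min_support is an absolute count
    (an assumption of this sketch, not taken from the paper).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # size-k unions whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one full pass over the transactions per level.
        # This repeated pass is the step that is expensive when every
        # iteration has to re-read its input from disk.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result
```

For example, with the four baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}` and `min_support=2`, the search finds three frequent single items and two frequent pairs, and prunes `{bread, milk, butter}` before counting it because `{milk, butter}` is not frequent.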

51 citations

Journal Article•DOI•

Spark advance self-optimization with knock probability threshold for lean-burn operation mode of SI engine

Xun Shen¹, Yahui Zhang¹, Tielong Shen¹, Chanyut Khajorntraidet¹•Institutions (1)

Sophia University¹

01 Mar 2017-Energy

TL;DR: In this article, a self-optimization strategy for the lean-burn operation mode of a spark-ignition (SI) engine is presented, which aims at on-board combustion phase tuning to achieve high efficiency under a probability constraint on knocking events.

51 citations

Proceedings Article•DOI•

LINVIEW: incremental view maintenance for complex analytical queries

Milos Nikolic¹, Mohammed Elseidy¹, Christoph Koch¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

18 Jun 2014

TL;DR: This paper develops a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost, and develops techniques based on matrix factorizations to contain epidemics of change in linear algebra.

Abstract: Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude.
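
To make the idea of a delta kept in factored form concrete, the sketch below maintains a product view V = A·B when A receives a rank-1 update A + u·vᵀ. It relies only on the identity (A + u·vᵀ)·B = A·B + u·(vᵀ·B), so the refresh costs two O(n²) steps instead of an O(n³) re-multiplication. This is an illustrative assumption about the general approach, not LINVIEW's actual implementation; all function names here are hypothetical.

```python
def matmul(X, Y):
    """Dense matrix product of two lists-of-rows (full re-evaluation)."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]


def refresh_product_view(V, B, u, v):
    """Given V = A @ B and the update A += u v^T, return the refreshed view.

    The delta u (v^T B) is never expanded through a full matrix product:
    v^T B is a vector-matrix product and the outer product is another
    quadratic step.
    """
    w = vec_mat(v, B)            # v^T B, a row vector
    return add(V, outer(u, w))   # V + u (v^T B)
```

Changing a single entry of A is the special case where u and v are unit vectors scaled by the change, so even a very local edit is handled by the same two quadratic steps.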

51 citations

Journal Article•DOI•

Droplet Sizing by Mie Scattering Interferometry in a Spark Ignition Engine

Christine Mounaïm-Rousselle, Olivier Pajot

01 Aug 1999-Particle & Particle Systems Characterization

TL;DR: In this paper, a technique based on Mie scattering interferometry (MSI) is used to size droplets by planar laser light scattering at scattering angles close to 90°.

Abstract: A theoretical explanation is given of a technique based on Mie scattering interferometry (MSI), obtained by defocusing of the collecting optics, to size droplets. The originality of this study is the development of a droplet sizing method by planar laser light scattering for the case of a scattering angle range close to 90°. The feasibility of this method and its limitations are fully described. The dependence on intensity levels and refractive index variations can be neglected. After discussion of some practical details about particle size, imaging and camera constraints, the results obtained in the combustion chamber of a spark ignition (SI) engine, near the spark plug, prior to ignition and for different injection timings are described and discussed. It can be concluded that the implementation of the MSI method in this experimental set-up has been realized successfully to provide droplet distributions in an SI engine. To allow the easier use of the technique, image processing software will be developed in the Matlab environment.

50 citations

Metrics

Papers: 7,304
Citations: 74,604

No. of papers in the topic in previous years
Year	Papers
2022	10
2021	429
2020	525
2019	661
2018	758
2017	683