Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.


Papers
Journal Article
TL;DR: This paper contributes a novel shuffle data transfer strategy that dynamically adapts prefetching to the computation, implemented in Spark, a popular in-memory data analytics framework.
Abstract: Big data analytics is an indispensable tool in transforming science, engineering, medicine, health-care, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. In this context, data shuffling, a particularly difficult transformation pattern, introduces important challenges. Specifically, data shuffling is a key component of complex computations that has a major impact on the overall performance and scalability. Thus, speeding up data shuffling is a critical goal. To this end, state-of-the-art solutions often rely on overlapping the data transfers with the shuffling phase. However, they employ simple mechanisms to decide how much data and where to fetch it from, which leads to sub-optimal performance and excessive auxiliary memory utilization for the purpose of prefetching. The latter aspect is a growing concern, given evidence that memory per computation unit is continuously decreasing while interconnect bandwidth is increasing. This paper contributes a novel shuffle data transfer strategy that addresses the two aforementioned dimensions by dynamically adapting the prefetching to the computation. We implemented this novel strategy in Spark, a popular in-memory data analytics framework. To demonstrate the benefits of our proposal, we run extensive experiments on an HPC cluster with large core count per node. Compared with the default Spark shuffle strategy, our proposal shows: up to 40 percent better performance with 50 percent less memory utilization for buffering and excellent weak scalability.

23 citations

Proceedings Article
29 Oct 2015
TL;DR: This paper characterizes two file systems in literature, Tachyon and Triple-H, that support in-memory and heterogeneous storage, and discusses the impacts of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications.
Abstract: For data-intensive computing, the low throughput of the existing disk-bound storage systems is a major bottleneck. Recent emergence of the in-memory file systems with heterogeneous storage support mitigates this problem to a great extent. Parallel programming frameworks, e.g. Hadoop MapReduce and Spark are increasingly being run on such high-performance file systems. However, no comprehensive study has been done to analyze the impacts of the in-memory file systems on various Big Data applications. This paper characterizes two file systems in literature, Tachyon [17] and Triple-H [13] that support in-memory and heterogeneous storage, and discusses the impacts of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications. We present a complete methodology for evaluating MapReduce and Spark workloads on top of in-memory file systems and provide insights about the interactions of different system components while running these workloads. We also propose advanced acceleration techniques to adapt Triple-H for iterative applications and study the impact of different parameters on the performance of MapReduce and Spark jobs on HPC systems. Our evaluations show that, although Tachyon is 5x faster than HDFS for primitive operations, Triple-H performs 47% and 2.4x better than Tachyon for MapReduce and Spark workloads, respectively. Triple-H also accelerates K-Means by 15% over HDFS and 9% over Tachyon.

23 citations

Journal Article
Kazuhiro Fujita
TL;DR: A magnetically mixed Newmark-Leapfrog finite-difference time-domain (MNL-FDTD) method is proposed for efficient three-dimensional electromagnetic simulation of transient interactions between the spark channel of an air-discharge electrostatic discharge (ESD) occurring at a short gap and its surrounding environment.
Abstract: This paper presents a magnetically mixed Newmark-Leapfrog finite-difference time-domain (MNL-FDTD) method for efficient three-dimensional electromagnetic simulations of transient interactions between a spark channel of air-discharge electrostatic discharge (ESD) occurred at a short gap and its surrounding environment. The formulation is based on introducing the implicit Newmark-Beta method into the explicit leapfrog scheme of Yee's FDTD method directionally and magnetically. The stability condition of the algorithm does not include the mesh step in the channel direction, and therefore, is more relaxed than the Courant–Friedrichs–Lewy condition of the Yee scheme. For combined full-wave/circuit systems involving air-discharge ESD, both the sequential and simultaneous solutions are discussed in the context of MNL-FDTD. A stable direct linking of MNL-FDTD with SPICE is presented to include a discharge current characterized by arbitrary spark resistance. The relaxed stability is maintained in the combined systems. The presented method is verified with three spark resistance formulae. The accuracy, stability, and computational efficiency of the method are demonstrated in comparison with the conventional approaches in several numerical examples.

23 citations

Patent
09 Dec 1963

23 citations

Patent
23 Oct 1986
TL;DR: A line (14) carries a voltage whose magnitude and duration correspond to those of an electrical field in a spark ignition device. A comparator (21) provides an output signal if the magnitude of an energising pulse is not less than a set value, and a circuit (26) provides an indicating signal if that output lasts less than a predetermined duration.
Abstract: A line (14) carries a voltage whose magnitude and duration correspond to those of an electrical field in a spark ignition device. A comparator (21) provides an output signal if the magnitude of an energising pulse is not less than a set value, and a circuit (26) provides an indicating signal if the output from the comparator (21) has less than a predetermined duration. Circuits (44, 45) provide a third signal if the time integral of spark discharge voltages in the ignition device, over a period set by a timing circuit (27), exceeds a predetermined value. An output signal is provided at a terminal (16) only if the energising pulse and spark discharge characteristics are satisfactory.

23 citations


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683