Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.

Journal Article•DOI•

Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics

Bogdan Nicolae¹, Carlos Costa¹, Claudia Misale², Kostas Katrinis¹, Yoonho Park¹ • Institutions (2)

IBM¹, University of Turin²

01 Jun 2017-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper contributes a novel shuffle data transfer strategy that dynamically adapts prefetching to the computation, implemented in Spark, a popular in-memory data analytics framework.

Abstract: Big data analytics is an indispensable tool in transforming science, engineering, medicine, health-care, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. In this context, data shuffling, a particularly difficult transformation pattern, introduces important challenges. Specifically, data shuffling is a key component of complex computations that has a major impact on the overall performance and scalability. Thus, speeding up data shuffling is a critical goal. To this end, state-of-the-art solutions often rely on overlapping the data transfers with the shuffling phase. However, they employ simple mechanisms to decide how much data and where to fetch it from, which leads to sub-optimal performance and excessive auxiliary memory utilization for the purpose of prefetching. The latter aspect is a growing concern, given evidence that memory per computation unit is continuously decreasing while interconnect bandwidth is increasing. This paper contributes a novel shuffle data transfer strategy that addresses the two aforementioned dimensions by dynamically adapting the prefetching to the computation. We implemented this novel strategy in Spark, a popular in-memory data analytics framework. To demonstrate the benefits of our proposal, we run extensive experiments on an HPC cluster with large core count per node. Compared with the default Spark shuffle strategy, our proposal shows: up to 40 percent better performance with 50 percent less memory utilization for buffering and excellent weak scalability.

23 citations

Proceedings Article•DOI•

Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters

Nusrat Sharmin Islam¹, Md. Wasi-ur-Rahman¹, Xiaoyi Lu¹, Dipti Shankar¹, Dhabaleswar K. Panda¹ • Institutions (1)

Ohio State University¹

29 Oct 2015

TL;DR: This paper characterizes two file systems from the literature, Tachyon and Triple-H, that support in-memory and heterogeneous storage, and discusses the impact of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications.

Abstract: For data-intensive computing, the low throughput of the existing disk-bound storage systems is a major bottleneck. Recent emergence of the in-memory file systems with heterogeneous storage support mitigates this problem to a great extent. Parallel programming frameworks, e.g. Hadoop MapReduce and Spark are increasingly being run on such high-performance file systems. However, no comprehensive study has been done to analyze the impacts of the in-memory file systems on various Big Data applications. This paper characterizes two file systems in literature, Tachyon [17] and Triple-H [13] that support in-memory and heterogeneous storage, and discusses the impacts of these two architectures on the performance and fault tolerance of Hadoop MapReduce and Spark applications. We present a complete methodology for evaluating MapReduce and Spark workloads on top of in-memory file systems and provide insights about the interactions of different system components while running these workloads. We also propose advanced acceleration techniques to adapt Triple-H for iterative applications and study the impact of different parameters on the performance of MapReduce and Spark jobs on HPC systems. Our evaluations show that, although Tachyon is 5x faster than HDFS for primitive operations, Triple-H performs 47% and 2.4x better than Tachyon for MapReduce and Spark workloads, respectively. Triple-H also accelerates K-Means by 15% over HDFS and 9% over Tachyon.

23 citations

Journal Article•DOI•

MNL-FDTD/SPICE Method for Fast Analysis of Short-Gap ESD in Complex Systems

Kazuhiro Fujita¹•Institutions (1)

Fujitsu¹

11 Mar 2016-IEEE Transactions on Electromagnetic Compatibility

TL;DR: This paper proposes a magnetically mixed Newmark-Leapfrog finite-difference time-domain (MNL-FDTD) method for efficient three-dimensional electromagnetic simulations of transient interactions between the spark channel of an air-discharge electrostatic discharge (ESD) occurring at a short gap and its surrounding environment.

Abstract: This paper presents a magnetically mixed Newmark-Leapfrog finite-difference time-domain (MNL-FDTD) method for efficient three-dimensional electromagnetic simulations of transient interactions between a spark channel of air-discharge electrostatic discharge (ESD) occurred at a short gap and its surrounding environment. The formulation is based on introducing the implicit Newmark-Beta method into the explicit leapfrog scheme of Yee's FDTD method directionally and magnetically. The stability condition of the algorithm does not include the mesh step in the channel direction, and therefore, is more relaxed than the Courant–Friedrichs–Lewy condition of the Yee scheme. For combined full-wave/circuit systems involving air-discharge ESD, both the sequential and simultaneous solutions are discussed in the context of MNL-FDTD. A stable direct linking of MNL-FDTD with SPICE is presented to include a discharge current characterized by arbitrary spark resistance. The relaxed stability is maintained in the combined systems. The presented method is verified with three spark resistance formulae. The accuracy, stability, and computational efficiency of the method are demonstrated in comparison with the conventional approaches in several numerical examples.

23 citations

Patent•

Spark pressure shaping

Inoue Kiyoshi

09 Dec 1963

23 citations

Patent•

Method and apparatus for monitoring operation of a spark ignition device in a gas turbine engine

Robert Charles Skerritt

23 Oct 1986

TL;DR: A line (14) carries a voltage whose magnitude and duration correspond to those of an electrical field in a spark ignition device. A comparator (21) provides an output signal if the magnitude of an energising pulse is not less than a set value, and a circuit (26) provides an indicating signal if the output from the comparator has less than a predetermined duration.

Abstract: A line (14) carries a voltage whose magnitude and duration correspond to those of an electrical field in a spark ignition device. A comparator (21) provides an output signal if the magnitude of an energising pulse is not less than a set value, and a circuit (26) provides an indicating signal if the output from the comparator (21) has less than a predetermined duration. Circuits (44, 45) provide a third signal if the time integral of spark discharge voltages in the ignition device, over a period set by a timing circuit (27), exceeds a predetermined value. An output signal is provided at a terminal (16) only if the energising pulse and spark discharge characteristics are satisfactory.

23 citations

Metrics

Papers: 7,304
Citations: 74,604

No. of papers in the topic in previous years
Year	Papers
2022	10
2021	429
2020	525
2019	661
2018	758
2017	683
