Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over the lifetime, 7304 publications have been published within this topic receiving 63322 citations.


Papers
Journal ArticleDOI
TL;DR: MPCA SGD, a method for distributed training of deep neural networks that is specifically designed to run in low-budget environments and runs on top of the popular Apache Spark framework, achieves significantly faster convergence rates than many popular alternatives.
Abstract: Many distributed deep learning systems have been published over the past few years, often accompanied by impressive performance claims. In practice these figures are often achieved in high performance computing (HPC) environments with fast InfiniBand network connections. For average deep learning practitioners this is usually an unrealistic scenario, since they cannot afford access to these facilities. Simple re-implementations of algorithms such as EASGD [1] for standard Ethernet environments often fail to replicate the scalability and performance of the original works [2]. In this paper, we explore this particular problem domain and present MPCA SGD, a method for distributed training of deep neural networks that is specifically designed to run in low-budget environments. MPCA SGD tries to make the best possible use of available resources, and can operate well if network bandwidth is constrained. Furthermore, MPCA SGD runs on top of the popular Apache Spark [3] framework. Thus, it can easily be deployed in existing data centers and office environments where Spark is already used. When training large deep learning models in a gigabit Ethernet cluster, MPCA SGD achieves significantly faster convergence rates than many popular alternatives. For example, MPCA SGD can train ResNet-152 [4] up to 5.3x faster than state-of-the-art systems like MXNet [5], up to 5.3x faster than bulk-synchronous systems like SparkNet [6] and up to 5.3x faster than decentralized asynchronous systems like EASGD [1].

27 citations
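
The approach sketched in the abstract above, local SGD on each worker combined with periodic model averaging coordinated through Spark, can be illustrated with a minimal PySpark sketch. This is not the authors' MPCA SGD implementation; the linear model, the toy data, the learning rate, and the number of local steps and communication rounds are all illustrative assumptions.

# Minimal sketch: data-parallel SGD with periodic model averaging on Spark.
# NOT the MPCA SGD implementation; model, data, and hyperparameters are assumptions.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("model-averaging-sketch").getOrCreate()
sc = spark.sparkContext

# Toy linear-regression data, partitioned across workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 10))
y = X @ np.arange(10) + rng.normal(scale=0.1, size=10_000)
data = sc.parallelize(list(zip(X, y)), numSlices=8).cache()

def local_sgd(partition, w, lr=0.01, steps=5):
    # Run a few SGD steps on one partition, starting from the broadcast weights.
    w = w.copy()
    rows = list(partition)
    for _ in range(steps):
        for xi, yi in rows:
            w -= lr * (xi @ w - yi) * xi
    yield w

w = np.zeros(10)
for _ in range(20):                                   # communication rounds
    w_bc = sc.broadcast(w)
    # Each partition refines its own model replica; the driver then averages them.
    replicas = data.mapPartitions(lambda p: local_sgd(p, w_bc.value)).collect()
    w = np.mean(replicas, axis=0)
print("learned weights:", np.round(w, 2))

The averaging step is what keeps communication cheap on gigabit Ethernet: workers exchange one weight vector per round instead of per-step gradients.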

Proceedings ArticleDOI
06 May 2002

27 citations

Journal ArticleDOI
TL;DR: In this paper, the authors measured the shock wave emitted by a 4 m spark of energy 2×10⁴ J at distances from the spark midgap of between 0.34 and 16.5 m. The discrepancies between the experimental data and cylindrical shock-wave theory are partially explained by consideration of the spark channel tortuosity.
Abstract: The shock wave emitted by a 4‐m spark of energy 2×10⁴ J has been measured at distances from spark midgap of between 0.34 and 16.5 m. Close to the spark, a single dominant shock wave is observed; farther from the spark, a number of significant shock waves (generally 3 or 4) are observed. For distances less than 2 m, both the shock overpressure and the duration of the overpressure are a factor of 1.5 to 5 less than predicted by cylindrical shock‐wave theory. The discrepancies between the experimental data and cylindrical shock‐wave theory are partially explained by consideration of the spark channel tortuosity.

27 citations
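
For context, the cylindrical (line-source) blast-wave theory that the measurements above are compared against predicts, by dimensional analysis, a shock-front radius growing with the square root of time (versus t^{2/5} for a spherical point explosion). The scaling below is standard background, not taken from the paper, and S(\gamma) is a dimensionless constant of order unity:

R(t) \approx S(\gamma)\left(\frac{E_l}{\rho_0}\right)^{1/4} t^{1/2},
\qquad
E_l \approx \frac{2\times10^{4}\ \mathrm{J}}{4\ \mathrm{m}} = 5\times10^{3}\ \mathrm{J\,m^{-1}},

where E_l is the energy deposited per unit length of the spark channel and \rho_0 is the ambient air density. A tortuous channel is longer than the 4 m gap, which lowers E_l relative to the straight-channel value assumed above; that is one plausible way tortuosity could reduce the predicted overpressure.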

Journal ArticleDOI
TL;DR: A thorough review of optimization techniques for improving the generality and performance of Spark; it introduces the Spark programming model and computing system and discusses their pros and cons.
Abstract: With the explosive increase of big data, it is necessary to apply large-scale data processing systems to analyze big data. Arguably, Spark is the state of the art in large-scale data computing systems nowadays, due to its good properties including generality, fault tolerance, high performance of in-memory data processing, and scalability. Spark adopts a flexible Resilient Distributed Dataset (RDD) programming model with a set of provided transformation and action operators whose operating functions can be customized by users according to their applications. It was originally positioned as a fast and general data processing system. A large body of research effort has been made to make it more efficient (faster) and more general by considering various circumstances since its introduction. In this survey, we aim to give a thorough review of the various kinds of optimization techniques for the generality and performance improvement of Spark. We introduce the Spark programming model and computing system, discuss the pros and cons of Spark, and investigate the various solution techniques in the literature. Moreover, we also introduce the various data management and processing systems, machine learning algorithms, and applications supported by Spark. Finally, we discuss the open issues and challenges for Spark.

27 citations
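
The transformation/action split described in the abstract above is the core of the RDD model: transformations build a lazy lineage graph, and nothing runs until an action is invoked. A minimal PySpark sketch (the word-count workload and in-memory input are illustrative assumptions, not from the survey):

# Minimal sketch of the RDD transformation/action programming model.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize([
    "spark keeps intermediate data in memory",
    "rdds are built from transformations and actions",
])

counts = (lines
          .flatMap(lambda line: line.split())   # transformation: lazy
          .map(lambda word: (word, 1))          # transformation: lazy
          .reduceByKey(lambda a, b: a + b))     # transformation: lazy

# Nothing has executed yet; collect() is an action and triggers the job.
print(counts.collect())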

Posted Content
TL;DR: State management and its use in diverse applications vary widely across big data processing systems, as discussed by the authors; this is evident in both the research literature and in existing systems such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm.
Abstract: State management and its use in diverse applications varies widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays in various use cases, in this survey, we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to some open problems.

27 citations
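
In Spark itself, the engine-managed state the survey discusses shows up, for example, in Structured Streaming aggregations, where running counts are kept and updated across micro-batches. A minimal sketch; the socket source on localhost:9999 is purely an illustrative assumption:

# Minimal sketch of engine-managed state in Spark Structured Streaming:
# the running per-word counts are state that Spark maintains between micro-batches.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("stateful-sketch").getOrCreate()

lines = (spark.readStream
              .format("socket")
              .option("host", "localhost")
              .option("port", 9999)
              .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()          # stateful aggregation

query = (counts.writeStream
               .outputMode("update")            # emit only rows whose state changed
               .format("console")
               .start())
query.awaitTermination()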


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683