Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.



Papers


Book•

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark


Holden Karau, Rachel Warren

25 May 2017

TL;DR: This practical book describes techniques that can reduce data infrastructure costs and developer hours and demonstrates performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.


Abstract: Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing. With this book, you'll explore: how Spark SQL's new interfaces improve performance over SQL's RDD data structure; the choice between data joins in Core Spark and Spark SQL; techniques for getting the most out of standard RDD transformations; how to work around performance issues in Spark's key/value pair paradigm; writing high-performance Spark code without Scala or the JVM; how to test for functionality and performance when applying suggested improvements; using the Spark MLlib and Spark ML machine learning libraries; and Spark's Streaming components and external community packages.


50 citations

Dissertation•DOI•

Spark ignition : experimental and numerical investigation with application to aviation safety


Sally P. M. Bane

11 Jun 2010

TL;DR: In this dissertation, a two-dimensional model of spark discharge in air and spark ignition was developed using the non-reactive and reactive Navier-Stokes equations, and methods for calculating effective one-step parameters were developed using constant pressure explosion theory.


Abstract: Determining the risk of accidental ignition of flammable mixtures is a topic of tremendous importance in industry and aviation safety. The concept of minimum ignition energy (MIE) has traditionally formed the basis for studying ignition hazards of fuels. However, in recent years, particularly in the aviation safety industry, the viewpoint has changed to one where ignition is statistical in nature. Approaching ignition as statistical rather than a threshold phenomenon appears to be more consistent with the inherent variability in the engineering test data. Ignition tests were performed in lean hydrogen-based aviation test mixtures and in two hexane-air mixtures using low-energy capacitive spark ignition systems. Tests were carried out using both short, fixed sparks (1 to 2 mm) and variable length sparks up to 10 mm. The results were analyzed using statistical tools to obtain probability distributions for ignition versus spark energy and spark energy density (energy per unit spark length). Results show that a single threshold MIE value does not exist, and that the energy per unit length may be a more appropriate parameter for quantifying the risk of ignition than only the energy. The probability of ignition versus spark charge was also investigated, and the statistical results for the spark charge and spark energy density were compared. It was found that the test results were less variable with respect to the spark charge than the energy density. However, variability was still present due to phenomena such as plasma instabilities and cathode effects that are caused by the electrodynamics. Work was also done to develop a two-dimensional numerical model of spark ignition that accurately simulates all physical scales of the fluid mechanics and chemistry. In this work a two-dimensional model of spark discharge in air and spark ignition was developed using the non-reactive and reactive Navier-Stokes equations. 
One-step chemistry models were used to allow for highly resolved simulations, and methods for calculating effective one-step parameters were developed using constant pressure explosion theory. The one-step model was tuned to accurately simulate the flame speed, temperature, and straining behavior using one-dimensional flame computations. The simulations were performed with three different electrode geometries to investigate the effect of the geometry on the fluid mechanics of the evolving spark kernel and on flame formation. The computational results were compared with high-speed schlieren visualization of spark and ignition kernels. It was found that the electrode geometry had a significant effect on the fluid motion following spark discharge and hence influences the ignition process.


50 citations

Proceedings Article•DOI•

SnappyData: A Hybrid Transactional Analytical Store Built On Spark


Jags Ramnarayan, Barzan Mozafari¹, Sumedh Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh Mahajan, Rishitesh Mishra, Kishor Bachhav, +6 more•Institutions (1)

University of Michigan¹

26 Jun 2016

TL;DR: This work proposes a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics).


Abstract: In recent years, our customers have expressed frustration in the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The common practice of stitching heterogeneous environments in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., a couple of seconds), when faced with large data volumes or high velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC), to communicate their accuracy requirements with SnappyData in an intuitive fashion.


50 citations

Journal Article•DOI•

Multi-class imbalanced big data classification on Spark


William C. Sleeman¹, Bartosz Krawczyk¹•Institutions (1)

Virginia Commonwealth University¹

05 Jan 2021 - Knowledge-Based Systems

TL;DR: This paper proposes the first compound framework for dealing with multi-class big data problems, addressing at the same time the existence of multiple classes and high volumes of data, and proposes an efficient implementation of the discussed algorithm on Apache Spark.


Abstract: Despite more than two decades of progress, learning from imbalanced data is still considered one of the contemporary challenges in machine learning. This has been further complicated by the advent of the big data era, where popular algorithms dedicated to alleviating the class skew impact are no longer feasible due to the volume of datasets. Additionally, most existing algorithms focus on binary imbalanced problems, where majority and minority classes are well-defined. Multi-class imbalanced data poses further challenges, as the relationship between classes is much more complex and simple decomposition into a number of binary problems leads to a significant loss of information. In this paper, we propose the first compound framework for dealing with multi-class big data problems, addressing at the same time the existence of multiple classes and high volumes of data. We propose to analyze the instance-level difficulties in each class, leading to an understanding of what causes learning difficulties. We embed this information in popular resampling algorithms, which allows for informative balancing of multiple classes. We propose an efficient implementation of the discussed algorithm on Apache Spark, including a novel version of SMOTE that overcomes the spatial limitations of its predecessor in distributed environments. An extensive experimental study shows that using instance-level information significantly improves learning from multi-class imbalanced big data. Our framework can be downloaded from https://github.com/fsleeman/minority-type-imbalanced .


50 citations

Proceedings Article•DOI•

Spark Ignition Producer Gas Engine and Dedicated Compressed Natural Gas Engine - Technology Development and Experimental Performance Optimisation


Shashikantha, P. P. Parikh¹•Institutions (1)

Indian Institutes of Technology¹

25 Oct 1999

50 citations


Network Information

Performance

Metrics

7,304

Papers

74,604

Citations

No. of papers in the topic in previous years
Year	Papers
2022	10
2021	429
2020	525
2019	661
2018	758
2017	683
