scispace - formally typeset
Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over the lifetime, 7304 publications have been published within this topic receiving 63322 citations.


Papers
Posted Content
TL;DR: DeepSpark is proposed, a distributed and parallel deep learning framework that simultaneously exploits Apache Spark for large-scale distributed data management and Caffe for GPU-based acceleration.
Abstract: The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data-processing pipelines for handling the massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning framework that simultaneously exploits Apache Spark for large-scale distributed data management and Caffe for GPU-based acceleration. DeepSpark directly accepts Caffe input specifications, providing seamless compatibility with existing designs and network structures. To support parallel operations, DeepSpark automatically distributes workloads and parameters to Caffe-running nodes using Spark and iteratively aggregates training results by a novel lock-free asynchronous variant of the popular elastic averaging stochastic gradient descent (SGD) update scheme, effectively complementing the synchronized processing capabilities of Spark. DeepSpark is an ongoing project, and the current release is available at this http URL

37 citations
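The lock-free asynchronous elastic-averaging SGD update that DeepSpark's aggregation scheme is described as a variant of can be sketched as follows. The toy 1-D quadratic loss, the hyperparameters, and the serial "worker" loop are illustrative assumptions for exposition, not DeepSpark's actual API:

```python
# Sketch of the asynchronous elastic-averaging SGD (EASGD) update that
# DeepSpark's aggregation scheme is described as a variant of. The toy
# 1-D quadratic loss, the hyperparameters, and the serial "worker" loop
# are illustrative assumptions, not DeepSpark's actual API.

def grad(x, target=3.0):
    """Gradient of the toy loss f(x) = (x - target)**2 / 2."""
    return x - target

def easgd_step(x_worker, x_center, lr=0.1, alpha=0.05):
    """One asynchronous EASGD round for a single worker: a local SGD
    step plus a symmetric elastic pull between worker and center."""
    diff = x_worker - x_center
    x_worker = x_worker - lr * grad(x_worker) - alpha * diff
    x_center = x_center + alpha * diff
    return x_worker, x_center

# Three workers syncing with the shared center independently (lock-free
# in spirit: no barrier between workers).
workers = [0.0, 5.0, 10.0]
center = 0.0
for _ in range(200):
    for i in range(len(workers)):
        workers[i], center = easgd_step(workers[i], center)

print(round(center, 2))  # settles near the optimum x = 3
```

The elastic term pulls each worker toward the center variable and the center toward the workers, so no worker must wait for the others before updating, which is what makes the scheme attractive on a bulk-synchronous engine like Spark.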

Journal ArticleDOI
TL;DR: This paper proposes and evaluates cloud services for high resolution video streams in order to perform line detection using Canny edge detection followed by Hough transform in Hadoop and Spark and demonstrates the effectiveness of parallel implementation of computer vision algorithms to achieve good scalability for real-world applications.
Abstract: Nowadays, video cameras are increasingly used for surveillance, monitoring, and activity recording. These cameras generate high resolution image and video data at large scale. Processing such large scale video streams to extract useful information under time constraints is challenging. Traditional methods do not offer scalability to process large scale data. In this paper, we propose and evaluate cloud services for high resolution video streams in order to perform line detection using Canny edge detection followed by Hough transform. These algorithms are often used as preprocessing steps for various high level tasks including object, anomaly, and activity recognition. We implement and evaluate both Canny edge detector and Hough transform algorithms in Hadoop and Spark. Our experimental evaluation using Spark shows excellent scalability and performance compared to Hadoop and standalone implementations for both Canny edge detection and Hough transform. We obtained a speedup of 10.8× and 9.3× for Canny edge detection and Hough transform respectively using Spark. These results demonstrate the effectiveness of parallel implementations of computer vision algorithms in achieving good scalability for real-world applications.

37 citations
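The Hough-transform voting step that the paper parallelises can be sketched in a few lines; this is a minimal pure-Python stand-in on a tiny synthetic edge map, not the paper's Hadoop/Spark implementation:

```python
# Minimal pure-Python sketch of the Hough-transform voting step used for
# line detection (the stage the paper parallelises in Hadoop/Spark). The
# tiny synthetic edge map is an assumption for illustration only.
import math

def hough_accumulate(edge_points, width, height, n_theta=180):
    """Vote in (rho, theta) space for each edge pixel; rho is offset by
    max_rho so it can serve as a non-negative list index."""
    max_rho = int(math.hypot(width, height))
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + max_rho][t] += 1
    return acc, max_rho

# Toy edge map: ten points on the vertical line x = 4.
points = [(4, y) for y in range(10)]
acc, max_rho = hough_accumulate(points, width=10, height=10)
print(acc[4 + max_rho][0])  # all 10 collinear points vote for (rho=4, theta=0)
```

Because each edge pixel votes independently, the accumulation is embarrassingly parallel: partitions of edge pixels can vote into local accumulators that are then summed, which is the shape of computation that maps naturally onto Spark.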

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effective energy from spark discharge for direct blast initiation of spherical gaseous detonations using a piezoelectric pressure transducer.
Abstract: In this study, the effective energy from spark discharge for direct blast initiation of spherical gaseous detonations is investigated. In the experiment, direct initiation of detonation is achieved via a spark discharge from a high-voltage, low-inductance capacitor bank, and the spark energy is estimated from an analysis of the current output. To determine the blast wave energy from the powerful spark, the time-of-arrival of the blast wave in air is measured at different radii using a piezoelectric pressure transducer. Good agreement is found in the scaled blast trajectories, i.e., scaled time c0·t/R0 (where c0 is the ambient sound speed) as a function of scaled blast radius Rs/R0, between the numerical simulation of a spherical blast wave from a point energy source and the experimental results, where the explosion length scale R0 is computed using the equivalent spark energy from the first 1/4 current discharge cycle. Alternatively, by fitting the experimental trajectory data, the blast energy estimated from the numerical simulation is also in good agreement with that obtained experimentally using the 1/4-cycle criterion. Using the 1/4 cycle of spark discharge as the effective energy, direct initiation experiments of spherical gaseous detonations are carried out to determine the critical initiation energy in C2H2–2.5O2 mixtures with 70 and 0% argon dilution. The experimental results obtained with the 1/4 cycle of spark discharge agree well with the predictions of two initiation models, namely Lee's surface energy model and a simplified work-done model. The main source of discrepancy in the comparison can be explained by the uncertainty of the cell size measurement, which is needed for both semi-empirical models.

37 citations
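The scaling used in the abstract can be illustrated numerically: a spark energy defines an explosion length R0, which non-dimensionalises the measured trajectory as c0·t/R0 versus Rs/R0. The cube-root form of R0 shown here and every number below are illustrative assumptions, not the paper's data:

```python
# Sketch of the blast-wave scaling described above: a hypothetical
# 1/4-cycle spark energy defines the explosion length R0, which
# non-dimensionalises the measured trajectory as c0*t/R0 vs Rs/R0.
# The cube-root relation and every number below are illustrative
# assumptions, not the paper's data.

def explosion_length(spark_energy_j, ambient_pressure_pa=101_325.0):
    """Explosion length scale R0 = (E / p0)**(1/3) for a spherical
    point-energy blast (a dimensional form commonly used in this scaling)."""
    return (spark_energy_j / ambient_pressure_pa) ** (1.0 / 3.0)

def scaled_trajectory(times_s, radii_m, r0_m, c0_m_s=340.0):
    """Non-dimensionalise time-of-arrival data into (c0*t/R0, Rs/R0) pairs."""
    return [(c0_m_s * t / r0_m, r / r0_m) for t, r in zip(times_s, radii_m)]

# Hypothetical 5 J spark in air at 1 atm, three transducer radii.
r0 = explosion_length(5.0)
pairs = scaled_trajectory([1e-5, 2e-5, 3e-5], [0.02, 0.03, 0.04], r0)
```

Plotting such scaled pairs against a simulated point-source blast trajectory is what allows the comparison in the abstract to be made independently of the absolute spark energy.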

Journal ArticleDOI
TL;DR: This paper presents a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model that is called DiReliefF and can process large volumes of data in a scalable way with much better processing times and memory usage.
Abstract: Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for execution on a single machine lack the scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model, which we have called DiReliefF. Spark is increasing in popularity due to its much faster processing times compared with Hadoop's MapReduce model implementation. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.

37 citations
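The weight update that DiReliefF distributes can be sketched on a single machine; this is a simplified Relief-style update (binary classes, one nearest hit and miss, rather than the full k-neighbour ReliefF), and the toy dataset is assumed for illustration:

```python
# Minimal single-machine sketch of the Relief-style weight update that
# DiReliefF distributes, simplified to binary classes with one nearest
# hit/miss (plain Relief rather than full ReliefF). Toy data is assumed.
import random

def diff(a, b, f, ranges):
    """Normalised per-feature difference between two instances."""
    return abs(a[f] - b[f]) / ranges[f]

def relief(X, y, n_samples=None, seed=0):
    rng = random.Random(seed)
    n_feat = len(X[0])
    ranges = [max(r[f] for r in X) - min(r[f] for r in X) or 1.0
              for f in range(n_feat)]
    w = [0.0] * n_feat
    idx = list(range(len(X)))
    samples = n_samples or len(X)
    for _ in range(samples):
        i = rng.choice(idx)

        def dist(j):
            return sum(diff(X[i], X[j], f, ranges) for f in range(n_feat))

        # Nearest hit (same class) and nearest miss (other class).
        hit = min((j for j in idx if j != i and y[j] == y[i]), key=dist)
        miss = min((j for j in idx if y[j] != y[i]), key=dist)
        for f in range(n_feat):
            w[f] += (diff(X[i], X[miss], f, ranges)
                     - diff(X[i], X[hit], f, ranges)) / samples
    return w

# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief(X, y)
print(w[0] > w[1])  # the discriminative feature gets the larger weight
```

The update rewards features that differ between an instance and its nearest miss but agree with its nearest hit; the expensive part is the neighbour search over all instances, which is what a distributed redesign like DiReliefF has to partition.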

Journal ArticleDOI
TL;DR: The benefits of speed, resource consumption and scalability enable VariantSpark to open up the use of advanced, efficient machine learning algorithms on genomic data.
Abstract: Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed Spark engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VariantSpark, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than ADAM, the Spark-based genome clustering approach; than the comparable implementation using Hadoop/Mahout; and than Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. The benefits of speed, resource consumption and scalability enable VariantSpark to open up the use of advanced, efficient machine learning algorithms on genomic data.

37 citations
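The population-structure task that VariantSpark performs at scale can be illustrated with a tiny stand-in: clustering individuals by their genotype vectors (0/1/2 alternate-allele counts). This pure-Python k-means is a toy substitute for MLlib's distributed implementation, and the data is made up:

```python
# Illustrative sketch of the population-structure task VariantSpark
# performs at scale: clustering individuals by their genotype vectors
# (0/1/2 alternate-allele counts). A tiny pure-Python k-means stand-in
# for MLlib's distributed implementation; the data below is made up.

def kmeans(points, k=2, iters=10):
    # Naive init for this k=2 toy demo: first and last point.
    centers = [points[0], points[-1]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [min(range(k), key=lambda c: sum((a - b) ** 2
            for a, b in zip(p, centers[c]))) for p in points]

# Two hypothetical populations with distinct genotype profiles.
genotypes = [[0, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 2]]
labels = kmeans(genotypes)
print(labels)  # the first two and last two individuals cluster together
```

At population scale the genotype matrix has millions of columns, so the assignment and centroid-averaging steps are what MLlib distributes across the cluster.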


Network Information
Related Topics (5)

Topic              Papers    Citations    Relatedness
Software           130.5K    2M           76%
Combustion         172.3K    1.9M         72%
Cluster analysis   146.5K    2.9M         72%
Cloud computing    156.4K    1.9M         71%
Hydrogen           132.2K    2.5M         69%
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2022    10
2021    429
2020    525
2019    661
2018    758
2017    683