Topic

Spark (mathematics)

About: Spark (mathematics) is a research topic. Over its lifetime, 7,304 publications have been published within this topic, receiving 63,322 citations.


Papers
Patent
18 Nov 1991
TL;DR: In this article, an engine controller having a user-adjustable electronic control unit (ECU) and a remote calibration module, and a method for use of the same, is provided for allowing a vehicle operator to precisely regulate the fuel delivery and spark advance of an internal combustion engine.
Abstract: An engine controller having a user-adjustable electronic control unit (ECU) (10) and a remote calibration module (12), and a method for use of the same, is provided for allowing a vehicle operator to precisely regulate the fuel delivery and spark advance of an internal combustion engine. The ECU (10) has outputs for regulating fuel flow rate, spark advance and engine idle speed, inputs adapted for coupling to engine sensors, and a microprocessor for processing information supplied by the sensors, to generate the outputs utilizing mathematical formulas and data tables stored in a read-write memory. The vehicle operator can also control auxiliary hardware such as switches, relays, timers and solenoids. The calibration module (12) cooperates with the ECU (10) and has a display screen (14) and input keys (16) enabling the vehicle operator to modify the fuel delivery and spark advance information and save the modifications.

18 citations

Proceedings Article
01 Jan 2018
TL;DR: A new RDF store called PRoST (Partitioned RDF on Spark Tables) based on Apache Spark is presented, an innovative strategy that combines the Vertical Partitioning approach with the Property Table, two preexisting models for storing RDF datasets.
Abstract: The rapidly growing size of RDF graphs in recent years necessitates distributed storage and parallel processing strategies. To obtain efficient query processing using computer clusters a wide variety of different approaches have been proposed. Related to the approach presented in the current paper are systems built on top of Hadoop HDFS, for example using Apache Accumulo or using Apache Spark. We present a new RDF store called PRoST (Partitioned RDF on Spark Tables) based on Apache Spark. PRoST introduces an innovative strategy that combines the Vertical Partitioning approach with the Property Table, two preexisting models for storing RDF datasets. We demonstrate that our proposal outperforms state-of-the-art systems w.r.t. the runtime for a wide range of query types and without any extensive precomputing phase.

18 citations

Journal Article
TL;DR: This paper develops data-driven analytical models to estimate the effect of interference among multiple Apache Spark jobs on job execution time in virtualized cloud environments and presents the design of an interference aware job scheduling algorithm leveraging the developed analytical framework.
Abstract: Apache Spark is one of the recently popularized open-source platforms that is increasingly being used for large-scale data analytic applications. However, while performance prediction in such systems is important for efficient job scheduling and optimizing resource allocation, interference among multiple Apache Spark jobs running concurrently in a virtualized environment makes such prediction extremely difficult; this paper addresses that problem. Towards that, first, we develop data-driven analytical models to estimate the effect of interference among multiple Apache Spark jobs on job execution time in virtualized cloud environments. Next, we present the design of an interference aware job scheduling algorithm leveraging the developed analytical framework. We evaluated the accuracy of our models using four real-life applications (Page rank, K-means, Logistic regression, and Word count) on a 6 node cluster while running up to four jobs concurrently. Our experimental results show that the scheduling algorithm significantly reduces the average execution time of individual jobs and the total execution time, with reductions ranging between 26% and 47% for individual jobs and 2–13% for total execution time.

18 citations

Journal Article
30 Nov 2020-PeerJ
TL;DR: This systematic survey investigates the existing Spark-based clustering methods in terms of their support for the characteristics of Big Data and proposes a new taxonomy for the Spark-based clustering methods.
Abstract: A popular unsupervised learning method, known as clustering, is extensively used in data mining, machine learning and pattern recognition. The procedure involves grouping distinct points into clusters in such a way that they are either similar to each other or dissimilar to points of other clusters. Traditional clustering methods are greatly challenged by the recent massive growth of data. Therefore, several research works proposed novel designs for clustering methods that leverage the benefits of Big Data platforms, such as Apache Spark, which is designed for fast and distributed massive data processing. However, Spark-based clustering research is still in its early days. In this systematic survey, we investigate the existing Spark-based clustering methods in terms of their support for the characteristics of Big Data. Moreover, we propose a new taxonomy for the Spark-based clustering methods. To the best of our knowledge, no survey has been conducted on Spark-based clustering of Big Data. Therefore, this survey aims to present a comprehensive summary of the previous studies in the field of Big Data clustering using Apache Spark during the span of 2010-2020. This survey also highlights the new research directions in the field of clustering massive data.

18 citations


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 76% related
Combustion: 172.3K papers, 1.9M citations, 72% related
Cluster analysis: 146.5K papers, 2.9M citations, 72% related
Cloud computing: 156.4K papers, 1.9M citations, 71% related
Hydrogen: 132.2K papers, 2.5M citations, 69% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2022  10
2021  429
2020  525
2019  661
2018  758
2017  683