Journal Article•DOI•

A Big Data-as-a-Service Framework: State-of-the-Art and Perspectives

TL;DR: A tensor-based multiple clustering on bicycle renting and returning data is illustrated, which can provide several suggestions for rebalancing the bicycle-sharing system, and some challenges of the proposed framework are discussed.
Abstract: Due to the rapid advances of information technologies, Big Data, recognized by its 4V characteristics (volume, variety, veracity, and velocity), brings significant benefits as well as many challenges. A major benefit of Big Data is to provide timely information and proactive services for humans. The primary purpose of this paper is to review the current state-of-the-art of Big Data from the aspects of organization and representation, cleaning and reduction, integration and processing, security and privacy, and analytics and applications, and then to present a novel framework to provide high-quality, so-called Big Data-as-a-Service. The framework consists of three planes, namely the sensing plane, cloud plane, and application plane, to systematically address the challenges of all the above aspects. Also, to clearly demonstrate the working process of the proposed framework, a tensor-based multiple clustering on bicycle renting and returning data is illustrated, which can provide several suggestions for rebalancing the bicycle-sharing system. Finally, some challenges of the proposed framework are discussed.
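The tensor-based clustering step can be sketched roughly as follows. This is only a minimal illustration under assumed inputs: a synthetic (station x day x hour) renting-count tensor is unfolded along the station mode and clustered with k-means; the paper's actual multiple-clustering algorithm and real rental data are not reproduced here.

# Minimal sketch of tensor-based clustering of bike-share renting data.
# The tensor `rentals` is synthetic, and the clustering choice (k-means on a
# mode-1 unfolding) is an assumption, not the paper's multiple-clustering method.
import numpy as np
from sklearn.cluster import KMeans

n_stations, n_days, n_hours = 50, 30, 24
rng = np.random.default_rng(0)
rentals = rng.poisson(lam=5.0, size=(n_stations, n_days, n_hours)).astype(float)

# Mode-1 unfolding: each station becomes one row holding its day-by-hour profile.
profiles = rentals.reshape(n_stations, n_days * n_hours)
profiles /= profiles.sum(axis=1, keepdims=True)  # compare usage patterns, not raw volume

# Stations in the same cluster behave alike; such groups can suggest where to
# move bicycles when rebalancing the sharing system.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(profiles)
print(labels)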
Citations
Journal Article•DOI•
TL;DR: A system model and dynamic schedules of data/control-constrained computing tasks are investigated, including the execution time and energy consumption for mobile devices, and NSGA-III (non-dominated sorting genetic algorithm III) is employed to address the multi-objective optimization problem of task offloading in cloud-edge computing.
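The TL;DR above rests on non-dominated sorting, the ranking mechanism NSGA-II and NSGA-III share for comparing candidate solutions by Pareto optimality. Below is a hedged, self-contained sketch of that step applied to hypothetical task-offloading candidates scored by execution time and device energy; it is not the cited paper's implementation.

# Non-dominated sorting over candidate offloading plans, each scored by
# (completion time in s, device energy in J). Values are invented for illustration.
from typing import List, Tuple

def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    # a dominates b if it is no worse in every objective and better in at least one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objs: List[Tuple[float, float]]) -> List[List[int]]:
    # Partition solution indices into successive Pareto fronts (front 0 is best).
    remaining = set(range(len(objs)))
    fronts: List[List[int]] = []
    while remaining:
        front = [i for i in sorted(remaining)
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts

candidates = [(1.2, 3.0), (0.8, 5.0), (1.5, 2.5), (1.3, 3.5), (2.0, 6.0)]
print(non_dominated_fronts(candidates))  # [[0, 1, 2], [3], [4]]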

237 citations


Cites background from "A Big Data-as-a-Service Framework: ..."

  • ...However, the finite computation capacity and cache size of the mobile devices impede the wide usage of the mobile applications and cause tremendous amount of time for storing and processing the big data on the mobile devices [8][9]....

    [...]

Journal Article•DOI•
TL;DR: A method with privacy preservation for IoV, named ECO, is proposed in this paper, and NSGA-II (non-dominated sorting genetic algorithm II) is adopted to realize multi-objective optimization, reducing the execution time and energy consumption of ECDs and preventing privacy conflicts of the computing tasks.
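As a companion to the non-dominated sorting sketch above, the fragment below illustrates NSGA-II's crowding-distance measure, which preserves diversity along the Pareto front when trading execution time against energy. The objective values are invented placeholders; this is not the ECO implementation.

# Crowding distance for one Pareto front of (time, energy) objective vectors.
# Larger values mean less crowded solutions, which NSGA-II prefers at selection.
from typing import List, Tuple

def crowding_distance(front: List[Tuple[float, float]]) -> List[float]:
    n = len(front)
    dist = [0.0] * n
    for m in range(2):  # two objectives: execution time, energy consumption
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        span = (hi - lo) or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")  # always keep the extreme points
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / span
    return dist

front = [(0.8, 5.0), (1.0, 4.0), (1.2, 3.0), (1.5, 2.5)]
print(crowding_distance(front))  # [inf, 1.37..., 1.31..., inf]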

194 citations


Additional excerpts

  • ...computing paradigm in the IoV environment [24][25][26]....

    [...]

Journal Article•DOI•
TL;DR: The enabling technologies of big data analytics of manufacturing data are surveyed and discussed, and the future directions in this promising area are outlined.
Abstract: Data analytics on massive manufacturing data can extract huge business value but also raises research challenges due to the heterogeneous data types, enormous volume, and real-time velocity...

184 citations

Journal Article•DOI•
TL;DR: This article focuses on the deep-learning-enhanced HAR in IoHT environments, and a semisupervised deep learning framework is designed and built for more accurate HAR, which efficiently uses and analyzes the weakly labeled sensor data to train the classifier learning model.
Abstract: Along with the advancement of several emerging computing paradigms and technologies, such as cloud computing, mobile computing, artificial intelligence, and big data, Internet of Things (IoT) technologies have been applied in a variety of fields. In particular, the Internet of Healthcare Things (IoHT) is becoming increasingly important in human activity recognition (HAR) due to the rapid development of wearable and mobile devices. In this article, we focus on deep-learning-enhanced HAR in IoHT environments. A semisupervised deep learning framework is designed and built for more accurate HAR, which efficiently uses and analyzes the weakly labeled sensor data to train the classifier learning model. To better address the problem of inadequately labeled samples, an intelligent autolabeling scheme based on a deep Q-network (DQN) is developed with a newly designed distance-based reward rule, which can improve the learning efficiency in IoT environments. A multisensor-based data fusion mechanism is then developed to seamlessly integrate the on-body sensor data, context sensor data, and personal profile data together, and a long short-term memory (LSTM)-based classification method is proposed to identify fine-grained patterns according to the high-level features contextually extracted from the sequential motion data. Finally, experiments and evaluations are conducted to demonstrate the usefulness and effectiveness of the proposed method using real-world data.
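A minimal sketch of the LSTM-based classification stage mentioned in the abstract is given below. The window length, channel count, and class count are placeholder assumptions, and the DQN-based autolabeling and multisensor fusion steps are not reproduced.

# Minimal LSTM classifier over windowed multi-sensor HAR sequences (PyTorch).
# Shapes and class count are assumptions; only the classification stage is shown.
import torch
import torch.nn as nn

class HARLSTM(nn.Module):
    def __init__(self, n_channels: int = 9, hidden: int = 64, n_classes: int = 6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, sensor channels)
        _, (h_n, _) = self.lstm(x)        # final hidden state: (1, batch, hidden)
        return self.head(h_n.squeeze(0))  # one logit vector per window

model = HARLSTM()
windows = torch.randn(8, 128, 9)          # 8 windows of 128 samples x 9 channels
print(model(windows).shape)               # torch.Size([8, 6])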

171 citations


Cites background from "A Big Data-as-a-Service Framework: ..."

  • ...These sensor technologies provide us opportunities to improve the robustness of multimodal data sensing and fusion in HAR, which may support the development of human-centric applications and services in cyber–physical–social systems based on the enrichment of sensed information from real-time big data environments [3], [4]....

    [...]

Journal Article•DOI•
TL;DR: A novel deep learning network, namely the Multi-scale Dense Gate Recurrent Unit Network (MDGRU), is proposed in this paper, which is composed of feature layers initialized by a pre-trained Restricted Boltzmann Machine (RBM) network, multi-scale layers, skip gate recurrent unit layers, and dense layers.
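One building block named in the TL;DR, a gate recurrent unit layer wrapped with a skip connection, can be sketched as below. The layer size is a placeholder, and the RBM-initialized feature layers, multi-scale layers, and dense layers of MDGRU are not modeled; this is an illustrative fragment only.

# A GRU layer with a skip (residual) connection, sketched in PyTorch.
# Dimensions are assumptions; the full MDGRU architecture is not reproduced.
import torch
import torch.nn as nn

class SkipGRU(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.gru = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)
        return out + x  # skip connection lets the input bypass the recurrent layer

x = torch.randn(4, 50, 32)    # (batch, time, features)
print(SkipGRU()(x).shape)     # torch.Size([4, 50, 32])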

113 citations

References
Book•
01 Jan 1983

34,729 citations

Journal Article•DOI•
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
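The map/reduce contract described in the abstract can be mimicked in a few lines of single-process Python, shown below with the classic word-count example. A real MapReduce runtime shards the input, schedules map and reduce tasks across a cluster, and tolerates machine failures; none of that is modeled here.

# Toy, single-process word count in the MapReduce style (map, shuffle, reduce).
from collections import defaultdict
from itertools import chain

def map_fn(document: str):
    for word in document.split():
        yield (word.lower(), 1)       # emit intermediate key/value pairs

def reduce_fn(word: str, counts) -> tuple:
    return word, sum(counts)          # merge all values sharing one key

documents = ["the quick brown fox", "the lazy dog", "the fox"]

grouped = defaultdict(list)           # shuffle: group intermediate values by key
for key, value in chain.from_iterable(map_fn(d) for d in documents):
    grouped[key].append(value)

print(dict(reduce_fn(k, v) for k, v in grouped.items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}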

20,309 citations


"A Big Data-as-a-Service Framework: ..." refers methods in this paper

  • ...was exploited by Amazon as an online web service storage system [114]....

    [...]

Journal Article•DOI•
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal Article•DOI•
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
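For concreteness, LLE is available in scikit-learn; the snippet below applies it to a synthetic Swiss-roll manifold. The dataset and parameter values are illustrative choices, not taken from the reviewed paper.

# Locally linear embedding of a synthetic 3-D Swiss roll into 2-D (scikit-learn).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # points on a nonlinear manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)   # neighborhood-preserving low-dimensional embedding
print(X_2d.shape)             # (1000, 2)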

15,106 citations


"A Big Data-as-a-Service Framework: ..." refers background in this paper

  • ...As an important dimensionality reduction [38] and fea-...

    [...]

  • ...in [38], viewed the data as high-dimensional graphs that are projected onto low-dimensional spaces....

    [...]

  • ...tive low-dimensional projections of high-dimensional data [38], [102]....

    [...]

Journal Article•DOI•
21 Oct 1999-Nature
TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
Abstract: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.
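The additive, parts-based behaviour described above follows from the non-negativity constraints, which the standard multiplicative update rules preserve. The sketch below factorizes a random non-negative matrix with those updates; the data and rank are placeholders for illustration.

# Non-negative matrix factorization V ~ W @ H via multiplicative updates.
# Non-negativity is preserved because the updates only multiply by non-negative ratios.
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((100, 50))      # non-negative data matrix (e.g., pixels x images)
rank, eps = 10, 1e-9

W = rng.random((100, rank))
H = rng.random((rank, 50))

for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
    W *= (V @ H.T) / (W @ H @ H.T + eps)   # update parts-based basis vectors

print(np.linalg.norm(V - W @ H))           # reconstruction error shrinks over iterations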

11,500 citations