Journal ArticleDOI

Big Data: A Survey

TL;DR: The background and state-of-the-art of big data are reviewed, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid, as well as related technologies.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to give readers a comprehensive overview and big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.


Citations
Journal ArticleDOI
TL;DR: The definition, characteristics, and classification of big data, along with some discussion of cloud computing, are introduced, and research challenges are investigated, with a focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of the state-of-the-art of Big Data applications in Smart Farming and identify the related socio-economic challenges to be addressed.

1,477 citations


Cites background or methods from "Big Data: A Survey"

  • ...…peripherals, systems software, application packages (application software), procedures, technical, information and communication standards (reference information models and coding and message standards), etc., that are used and necessary for adequate data… (based on Chen et al. (2014))....


  • ...The data chain refers to the sequence of activities from data capture to decision making and data marketing (Chen et al., 2014; Miller and Mork, 2013)....


  • ...In big data applications, the value chain refers to the sequence of activities from data capture to decision making and data marketing (Chen et al., 2014; Miller and Mork, 2013)....


  • ...…monitoring (Yan et al., 2013); Big Data in the cloud: weather/climate data, yield data, soil types, market information, agricultural census data (Chen et al., 2014); livestock movements (Faulkner and Cebul, 2014; Wamba and Wicks, 2010); weather/climate, market information, social media (Verdouw…...


  • ...Big data applications in farming are not strictly about primary production, but play a major role in improving the efficiency of the entire supply chain and alleviating food security concerns (Chen et al., 2014; Esmeijer et al., 2015; Gilpin, 2015a)....


Journal ArticleDOI
TL;DR: In this article, the authors present a state-of-the-art review offering a holistic view of the big data (BD) challenges and big data analytics (BDA) methods theorized, proposed, or employed by organizations, with the objective of helping others understand this landscape and make robust investment decisions.

1,267 citations

Journal ArticleDOI
TL;DR: This paper proposes a brief framework that incorporates industrial wireless networks, cloud, and fixed or mobile terminals with smart artifacts such as machines, products, and conveyors and concludes that the smart factory of Industrie 4.0 is achievable by extensively applying the existing enabling technologies while actively coping with the technical challenges.
Abstract: With the application of the Internet of Things and services to manufacturing, the fourth stage of industrialization, referred to as Industrie 4.0, is believed to be approaching. For Industrie 4.0 to be realized, it is essential to implement the horizontal integration of the inter-corporation value network, the end-to-end integration of the engineering value chain, and vertical integration inside the factory. In this paper, we focus on vertical integration to implement a flexible and reconfigurable smart factory. We first propose a brief framework that incorporates industrial wireless networks, the cloud, and fixed or mobile terminals with smart artifacts such as machines, products, and conveyors. Then, we elaborate the operational mechanism from the perspective of control engineering: the smart artifacts form a self-organized system assisted by feedback and coordination blocks that are implemented on the cloud and based on big data analytics. In addition, we outline the main technical features and beneficial outcomes and present a detailed design scheme. We conclude that the smart factory of Industrie 4.0 is achievable by extensively applying existing enabling technologies while actively coping with the technical challenges.
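
The feedback-and-coordination idea in this abstract can be sketched in a few lines. In the toy Python below, shop-floor machines report their backlogs to a cloud block, which analyzes them and feeds back simple rebalancing hints; all names (Machine, cloud_coordinate) and the averaging rule are our own illustrative assumptions, not the paper's actual design.

```python
# A hedged, highly simplified sketch of cloud-side feedback/coordination
# for self-organizing shop-floor "smart artifacts". Illustrative only.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    queue: int  # jobs currently waiting at this machine

def cloud_coordinate(machines: list[Machine]) -> dict[str, int]:
    # Stand-in for the cloud's "big data analytics" block: compare each
    # machine's backlog to the line average and suggest how many jobs to
    # shed (positive) or accept (negative) so the line rebalances.
    avg = sum(m.queue for m in machines) / len(machines)
    return {m.name: round(m.queue - avg) for m in machines}

line = [Machine("mill", 9), Machine("lathe", 3), Machine("drill", 3)]
print(cloud_coordinate(line))  # {'mill': 4, 'lathe': -2, 'drill': -2}
```

In the paper's terms, the local self-organization happens among the artifacts themselves; the cloud block only supplies the global feedback that pure peer-to-peer negotiation lacks.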

1,108 citations


Cites background from "Big Data: A Survey"

  • ..., Internet of Things (IoT) [1–3], wireless sensor networks [4, 5], big data [6], cloud computing [7–9], embedded system [10], and mobile Internet [11]) are being introduced into the manufacturing environment, which ushers in a fourth industrial revolution....


Journal ArticleDOI
TL;DR: A smart factory framework is presented that incorporates an industrial network, cloud, and supervisory control terminals with smart shop-floor objects such as machines, conveyors, and products, and an intelligent negotiation mechanism for agents to cooperate with each other is proposed.

1,074 citations

References
More filters
Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
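
To make the map/reduce contract described in this abstract concrete, here is a minimal single-process word-count sketch in Python. It illustrates only the programming model; the names (map_fn, reduce_fn, run_mapreduce) are ours, and a real MapReduce runtime distributes the shuffle and reduce phases across a cluster while handling machine failures and scheduling.

```python
# A minimal, single-process sketch of the MapReduce programming model
# (word count, the canonical example). Illustrates the map/shuffle/reduce
# contract only, not Google's distributed runtime or fault tolerance.
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    # Map: emit one intermediate (key, value) pair per word occurrence.
    for word in text.split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce: merge all intermediate values sharing the same key.
    return word, sum(counts)

def run_mapreduce(docs: dict[str, str]) -> dict[str, int]:
    # Shuffle: group intermediate values by key (performed by the runtime
    # across machines in a real deployment).
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(run_mapreduce({"d1": "big data big ideas", "d2": "big clusters"}))
# {'big': 3, 'data': 1, 'ideas': 1, 'clusters': 1}
```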

20,309 citations

Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and the proliferation of the web, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.
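
As a rough illustration of how link structure can be exploited for ranking (the idea behind PageRank, cited as [125] in the survey below), the following Python sketch runs a PageRank-style power iteration over a toy link graph. It is a simplification for intuition only (dense dictionaries, no dangling-node handling, fixed iteration count), not the production algorithm described in the paper.

```python
# PageRank-style power iteration on a toy link graph. A page's rank is
# (1-d)/n plus a d-weighted share of the rank of every page linking to it.
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:  # distribute p's rank evenly along its outgoing links
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        rank = new
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))  # "c" accumulates the most rank: a and b link to it
```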

14,696 citations


"Big Data: A Survey" refers background in this paper

  • ...Page Rank [125] and CLEVER [126] make full use of the models to look up relevant website pages....



Journal Article
TL;DR: Google, as discussed by the authors, is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

13,327 citations

Book
14 Sep 1984
TL;DR: This book analyzes the multivariate normal distribution, the estimation of the mean vector and the covariance matrix, and the generalized T2-statistic, along with tests of independence between sets of variates.
Abstract: Preface to the Third Edition. Preface to the Second Edition. Preface to the First Edition. 1. Introduction. 2. The Multivariate Normal Distribution. 3. Estimation of the Mean Vector and the Covariance Matrix. 4. The Distributions and Uses of Sample Correlation Coefficients. 5. The Generalized T2-Statistic. 6. Classification of Observations. 7. The Distribution of the Sample Covariance Matrix and the Sample Generalized Variance. 8. Testing the General Linear Hypothesis: Multivariate Analysis of Variance. 9. Testing Independence of Sets of Variates. 10. Testing Hypotheses of Equality of Covariance Matrices and Equality of Mean Vectors and Covariance Matrices. 11. Principal Components. 12. Canonical Correlations and Canonical Variables. 13. The Distributions of Characteristic Roots and Vectors. 14. Factor Analysis. 15. Patterns of Dependence; Graphical Models. Appendix A: Matrix Theory. Appendix B: Tables. References. Index.
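
As a small numerical companion to chapters 3 and 5 of the book, the following Python/NumPy sketch computes a sample mean vector, a sample covariance matrix, and Hotelling's generalized T2-statistic. The toy dataset and the hypothesized mean are invented purely for illustration.

```python
# Sample mean vector / covariance matrix (Ch. 3) and Hotelling's
# generalized T2-statistic (Ch. 5) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, 2.0], scale=1.0, size=(50, 2))  # n=50 obs., p=2

n, p = X.shape
xbar = X.mean(axis=0)        # sample mean vector
S = np.cov(X, rowvar=False)  # unbiased sample covariance matrix
mu0 = np.array([0.0, 0.0])   # hypothesized mean vector under H0

# T2 = n * (xbar - mu0)' S^{-1} (xbar - mu0); large values reject H0.
diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff)
print(xbar, S, T2)
```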

9,693 citations