Journal ArticleDOI

Big Data: A Survey

TL;DR: The background and state-of-the-art of big data are reviewed, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid, as well as related technologies.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to give readers a comprehensive overview and big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.


Citations
Journal ArticleDOI
TL;DR: The definition, characteristics, and classification of big data, along with some discussion of cloud computing, are introduced, and research challenges are investigated, with a focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of the state-of-the-art of Big Data applications in Smart Farming and identify the related socio-economic challenges to be addressed.

1,477 citations


Cites background or methods from "Big Data: A Survey"

  • ...…peripherals, systems software, application packages (application software), procedures, technical, information and communication standards (reference information models and coding and message standards), etc., that are used and necessary for adequate data… (based on Chen et al. (2014))....


  • ...The data chain refers to the sequence of activities from data capture to decision making and data marketing (Chen et al., 2014; Miller and Mork, 2013)....


  • ...In big data applications, the value chain refers to the sequence of activities from data capture to decision making and data marketing (Chen et al., 2014; Miller and Mork, 2013)....


  • ...…monitoring (Yan et al., 2013); Big Data in the cloud: weather/climate data, yield data, soil types, market information, agricultural census data (Chen et al., 2014); livestock movements (Faulkner and Cebul, 2014; Wamba and Wicks, 2010); weather/climate, market information, social media (Verdouw…...


  • ...Big data applications in farming are not strictly about primary production, but play a major role in improving the efficiency of the entire supply chain and alleviating food security concerns (Chen et al., 2014; Esmeijer et al., 2015; Gilpin, 2015a)....


Journal ArticleDOI
TL;DR: In this article, the authors present a state-of-the-art review offering a holistic view of the big data (BD) challenges and big data analytics (BDA) methods theorized, proposed, or employed by organizations, with the objective of helping others understand this landscape and make robust investment decisions.

1,267 citations

Journal ArticleDOI
TL;DR: This paper proposes a brief framework that incorporates industrial wireless networks, cloud, and fixed or mobile terminals with smart artifacts such as machines, products, and conveyors and concludes that the smart factory of Industrie 4.0 is achievable by extensively applying the existing enabling technologies while actively coping with the technical challenges.
Abstract: With the application of the Internet of Things and services to manufacturing, the fourth stage of industrialization, referred to as Industrie 4.0, is believed to be approaching. For Industrie 4.0 to be realized, it is essential to implement the horizontal integration of the inter-corporation value network, the end-to-end integration of the engineering value chain, and vertical integration inside the factory. In this paper, we focus on vertical integration to implement a flexible and reconfigurable smart factory. We first propose a brief framework that incorporates industrial wireless networks, the cloud, and fixed or mobile terminals with smart artifacts such as machines, products, and conveyors. Then, we elaborate the operational mechanism from the perspective of control engineering: the smart artifacts form a self-organized system assisted by feedback and coordination blocks that are implemented on the cloud and based on big data analytics. In addition, we outline the main technical features and beneficial outcomes and present a detailed design scheme. We conclude that the smart factory of Industrie 4.0 is achievable by extensively applying existing enabling technologies while actively coping with the technical challenges.
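
The feedback-and-coordination idea in this abstract can be sketched in a few lines. In the toy Python below, shop-floor machines report their backlogs to a cloud block, which analyzes them and feeds back simple rebalancing hints; all names (Machine, cloud_coordinate) and the averaging rule are our own illustrative assumptions, not the paper's actual design.

```python
# A hedged, highly simplified sketch of cloud-side feedback/coordination
# for self-organizing shop-floor "smart artifacts". Illustrative only.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    queue: int  # jobs currently waiting at this machine

def cloud_coordinate(machines: list[Machine]) -> dict[str, int]:
    # Stand-in for the cloud's "big data analytics" block: compare each
    # machine's backlog to the line average and suggest how many jobs to
    # shed (positive) or accept (negative) so the line rebalances.
    avg = sum(m.queue for m in machines) / len(machines)
    return {m.name: round(m.queue - avg) for m in machines}

line = [Machine("mill", 9), Machine("lathe", 3), Machine("drill", 3)]
print(cloud_coordinate(line))  # {'mill': 4, 'lathe': -2, 'drill': -2}
```

In the paper's terms, the local self-organization happens among the artifacts themselves; the cloud block only supplies the global feedback that pure peer-to-peer negotiation lacks.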

1,108 citations


Cites background from "Big Data: A Survey"

  • ..., Internet of Things (IoT) [1–3], wireless sensor networks [4, 5], big data [6], cloud computing [7–9], embedded system [10], and mobile Internet [11]) are being introduced into the manufacturing environment, which ushers in a fourth industrial revolution....


Journal ArticleDOI
TL;DR: A smart factory framework is presented that incorporates an industrial network, cloud, and supervisory control terminals with smart shop-floor objects such as machines, conveyors, and products, and an intelligent negotiation mechanism for agents to cooperate with each other is proposed.

1,074 citations

References
More filters
Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
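
To make the map/reduce contract described in this abstract concrete, here is a minimal single-process word-count sketch in Python. It illustrates only the programming model; the names (map_fn, reduce_fn, run_mapreduce) are ours, and a real MapReduce runtime distributes the shuffle and reduce phases across a cluster while handling machine failures and scheduling.

```python
# A minimal, single-process sketch of the MapReduce programming model
# (word count, the canonical example). Illustrates the map/shuffle/reduce
# contract only, not Google's distributed runtime or fault tolerance.
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    # Map: emit one intermediate (key, value) pair per word occurrence.
    for word in text.split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce: merge all intermediate values sharing the same key.
    return word, sum(counts)

def run_mapreduce(docs: dict[str, str]) -> dict[str, int]:
    # Shuffle: group intermediate values by key (performed by the runtime
    # across machines in a real deployment).
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(run_mapreduce({"d1": "big data big ideas", "d2": "big clusters"}))
# {'big': 3, 'data': 1, 'ideas': 1, 'clusters': 1}
```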

20,309 citations

Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and the proliferation of the web, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.
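
As a rough illustration of how link structure can be exploited for ranking (the idea behind PageRank, cited as [125] in the survey below), the following Python sketch runs a PageRank-style power iteration over a toy link graph. It is a simplification for intuition only (dense dictionaries, no dangling-node handling, fixed iteration count), not the production algorithm described in the paper.

```python
# PageRank-style power iteration on a toy link graph. A page's rank is
# (1-d)/n plus a d-weighted share of the rank of every page linking to it.
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:  # distribute p's rank evenly along its outgoing links
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        rank = new
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))  # "c" accumulates the most rank: a and b link to it
```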

14,696 citations


"Big Data: A Survey" refers background in this paper

  • ...Page Rank [125] and CLEVER [126] make full use of the models to look up relevant website pages....



Journal Article
TL;DR: Google, as discussed by the authors, is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

13,327 citations

Book
14 Sep 1984
TL;DR: This book analyzes the multivariate normal distribution, the estimation of the mean vector and the covariance matrix, and the generalized T2-statistic, along with tests of independence between sets of variates.
Abstract: Preface to the Third Edition. Preface to the Second Edition. Preface to the First Edition. 1. Introduction. 2. The Multivariate Normal Distribution. 3. Estimation of the Mean Vector and the Covariance Matrix. 4. The Distributions and Uses of Sample Correlation Coefficients. 5. The Generalized T2-Statistic. 6. Classification of Observations. 7. The Distribution of the Sample Covariance Matrix and the Sample Generalized Variance. 8. Testing the General Linear Hypothesis: Multivariate Analysis of Variance. 9. Testing Independence of Sets of Variates. 10. Testing Hypotheses of Equality of Covariance Matrices and Equality of Mean Vectors and Covariance Matrices. 11. Principal Components. 12. Canonical Correlations and Canonical Variables. 13. The Distributions of Characteristic Roots and Vectors. 14. Factor Analysis. 15. Patterns of Dependence; Graphical Models. Appendix A: Matrix Theory. Appendix B: Tables. References. Index.
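
As a small numerical companion to chapters 3 and 5 of the book, the following Python/NumPy sketch computes a sample mean vector, a sample covariance matrix, and Hotelling's generalized T2-statistic. The toy dataset and the hypothesized mean are invented purely for illustration.

```python
# Sample mean vector / covariance matrix (Ch. 3) and Hotelling's
# generalized T2-statistic (Ch. 5) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, 2.0], scale=1.0, size=(50, 2))  # n=50 obs., p=2

n, p = X.shape
xbar = X.mean(axis=0)        # sample mean vector
S = np.cov(X, rowvar=False)  # unbiased sample covariance matrix
mu0 = np.array([0.0, 0.0])   # hypothesized mean vector under H0

# T2 = n * (xbar - mu0)' S^{-1} (xbar - mu0); large values reject H0.
diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff)
print(xbar, S, T2)
```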

9,693 citations