Author

Fei Hu

Other affiliations: IBM, East China Normal University
Bio: Fei Hu is an academic researcher from George Mason University. The author has contributed to research in topics: Big data & Geospatial analysis. The author has an h-index of 13 and has co-authored 25 publications receiving 877 citations. Previous affiliations of Fei Hu include IBM & East China Normal University.

Papers
Journal ArticleDOI
TL;DR: This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
Abstract: Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications including business, sciences and engineering. At the same time, Big Data presents challenges for digital earth to store, transport, process, mine and serve the data. Cloud computing provides fundamental support to address the challenges with shared computing resources including computing, storage, networking and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers – Big Data and cloud computing – and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status and research opportunities, the following observations are offered: (i...

545 citations

Journal ArticleDOI
Chaowei Yang, Manzhu Yu, Fei Hu, Yongyao Jiang, Yun Li
TL;DR: This paper investigates how Cloud Computing can be utilized to address Big Data challenges to enable such transformation, and presents a tabular framework that supports the life cycle of Big Data processing, including management, access, mining analytics, simulation and forecasting.

151 citations

Journal ArticleDOI
TL;DR: This work proposes a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment and shows that the index can significantly accelerate querying and processing, while keeping the index-to-data ratio small.
Abstract: Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume, but also because of the intrinsic high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are directly stored in a Hadoop Distributed File System in its original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied to enable MapReduce to achieve high data locality, as well as balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (~10× speedup compared to the baseline test using the same computing cluster), while keeping the index-to-data ratio small (0.0328%). The applicability of the indexing approach is demonstrated by a climate anomaly detection deployed on a NASA Hadoop cluster. This approach is also able to support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration on a Hadoop cluster.

61 citations
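
The indexing idea in the abstract above (an index that maps logical array chunks to their physical location, followed by locality-aware partitioning) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names (`ChunkLocation`, `ChunkIndex`, `partition_by_node`), the choice to key chunks by variable and time step only, and the example path are all assumptions made for illustration, and spatial subsetting within a chunk is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    """Physical location of one logical array chunk (all fields hypothetical)."""
    path: str    # file that holds the chunk, kept in its native format
    offset: int  # byte offset of the chunk inside the file
    length: int  # chunk size in bytes
    node: str    # storage node assumed to hold the bytes


class ChunkIndex:
    """Maps a logical key (variable, time step) to a physical chunk location."""

    def __init__(self):
        self._entries = {}

    def add(self, variable, time_step, location):
        self._entries[(variable, time_step)] = location

    def query(self, variable, t_start, t_end):
        """Return locations for time steps in [t_start, t_end], in time order."""
        return [
            self._entries[(variable, t)]
            for t in range(t_start, t_end + 1)
            if (variable, t) in self._entries
        ]


def partition_by_node(locations):
    """Group chunk locations by node so each task can read local data."""
    groups = defaultdict(list)
    for loc in locations:
        groups[loc.node].append(loc)
    return dict(groups)


# Usage: index four time steps of a hypothetical variable "T2M".
index = ChunkIndex()
for t in range(4):
    index.add("T2M", t, ChunkLocation("/data/example.hdf", t * 1024, 1024, f"node{t % 2}"))

hits = index.query("T2M", 1, 3)
splits = partition_by_node(hits)
```

Only the byte ranges returned by `query` need to be read, which is the sense in which such an index avoids scanning whole files; how the real system balances workload across nodes is not specified in the abstract.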

Journal ArticleDOI
17 May 2019
TL;DR: This paper introduces the concepts of big Earth data and reviews big Earth data analytics from several aspects to capture the latest advancements in this fast-growing domain.
Abstract: Big Earth data are produced from satellite observations, Internet-of-Things, model simulations, and other sources. The data embed unprecedented insights and spatiotemporal stamps of relevant Earth ...

54 citations

Journal ArticleDOI
TL;DR: This work introduces a graph-based method to detect tourist movement patterns from Twitter data, with the aim of assisting business and government activities concerned with tour product planning, transportation, and the development of shopping and accommodation centers.
Abstract: Understanding the characteristics of tourist movement is essential for tourist behavior studies since the characteristics underpin how the tourist industry management selects strategies for attract...

49 citations


Cited by
Journal ArticleDOI
TL;DR: Interest in knowing more about the Earth’s land cover and how it has changed over time motivated the mission and sensor design of early terrestrial remote sensing systems.
Abstract: Interest in knowing more about the Earth’s land cover and how it has changed over time motivated the mission and sensor design of early terrestrial remote sensing systems. Rapid developments in com...

234 citations

Journal ArticleDOI
04 Sep 2018-Sensors
TL;DR: The results showed that IoT-based sensors and the proposed big data processing system are sufficiently efficient to monitor the manufacturing process and that the proposed hybrid prediction model has better fault prediction accuracy than other models given the sensor data as input.
Abstract: With the increase in the amount of data captured during the manufacturing process, monitoring systems are becoming important factors in decision making for management. Current technologies such as Internet of Things (IoT)-based sensors can be considered a solution to provide efficient monitoring of the manufacturing process. In this study, a real-time monitoring system that utilizes IoT-based sensors, big data processing, and a hybrid prediction model is proposed. Firstly, an IoT-based sensor that collects temperature, humidity, accelerometer, and gyroscope data was developed. The characteristics of IoT-generated sensor data from the manufacturing process are: real-time, large amounts, and unstructured type. The proposed big data processing platform utilizes Apache Kafka as a message queue, Apache Storm as a real-time processing engine and MongoDB to store the sensor data from the manufacturing process. Secondly, for the proposed hybrid prediction model, Density-Based Spatial Clustering of Applications with Noise (DBSCAN)-based outlier detection and Random Forest classification were used to remove outlier sensor data and provide fault detection during the manufacturing process, respectively. The proposed model was evaluated and tested at an automotive manufacturing assembly line in Korea. The results showed that IoT-based sensors and the proposed big data processing system are sufficiently efficient to monitor the manufacturing process. Furthermore, the proposed hybrid prediction model has better fault prediction accuracy than other models given the sensor data as input. The proposed system is expected to support management by improving decision-making and will help prevent unexpected losses caused by faults during the manufacturing process.

217 citations
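
The outlier-removal stage mentioned in the abstract above can be sketched with DBSCAN's noise rule: a point is noise when it is not a core point and does not lie within `eps` of any core point. The code below is an illustrative pure-Python version, not the paper's pipeline; the function names, the `eps` and `min_samples` values, and the sample readings are assumptions, and the Random Forest classification stage is omitted.

```python
import math


def dbscan_noise_mask(points, eps, min_samples):
    """Return a list of booleans, True where DBSCAN would label a point as noise."""
    n = len(points)
    # Neighborhood of each point: indices within eps (each point includes itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core point: at least min_samples points (itself included) within eps.
    is_core = [len(nb) >= min_samples for nb in neighbors]
    # Noise: neither a core point nor within eps of a core point.
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]


def remove_outliers(points, eps, min_samples):
    """Keep only the points that DBSCAN would assign to some cluster."""
    mask = dbscan_noise_mask(points, eps, min_samples)
    return [p for p, noisy in zip(points, mask) if not noisy]


# Hypothetical (temperature, humidity) readings; the last one is far from the rest.
readings = [(20.0, 50.0), (20.1, 50.2), (19.9, 49.8), (20.2, 50.1), (80.0, 5.0)]
mask = dbscan_noise_mask(readings, eps=0.5, min_samples=3)
cleaned = remove_outliers(readings, eps=0.5, min_samples=3)
```

This brute-force neighborhood search is quadratic in the number of points, so it only suits small batches; in practice the features would also need scaling so that one sensor's units do not dominate the distance.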

Journal ArticleDOI
TL;DR: This survey examines the potential and benefits of data-driven research in EWM, gives a synopsis of key concepts and approaches in Big Data and ML, provides a systematic review of current applications, and discusses major issues and challenges to recommend future research directions.
Abstract: Big Data and machine learning (ML) technologies have the potential to impact many facets of environment and water management (EWM). Big Data are information assets characterized by high volume, velocity, variety, and veracity. Fast advances in high-resolution remote sensing techniques, smart information and communication technologies, and social media have contributed to the proliferation of Big Data in many EWM fields, such as weather forecasting, disaster management, smart water and energy management systems, and remote sensing. Big Data brings about new opportunities for data-driven discovery in EWM, but it also requires new forms of information processing, storage, retrieval, as well as analytics. ML, a subdomain of artificial intelligence (AI), refers broadly to computer algorithms that can automatically learn from data. ML may help unlock the power of Big Data if properly integrated with data analytics. Recent breakthroughs in AI and computing infrastructure have led to the fast development of powerful deep learning (DL) algorithms that can extract hierarchical features from data, with better predictive performance and less human intervention. Collectively Big Data and ML techniques have shown great potential for data-driven decision making, scientific discovery, and process optimization. These technological advances may greatly benefit EWM, especially because (1) many EWM applications (e.g. early flood warning) require the capability to extract useful information from a large amount of data in autonomous manner and in real time, (2) EWM researches have become highly multidisciplinary, and handling the ever increasing data volume/types using the traditional workflow is simply not an option, and last but not least, (3) the current theoretical knowledge about many EWM processes is still incomplete, but which may now be complemented through data-driven discovery. A large number of applications on Big Data and ML have already appeared in the EWM literature in recent years.
The purposes of this survey are to (1) examine the potential and benefits of data-driven research in EWM, (2) give a synopsis of key concepts and approaches in Big Data and ML, (3) provide a systematic review of current applications, and finally (4) discuss major issues and challenges, and recommend future research directions. EWM includes a broad range of research topics. Instead of attempting to survey each individual area, this review focuses on areas of nexus in EWM, with an emphasis on elucidating the potential benefits of increased data availability and predictive analytics to improving the EWM research.

210 citations

Journal ArticleDOI
20 Sep 2018
TL;DR: The challenges in designing a better healthcare system to make early detection and diagnosis of diseases and the possible solutions while providing e-health services in secure manner are analyzed and possible future work guidelines are provided.
Abstract: Personalized healthcare systems deliver e-health services to fulfill the medical and assistive needs of the aging population. Internet of Things (IoT) is a significant advancement in the Big Data era, which supports many real-time engineering applications through enhanced services. Analytics over data streams from IoT has become a source of user data for healthcare systems to discover new information, detect conditions early, and make decisions in critical situations to improve quality of life. In this paper, we have made a detailed study of the recent emerging technologies in personalized healthcare systems, with a focus on cloud computing, fog computing, Big Data analytics, IoT and mobile-based applications. We have analyzed the challenges in designing a better healthcare system for early detection and diagnosis of diseases and discussed the possible solutions while providing e-health services in a secure manner. This paper sheds light on the rapidly growing need for better healthcare systems in real time and provides possible future work guidelines.

210 citations