scispace - formally typeset
Author

Yun Li

Other affiliations: National Science Foundation
Bio: Yun Li is an academic researcher from George Mason University. The author has contributed to research in topics: Big data & Data discovery. The author has an h-index of 13 and has co-authored 39 publications receiving 620 citations. Previous affiliations of Yun Li include the National Science Foundation.

Papers published on a yearly basis

Papers
Journal ArticleDOI
05 May 2018
TL;DR: This paper reviews the major big data sources, the associated achievements in different disaster management phases, and emerging technological topics associated with leveraging this new ecosystem of Big Data to monitor and detect natural hazards, mitigate their effects, assist in relief efforts, and contribute to the recovery and reconstruction processes.
Abstract: Undoubtedly, the age of big data has opened new options for natural disaster management, primarily because of the varied possibilities it provides in visualizing, analyzing, and predicting natural disasters. From this perspective, big data has radically changed the ways through which human societies adopt natural disaster management strategies to reduce human suffering and economic losses. In a world that is now heavily dependent on information technology, the prime objective of computer experts and policy makers is to make the best of big data by sourcing information from varied formats and storing it in ways that allow it to be used effectively during different stages of natural disaster management. This paper presents a systematic review of the literature analyzing the role of big data in natural disaster management and highlights the present status of the technology in providing meaningful and effective solutions. It presents the findings of several researchers on varied scientific and technological perspectives that have a bearing on the efficacy of big data in facilitating natural disaster management. In this context, this paper reviews the major big data sources, the associated achievements in different disaster management phases, and emerging technological topics associated with leveraging this new ecosystem of Big Data to monitor and detect natural hazards, mitigate their effects, assist in relief efforts, and contribute to the recovery and reconstruction processes.

178 citations

Journal ArticleDOI
Chaowei Yang, Manzhu Yu, Fei Hu, Yongyao Jiang, Yun Li
TL;DR: This paper investigates how Cloud Computing can be utilized to address Big Data challenges to enable such transformation, and presents a tabular framework that supports the life cycle of Big Data processing, including management, access, mining analytics, simulation and forecasting.

151 citations

Journal ArticleDOI
23 Dec 2020-PLOS ONE
TL;DR: It is shown that population density is an effective predictor of cumulative infection cases in the U.S. at the county level, and population density and sizes of vulnerable population subgroups should be explicitly included in transmission models that predict the impacts of COVID-19, particularly at the sub-county level.
Abstract: Physical distancing has been argued as one of the effective means to combat the spread of COVID-19 before a vaccine or therapeutic drug becomes available. How far people can be spatially separated is partly behavioral but partly constrained by population density. Most models developed to predict the spread of COVID-19 in the U.S. do not include population density explicitly. This study shows that population density is an effective predictor of cumulative infection cases in the U.S. at the county level. Daily cumulative cases by counties are converted into 7-day moving averages. Treating the weekly averages as the dependent variable and the county population density levels as the explanatory variable, both in logarithmic scale, this study assesses how population density has shaped the distributions of infection cases across the U.S. from early March to late May, 2020. Additional variables reflecting the percentages of African Americans, Hispanic-Latina, and older adults in logarithmic scale are also included. Spatial regression models with a spatial error specification are also used to account for the spatial spillover effect. Population density alone accounts for 57% of the variation (R-squared) in the aspatial models and up to 76% in the spatial models. Adding the three population subgroup percentage variables raised the R-squared of the aspatial models to 72% and the spatial model to 84%. The influences of the three population subgroups were substantial, but changed over time, while the contributions of population density have been quite stable after the first several weeks, ascertaining the importance of population density in shaping the spread of infection in individual counties, and in their neighboring counties. Thus, population density and sizes of vulnerable population subgroups should be explicitly included in transmission models that predict the impacts of COVID-19, particularly at the sub-county level.
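The aspatial baseline described above is an ordinary least-squares regression of log-transformed weekly-averaged case counts on log population density. A minimal sketch of that step, using synthetic data in place of the actual county-level counts (the coefficients and variable names here are illustrative assumptions, not values from the study):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical county-level data: log population density and log 7-day
# moving-average cumulative cases. Synthetic, for illustration only.
n = 500
log_density = rng.normal(loc=4.0, scale=1.5, size=n)          # log(people/km^2)
log_cases = 0.8 * log_density + rng.normal(0.0, 0.9, size=n)  # log(weekly avg cases)

# Aspatial log-log OLS: log(cases) ~ a + b * log(density)
X = np.column_stack([np.ones(n), log_density])
beta, *_ = np.linalg.lstsq(X, log_cases, rcond=None)

# R-squared: share of variance in log(cases) explained by log(density)
resid = log_cases - X @ beta
r2 = 1.0 - resid.var() / log_cases.var()
print(f"slope={beta[1]:.2f}, R^2={r2:.2f}")
```

The spatial-error specification mentioned in the abstract adds a spatially autocorrelated error term to this model and is typically fitted with a dedicated spatial-econometrics package rather than plain OLS.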

148 citations

Journal ArticleDOI
TL;DR: The sudden outbreak of the Coronavirus disease (COVID-19) swept across the world in early 2020, triggering the lockdowns of several billion people across many countries, including China, Spain, Ind...

89 citations

Journal ArticleDOI
TL;DR: This perspective paper presents the collective view on the global health emergency and the effort in collecting, analyzing, and sharing relevant data on global policy and government responses, human mobility, environmental impact, socioeconomical impact, and reflecting on the dynamic responses from human societies.
Abstract: The sudden outbreak of the Coronavirus disease (COVID-19) swept across the world in early 2020, triggering the lockdowns of several billion people across many countries, including China, Spain, India, the U.K., Italy, France, Germany, and most states of the U.S. The transmission of the virus accelerated rapidly, with the U.S. recording the most confirmed cases, and New York City became an epicenter of the pandemic by the end of March. In response to this national and global emergency, the NSF Spatiotemporal Innovation Center brought together a taskforce of international researchers and assembled and implemented strategies to rapidly respond to this crisis, supporting research, saving lives, and protecting the health of global citizens. This perspective paper presents our collective view on the global health emergency and our effort in collecting, analyzing, and sharing relevant data on global policy and government responses, geospatial indicators of the outbreak and evolving forecasts; in developing research capabilities and mitigation measures with global scientists, promoting collaborative research on outbreak dynamics, and reflecting on the dynamic responses from human societies.

68 citations


Cited by
Journal ArticleDOI
TL;DR: This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
Abstract: Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications including business, sciences and engineering. At the same time, Big Data presents challenges for digital earth to store, transport, process, mine and serve the data. Cloud computing provides fundamental support to address the challenges with shared computing resources including computing, storage, networking and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers – Big Data and cloud computing – and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status and research opportunities, the following observations are offered: (i...

545 citations

Journal ArticleDOI
TL;DR: This survey examines the potential and benefits of data-driven research in EWM, gives a synopsis of key concepts and approaches in BigData andML, provides a systematic review of current applications, and discusses major issues and challenges to recommend future research directions.
Abstract: Big Data and machine learning (ML) technologies have the potential to impact many facets of environment and water management (EWM). Big Data are information assets characterized by high volume, velocity, variety, and veracity. Fast advances in high-resolution remote sensing techniques, smart information and communication technologies, and social media have contributed to the proliferation of Big Data in many EWM fields, such as weather forecasting, disaster management, smart water and energy management systems, and remote sensing. Big Data brings about new opportunities for data-driven discovery in EWM, but it also requires new forms of information processing, storage, retrieval, as well as analytics. ML, a subdomain of artificial intelligence (AI), refers broadly to computer algorithms that can automatically learn from data. ML may help unlock the power of Big Data if properly integrated with data analytics. Recent breakthroughs in AI and computing infrastructure have led to the fast development of powerful deep learning (DL) algorithms that can extract hierarchical features from data, with better predictive performance and less human intervention. Collectively, Big Data and ML techniques have shown great potential for data-driven decision making, scientific discovery, and process optimization. These technological advances may greatly benefit EWM, especially because (1) many EWM applications (e.g. early flood warning) require the capability to extract useful information from a large amount of data in an autonomous manner and in real time, (2) EWM research has become highly multidisciplinary, and handling the ever-increasing data volume/types using the traditional workflow is simply not an option, and last but not least, (3) the current theoretical knowledge about many EWM processes is still incomplete, but may now be complemented through data-driven discovery. A large number of applications of Big Data and ML have already appeared in the EWM literature in recent years.
The purposes of this survey are to (1) examine the potential and benefits of data-driven research in EWM, (2) give a synopsis of key concepts and approaches in Big Data and ML, (3) provide a systematic review of current applications, and finally (4) discuss major issues and challenges, and recommend future research directions. EWM includes a broad range of research topics. Instead of attempting to survey each individual area, this review focuses on areas of nexus in EWM, with an emphasis on elucidating the potential benefits of increased data availability and predictive analytics to improving EWM research.

210 citations

Journal ArticleDOI
20 Sep 2018
TL;DR: The challenges in designing a better healthcare system to make early detection and diagnosis of diseases and the possible solutions while providing e-health services in secure manner are analyzed and possible future work guidelines are provided.
Abstract: Personalized healthcare systems deliver e-health services to fulfill the medical and assistive needs of the aging population. The Internet of Things (IoT) is a significant advancement in the Big Data era, which supports many real-time engineering applications through enhanced services. Analytics over data streams from IoT has become a source of user data for healthcare systems to discover new information, enable early detection, and support decisions over critical situations for the improvement of the quality of life. In this paper, we have made a detailed study of the recent emerging technologies in personalized healthcare systems, with a focus on cloud computing, fog computing, Big Data analytics, IoT, and mobile-based applications. We have analyzed the challenges in designing a better healthcare system to make early detection and diagnosis of diseases, and discussed possible solutions for providing e-health services in a secure manner. This paper sheds light on the rapidly growing need for better real-time healthcare systems and provides possible future work guidelines.

210 citations

Journal ArticleDOI
TL;DR: Climate Engine is a web-based application that overcomes many computational barriers that users face by employing Google’s parallel cloud-computing platform, Google Earth Engine, to process, visualize, download, and share climate and remote sensing datasets in real time.
Abstract: The paucity of long-term observations, particularly in regions with heterogeneous climate and land cover, can hinder incorporating climate data at appropriate spatial scales for decision-making and scientific research. Numerous gridded climate, weather, and remote sensing products have been developed to address the needs of both land managers and scientists, in turn enhancing scientific knowledge and strengthening early-warning systems. However, these data remain largely inaccessible for a broader segment of users given the computational demands of big data. Climate Engine (http://ClimateEngine.org) is a web-based application that overcomes many computational barriers that users face by employing Google’s parallel cloud-computing platform, Google Earth Engine, to process, visualize, download, and share climate and remote sensing datasets in real time. The software application development and design of Climate Engine is briefly outlined to illustrate the potential for high-performance processing of...

187 citations

Journal ArticleDOI
TL;DR: This work developed a workflow for predicting the probability of wetland occurrence using a boosted regression tree machine-learning framework applied to digital topographic and EO data, and demonstrates the central role of high-quality topographic variables for modeling wetland distribution at regional scales.
Abstract: Modern advances in cloud computing and machine-learning algorithms are shifting the manner in which Earth-observation (EO) data are used for environmental monitoring, particularly as we settle into the era of free, open-access satellite data streams. Wetland delineation represents a particularly worthy application of this emerging research trend, since wetlands are an ecologically important yet chronically under-represented component of contemporary mapping and monitoring programs, particularly at the regional and national levels. Exploiting Google Earth Engine and R Statistical software, we developed a workflow for predicting the probability of wetland occurrence using a boosted regression tree machine-learning framework applied to digital topographic and EO data. Working in a 13,700 km2 study area in northern Alberta, our best models produced excellent results, with AUC (area under the receiver-operator characteristic curve) values of 0.898 and explained-deviance values of 0.708. Our results demonstrate the central role of high-quality topographic variables for modeling wetland distribution at regional scales. Including optical and/or radar variables into the workflow substantially improved model performance, though optical data performed slightly better. Converting our wetland probability-of-occurrence model into a binary Wet-Dry classification yielded an overall accuracy of 85%, which is virtually identical to that derived from the Alberta Merged Wetland Inventory (AMWI): the contemporary inventory used by the Government of Alberta. However, our workflow contains several key advantages over that used to produce the AMWI, and provides a scalable foundation for province-wide monitoring initiatives.
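The two evaluation quantities this abstract reports, AUC for the probability-of-occurrence model and overall accuracy for the thresholded Wet-Dry map, are easy to compute once per-pixel scores exist. A minimal sketch using synthetic scores in place of the Alberta predictions (the score distributions and threshold are illustrative assumptions, not the study's values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel scores from a wetland probability-of-occurrence
# model; true wetlands are drawn to score higher than dry ground on average.
n_wet, n_dry = 400, 600
scores = np.concatenate([rng.beta(5, 2, n_wet),   # wetland pixels
                         rng.beta(2, 5, n_dry)])  # dry pixels
labels = np.concatenate([np.ones(n_wet), np.zeros(n_dry)])

# AUC via the rank-sum (Mann-Whitney U) identity: the probability that a
# randomly chosen wetland pixel outscores a randomly chosen dry pixel.
order = scores.argsort()
ranks = np.empty(len(scores))
ranks[order] = np.arange(1, len(scores) + 1)
auc = (ranks[labels == 1].sum() - n_wet * (n_wet + 1) / 2) / (n_wet * n_dry)

# Binary Wet-Dry map: threshold the probabilities and measure overall accuracy.
accuracy = ((scores >= 0.5) == labels).mean()
print(f"AUC={auc:.3f}, accuracy={accuracy:.3f}")
```

The study itself fits the underlying scores with boosted regression trees on topographic and EO covariates; any gradient-boosting implementation producing class probabilities would slot into the place of the synthetic beta draws here.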

180 citations