scispace - formally typeset
Author

Manzhu Yu

Bio: Manzhu Yu is an academic researcher from Pennsylvania State University. The author has contributed to research in the topics of cloud computing and big data, has an h-index of 14, and has co-authored 53 publications receiving 758 citations. Previous affiliations of Manzhu Yu include the University of South Carolina and George Mason University.

Papers published on a yearly basis

Papers
More filters
Journal ArticleDOI
05 May 2018
TL;DR: This paper reviews the major big data sources, the associated achievements in different disaster management phases, and emerging technological topics associated with leveraging this new ecosystem of Big Data to monitor and detect natural hazards, mitigate their effects, assist in relief efforts, and contribute to the recovery and reconstruction processes.
Abstract: Undoubtedly, the age of big data has opened new options for natural disaster management, primarily because of the varied possibilities it provides in visualizing, analyzing, and predicting natural disasters. From this perspective, big data has radically changed the ways through which human societies adopt natural disaster management strategies to reduce human suffering and economic losses. In a world that is now heavily dependent on information technology, the prime objective of computer experts and policy makers is to make the best of big data by sourcing information from varied formats and storing it in ways that it can be effectively used during different stages of natural disaster management. This paper provides a systematic review of the literature on the role of big data in natural disaster management and highlights the present status of the technology in providing meaningful and effective solutions. It presents the findings of several researchers on the scientific and technological perspectives that bear on the efficacy of big data in facilitating natural disaster management. In this context, this paper reviews the major big data sources, the associated achievements in different disaster management phases, and emerging technological topics associated with leveraging this new ecosystem of big data to monitor and detect natural hazards, mitigate their effects, assist in relief efforts, and contribute to the recovery and reconstruction processes.

178 citations

Journal ArticleDOI
Chaowei Yang, Manzhu Yu, Fei Hu, Yongyao Jiang, Yun Li
TL;DR: This paper investigates how Cloud Computing can be utilized to address Big Data challenges to enable such transformation, and presents a tabular framework that supports the life cycle of Big Data processing, including management, access, mining analytics, simulation and forecasting.

151 citations

Journal ArticleDOI
Abstract: The sudden outbreak of the Coronavirus disease (COVID-19) swept across the world in early 2020, triggering the lockdowns of several billion people across many countries, including China, Spain, Ind...

89 citations

Journal ArticleDOI
TL;DR: Investigating the spatiotemporal patterns and changes in air pollution before, during and after the lockdown of the state of California offers evidence of the environmental impact introduced by COVID-19, and insight into related economic influences.

84 citations

Journal ArticleDOI
TL;DR: This research examines the capability of a convolutional neural network (CNN) model in cross-event Twitter topic classification, based on three geo-tagged Twitter datasets collected during Hurricanes Sandy, Harvey, and Irma, and indicates that the CNN model can be pre-trained on Twitter data from past events to classify tweets for an upcoming event for situational awareness.
Abstract: Social media platforms have been contributing to disaster management during the past several years. Text mining solutions using traditional machine learning techniques have been developed to catego...
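The cross-event setup summarized above — pre-train a convolutional text classifier on tweets from past hurricanes and reuse it for a new event — can be sketched in miniature. This is an illustrative sketch, not the authors' implementation: the vocabulary size, embedding dimension, filter shapes, and number of topic categories are all assumptions, and the randomly initialized weights stand in for parameters that would actually be learned from past-event tweets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy hyperparameters (not from the paper): small vocabulary,
# small embeddings, 3 disaster-related topic categories.
VOCAB, EMB, FILTERS, WIDTH, CLASSES = 1000, 16, 8, 3, 3

# Random weights stand in for parameters pre-trained on tweets from a
# past event (e.g. Sandy) and reused for a new event (e.g. Harvey).
embedding = rng.normal(size=(VOCAB, EMB))
conv_w = rng.normal(size=(FILTERS, WIDTH, EMB)) * 0.1
fc_w = rng.normal(size=(FILTERS, CLASSES)) * 0.1

def classify(token_ids):
    """Forward pass of a minimal 1-D CNN text classifier."""
    x = embedding[token_ids]                      # (seq_len, EMB)
    seq_len = len(token_ids)
    # Slide each filter over WIDTH-token windows, ReLU activation.
    conv = np.array([
        [np.maximum(0.0, np.sum(conv_w[f] * x[i:i + WIDTH]))
         for i in range(seq_len - WIDTH + 1)]
        for f in range(FILTERS)
    ])                                            # (FILTERS, positions)
    pooled = conv.max(axis=1)                     # global max-pooling
    logits = pooled @ fc_w
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                        # softmax over topics

probs = classify(rng.integers(0, VOCAB, size=20))  # one 20-token "tweet"
print(probs)
```

In the cross-event scenario the paper describes, only the training corpus changes: the same architecture is fit on labeled tweets from past hurricanes and then applied, unchanged, to the incoming event's stream.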

79 citations


Cited by
More filters
01 Jan 2016
Using MPI: Portable Parallel Programming with the Message-Passing Interface.

593 citations

Journal ArticleDOI
TL;DR: This book is for social scientists, but the reviewer had no difficulty imagining an oil exploration application within the framework of geographically weighted regression (GWR); the first chapter nicely explains what is unique in this book.
Abstract: Being newly immersed in the upstream part of the oil business, I just recently had my first work session with data in ARC–GIS®. The project involves subsurface geographical modeling. Obviously I had considerable interest in discovering if the methodology in this book would enhance my modeling capabilities. The book is for social scientists, but I had no difficulty imagining my own important oil exploration application within the framework of geographically weighted regression (GWR). The first chapter nicely explains what is unique in this book. A standard regression model using geographically oriented data (the example is housing prices across all of England) is a global representation of a spatial relationship, an average that does not account for any local differences. In y = f (x), imagine a whole family of f ’s that are indexed by spatial location. That is the focus of this book. It is about one form of local spatial modeling, which is GWR. A more general resource for this topic is the earlier book by Fotheringham and Wegener (2000), which escaped the notice of Technometrics. Imagine a display of model parameters in a geographical information system (GIS) and you will understand the focus for this book. The authors note, “only where there is no significant spatial variation in a measured relationship can global models be accepted” (p. 10). The second chapter develops the basis of GWR. It analyzes the housing sales prices versus the 33 boroughs in London and begins by fitting a conventional multiple regression model versus housing characteristics. The GWR is motivated by differences in the regression models fitted separately by borough. The GWR is a spatial moving-window approach with all data distances weighted versus a specific data point using a weighting function and a bandwidth. A GIS can then be used to evaluate the spatial dependency of the parameters. As in kriging, local standard errors also are calculated. 
The chapter also provides all the math. Chapter 3 comprises several further considerations: parameters that are globally constant, outliers, and spatial heteroscedasticity. The first issue leads to hypothesis tests for model comparison using an Akaike information criterion (AIC). Local outliers are hard to detect. Studentized (deletion) residuals are recommended. The outliers can be plotted geographically. Robust regression is suggested as a less computationally intensive alternative. Heteroscedasticity is harder to handle. Chapter 4 adds statistical inference to the capabilities of GWR: both a confidence interval approach using local likelihood and an AIC method. Four additional methodology chapters present various extensions of GWR. Chapter 5 considers the relationship between GWR and spatial autocorrelation, and includes a combined version of GWR and spatial regression using some complex hybrid models. Chapter 6 examines the relationship of scale and zoning problems in spatial analysis to GWR. Chapter 7 introduces the use of initial exploratory data analysis using geographically weighted statistics, which are based on the idea of using a kernel around each data point to create weights. Univariate statistics and correlation coefficients are defined for exploring local patterns in data. A final set of extensions in Chapter 8 discusses regression models with non-Gaussian errors, logistic regression, local principal components analysis, and local probability density estimation. The methods all use some kind of distributional model. The million-dollar question for me is always, "What about software?" The authors have a stand-alone program, GWR 3, available on CD-ROM by contacting the authors. Basically the drill with GWR 3 is to gather your data, use Excel to transform and reformat the data for GWR 3, use GWR 3 to produce a set of coefficients, and feed those coefficients to your favorite GIS to produce your maps.
Forty pages of discussion about using the software are provided. A final epilogue chapter also discusses embedding GWR in R or Matlab and includes some references to people who have done that type of work. I probably would not have read this book if I had not happened to have had it in my briefcase on a visit with the exploration technologists. Though inclusive of appropriate mathematical development, this material is readily approachable because of the many illustrations and the pages and pages of GIS displays. The authors unabashedly present much of the material as their developmental work, so GWR offers a lot of opportunity for research and further development through novel applications and extensions.
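The core of GWR as the review describes it — a spatial moving window in which all observations are distance-weighted relative to a focal point via a kernel and a bandwidth, then fit by weighted least squares — can be sketched briefly. This is a minimal illustration on synthetic data, not the GWR 3 software; the Gaussian kernel, the bandwidth of 2.0, and the synthetic spatial drift in the slope are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 200 locations where the slope of y on x drifts from
# west to east, so a single global regression misses the local pattern.
n = 200
coords = rng.uniform(0, 10, size=(n, 2))      # (easting, northing)
x = rng.normal(size=n)
local_slope = 1.0 + 0.3 * coords[:, 0]        # slope grows with easting
y = 2.0 + local_slope * x + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x])          # design matrix [1, x]

def gwr_coefficients(point, bandwidth=2.0):
    """Weighted least squares at one location with a Gaussian kernel."""
    d = np.linalg.norm(coords - point, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # distance-decay weights
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)  # (intercept, slope)

west = gwr_coefficients(np.array([0.0, 5.0]))
east = gwr_coefficients(np.array([10.0, 5.0]))
print(west[1], east[1])  # local slope estimates at the two focal points
```

Mapping the fitted coefficients back onto the study area — here, the slope rising from west to east — is exactly the GIS display of local parameters that the review highlights as the focus of the book.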

545 citations

Journal ArticleDOI
TL;DR: This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
Abstract: Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications including business, sciences and engineering. At the same time, Big Data presents challenges for digital earth to store, transport, process, mine and serve the data. Cloud computing provides fundamental support to address the challenges with shared computing resources including computing, storage, networking and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers – Big Data and cloud computing – and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status and research opportunities, the following observations are offered: (i...

545 citations

Journal ArticleDOI
TL;DR: The review shows that most previous studies have concentrated on the mapping and analysis of network components, and more attention should be given to an integrated use of various data sources to benefit from the various techniques in an optimal way.
Abstract: To secure uninterrupted distribution of electricity, effective monitoring and maintenance of power lines are needed. This literature review article aims to give a wide overview of the possibilities provided by modern remote sensing sensors in power line corridor surveys and to discuss the potential and limitations of different approaches. Monitoring of both power line components and vegetation around them is included. Remotely sensed data sources discussed in the review include synthetic aperture radar (SAR) images, optical satellite and aerial images, thermal images, airborne laser scanner (ALS) data, land-based mobile mapping data, and unmanned aerial vehicle (UAV) data. The review shows that most previous studies have concentrated on the mapping and analysis of network components. In particular, automated extraction of power line conductors has achieved much attention, and promising results have been reported. For example, accuracy levels above 90% have been presented for the extraction of conductors from ALS data or aerial images. However, in many studies datasets have been small and numerical quality analyses have been omitted. Mapping of vegetation near power lines has been a less common research topic than mapping of the components, but several studies have also been carried out in this field, especially using optical aerial and satellite images. Based on the review we conclude that in future research more attention should be given to an integrated use of various data sources to benefit from the various techniques in an optimal way. Knowledge in related fields, such as vegetation monitoring from ALS, SAR and optical image data, should be better exploited to develop useful monitoring approaches. Special attention should be given to rapidly developing remote sensing techniques such as UAVs and laser scanning from airborne and land-based platforms. To demonstrate and verify the capabilities of automated monitoring approaches, large tests in various environments and practical monitoring conditions are needed. These should include careful quality analyses and comparisons between different data sources, methods and individual algorithms.

350 citations

Journal Article
TL;DR: In this article, a data clustering method based on partitioning a bipartite graph is proposed, in which terms and documents are simultaneously grouped into semantically meaningful clusters.
Abstract: Bipartite Graph Partitioning and Data Clustering.* Hongyuan Zha and Xiaofeng He (Dept. of Computer Science & Engineering, Penn State Univ., {zha,xhe}@cse.psu.edu); Chris Ding and Horst Simon (NERSC Division, Berkeley National Lab., {chqding,hdsimon}@lbl.gov); Ming Gu (Dept. of Mathematics, U.C. Berkeley, mgu@math.berkeley.edu). Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
1. INTRODUCTION. Cluster analysis is an important tool for exploratory data mining applications arising from many diverse disciplines. Informally, cluster analysis seeks to partition a given data set into compact clusters so that data objects within a cluster are more similar than those in distinct clusters. The literature on cluster analysis is enormous, including contributions from many research communities (see [6, 9] for recent surveys of some classical approaches). Many traditional clustering algorithms are based on the assumption that the given dataset consists of covariate information (or attributes) for each individual data object, and cluster analysis can be cast as a problem of grouping a set of n-dimensional vectors, each representing a data object in the dataset. A familiar example is document clustering using the vector space model [1]. Here each document is represented by an n-dimensional vector, and each coordinate of the vector corresponds to a term in a vocabulary of size n. This formulation leads to the so-called term-document matrix A = (a_ij) for the representation of the collection of documents, where a_ij is the so-called term frequency, i.e., the number of times term i occurs in document j. In this vector space model terms and documents are treated asymmetrically, with terms considered as the covariates or attributes of documents. It is also possible to treat both terms and documents as first-class citizens in a symmetric fashion, and consider a_ij as the frequency of co-occurrence of term i and document j, as is done, for example, in probabilistic latent semantic indexing [12]. In this paper, we follow this basic principle and propose a new approach to model terms and documents as vertices in a bipartite graph, with edges of the graph indicating the co-occurrence of terms and documents. In addition, we can optionally use edge weights to indicate the frequency of this co-occurrence. Cluster analysis for document collections in this context is based on a very intuitive notion: documents are grouped by topics; on one hand, documents in a topic tend to more heavily use the same subset of terms, which form a term cluster, and on the other hand, a topic usually is characterized by a subset of terms, and those documents heavily using those terms tend to be about that particular topic. It is this interplay of terms and documents which gives rise to what we call bi-clustering, by which terms and documents are simultaneously grouped into semantically coherent clusters. Our clustering algorithm computes an approximate global optimal solution, while probabilistic latent semantic indexing relies on the EM algorithm and therefore might be prone to local minima even with the help of some annealing process.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering; G.1.3 [Numerical Linear Algebra]: Singular value decomposition; G.2.2 [Graph Theory]: Graph algorithms. General Terms: algorithms, theory. Keywords: document clustering, bipartite graph, graph partitioning, spectral relaxation, singular value decomposition, correspondence analysis. CIKM '01, November 5-10, 2001, Atlanta, Georgia, USA. *Part of this work was done while Xiaofeng He was a graduate research assistant at NERSC, Berkeley National Lab.
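The computation at the heart of this abstract — approximating the optimal bipartition of the term-document graph with a partial SVD of the (degree-normalized) edge-weight matrix, then reading the co-clusters off the signs of the second singular vectors — can be illustrated on a toy matrix. This is a minimal sketch of the general spectral co-clustering idea, not the paper's exact algorithm: the toy matrix, the specific normalization, and the sign-based cluster assignment are simplifying assumptions.

```python
import numpy as np

# Toy term-document co-occurrence matrix with two topics: terms 0-2
# mostly co-occur with documents 0-2, terms 3-5 with documents 3-5,
# plus a few cross-topic entries so the bipartite graph is connected.
A = np.array([
    [3, 2, 3, 0, 0, 1],
    [2, 3, 2, 0, 0, 0],
    [3, 2, 2, 1, 0, 0],
    [0, 0, 1, 2, 3, 2],
    [0, 0, 0, 3, 2, 3],
    [1, 0, 0, 2, 2, 3],
], dtype=float)

# Normalize rows (terms) and columns (documents) by degree, as in
# spectral relaxations of normalized-cut objectives.
d1 = A.sum(axis=1)                    # term degrees
d2 = A.sum(axis=0)                    # document degrees
An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

# The first singular vector pair is trivial (all one sign); the second
# pair carries the partitioning information for terms and documents.
U, s, Vt = np.linalg.svd(An)
term_labels = (U[:, 1] > 0).astype(int)
doc_labels = (Vt[1] > 0).astype(int)
print(term_labels, doc_labels)
```

Because the same SVD yields one left singular vector over terms and one right singular vector over documents, each term cluster comes paired with the document cluster that uses it most heavily — the "bi-clustering" interplay the abstract describes.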

295 citations