Journal ArticleDOI

A CUDA-Based Parallel Geographically Weighted Regression for Large-Scale Geographic Data

30 Oct 2020 - ISPRS International Journal of Geo-Information (Multidisciplinary Digital Publishing Institute) - Vol. 9, Iss. 11, p. 653
TL;DR: An improved approach based on the compute unified device architecture (CUDA) parallel architecture, fast-parallel-GWR (FPGWR), is proposed in this paper to efficiently handle the computational demands of performing GWR over millions of data points.
Abstract: Geographically weighted regression (GWR) introduces a distance-weighted kernel function to examine the non-stationarity of geographical phenomena and improve the performance of global regression. However, GWR calibration becomes computationally demanding when a serial computing mode is used to process large volumes of data. To address this problem, an improved approach based on the compute unified device architecture (CUDA) parallel architecture, fast-parallel-GWR (FPGWR), is proposed in this paper to efficiently handle the computational demands of performing GWR over millions of data points. FPGWR is capable of decomposing the serial process into parallel atomic modules and optimizing the memory usage. To verify the computing capability of FPGWR, we designed simulation datasets and performed corresponding testing experiments. We also compared the performance of FPGWR and other GWR software packages using open datasets. The results show that the runtime of FPGWR is negatively correlated with the CUDA core number, and its calculation efficiency is thousands or even tens of thousands of times higher than that of traditional GWR algorithms. FPGWR provides an effective tool for exploring spatial heterogeneity in large-scale geographic data (geodata).
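The abstract describes decomposing GWR calibration into per-point atomic modules. As a minimal illustration of why that decomposition parallelizes well (a plain NumPy sketch with an assumed Gaussian kernel and fixed bandwidth, not the authors' FPGWR code), each regression point gets its own weighted least-squares fit, and no fit depends on any other:

```python
import numpy as np

def gwr_calibrate(coords, X, y, bandwidth):
    """Serial GWR calibration: one weighted least-squares fit per point.

    Illustrative sketch only (Gaussian kernel, fixed bandwidth). Each
    iteration is independent of the others, which is what lets FPGWR-style
    approaches map the per-point work onto CUDA cores.
    """
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # kernel weights
        XtW = X.T * w                                    # X^T diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)     # (X^T W X)^-1 X^T W y
    return betas

# Toy data: 500 points, intercept plus two covariates.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(500, 2))
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=500)
print(gwr_calibrate(coords, X, y, bandwidth=2.0)[:3])
```

Because the loop iterations are mutually independent, they can be issued to GPU threads directly; according to the abstract, FPGWR's gains come from how these atomic modules and their memory usage are organized on the CUDA architecture.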
Citations
Journal ArticleDOI
TL;DR: In this article, two high-performance R solutions for GWR via Multi-core Parallel (MP) and Compute Unified Device Architecture (CUDA) techniques, respectively GWR-MP and GWR-CUDA, are proposed.
Abstract: As an established spatial analytical tool, Geographically Weighted Regression (GWR) has been applied across a variety of disciplines. However, its usage can be challenging for large datasets, which are increasingly prevalent in today’s digital world. In this study, we propose two high-performance R solutions for GWR via Multi-core Parallel (MP) and Compute Unified Device Architecture (CUDA) techniques, respectively GWR-MP and GWR-CUDA. We compared GWR-MP and GWR-CUDA with three existing solutions available in Geographically Weighted Models (GWmodel), Multi-scale GWR (MGWR) and Fast GWR (FastGWR). Results showed that all five solutions perform differently across varying sample sizes, with no single solution a clear winner in terms of computational efficiency. Specifically, the solutions given in GWmodel and MGWR provided acceptable computational costs for GWR studies with a relatively small sample size. For a large sample size, GWR-MP and FastGWR provided coherent solutions on a Personal Computer (PC) with a common multi-core configuration, and GWR-MP provided more efficient computing capacity per core or thread than FastGWR. For cases when the sample size was very large, and for these cases only, GWR-CUDA provided the most efficient solution, though its I/O cost should be noted for small samples. In summary, GWR-MP and GWR-CUDA provide complementary high-performance R solutions to the existing ones and should be preferred for certain data-rich GWR studies.

5 citations
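GWR-MP and GWR-CUDA are R solutions; the sketch below is only a loose Python analogue of the multi-core idea, splitting the independent per-point GWR fits across worker processes with the standard library (the Gaussian kernel, fixed bandwidth, and worker count are assumptions, not the packages' settings):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def local_fit(i, coords, X, y, bandwidth):
    """Weighted least-squares fit at regression point i (Gaussian kernel)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def gwr_multicore(coords, X, y, bandwidth, workers=4):
    """Distribute the independent per-point fits over worker processes."""
    fit = partial(local_fit, coords=coords, X=X, y=y, bandwidth=bandwidth)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(fit, range(len(y)))))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 10, size=(1000, 2))
    X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 1))])
    y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.1, size=1000)
    print(gwr_multicore(coords, X, y, bandwidth=2.0).shape)  # (1000, 2)
```

A CUDA variant would instead assign the per-point fits to GPU threads; as the abstract notes, that only pays off for very large samples, partly because of its I/O cost on small ones.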

Journal ArticleDOI
TL;DR: In this article, a geographically and temporally weighted co-location quotient, comprising global and local computation, a method to calculate a spatiotemporal weight matrix, and a significance test using Monte Carlo simulation, is used to identify spatio-temporal crime patterns across Greater Manchester.
Abstract: Incident data, a form of big data frequently used in urban studies, are characterized by point features with high spatial and temporal resolution and categorical values. In contrast to panel data, such spatial data pooled over time reflect multi-directional spatial effects but only unidirectional temporal effects, which are challenging to analyze. This paper presents an innovative approach to address this challenge – a geographically and temporally weighted co-location quotient which includes global and local computation, a method to calculate a spatiotemporal weight matrix and a significance test using Monte Carlo simulation. This new approach is used to identify spatio-temporal crime patterns across Greater Manchester in 2016 from open-source recorded crime data. The results show that this approach is suitable for the analysis and visualization of spatio-temporal dependence and heterogeneity in categorical spatial data pooled over time. It is particularly useful for detecting symmetrical spatio-temporal co-location patterns and mapping local clusters. The method also addresses the unbalanced temporal scale problem caused by unidirectional temporal data representation and explores potential impacts. The empirical evidence of the spatiotemporal crime patterns might usefully be deployed to inform the development of criminological theory by helping to disentangle the relationships between crime and the urban environment.

5 citations
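The paper's exact weighting scheme is not reproduced in the abstract; the sketch below only illustrates the general shape of a spatiotemporal weight matrix with a one-directional temporal kernel (later events get zero weight), using hypothetical Gaussian/exponential kernels and bandwidths:

```python
import numpy as np

def spatiotemporal_weights(coords, times, h_space, h_time):
    """Illustrative spatiotemporal weight matrix with a one-directional
    temporal kernel: event j contributes to event i only if it did not
    happen after i. Kernel forms and bandwidths are hypothetical, not the
    paper's specification.
    """
    ds = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    dt = times[:, None] - times[None, :]          # positive: j precedes i
    w = np.exp(-0.5 * (ds / h_space) ** 2) * np.exp(-np.abs(dt) / h_time)
    w[dt < 0] = 0.0                               # unidirectional time
    np.fill_diagonal(w, 0.0)                      # exclude self-pairs
    return w

rng = np.random.default_rng(2)
coords = rng.uniform(0, 5, size=(200, 2))         # hypothetical event locations
times = rng.uniform(0, 365, size=200)             # day of year
W = spatiotemporal_weights(coords, times, h_space=1.0, h_time=30.0)
print(W.shape, round(float(W.max()), 3))
```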

Journal ArticleDOI
TL;DR: In this paper, the authors adopted two types of methods that allow parameters to vary across observations: the random parameter approach and the geographically weighted regression (GWR) approach.
Abstract: Vehicle crashes on roads are caused by many factors. However, the influence of these factors is not necessarily homogeneous across locations, which calls for non-stationary modeling approaches. To address this problem, this paper adopts two types of methods that allow parameters to vary across observations: the random parameter approach and the geographically weighted regression (GWR) approach. With road curvature, curve length, pavement friction, and traffic volume as independent variables, vehicle crash frequencies are modeled by two non-spatial methods, the negative binomial (NB) model and the random parameter negative binomial (RPNB) model, as well as three spatial (GWR) methods. These models are calibrated at the microlevel using a dataset of 9415 horizontal curve segments with a total length of 1545 kilometers over a three-year period (2016–2018) across the State of Indiana. The results revealed that the GWR approach can capture spatial heterogeneity and therefore significantly outperforms the conventional non-spatial approaches. Based on the corrected Akaike Information Criterion (AICc), geographically weighted negative binomial regression (GWNBR) proved to be a superior approach for statewide microlevel crash analysis.

4 citations
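The model comparison above rests on the corrected Akaike Information Criterion; the helper below shows the standard AICc formula such a comparison uses (the candidate log-likelihoods and parameter counts in the example are placeholders, not the study's results):

```python
def aicc(loglik, k, n):
    """Corrected Akaike Information Criterion: AIC plus a small-sample
    penalty. k = number of estimated parameters, n = number of observations.
    """
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

# Placeholder comparison (values are hypothetical, not the paper's fits);
# the model with the smallest AICc is preferred.
n_segments = 9415
candidates = {"NB": (-5200.0, 6), "RPNB": (-5150.0, 9), "GWNBR": (-5050.0, 45)}
scores = {name: aicc(ll, k, n_segments) for name, (ll, k) in candidates.items()}
print(min(scores, key=scores.get))
```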

Journal ArticleDOI
TL;DR: In this article, the authors used geographically weighted regression models, extended with a temporal component, to evaluate linear and nonlinear trends in environmental monitoring data; applying the methods developed here, they identified nonlinear changes in TOC, from consistent negative trends over most of Sweden around 2010 to positive trends in parts of the country in later years.
Abstract: Data from monitoring programs with high spatial resolution but low temporal resolution are often overlooked when assessing temporal trends, as the data structure does not permit the use of established trend analysis methods. However, the data include uniquely detailed information about geographically differentiated temporal trends driven by large-scale influences, such as climate or airborne deposition. In this study, we used geographically weighted regression models, extended with a temporal component, to evaluate linear and nonlinear trends in environmental monitoring data. To improve the results, we tested approaches for station-wise pre-processing of data and for validation of the resulting models. To illustrate the method, we used data on changes in total organic carbon (TOC) obtained in a monitoring program of around 4800 Swedish lakes observed once every 6 years between 2008 and 2021. On applying the methods developed here, we identified nonlinear changes in TOC from consistent negative trends over most of Sweden around 2010 to positive trends during later years in parts of the country.
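As a minimal sketch of the idea of geographically weighting a temporal trend (a Gaussian spatial kernel and an ordinary linear trend are assumed here; the study's calibrated spatiotemporal model and station-wise pre-processing are not reproduced), the slope of a spatially weighted regression of the monitored variable on observation year gives a local trend estimate at a target location:

```python
import numpy as np

def local_trend(target_xy, coords, years, values, h_space):
    """Spatially weighted linear trend (units per year) at one target
    location. Illustrative only: Gaussian kernel, single spatial bandwidth,
    purely linear trend.
    """
    d = np.linalg.norm(coords - target_xy, axis=1)
    w = np.exp(-0.5 * (d / h_space) ** 2)
    X = np.column_stack([np.ones_like(years), years])
    XtW = X.T * w
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ values)
    return slope

# Hypothetical monitoring network: sparse revisits, dense spatial coverage.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 100, size=(300, 2))
years = rng.choice(np.arange(2008, 2022, 6), size=300).astype(float)
values = 10 + 0.05 * (years - 2008) + rng.normal(scale=0.5, size=300)
print(local_trend(np.array([50.0, 50.0]), coords, years, values, h_space=20.0))
```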
References
Journal ArticleDOI
TL;DR: In this article, a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as a procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed, and a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), which is designed for the purpose of statistical identification, is introduced. When there are several competing models, the MAICE is defined by the model and the maximum likelihood estimates of the parameters which give the minimum of AIC, defined by AIC = (-2)log(maximum likelihood) + 2(number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification which is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.

47,133 citations
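As a small worked example of MAICE in the time-series setting the abstract mentions (an illustrative least-squares AR fit with a Gaussian likelihood, not Akaike's original numerical examples), AIC is computed for several candidate autoregressive orders and the minimum-AIC model is selected:

```python
import numpy as np

def ar_aic(x, p):
    """Fit an AR(p) model by least squares and return its AIC.

    AIC = -2 log(maximum likelihood) + 2 * (number of adjusted parameters),
    with a Gaussian likelihood; the parameters are p AR coefficients, an
    intercept and the noise variance. Illustrative sketch only.
    """
    n = len(x) - p
    lags = [x[p - j - 1:len(x) - j - 1] for j in range(p)]
    X = np.column_stack([np.ones(n)] + lags)
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2.0 * loglik + 2.0 * (p + 2)

rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(2, 500):                       # simulate an AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
orders = range(1, 7)
aics = [ar_aic(x, p) for p in orders]
print("MAICE order:", orders[int(np.argmin(aics))])   # typically 2
```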

Book
11 Oct 2002
TL;DR: In this book, the basic GWR model is extended with local statistics and local models for spatial data, and software for Geographically Weighted Regression is described; this software, however, is not suited to the analysis of large-scale data.
Abstract: Acknowledgements. Local Statistics and Local Models for Spatial Data. Geographically Weighted Regression: The Basics. Extensions to the Basic GWR Model. Statistical Inference and Geographically Weighted Regression. GWR and Spatial Autocorrelation. Scale Issues and Geographically Weighted Regression. Geographically Weighted Local Statistics. Extensions of Geographical Weighting. Software for Geographically Weighted Regression. Epilogue. Bibliography. Index.

2,845 citations

Journal ArticleDOI
TL;DR: A technique termed geographically weighted regression is developed, which attempts to capture spatial variation by calibrating a multiple regression model that allows different relationships to exist at different points in space; Monte Carlo methods are used to test for spatial nonstationarity.
Abstract: Spatial nonstationarity is a condition in which a simple “global” model cannot explain the relationships between some sets of variables. The nature of the model must alter over space to reflect the structure within the data. In this paper, a technique is developed, termed geographically weighted regression, which attempts to capture this variation by calibrating a multiple regression model which allows different relationships to exist at different points in space. This technique is loosely based on kernel regression. The method itself is introduced and related issues such as the choice of a spatial weighting function are discussed. Following this, a series of related statistical tests are considered which can be described generally as tests for spatial nonstationarity. Using Monte Carlo methods, techniques are proposed for investigating the null hypothesis that the data may be described by a global model rather than a non-stationary one and also for testing whether individual regression coefficients are stable over geographic space. These techniques are demonstrated on a data set from the 1991 U.K. census relating car ownership rates to social class and male unemployment. The paper concludes by discussing ways in which the technique can be extended.

2,330 citations
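The choice of spatial weighting function discussed in this paper is commonly made between kernels such as the Gaussian and the bi-square; the snippet below simply evaluates these two standard forms (the fixed bandwidth is an assumption; this is not the calibration used on the 1991 U.K. census data):

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Gaussian distance-decay weights, a common GWR weighting function."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Bi-square weights: smooth decay to exactly zero beyond the bandwidth."""
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

d = np.linspace(0.0, 3.0, 7)                 # distances in arbitrary units
print(gaussian_kernel(d, bandwidth=1.5))
print(bisquare_kernel(d, bandwidth=1.5))
```

The Gaussian kernel never reaches zero, so every observation contributes a little to every local fit, whereas the bi-square kernel truncates at the bandwidth, which keeps each local design matrix sparse.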

Journal ArticleDOI
01 Aug 2013
TL;DR: Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop, is presented; it is integrated into Hive to support declarative spatial queries within an integrated architecture.
Abstract: Support of high-performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location-based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive-scale spatial data is due to the proliferation of cost-effective and ubiquitous positioning technologies, the development of high-resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries within an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries and as an integrated software package in Hive.

571 citations
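Hadoop-GIS itself is a Hadoop/Hive system; purely as a conceptual sketch of the spatial-partitioning step (a hypothetical uniform grid in plain Python, not the RESQUE engine or its partition indexing), keying records by grid tile is what turns one large spatial query into many independent, parallelizable pieces:

```python
from collections import defaultdict

def partition_by_grid(points, cell_size):
    """Group points into square grid tiles. Keying records by tile is the
    same general idea spatial partitioning uses to split a query into
    independent partitions; this is a conceptual sketch, not Hadoop-GIS code.
    """
    tiles = defaultdict(list)
    for x, y in points:
        tiles[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return tiles

# Each tile can now be aggregated (or joined) independently, in parallel.
points = [(0.4, 1.2), (0.9, 1.8), (5.1, 0.3), (5.6, 0.9), (9.9, 9.9)]
tiles = partition_by_grid(points, cell_size=2.0)
print({tile: len(pts) for tile, pts in tiles.items()})
# {(0, 0): 2, (2, 0): 2, (4, 4): 1}
```

Objects crossing tile boundaries then need the kind of result-amending step the abstract mentions, since a single geometry may be counted in more than one partition.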

Book
01 Jan 1995
TL;DR: The tradeoff between speedup and efficiency that is inherent to a software system is investigated in this paper, and the extent to which this tradeoff is determined by the average parallelism of the software system, as contrasted with other, more detailed characterizations, is shown.
Abstract: The tradeoff between speedup and efficiency that is inherent to a software system is investigated. The extent to which this tradeoff is determined by the average parallelism of the software system, as contrasted with other, more detailed, characterizations, is shown. The extent to which both speedup and efficiency can simultaneously be poor is bounded: it is shown that for any software system and any number of processors, the sum of the average processor utilization (i.e. efficiency) and the attained fraction of the maximum possible speedup must exceed one. Bounds are given on speedup and efficiency, and on the incremental benefit and cost of allocating additional processors. An explicit formulation, as well as bounds, are given for the location of the knee of the execution time-efficiency profile, where the benefit per unit cost is maximized.

422 citations
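The efficiency-speedup tradeoff can be made concrete with bounds expressed in terms of the average parallelism A (the formulas below are the bounds commonly associated with this work, shown here as an illustrative check; the maximum possible speedup is taken to be A):

```python
def speedup_lower_bound(n, A):
    """Lower bound on speedup with n processors, given average parallelism A."""
    return n * A / (n + A - 1.0)

def efficiency_lower_bound(n, A):
    """Corresponding lower bound on per-processor efficiency (utilization)."""
    return A / (n + A - 1.0)

# At these bounds, efficiency + speedup/A = (n + A) / (n + A - 1) > 1,
# consistent with the abstract's statement that the sum must exceed one.
A = 16.0                                  # assumed average parallelism
for n in (2, 8, 64, 1024):
    s = speedup_lower_bound(n, A)
    e = efficiency_lower_bound(n, A)
    print(n, round(s, 2), round(e, 3), round(e + s / A, 3))   # last column > 1
```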