Journal ArticleDOI

A CUDA-Based Parallel Geographically Weighted Regression for Large-Scale Geographic Data

30 Oct 2020 - ISPRS International Journal of Geo-Information (Multidisciplinary Digital Publishing Institute) - Vol. 9, Iss. 11, p. 653
TL;DR: An improved approach based on the compute unified device architecture (CUDA) parallel architecture, fast-parallel-GWR (FPGWR), is proposed in this paper to efficiently handle the computational demands of performing GWR over millions of data points.
Abstract: Geographically weighted regression (GWR) introduces a distance-weighted kernel function to examine the non-stationarity of geographical phenomena and improve the performance of global regression. However, GWR calibration becomes computationally demanding when a serial computing mode is used to process large volumes of data. To address this problem, an improved approach based on the compute unified device architecture (CUDA) parallel architecture, fast-parallel-GWR (FPGWR), is proposed in this paper to efficiently handle the computational demands of performing GWR over millions of data points. FPGWR is capable of decomposing the serial process into parallel atomic modules and optimizing the memory usage. To verify the computing capability of FPGWR, we designed simulation datasets and performed corresponding testing experiments. We also compared the performance of FPGWR and other GWR software packages using open datasets. The results show that the runtime of FPGWR is negatively correlated with the CUDA core number, and its calculation efficiency is thousands or even tens of thousands of times higher than that of traditional GWR algorithms. FPGWR provides an effective tool for exploring spatial heterogeneity in large-scale geographic data (geodata).
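The abstract describes decomposing GWR calibration into per-point atomic modules. As a minimal illustration of why that decomposition parallelizes well (a plain NumPy sketch with an assumed Gaussian kernel and fixed bandwidth, not the authors' FPGWR code), each regression point gets its own weighted least-squares fit, and no fit depends on any other:

```python
import numpy as np

def gwr_calibrate(coords, X, y, bandwidth):
    """Serial GWR calibration: one weighted least-squares fit per point.

    Illustrative sketch only (Gaussian kernel, fixed bandwidth). Each
    iteration is independent of the others, which is what lets FPGWR-style
    approaches map the per-point work onto CUDA cores.
    """
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # kernel weights
        XtW = X.T * w                                    # X^T diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)     # (X^T W X)^-1 X^T W y
    return betas

# Toy data: 500 points, intercept plus two covariates.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(500, 2))
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=500)
print(gwr_calibrate(coords, X, y, bandwidth=2.0)[:3])
```

Because the loop iterations are mutually independent, they can be issued to GPU threads directly; according to the abstract, FPGWR's gains come from how these atomic modules and their memory usage are organized on the CUDA architecture.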
Citations
Journal ArticleDOI
TL;DR: In this article, two high-performance R solutions for GWR via Multi-core Parallel (MP) and Compute Unified Device Architecture (CUDA) techniques, respectively GWR-MP and GWR-CUDA, are proposed.
Abstract: As an established spatial analytical tool, Geographically Weighted Regression (GWR) has been applied across a variety of disciplines. However, its usage can be challenging for large datasets, which are increasingly prevalent in today’s digital world. In this study, we propose two high-performance R solutions for GWR via Multi-core Parallel (MP) and Compute Unified Device Architecture (CUDA) techniques, respectively GWR-MP and GWR-CUDA. We compared GWR-MP and GWR-CUDA with three existing solutions available in Geographically Weighted Models (GWmodel), Multi-scale GWR (MGWR) and Fast GWR (FastGWR). Results showed that all five solutions perform differently across varying sample sizes, with no single solution a clear winner in terms of computational efficiency. Specifically, the solutions given in GWmodel and MGWR provided acceptable computational costs for GWR studies with a relatively small sample size. For a large sample size, GWR-MP and FastGWR provided coherent solutions on a Personal Computer (PC) with a common multi-core configuration, and GWR-MP provided more efficient computing capacity per core or thread than FastGWR. For cases when the sample size was very large, and for these cases only, GWR-CUDA provided the most efficient solution, though its I/O cost should be noted for small samples. In summary, GWR-MP and GWR-CUDA provide complementary high-performance R solutions to the existing ones and should be preferred for certain data-rich GWR studies.

5 citations
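GWR-MP and GWR-CUDA are R solutions; the sketch below is only a loose Python analogue of the multi-core idea, splitting the independent per-point GWR fits across worker processes with the standard library (the Gaussian kernel, fixed bandwidth, and worker count are assumptions, not the packages' settings):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def local_fit(i, coords, X, y, bandwidth):
    """Weighted least-squares fit at regression point i (Gaussian kernel)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def gwr_multicore(coords, X, y, bandwidth, workers=4):
    """Distribute the independent per-point fits over worker processes."""
    fit = partial(local_fit, coords=coords, X=X, y=y, bandwidth=bandwidth)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(fit, range(len(y)))))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 10, size=(1000, 2))
    X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 1))])
    y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.1, size=1000)
    print(gwr_multicore(coords, X, y, bandwidth=2.0).shape)  # (1000, 2)
```

A CUDA variant would instead assign the per-point fits to GPU threads; as the abstract notes, that only pays off for very large samples, partly because of its I/O cost on small ones.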

Journal ArticleDOI
TL;DR: In this article, a geographically and temporally weighted co-location quotient, comprising global and local computation, a method to calculate a spatiotemporal weight matrix, and a significance test using Monte Carlo simulation, is used to identify spatio-temporal crime patterns across Greater Manchester.
Abstract: Incident data, a form of big data frequently used in urban studies, are characterized by point features with high spatial and temporal resolution and categorical values. In contrast to panel data, such spatial data pooled over time reflect multi-directional spatial effects but only unidirectional temporal effects, which are challenging to analyze. This paper presents an innovative approach to address this challenge – a geographically and temporally weighted co-location quotient which includes global and local computation, a method to calculate a spatiotemporal weight matrix and a significance test using Monte Carlo simulation. This new approach is used to identify spatio-temporal crime patterns across Greater Manchester in 2016 from open-source recorded crime data. The results show that this approach is suitable for the analysis and visualization of spatio-temporal dependence and heterogeneity in categorical spatial data pooled over time. It is particularly useful for detecting symmetrical spatio-temporal co-location patterns and mapping local clusters. The method also addresses the unbalanced temporal scale problem caused by unidirectional temporal data representation and explores potential impacts. The empirical evidence of the spatiotemporal crime patterns might usefully be deployed to inform the development of criminological theory by helping to disentangle the relationships between crime and the urban environment.

5 citations
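The paper's exact weighting scheme is not reproduced in the abstract; the sketch below only illustrates the general shape of a spatiotemporal weight matrix with a one-directional temporal kernel (later events get zero weight), using hypothetical Gaussian/exponential kernels and bandwidths:

```python
import numpy as np

def spatiotemporal_weights(coords, times, h_space, h_time):
    """Illustrative spatiotemporal weight matrix with a one-directional
    temporal kernel: event j contributes to event i only if it did not
    happen after i. Kernel forms and bandwidths are hypothetical, not the
    paper's specification.
    """
    ds = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    dt = times[:, None] - times[None, :]          # positive: j precedes i
    w = np.exp(-0.5 * (ds / h_space) ** 2) * np.exp(-np.abs(dt) / h_time)
    w[dt < 0] = 0.0                               # unidirectional time
    np.fill_diagonal(w, 0.0)                      # exclude self-pairs
    return w

rng = np.random.default_rng(2)
coords = rng.uniform(0, 5, size=(200, 2))         # hypothetical event locations
times = rng.uniform(0, 365, size=200)             # day of year
W = spatiotemporal_weights(coords, times, h_space=1.0, h_time=30.0)
print(W.shape, round(float(W.max()), 3))
```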

Journal ArticleDOI
TL;DR: In this paper, the authors adopted two types of methods that allow parameters to vary across observations: the random parameter approach and the geographically weighted regression (GWR) approach.
Abstract: Vehicle crashes on roads are caused by many factors. However, the influence of these factors is not necessarily homogeneous across locations, which calls for non-stationary modeling approaches. To address this problem, this paper adopts two types of methods that allow parameters to vary across observations: the random parameter approach and the geographically weighted regression (GWR) approach. With road curvature, curve length, pavement friction, and traffic volume as independent variables, vehicle crash frequencies are modeled by two non-spatial methods, the negative binomial (NB) model and the random parameter negative binomial (RPNB) model, as well as three spatial (GWR) methods. These models are calibrated at the microlevel using a dataset of 9415 horizontal curve segments with a total length of 1545 kilometers over a three-year period (2016–2018) across the State of Indiana. The results revealed that the GWR approach can capture spatial heterogeneity and therefore significantly outperforms the conventional non-spatial approaches. Based on the corrected Akaike Information Criterion (AICc), geographically weighted negative binomial regression (GWNBR) proved to be a superior approach for statewide microlevel crash analysis.

4 citations
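The model comparison above rests on the corrected Akaike Information Criterion; the helper below shows the standard AICc formula such a comparison uses (the candidate log-likelihoods and parameter counts in the example are placeholders, not the study's results):

```python
def aicc(loglik, k, n):
    """Corrected Akaike Information Criterion: AIC plus a small-sample
    penalty. k = number of estimated parameters, n = number of observations.
    """
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

# Placeholder comparison (values are hypothetical, not the paper's fits);
# the model with the smallest AICc is preferred.
n_segments = 9415
candidates = {"NB": (-5200.0, 6), "RPNB": (-5150.0, 9), "GWNBR": (-5050.0, 45)}
scores = {name: aicc(ll, k, n_segments) for name, (ll, k) in candidates.items()}
print(min(scores, key=scores.get))
```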

Journal ArticleDOI
TL;DR: In this article, the authors used geographically weighted regression models, extended with a temporal component, to evaluate linear and nonlinear trends in environmental monitoring data; applying the methods developed here, they identified nonlinear changes in TOC, from consistent negative trends over most of Sweden around 2010 to positive trends in parts of the country in later years.
Abstract: Data from monitoring programs with high spatial resolution but low temporal resolution are often overlooked when assessing temporal trends, as the data structure does not permit the use of established trend analysis methods. However, the data include uniquely detailed information about geographically differentiated temporal trends driven by large-scale influences, such as climate or airborne deposition. In this study, we used geographically weighted regression models, extended with a temporal component, to evaluate linear and nonlinear trends in environmental monitoring data. To improve the results, we tested approaches for station-wise pre-processing of data and for validation of the resulting models. To illustrate the method, we used data on changes in total organic carbon (TOC) obtained in a monitoring program of around 4800 Swedish lakes observed once every 6 years between 2008 and 2021. On applying the methods developed here, we identified nonlinear changes in TOC from consistent negative trends over most of Sweden around 2010 to positive trends during later years in parts of the country.
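As a minimal sketch of the idea of geographically weighting a temporal trend (a Gaussian spatial kernel and an ordinary linear trend are assumed here; the study's calibrated spatiotemporal model and station-wise pre-processing are not reproduced), the slope of a spatially weighted regression of the monitored variable on observation year gives a local trend estimate at a target location:

```python
import numpy as np

def local_trend(target_xy, coords, years, values, h_space):
    """Spatially weighted linear trend (units per year) at one target
    location. Illustrative only: Gaussian kernel, single spatial bandwidth,
    purely linear trend.
    """
    d = np.linalg.norm(coords - target_xy, axis=1)
    w = np.exp(-0.5 * (d / h_space) ** 2)
    X = np.column_stack([np.ones_like(years), years])
    XtW = X.T * w
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ values)
    return slope

# Hypothetical monitoring network: sparse revisits, dense spatial coverage.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 100, size=(300, 2))
years = rng.choice(np.arange(2008, 2022, 6), size=300).astype(float)
values = 10 + 0.05 * (years - 2008) + rng.normal(scale=0.5, size=300)
print(local_trend(np.array([50.0, 50.0]), coords, years, values, h_space=20.0))
```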
References
Journal ArticleDOI
TL;DR: In this article, a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as a procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed, and a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), which is designed for the purpose of statistical identification, is introduced. When there are several competing models, the MAICE is defined by the model and the maximum likelihood estimates of the parameters which give the minimum of AIC, defined by AIC = (-2)log(maximum likelihood) + 2(number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification which is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.

47,133 citations
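As a small worked example of MAICE in the time-series setting the abstract mentions (an illustrative least-squares AR fit with a Gaussian likelihood, not Akaike's original numerical examples), AIC is computed for several candidate autoregressive orders and the minimum-AIC model is selected:

```python
import numpy as np

def ar_aic(x, p):
    """Fit an AR(p) model by least squares and return its AIC.

    AIC = -2 log(maximum likelihood) + 2 * (number of adjusted parameters),
    with a Gaussian likelihood; the parameters are p AR coefficients, an
    intercept and the noise variance. Illustrative sketch only.
    """
    n = len(x) - p
    lags = [x[p - j - 1:len(x) - j - 1] for j in range(p)]
    X = np.column_stack([np.ones(n)] + lags)
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2.0 * loglik + 2.0 * (p + 2)

rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(2, 500):                       # simulate an AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
orders = range(1, 7)
aics = [ar_aic(x, p) for p in orders]
print("MAICE order:", orders[int(np.argmin(aics))])   # typically 2
```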

Book
11 Oct 2002
TL;DR: In this book, the basic GWR model is extended with local statistics and local models for spatial data, and software for Geographically Weighted Regression is described; this software, however, is not suited to the analysis of large-scale data.
Abstract: Acknowledgements. Local Statistics and Local Models for Spatial Data. Geographically Weighted Regression: The Basics. Extensions to the Basic GWR Model. Statistical Inference and Geographically Weighted Regression. GWR and Spatial Autocorrelation. Scale Issues and Geographically Weighted Regression. Geographically Weighted Local Statistics. Extensions of Geographical Weighting. Software for Geographically Weighted Regression. Epilogue. Bibliography. Index.

2,845 citations

Journal ArticleDOI
TL;DR: A technique termed geographically weighted regression is developed, which attempts to capture spatial variation by calibrating a multiple regression model that allows different relationships to exist at different points in space; Monte Carlo methods are used to test for spatial nonstationarity.
Abstract: Spatial nonstationarity is a condition in which a simple “global” model cannot explain the relationships between some sets of variables. The nature of the model must alter over space to reflect the structure within the data. In this paper, a technique is developed, termed geographically weighted regression, which attempts to capture this variation by calibrating a multiple regression model which allows different relationships to exist at different points in space. This technique is loosely based on kernel regression. The method itself is introduced and related issues such as the choice of a spatial weighting function are discussed. Following this, a series of related statistical tests are considered which can be described generally as tests for spatial nonstationarity. Using Monte Carlo methods, techniques are proposed for investigating the null hypothesis that the data may be described by a global model rather than a non-stationary one and also for testing whether individual regression coefficients are stable over geographic space. These techniques are demonstrated on a data set from the 1991 U.K. census relating car ownership rates to social class and male unemployment. The paper concludes by discussing ways in which the technique can be extended.

2,330 citations
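The choice of spatial weighting function discussed in this paper is commonly made between kernels such as the Gaussian and the bi-square; the snippet below simply evaluates these two standard forms (the fixed bandwidth is an assumption; this is not the calibration used on the 1991 U.K. census data):

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Gaussian distance-decay weights, a common GWR weighting function."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Bi-square weights: smooth decay to exactly zero beyond the bandwidth."""
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

d = np.linspace(0.0, 3.0, 7)                 # distances in arbitrary units
print(gaussian_kernel(d, bandwidth=1.5))
print(bisquare_kernel(d, bandwidth=1.5))
```

The Gaussian kernel never reaches zero, so every observation contributes a little to every local fit, whereas the bi-square kernel truncates at the bandwidth, which keeps each local design matrix sparse.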

Journal ArticleDOI
01 Aug 2013
TL;DR: Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop, is presented; it is integrated into Hive to support declarative spatial queries within an integrated architecture.
Abstract: Support of high-performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location-based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive-scale spatial data is due to the proliferation of cost-effective and ubiquitous positioning technologies, the development of high-resolution imaging technologies, and contributions from a large number of community users. There are two major challenges for managing and querying massive spatial data: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries within an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries and as an integrated software package in Hive.

571 citations
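Hadoop-GIS itself is a Hadoop/Hive system; purely as a conceptual sketch of the spatial-partitioning step (a hypothetical uniform grid in plain Python, not the RESQUE engine or its partition indexing), keying records by grid tile is what turns one large spatial query into many independent, parallelizable pieces:

```python
from collections import defaultdict

def partition_by_grid(points, cell_size):
    """Group points into square grid tiles. Keying records by tile is the
    same general idea spatial partitioning uses to split a query into
    independent partitions; this is a conceptual sketch, not Hadoop-GIS code.
    """
    tiles = defaultdict(list)
    for x, y in points:
        tiles[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return tiles

# Each tile can now be aggregated (or joined) independently, in parallel.
points = [(0.4, 1.2), (0.9, 1.8), (5.1, 0.3), (5.6, 0.9), (9.9, 9.9)]
tiles = partition_by_grid(points, cell_size=2.0)
print({tile: len(pts) for tile, pts in tiles.items()})
# {(0, 0): 2, (2, 0): 2, (4, 4): 1}
```

Objects crossing tile boundaries then need the kind of result-amending step the abstract mentions, since a single geometry may be counted in more than one partition.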

Book
01 Jan 1995
TL;DR: The tradeoff between speedup and efficiency that is inherent to a software system is investigated in this paper, and the extent to which this tradeoff is determined by the average parallelism of the software system, as contrasted with other, more detailed characterizations, is shown.
Abstract: The tradeoff between speedup and efficiency that is inherent to a software system is investigated. The extent to which this tradeoff is determined by the average parallelism of the software system, as contrasted with other, more detailed, characterizations, is shown. The extent to which both speedup and efficiency can simultaneously be poor is bounded: it is shown that for any software system and any number of processors, the sum of the average processor utilization (i.e. efficiency) and the attained fraction of the maximum possible speedup must exceed one. Bounds are given on speedup and efficiency, and on the incremental benefit and cost of allocating additional processors. An explicit formulation, as well as bounds, are given for the location of the knee of the execution time-efficiency profile, where the benefit per unit cost is maximized.

422 citations
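The efficiency-speedup tradeoff can be made concrete with bounds expressed in terms of the average parallelism A (the formulas below are the bounds commonly associated with this work, shown here as an illustrative check; the maximum possible speedup is taken to be A):

```python
def speedup_lower_bound(n, A):
    """Lower bound on speedup with n processors, given average parallelism A."""
    return n * A / (n + A - 1.0)

def efficiency_lower_bound(n, A):
    """Corresponding lower bound on per-processor efficiency (utilization)."""
    return A / (n + A - 1.0)

# At these bounds, efficiency + speedup/A = (n + A) / (n + A - 1) > 1,
# consistent with the abstract's statement that the sum must exceed one.
A = 16.0                                  # assumed average parallelism
for n in (2, 8, 64, 1024):
    s = speedup_lower_bound(n, A)
    e = efficiency_lower_bound(n, A)
    print(n, round(s, 2), round(e, 3), round(e + s / A, 3))   # last column > 1
```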