Author

Dale L. Zimmerman

Bio: Dale L. Zimmerman is an academic researcher from the University of Iowa. The author has contributed to research in the topics of kriging and covariance functions. The author has an h-index of 20 and has co-authored 21 publications receiving 2,956 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, a factorial, computational experiment was conducted to compare the spatial interpolation accuracy of ordinary and universal kriging and two types of inverse squared-distance weighting.
Abstract: A factorial, computational experiment was conducted to compare the spatial interpolation accuracy of ordinary and universal kriging and two types of inverse squared-distance weighting. The experiment considered, in addition to these four interpolation methods, the effects of four data and sampling characteristics: surface type, sampling pattern, noise level, and strength of small-scale spatial correlation. Interpolation accuracy was measured by the natural logarithm of the mean squared interpolation error. Main effects of all five factors, all two-factor interactions, and several three-factor interactions were highly statistically significant. Among numerous findings, the most striking was that the two kriging methods were substantially superior to the inverse distance weighting methods over all levels of surface type, sampling pattern, noise, and correlation.

463 citations
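As a companion to the experiment above, here is a minimal sketch of inverse distance weighting, the family of interpolators that kriging outperformed in the study. This is not the authors' code; the sample coordinates, values, and query point are made-up toy data, and `power=2.0` corresponds to the squared-distance variant.

```python
import numpy as np

def idw_interpolate(coords, values, query, power=2.0):
    """Inverse distance-weighted prediction at a single query point."""
    d = np.linalg.norm(coords - query, axis=1)
    if np.any(d == 0):                  # query coincides with a sample point
        return values[np.argmin(d)]
    w = 1.0 / d**power                  # power=2 gives inverse squared-distance weights
    return np.sum(w * values) / np.sum(w)

# Toy data: four samples at the unit-square corners.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([1.0, 2.0, 3.0, 4.0])
print(idw_interpolate(coords, values, np.array([0.5, 0.5])))  # equidistant, so the mean: 2.5
```

Unlike kriging, the weights here depend only on distance, not on the estimated spatial correlation structure, which is one reason the experiment found kriging superior across all factor levels.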

Journal ArticleDOI
TL;DR: It is concluded that the geographical masks described, when appropriately used, protect the confidentiality of health records while permitting many important geographically-based analyses, but that further research is needed to determine how the power of tests for clustering or the strength of other associative relationships are adversely affected by the characteristics of different masks.
Abstract: The conventional approach to preserving the confidentiality of health records aggregates all records within a geographical area that has a population large enough to ensure prevention of disclosure. Though this approach normally protects the privacy of individuals, the use of such aggregated data limits the types of research one can conduct and makes it impossible to address many important health problems. In this paper we discuss the design and implementation of geographical masks that not only preserve the security of individual health records, but also support the investigation of questions that can be answered only with some knowledge about the location of health events. We describe several alternative methods of masking individual-level data, evaluate their performance, and discuss both the degree to which we can analyse masked data validly as well as the relative security of each approach, should anyone attempt to recover the identity of an individual from the masked data. We conclude that the geographical masks we describe, when appropriately used, protect the confidentiality of health records while permitting many important geographically-based analyses, but that further research is needed to determine how the power of tests for clustering or the strength of other associative relationships are adversely affected by the characteristics of different masks.

301 citations
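One simple mask in the spirit of those described above displaces each point in a random direction by a bounded random distance. The sketch below is an illustrative assumption, not the paper's specific masks; the coordinates and radius are fabricated.

```python
import numpy as np

def random_perturbation_mask(points, max_radius, rng):
    """Displace each (x, y) point uniformly within a disk of radius max_radius."""
    n = len(points)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)     # random direction
    r = max_radius * np.sqrt(rng.uniform(size=n))     # sqrt gives uniform density over the disk
    offsets = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    return points + offsets

rng = np.random.default_rng(0)
homes = np.array([[452100.0, 4620300.0], [452950.0, 4621100.0]])  # toy UTM-like coords
masked = random_perturbation_mask(homes, max_radius=500.0, rng=rng)
print(np.linalg.norm(masked - homes, axis=1))  # each displacement is below 500 m
```

The tension the paper analyzes is visible even here: a larger `max_radius` makes re-identification harder but degrades the validity of distance-based analyses such as cluster tests.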

Journal ArticleDOI
TL;DR: In this article, the authors use a Monte Carlo simulation study to compare the performance of several proposed estimators of the semivariogram's parameters, for two Gaussian intrinsic random-field models, in the context of ordinary kriging.
Abstract: Predicting values of a spatially distributed variable, such as the concentration of a mineral throughout an ore body or the level of contamination around a toxic-waste dump, can be accomplished by a regression procedure known as kriging. Kriging and other types of statistical inference for spatially distributed variables are based on models of stochastic processes {Y_t : t ∈ D} called random-field models. A commonly used class of random-field models are the intrinsic models, for which the mean is constant, and half of the variance of Y_t − Y_s is a function, called the semivariogram, of the difference t − s. The type of kriging corresponding to an intrinsic model is called ordinary kriging. The semivariogram, which typically is taken to depend on one or more unknown parameters, must be estimated prior to ordinary kriging. Various estimators of the semivariogram's parameters have been proposed. For two Gaussian intrinsic random-field models, we compare, by a Monte Carlo simulation study, the performance o...

219 citations
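The parametric estimators compared in the paper are typically fitted against the classical empirical semivariogram. Below is a minimal sketch of that estimator (half the mean squared increment per distance bin); the 1-D transect data and bin edges are made up for illustration.

```python
import numpy as np

def empirical_semivariogram(coords, values, bins):
    """gamma(h) = 0.5 * mean of (Y_t - Y_s)^2 over pairs whose distance falls in each bin."""
    n = len(values)
    i, j = np.triu_indices(n, k=1)                       # all distinct pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)    # pair distances
    sq = 0.5 * (values[i] - values[j]) ** 2              # half squared increments
    which = np.digitize(h, bins)
    return np.array([sq[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bins))])

# Toy 1-D transect at unit spacing with smooth variation plus a weak trend.
coords = np.arange(10.0)[:, None]
values = np.sin(coords[:, 0]) + 0.1 * np.arange(10)
gamma = empirical_semivariogram(coords, values, bins=np.array([0.0, 1.5, 3.5, 5.5]))
print(gamma)  # one estimate per lag bin
```

A parametric semivariogram model (spherical, exponential, etc.) would then be fitted to these binned values, for example by least squares, before plugging into the ordinary kriging equations.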

Journal ArticleDOI
TL;DR: Mixtures of bivariate t distributions with few components appear to be flexible enough to fit many positional error datasets associated with geocoding, yet parsimonious enough to be feasible for nascent applications of measurement-error methodology to spatial epidemiology.
Abstract: The assignment of a point-level geocode to subjects' residences is an important data assimilation component of many geographic public health studies. Often, these assignments are made by a method known as automated geocoding, which attempts to match each subject's address to an address-ranged street segment georeferenced within a streetline database and then interpolate the position of the address along that segment. Unfortunately, this process results in positional errors. Our study sought to model the probability distribution of positional errors associated with automated geocoding and E911 geocoding. Positional errors were determined for 1423 rural addresses in Carroll County, Iowa as the vector difference between each 100%-matched automated geocode and its true location as determined by orthophoto and parcel information. Errors were also determined for 1449 60%-matched geocodes and 2354 E911 geocodes. Huge (> 15 km) outliers occurred among the 60%-matched geocoding errors; smaller outliers occurred for the other two types of geocoding errors as well. E911 geocoding was more accurate (median error length = 44 m) than 100%-matched automated geocoding (median error length = 168 m). The empirical distributions of positional errors associated with 100%-matched automated geocoding and E911 geocoding exhibited a distinctive Greek-cross shape and had many other interesting features that could not be fitted adequately by a single bivariate normal or t distribution. However, mixtures of t distributions with two or three components fit the errors very well. Mixtures of bivariate t distributions with few components appear to be flexible enough to fit many positional error datasets associated with geocoding, yet parsimonious enough to be feasible for nascent applications of measurement-error methodology to spatial epidemiology.

218 citations
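The study's basic error summary is straightforward to reproduce in principle: the positional error is the vector difference between a geocode and the true location, and accuracy is summarized by the median error length. This sketch uses fabricated toy coordinates, not the Carroll County data.

```python
import numpy as np

def median_error_length(geocodes, truths):
    """Median Euclidean length (same units as the coordinates) of the vector positional errors."""
    errors = geocodes - truths                  # vector differences, as in the study
    lengths = np.linalg.norm(errors, axis=1)
    return float(np.median(lengths))

# Toy planar coordinates in metres.
truths = np.array([[0.0, 0.0], [100.0, 50.0], [-30.0, 200.0]])
geocodes = truths + np.array([[40.0, 30.0], [-120.0, 90.0], [10.0, 0.0]])
print(median_error_length(geocodes, truths))  # error lengths 50, 150, 10 -> median 50.0
```

Because the errors are vectors rather than scalars, their joint distribution can show directional structure, which is how features like the Greek-cross shape become visible and why a bivariate (mixture) model is needed.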

Journal ArticleDOI
TL;DR: It is concluded that selection of one particular type of geographic area as the geocode may unnecessarily constrain future work.

191 citations


Cited by
Journal ArticleDOI
TL;DR: This paper models deterministic computer-experiment output as the realization of a stochastic process, providing a statistical basis for designing computer experiments (choosing the inputs) and for predicting output from training data.
Abstract: Many scientific phenomena are now investigated by complex computer models or codes. A computer experiment is a number of runs of the code with various inputs. A feature of many computer experiments is that the output is deterministic: rerunning the code with the same inputs gives identical observations. Often, the codes are computationally expensive to run, and a common objective of an experiment is to fit a cheaper predictor of the output to the data. Our approach is to model the deterministic output as the realization of a stochastic process, thereby providing a statistical basis for designing experiments (choosing the inputs) for efficient prediction. With this model, estimates of uncertainty of predictions are also available. Recent work in this area is reviewed, a number of applications are discussed, and we demonstrate our methodology with an example.

6,583 citations
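The core idea above can be sketched in a few lines: condition a zero-mean Gaussian process on the code's training runs and use the conditional mean as the cheap emulator. The squared-exponential kernel, length scale, and stand-in "simulator" below are illustrative assumptions, not the paper's setup; no nugget term is used because the output is deterministic.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_new, length_scale=0.3):
    """Zero-mean GP conditional mean; tiny jitter only for numerical stability."""
    K = sq_exp_kernel(x_train, x_train, length_scale) + 1e-8 * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_new, x_train, length_scale)
    return k_star @ np.linalg.solve(K, y_train)

def code(x):                      # stand-in for an expensive deterministic simulator
    return np.sin(3.0 * x)

x_train = np.linspace(0.0, 1.0, 8)   # the "design": chosen input runs
y_train = code(x_train)
x_new = np.array([0.37])
print(gp_predict(x_train, y_train, x_new), code(x_new))  # emulator vs. truth
```

The same stochastic-process model also yields a predictive variance, which is what makes principled experimental design (placing runs where uncertainty is largest) possible.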

Journal ArticleDOI

6,278 citations

Book
25 Aug 2008
TL;DR: This book provides an overview of model-based geostatistics, covering Gaussian and generalized linear models for geostatistical data, classical and Bayesian parameter estimation, spatial prediction, and geostatistical design.
Abstract (contents): An overview of model-based geostatistics; Gaussian models for geostatistical data; Generalized linear models for geostatistical data; Classical parameter estimation; Spatial prediction; Bayesian inference; Geostatistical design.

2,397 citations

Journal ArticleDOI
TL;DR: It is shown that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, an explicit link can be provided, for any triangulation of R^d, between some Gaussian fields in the Matérn class and Gaussian Markov random fields, formulated as a basis function representation.
Abstract: Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered by the big-n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this fact still seems to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in R^2, use only the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of R^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.

2,212 citations
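The computational payoff described above comes from the sparsity of a GMRF's precision matrix. The sketch below illustrates this with a generic (kappa² I + graph Laplacian)-style precision on a 2-D lattice and a sparse solve; this is an illustrative stand-in, not the paper's actual SPDE finite-element discretization.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lattice_precision(m, kappa2=1.0):
    """Sparse precision Q = kappa^2 I + graph Laplacian of an m x m grid."""
    ones = np.ones(m)
    L1 = sp.diags([-ones[:-1], 2 * ones, -ones[:-1]], [-1, 0, 1])  # 1-D path Laplacian
    I = sp.eye(m)
    laplacian = sp.kron(L1, I) + sp.kron(I, L1)   # 2-D grid Laplacian via Kronecker sums
    return (kappa2 * sp.eye(m * m) + laplacian).tocsc()

m = 30
Q = lattice_precision(m)                 # 900 x 900 but ~5 nonzeros per row
b = np.ones(m * m)
x = spla.spsolve(Q, b)                   # sparse solve; a dense factorization is O(n^3)
print(Q.nnz / Q.shape[0])                # average nonzeros per row
```

Each site's row couples it only to its four lattice neighbours, which is exactly the kind of local (Markov) structure that lets GMRF computations scale where dense GF covariance factorizations cannot.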

Journal Article
TL;DR: It is proved that the problem of finding the configuration that maximizes mutual information is NP-complete, and a polynomial-time approximation is described that is within (1-1/e) of the optimum by exploiting the submodularity of mutual information.
Abstract: When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1-1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets.

1,593 citations
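For Gaussian processes, the greedy mutual-information rule analysed above has a simple closed form: add the candidate y maximizing var(y | selected) / var(y | unselected \ {y}), and submodularity guarantees the greedy result is within (1 − 1/e) of optimal. The sketch below implements that rule on a toy 1-D candidate grid with an assumed squared-exponential covariance; it is a plain O(n⁴)-ish illustration, without the paper's lazy-evaluation and local-structure speedups.

```python
import numpy as np

def cond_var(K, y, A):
    """Conditional variance of site y given the sites in A under covariance K."""
    if not A:
        return K[y, y]
    A = list(A)
    KAA = K[np.ix_(A, A)]
    KyA = K[y, A]
    return K[y, y] - KyA @ np.linalg.solve(KAA, KyA)

def greedy_mi_placement(K, k):
    """Greedily pick k sensor sites by the mutual-information gain criterion."""
    n = K.shape[0]
    chosen = []
    for _ in range(k):
        rest = [v for v in range(n) if v not in chosen]
        def gain(y):
            others = [v for v in rest if v != y]
            return cond_var(K, y, chosen) / cond_var(K, y, others)
        chosen.append(max(rest, key=gain))
    return chosen

# Toy problem: 15 candidate sites on a line, squared-exponential covariance.
x = np.linspace(0.0, 1.0, 15)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 1e-6 * np.eye(15)
print(greedy_mi_placement(K, 3))
```

The numerator rewards sites the current selection predicts poorly, while the denominator penalizes sites the remaining candidates already explain well, which is why this criterion, unlike pure entropy, avoids pushing sensors to the boundary.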