
Showing papers in "AStA Advances in Statistical Analysis in 2013"


Journal ArticleDOI
TL;DR: This work considers a hierarchical spatio-temporal model for particulate matter (PM) concentration in the North-Italian region Piemonte and proposes a strategy to represent a GF with Matérn covariance function as a Gaussian Markov Random Field (GMRF) through the SPDE approach.
Abstract: In this work, we consider a hierarchical spatio-temporal model for particulate matter (PM) concentration in the North-Italian region Piemonte. The model involves a Gaussian Field (GF), affected by a measurement error, and a state process characterized by a first-order autoregressive dynamic model and spatially correlated innovations. This kind of model is well discussed and widely used in the air quality literature thanks to its flexibility in modelling the effect of relevant covariates (i.e. meteorological and geographical variables) as well as time and space dependence. However, Bayesian inference through Markov chain Monte Carlo (MCMC) techniques can be a challenge due to convergence problems and heavy computational loads. In particular, the computational issue refers to the infeasibility of linear algebra operations involving the large dense covariance matrices which occur when large spatio-temporal datasets are present. The main goal of this work is to present an effective estimation and spatial prediction strategy for the considered spatio-temporal model. The proposal consists of representing a GF with Matérn covariance function as a Gaussian Markov Random Field (GMRF) through the Stochastic Partial Differential Equations (SPDE) approach. The main advantage of moving from a GF to a GMRF stems from the good computational properties that the latter enjoys. In fact, GMRFs are defined by sparse matrices that allow for computationally effective numerical methods. Moreover, when dealing with Bayesian inference for GMRFs, it is possible to adopt the Integrated Nested Laplace Approximation (INLA) algorithm as an alternative to MCMC methods, giving rise to additional computational advantages. The implementation of the SPDE approach through the R-library INLA (www.r-inla.org) is illustrated with reference to the Piemonte PM data. In particular, providing the step-by-step R-code, we show how easy it is to obtain prediction and probability-of-exceedance maps in a reasonable computing time.
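
The SPDE workflow described above can be sketched in a few lines of R-INLA code. The sketch below is illustrative only: coords (station coordinates) and pm (PM measurements) are placeholder objects, not the paper's actual Piemonte data, and the AR(1) temporal dynamics of the full model are omitted for brevity.

library(INLA)

# Triangulate the study region: the GMRF representation lives on this mesh
mesh <- inla.mesh.2d(loc = coords, max.edge = c(10, 50))

# Matérn GF represented as a GMRF via the SPDE approach (alpha = 2)
spde <- inla.spde2.matern(mesh, alpha = 2)

# Projector matrix mapping mesh nodes to the observation locations
A <- inla.spde.make.A(mesh, loc = coords)

# Organize response, projector matrices and effects
stack <- inla.stack(data = list(y = pm),
                    A = list(A, 1),
                    effects = list(field = 1:spde$n.spde,
                                   intercept = rep(1, length(pm))))

# Fit with INLA instead of MCMC; sparse precision matrices keep this fast
res <- inla(y ~ 0 + intercept + f(field, model = spde),
            data = inla.stack.data(stack),
            control.predictor = list(A = inla.stack.A(stack)))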

337 citations


Journal ArticleDOI
TL;DR: This work reviews existing methods and compares them on a set of designs that exhibit few bumps and exponentially falling tails and finds that a mixture of simple plug-in and cross-validation methods produces bandwidths with a quite stable performance.
Abstract: On the one hand, kernel density estimation has become a common tool for empirical studies in any research area. This goes hand in hand with the fact that this kind of estimator is now provided by many software packages. On the other hand, the discussion on bandwidth selection has been going on for about three decades. Although a good part of the discussion is about nonparametric regression, this parameter choice is by no means less problematic for density estimation. This becomes obvious when reading empirical studies in which practitioners have made use of kernel densities. New contributions typically provide simulations only to show that their own selector outperforms some of the existing methods. We review existing methods and compare them on a set of designs that exhibit few bumps and exponentially falling tails. We concentrate on small and moderate sample sizes because for large ones the differences between consistent methods are often negligible, at least for practitioners. As a byproduct we find that a mixture of simple plug-in and cross-validation methods produces bandwidths with quite stable performance.
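
For readers who want to reproduce such comparisons, the classical selectors are directly available in base R. The mixture at the end is purely illustrative of the idea of combining plug-in and cross-validation bandwidths, not the paper's exact proposal.

# Classical bandwidth selectors in base R, applied to a sample with two bumps
set.seed(1)
x <- c(rnorm(150, 0, 1), rnorm(50, 4, 0.5))

bw.nrd0(x)   # Silverman's rule of thumb
bw.SJ(x)     # Sheather-Jones plug-in
bw.ucv(x)    # unbiased (least-squares) cross-validation

# Illustrative mixture of a plug-in and a cross-validation bandwidth
h.mix <- 0.5 * (bw.SJ(x) + bw.ucv(x))
plot(density(x, bw = h.mix), main = "Kernel density, mixed bandwidth")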

212 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define a surface called a quantile sheet, on the domain of the independent variable and the probability, and any desired quantile curve is obtained by evaluating the sheet for a fixed probability.
Abstract: The results of quantile smoothing often show crossing curves, in particular for small data sets. We define a surface, called a quantile sheet, on the domain of the independent variable and the probability. Any desired quantile curve is obtained by evaluating the sheet for a fixed probability. This sheet is modeled by $$P$$-splines in the form of tensor products of $$B$$-splines with difference penalties on the array of coefficients. The amount of smoothing is optimized by cross-validation. An application to reference growth curves for children is presented.
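
In standard P-spline notation (assumed here; the paper's exact penalty may differ in detail), the sheet and its estimation problem can be written as

$$ f(x,\tau)=\sum_{j,k}\theta_{jk}B_j(x)\tilde{B}_k(\tau), \qquad \min_{\Theta}\ \sum_{i,l}\rho_{\tau_l}\bigl(y_i-f(x_i,\tau_l)\bigr)+\lambda_x\sum_{j,k}\bigl(\Delta^2_j\theta_{jk}\bigr)^2+\lambda_\tau\sum_{j,k}\bigl(\Delta^2_k\theta_{jk}\bigr)^2 $$

where $$B_j$$ and $$\tilde{B}_k$$ are B-spline bases in the covariate and in the probability, $$\rho_\tau(u)=u(\tau-\mathbf{1}\{u<0\})$$ is the check (pinball) loss, $$\Delta^2$$ denotes second-order differences across rows and columns of the coefficient array, and the smoothing parameters $$(\lambda_x,\lambda_\tau)$$ are chosen by cross-validation. Evaluating $$f(\cdot,\tau)$$ at a fixed $$\tau$$ yields the corresponding quantile curve.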

52 citations


Journal ArticleDOI
TL;DR: In this article, a Monte Carlo simulation based on the observed data shows the superiority of the newly implemented smearing estimate in reconstructing the missing data structure, and all waves are consistently imputed using the new method.
Abstract: Questions about monetary variables (such as income, wealth or savings) are key components of questionnaires on household finances. However, missing information on such sensitive topics is a well-known phenomenon which can seriously bias any inference based only on complete-case analysis. Many imputation techniques have been developed and implemented in several surveys. Using the German SAVE data, a new estimation technique is necessary to overcome the upward bias of monetary variables caused by the initially implemented imputation procedure. The upward bias is the result of adding random draws to the implausible negative values predicted by OLS regressions until all values are positive. To overcome this problem, the logarithm of the dependent variable is taken and the predicted values are retransformed to the original scale by Duan’s smearing estimate. This paper evaluates the two different techniques for the imputation of monetary variables by means of a simulation study, where a random pattern of missingness is imposed on the observed values of the variables of interest. A Monte Carlo simulation based on the observed data shows the superiority of the newly implemented smearing estimate in reconstructing the missing data structure. All waves are consistently imputed using the new method.
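
Duan's smearing estimate itself is simple to apply. A minimal R sketch, with a placeholder data frame dat and covariates x1, x2 rather than the SAVE survey variables:

# Fit the imputation model on the log scale using the complete cases
fit <- lm(log(y) ~ x1 + x2, data = dat[!is.na(dat$y), ])

# Smearing factor: the mean of the exponentiated OLS residuals
smear <- mean(exp(residuals(fit)))

# Retransformed predictions are positive by construction, so no random
# draws need to be added to implausible negative OLS predictions
miss <- is.na(dat$y)
dat$y[miss] <- exp(predict(fit, newdata = dat[miss, ])) * smear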

33 citations


Journal ArticleDOI
TL;DR: A state-of-the-art review on function selection, focusing on penalized likelihood and Bayesian concepts, relating various approaches to each other in a unified framework is provided.
Abstract: Challenging research in various fields has driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. In comparison, selection of nonlinear functions in models with additive predictors has been considered only more recently. Several competing suggestions have been developed at about the same time and often do not refer to each other. This article provides a state-of-the-art review on function selection, focusing on penalized likelihood and Bayesian concepts, relating various approaches to each other in a unified framework. In an empirical comparison, also including boosting, we evaluate several methods through applications to simulated and real data, thereby providing some guidance on their performance in practice.

27 citations


Journal ArticleDOI
TL;DR: In this paper, exact formulae for the critical values of Mandel's h and k and approximate formulae for the critical values of the Single Grubbs test, the Double Grubbs test and the Cochran test are derived.
Abstract: According to ISO 5725-2 (1994), measurement results obtained in an interlaboratory experiment are inspected for consistency by plotting Mandel’s h and k statistics and for outliers by application of the Grubbs test and the Cochran test. Critical values of these statistics for significance levels α=5% and α=1% and for some numbers p of laboratories and n of repeated measurements in the laboratories are supplied in ISO 5725-2 without reference to methods for their calculation. In this paper, exact formulae for the critical values of Mandel’s h and k and approximate formulae for the critical values of the Single Grubbs test, the Double Grubbs test and the Cochran test are derived.
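
As an illustration, the t- and F-based forms commonly given for the critical values of Mandel's statistics can be coded directly in R. These are assumed standard forms, not copied from the paper, and should be checked against the exact formulae derived there before use.

# Critical value of Mandel's h (p laboratories, two-sided level alpha)
h.crit <- function(p, alpha = 0.05) {
  t <- qt(1 - alpha / 2, df = p - 2)
  (p - 1) * t / sqrt(p * (t^2 + p - 2))
}

# Critical value of Mandel's k (p laboratories, n repeated measurements)
k.crit <- function(p, n, alpha = 0.05) {
  nu <- n - 1
  F  <- qf(1 - alpha, df1 = nu, df2 = (p - 1) * nu)
  sqrt(p / (1 + (p - 1) / F))
}

h.crit(p = 10, alpha = 0.01)
k.crit(p = 10, n = 3, alpha = 0.01)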

20 citations


Journal ArticleDOI
Xiaofeng Lv, Rui Li
TL;DR: In this article, the estimation and inference of the parameters and the nonparametric part in partially linear quantile regression models with responses missing at random are considered. However, the asymptotic covariance matrices of NA-based methods are difficult to estimate, which complicates inference.
Abstract: In this paper, we consider the estimation and inference of the parameters and the nonparametric part in partially linear quantile regression models with responses that are missing at random. First, we extend the normal approximation (NA)-based methods of Sun (2005) to the missing data case. However, the asymptotic covariance matrices of the NA-based methods are difficult to estimate, which complicates inference. To overcome this problem, we alternatively propose smoothed empirical likelihood (SEL)-based methods. We define SEL statistics for the parameters and the nonparametric part and demonstrate that the limiting distributions of the statistics are chi-squared distributions. Accordingly, confidence regions can be obtained without estimating the asymptotic covariance matrices. Monte Carlo simulations are conducted to evaluate the performance of the proposed methods. Finally, the NA- and SEL-based methods are applied to real data.
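
A practical appeal of the SEL approach is that the chi-squared limit calibrates confidence regions directly: writing $$\ell_n(\beta)$$ for the smoothed empirical log-likelihood ratio of the $$q$$-dimensional parametric part (notation assumed here, not taken from the paper), a level $$1-\alpha$$ confidence region is

$$ \mathcal{C}_{1-\alpha}=\bigl\{\beta:\,-2\,\ell_n(\beta)\le\chi^2_{q,1-\alpha}\bigr\}, $$

so no asymptotic covariance matrix has to be estimated.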

18 citations


Journal ArticleDOI
TL;DR: Pseudo-likelihood is displayed as a special case of a general estimation technique based on proper scoring rules, which supplies an unbiased estimating equation for any statistical model and can be extended to allow for missing data.
Abstract: We display pseudo-likelihood as a special case of a general estimation technique based on proper scoring rules. Such a rule supplies an unbiased estimating equation for any statistical model, and this can be extended to allow for missing data. When the scoring rule has a simple local structure, as in many spatial models, the need to compute problematic normalising constants is avoided. We illustrate the approach through an analysis of data on disease in bell pepper plants.
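
For a Markov random field, for instance, the pseudo-likelihood replaces the joint density by the product of full conditionals,

$$ \ell_{\mathrm{PL}}(\theta)=\sum_{i=1}^{n}\log p\bigl(x_i\mid x_{-i};\theta\bigr), $$

so the problematic joint normalising constant never needs to be computed; in the scoring-rule view this corresponds to a proper scoring rule with a simple local structure.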

16 citations


Journal ArticleDOI
TL;DR: In this paper, a thorough multivariate geostatistical analysis is proposed: different tools for testing the symmetry assumption of the spatio-temporal linear coregionalization model (ST-LCM) are considered, a recent fitting procedure for the ST-LCM, based on the simultaneous diagonalization of symmetric real-valued matrix variograms, is adopted, and two non-separable classes of variogram models, the product-sum and Gneiting classes, are fitted to the basic components.
Abstract: Vehicular traffic, industrial activity and street dust are important sources of atmospheric particles, which cause pollution and serious health problems, including respiratory illness. Hence, techniques for analyzing and modeling the spatio-temporal behavior of particulate matter (PM) in the recent statistical literature represent an essential support for environmental and human health protection. In this paper, air pollution from particles with diameters smaller than 10 $$\mu$$m and related meteorological variables, such as temperature and wind speed, measured during November 2009 in the south of the Apulian region (Lecce, Brindisi and Taranto districts), are studied. A thorough multivariate geostatistical analysis is proposed: different tools for testing the symmetry assumption of the spatio-temporal linear coregionalization model (ST-LCM) are considered; a recent fitting procedure for the ST-LCM, based on the simultaneous diagonalization of symmetric real-valued matrix variograms, is adopted; and two non-separable classes of variogram models, the product-sum and Gneiting classes, are fitted to the basic components. The most significant aspects of this study are (a) the quantitative assessment of the assumption of symmetry of the ST-LCM, (b) the use of different non-separable spatio-temporal models for fitting the basic components of a ST-LCM and, more importantly, (c) the application of the spatio-temporal multivariate geostatistical analysis to predict particle pollution in one of the most polluted geographical areas. Prediction maps for particle pollution levels, with the corresponding validation results, are given.

14 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new goodness-of-fit testing scheme for the marginal distribution of Markov regime-switching models, which is based on the Kolmogorov-Smirnov supremum-distance statistic and the concept of the weighted empirical distribution function.
Abstract: This paper complements a recently published study (Janczura and Weron in AStA-Adv Stat Anal 96(3):385–407, 2012) on efficient estimation of Markov regime-switching models. Here, we propose a new goodness-of-fit testing scheme for the marginal distribution of such models. We consider models with an observable (like threshold autoregressions) as well as a latent state process (like Markov regime-switching). The test is based on the Kolmogorov–Smirnov supremum-distance statistic and the concept of the weighted empirical distribution function. The motivation for this research comes from a recent stream of literature in energy economics concerning electricity spot price models. While the existence of distinct regimes in such data is generally unquestionable (due to the supply stack structure), the actual goodness-of-fit of the models requires statistical validation. We illustrate the proposed scheme by testing whether commonly used Markov regime-switching models fit deseasonalized electricity prices from the NEPOOL (US) day-ahead market.
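
A generic implementation of the supremum-distance statistic on a weighted empirical distribution function takes only a few lines of R; in the regime-switching setting the weights would be the (smoothed) probabilities of being in the tested regime. The function below is a sketch under that reading, not the paper's code.

ks.weighted <- function(x, w, F0) {
  o  <- order(x)
  x  <- x[o]; w <- w[o]
  Fn <- cumsum(w) / sum(w)   # weighted EDF at the sorted observations
  max(abs(Fn - F0(x)))       # supremum distance (up to the usual left-limit refinement)
}

# With unit weights this reduces to the classical KS statistic
set.seed(1)
ks.weighted(rnorm(200), w = rep(1, 200), F0 = pnorm)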

12 citations


Journal ArticleDOI
TL;DR: A hierarchical regression log-Poisson space-time model within a Bayesian approach is presented to represent the incidence of malaria in Sucre state, Venezuela, during the period 1990–2002 in 15 municipalities of the state.
Abstract: Malaria is a parasitic infectious tropical disease that causes high mortality rates in the tropical belt. In Venezuela, Sucre state is considered the third state with the highest disease prevalence. This paper presents a hierarchical regression log-Poisson space-time model within a Bayesian approach to represent the incidence of malaria in Sucre state, Venezuela, during the period 1990–2002 in 15 municipalities of the state. Several additive models for the logarithm of the relative risk of the disease for each district were considered. These models differ in their structure by including different combinations of socio-economic and climatic covariates in a multiple regression term. A random effect that captures the spatial heterogeneity in the study region, and a CAR (Conditionally Autoregressive) component that recognizes the effect of nearby municipalities on the transmission of the disease each year, are also included in the model. A simpler version without the CAR component was also fitted to the data. Model estimation and predictive inference were carried out through the implementation of computer code in the WinBUGS software, which makes use of Markov Chain Monte Carlo (MCMC) methods. For model selection, the criterion of minimum posterior predictive loss (D) was used. Moran's I statistic was calculated to test the independence of the residuals of the resulting model. Finally, we verify the model fit by using the Bayesian p-value, and in most cases the selected model captures the spatial structure of the relative risks among the neighboring municipalities each year. For years with a poor model fit, a Student's t distribution is used as an alternative model for the spatial local random effect, with better fit to the tail behavior of the data probability distribution.
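
In symbols (notation assumed here, not taken from the paper), the skeleton of such a model is

$$ y_{it}\mid\rho_{it}\sim\mathrm{Poisson}(E_{it}\,\rho_{it}),\qquad \log\rho_{it}=\mathbf{x}_{it}^{\top}\boldsymbol{\beta}+u_i+v_{it}, $$

where $$y_{it}$$ and $$E_{it}$$ are the observed and expected malaria counts in municipality $$i$$ and year $$t$$, $$u_i$$ is the exchangeable random effect capturing spatial heterogeneity, and $$v_{it}$$ is the CAR component through which neighboring municipalities affect the relative risk each year.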

Journal ArticleDOI
TL;DR: In this paper, the problem of testing for a copula parameter change in semiparametric copula-based multivariate dynamic models which cover ARMA-GARCH models is considered.
Abstract: In this article, we consider the problem of testing for a copula parameter change in semiparametric copula-based multivariate dynamic models which cover ARMA-GARCH models. We construct the test statistics based on a pseudo MLE of the copula parameter and derive its limiting null distribution. Simulation results are provided for illustration.

Journal ArticleDOI
TL;DR: In this paper, a geoadditive model for extremes is implemented, assuming that the observations follow a generalized extreme value distribution with spatially dependent location, and applied to rainfall in the catchment area of the Arno River.
Abstract: Extreme value models and techniques are widely applied in environmental studies to define protection systems against the effects of extreme levels of environmental processes. Within climate science, particular importance attaches to the implications of changes in the hydrological cycle. Among all hydrologic processes, rainfall is a very important variable, as it is strongly related to flood risk assessment and mitigation, as well as to water resources availability and drought identification. We implement here a geoadditive model for extremes, assuming that the observations follow a generalized extreme value distribution with spatially dependent location. The analyzed territory is the catchment area of the Arno River in Tuscany, Central Italy.
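
The core of the geoadditive specification (in assumed notation) is

$$ y(s)\sim\mathrm{GEV}\bigl(\mu(s),\sigma,\xi\bigr),\qquad \mu(s)=\mathbf{x}(s)^{\top}\boldsymbol{\beta}+f(s), $$

where $$y(s)$$ is the rainfall extreme at site $$s$$ and $$f(s)$$ is a low-rank spatial smooth, so the location parameter varies smoothly over the catchment.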

Journal ArticleDOI
TL;DR: In this article, the authors consider a hierarchical model in which non-Gaussian variables of different kinds are handled simultaneously, and they show that when observations are assumed to be conditionally distributed as Poisson and Gamma, variograms and cross-variograms have convenient simple forms, and estimation of the parameters of the model can be carried out by Monte Carlo EM.
Abstract: To improve the quality of prediction of radioactive contamination, geostatistical methods, and in particular multivariate geostatistical models, are increasingly being used. These methods, however, are optimal only in the case in which the data may be assumed Gaussian, and do not properly cope with data measurements that are discrete, nonnegative or show some degree of skewness. To deal with these situations, here we consider a hierarchical model in which non-Gaussian variables of different kinds are handled simultaneously. We show that when observations are assumed to be conditionally distributed as Poisson and Gamma, variograms and cross-variograms have convenient simple forms, and estimation of the parameters of the model can be carried out by Monte Carlo EM. This work was inspired by radioactive contamination data from the Maddalena Archipelago (Sardinia, Italy).

Journal ArticleDOI
TL;DR: In this article, the hazard rate is estimated nonparametrically by kernel smoothing with the nearest-neighbor bandwidth, and strong uniform consistency of the estimate from Hoeffding's inequality, applied to a generalized empirical distribution function.
Abstract: Duration data often suffer from both left-truncation and right-censoring. We show how both deficiencies can be overcome at the same time when estimating the hazard rate nonparametrically by kernel smoothing with the nearest-neighbor bandwidth. Smoothing Turnbull’s estimator of the cumulative hazard rate, we derive strong uniform consistency of the estimate from Hoeffding’s inequality, applied to a generalized empirical distribution function. We also apply our estimator to rating transitions of corporate loans in Germany.
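
The estimator can be sketched in R by kernel-smoothing the increments of the cumulative hazard, with the risk set adjusted for left-truncation. Here entry (truncation time), exit (observed duration) and status (1 = event) are placeholder vectors, and a fixed bandwidth b stands in for the paper's nearest-neighbor bandwidth.

hazard.kernel <- function(t, entry, exit, status, b) {
  ev <- sort(unique(exit[status == 1]))     # observed event times
  dL <- sapply(ev, function(s) {            # Nelson-Aalen-type increments
    sum(exit == s & status == 1) /
      sum(entry < s & exit >= s)            # risk set respects left-truncation
  })
  K <- function(u) 0.75 * pmax(1 - u^2, 0)  # Epanechnikov kernel
  sapply(t, function(tt) sum(K((tt - ev) / b) * dL) / b)
}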

Journal ArticleDOI
TL;DR: In this article, the validity of established panel unit root tests applied to panels in which the individual time series are of different lengths is discussed, and a Monte Carlo study reveals that in unbalanced panels, procedures involving the computation of individual $$p$$-values for each cross-section unit (or the combination thereof) are mostly superior to those relying on a pooled Dickey–Fuller regression framework.
Abstract: This paper is about the validity of established panel unit root tests applied to panels in which the individual time series are of different lengths, a case often encountered in practice. Most of the tests considered work well under various types of cross-correlation, in both balanced and unbalanced panels. A Monte Carlo study reveals that in unbalanced panels, procedures involving the computation of individual $$p$$-values for each cross-section unit (or the combination thereof) are mostly superior to those relying on a pooled Dickey–Fuller regression framework. As the former are able to consider each unit separately, they do not require cutting back the “longer” time series so as to obtain the smallest “balanced” quadrangle, which in turn means that no potentially valuable information is lost.
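
A minimal sketch of the p-value combination idea in R: the per-unit p-values pv would come from unit root tests (e.g. augmented Dickey–Fuller tests) run on each series at its full length. Note the chi-squared null below assumes cross-sectional independence; under cross-correlation a modified combination is needed.

fisher.panel <- function(pv) {
  stat <- -2 * sum(log(pv))   # Fisher statistic, chi-squared with 2N df under H0
  c(statistic = stat,
    p.value = pchisq(stat, df = 2 * length(pv), lower.tail = FALSE))
}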

Journal ArticleDOI
TL;DR: This article uses Monte Carlo (MC) methods to generate time trajectories of mortality tables, which form a more comprehensive basis for estimating the root-mean-square error (RMSE) of different mortality forecasts.
Abstract: Mortality projections are of special interest in many applications. For example, they are essential in life insurance to determine the annual contributions of members, as well as for population predictions. Due to their importance, there exists a huge variety of mortality forecasting models from which the best approach has to be chosen. In the demographic literature, statements about the quality of the various models are mostly based on empirical ex-post examinations of mortality data for very few populations. On the basis of such a small number of observations, it is impossible to estimate statistical forecasting measures precisely. We use Monte Carlo (MC) methods here to generate time trajectories of mortality tables, which form a more comprehensive basis for estimating the root-mean-square error (RMSE) of different mortality forecasts.
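
The MC idea can be sketched generically in R. Here simulate.mortality and forecast.mortality are hypothetical stand-ins for a concrete data-generating model and forecasting method (e.g. of Lee-Carter type), not functions from the paper.

mc.rmse <- function(R, horizon, simulate.mortality, forecast.mortality) {
  mse <- replicate(R, {
    path <- simulate.mortality(horizon)   # simulated "true" future mortality
    fc   <- forecast.mortality(horizon)   # model forecast over the same horizon
    mean((path - fc)^2)
  })
  sqrt(mean(mse))   # RMSE estimated over R Monte Carlo trajectories
}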

Journal ArticleDOI
TL;DR: The gap between different published null distributions of the corresponding restricted likelihood ratio test under different assumptions is filled, and it is shown that the asymptotic scenario is determined by the choice of the penalty and not by the choice of the spline basis or the number of knots.
Abstract: Penalized spline regression using a mixed effects representation is one of the most popular nonparametric regression tools to estimate an unknown regression function $$f(\cdot)$$. In this context, testing for polynomial regression against a general alternative is equivalent to testing for a zero variance component. In this paper, we fill the gap between different published null distributions of the corresponding restricted likelihood ratio test under different assumptions. We show that: (1) the asymptotic scenario is determined by the choice of the penalty and not by the choice of the spline basis or number of knots; (2) non-standard asymptotic results correspond to common penalized spline penalties on derivatives of $$f(\cdot)$$, which ensure good power properties; and (3) standard asymptotic results correspond to penalized spline penalties on $$f(\cdot)$$ itself, which lead to sizeable power losses under smooth alternatives. We provide simple, easy-to-use guidelines for the restricted likelihood ratio test in this context.
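
In practice such tests can be carried out with the RLRsim package, which simulates the non-standard null distribution of the restricted likelihood ratio statistic. A sketch, fitting the penalized spline through its mixed-model representation via mgcv::gamm (illustrative choices, not necessarily the paper's setup):

library(mgcv)
library(RLRsim)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

m <- gamm(y ~ s(x))   # penalized spline in mixed-model form
exactRLRT(m$lme)      # H0: smoothing variance = 0, i.e. polynomial regression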

Journal ArticleDOI
TL;DR: This paper proposes a method for evaluating and comparing models that progressively include group differences, and Hierarchical modeling under a Bayesian perspective is followed, allowing flexible models and the statistical assessment of results based on posterior predictive distributions.
Abstract: Hierarchical spatio-temporal models allow for the consideration and estimation of many sources of variability. A general spatio-temporal model can be written as the sum of a spatio-temporal trend and a spatio-temporal random effect. When spatial locations are considered to be homogeneous with respect to some exogenous features, the groups of locations may share a common spatial domain. Differences between groups can be highlighted both in the large-scale, spatio-temporal component and in the spatio-temporal dependence structure. When these differences are not included in the model specification, model performance and spatio-temporal predictions may be weak. This paper proposes a method for evaluating and comparing models that progressively include group differences. Hierarchical modeling under a Bayesian perspective is followed, allowing flexible models and the statistical assessment of results based on posterior predictive distributions. This procedure is applied to tropospheric ozone data in the Italian Emilia–Romagna region for 2001, where 30 monitoring sites are classified according to environmental laws into two groups by their relative position with respect to traffic emissions.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated whether codependence restrictions can be uniquely imposed on VAR models via the so-called pseudo-structural form used in the literature and showed that this is not generally the case, but that unique imposition is guaranteed in several important special cases.
Abstract: This paper investigates whether codependence restrictions can be uniquely imposed on VAR models via the so-called pseudo-structural form used in the literature. Codependence of order q is given if a linear combination of autocorrelated variables eliminates the serial correlation after q lags. Importantly, maximum likelihood estimation and likelihood ratio testing are only possible if the codependence restrictions can be uniquely imposed. Applying the pseudo-structural form, our study reveals that this is not generally the case, but that unique imposition is guaranteed in several important special cases.

Journal ArticleDOI
TL;DR: The illustration suggests that the often recommended way to use panel data for longitudinal analyses, namely data from total respondents and weights from the last wave analysed, may not be the best way to go.
Abstract: A researcher using complex longitudinal survey data for event history analysis has to make several choices that affect the analysis results. These choices include the following: whether a design-based or a model-based approach for the analysis is taken, which subset of data to use and, if a design-based approach is chosen, which weights to use. We discuss different choices and illustrate their effects using longitudinal register data linked at person-level with the Finnish subset of the European Community Household Panel data. The use of register data enables us to construct an event history data set without nonresponse and attrition. Design-based estimates from these data are used as benchmarks against design-based and model-based estimates from subsets of data usually available for a survey data analyst. Our illustration suggests that the often recommended way to use panel data for longitudinal analyses, namely data from total respondents and weights from the last wave analysed, may not be the best way to go. Instead, using all available data and weights from the first survey wave appears to be a safe choice for longitudinal analyses based on multipurpose survey data.

Journal ArticleDOI
TL;DR: This special issue follows the second edition of the conference “Spatial Data Methods for Environmental and Ecological Processes”, the 2011 European Regional Conference of The International Environmetrics Society (TIES) and a satellite of the 58th World Statistics Congress of the International Statistical Institute (ISI), held in Baia delle Zagare, Italy.
Abstract: This special issue follows the conference “Spatial Data Methods for Environmental and Ecological Processes—2nd Edition”, which started at the University of Foggia, Italy, and continued in the beautiful scenery of Baia delle Zagare on the 1st and 2nd of September 2011. The conference was the 2011 European Regional Conference of The International Environmetrics Society (TIES) and a satellite of the 58th World Statistics Congress of the International Statistical Institute (ISI). Importantly, the conference was structured on the basis of a largely interdisciplinary project with the aim of creating a space for the exchange of experiences and ideas among researchers from different scientific backgrounds working on spatial and spatio-temporal environmental problems. The theme of the workshop provided ample space for contributions covering a wide range of territorial, ecological and environmental topics. The Conference’s Scientific Committee tailored the program in such a way as to foster fruitful interaction among various fields, under the common banner of ‘spatial analysis’. The seven papers of this special issue are strongly motivated by clearly stated environmental problems and consider advanced spatial modeling issues. Three of them are related to advanced spatio-temporal modeling of air quality data at the regional scale. Another two papers consider spatial epidemiology, in particular plant infections and malaria. Moreover, one work addresses spatial modeling of rainfall extremes at the catchment scale.