
Showing papers in "Journal of the Royal Statistical Society: Series C (Applied Statistics)" in 2008


Journal ArticleDOI
TL;DR: In this article, a simple continuous Markov monotone stochastic process is used to make inference on the partially observed, stochastically varying rate of sedimentation in lake sediment cores.
Abstract: Summary. We propose a new and simple continuous Markov monotone stochastic process and use it to make inference on a partially observed monotone stochastic process. The process is piecewise linear, based on additive independent gamma increments arriving in a Poisson fashion. An independent increments variation allows very simple conditional simulation of sample paths given known values of the process. We take advantage of a reparameterization involving the Tweedie distribution to provide efficient computation. The motivating problem is the establishment of a chronology for samples taken from lake sediment cores, i.e. the attribution of a set of dates to samples of the core given their depths, knowing that the age–depth relationship is monotone. The chronological information arises from radiocarbon (14C) dating at a subset of depths. We use the process to model the stochastically varying rate of sedimentation.
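The following is a minimal simulation sketch of the kind of process described: independent gamma increments arriving at Poisson times, with the path interpolated linearly between arrivals. The function name and parameters (rate, shape, scale) are illustrative assumptions, not the authors' parameterization, and the Tweedie-based conditional simulation used for inference is not reproduced here.

```python
import numpy as np

def simulate_monotone_path(t_max, rate, shape, scale, seed=None):
    """Simulate one piecewise-linear monotone path on [0, t_max].

    Jump times follow a Poisson process with intensity `rate`; each jump
    adds an independent gamma(shape, scale) increment, and the path is
    linearly interpolated between jump points (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    n_jumps = rng.poisson(rate * t_max)
    times = np.sort(rng.uniform(0.0, t_max, size=n_jumps))
    increments = rng.gamma(shape, scale, size=n_jumps)
    times = np.concatenate(([0.0], times))
    values = np.concatenate(([0.0], np.cumsum(increments)))
    return times, values

# Example: interpolate the simulated age-depth path at three core depths.
times, values = simulate_monotone_path(t_max=100.0, rate=0.5, shape=2.0, scale=1.0, seed=1)
print(np.interp([10.0, 25.0, 60.0], times, values))
```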

454 citations


Journal ArticleDOI
TL;DR: This case-study fits a variety of neural network models to the well-known airline data and compares the resulting forecasts with those obtained from the Box–Jenkins and Holt–Winters methods, finding that an NN model which fits well may give poor out-of-sample forecasts.
Abstract: Summary. This case-study fits a variety of neural network (NN) models to the well-known airline data and compares the resulting forecasts with those obtained from the Box–Jenkins and Holt–Winters methods. Many potential problems in fitting NN models were revealed such as the possibility that the fitting routine may not converge or may converge to a local minimum. Moreover it was found that an NN model which fits well may give poor out-of-sample forecasts. Thus we think it is unwise to apply NN models blindly in ‘black box’ mode as has sometimes been suggested. Rather, the wise analyst needs to use traditional modelling skills to select a good NN model, e.g. to select appropriate lagged variables as the ‘inputs’. The Bayesian information criterion is preferred to Akaike’s information criterion for comparing different models. Methods of examining the response surface implied by an NN model are examined and compared with the results of alternative nonparametric procedures using generalized additive models and projection pursuit regression. The latter imposes less structure on the model and is arguably easier to understand.
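As a hedged illustration of the paper's advice to choose lagged inputs deliberately and to check out-of-sample performance, the sketch below fits a small feed-forward network to a synthetic seasonal series (the airline data themselves are not bundled here) using lags 1 and 12 as inputs; all settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for a monthly series with trend and seasonality.
rng = np.random.default_rng(0)
t = np.arange(240)
y = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

# Lagged inputs (lags 1 and 12), chosen by the analyst rather than blindly.
lags = [1, 12]
X = np.column_stack([y[12 - lag:240 - lag] for lag in lags])
target = y[12:]

split = 180  # hold out the end of the series for out-of-sample checks
nn = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
nn.fit(X[:split], target[:split])
in_rmse = np.sqrt(np.mean((nn.predict(X[:split]) - target[:split]) ** 2))
out_rmse = np.sqrt(np.mean((nn.predict(X[split:]) - target[split:]) ** 2))
print(f"in-sample RMSE {in_rmse:.2f}, out-of-sample RMSE {out_rmse:.2f}")
```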

382 citations


Journal ArticleDOI
TL;DR: This work considers three sorts of diagnostics for random imputations: displays of the completed data, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations.
Abstract: Summary. We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, which is an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 environmental sustainability index, which is a linear aggregation of 64 environmental variables on 142 countries.
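A toy version of the second diagnostic (comparing the distributions of observed and imputed values for one variable) might look like the sketch below; the data are synthetic placeholders, not values from the environmental sustainability index.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, 500)   # stand-in for the observed values of one variable
imputed = rng.normal(0.3, 1.2, 120)    # stand-in for the imputed values of the same variable

# Side-by-side quantiles of observed and imputed values; large discrepancies
# flag imputations worth investigating (they are not necessarily wrong).
qs = np.linspace(0.1, 0.9, 9)
print(np.column_stack([qs, np.quantile(observed, qs), np.quantile(imputed, qs)]))
```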

209 citations


Journal ArticleDOI
TL;DR: This article proposed a model of transitions into and out of low paid employment that accounts for non-ignorable panel dropout, employment retention and base year low pay status (initial conditions).
Abstract: We propose a model of transitions into and out of low paid employment that accounts for non-ignorable panel dropout, employment retention and base year low pay status (‘initial conditions’). The model is fitted to data for men from the British Household Panel Survey. Initial conditions and employment retention are found to be non-ignorable selection processes. Whether panel dropout is found to be ignorable depends on how item non-response on pay is treated. Notwithstanding these results, we also find that models incorporating a simpler approach to accounting for non-ignorable selections provide estimates of covariate effects that differ very little from the estimates from the general model.

72 citations


Journal ArticleDOI
TL;DR: A latent Gaussian model for the analysis of compositional data which contain zero values is presented, based on assuming that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex.
Abstract: Summary. Compositional data record the relative proportions of different components within a mixture and arise frequently in many fields. Standard statistical techniques for the analysis of such data assume the absence of proportions which are genuinely zero. However, real data can contain a substantial number of zero values. We present a latent Gaussian model for the analysis of compositional data which contain zero values, which is based on assuming that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex. We propose an iterative algorithm to simulate values from this model and apply the model to data on the proportions of fat, protein and carbohydrate in different groups of food products. Finally, evaluation of the likelihood involves the calculation of difficult integrals if the number of components is more than 3, so we present a hybrid Gibbs rejection sampling scheme that can be used to draw inferences about the parameters of the model when the number of components is arbitrarily large.
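The zeros in this model come from the deterministic Euclidean projection onto the unit simplex. The sketch below uses the standard sort-based projection algorithm (not code from the paper); the latent mean and covariance values are illustrative.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum_i x_i = 1},
    using the standard sort-based algorithm."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# A zero proportion arises whenever a latent Gaussian component is small enough
# to be clipped by the projection.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.6, 0.3, -0.4], 0.05 * np.eye(3))
print(project_to_simplex(z))
```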

52 citations


Journal ArticleDOI
TL;DR: The evaluation of handwritten characters that are selected from an anonymous letter and written material from a suspect is an open problem in forensic science and numerical procedures are implemented to handle the complexity and to compute the marginal likelihood under competing propositions.
Abstract: Summary. The evaluation of handwritten characters that are selected from an anonymous letter and written material from a suspect is an open problem in forensic science. The individualization of handwriting is largely dependent on examiners who evaluate the characteristics in a qualitative and subjective way. Precise individual characterization of the shape of handwritten characters is possible through Fourier analysis: each handwritten character can be described through a set of variables such as the surface and harmonics as demonstrated by Marquis and co-workers in 2005. The assessment of the value of the evidence is performed through the derivation of a likelihood ratio for multivariate data. The methodology allows the forensic scientist to take into account the correlation between variables, and the non-constant variability within sources (i.e. individuals). Numerical procedures are implemented to handle the complexity and to compute the marginal likelihood under competing propositions.

51 citations


Journal ArticleDOI
TL;DR: The observed decline in population size is demonstrated in relation to directly interpretable parameters describing the demographic characteristics of the population using a descriptive state space model.
Abstract: Summary. We combine data from separate ring recovery and survey studies to provide indices of estimated abundance for the UK lapwing Vanellus vanellus population. Using a descriptive state space model, we demonstrate the observed decline in population size in relation to directly interpretable parameters describing the demographic characteristics of the population. The Bayesian approach readily provides information that is directly relevant to the conservation of this important bird species. The method proposed extends previous work in this area in several ways. Restrictive normality assumptions that have traditionally been imposed are removed, in addition to the assumption of constant measurement error by using information relating to the index variability across time to account fully for this source of uncertainty within the model. We also provide model-averaged inference to help to inform management policy and uses.

49 citations


Journal ArticleDOI
TL;DR: A saturated model is fit for the distribution of response patterns and it is found that non-response is missing completely at random for boys but that the probability of obesity is consistently higher among girls who provided incomplete records than among girls who provided complete records.
Abstract: A set of longitudinal binary, partially incomplete, data on obesity among children in the USA is reanalysed. The multivariate Bernoulli distribution is parameterized by the univariate marginal probabilities and dependence ratios of all orders, which together support maximum likelihood inference. The temporal association of obesity is strong and complex but stationary. We fit a saturated model for the distribution of response patterns and find that non-response is missing completely at random for boys but that the probability of obesity is consistently higher among girls who provided incomplete records than among girls who provided complete records. We discuss the statistical and substantive features of, respectively, pattern mixture and selection models for this data set.

49 citations


Journal ArticleDOI
TL;DR: A version of exponentially weighted moving average (EWMA) control charts is proposed for monitoring grouped data for process shifts; the grouped-data charts are shown to be nearly as efficient as variables-based EWMA charts and are thus an attractive alternative when the collection of variables data is not feasible.
Abstract: In the manufacture of metal fasteners in a progressive die operation, and other industrial situations, important quality dimensions cannot be measured on a continuous scale, and manufactured parts are classified into groups by using a step gauge. This paper proposes a version of exponentially weighted moving average (EWMA) control charts that are applicable to monitoring the grouped data for process shifts. The run length properties of this new grouped data EWMA chart are compared with similar results previously obtained for EWMA charts for variables data and with those for cumulative sum (CUSUM) schemes based on grouped data. Grouped data EWMA charts are shown to be nearly as efficient as variables-based EWMA charts and are thus an attractive alternative when the collection of variables data is not feasible. In addition, grouped data EWMA charts are less affected by the discreteness that is inherent in grouped data than are grouped data CUSUM charts. In the metal fasteners application, grouped data EWMA charts were simple to implement and allowed the rapid detection of undesirable process shifts.
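For readers unfamiliar with EWMA charts, the sketch below implements the standard variables-data EWMA recursion and control limits; the paper's grouped-data version replaces each observation with a score assigned to its gauge group, which is not reproduced here, and the smoothing constant and limits are illustrative.

```python
import numpy as np

def ewma(x, lam=0.2, start=0.0):
    """EWMA statistic Z_t = lam * x_t + (1 - lam) * Z_{t-1}, started at `start`."""
    z = np.empty(len(x))
    prev = start
    for i, xi in enumerate(x):
        prev = lam * xi + (1 - lam) * prev
        z[i] = prev
    return z

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 30), rng.normal(1.0, 1, 20)])  # mean shift after sample 30
z = ewma(x, lam=0.2)
sigma_z = np.sqrt(0.2 / (2 - 0.2))          # asymptotic standard deviation of Z_t (unit process sigma)
signals = np.nonzero(np.abs(z) > 3 * sigma_z)[0]
print("first out-of-control signal at sample:", signals[0] if signals.size else None)
```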

44 citations


Journal ArticleDOI
TL;DR: In this article, a first-order spatiotemporal auto-regressive (STAR(1)) process with a first-order neighbourhood structure and a Matérn noise process was used to simulate realizations of energy output.
Abstract: Summary. To investigate the variability in energy output from a network of photovoltaic cells, solar radiation was recorded at 10 sites every 10 min in the Pentland Hills to the south of Edinburgh. We identify spatiotemporal auto-regressive moving average models as the most appropriate to address this problem. Although previously considered computationally prohibitive to work with, we show that by approximating using toroidal space and fitting by matching autocorrelations, calculations can be substantially reduced. We find that a first-order spatiotemporal auto-regressive (STAR(1)) process with a first-order neighbourhood structure and a Matérn noise process provide an adequate fit to the data, and we demonstrate its use in simulating realizations of energy output.

41 citations


Journal ArticleDOI
TL;DR: In this article, a mixture of binomial and beta-binomial distributions for estimating the size of closed populations is proposed, which is applied to several real capture-recapture data sets and provides a convenient, objective framework for model selection.
Abstract: Summary. We propose a mixture of binomial and beta–binomial distributions for estimating the size of closed populations. The new mixture model is applied to several real capture–recapture data sets and is shown to provide a convenient, objective framework for model selection. The new model is compared with three alternative models in a simulation study, and the results shed light on the general performance of models in this area. The new model provides a robust flexible analysis, which automatically deals with small capture probabilities.

Journal ArticleDOI
TL;DR: In this paper, the authors show that the family of tempered stable distributions has considerable potential for modelling cell generation time data and demonstrate how these distributions can improve on currently assumed models, including the gamma and inverse Gaussian distributions which arise as special cases.
Abstract: We show that the family of tempered stable distributions has considerable potential for modelling cell generation time data. Several real examples illustrate how these distributions can improve on currently assumed models, including the gamma and inverse Gaussian distributions which arise as special cases. Our applications concentrate on the generation times of oligodendrocyte progenitor cells and the yeast Saccharomyces cerevisiae. Numerical inversion of the Laplace transform of the probability density function provides fast and accurate approximations to the tempered stable density, for which no closed form generally exists. We also show how the asymptotic population growth rate is easily calculated under a tempered stable model.

Journal ArticleDOI
TL;DR: Bayesian non‐parametric models that use Dirichlet process mixtures and mixtures of Polya trees for the analysis of continuous serologic data incorporate a stochastic ordering constraint for the distributions of serologic values for the infected and non‐infected populations.
Abstract: Summary. The evaluation of the performance of a continuous diagnostic measure is a commonly encountered task in medical research. We develop Bayesian non-parametric models that use Dirichlet process mixtures and mixtures of Polya trees for the analysis of continuous serologic data. The modelling approach differs from traditional approaches to the analysis of receiver operating characteristic curve data in that it incorporates a stochastic ordering constraint for the distributions of serologic values for the infected and non-infected populations. Biologically such a constraint is virtually always feasible because serologic values from infected individuals tend to be higher than those for non-infected individuals. The models proposed provide data-driven inferences for the infected and non-infected population distributions, and for the receiver operating characteristic curve and corresponding area under the curve. We illustrate and compare the predictive performance of the Dirichlet process mixture and mixture of Polya trees approaches by using serologic data for Johne's disease in dairy cattle.

Journal ArticleDOI
TL;DR: In this paper, a restricted class of additive and projection pursuit regression (PPR) models is used to model the Canadian lynx data, and a PPR model is found to have the smallest in-sample estimated prediction error variance of all the models fitted to these data in the literature.
Abstract: Nonparametric regression methods are used as exploratory tools for formulating, identifying and estimating non-linear models for the Canadian lynx data, which have attained benchmark status in the time series literature since the work of Moran in 1953. To avoid the curse of dimensionality in the nonparametric analysis of this short series with 114 observations, we confine attention to the restricted class of additive and projection pursuit regression (PPR) models and rely on the estimated prediction error variance to compare the predictive performance of various (non-)linear models. A PPR model is found to have the smallest (in-sample) estimated prediction error variance of all the models fitted to these data in the literature. We use a data perturbation procedure to assess and adjust for the effect of data mining on the estimated prediction error variances; this renders most models fitted to the lynx data comparable and nearly equivalent. However, on the basis of the mean-squared error of out-of-sample prediction error, the semiparametric model X_t = 1.08 + 1.37 X_{t-1} + f(X_{t-2}) + e_t and Tong's self-exciting threshold autoregressive model perform much better than the PPR and other models known for the lynx data.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach, utilizing an end member model, was developed to estimate the proportion of various sources of sediments in samples taken from a dam, which not only allows for the incorporation of prior knowledge about the geochemical compositions of the sources but also allows for correlation between spatially contiguous samples and the prediction of the sediment's composition at unsampled locations.
Abstract: Summary. An important problem in the management of water supplies is identifying the sources of sediment. The paper develops a Bayesian approach, utilizing an end member model, to estimate the proportion of various sources of sediments in samples taken from a dam. This approach not only allows for the incorporation of prior knowledge about the geochemical compositions of the sources (or end members) but also allows for correlation between spatially contiguous samples and the prediction of the sediment's composition at unsampled locations. Sediments that were sampled from the North Pine Dam in south-east Queensland, Australia, are analysed to illustrate the approach.

Journal ArticleDOI
TL;DR: In this article, the authors examined quantile regression models from the standpoint of their suitability to analyse company profitability and proposed linear and non-linear (B-spline) structures.
Abstract: Summary. Quantile regression models are examined from the standpoint of their suitability to analyse company profitability. Some linear and non-linear (B-spline) structures are proposed. Linear conditional quantile models provide an intuitive framework which permits conventional statistical inference tools to be applied. Non-parametric spline-based quantile regression is a flexible approach, allowing a different grade of curvature for each conditional quantile, thus providing the possibility of capturing certain non-linear effects that are predicted by economic theory. The behaviour of these variants of the quantile framework is tested on a representative database, which was obtained from the Spanish book publishing industry. Our results confirm the usefulness of the quantile regression approach. Linear models seem to provide suitable descriptions for the behaviour of average performing firms, whereas non-parametric estimates provide the best fit for the extreme conditional quantiles (i.e. companies which exhibit the highest and the lowest performance in terms of profitability).
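A minimal sketch of linear conditional quantile fitting, assuming statsmodels' QuantReg; the firm-level data are synthetic stand-ins (the Spanish publishing data are not reproduced), and the B-spline variant described in the paper would simply replace the covariate with a spline basis.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in: profitability against one covariate, with heteroscedastic
# noise so that the conditional quantiles fan out.
rng = np.random.default_rng(0)
size = rng.uniform(0, 10, 300)
profit = 1.0 + 0.4 * size + rng.normal(0.0, 0.5 + 0.1 * size)

X = sm.add_constant(size)
for q in (0.1, 0.5, 0.9):
    res = sm.QuantReg(profit, X).fit(q=q)
    print(f"q={q}: intercept={res.params[0]:.2f}, slope={res.params[1]:.2f}")
```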

Journal ArticleDOI
TL;DR: In this article, a spatiotemporal model is developed to analyse epidemics of airborne plant diseases which are spread by spores, and the model describes the joint distribution of the occurrence and severity of the disease.
Abstract: Summary. A spatiotemporal model is developed to analyse epidemics of airborne plant diseases which are spread by spores. The observations consist of measurements of the severity of disease at different times, different locations in the horizontal plane and different heights in the vegetal cover. The model describes the joint distribution of the occurrence and the severity of the disease. The three-dimensional dispersal of spores is modelled by combining a horizontal and a vertical dispersal function. Maximum likelihood combined with a parametric bootstrap is suggested to estimate the model parameters and the uncertainty that is attached to them. The spatiotemporal model is used to analyse a yellow rust epidemic in a wheatfield. In the analysis we pay particular attention to the selection and the estimation of the dispersal functions.

Journal ArticleDOI
TL;DR: Bayesian classification is adopted to analyse data which raise the question whether the observed variability, e.g. the shape and dimensions of the tools, is related to their use, and resulting mixing densities provide evidence that the morphological dimensional variability among tools is related to the existence of these two tool groups.
Abstract: The classification of Neolithic tools by using cluster analysis enables archaeologists to understand the function of the tools and the technological and cultural conditions of the societies that made them. In this paper, Bayesian classification is adopted to analyse data which raise the question whether the observed variability, e.g. the shape and dimensions of the tools, is related to their use. The data present technical difficulties for the practitioner, such as the presence of mixed mode data, missing data and errors in variables. These complications are overcome by employing a finite mixture model and Markov chain Monte Carlo methods. The analysis uses prior information which expresses the archaeologist's belief that there are two tool groups that are similar to contemporary adzes and axes. The resulting mixing densities provide evidence that the morphological dimensional variability among tools is related to the existence of these two tool groups.

Journal ArticleDOI
TL;DR: In this paper, a conditionally Markov multiplicative intensity model is described for the analysis of clustered progressive multistate processes under intermittent observation, motivated by a longterm prospective study of patients with psoriatic arthritis with the aim of characterizing progression of joint damage via an irreversible four-state model.
Abstract: Summary. A conditionally Markov multiplicative intensity model is described for the analysis of clustered progressive multistate processes under intermittent observation. The model is motivated by a long-term prospective study of patients with psoriatic arthritis with the aim of characterizing progression of joint damage via an irreversible four-state model. The model accommodates heterogeneity in transition rates between different individuals and correlation in transition rates within patients. To do this we introduce subject-specific multivariate random effects in which each component acts multiplicatively on a specific transition intensity. Through the association between the components of the random effect, correlations in transition intensities are accommodated. A Monte Carlo EM algorithm is developed for estimation, which features closed form expressions for estimators at each M-step.

Journal ArticleDOI
TL;DR: In this article, a case-study evaluating the source-specific effects of particulate matter on respiratory function was conducted using a structural equation approach, which assessed the effect of different receptor models on the estimated sourcespecific effects for univariate respiratory response.
Abstract: Summary. We conduct a case-study evaluating the source-specific effects of particulate matter on respiratory function. Using a structural equation approach, we assess the effect of different receptor models on the estimated source-specific effects for univariate respiratory response. Furthermore, we extend the structural equation model by placing a factor analysis model on the response to represent the measured respiratory responses in terms of underlying respiratory patterns. We estimate the particulate matter source-specific effects on respiratory rate, accentuated normal breathing and airway irritation and find a strong increase in airway irritation that is associated with exposure to motor vehicle particulate matter.

Journal ArticleDOI
TL;DR: A random-effects point process model for automated teller machine withdrawals is developed that may be used to predict behaviour for an individual and to assess when state changes in individual behaviour have occurred and as a description of behaviour for a portfolio of accounts.
Abstract: Summary. Models of consumer behaviour that are based purely on empirical relationships in data can perform well in the short term but often degrade rapidly with changing circumstances. Superior longer-term performance can sometimes be attained by developing models for the deeper processes underlying the consumer behaviour. We develop a random-effects point process model for automated teller machine withdrawals. Estimation, prediction and computational issues are discussed. The model may be used to predict behaviour for an individual and to assess when state changes in individual behaviour have occurred and as a description of behaviour for a portfolio of accounts.

Journal ArticleDOI
TL;DR: In this article, a non-standard regression model for complex survey data is proposed to obtain consistent regression parameter and variance estimates; the method proposed can be implemented within any standard sample survey package.
Abstract: Summary. Complex survey sampling is often used to sample a fraction of a large finite population. In general, the survey is conducted so that each unit (e.g. subject) in the sample has a different probability of being selected into the sample. For generalizability of the sample to the population, both the design and the probability of being selected into the sample must be incorporated in the analysis. In this paper we focus on non-standard regression models for complex survey data. In our motivating example, which is based on data from the Medical Expenditure Panel Survey, the outcome variable is the subject's ‘total health care expenditures in the year 2002’. Previous analyses of medical cost data suggest that the variance is approximately equal to the mean raised to the power of 1.5, which is a non-standard variance function. Currently, the regression parameters for this model cannot be easily estimated in standard statistical software packages. We propose a simple two-step method to obtain consistent regression parameter and variance estimates; the method proposed can be implemented within any standard sample survey package. The approach is applicable to complex sample surveys with any number of stages.
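The sketch below fits a generalized linear model whose variance is proportional to the mean raised to the power 1.5 (a Tweedie variance function with a log link), assuming statsmodels; it is only a single-step illustration on synthetic data, not the authors' two-step survey estimator, and it ignores the design features (weights, strata, clusters) that the paper incorporates.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic positive, skewed 'cost' outcome; the MEPS data are not reproduced here.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 500)
mu = np.exp(4.0 + 0.03 * age)
cost = rng.gamma(shape=2.0, scale=mu / 2.0)

X = sm.add_constant(age)
# Tweedie family: var(Y) proportional to mu**1.5, log link by default.
fit = sm.GLM(cost, X, family=sm.families.Tweedie(var_power=1.5)).fit()
print(fit.params)
```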

Journal ArticleDOI
TL;DR: In this article, an extension of conventional univariate Kaplan-Meier-type estimators for the hazard rate and the survivor function to multivariate censored data with a censored random regressor was proposed.
Abstract: Summary. We consider an extension of conventional univariate Kaplan-Meier-type estimators for the hazard rate and the survivor function to multivariate censored data with a censored random regressor. It is an Akritas-type estimator which adapts the non-parametric conditional hazard rate estimator of Beran to more typical data situations in applied analysis. We show with simulations that the estimator has nice finite sample properties and our implementation appears to be fast. As an application we estimate non-parametric conditional quantile functions with German administrative unemployment duration data.

Journal ArticleDOI
TL;DR: A novel way of modelling price paths in eBay's on‐line auctions by using functional data analysis using a semiparametric mixed model with boosting to recover the functional object and shows that the resulting functional objects are conceptually more appealing.
Abstract: Summary. Recovering the functional object that underlies observed data is difficult if the data are irregularly distributed. We present a new approach that can overcome this challenge. The approach is based on the ideas of mixed models. Specifically, we propose a semiparametric mixed model with boosting to recover the functional object. As well as being able to handle sparse and unevenly distributed data, the model also results in conceptually more meaningful functional objects. In particular, we motivate our method within the framework of eBay's on-line auctions. On-line auctions produce monotonic increasing price curves that are often correlated across auctions. The semiparametric mixed model accounts for this correlation in a parsimonious way. It also manages to capture the underlying monotonic trend in the data without imposing model constraints. Our application shows that the resulting functional objects are conceptually more appealing. Moreover, when used to forecast the outcome of an on-line auction, our approach also results in more accurate price predictions compared with standard approaches. We illustrate our model on a set of 183 closed auctions for Palm M515 personal digital assistants.

Journal ArticleDOI
TL;DR: The proposed method is applied to compare the diagnostic accuracy between mini‐mental state examination and clinical evaluation of dementia, in discriminating between three disease states of Alzheimer's disease.
Abstract: Summary. In many diagnostic studies, verification of the true disease status depends on initial test measurements and induces verification bias in the assessment. We propose a non-parametric likelihood-based approach to construct the empirical ROC surface in the presence of differential verification, and to estimate the volume under the ROC surface. Estimators of the standard deviation are derived by both the Fisher information and the jackknife method, and their relative accuracy is evaluated in an extensive simulation study. The methodology is further extended to incorporate discrete baseline covariates in the selection process, and to compare the accuracy of a pair of diagnostic tests. We apply the proposed method to compare the diagnostic accuracy between mini-mental state examination and clinical evaluation of dementia, in discriminating between three disease states of Alzheimer's disease.
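With fully verified data, the volume under the ROC surface reduces to the probability that measurements from the three ordered disease states are correctly ordered; the sketch below computes that empirical quantity on synthetic data and does not include the verification bias correction that is the paper's contribution.

```python
import numpy as np

def empirical_vus(x1, x2, x3):
    """Empirical volume under the ROC surface for three ordered classes:
    the proportion of triples with x1 < x2 < x3 (ties ignored)."""
    x1, x2, x3 = map(np.asarray, (x1, x2, x3))
    count = sum(np.sum((a < x2) & (x2 < c)) for a in x1 for c in x3)
    return count / (x1.size * x2.size * x3.size)

rng = np.random.default_rng(0)
print(empirical_vus(rng.normal(0, 1, 40), rng.normal(1, 1, 40), rng.normal(2, 1, 40)))
```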

Journal ArticleDOI
TL;DR: A pattern–mixture model is developed to evaluate the outcome of intervention on the number of hospitalizations with non‐ignorable dropouts and has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis.
Abstract: Summary. Asthma is an important chronic disease of childhood. An intervention programme for managing asthma was designed on principles of self-regulation and was evaluated by a randomized longitudinal study. The study focused on several outcomes, and, typically, missing data remained a pervasive problem. We develop a pattern–mixture model to evaluate the outcome of intervention on the number of hospitalizations with non-ignorable dropouts. Pattern–mixture models are not generally identifiable as no data may be available to estimate a number of model parameters. Sensitivity analyses are performed by imposing structures on the unidentified parameters. We propose a parameterization which permits sensitivity analyses on clustered longitudinal count data that have missing values due to non-ignorable missing data mechanisms. This parameterization is expressed as ratios between event rates across missing data patterns and the observed data pattern and thus measures departures from an ignorable missing data mechanism. Sensitivity analyses are performed within a Bayesian framework by averaging over different prior distributions on the event ratios. This model has the advantage of providing an intuitive and flexible framework for incorporating the uncertainty of the missing data mechanism in the final analysis.

Journal ArticleDOI
TL;DR: In this article, the authors discussed the application of statistical robustness studies as a method for determining the optimal settings of process variables that might be hard-to-control during normal operation.
Abstract: Summary. Dual response surface optimization of the Sasol–Lurgi fixed bed dry bottom gasification process was carried out by performing response surface modelling and robustness studies on the process variables of interest from a specially equipped full-scale test gasifier. Coal particle size distribution and coal composition are considered as hard-to-control variables during normal operation. The paper discusses the application of statistical robustness studies as a method for determining the optimal settings of process variables that might be hard to control during normal operation. Several dual response surface strategies are evaluated for determining the optimal process variable conditions. It is shown that a narrower particle size distribution is optimal for maximizing gasification performance which is robust against the variability in coal composition.

Journal ArticleDOI
TL;DR: A generalized monotonic functional mixed model is proposed to study the dose effect on a clinical outcome by estimating a weight function of dose non-parametrically, using splines and subject to a monotonicity constraint, while allowing for overdispersion and correlation of multiple observations within the same subject.
Abstract: Summary. Normal tissue complications are a common side effect of radiation therapy. They are the consequence of the dose of radiation that is received by the normal tissue surrounding the site of the tumour. Within a specified organ each voxel receives a certain dose of radiation, leading to a distribution of doses over the organ. It is often not known what aspect of the dose distribution drives the presence and severity of the complications. A summary measure of the dose distribution can be obtained by integrating a weighting function of dose, w(d), over the density of dose. For biological reasons the weight function should be monotonic. We propose a generalized monotonic functional mixed model to study the dose effect on a clinical outcome by estimating this weight function non-parametrically by using splines and subject to the monotonicity constraint, while allowing for overdispersion and correlation of multiple observations within the same subject. We illustrate our method with data from a head and neck cancer study in which the irradiation of the parotid gland results in loss of saliva flow.
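To make the summary measure concrete, the sketch below evaluates the integral of a monotone weight function w(d) over a subject's dose density as a discretized sum; both the dose histogram and the logistic-shaped weight are illustrative placeholders, whereas the paper estimates w non-parametrically with constrained splines.

```python
import numpy as np

# Toy dose histogram for one subject: dose bins (in Gy) and the fraction of the
# organ's voxels receiving each dose (illustrative values only).
dose = np.linspace(0, 70, 15)
voxel_fraction = np.exp(-dose / 25.0)
voxel_fraction /= voxel_fraction.sum()

# A fixed monotone weight function w(d); the paper estimates this curve instead.
w = 1.0 / (1.0 + np.exp(-(dose - 35.0) / 8.0))

# Subject-level dose summary: integral of w(d) over the dose density,
# approximated by a sum over the histogram bins.
print(np.sum(w * voxel_fraction))
```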

Journal ArticleDOI
TL;DR: A space-time process model for total wet mercury deposition that enables spatial interpolation and temporal prediction of deposition as well as aggregation in space or time to see patterns and trends in deposition.
Abstract: Summary. The paper provides a space–time process model for total wet mercury deposition. Key methodological features that are introduced include direct modelling of deposition rather than of expected deposition, the utilization of precipitation information (there is no deposition without precipitation) without having to construct a precipitation model and the handling of point masses at 0 in the distributions of both precipitation and deposition. The result is a specification that enables spatial interpolation and temporal prediction of deposition as well as aggregation in space or time to see patterns and trends in deposition. We use weekly deposition monitoring data from the National Atmospheric Deposition Program–Mercury Deposition Network for 2003, restricted to the eastern USA and Canada. Our spatiotemporal hierarchical model allows us to interpolate to arbitrary locations and, hence, to an arbitrary grid, enabling weekly deposition surfaces (with associated uncertainties) for this region. It also allows us to aggregate weekly depositions at coarser, quarterly and annual, temporal levels.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the use of marginal logistic models with robust variance estimators, by using a study of Australian bush rats as a case-study, and find that this is not feasible in studies where the sample of available points cannot readily be matched to specific animals.
Abstract: Summary. Studies which measure animals’ positions over time are a vital tool in understanding the process of resource selection by animals. By comparing a sample of locations that are used by animals with a sample of available points, the types of locations that are preferred by animals can be analysed by using logistic regression. Random-effects logistic regression has been proposed to deal with the repeated measurements that are observed for each animal, but we find that this is not feasible in studies where the sample of available points cannot readily be matched to specific animals. Instead, we investigate the use of marginal logistic models with robust variance estimators, by using a study of Australian bush rats as a case-study. Simulation is used to check the properties of the approach and to explore alternative designs.
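A minimal version of a marginal (population-averaged) logistic model with cluster-robust standard errors can be obtained with generalized estimating equations; the sketch below assumes statsmodels' GEE with an independence working correlation and uses synthetic used/available data, not the bush rat study.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic used/available data for 10 animals; covariate and labels are illustrative.
rng = np.random.default_rng(0)
n = 400
cover = rng.uniform(0, 1, n)                      # habitat covariate
animal = rng.integers(0, 10, n)                   # cluster (animal) labels
used = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 2.0 * cover))))

X = sm.add_constant(cover)
model = sm.GEE(used, X, groups=animal,
               family=sm.families.Binomial(),
               cov_struct=sm.cov_struct.Independence())
print(model.fit().summary())                      # robust (sandwich) standard errors
```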