
Showing papers in "Biometrika in 1995"


Journal ArticleDOI
TL;DR: In this article, the authors propose a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive.
Abstract: Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They have therefore not been available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed. This paper proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive. It should therefore have wide applicability in model determination problems. The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.

6,188 citations
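
For orientation, the dimension-jumping move at the heart of this construction is usually written via a dimension-matching acceptance probability; the following display is the standard textbook form rather than a quotation from the paper. A move from $(\theta, u)$ to $(\theta', u')$ through a bijection $(\theta', u') = g(\theta, u)$ is accepted with probability

\alpha = \min\Bigl\{1,\ \frac{\pi(\theta' \mid y)\, q'(u')}{\pi(\theta \mid y)\, q(u)}\ \Bigl|\frac{\partial(\theta', u')}{\partial(\theta, u)}\Bigr|\Bigr\},

where $q$ and $q'$ are the densities of the auxiliary variables that pad the two parameter spaces to a common dimension, and the Jacobian accounts for the change of variables.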


Journal ArticleDOI
TL;DR: In this paper, a nonparametric framework for causal inference is proposed, in which diagrams are queried to determine if the assumptions available are sufficient for identifying causal effects from nonexperimental data.
Abstract: SUMMARY The primary aim of this paper is to show how graphical models can be used as a mathematical language for integrating statistical and subject-matter information. In particular, the paper develops a principled, nonparametric framework for causal inference, in which diagrams are queried to determine if the assumptions available are sufficient for identifying causal effects from nonexperimental data. If so the diagrams can be queried to produce mathematical expressions for causal effects in terms of observed distributions; otherwise, the diagrams can be queried to suggest additional observations or auxiliary experiments from which the desired inferences can be obtained.

2,209 citations
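
A concrete instance of such a query, stated here in its widely cited form (an illustration, not a quotation from the paper): if a set of covariates $Z$ blocks all 'back-door' paths from treatment $X$ to outcome $Y$ in the diagram, the causal effect is identified by the adjustment formula

P(y \mid \hat{x}) = \sum_{z} P(y \mid x, z)\, P(z),

expressing an interventional quantity purely in terms of the observed distribution.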


Journal ArticleDOI
TL;DR: In this article, the authors investigated the properties of a semiparametric method for estimating the dependence parameters in a family of multivariate distributions and proposed an estimator, obtained as a solution of a pseudo-likelihood equation, which is consistent, asymptotically normal and fully efficient at independence.
Abstract: SUMMARY This paper investigates the properties of a semiparametric method for estimating the dependence parameters in a family of multivariate distributions. The proposed estimator, obtained as a solution of a pseudo-likelihood equation, is shown to be consistent, asymptotically normal and fully efficient at independence. A natural estimator of its asymptotic variance is proved to be consistent. Comparisons are made with alternative semiparametric estimators in the special case of Clayton's model for association in bivariate data.

1,280 citations
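
As a sketch of the estimating procedure described (the rescaled-rank form is the usual one and is offered here as an illustration): with copula density $c_\theta$ and empirical margins rescaled by $n/(n+1)$, the estimator maximises the pseudo-log-likelihood

\ell(\theta) = \sum_{i=1}^{n} \log c_\theta\!\Bigl(\frac{R_{i1}}{n+1}, \ldots, \frac{R_{ip}}{n+1}\Bigr),

where $R_{ij}$ is the rank of $X_{ij}$ among $X_{1j}, \ldots, X_{nj}$.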


Journal ArticleDOI
TL;DR: In this article, the authors discuss standard and intrinsic autoregressions and describe how the problems that arise can be alleviated using Dempster's (1972) algorithm or an appropriate modification.
Abstract: SUMMARY Gaussian conditional autoregressions have been widely used in spatial statistics and Bayesian image analysis, where they are intended to describe interactions between random variables at fixed sites in Euclidean space. The main appeal of these distributions is in the Markovian interpretation of their full conditionals. Intrinsic autoregressions are limiting forms that retain the Markov property. Despite being improper, they can have advantages over the standard autoregressions, both conceptually and in practice. For example, they often avoid difficulties in parameter estimation, without apparent loss, or exhibit appealing invariances, as in texture analysis. However, on small arrays and in nonlattice applications, both forms of autoregression can lead to undesirable second-order characteristics, either in the variables themselves or in contrasts among them. This paper discusses standard and intrinsic autoregressions and describes how the problems that arise can be alleviated using Dempster's (1972) algorithm or an appropriate modification. The approach represents a partial synthesis of standard geostatistical and Gaussian Markov random field formulations. Some nonspatial applications are also mentioned.

737 citations
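
The Markovian interpretation referred to here is that each site's full conditional depends only on its neighbours; in a common formulation (shown for orientation, with notation that is an assumption rather than the paper's own):

x_i \mid x_{-i} \sim N\!\Bigl(\sum_{j \ne i} \beta_{ij} x_j,\ \kappa_i\Bigr),

with $\beta_{ij} = 0$ unless sites $i$ and $j$ are neighbours, and $\kappa_j \beta_{ij} = \kappa_i \beta_{ji}$ required for a valid joint distribution.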


Journal ArticleDOI
TL;DR: In this paper, it was shown that the optimal block size depends significantly on context, being equal to n^{1/3}, n^{1/4} and n^{1/5} in the cases of variance or bias estimation, estimation of a one-sided distribution function, and estimation of a two-sided distribution function, respectively.
Abstract: SUMMARY We address the issue of optimal block choice in applications of the block bootstrap to dependent data. It is shown that the optimal block size depends significantly on context, being equal to n^{1/3}, n^{1/4} and n^{1/5} in the cases of variance or bias estimation, estimation of a one-sided distribution function, and estimation of a two-sided distribution function, respectively. A clear intuitive explanation of this phenomenon is given, together with outlines of theoretical arguments in specific cases. It is shown that these orders of magnitude of block sizes can be used to produce a simple, practical rule for selecting block size empirically. That technique is explored numerically.

635 citations
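
A minimal sketch of how the n^{1/3} rule might be used in practice for variance estimation, assuming a moving-block bootstrap with numpy; the proportionality constant and all implementation details below are assumptions, not the paper's empirical rule:

import numpy as np

def block_bootstrap_variance(x, stat=np.mean, n_boot=1000, seed=0):
    # Moving-block bootstrap variance estimate for stat(x).
    # Block length of order n^(1/3), the order identified for
    # variance/bias estimation (constant chosen arbitrarily here).
    rng = np.random.default_rng(seed)
    n = len(x)
    block = max(1, round(n ** (1 / 3)))
    n_blocks = -(-n // block)                 # ceil(n / block)
    starts = np.arange(n - block + 1)         # admissible block starts
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(starts, size=n_blocks)
        series = np.concatenate([x[s:s + block] for s in idx])[:n]
        reps[b] = stat(series)
    return reps.var(ddof=1)

# Example: variance of the sample mean of an AR(1) series.
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(block_bootstrap_variance(x))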


Journal ArticleDOI
TL;DR: The simulation smoother is introduced, which draws from the multivariate posterior distribution of the disturbances of the model, so avoiding the degeneracies inherent in state samplers.
Abstract: SUMMARY Recently suggested procedures for simulating from the posterior density of states given a Gaussian state space time series are refined and extended. We introduce and study the simulation smoother, which draws from the multivariate posterior distribution of the disturbances of the model, so avoiding the degeneracies inherent in state samplers. The technique is important in Gibbs sampling with non-Gaussian time series models, and for performing Bayesian analysis of Gaussian time series.

587 citations


Journal ArticleDOI
TL;DR: In this paper, a simple kernel procedure based on marginal integration is defined that estimates the relevant univariate quantity in both additive and multiplicative nonparametric regression.
Abstract: SUMMARY We define a simple kernel procedure based on marginal integration that estimates the relevant univariate quantity in both additive and multiplicative nonparametric regression. Nonparametric regression is frequently used as a preliminary diagnostic tool. It is a convenient method of summarising the relationship between a dependent and a univariate independent variable. However, when the explanatory variables are multidimensional, these methods are less satisfactory. In particular, the rate of convergence of standard estimators is poorer, while simple plots are not available to aid model selection. There are a number of simplifying structures that have been used to avoid these problems. These include the regression tree structure of Gordon & Olshen (1980), the projection pursuit model of Friedman & Stuetzle (1981), and semiparametric models.

553 citations
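
A minimal numerical sketch of marginal integration in the two-covariate additive case, assuming a Nadaraya-Watson surface estimator with a Gaussian product kernel (all choices here are assumptions for illustration): the component in the first covariate is estimated by averaging the full regression surface over the empirical distribution of the other covariate.

import numpy as np

def nw_surface(x1, x2, X, y, h):
    # Nadaraya-Watson estimate of E[y | x1, x2], Gaussian product kernel.
    w = np.exp(-0.5 * (((x1 - X[:, 0]) / h) ** 2 + ((x2 - X[:, 1]) / h) ** 2))
    return (w @ y) / w.sum()

def marginal_integration(grid, X, y, h=0.3):
    # Average the surface over the observed values of the second covariate.
    return np.array([np.mean([nw_surface(g, x2, X, y, h) for x2 in X[:, 1]])
                     for g in grid])

# Example with an additive truth m(x1, x2) = sin(x1) + x2**2.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
grid = np.linspace(-1, 1, 5)
print(marginal_integration(grid, X, y))   # approximates sin(grid) + const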


Journal ArticleDOI
TL;DR: This paper derived general expressions for the asymptotic biases in three approximate estimators of regression coefficients and variance component, for small values of the variance component in generalised linear mixed models with canonical link function and a single source of extraneous variation.
Abstract: SUMMARY General expressions are derived for the asymptotic biases in three approximate estimators of regression coefficients and variance component, for small values of the variance component, in generalised linear mixed models with canonical link function and a single source of extraneous variation. The estimators involve first and second order Laplace expansions of the integrated likelihood and a related procedure known as penalised quasilikelihood. Numerical studies of a series of matched pairs of binary outcomes show that the first order estimators of the variance component are seriously biased. Easily computed correction factors produce satisfactory estimators of small variance components, comparable to those obtained with a second order Laplace expansion, and markedly improve the asymptotic performance for larger values. For a series of matched pairs of binomial observations, the variance correction factors rapidly approach one as the binomial denominators increase. These results greatly extend the range of parameter values for which the approximate estimation procedures have satisfactory asymptotic properties.

494 citations


Journal ArticleDOI
TL;DR: In this paper, a class of semi-parametric transformation models, under which an unknown transformation of the survival time is linearly related to the covariates with various completely specified error distributions, are considered.
Abstract: SUMMARY In this paper we consider a class of semi-parametric transformation models, under which an unknown transformation of the survival time is linearly related to the covariates with various completely specified error distributions. This class of regression models includes the proportional hazards and proportional odds models. Inference procedures derived from a class of generalised estimating equations are proposed to examine the covariate effects with censored observations. Numerical studies are conducted to investigate the properties of our proposals for practical sample sizes. These transformation models, coupled with the new simple inference procedures, provide many useful alternatives to the Cox regression model in survival analysis.

450 citations
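
The class described can be written compactly; the following display matches the form commonly attributed to this line of work (offered as orientation, with notation assumed):

h(T) = -\beta' Z + \varepsilon,

where $h$ is an unspecified monotone transformation of the survival time $T$ and $\varepsilon$ has a completely specified distribution: an extreme-value error yields the proportional hazards model, and a standard logistic error the proportional odds model.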


Journal ArticleDOI
TL;DR: In this paper, the authors consider two estimators of the wavelet variance: the first based upon the discrete wavelet transform, and the second, called the maximal-overlap estimator, based upon a filtering interpretation of wavelets.
Abstract: SUMMARY The wavelet variance decomposes the variance of a time series into components associated with different scales. We consider two estimators of the wavelet variance: the first based upon the discrete wavelet transform, and the second, called the maximal-overlap estimator, based upon a filtering interpretation of wavelets. We determine the large sample distribution for both estimators and show that the maximal-overlap estimator is more efficient for a class of processes of interest in the physical sciences. We discuss methods for determining an approximate confidence interval for the wavelet variance. We demonstrate through Monte Carlo experiments that the large sample distribution for the maximal-overlap estimator is a reasonable approximation even for the moderate sample size of 128 observations. We apply our proposed methodology to a series of observations related to vertical shear in the ocean.

447 citations
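
A minimal sketch of the maximal-overlap (undecimated) estimator for the Haar wavelet, assuming the usual unbiased form that averages squared coefficients away from the boundary; the details are assumptions for illustration:

import numpy as np

def haar_modwt_wvar(x, j):
    # Unbiased Haar maximal-overlap wavelet variance at level j:
    # mean square of coefficients from a difference-of-averages filter
    # of width 2^j, boundary coefficients discarded.
    x = np.asarray(x, dtype=float)
    half = 2 ** (j - 1)
    h = np.concatenate([np.full(half, 1.0), np.full(half, -1.0)]) / 2 ** j
    w = np.convolve(x, h, mode="valid")   # only fully-overlapping coefficients
    return np.mean(w ** 2)

# Example: unit-variance white noise has wavelet variance 1/2 at level 1.
rng = np.random.default_rng(0)
print(haar_modwt_wvar(rng.normal(size=4096), j=1))   # approx 0.5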


Journal ArticleDOI
TL;DR: In this paper, a method is proposed to detect jumps and sharp cusps in a function which is observed with noise, by checking if the wavelet transformation of the data has significantly large absolute values across fine scale levels.
Abstract: SUMMARY A method is proposed to detect jumps and sharp cusps in a function which is observed with noise, by checking if the wavelet transformation of the data has significantly large absolute values across fine scale levels. Asymptotic theory is established and practical implementation is discussed. The method is tested on simulated examples, and applied to stock market return data. The analysis of change-points, which describe sudden localised changes, has recently attracted increasing interest. Change-points can be used to model practical problems arising in fields such as quality control, economics, medicine, signal and image processing, and the physical sciences. For example, in electroencephalogram signals, sharp cusps exhibit the accelerations and decelerations in the beating of the heart. Many practical problems like this involve functions which have jumps and sharp cusps. The recently developed theory of wavelets has drawn much attention from mathematicians, statisticians and engineers. In the seminal work of Donoho (1993), Donoho & Johnstone (1994, 1995a,b) and Donoho, Johnstone et al. (1995), orthonormal bases of compactly supported wavelets have been used to estimate functions. The theory of wavelets permits decomposition of functions into localised oscillating components. This is an ideal tool to study localised changes such as jumps and sharp cusps in one dimension as well as in several dimensions. Unlike traditional smoothing methods based on a fixed spatial scale, the wavelet method is a multiresolution approach and has local adaptivity.
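
A minimal sketch of the idea on the finest scale, assuming Haar details and a universal threshold with a MAD noise estimate (all specific choices are assumptions, not the paper's calibration):

import numpy as np

def detect_jumps(y):
    # Finest-scale Haar detail coefficients of the noisy signal.
    d = (y[1:] - y[:-1]) / np.sqrt(2.0)
    # Robust noise scale estimated from the details themselves (MAD).
    sigma = np.median(np.abs(d)) / 0.6745
    # Flag coefficients exceeding the universal threshold.
    thr = sigma * np.sqrt(2.0 * np.log(len(d)))
    return np.flatnonzero(np.abs(d) > thr)

# Example: a unit step at t = 500 buried in noise of SD 0.1.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(500), np.ones(500)] + 0.1 * rng.normal(size=1000)
print(detect_jumps(y))   # expected to include index 499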

Journal ArticleDOI
TL;DR: In this paper, the authors present simple hierarchical centring reparametrisations that often give improved convergence for a broad class of normal linear mixed models, including the Laird-Ware model, and a general structure for hierarchically nested linear models.
Abstract: SUMMARY The generality and easy programmability of modern sampling-based methods for maximisation of likelihoods and summarisation of posterior distributions have led to a tremendous increase in the complexity and dimensionality of the statistical models used in practice. However, these methods can often be extremely slow to converge, due to high correlations between, or weak identifiability of, certain model parameters. We present simple hierarchical centring reparametrisations that often give improved convergence for a broad class of normal linear mixed models. In particular, we study the two-stage hierarchical normal linear model, the Laird-Ware model for longitudinal data, and a general structure for hierarchically nested linear models. Using analytical arguments, simulation studies, and an example involving clinical markers of acquired immune deficiency syndrome (AIDS), we indicate when reparametrisation is likely to provide substantial gains in efficiency.
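
The simplest case conveys the idea; for the two-stage normal model the uncentred and centred forms are (standard notation, shown for orientation):

\text{uncentred: } y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad \alpha_i \sim N(0, \sigma_\alpha^2); \qquad \text{centred: } y_{ij} = \eta_i + \varepsilon_{ij}, \quad \eta_i \sim N(\mu, \sigma_\alpha^2).

Sampling $(\mu, \eta_i)$ rather than $(\mu, \alpha_i)$ can sharply reduce posterior correlation when the between-group variance dominates the error variance.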

Journal ArticleDOI
TL;DR: In this paper, the extension by Liang and Zeger of generalised linear models to repeated measures data is revisited; the parameters of the 'working' correlation matrix can suffer an uncertainty of definition that leads to a breakdown of the asymptotic properties of the estimators.
Abstract: SUMMARY In a seminal paper Liang & Zeger (1986) extended the use of generalised linear models to repeated measures data. They based the analysis on specifications for the means and variances of the observations, as usual for generalised linear models, but showed how specifications for the correlations between measurements made on the same unit could be avoided by using a 'working' correlation matrix. In some cases the parameters involved in this matrix are subject to an uncertainty of definition which can lead to a breakdown of the asymptotic properties of the estimators.

Journal ArticleDOI
TL;DR: In this paper, the authors show that if the copula is known, then the competing risks data are sufficient to identify the marginal survival functions and construct a suitable estimator, which is consistent and reduces to the Kaplan-Meier estimator when death and censoring times are independent.
Abstract: SUMMARY When time to death and time to censoring are associated one may be appreciably misled when the marginal survival functions are estimated using the product-limit estimators, which assume independent censoring. If no assumption about the relationship between the two times is made, the marginal survival functions are not identifiable. A natural function that defines the association between the two random variables is the copula. We show that if this function is known then the competing risks data are sufficient to identify the marginal survival functions and construct a suitable estimator. This estimator is consistent and reduces to the Kaplan-Meier estimator when death and censoring times are independent. This statistic can be used to provide bounds on the marginal survival functions based on a range of possible associations between the competing risks.

Journal ArticleDOI
TL;DR: In this paper, the authors consider regression analysis when incomplete or auxiliary covariate data are available for all study subjects and, in addition, for a subset called the validation sample, true covariates of interest have been ascertained.
Abstract: SUMMARY We consider regression analysis when incomplete or auxiliary covariate data are available for all study subjects and, in addition, for a subset called the validation sample, true covariate data of interest have been ascertained. The term auxiliary data refers to data not in the regression model, but thought to be informative about the true missing covariate data of interest. We discuss a method which is nonparametric with respect to the association between available and missing data, allows missingness to depend on available response and covariate values, and is applicable to both cohort and case-control study designs. The method previously proposed by Flanders & Greenland (1991) and by Zhao & Lipsitz (1992) is generalised and asymptotic theory is derived. Our expression for the asymptotic variance of the estimator provides intuition regarding performance of the method. Optimal sampling strategies for the validation set are also suggested by the asymptotic results.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a semiparametric estimation procedure for estimating the regression of an outcome Y, measured at the end of a fixed follow-up period, on baseline explanatory variables X, measured prior to start of followup, in the presence of dependent censoring given X.
Abstract: SUMMARY We propose a semiparametric estimation procedure for estimating the regression of an outcome Y, measured at the end of a fixed follow-up period, on baseline explanatory variables X, measured prior to start of follow-up, in the presence of dependent censoring given X. The proposed estimators are consistent when the data are 'missing at random' but not 'missing completely at random' (Rubin, 1976), and do not require full specification of the complete data likelihood. Specifically, we assume that the probability of censoring at time t is independent of the outcome Y conditional on the recorded history up to t of a vector of time-dependent covariates that are correlated with Y. Our estimators can be used to adjust for dependent censoring and nonrandom noncompliance in randomised trials studying the effect of a treatment on the mean of a response variable of interest. Even with independent censoring, our methods allow the investigator to increase efficiency by exploiting the correlation of the outcome with a vector of time-dependent covariates.
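
A schematic version of the weighting idea (a sketch of inverse-probability-of-censoring weighting in general, not the paper's precise estimator): with $\Delta_i$ the indicator that subject $i$ is uncensored and $\hat{K}_i$ the estimated probability of remaining uncensored given the recorded covariate history, one solves

\sum_{i=1}^{n} \frac{\Delta_i}{\hat{K}_i}\, d(X_i)\,\{Y_i - \mu(X_i; \beta)\} = 0,

so that observed subjects who were likely to be censored stand in for similar subjects who were lost.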

Journal ArticleDOI
TL;DR: In this paper, a new method for bias reduction in nonparametric density estimation is proposed, which is a simple, two-stage multiplicative bias correction, and its theoretical properties are investigated, and simulations indicate its practical potential.
Abstract: A new method for bias reduction in nonparametric density estimation is proposed. The method is a simple, two-stage multiplicative bias correction. Its theoretical properties are investigated, and simulations indicate its practical potential. The method is easy to compute and to analyse, and extends simply to multivariate and other estimation problems.
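
A sketch of a two-stage multiplicative correction, assuming the common 'pilot estimate times a kernel-weighted inverse-pilot average' recipe; whether this matches the paper's exact estimator is an assumption:

import numpy as np

def kde(grid, data, h):
    # Standard Gaussian kernel density estimate on a grid.
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def mbc_kde(grid, data, h):
    # Stage 1: pilot estimate, evaluated at the data points.
    pilot_at_data = kde(data, data, h)
    # Stage 2: multiply the pilot on the grid by the average of
    # K_h(x - X_i) / pilot(X_i), a multiplicative bias correction.
    u = (grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
    correction = (K / pilot_at_data[None, :]).mean(axis=1)
    return kde(grid, data, h) * correction

rng = np.random.default_rng(0)
data = rng.normal(size=400)
grid = np.linspace(-3, 3, 7)
print(mbc_kde(grid, data, h=0.4))   # compare with the N(0,1) density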

Journal ArticleDOI
TL;DR: In this paper, a weighted partial likelihood estimating equation is proposed for the estimation of marginal hazard ratio parameters based on correlated failure time data, and asymptotic distribution theory is derived for the solution to such equations using martingale convergence results and inverse function theory.
Abstract: SUMMARY Weighted partial likelihood estimating equations are proposed for the estimation of marginal hazard ratio parameters based on correlated failure time data. Asymptotic distribution theory is derived for the solution to such equations using martingale convergence results and inverse function theory. Simulation studies and theoretical efficiency calculations indicate that the inclusion of weights in the estimating equation produces important efficiency gains only if the dependencies among the failure times are strong.

Journal ArticleDOI
TL;DR: In this paper, the asymptotic critical values of the MOSUM test with recursive residuals are tabulated, and it is shown that those of the MOSUM test with least-squares residuals can be obtained from existing tables for the moving-estimates test.
Abstract: SUMMARY In this paper, tests for structural change based on moving sums (MOSUMs) of recursive and least-squares residuals are investigated. We obtain and tabulate the asymptotic critical values of the MOSUM test with recursive residuals and show that the asymptotic critical values of the MOSUM test with least-squares residuals can easily be obtained from already existing tables for the moving-estimates test. We also show that these MOSUM tests are consistent and have nontrivial local power against a general class of alternatives. Our simulations further indicate that the proposed MOSUM tests can complement other tests when there is a single structural change and have a power advantage when there are certain double structural changes.
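
A minimal sketch of the least-squares-residual version, assuming a window equal to a fixed fraction of the sample; the asymptotic critical values are tabulated in the paper and are not reproduced here, so the returned statistic must be compared against those tables:

import numpy as np

def mosum_statistic(y, X, window_frac=0.15):
    # OLS residuals from the full-sample fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma = np.sqrt(resid @ resid / (n - p))
    w = max(1, int(window_frac * n))
    # Moving sums of residuals, standardised by sigma * sqrt(window).
    ms = np.convolve(resid, np.ones(w), mode="valid") / (sigma * np.sqrt(w))
    return np.max(np.abs(ms))

# Example: a mean shift midway through the sample inflates the statistic.
rng = np.random.default_rng(0)
y = np.r_[rng.normal(size=100), rng.normal(loc=2.0, size=100)]
X = np.ones((200, 1))                       # intercept-only model
print(mosum_statistic(y, X))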

Journal ArticleDOI
TL;DR: In this article, an estimated partial likelihood method is proposed for estimating relative risk parameters, which is an extension of the estimated likelihood regression analysis method for uncensored data (Pepe, 1992; Pepe & Fleming, 1991).
Abstract: SUMMARY We consider the problem of missing covariate data in the context of censored failure time relative risk regression. Auxiliary covariate data, which are considered informative about the missing data but which are not explicitly part of the relative risk regression model, may be available. Full covariate information is available for a validation set. An estimated partial likelihood method is proposed for estimating relative risk parameters. This method is an extension of the estimated likelihood regression analysis method for uncensored data (Pepe, 1992; Pepe & Fleming, 1991). A key feature of the method is that it is nonparametric with respect to the association between the missing and observed, including auxiliary, covariate components. Asymptotic distribution theory is derived for the proposed estimated partial likelihood estimator in the case where the auxiliary or mismeasured covariates are categorical. Asymptotic efficiencies are calculated for exponential failure times using an exponential relative risk model. The estimated partial likelihood estimator compares favourably with a fully parametric maximum likelihood analysis. Comparisons are also made with a standard partial likelihood analysis which ignores the incomplete observations. Important efficiency gains can be made with the estimated partial likelihood method. Small sample properties are investigated through simulation studies.

Journal ArticleDOI
TL;DR: Pooled testing not only reduces the probability of failing to obtain a positive prevalence estimate when screening for a rare disease, but also improves the accuracy of the estimator, as illustrated by screening for the human immunodeficiency virus.
Abstract: SUMMARY When screening for a rare disease the large number of expected false positives makes it difficult to obtain a positive estimate of prevalence using current screening methodology. Pooled testing not only reduces the probability of this occurring, but also improves the accuracy of the estimator. These findings are illustrated by screening for the human immunodeficiency virus.
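
The estimator behind this argument is simple enough to state; assuming perfect test sensitivity and specificity, the maximum likelihood estimate of prevalence from pooled results is the familiar group-testing formula (a standard result, offered for orientation):

def pooled_prevalence(positive_pools, total_pools, pool_size):
    # MLE of individual prevalence p from pool-level results, since a
    # pool of k independent samples tests negative w.p. (1 - p)^k.
    return 1.0 - (1.0 - positive_pools / total_pools) ** (1.0 / pool_size)

# Example: 11 positive pools out of 200 pools of size 10.
print(pooled_prevalence(11, 200, 10))   # about 0.0056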

Journal ArticleDOI
TL;DR: In this article, the authors present a counter-matching design for sampling from risk sets in which each sampled risk set includes the failure and a random sample of controls from "sampling strata" defined by exposure information available for all cohort subjects.
Abstract: SUMMARY The counter-matching design for sampling from risk sets is presented in which each sampled risk set includes the failure and a random sample of controls from 'sampling strata' defined by exposure information available for all cohort subjects. Asymptotic relative efficiency comparisons indicate that this type of sampling has superior efficiency to simple nested case-control sampling in situations of practical interest. A simple extension of the method is given which allows for nonrepresentative sampling of failures. Analysis of data from counter-matched studies may be performed using standard conditional logistic likelihood software which allows for an 'offset' in the model.

Journal ArticleDOI
TL;DR: In this paper, a random effects model is used to derive mean and variance models for estimated disease rates and covariate data from random samples of individuals from each of several cohorts, which are then developed by replacing cohort covariate averages by corresponding sample averages.
Abstract: SUMMARY Statistical methods are proposed for estimating relative rate parameters, based on estimated disease rates and covariate data from random samples of individuals from each of several cohorts. A random effects model is used to derive mean and variance models for estimated disease rates. Estimating equations for relative rate parameters are then developed by replacing cohort covariate averages by corresponding sample averages. The asymptotic distribution of regression parameter estimates is derived, and the asymptotic bias is shown to be small, even if covariates are contaminated by classical random measurement errors, provided the covariate sample size in each cohort is not small. Simulation studies, motivated by international data on diet and breast cancer, provide insights into the properties of the proposed estimators.

Journal ArticleDOI
TL;DR: In this article, the authors derive the differential equation that a prior must satisfy if the posterior probability of a one-sided credibility interval for a parametric function and its frequentist probability agree up to O(n^{-1}).
Abstract: SUMMARY We derive the differential equation that a prior must satisfy if the posterior probability of a one-sided credibility interval for a parametric function and its frequentist probability agree up to O(n^{-1}). This equation turns out to be identical with Stein's equation for a slightly different problem, for which our method also provides a rigorous justification. Our method differs in details from Stein's but is similar in spirit to Dawid (1991) and Bickel & Ghosh (1990). Some examples are provided.
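
For a scalar parameter with no nuisance parameters, the solution of this kind of equation is classical (a well-known fact recalled here, not a new result of the paper): first-order probability matching is achieved by Jeffreys' prior,

\pi(\theta) \propto I(\theta)^{1/2},

where $I(\theta)$ is the Fisher information; the interest in equations such as the one derived here lies largely in the multiparameter case.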

Journal ArticleDOI
TL;DR: In this article, two definitions of Bayesian residuals are proposed for binary regression data; their continuous-valued posterior distributions can be graphed to learn about outlying observations.
Abstract: SUMMARY In a binary response regression model, classical residuals are difficult to define and interpret due to the discrete nature of the response variable. In contrast, Bayesian residuals have continuous-valued posterior distributions which can be graphed to learn about outlying observations. Two definitions of Bayesian residuals are proposed for binary regression data. Plots of the posterior distributions of the basic 'observed - fitted' residuals can be helpful in outlier detection. Alternatively, the notion of a tolerance random variable can be used to define latent data residuals that are functions of the tolerance random variables and the parameters. In the probit setting, these residuals are attractive in that a priori they are a sample from a standard normal distribution, and therefore the corresponding posterior distributions are easy to interpret. These residual definitions are illustrated in examples and contrasted with classical outlier detection methods for binary data.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce tests of linearity for time series based on nonparametric estimates of the conditional mean and the conditional variance, which are compared to a number of parametric tests and to non-parametric tests based on the bispectrum.
Abstract: SUMMARY We introduce tests of linearity for time series based on nonparametric estimates of the conditional mean and the conditional variance. The tests are compared to a number of parametric tests and to nonparametric tests based on the bispectrum. Since asymptotic expressions give poor approximations, the null distribution under linearity is constructed using resampling of the best linear approximation. The new tests perform well on the examples tested.

Journal ArticleDOI
TL;DR: In this article, an adaptive procedure based on estimating the optimal member of a class of score statistics is explored for interval-censored data, where survival times are not known exactly but only to have occurred between intermittent examination times.
Abstract: SUMMARY Interval-censored data result when survival times are not known exactly, but are only known to have occurred between intermittent examination times. Here the accelerated failure time model is treated for interval-censored data. A class of score statistics that may be used for estimation and confidence procedures is proposed. An adaptive procedure based on estimating the optimal member of the class of score statistics is explored.

Journal ArticleDOI
TL;DR: A Gibbs sampling algorithm is implemented from which Bayesian estimates and credible intervals for survival and movement probabilities are derived, and convergence of the algorithm is proved using a duality principle.
Abstract: SUMMARY The Arnason-Schwarz model is usually used for estimating survival and movement probabilities of animal populations from capture-recapture data. The missing data structure of this capture-recapture model is exhibited and summarised via a directed graph representation. Taking advantage of this structure we implement a Gibbs sampling algorithm from which Bayesian estimates and credible intervals for survival and movement probabilities are derived. Convergence of the algorithm is proved using a duality principle. We illustrate our approach through a real example.

Journal ArticleDOI
TL;DR: In this article, a new linearisation variance estimator is proposed that makes more complete use of the sample data than a standard one; a jackknife variance estimator and its linearised version are also obtained.
Abstract: SUMMARY Ratio estimation under two-phase simple random sampling is studied. A new linearisation variance estimator that makes more complete use of the sample data than a standard one is proposed. A jackknife variance estimator and its linearised version are also obtained. Unconditional and conditional repeated sampling properties of these variance estimators are studied through simulation. Applications to 'mass' imputation under two-phase sampling and deterministic imputation for missing data are also given.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method to analyse competing risks survival data when failure types are missing for some individuals, based on a standard proportional hazards structure for each of the failure types, and involves the solution to estimating equations.
Abstract: We propose a method to analyse competing risks survival data when failure types are missing for some individuals. Our approach is based on a standard proportional hazards structure for each of the failure types, and involves the solution to estimating equations. We present consistent and asymptotically normal estimators of the regression coefficients and related score tests. An appealing feature is that individuals with known failure types make the same contributions as they would to a standard proportional hazards analysis. Contributions of individuals with unknown failure types are weighted according to the probability that they failed from the cause of interest. Efficiency and robustness are discussed. Results are illustrated with data from a breast cancer trial.