
Showing papers in "Biometrika in 1989"


Journal ArticleDOI
TL;DR: In this article, a bias correction to the Akaike information criterion, called AICC, is derived for regression and autoregressive time series models; it is of particular use when the sample size is small, or when the number of fitted parameters is a moderate to large fraction of the sample size.
Abstract: SUMMARY A bias correction to the Akaike information criterion, AIC, is derived for regression and autoregressive time series models. The correction is of particular use when the sample size is small, or when the number of fitted parameters is a moderate to large fraction of the sample size. The corrected method, called AICC, is asymptotically efficient if the true model is infinite dimensional. Furthermore, when the true model is of finite dimension, AICC is found to provide better model order choices than any other asymptotically efficient method. Applications to nonstationary autoregressive and mixed autoregressive moving average time series models are also discussed.
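The corrected criterion has a simple closed form: AICC adds a penalty term 2k(k+1)/(n−k−1) to the usual AIC. A minimal sketch (the function names are ours; `log_lik` is the maximized log-likelihood, `k` the number of fitted parameters, `n` the sample size):

```python
def aic(log_lik, k):
    """Akaike information criterion for a fitted model."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """Bias-corrected AIC: the extra term 2k(k+1)/(n-k-1)
    matters when k is not small relative to n."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)
```

As n grows with k fixed, the correction term vanishes and AICC converges to AIC, which is why the two criteria agree for large samples.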

5,867 citations


Journal ArticleDOI
TL;DR: In this article, a procedure is given for estimating the size of a closed population in the presence of heterogeneous capture probabilities using capture-recapture data when it is possible to model the capture probabilities of individuals in the population using covariates.
Abstract: SUMMARY A procedure is given for estimating the size of a closed population in the presence of heterogeneous capture probabilities using capture-recapture data when it is possible to model the capture probabilities of individuals in the population using covariates. The results include the estimation of the parameters associated with the model of the capture probabilities and the use of these estimated capture probabilities to estimate the population size. Confidence intervals for the population size using both the asymptotic normality of the estimator and a bootstrap procedure for small samples are given.
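Once capture probabilities have been modelled from covariates, the population size is typically estimated by a Horvitz-Thompson-style sum over the captured individuals. A hedged sketch of that final step only (the paper's contribution, fitting the p_i from covariates by conditional likelihood, is not shown; the function name is ours):

```python
def population_size_estimate(capture_probs):
    """Horvitz-Thompson-style estimate: sum 1/p_i over the
    individuals actually captured, where p_i is the modelled
    probability that individual i is caught at least once."""
    return sum(1.0 / p for p in capture_probs)
```

For example, two captured individuals each with a 0.5 chance of ever being caught suggest a population of about four.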

701 citations


Journal ArticleDOI
TL;DR: Concepts of v-fold cross-validation and repeated learning-testing methods are introduced; in many problems these methods are computationally much less expensive than ordinary cross-validation and can be used in its place.
Abstract: SUMMARY Concepts of v-fold cross-validation and repeated learning-testing methods have been introduced here. In many problems, these methods are computationally much less expensive than ordinary cross-validation and can be used in its place. A comparative study of these three methods has been carried out in detail.
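The computational saving comes from refitting the model v times rather than n times. A minimal sketch of v-fold cross-validation, assuming user-supplied `fit` and `loss` callables (all names are ours, not the paper's):

```python
import random

def v_fold_indices(n, v, seed=0):
    """Partition indices 0..n-1 into v roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]

def v_fold_cv(xs, ys, fit, loss, v=5):
    """Average held-out loss over v train/test splits -- far
    cheaper than leave-one-out when n is large."""
    folds = v_fold_indices(len(xs), v)
    total, count = 0.0, 0
    for fold in folds:
        hold = set(fold)
        train = [i for i in range(len(xs)) if i not in hold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            total += loss(model, xs[i], ys[i])
            count += 1
    return total / count
```

Ordinary cross-validation is the special case v = n; the repeated learning-testing method instead draws independent random splits rather than a single partition.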

637 citations


Journal ArticleDOI
TL;DR: In this paper, a broad class of nonparametric statistics for comparing two diagnostic markers is studied, and test procedures and confidence intervals are based on asymptotic normality.
Abstract: SUMMARY In this paper we study a broad class of nonparametric statistics for comparing two diagnostic markers. One can compare the sensitivities of these diagnostic markers over restricted ranges of specificity by selecting an appropriate statistic from this class. As special cases, one can compare the entire area under the receiver-operator curve (Hanley & McNeil, 1982), or one can compare the sensitivities at a fixed common specificity. Usually we would recommend a comparison based on an average of sensitivities over a restricted high level of specificities. Test procedures and confidence intervals are based on asymptotic normality. These procedures are applicable for paired data, in which both diagnostic markers are performed on each subject, and for unpaired data. The procedures may be used to compare two real functions of multiple diagnostic markers as well as to compare individual markers.
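The whole-area special case cited above (Hanley & McNeil, 1982) has a well-known rank interpretation: the area under the ROC curve equals the probability that a randomly chosen diseased subject's marker value exceeds a randomly chosen healthy subject's. A sketch of that nonparametric estimate (not the paper's more general restricted-range statistics; the function name is ours):

```python
def auc_mann_whitney(pos, neg):
    """Nonparametric AUC estimate: fraction of (diseased, healthy)
    pairs in which the diseased marker is larger, counting ties
    as one half."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Perfect separation gives an AUC of 1, while a marker carrying no information gives 0.5.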

361 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the consequences for identifiability of introducing regressors into the competing risks model of multistate duration analysis and establish conditions under which access to regressors overturns the nonidentification theorem of Cox and Tsiatis for both proportional and accelerated failure time models.
Abstract: SUMMARY This paper considers the consequences for identifiability of introducing regressors into the competing risks model of multistate duration analysis. We establish conditions under which access to regressors overturns the nonidentification theorem of Cox and Tsiatis for both proportional and accelerated failure time models.

316 citations


Journal ArticleDOI
TL;DR: In this paper, the multivariate log normal mixture of independent Poisson distributions, the multivariate Poisson-log normal distribution, is studied; its properties are derived and its uses in a variety of applications to multivariate count data are illustrated.
Abstract: SUMMARY The statistical analysis of multivariate counts has proved difficult because of the lack of a parametric class of distributions supporting a rich enough correlation structure. With increasing availability of powerful computing facilities an obvious candidate for consideration is now the multivariate log normal mixture of independent Poisson distributions, the multivariate Poisson-log normal distribution. The properties of this discrete multivariate distribution are studied and its uses in a variety of applications to multivariate count data are illustrated.

315 citations


Journal ArticleDOI
TL;DR: In this article, simple Monte Carlo significance testing has many applications, particularly in the preliminary analysis of spatial data, where the value of the test statistic is ranked among a random sample of values generated according to the null hypothesis.
Abstract: SUMMARY Simple Monte Carlo significance testing has many applications, particularly in the preliminary analysis of spatial data. The method requires the value of the test statistic to be ranked among a random sample of values generated according to the null hypothesis. However, there are situations in which a sample of values can only be conveniently generated using a Markov chain, initiated by the observed data, so that independence is violated. This paper describes two methods that overcome the problem of dependence and allow exact tests to be carried out. The methods are applied to the Rasch model, to the finite lattice Ising model and to the testing of association between spatial processes. Power is discussed in a simple case.
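The simple (independent-sample) version described in the opening sentences is easy to sketch; the paper's actual contribution, handling Markov-chain dependence, is not shown here. All names below are ours:

```python
import random

def monte_carlo_pvalue(observed_stat, simulate, n_sim=99, seed=1):
    """Rank the observed statistic among statistics computed from
    independent data sets simulated under the null hypothesis.
    With n_sim = 99 the test is exact at conventional levels for
    a one-sided 'large values are extreme' alternative."""
    rng = random.Random(seed)
    sims = [simulate(rng) for _ in range(n_sim)]
    rank = 1 + sum(1 for s in sims if s >= observed_stat)
    return rank / (n_sim + 1)
```

The exactness rests on exchangeability of the observed and simulated statistics under the null, which is precisely what fails when the simulated values come from a Markov chain started at the observed data.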

301 citations


Journal ArticleDOI
TL;DR: In this article, the problem of constructing a prior that is "non-informative" for a single parameter in the presence of nuisance parameters is considered, and a general form for the class of priors satisfying Stein's condition is given.
Abstract: SUMMARY We consider the problem of constructing a prior that is 'noninformative' for a single parameter in the presence of nuisance parameters. Our approach is to require that the resulting marginal posterior intervals have accurate frequentist coverage. Stein (1985) derived nonrigorously a sufficient condition for such a prior. Through the use of orthogonal parameters, we give a general form for the class of priors satisfying Stein's condition. The priors are proportional to the square root of the information element for the parameter of interest times an arbitrary function of the nuisance parameters. This is in contrast to Jeffreys (1946) invariant prior for the overall parameter, which is proportional to the square root of the determinant of the information matrix. Several examples are given and comparisons are made to the reference priors of Bernardo (1979).

284 citations


Journal ArticleDOI
Yehuda Vardi
TL;DR: In this paper, a nonparametric maximum likelihood estimator for a lifetime distribution is presented; it is based on a model of multiplicatively censored observations and generalizes several common statistical problems.
Abstract: SUMMARY A nonparametric maximum likelihood estimator is given for a lifetime distribution. It is based on a model of multiplicatively censored observations and generalizes common statistical problems such as the estimation of a distribution function under a decreasing-density constraint, the nonparametric deconvolution of an exponential random variable, and an estimation problem in renewal processes.

208 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider likelihood ratio tests to detect a change-point in simple linear regression (a) when the alternative specifies that only the intercept changes and (b) when the alternative permits the intercept and the slope to change.
Abstract: SUMMARY We consider likelihood ratio tests to detect a change-point in simple linear regression (a) when the alternative specifies that only the intercept changes and (b) when the alternative permits the intercept and the slope to change. Approximations for the significance level are obtained under reasonably general assumptions about the empirical distribution of the independent variable. The approximations are compared with simulations in order to assess their accuracy. For the model in which only the intercept is allowed to change, a confidence region for the change-point and an approximate joint confidence region for the change-point, the difference in intercepts, and the slope are obtained by inversion of the appropriate likelihood ratio tests.

207 citations


Journal ArticleDOI
TL;DR: In this article, the authors explored the use of nonparametric regression to check the fit of a parametric regression model and developed a pseudo likelihood ratio test to provide a global assessment of fit and simulation bands to indicate the nature of departures from the model.
Abstract: SUMMARY The use of nonparametric regression is explored to check the fit of a parametric regression model. The principal aim is to check the validity of the regression curve rather than necessarily to detect outliers. A pseudo likelihood ratio test is developed to provide a global assessment of fit and simulation bands are used to indicate the nature of departures from the model. The types of data considered include discrete response variables, where standard diagnostic techniques are often not appropriate, and first-order autoregressive series. Several numerical examples are given. Nonparametric regression can be used in an informal graphical way to assess the relationship between a response and an explanatory variable. In this paper we aim to develop more formal methods of assessing the assumptions of a parametric model, in particular when regression diagnostics of the type developed for normal linear models are not readily available. The principal aim is to check the validity of the systematic part of the model by comparing a nonparametric estimate of the regression curve with a parametric one. Such a comparison may also identify outliers, although the distinction between outliers and model inadequacy is not always easy. Two techniques are used to assess the fit of a parametric model. In § 2, confidence bands are constructed around the fitted regression curve by simulation. A comparison of these with the nonparametric curve gives an indication of the nature of any departures from the model. In § 3, a pseudo likelihood ratio test is developed. This provides a quantitative global assessment of fit. In applying these ideas, special emphasis is given to discrete data, and notably logistic regression, because of the difficulty in applying standard residual-based model checking techniques to this type of response variable. A Poisson regression example is discussed in § 4. However, the underlying ideas have wider applications.
Autoregressive time series of order 1 are discussed in § 6. Sections 5 and 7 discuss general issues. We first discuss the context of binary regression with a single covariate and the difficulties caused by the discreteness of the response variable. The observed data are assumed to be of the form (xi, yi, ni), where xi is a covariate value, and yi has a binomial

Journal ArticleDOI
TL;DR: In this paper, a decision rule is provided for the choice of a multiple regression model which is strongly consistent for the true model as n → ∞; the result is proved under mild conditions, for instance without assuming normality of the errors.
Abstract: SUMMARY We consider the multiple regression model Yn = Xnβ + En, where Yn and En are n-vector random variables, Xn is an n × m matrix and β is an m-vector of unknown regression parameters. Each component of β may be zero or nonzero, which gives rise to 2^m possible models for multiple regression. We provide a decision rule for the choice of a model which is strongly consistent for the true model as n → ∞. The result is proved under certain mild conditions, for instance without assuming normality of the distribution of the components of En.

Journal ArticleDOI
TL;DR: In this paper, it was shown that Hommel's modified Bonferroni procedure is at least as powerful as Hochberg's (1988) procedure, and in general, more powerful.
Abstract: SUMMARY It is shown that Hommel's (1988) modified Bonferroni procedure is at least as powerful as Hochberg's (1988) procedure, and, in general, more powerful.
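For context, Hochberg's procedure, the simpler of the two compared here, is a step-up rule on the ordered p-values. A hedged sketch (function name ours; Hommel's more powerful procedure, which the paper favours, involves an extra Simes-type search and is not shown):

```python
def hochberg_reject(pvals, alpha=0.05):
    """Hochberg's (1988) step-up procedure: working from the
    largest ordered p-value down, find the largest rank i with
    p_(i) <= alpha/(m - i + 1); reject that hypothesis and all
    those with smaller p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    cutoff = 0
    for rank in range(m, 0, -1):        # rank m is the largest p-value
        i = order[rank - 1]
        if pvals[i] <= alpha / (m - rank + 1):
            cutoff = rank
            break
    for r in range(cutoff):
        reject[order[r]] = True
    return reject
```

Both procedures control the familywise error rate; the paper's point is that Hommel's rejects at least every hypothesis Hochberg's does.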

Journal ArticleDOI
TL;DR: In this article, the iterative least-squares procedure for estimating the parameters in a general multilevel random coefficients linear model can be modified to produce unbiased estimates of the random parameters.
Abstract: SUMMARY It is shown that the iterative least-squares procedure for estimating the parameters in a general multilevel random coefficients linear model can be modified to produce unbiased estimates of the random parameters. In the multivariate normal case these are equivalent to restricted maximum likelihood estimates.

Journal ArticleDOI
TL;DR: In this paper, an alternative resampling plan, based on the bootstrap, is proposed in an attempt to estimate mean integrated squared error, which leads to a further data-based choice of smoothing parameter.
Abstract: SUMMARY Cross-validation based on integrated squared error has already been applied to the choice of smoothing parameter in the kernel method of density estimation. In this paper, an alternative resampling plan, based on the bootstrap, is proposed in an attempt to estimate mean integrated squared error. This leads to a further data-based choice of smoothing parameter. The two methods are compared and some simulations and examples demonstrate the relative merits. For large samples, the bootstrap performs better than cross-validation for many distributions.
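A crude illustration of the bootstrap idea for bandwidth assessment, under strong simplifying assumptions: we treat the full-sample kernel estimate as a stand-in for the truth, refit on resamples, and average the squared discrepancy on a grid. This is only a sketch of the general device, not the paper's exact smoothed-bootstrap estimator, and all names are ours:

```python
import math, random

def gauss_kde(data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    n = len(data)
    c = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda x: c * sum(
        math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

def bootstrap_mise(data, h, n_boot=50, seed=0):
    """Rough bootstrap estimate of mean integrated squared error
    at bandwidth h: resample, refit, and compare each refit to
    the full-sample estimate via a Riemann sum on a grid."""
    rng = random.Random(seed)
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    grid = [lo + i * (hi - lo) / 200 for i in range(201)]
    step = grid[1] - grid[0]
    ref_vals = [gauss_kde(data, h)(x) for x in grid]
    total = 0.0
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        fhat = gauss_kde(resample, h)
        total += sum((fhat(x) - r) ** 2
                     for x, r in zip(grid, ref_vals)) * step
    return total / n_boot
```

Minimizing such an estimate over a grid of candidate bandwidths h then yields a data-based smoothing choice.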

Journal ArticleDOI
TL;DR: In this paper, a power covariance with range parameter is proposed for the spatial linear model and a convenient profile likelihood is introduced and studied in view of potential multimodal likelihoods for small samples.
Abstract: SUMMARY A popular covariance scheme used for the spatial linear model in geostatistics has spherical form. However, the likelihood is not twice differentiable with respect to the range parameter, and this raises some questions regarding the unimodality of the likelihood. We compare the likelihoods of the spatial linear model for small samples under this scheme and the doubly geometric scheme. Also, a power covariance with range parameter is proposed. In view of potential multimodal likelihoods for small samples for this model, a convenient profile likelihood is introduced and studied.

Journal ArticleDOI
TL;DR: A fast algorithm is developed for computing the conditional mean and variance of the signal given the observations in a signal plus noise model, and a new set of recursions which are more efficient than those for the fixed interval smoothing algorithm are introduced.
Abstract: SUMMARY A fast algorithm is developed for computing the conditional mean and variance of the signal given the observations in a signal plus noise model. The resulting recursions can be applied immediately to provide new and efficient formulae for smoothing part or all of the state vector. The ideas of studentized residuals and leverage from regression analysis are generalized to state space models, and the new algorithm is used to compute the various measures. The results are also applied to obtain a new efficient algorithm for polynomial spline smoothing. Suppose observations are generated by a Gaussian signal plus noise process, with the signal described by a state space model. Such a model for the signal occurs often in practice, for example when the signal is the output of a stochastic difference or differential equation. This paper presents a new algorithm for signal extraction, that is, the computation of the conditional mean and variance of the signal given the observations. The usual method for signal extraction is to use the Kalman filter, see, for example, Anderson & Moore (1979, p. 105), followed by a smoothing step using an algorithm such as the fixed interval smoothing algorithm (Anderson & Moore, 1979, p. 187). Our approach also uses the Kalman filter, but for the smoothing step introduces a new set of recursions which are more efficient than those for the fixed interval smoothing algorithm. As for the fixed interval smoothing algorithm, recursions for estimating the signal and its variance are carried out separately, so that considerable additional savings can be made if the signal estimate alone is required. The development uses ideas introduced by Ansley & Kohn (1987a). A remarkable property of the new recursions is that they can be applied immediately

Journal ArticleDOI
TL;DR: An asymptotic approximation for the marginal density of a nonlinear function g(θ) that is applicable when the joint density of θ is dominated by a single mode and the Jacobian of g is of full rank near that mode is presented.
Abstract: This paper presents an asymptotic approximation for the marginal density of a nonlinear function g(θ) that is applicable when the joint density of θ is dominated by a single mode and the Jacobian of g is of full rank near that mode. The approximation is based on Laplace's method and its asymptotic properties are similar to those of the saddlepoint approximation. The approximation is applied to the computation of a marginal posterior density, a marginal sampling density and a marginal density based on a multivariate saddlepoint approximation to a joint density.

Journal ArticleDOI
TL;DR: The authors use asymptotic approximations to posterior expectations in deriving diagnostics for case influence and prior sensitivity, and obtain an easily-interpreted expression for the change in posterior expectation following perturbation of the problem, which also produces a useful interpretation of the likelihood displacement discussed by Cook & Weisberg.
Abstract: We use asymptotic approximations to posterior expectations in deriving diagnostics for case influence and prior sensitivity. We obtain an easily-interpreted expression for the change in posterior expectation following perturbation of the problem, which also produces a useful interpretation of the likelihood displacement discussed by Cook & Weisberg. In addition, we propose two simple diagnostics of posterior nonnormality. The leukaemia data analysed by Feigl & Zelen are used to illustrate the diagnostic measures we discuss.

Journal ArticleDOI
TL;DR: In this article, the authors derive the limiting distribution of an instrumental variable estimator for testing unit roots when the estimated model is either (i) the true model, (ii) a random walk with shift in mean, or (iii) a random walk with shift in mean and a linear time trend.
Abstract: SUMMARY In this paper we propose a new approach to testing for unit roots in a time series {Yt} with moving average innovations based on an instrumental variable estimator. If {Yt} is a random walk with moving average innovations, we derive the limiting distribution of the instrumental variable estimator when the estimated model is either (i) the true model, (ii) a random walk with shift in mean, or (iii) a random walk with shift in mean and a linear time trend. These distributions are identical to those tabulated by Dickey & Fuller (1979, 1981) in some cases, and easily transformed, in the spirit of Phillips (1987), to the Dickey & Fuller distributions in others.

Journal ArticleDOI
TL;DR: In this paper, a multivariate statistic, designed to be sensitive to the alternative hypothesis that a mean vector lies in the positive orthant, rather than to all alternatives or to alternatives on a line, is proposed.
Abstract: SUMMARY A new multivariate statistic, designed to be sensitive to the alternative hypothesis that a mean vector lies in the positive orthant, rather than to all alternatives or to alternatives on a line (O'Brien, 1984), is proposed. The statistic is shown to approximate the likelihood ratio test statistic for such alternatives (Kudô, 1963; Perlman, 1969). Its null distribution is a special case of the chi-bar-squared distribution, which allows critical values to be obtained. Such a statistic may be used in two-armed clinical trials where we are interested in showing that one treatment is better than another with respect to several responses. As an example, a clinical trial is reanalyzed, with stronger conclusions than had been drawn originally.

Journal ArticleDOI
TL;DR: In this paper, an improved design-based estimator of the variance of the general regression estimator of the finite population total is proposed; it is nearly unbiased under a suitably chosen regression model and works well for conditional inference.
Abstract: SUMMARY The paper deals with design based estimation of the variance of the general regression estimator of the finite population total. The usual Taylor linearization variance estimator is an expression in the design weighted regression residuals; in many applications the resulting expression is counterintuitive from a model based standpoint. The improved variance estimator in this paper attaches another simple weight, called 'g-weight', to each individual residual. This new variance estimator (i) gives valid design-based confidence intervals, (ii) is nearly unbiased under a suitably chosen regression model, and (iii) works well for conditional inference. Examples are given.

Journal ArticleDOI
TL;DR: In this article, the Fisher information matrix for the normal approximation is given, the hypothesis of shape change is considered, and a practical example based on digitized shapes is analysed.
Abstract: SUMMARY The paper deals with the statistical shape analysis of landmark data. Bookstein (1986) introduced a plausible model and obtained a useful normal approximation for three landmarks. We use exact distribution theory and a normal approximation to consider maximum likelihood estimation. The Fisher information matrix for the normal approximation is given and we consider the hypothesis of shape change. Using these results, we analyse a practical example based on digitized shapes. This example motivated the work and the general direction for future work is indicated.

Journal ArticleDOI
TL;DR: In this article, the authors apply the local influence method of Cook (1986) to assess the effect of small perturbations of the data, including the vector of responses, case weights, explanatory variables, and the components of one case.
Abstract: SUMMARY Suggested diagnostics for influence on the estimated regression coefficients in a generalized linear model have generally approximated the effect of deleting a single case. We apply the local influence method of Cook (1986) to assess the effect of small perturbations of the data, including the vector of responses, case weights, explanatory variables, and the components of one case. The resulting diagnostics allow one to check for different kinds of influence, and may give insight into its workings. Two examples illustrate some of the diagnostics.

Journal ArticleDOI
TL;DR: In this article, orthogonal regression analogues of M and S estimates of regression are defined; the S estimates provide a robust estimate of the scale of the orthogonal residuals, a crucial quantity in the computation of the orthogonal regression M estimates.
Abstract: SUMMARY Orthogonal regression analogues of M estimates of regression, called here orthogonal regression M estimates, are defined. These estimates are shown to be consistent at elliptical errors-in-variables models and robust, if the corresponding loss function is bounded. The orthogonal regression analogues of regression scale estimates, called here orthogonal regression S estimates, are considered as well. In particular, they provide a robust estimate for the scale of the orthogonal residuals, a crucial quantity in the computation of orthogonal regression M estimates. Finally we present an algorithm for computing orthogonal regression S and M estimates and the results of a small Monte Carlo experiment.

Journal ArticleDOI
TL;DR: In this article, the properties of optimum designs when there are both qualitative and quantitative factors are investigated, and algorithms for the construction of exact designs are described, which are of general applicability.
Abstract: SUMMARY The properties of optimum designs are investigated when there are both qualitative and quantitative factors. Algorithms for the construction of exact designs are described, which are of general applicability. The case when the qualitative factors represent blocking variables is given special treatment; a modified algorithm provides the opportunity to divide the experimental trials into blocks of specified sizes. Examples are given of second-order models in the quantitative factors divided into two or three blocks of a variety of sizes. Use of the adjustment algorithm to improve such designs generated from a list of candidate points is demonstrated.

Journal ArticleDOI
TL;DR: In this paper, the authors show how decompositions of marked graphs induce corresponding decomposition of maximum likelihood estimates in mixed graphical interaction models and use this result to obtain explicit formulae for the estimates in decomposable models and to obtain factorizations of likelihood ratio statistics.
Abstract: SUMMARY We show how decompositions of marked graphs induce corresponding decompositions of maximum likelihood estimates in mixed graphical interaction models. We use this result to obtain explicit formulae for the estimates in decomposable models and to obtain factorizations of likelihood ratio statistics.

Journal ArticleDOI
TL;DR: In this paper, it is shown that three new, efficient bootstrap algorithms are asymptotically equivalent and that each reduces the order of magnitude of mean squared error by the factor n⁻¹, where n is sample size.
Abstract: SUMMARY It is shown that three new, efficient bootstrap algorithms are asymptotically equivalent. This is done in two ways. First, asymptotic formulae for variances and mean squared errors are derived, and shown to be identical. Secondly, it is demonstrated that two of the methods may be viewed as approximations to the third. The three algorithms considered are the balanced bootstrap and the linear approximation method proposed by Davison, Hinkley & Schechtman (1986), and a centring method proposed by Efron in the context of bias estimation. It is shown that each reduces the order of magnitude of mean squared error by the factor n⁻¹, where n is sample size. These results apply to smooth functions of means.
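Of the three algorithms, the balanced bootstrap is the easiest to sketch: concatenate the data with itself n_boot times, permute, and cut into resamples, so that each observation appears exactly n_boot times overall. A minimal sketch (function name ours):

```python
import random

def balanced_bootstrap(data, n_boot, seed=0):
    """Balanced bootstrap resampling plan: each original
    observation appears exactly n_boot times across the full
    set of resamples, removing first-order Monte Carlo error
    in quantities such as bias estimates."""
    pool = list(data) * n_boot
    random.Random(seed).shuffle(pool)
    n = len(data)
    return [pool[i * n:(i + 1) * n] for i in range(n_boot)]
```

Unlike ordinary resampling with replacement, the resamples are not independent of one another, which is exactly the balancing that yields the variance reduction discussed above.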

Journal ArticleDOI
TL;DR: In this paper, the authors examined predictive distributions, concentrating on measuring their fit to the true distribution by average Kullback-Leibler divergence, and the notion of an "averaged bootstrap" predictive distribution was introduced.
Abstract: SUMMARY The paper examines predictive distributions, concentrating on measuring their fit to the true distribution by average Kullback-Leibler divergence. The notion of an 'averaged bootstrap' predictive distribution is introduced. This predictive distribution is shown to be asymptotically superior to the estimative distribution, in terms of average Kullback-Leibler divergence, when the true distribution is in a natural exponential family. Small-sample results are presented for the Poisson and binomial distributions which suggest that the bootstrap distribution performs well in these cases.

Journal ArticleDOI
TL;DR: In this article, the authors define a bootstrap sampling plan for the case-cohort design and present bootstrap variance estimates and confidence intervals for the log relative hazard, which are found to be near nominal levels in simulations.
Abstract: SUMMARY We define a bootstrap sampling plan for the case-cohort design and present bootstrap variance estimates and confidence intervals for the log relative hazard. The coverages of these confidence intervals are found to be near nominal levels in simulations. A simple null variance calculation, based on a superpopulation model, is shown to provide good estimates of the power and efficiency of the case-cohort design.