
Showing papers in "Biometrika in 2000"


Journal ArticleDOI
TL;DR: In this article, an extension of the propensity score methodology is proposed that allows for estimation of average causal effects with multi-valued treatments; the key insight is that it is not necessary to divide the population into subpopulations where causal comparisons are valid, only into subpopulations where average potential outcomes can be estimated.
Abstract: SUMMARY Estimation of average treatment effects in observational studies often requires adjustment for differences in pre-treatment variables. If the number of pre-treatment variables is large and their distribution varies substantially with treatment status, standard adjustment methods such as covariance adjustment are often inadequate. Rosenbaum & Rubin (1983, 1984) propose an alternative method for adjusting for pre-treatment variables based on the propensity score, the conditional probability of receiving the treatment given pre-treatment variables. They demonstrate that adjusting solely for the propensity score removes all bias associated with differences in the pre-treatment variables. The Rosenbaum-Rubin proposals deal exclusively with binary-valued treatments. In many cases of interest, however, treatments take on more than two values. Here an extension of the propensity score methodology is proposed that allows for estimation of average causal effects with multi-valued treatments. The key insight is that for estimation of average causal effects it is not necessary to divide the population into subpopulations where causal comparisons are valid, as the propensity score does; it is sufficient to divide the population into subpopulations where average potential outcomes can be estimated.
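A minimal numerical sketch of the idea (not the paper's estimator): fit a multinomial logistic model for the generalised propensity score P(T = t | X) and weight outcomes by its inverse to estimate the average potential outcome at each treatment level. The simulated data, model and weighting scheme below are illustrative assumptions.

```python
# Sketch: average potential outcomes for a multi-valued treatment via
# inverse weighting on an estimated generalised propensity score.
# All names and the weighting estimator are illustrative, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, k = 2000, 4, 3                                # subjects, covariates, treatment levels
X = rng.normal(size=(n, p))                         # pre-treatment covariates
logits = X @ rng.normal(size=(p, k))
T = np.array([rng.choice(k, p=np.exp(l) / np.exp(l).sum()) for l in logits])
Y = X[:, 0] + 0.5 * T + rng.normal(size=n)          # outcome with a treatment effect

# Generalised propensity score: P(T = t | X), one probability per level.
gps = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)

# Average potential outcome at each level, normalised inverse-probability weighting.
mu = np.array([
    np.sum((T == t) * Y / gps[:, t]) / np.sum((T == t) / gps[:, t])
    for t in range(k)
])
print("estimated average potential outcomes:", mu.round(3))
print("estimated effect of level 2 vs level 0:", round(mu[2] - mu[0], 3))
```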

1,382 citations


Journal ArticleDOI
TL;DR: The purpose of the current paper is to explore ways in which runs from several levels of a code can be used to make inference about the output from the most complex code.
Abstract: S We consider prediction and uncertainty analysis for complex computer codes which can be run at different levels of sophistication. In particular, we wish to improve efficiency by combining expensive runs of the most complex versions of the code with relatively cheap runs from one or more simpler approximations. A Bayesian approach is described in which prior beliefs about the codes are represented in terms of Gaussian processes. An example is presented using two versions of an oil reservoir simulator. 1. C  Complex mathematical models, implemented in large computer codes, have been used to study real systems in many areas of scientific research (Sacks et al., 1989), usually because physical experimentation is too costly and sometimes impossible, as in the case of large environmental systems. A ‘computer experiment’ involves running the code with various input values for the purpose of learning something about the real system. Often a simulator can be run at different levels of complexity, with versions ranging from the most sophisticated high level code to the most basic. For example, in § 4 we consider two codes which simulate oil pressure at a well of a hydrocarbon reservoir. Both codes use finite element analysis, in which the rocks comprising the reservoir are represented by small interacting grid blocks. The flow of oil within the reservoir can be simulated by considering the interaction between the blocks. The two codes differ in the resolution of the grid, so that we have a very accurate, slow version using many small blocks and a crude approximation using large blocks which runs much faster. Alternatively, a mathematical model could be expanded to include more of the scientific laws underlying the physical processes. Simple, fast versions of the code may well include the most important features, and are useful for preliminary investigations. In real-time applications the number of runs from a high level simulator may be limited by expense. Then there is a need to trade-off the complexity of the expensive code with the availability of the simpler approximations. The purpose of the current paper is to explore ways in which runs from several levels of a code can be used to make inference about the output from the most complex code. We may also have uncertainty about values for the input parameters which apply in any given application. Uncertainty analysis of computer codes describes how this uncertainty on the inputs affects our uncertainty about the output.

1,260 citations


Journal ArticleDOI
TL;DR: In this paper, a new power transformation family is introduced that is well defined on the whole real line and is appropriate for reducing skewness and approximating normality; its properties are similar to those of the Box-Cox transformation for positive variables, and its large-sample properties are investigated in the context of a single random sample.
Abstract: SUMMARY We introduce a new power transformation family which is well defined on the whole real line and which is appropriate for reducing skewness and approximating normality. It has properties similar to those of the Box-Cox transformation for positive variables. The large-sample properties of the transformation are investigated in the context of a single random sample.
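The family introduced in this paper is commonly known as the Yeo-Johnson transformation; a small sketch of the piecewise formula and its effect on a skewed sample follows. The skewness diagnostic and the chosen parameter value are illustrative.

```python
# Sketch of the transformation proposed here (commonly known as Yeo-Johnson),
# defined for all real y and indexed by a power parameter lam; lam = 1 leaves
# the data essentially unchanged, smaller lam compresses the right tail.
import numpy as np
from scipy.stats import skew

def yeo_johnson(y, lam):
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos, neg = y >= 0, y < 0
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if abs(lam - 2) > 1e-12:
        out[neg] = -((-y[neg] + 1) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[neg] = -np.log1p(-y[neg])
    return out

rng = np.random.default_rng(1)
skewed = rng.gamma(2.0, size=500) - 3.0             # right-skewed data spanning the real line
print("skewness before:", round(float(skew(skewed)), 2))
print("skewness after :", round(float(skew(yeo_johnson(skewed, lam=0.3))), 2))
```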

1,047 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed empirical Bayes selection criteria that use hyperparameter estimates instead of fixed choices for variable selection for the normal linear model, and their performance is seen to approximate adaptively the performance of the best fixed penalty criterion across a variety of orthogonal and nonorthogonal set-ups, including wavelet regression.
Abstract: For the problem of variable selection for the normal linear model, selection criteria such as AIC, Cp, BIC and RIC have fixed dimensionality penalties. Such criteria are shown to correspond to selection of maximum posterior models under implicit hyperparameter choices for a particular hierarchical Bayes formulation. Based on this calibration, we propose empirical Bayes selection criteria that use hyperparameter estimates instead of fixed choices. For obtaining these estimates, both marginal and conditional maximum likelihood methods are considered. As opposed to traditional fixed penalty criteria, these empirical Bayes criteria have dimensionality penalties that depend on the data. Their performance is seen to approximate adaptively the performance of the best fixed-penalty criterion across a variety of orthogonal and nonorthogonal set-ups, including wavelet regression. Empirical Bayes shrinkage estimators of the selected coefficients are also proposed.

493 citations


Journal ArticleDOI
TL;DR: In this paper, the inverse of the working correlation matrix is represented as a linear combination of basis matrices, a representation that is valid for the working correlations most commonly used, and the test statistic follows a chi-squared distribution asymptotically whether or not the correlation structure is correctly specified.
Abstract: SUMMARY Generalised estimating equations enable one to estimate regression parameters consistently in longitudinal data analysis even when the correlation structure is misspecified. However, under such misspecification, the estimator of the regression parameter can be inefficient. In this paper we introduce a method of quadratic inference functions that does not involve direct estimation of the correlation parameter, and that remains optimal even if the working correlation structure is misspecified. The idea is to represent the inverse of the working correlation matrix as a linear combination of basis matrices, a representation that is valid for the working correlations most commonly used. Both asymptotic theory and simulation show that under misspecified working assumptions these estimators are more efficient than estimators from generalised estimating equations. This approach also provides a chi-squared inference function for testing nested models and a chi-squared regression misspecification test. Furthermore, the test statistic follows a chi-squared distribution asymptotically whether or not the working correlation structure is correctly specified.
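A quick numerical check of the representation the method rests on: for an exchangeable working correlation, the inverse is exactly a linear combination of two basis matrices (the identity and the all-ones matrix), so the correlation parameter need not be estimated directly. The matrix size and correlation value below are illustrative.

```python
# Check: for an exchangeable working correlation R = (1 - a) I + a J
# (J = all-ones matrix), R^{-1} is a linear combination of the basis
# matrices I and J, with coefficients given by the Sherman-Morrison formula.
import numpy as np

m, a = 5, 0.4
I, J = np.eye(m), np.ones((m, m))
R = (1 - a) * I + a * J

c0 = 1 / (1 - a)
c1 = -a / ((1 - a) * (1 + (m - 1) * a))
print(np.allclose(np.linalg.inv(R), c0 * I + c1 * J))   # True: R^{-1} = c0*I + c1*J
```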

380 citations


Journal ArticleDOI
TL;DR: In this article, the problem of measuring bilateral symmetry of objects analytically for landmark data is formulated and various new testing procedures and exploratory data analyses are developed, linked with the fundamental biological problem of measuring directional asymmetry and fluctuating asymmetry.
Abstract: SUMMARY This paper formulates the problem of measuring bilateral symmetry of objects analytically for landmark data, and develops various new testing procedures and exploratory data analyses. The development is linked with the fundamental biological problem of measuring directional asymmetry and fluctuating asymmetry. We distinguish two types of symmetry, object symmetry and matching symmetry, and provide tests under assumptions of isotropic landmark variability as well as non-isotropy. The tests require novel statistical and geometrical analyses of the Procrustes shape manifold. For describing components of symmetry and asymmetry within samples, an extension of principal component analysis is introduced and illustrated by appropriate deformation techniques. Various real examples are used to illustrate the practical relevance of this work.

371 citations


Journal ArticleDOI
TL;DR: In this article, a class of weighted estimators which account appropriately for censoring is introduced, and the efficiency of these estimators is studied with the goal of finding as efficient an estimator for the mean medical cost as is feasible.
Abstract: Incompleteness of follow-up data is a common problem in estimating medical costs. Naive analysis using summary statistics on the collected data can result in severely misleading statistical inference. This paper focuses on the problem of estimating the mean medical cost from a sample of individuals whose medical costs may be right censored. A class of weighted estimators which account appropriately for censoring is introduced. Our estimators are shown to be consistent and asymptotically normal with easily estimated variances. The efficiency of these estimators is studied with the goal of finding as efficient an estimator for the mean medical cost as is feasible. Extensive simulation studies are used to show that our estimators perform well in finite samples, even with heavily censored data, for a variety of circumstances. The methods are applied to a set of cost data from a cardiology trial conducted by the Duke University Medical Center. Extensions to other censored data problems are also discussed.
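A minimal sketch of the simplest member of such a class of weighted estimators, assuming inverse-probability-of-censoring weighting with a Kaplan-Meier estimate of the censoring distribution; the simulated costs and estimator details are illustrative and do not reproduce the paper's efficient estimator.

```python
# Sketch: weight each uncensored subject's cost by 1 / K(T_i), where K is the
# Kaplan-Meier estimate of the censoring survival function. Simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
survival = rng.exponential(1.0, n)                  # true survival times
censor = rng.exponential(2.0, n)                    # censoring times
time = np.minimum(survival, censor)
died = survival <= censor
cost = 10 * survival                                # total cost, observed only if death observed

# Kaplan-Meier estimate of the censoring survival function K(t) = P(C > t).
cens_times = np.sort(time[~died])
at_risk = np.array([np.sum(time >= s) for s in cens_times])
km = np.cumprod(1 - 1 / at_risk)                    # continuous data: one event per time

def K(t):
    idx = np.searchsorted(cens_times, t, side="right")
    return 1.0 if idx == 0 else km[idx - 1]

naive = cost[died].mean()                                            # ignores censoring
ipcw = np.mean(np.where(died, cost / np.array([K(t) for t in time]), 0.0))
print(f"naive {naive:.2f}   weighted {ipcw:.2f}   true mean cost 10.00")
```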

347 citations


Journal ArticleDOI
TL;DR: In this paper, the influence functions and the corresponding asymptotic variances are derived for robust estimators of eigenvalues and eigenvectors, and a simulation study of several such estimators is carried out; the theoretical results and simulations favour the use of S-estimators, since they combine high efficiency with appealing robustness properties.
Abstract: A robust principal component analysis can be easily performed by computing the eigenvalues and eigenvectors of a robust estimator of the covariance or correlation matrix. In this paper we derive the influence functions and the corresponding asymptotic variances for these robust estimators of eigenvalues and eigenvectors. The behaviour of several of these estimators is investigated by a simulation study. It turns out that the theoretical results and simulations favour the use of S-estimators, since they combine a high efficiency with appealing robustness properties.
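A plug-in sketch of this kind of robust principal component analysis: eigen-decompose a robust covariance estimate and compare with the classical one. The paper favours S-estimators; since scikit-learn does not provide one, the minimum covariance determinant estimator is used below as a stand-in.

```python
# Sketch: robust PCA by eigen-decomposing a robust covariance estimate
# (MCD used here as a stand-in for an S-estimator).
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], np.diag([5.0, 2.0, 0.5]), size=300)
X[:20] += 25                                        # a cluster of outliers

classical = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
robust_cov = MinCovDet(random_state=0).fit(X).covariance_
robust = np.linalg.eigvalsh(robust_cov)[::-1]

print("classical eigenvalues:", classical.round(2))   # inflated by the outliers
print("robust eigenvalues:   ", robust.round(2))      # not inflated by the outliers
```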

335 citations


Journal ArticleDOI
TL;DR: It is found that a certain beta two-parameter process may be suitable for finite mixture modelling because the distinct number of sampled values from this process tends to match closely the number of components of the underlying mixture distribution.
Abstract: SUMMARY We present some easy-to-construct random probability measures which approximate the Dirichlet process and an extension which we will call the beta two-parameter process. The nature of these constructions makes it simple to implement Markov chain Monte Carlo algorithms for fitting nonparametric hierarchical models and mixtures of nonparametric hierarchical models. For the Dirichlet process, we consider a truncation approximation as well as a weak limit approximation based on a mixture of Dirichlet processes. The same type of truncation approximation can also be applied to the beta two-parameter process. Both methods lead to posteriors which can be fitted using Markov chain Monte Carlo algorithms that take advantage of blocked coordinate updates. These algorithms promote rapid mixing of the Markov chain and can be readily applied to normal mean mixture models and to density estimation problems. We prefer the truncation approximations, since a simple device for monitoring the adequacy of the approximation can be easily computed from the output of the Gibbs sampler. Furthermore, for the Dirichlet process, the truncation approximation offers an exponentially higher degree of accuracy over the weak limit approximation for the same computational effort. We also find that a certain beta two-parameter process may be suitable for finite mixture modelling because the distinct number of sampled values from this process tends to match closely the number of components of the underlying mixture distribution.
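A short sketch of the truncated stick-breaking construction behind such approximations, assuming Beta(1, α) stick breaks for the Dirichlet process (a Beta(a, b) break would give a two-parameter analogue); the truncation level and base measure are illustrative.

```python
# Sketch: truncated stick-breaking approximation to the Dirichlet process.
# Weights come from Beta(1, alpha) breaks; atoms are drawn from the base measure.
import numpy as np

def truncated_dp(alpha, base_sampler, N, rng):
    v = rng.beta(1.0, alpha, size=N)
    v[-1] = 1.0                                     # ensure the weights sum to one
    w = v * np.concatenate(([1.0], np.cumprod(1 - v[:-1])))
    return w, base_sampler(N)

rng = np.random.default_rng(4)
w, atoms = truncated_dp(alpha=2.0, base_sampler=lambda n: rng.normal(0, 1, n),
                        N=50, rng=rng)
print("weights sum:", round(float(w.sum()), 6))
samples = rng.choice(atoms, size=1000, p=w)         # draws from the random measure
print("distinct values among 1000 draws:", np.unique(samples).size)
```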

300 citations


Journal ArticleDOI
TL;DR: This paper derives an estimator of the asymptotic variance of both single and multiple imputation estimators, assuming a parametric imputation model but allowing for non- and semiparametric analysis models.
Abstract: We derive an estimator of the asymptotic variance of both single and multiple imputation estimators. We assume a parametric imputation model but allow for non- and semiparametric analysis models. Our variance estimator, in contrast to the estimator proposed by Rubin (1987), is consistent even when the imputation and analysis models are misspecified and incompatible with one another.

289 citations


Journal ArticleDOI
TL;DR: In this article, the maximum likelihood estimators of the parameters of a generalised linear model for the covariance matrix, their consistency and their asymptotic normality are studied when the observations are normally distributed.
Abstract: The positive-definiteness constraint is the most awkward stumbling block in modelling the covariance matrix. Pourahmadi's (1999) unconstrained parameterisation models covariance using covariates in a similar manner to mean modelling in generalised linear models. The new covariance parameters have statistical interpretation as the regression coefficients and logarithms of prediction error variances corresponding to regressing a response on its predecessors. In this paper, the maximum likelihood estimators of the parameters of a generalised linear model for the covariance matrix, their consistency and their asymptotic normality are studied when the observations are normally distributed. These results along with the likelihood ratio test and penalised likelihood criteria such as BIC for model and variable selection are illustrated using a real dataset.
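A small sketch of the unconstrained parameterisation in question: the covariance matrix is mapped to the regression coefficients of each measurement on its predecessors plus the log prediction-error variances, and any values of these parameters map back to a valid covariance matrix. Function names are illustrative.

```python
# Sketch: modified-Cholesky style unconstrained parameterisation of a
# covariance matrix, and its inverse map back to a positive-definite matrix.
import numpy as np

def modified_cholesky(sigma):
    """Return (phi, log_d): phi[j, :j] are regression coefficients of variable j
    on its predecessors, log_d the log prediction-error (innovation) variances."""
    m = sigma.shape[0]
    phi = np.zeros((m, m))
    d = np.empty(m)
    d[0] = sigma[0, 0]
    for j in range(1, m):
        coef = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        phi[j, :j] = coef
        d[j] = sigma[j, j] - sigma[:j, j] @ coef
    return phi, np.log(d)

def rebuild(phi, log_d):
    """Invert the map: with T = I - strictly-lower(phi), T sigma T' = D."""
    m = len(log_d)
    T = np.eye(m) - np.tril(phi, -1)
    D = np.diag(np.exp(log_d))
    Tinv = np.linalg.inv(T)
    return Tinv @ D @ Tinv.T

sigma = np.array([[4.0, 2.0, 1.0], [2.0, 3.0, 1.5], [1.0, 1.5, 2.0]])
phi, log_d = modified_cholesky(sigma)
print(np.allclose(rebuild(phi, log_d), sigma))      # True: the parameterisation is invertible
```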

Journal ArticleDOI
TL;DR: In this article, the authors consider situations where a step function with a variable number of steps provides an adequate model for a regression relationship, while the variance of the observations depends on their mean.
Abstract: SUMMARY We consider situations where a step function with a variable number of steps provides an adequate model for a regression relationship, while the variance of the observations depends on their mean. This model provides for discontinuous jumps at changepoints and for constant means and error variances in between changepoints. The basic statistical problem consists of identification of the number of changepoints, their locations and the levels the function assumes in between. We embed this problem into a quasilikelihood formulation and utilise the minimum deviance criterion to fit the model; for the choice of the number of changepoints, we discuss a modified Schwarz criterion. A dynamic programming algorithm makes the segmentation feasible for sequences of moderate length. The performance of the segmentation method is demonstrated in an application to the segmentation of the bacteriophage λ sequence.
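A compact sketch of the dynamic-programming segmentation idea, with the quasi-deviance replaced by a Gaussian residual sum of squares and a simple Schwarz-type penalty; the penalty constant and simulated sequence are illustrative, not the paper's exact criterion.

```python
# Sketch: optimal segmentation into constant-mean pieces by dynamic programming,
# minimising residual sum of squares plus a per-segment Schwarz-type penalty.
import numpy as np

def segment(y, penalty):
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    csum2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def rss(i, j):                                  # RSS of a constant fit to y[i:j+1]
        s, s2, m = csum[j + 1] - csum[i], csum2[j + 1] - csum2[i], j - i + 1
        return s2 - s ** 2 / m

    best = np.full(n + 1, np.inf)                   # best[j]: penalised cost of y[:j]
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + rss(i, j - 1) + penalty
            if c < best[j]:
                best[j], back[j] = c, i
    cuts, j = [], n
    while j > 0:
        cuts.append(back[j])
        j = back[j]
    return sorted(cuts)[1:]                         # interior changepoint locations

rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 80), rng.normal(1, 1, 120)])
print("estimated changepoints:", segment(y, penalty=2.0 * np.log(len(y))))  # near [100, 180]
```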

Journal ArticleDOI
TL;DR: In this article, the authors provide a formal justification of Binder's method and present an alternative approach which regards the survey population as a random sample from an infinite universe and accounts for this randomness in the statistical inference.
Abstract: SUMMARY Binder (1992) proposed a method of fitting Cox's proportional hazards models to survey data with complex sampling designs. He defined the regression parameter of interest as the solution to the partial likelihood score equation based on all the data values of the survey population under study, and developed heuristically a procedure to estimate the regression parameter and the corresponding variance. In this paper, we provide a formal justification of Binder's method. Furthermore, we present an alternative approach which regards the survey population as a random sample from an infinite universe and accounts for this randomness in the statistical inference. Under the alternative approach, the regression parameter retains its original interpretation as the log hazard ratio, and the statistical conclusion applies to other populations. The related problem of survival function estimation is also studied.

Journal ArticleDOI
TL;DR: The generalised Gibbs sampler provides a framework encompassing a class of recently proposed tricks such as parameter expansion and reparameterisation and is applied to Bayesian inference problems for nonlinear state-space models, ordinal data and stochastic differential equations with discrete observations.
Abstract: SUMMARY Although Monte Carlo methods have frequently been applied with success, indiscriminate use of Markov chain Monte Carlo leads to unsatisfactory performances in numerous applications. We present a generalised version of the Gibbs sampler that is based on conditional moves along the traces of groups of transformations in the sample space. We explore its connection with the multigrid Monte Carlo method and its use in designing more efficient samplers. The generalised Gibbs sampler provides a framework encompassing a class of recently proposed tricks such as parameter expansion and reparameterisation. To illustrate, we apply this new method to Bayesian inference problems for nonlinear state-space models, ordinal data and stochastic differential equations with discrete observations.

Journal ArticleDOI
TL;DR: In this article, the authors generalize four types of disturbance commonly used in univariate time series analysis to the multivariate case, highlight the differences between univariate and multivariate outliers, and investigate dynamic effects of a multivariate outlier on individual components.
Abstract: SUMMARY This paper generalises four types of disturbance commonly used in univariate time series analysis to the multivariate case, highlights the differences between univariate and multivariate outliers, and investigates dynamic effects of a multivariate outlier on individual components. The effect of a multivariate outlier depends not only on its size and the underlying model, but also on the interaction between the size and the dynamic structure of the model. The latter factor does not appear in the univariate case. A multivariate outlier can introduce various types of outlier for the marginal component models. By comparing and contrasting results of univariate and multivariate outlier detections, one can gain insights into the characteristics of an outlier. We use real examples to demonstrate the proposed analysis.

Journal ArticleDOI
TL;DR: In this article, the authors developed a practical approach to diagnose the existence of a latent stochastic process in the mean of a Poisson regression model and derived the asymptotic distribution of standard generalised linear model estimators for the case where an autocorrelated latent process is present.
Abstract: SUMMARY This paper develops a practical approach to diagnosing the existence of a latent stochastic process in the mean of a Poisson regression model. The asymptotic distribution of standard generalised linear model estimators is derived for the case where an autocorrelated latent process is present. Simple formulae for the effect of autocovariance on standard errors of the regression coefficients are also provided. Methods for adjusting for the severe bias in previously proposed estimators of autocovariance are derived and their behaviour is investigated. Applications of the methods to time series of monthly polio counts in the U.S.A. and daily asthma presentations at a hospital in Sydney are used to illustrate the results and methods.

Journal ArticleDOI
TL;DR: In this paper, an expression for the centred L2-discrepancy measure of uniformity is derived in terms of the word-length pattern, linking uniformity with minimum aberration and indicating the excellent behaviour of minimum aberration designs with regard to uniformity.
Abstract: SUMMARY We show a link between the two apparently unrelated areas of uniformity and minimum aberration. With reference to regular fractions of two-level factorials, we derive an expression for the centred L2-discrepancy measure for uniformity in terms of the word-length pattern. This result indicates, in particular, excellent behaviour of minimum aberration designs with regard to uniformity and provides further justification for the popular criterion of minimum aberration.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the upper triangular matrix Φ obtained by the Cholesky decomposition Σ⁻¹ = ΦᵀΦ, which provides an interesting alternative parameterisation of decomposable models since its upper triangle has the same zero structure as Σ⁻¹ and its elements have an interpretation as parameters of certain conditional distributions.
Abstract: The canonical parameter of a covariance selection model is the inverse covariance matrix Σ⁻¹ whose zero pattern gives the conditional independence structure characterising the model. In this paper we consider the upper triangular matrix Φ obtained by the Cholesky decomposition Σ⁻¹ = ΦᵀΦ. This provides an interesting alternative parameterisation of decomposable models since its upper triangle has the same zero structure as Σ⁻¹ and its elements have an interpretation as parameters of certain conditional distributions. For a distribution for Σ, the strong hyper-Markov property is shown to be characterised by the mutual independence of the rows of Φ. This is further used to generalise to the hyper inverse Wishart distribution some well-known properties of the inverse Wishart distribution. In particular we show that a hyper inverse Wishart matrix can be decomposed into independent normal and chi-squared random variables, and we describe a family of transformations under which the family of hyper inverse Wishart distributions is closed.
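A small numerical illustration under an assumed chain (AR(1)-type) graph, which is decomposable in its natural ordering: the upper triangular factor Φ in Σ⁻¹ = ΦᵀΦ shares the zero pattern of the upper triangle of Σ⁻¹. The specific precision matrix is illustrative.

```python
# Illustration: for a tridiagonal precision matrix (chain graph in its natural
# ordering), the upper triangular Cholesky-type factor Phi, with
# Sigma^{-1} = Phi' Phi, has the same zero pattern as the upper triangle of Sigma^{-1}.
import numpy as np

m, rho = 6, 0.6
prec = np.zeros((m, m))                       # AR(1)-type tridiagonal precision
np.fill_diagonal(prec, 1 + rho ** 2)
prec[0, 0] = prec[-1, -1] = 1.0
for i in range(m - 1):
    prec[i, i + 1] = prec[i + 1, i] = -rho

phi = np.linalg.cholesky(prec).T              # upper triangular, prec = phi.T @ phi
print(np.allclose(phi.T @ phi, prec))         # True
print((np.abs(prec) > 1e-10).astype(int))     # zero pattern of Sigma^{-1}
print((np.abs(phi) > 1e-10).astype(int))      # same pattern in the upper triangle
```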

Journal ArticleDOI
TL;DR: In this article, the authors used case-cohort data to estimate the regression parameter of the additive hazards model, which specifies that the conditional hazard function given a set of covariates is the sum of an arbitrary baseline hazard function and a regression function of the covariates.
Abstract: SUMMARY The case-cohort design is a common means of reducing cost in large epidemiological cohort studies. Under this design, covariates are measured only on the cases and a subcohort randomly selected from the entire cohort. In this paper, we demonstrate how to use the case-cohort data to estimate the regression parameter of the additive hazards model, which specifies that the conditional hazard function given a set of covariates is the sum of an arbitrary baseline hazard function and a regression function of the covariates. The proposed estimator is shown to be consistent and asymptotically normal with an easily estimated variance. The subcohort may be selected by independent Bernoulli sampling with arbitrary selection probabilities or by stratified simple random sampling. The efficiencies of various sampling schemes are investigated both analytically and by simulation. A real example is provided.

Journal ArticleDOI
TL;DR: In this paper, statistical methods to analyse such correlated observations are proposed for a class of linear transformation models that includes the proportional hazards and proportional odds models as special cases, and data from a recent study of the genetic aetiology of alcoholism are used to illustrate the new procedures for estimation, prediction and model selection.
Abstract: Inference procedures based on the partial likelihood function for the Cox proportional hazards model have been generalised to the case in which the data consist of a large number of independent small groups of correlated failure time observations (Lee, Wei & Amato, 1992; Liang, Self & Chang, 1993; Cai & Prentice, 1997). However, the Cox model may not fit the data well. A class of linear transformation models, which includes the proportional hazards and odds models as special cases, has been studied extensively for univariate event times. In this paper, statistical methods to analyse such correlated observations are proposed for these models. We use the data from a recent study of the genetic aetiology of alcoholism to illustrate the new procedures for estimation, prediction and model selection.

Journal ArticleDOI
TL;DR: In this paper, the orthogonal multitaper framework for cross-spectral estimators provides a simple unifying structure for determining the corresponding statistical properties, including mean, smoothing and leakage biases, variances and asymptotic distributions.
Abstract: SUMMARY The orthogonal multitaper framework for cross-spectral estimators provides a simple unifying structure for determining the corresponding statistical properties. Here cross-spectral estimators are represented by a weighted average of orthogonally tapered cross-periodograms, with the weights corresponding to a set of rescaled eigenvalues. Such a structure not only encompasses the Thomson estimators, using Slepian and sine tapers, but also Welch's weighted overlapped segment averaging estimator and lag window estimators including frequency-averaged cross-periodograms. The means, smoothing and leakage biases, variances and asymptotic distributions of such estimators can all be formulated in a common way; comparisons are made for a fixed number of degrees of freedom. The common structure of the estimators also provides a necessary condition for the invertibility of an estimated cross-spectral matrix, namely that the weight matrix of the estimator written in bilinear form must have rank greater than or equal to the dimension of the cross-spectral matrix. An example is given showing the importance of small leakage and thus illustrating that the various estimators need not be equivalent in practice.
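A minimal sketch of one member of this class, assuming K sine tapers and equal weights: average the orthogonally tapered cross-periodograms and form a coherence-type quantity. The simulated series and the taper choice are illustrative.

```python
# Sketch: multitaper cross-spectral estimator as an equal-weight average of
# sine-tapered cross-periodograms, with a simple squared-coherence summary.
import numpy as np

def sine_tapers(n, K):
    t = np.arange(1, n + 1)
    return np.array([np.sqrt(2 / (n + 1)) * np.sin(np.pi * (k + 1) * t / (n + 1))
                     for k in range(K)])

def multitaper_cross_spectrum(x, y, K=8):
    tapers = sine_tapers(len(x), K)
    X = np.fft.rfft(tapers * x, axis=1)             # tapered eigencoefficients of x
    Y = np.fft.rfft(tapers * y, axis=1)
    Sxy = np.mean(X * np.conj(Y), axis=0)
    Sxx = np.mean(np.abs(X) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    return Sxy, Sxx, Syy

rng = np.random.default_rng(6)
common = rng.normal(size=1024)
x = common + 0.5 * rng.normal(size=1024)
y = np.roll(common, 3) + 0.5 * rng.normal(size=1024)    # y lags x by 3 samples
Sxy, Sxx, Syy = multitaper_cross_spectrum(x, y)
coherence = np.abs(Sxy) ** 2 / (Sxx * Syy)
print("median squared coherence:", round(float(np.median(coherence)), 2))  # substantial shared power
```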

Journal ArticleDOI
TL;DR: In this paper, a two-stage estimation procedure was proposed to estimate the association parameter which is related to Kendall's tau, and asymptotic properties of the proposed semiparametric estimator showed that, although the first-stage marginal estimators have a convergence rate of only n^{1/3}, the resulting parameter estimator still converges to a normal random variable with the usual n^{1/2} rate.
Abstract: Assuming that the two failure times of interest with bivariate current status data follow a bivariate copula model, we propose a two-stage estimation procedure to estimate the association parameter which is related to Kendall's tau. Asymptotic properties of the proposed semiparametric estimator show that, although the first-stage marginal estimators have a convergence rate of only n^{1/3}, the resulting parameter estimator still converges to a normal random variable with the usual n^{1/2} rate. The variance of the proposed estimator can be consistently estimated. Simulation results are presented, and a community-based study of cardiovascular diseases in Taiwan provides an illustrative example.

Journal ArticleDOI
TL;DR: In this article, empirical likelihood is considered in conjunction with the local linear smoother to construct confidence intervals for a nonparametric regression function with bounded support, and the coverage error of the empirical likelihood confidence intervals is evaluated and is shown to be of the same order throughout the support of the regression function.
Abstract: SUMMARY Empirical likelihood is considered in conjunction with the local linear smoother to construct confidence intervals for a nonparametric regression function with bounded support. The coverage error of the empirical likelihood confidence intervals is evaluated and is shown to be of the same order throughout the support of the regression function. This is a significant improvement over confidence intervals based directly on the asymptotic normal distribution of the local linear estimator, which have a larger order of coverage error near the boundary. This improvement is attributable to the natural variance estimator that empirical likelihood implicitly chooses for the local linear smoother.

Journal ArticleDOI
TL;DR: This article shows that empirical likelihood can effectively incorporate auxiliary information of this kind as long as it can be summarised as unbiased estimating equations, that the combined empirical and parametric likelihood produces valid inferences for the underlying parameters, and that a Wilks-type theorem holds for the combined likelihood ratio statistic.
Abstract: Imbens & Lancaster (1994) pointed out that census reports can be interpreted as providing nearly exact knowledge of moments of the marginal distribution of economic variables. In this paper we show that empirical likelihood can effectively incorporate auxiliary information like this as long as it can be summarised as unbiased estimating equations. By combining empirical and parametric likelihoods, we show that the combined likelihood can produce valid inferences for the underlying parameters. A Wilks' type theorem is proved for the combined likelihood ratio statistic. Simulation results demonstrate that the performance of the combined likelihood ratio confidence intervals is better than conventional confidence intervals that use a normal approximation.
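A sketch of the core computation when the auxiliary information is a known population mean: maximise the empirical likelihood subject to the unbiased estimating equation, solving for the Lagrange multiplier by Newton's method. The data, constraint and solver details are illustrative.

```python
# Sketch: empirical likelihood weights p_i maximising prod(p_i) subject to
# sum(p_i) = 1 and sum(p_i * g(x_i)) = 0, with g(x) = x - mu0 for a known
# (census-type) mean mu0. Lagrange multiplier found by Newton's method.
import numpy as np

def el_weights(g, iters=50):
    lam = 0.0
    for _ in range(iters):
        denom = 1 + lam * g
        score = np.sum(g / denom)
        hess = -np.sum(g ** 2 / denom ** 2)
        lam -= score / hess
    p = 1.0 / (len(g) * (1 + lam * g))
    return p, lam

rng = np.random.default_rng(7)
x = rng.normal(1.0, 2.0, size=200)
mu0 = 1.0                                           # auxiliary (census) value of the mean
p, lam = el_weights(x - mu0)
print("weights sum to one:", round(float(p.sum()), 6))
print("weighted mean equals mu0:", round(float(np.sum(p * x)), 4))
print("-2 log empirical likelihood ratio:", round(float(2 * np.sum(np.log(1 + lam * (x - mu0)))), 3))
```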

Journal ArticleDOI
TL;DR: In this article, nonparametric methods are studied for estimating both the period and the amplitude function from noisy observations of a periodic function made at irregularly spaced times; nonparametric estimators of the period converge at parametric rates and attain a semiparametric lower bound that is the same whether the shape of the periodic function is known or unknown, and first-order properties of the amplitude estimators are identical to those that would obtain if the period were known.
Abstract: SUMMARY Motivated by applications to brightness data on periodic variable stars, we study nonparametric methods for estimating both the period and the amplitude function from noisy observations of a periodic function made at irregularly spaced times. It is shown that nonparametric estimators of period converge at parametric rates and attain a semiparametric lower bound which is the same if the shape of the periodic function is unknown as if it were known. Also, first-order properties of nonparametric estimators of the amplitude function are identical to those that would obtain if the period were known. Numerical simulations and applications to real data show the method to work well in practice.

Journal ArticleDOI
TL;DR: A mixture model originally developed for regression models with independent data is adapted to the more general case of correlated outcome data, which includes longitudinal data as a special case, and the systematic component of this mixture of marginal models is more flexible than the conventional linear function.
Abstract: SUMMARY In this paper, we adapt a mixture model originally developed for regression models with independent data to the more general case of correlated outcome data, which includes longitudinal data as a special case. The estimation is performed by a generalisation of the EM algorithm which we call the Expectation-Solution (ES) algorithm. In this ES algorithm the M-step of the EM algorithm is replaced by a step requiring the solution of a series of generalised estimating equations. The ES algorithm, a general algorithm for solving generalised estimating equations with incomplete data, is then applied to the present problem of mixtures of marginal models. In addition to allowing for correlation inherent in correlated outcome data, the systematic component of this mixture of marginal models is more flexible than the conventional linear function. The methodology is applied in the contexts of normal and Poisson response data. Some theory regarding the ES algorithm is presented.

Journal ArticleDOI
TL;DR: The rank correlation coefficient between two functions proposed in this paper is a generalisation of the rank correlation between two finite sets of numbers and is equal to one if and only if the two functions have the same shape.
Abstract: SUMMARY Does a regression function follow a specified shape? Do two regression functions have the same shape? How can regression functions be grouped, based on shape? These questions can occur when investigating monotonicity, when counting local maxima or when studying variation in families of curves. One can address these questions by considering the rank correlation coefficient between two functions. This correlation is a generalisation of the rank correlation between two finite sets of numbers and is equal to one if and only if the two functions have the same shape. A sample rank correlation based on smoothed estimates of the regression functions consistently estimates the true correlation. This sample rank correlation can be used as a measure of similarity between functions in cluster analysis and as a measure of monotonicity or modality.
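A sketch of one natural reading of such a coefficient, assuming a Kendall-type definition: the average concordance of the signs of the two functions' increments over pairs of points in the domain, which equals one when the curves rise and fall together. The grid and example curves are illustrative.

```python
# Sketch: a Kendall-type rank correlation between two functions, estimated as
# the mean concordance of the signs of their increments over pairs of points.
import numpy as np

def functional_rank_corr(f, g, grid):
    fx, gx = f(grid), g(grid)
    df = np.sign(fx[:, None] - fx[None, :])
    dg = np.sign(gx[:, None] - gx[None, :])
    mask = ~np.eye(len(grid), dtype=bool)
    return np.mean((df * dg)[mask])                 # in [-1, 1]; 1 means same shape

grid = np.linspace(0, 1, 200)
print(functional_rank_corr(np.sin, lambda x: 2 * np.sin(x) + 1, grid))   # 1.0: same shape
print(round(functional_rank_corr(np.sin, np.cos, grid), 2))              # -1.0: opposite shape on [0, 1]
```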

Journal ArticleDOI
TL;DR: In this article, the authors study lack-of-fit tests based on orthogonal series estimators, which are functions of score statistics that employ data-driven model dimensions.
Abstract: SUMMARY We study lack-of-fit tests based on orthogonal series estimators. A common feature of these tests is that they are functions of score statistics that employ data-driven model dimensions. The criteria used to select the dimension are score-based versions of AIC and BIC. The tests can be applied in a wide variety of settings, including both continuous and discrete data. With two or more covariates, a model sequence, i.e. a path in the alternative models space, has to be chosen. Critical points and p-values of the lack-of-fit tests can be obtained via asymptotic distribution theory or by use of the bootstrap. Data examples and a simulation study illustrate the applicability of the tests.

Journal ArticleDOI
TL;DR: In this article, a semiparametric model which relates the mean of the response variable at each time point proportionally to a function of a time-dependent covariate vector is proposed.
Abstract: SUMMARY In a longitudinal study, suppose that, for each subject, repeated measurements of the response variable and covariates are collected at a set of distinct, irregularly spaced time points. We consider a semiparametric model which relates the mean of the response variable at each time point proportionally to a function of a time-dependent covariate vector to analyse such panel data. Inference procedures for regression parameters are proposed without involving any nonparametric function estimation for the nuisance mean function. A dataset from a recent AIDS clinical trial is used to illustrate the new proposal.

Journal ArticleDOI
TL;DR: In this article, a nonparametric procedure for testing for monotonicity of a regression mean with guaranteed level is proposed, based on signs of differences of observations from the response variable.
Abstract: In this paper a nonparametric procedure for testing for monotonicity of a regression mean with guaranteed level is proposed. The procedure is based on signs of differences of observations from the response variable. The test is calibrated against the most difficult null hypothesis, when the regression function is constant, and produces an exact test in this context. In general, the test is conservative. The power of the test is good, and comparable with that of other nonparametric tests. It is shown that the testing procedure has asymptotic power 1 against certain local alternatives. The method is also robust against heavy-tailed error distributions, and even maintains good power when the errors are for example Cauchy distributed. A simulation study is provided to demonstrate finite-sample behaviour of the testing procedure.
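A sketch in the spirit of a sign-based test, assuming a Mann-Kendall-type statistic that sums signs of pairwise response differences (ordered by the covariate) and is calibrated under the hardest null case of a constant mean; this illustrates the sign-based idea only and is not the paper's exact statistic.

```python
# Sketch: a sign-based monotonicity check. Under a constant regression mean with
# i.i.d. continuous errors the statistic depends only on ranks, so its null
# distribution can be simulated with any continuous error law.
import numpy as np

def sign_statistic(y):
    d = np.sign(y[None, :] - y[:, None])
    return np.triu(d, 1).sum()                      # sum of sign(y_j - y_i) over i < j

rng = np.random.default_rng(8)
n = 80
x = np.sort(rng.uniform(size=n))
y = 1.5 * x + 0.3 * rng.standard_cauchy(n)          # increasing mean, heavy-tailed errors

obs = sign_statistic(y)
null = np.array([sign_statistic(rng.normal(size=n)) for _ in range(2000)])
p_value = np.mean(null >= obs)
print(f"sign statistic {obs}, one-sided p-value {p_value:.3f}")
```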