Showing papers on "Semiparametric model" published in 1995


Journal ArticleDOI
TL;DR: In this article, the authors proposed a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on a vector of explanatory variables in the presence of missing response data.
Abstract: We propose a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on a vector of explanatory variables in the presence of missing response data. The proposed estimators do not require full specification of the likelihood. They can be viewed as an extension of generalized estimating equations estimators that allow for the data to be missing at random but not missing completely at random. These estimators can be used to correct for dependent censoring and nonrandom noncompliance in randomized clinical trials studying the effect of a treatment on the evolution over time of the mean of a response variable. The likelihood-based parametric G-computation algorithm estimator may also be used to attempt to correct for dependent censoring and nonrandom noncompliance. But because of possible model misspecification, the parametric G-computation algorithm estimator, in contrast with the proposed w...

1,510 citations
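
As a rough illustration of the weighting idea in the abstract above, the following minimal sketch estimates a simple mean rather than the regression parameters treated in the paper. It assumes that missingness depends only on an always-observed discrete covariate, so that the response probability can be estimated by the observed fraction within each covariate level. The function name `ipw_mean` and the toy data are hypothetical and not taken from the paper.

```python
def ipw_mean(x, y, r):
    """Inverse-probability-weighted mean of y, where y is missing when r == 0.

    Assumption for this sketch: P(R = 1 | X) depends only on the discrete
    covariate x, and is estimated by the observed fraction in each level.
    """
    counts, observed = {}, {}
    for xi, ri in zip(x, r):
        counts[xi] = counts.get(xi, 0) + 1
        observed[xi] = observed.get(xi, 0) + ri
    prob = {level: observed[level] / counts[level] for level in counts}

    num = den = 0.0
    for xi, yi, ri in zip(x, y, r):
        if ri:  # only observed responses contribute
            w = 1.0 / prob[xi]  # weight = inverse of the response probability
            num += w * yi
            den += w
    return num / den


# Toy data: responses in the x == 1 group are larger and more often missing.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 1.0, 2.0, 5.0, 6.0, None, None]
r = [1, 1, 1, 1, 1, 1, 0, 0]

complete_case = sum(yi for yi, ri in zip(y, r) if ri) / sum(r)
print(complete_case)      # about 2.83, pulled toward the fully observed group
print(ipw_mean(x, y, r))  # 3.5
```

In this toy data set the complete-case average underweights the group with missing responses, while the weighted average restores each group to its share of the full sample.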


Journal ArticleDOI
TL;DR: In this article, the authors investigated the properties of a semiparametric method for estimating the dependence parameters in a family of multivariate distributions and proposed an estimator, obtained as a solution of a pseudo-likelihood equation, which is consistent, asymptotically normal and fully efficient at independence.
Abstract: This paper investigates the properties of a semiparametric method for estimating the dependence parameters in a family of multivariate distributions. The proposed estimator, obtained as a solution of a pseudo-likelihood equation, is shown to be consistent, asymptotically normal and fully efficient at independence. A natural estimator of its asymptotic variance is proved to be consistent. Comparisons are made with alternative semiparametric estimators in the special case of Clayton's model for association in bivariate data.

1,280 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the efficiency bound for the estimation of the parameters of semiparametric models defined solely by restrictions on the means of a vector of correlated outcomes, Y, when the data on Y are missing at random.
Abstract: We consider the efficiency bound for the estimation of the parameters of semiparametric models defined solely by restrictions on the means of a vector of correlated outcomes, Y, when the data on Y are missing at random. We show that the semiparametric variance bound is the asymptotic variance of the optimal estimator in a class of inverse probability of censoring weighted estimators and that this bound is unchanged if the data are missing completely at random. For this case we study the asymptotic performance of the generalized estimating equations (GEE) estimators of mean parameters and show that the optimal GEE estimator is inefficient except for special cases. The optimal weighted estimator depends on unknown population quantities. But for monotone missing data, we propose an adaptive estimator whose asymptotic variance can achieve the bound.

937 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose semiparametric procedures to make inferences for median regression models with possibly censored observations; the procedures can be implemented efficiently using a simulated annealing algorithm.
Abstract: The median is a simple and meaningful measure for the center of a long-tailed survival distribution. To examine the covariate effects on survival, a natural alternative to the usual mean regression model is to regress the median of the failure time variable or a transformation thereof on the covariates. In this article we propose semiparametric procedures to make inferences for such median regression models with possibly censored observations. Our proposals can be implemented efficiently using a simulated annealing algorithm. Numerical studies are conducted to show the advantage of the new procedures over some recently developed methods for the accelerated failure time model, a special type of mean regression models in the survival analysis. The proposals discussed in the article are illustrated with a lung cancer data set.

300 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose two consistent one-sided specification tests for parametric regression models: one based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values, and the other based on the difference in sums of squared residuals between the parametric and nonparametric models. Asymptotically, the tests can be viewed as a test of the joint hypothesis that the true parameters of a series regression model are zero.
Abstract: This paper proposes two consistent one-sided specification tests for parametric regression models, one based on the sample covariance between the residual from the parametric model and the discrepancy between the parametric and nonparametric fitted values; the other based on the difference in sums of squared residuals between the parametric and nonparametric models. We estimate the nonparametric model by series regression. The new test statistics converge in distribution to a unit normal under correct specification and grow to infinity faster than the parametric rate ($n^{-1/2}$) under misspecification, while avoiding weighting, sample splitting, and non-nested testing procedures used elsewhere in the literature. Asymptotically, our tests can be viewed as a test of the joint hypothesis that the true parameters of a series regression model are zero, where the dependent variable is the residual from the parametric model, and the series terms are functions of the explanatory variables, chosen so as to support nonparametric estimation of a conditional expectation. We specifically consider Fourier series and regression splines, and present a Monte Carlo study of the finite sample performance of the new tests in comparison to consistent tests of Bierens (1990), Eubank and Spiegelman (1990), Jayasuriya (1990), Wooldridge (1992), and Yatchew (1992); the results show the new tests have good power, performing quite well in some situations. We suggest a joint Bonferroni procedure that combines a new test with those of Bierens and Wooldridge to capture the best features of the three approaches.

237 citations


Journal ArticleDOI
TL;DR: In this paper, a number of consistency results for nonparametric kernel estimators of density and regression functions and their derivatives are presented, which allow for near-epoch dependent, nonidentically distributed random variables, data-dependent bandwidth sequences, preliminary estimation of parameters, and non-parametric regression on index functions.
Abstract: This paper presents a number of consistency results for nonparametric kernel estimators of density and regression functions and their derivatives. These results are particularly useful in semiparametric estimation and testing problems that rely on preliminary nonparametric estimators, as in Andrews (1994, Econometrica 62, 43–72). The results allow for near-epoch dependent, nonidentically distributed random variables, data-dependent bandwidth sequences, preliminary estimation of parameters (e.g., nonparametric regression based on residuals), and nonparametric regression on index functions.

235 citations


Journal ArticleDOI
TL;DR: The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982): a logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model.
Abstract: A mixture model is an attractive approach for analyzing failure time data in which there are thought to be two groups of subjects, those who could eventually develop the endpoint and those who could not develop the endpoint. The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982). A logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model. The estimator arises naturally out of the EM algorithm approach for fitting failure time mixture models as described by Larson and Dinse (1985). The procedure is applied to some experimental data from radiation biology and is evaluated in a Monte Carlo simulation study. The simulation study suggests the semi-parametric procedure is almost as efficient as the correct fully parametric procedure for estimating the regression coefficient in the incidence, but less efficient for estimating the latency distribution.

234 citations
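
The incidence and latency structure described in the abstract above is commonly written as a two-component survival function. The display below is a sketch in assumed notation rather than the paper's own: $z$ is the covariate vector for the incidence part, $\gamma$ its logistic regression coefficient, $\pi(z)$ the probability of eventually developing the endpoint, and $S_u(t)$ the survival function among those who can develop it, written here without covariates for simplicity.

$$S(t \mid z) = 1 - \pi(z) + \pi(z)\, S_u(t), \qquad \pi(z) = \frac{\exp(\gamma^\top z)}{1 + \exp(\gamma^\top z)}$$

Since $S_u(t) \to 0$ as $t \to \infty$, the population survival function levels off at $1 - \pi(z)$, the fraction of subjects who never develop the endpoint.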


Journal ArticleDOI
TL;DR: In this paper, the authors developed a class of semiparametric methods that are designed to work better than the kernel estimator in a broad nonparametric neighbourhood of a given parametric class of densities, for example, the normal, while not losing much in precision when the true density is far from the parametric class.
Abstract: The traditional kernel density estimator of an unknown density is by construction completely nonparametric in the sense that it has no preferences and will work reasonably well for all shapes. The present paper develops a class of semiparametric methods that are designed to work better than the kernel estimator in a broad nonparametric neighbourhood of a given parametric class of densities, for example, the normal, while not losing much in precision when the true density is far from the parametric class. The idea is to multiply an initial parametric density estimate with a kernel-type estimate of the necessary correction factor. This works well in cases where the correction factor function is less rough than the original density itself. Extensive comparisons with the kernel estimator are carried out, including exact analysis for the class of all normal mixtures. The new method, with a normal start, wins quite often, even in many cases where the true density is far from normal. Procedures for choosing the smoothing parameter of the estimator are also discussed. The new estimator should be particularly useful in higher dimensions, where the usual nonparametric methods have problems. The idea is also spelled out for nonparametric regression.

232 citations
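
The idea in the abstract above, multiplying an initial parametric density estimate by a kernel-type estimate of the correction factor, can be sketched as follows. This is a minimal illustration that assumes a normal start fitted by the sample mean and standard deviation, a Gaussian kernel, and a hand-picked bandwidth `h`. The function names are hypothetical, and the paper's bandwidth selection procedures are not reproduced.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kernel_density(data, x, h):
    """Ordinary Gaussian kernel density estimate at x, for comparison."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)


def parametric_start_density(data, x, h):
    """Normal start multiplied by a kernel estimate of the correction factor.

    Estimate at x: f0(x) * (1/n) * sum_i K_h(x_i - x) / f0(x_i),
    where f0 is the fitted normal density (an assumption of this sketch).
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in data) / n)
    start = normal_pdf(x, mu, sigma)
    correction = sum(
        normal_pdf(x, xi, h) / normal_pdf(xi, mu, sigma) for xi in data
    ) / n
    return start * correction


sample = [-1.5, -0.5, 0.5, 1.5]
print(kernel_density(sample, 0.7, 0.5))
print(parametric_start_density(sample, 0.7, 0.5))
```

When the true density is close to the parametric start, the correction factor is nearly constant and easy to smooth, which is where the abstract suggests this approach works well.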


Journal ArticleDOI
TL;DR: In this article, an estimated partial likelihood method is proposed for estimating relative risk parameters, which is an extension of the estimated likelihood regression analysis method for uncensored data (Pepe, 1992; Pepe & Fleming, 1991).
Abstract: We consider the problem of missing covariate data in the context of censored failure time relative risk regression. Auxiliary covariate data, which are considered informative about the missing data but which are not explicitly part of the relative risk regression model, may be available. Full covariate information is available for a validation set. An estimated partial likelihood method is proposed for estimating relative risk parameters. This method is an extension of the estimated likelihood regression analysis method for uncensored data (Pepe, 1992; Pepe & Fleming, 1991). A key feature of the method is that it is nonparametric with respect to the association between the missing and observed, including auxiliary, covariate components. Asymptotic distribution theory is derived for the proposed estimated partial likelihood estimator in the case where the auxiliary or mismeasured covariates are categorical. Asymptotic efficiencies are calculated for exponential failure times using an exponential relative risk model. The estimated partial likelihood estimator compares favourably with a fully parametric maximum likelihood analysis. Comparisons are also made with a standard partial likelihood analysis which ignores the incomplete observations. Important efficiency gains can be made with the estimated partial likelihood method. Small sample properties are investigated through simulation studies.

163 citations


Journal ArticleDOI
TL;DR: Empirical modeling of high-frequency currency market data reveals substantial evidence for nonnormality, stochastic volatility, and other nonlinearities and develops a new method for estimation of structural economic models.

160 citations


Journal ArticleDOI
TL;DR: In this article, the authors apply OLS, the kernel nonparametric regression estimator, and the semi-parametric estimator of Powell, Stock, and Stoker (1989) to a data set, which should, based on theory and previous empirical work, yield positive coefficients.
Abstract: Parametric estimators, such as OLS, attain high efficiency for well-specified models. Nonparametric estimators greatly reduce specification error but at the cost of efficiency. Semiparametric estimators compromise between these dual goals of efficiency and specification error. Semiparametric estimators can assume general forms within classes of functional forms. This paper applies OLS, the kernel nonparametric regression estimator, and the semi-parametric estimator of Powell, Stock, and Stoker (1989) to a data set, which should, based on theory and previous empirical work, yield positive coefficients. The semiparametric estimator, on average, displayed the performance most consistent with prior expectations followed by the nonparametric and parametric estimators. In addition, the paper shows how the semiparametric estimator can provide insights into the form of misspecification and suggest data transformations.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new class of estimators that remains consistent and asymptotically normal even when the probability that X is missing depends on the observed V and Y.
Abstract: Pepe and Fleming, and Carroll and Wand have recently proposed estimators in a parametric model for the density of a random variable Y conditional on a vector of covariates (X, V) when data on one of the regressors X is missing for some study subjects. We propose a new class of estimators that remains consistent and asymptotically normal even when the probability that X is missing depends on the observed V and Y, includes an estimator whose asymptotic variance attains the semiparametric variance bound for the model and, when the data are missing completely at random, includes an estimator that is asymptotically equivalent to the inefficient estimators proposed by Pepe and Fleming and by Carroll and Wand. The optimal estimator in our class depends on the unknown probability law generating the data. When the vector V of non-missing regressors has at most two continuous components, we propose an adaptive semiparametric efficient estimator and compare the performance of the proposed semiparametric efficient estimator with the estimators proposed by Pepe and Fleming and Carroll and Wand in a small simulation study. When V has many continuous components, we propose an alternative class of adaptive estimators that should have high efficiency.

Journal ArticleDOI
TL;DR: In this article, the authors consider semiparametric estimation of discrete choice models whose indices have a density function that is not bounded away from zero, together with issues on trimming indices, the correction of asymptotic bias, and issues of negative kernel density estimation.

Journal ArticleDOI
TL;DR: In this paper, an iterative algorithm for estimating individual parameters as well as the model function is introduced under the assumption of a certain shape invariance: the individual regression curves are obtained from a common shape function by linear transformations of the axes.
Abstract: Given data from a sample of noisy curves, we consider a nonlinear parametric regression model with unknown model function. An iterative algorithm for estimating individual parameters as well as the model function is introduced under the assumption of a certain shape invariance: the individual regression curves are obtained from a common shape function by linear transformations of the axes. Our algorithm is based on least-squares methods for parameter estimation and on nonparametric kernel methods for curve estimation. Asymptotic distributions are derived for the individual parameter estimators as well as for the estimator of the shape function. An application to human growth data illustrates the method.

Journal ArticleDOI
TL;DR: In this article, the rate of contraction/expansion is formulated by a parametric function of the covariate process while the baseline failure time distribution is unspecified. Estimating functions for the vector of regression parameters are motivated by likelihood score functions and take the form of log rank statistics with time-dependent covariates.

Journal ArticleDOI
TL;DR: In this paper, a local likelihood function is proposed to give more weight to observations near a region of interest in the sample space, which can be used for assessing local departures from a parametric model, and for semiparametric density estimation.
Abstract: By drawing an analogy with likelihood for censored data, a local likelihood function is proposed which gives more weight to observations near a region of interest in the sample space. Resulting methods can be used for assessing local departures from a parametric model, and for semiparametric density estimation. Some theory and three examples are given.

Journal ArticleDOI
TL;DR: In this article, it was shown that a simple average of the observations of the dependent variable with the same regressor value will yield a root-n-consistent conditional expectation estimate, even when the discrete regressors have infinite support.
Abstract: This note is concerned with nonparametric and semiparametric inference in regression models where regressors are not continuous. When regressors are discrete with finite support, a mere average of those observations of the dependent variable with the same regressor value will yield a root-n-consistent conditional expectation estimate. We show that sequences of weights constructed in this way are consistent in the sense of Stone (1977), even when the discrete regressors have infinite support. The results are applied to the estimation of semiparametric models.
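
For the finite-support case mentioned in the abstract above, the conditional expectation estimate is just the average of the dependent variable within each distinct regressor value. A minimal sketch, with a hypothetical function name and toy data:

```python
from collections import defaultdict


def cell_means(x, y):
    """Estimate E[Y | X = x] by averaging y over observations sharing x."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(x, y):
        totals[xi] += yi
        counts[xi] += 1
    return {level: totals[level] / counts[level] for level in totals}


x = [0, 1, 0, 1, 2]
y = [1.0, 4.0, 3.0, 6.0, 10.0]
print(cell_means(x, y))  # {0: 2.0, 1: 5.0, 2: 10.0}
```

Each cell mean is an ordinary sample average, which is why the root-n rate is available when the support is finite. The infinite-support result in the abstract concerns the consistency of weights constructed in this way and is not something this sketch demonstrates.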

Journal ArticleDOI
TL;DR: The semiparametric maximum likelihood estimation is considered in a three-state duration dependent Markov process when times of the intermediate transition are interval censored and the times of transitions to an absorbing state are known exactly or are right censored.
Abstract: The semiparametric maximum likelihood estimation is considered in a three-state duration dependent Markov process when times of the intermediate transition (e.g., onset of a disease) are interval censored and the times of transitions to an absorbing state (e.g., death) are known exactly or are right censored. It is assumed that the intensity of the transition to an absorbing state depends both on chronological time and duration in the intermediate state. De Gruttola and Lagakos (1989, Biometrics 45, 1-11) and Frydman (1992, Journal of the Royal Statistical Society, Series B 54, 853-866) discussed non-parametric estimation from the same sampling scheme under the assumption that the intensity of transition to an absorbing state depends only on the duration in the intermediate state or only on the chronological time respectively. The approach taken here generalizes, but in discrete time framework, the results from Frydman (1992). The distribution of the time to the intermediate transition is modelled nonparametrically and the intensity of onset of terminal condition semiparametrically. The algorithm is developed for the computation of the estimators. The methods are illustrated with AIDS data.

Journal ArticleDOI
TL;DR: In this paper, a general semiparametric variance function model was proposed for fixed design regression, where the regression function is assumed to be smooth and is modelled nonparametrically, whereas the relation between the variance and the mean regression function was assumed to follow a generalized linear model.
Abstract: We propose a general semiparametric variance function model in a fixed design regression setting. In this model, the regression function is assumed to be smooth and is modelled nonparametrically, whereas the relation between the variance and the mean regression function is assumed to follow a generalized linear model. Almost all variance function models that were considered in the literature emerge as special cases. Least-squares-types estimates for the parameters of this model and the simultaneous estimation of the unknown regression and variance functions by means of nonparametric kernel estimates are combined to infer the parametric and nonparametric components of the proposed model. The asymptotic distribution of the parameter estimates is derived and is shown to follow usual parametric rates in spite of the presence of the nonparametric component in the model. This result is applied to obtain a data-based test for heteroscedasticity under minimal assumptions on the shape of the regression function.

Journal ArticleDOI
TL;DR: In this paper, it is shown that, for semiparametric averaged derivative estimates, the rate of convergence of the finite-sample distribution to the normal limit distribution can equal that of standard parametric statistics.
Abstract: With the same normalization as that for standard parametric statistics, and centered at a parameter of interest, many semiparametric estimates based on n observations have been shown to be root-n-consistent and asymptotically normal. In the context of semiparametric averaged derivative estimates, we go further by showing that the rate of convergence of the finite-sample distribution to the normal limit distribution can equal that of standard parametric statistics.

Journal ArticleDOI
TL;DR: In this paper, two types of semiparametric models are proposed for comparing two samples with possibly heterogeneous treatment effects, and the approximate finite sample precision of the estimators can be easily evaluated.
Abstract: Two types of semiparametric model are proposed for comparing two samples with possibly heterogeneous treatment effects. One model is a two-sample location-scale model which assumes that the Q-Q plot of the two distributions involved is linear; the other is the class of two-sample transformation models which assume parametric forms for a probability plot, the receiver operating characteristic curve. The empirical process approach is used to construct strong approximations for the empirical curves of both types of plot and a convenient generalized least squares (GLS) estimator is derived. The approximate finite sample precision of the estimators can be easily evaluated. Asymptotically, the GLS estimator is shown to be efficient for the first model. A difficulty involved in developing the theory of asymptotic efficiency is discussed for the transformation models. The GLS estimator is shown numerically to approach the Fisher information bound for the proportional hazards model.

Journal ArticleDOI
TL;DR: In this article, a scale-change model is proposed to incorporate unobserved heterogeneity through a random effect that enters the baseline hazard function to change the time scale of the hazard function.
Abstract: Frailty models are effective in broadening the class of survival models and inducing dependence in multivariate survival distributions. In proportional hazards, the random effect multiplies the hazard function. The scale-change model incorporates unobserved heterogeneity through a random effect that enters the baseline hazard function to change the time scale. We interpret this random effect as frailty, or other unobserved risks that create heterogeneity in the population. This model produces a wide range of shapes for univariate survival and hazard functions. We extend this model to multivariate survival data by assuming that members of a group share a common random effect. This structure induces association among the survival times in a group and provides alternative association structures to the proportional hazards frailty model. We present parametric and semiparametric estimation techniques and illustrate these methods with an example.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a semiparametric estimation method for general regression models when some of the predictors are measured with error, and they show that the usual theory is essentially as good as one can do with this technique.
Abstract: We consider a semiparametric estimation method for general regression models when some of the predictors are measured with error. The technique relies on a kernel regression of the "true" covariate on all the observed covariates and surrogates. This requires a nonparametric regression in as many dimensions as there are covariates and surrogates. The usual theory copes with such higher-dimensional problems by using higher-order kernels, but this is unrealistic for most problems. We show that the usual theory is essentially as good as one can do with this technique. Instead of regression with higher-order kernels, we propose the use of dimension reduction techniques. We assume that the "true" covariate depends only on a linear combination of the observed covariates and surrogates. If this linear combination were known, we could apply the one-dimensional versions of the semiparametric problem, for which standard kernels are applicable. We show that if one can estimate the linear directions at the root-$n$ rate, then asymptotically the resulting estimator of the parameters in the main regression model behaves as if the linear combination were known. Simulations lend some credence to the asymptotic results.

Journal Article
TL;DR: In this paper, a class of inverse probability-of-censoring-weighted estimating equations for jointly estimating the parameters of models for the conditional mean and covariance of a vector of responses given a set of regressors in the presence of monotone missing outcome data is presented.
Abstract: In this article we describe a class of inverse-probability-of-censoring-weighted estimating equations for jointly estimating the parameters of models for the conditional mean and covariance of a vector of responses given a set of regressors in the presence of monotone missing outcome data. Our methods are valid when the data are missing at random in the sense of Rubin (1976) and do not require a parametric model for the joint distribution of the data. However, they do require a model for the non-responsive probabilities. We show that the solution to the optimal estimating equation in our class has asymptotic variance equal to the semiparametric variance bound. Because the optimal estimating equation depends on unknown population parameters, we propose an adaptive locally efficient estimator whose asymptotic variance can achieve the semiparametric variance bound.

Journal ArticleDOI
TL;DR: In this article, a regression method is developed for a general class of functionals, and the regression parameters are estimated by maximizing a profiled nonparametric or empirical likelihood based on a local estimate of the conditional distribution function.
Abstract: A regression method is developed for a general class of functionals. A semiparametric linear model is adopted, and the regression parameters are estimated by maximizing a profiled nonparametric or empirical likelihood based on a local estimate of the conditional distribution function. Simulated and real data examples are shown, including an application of quantile regression to censored survival data from a clinical trial for myeloma.

Journal ArticleDOI
TL;DR: In this article, the authors demonstrate that empirical Bayes estimation can be used to improve linear smoothing estimates when multiple curves are available, in the setting of nonparametric and semiparametric regression for growth curve analysis.
Abstract: Nonparametric and semiparametric regression have been suggested as alternatives to parametric models for growth curve analysis. In this article we demonstrate that empirical Bayes estimation can be used to improve linear smoothing estimates when multiple curves are available.

Journal ArticleDOI
TL;DR: In this article, a nonparametric estimator of the transformation in the TBS model allowing general smooth monotonic transformations is proposed, and asymptotic properties of this estimator are discussed.
Abstract: The transform-both-sides (TBS) regression model developed by Carroll and Ruppert is applicable when the relationship between the median response and the independent variables has been identified. Several different families of transformations, such as the Box-Cox power family, have been considered in the parametric approach to this model. In this article, we propose a nonparametric estimator of the transformation in the TBS model allowing general smooth monotonic transformations. Asymptotic properties of this estimator are discussed.

Journal ArticleDOI
TL;DR: In this article, the Fisher information is calculated based on the solution of a pair of integral equations which are derived from a class of more general semiparametric models, and a one-step estimate is constructed using an initial $\sqrt N$-consistent estimate and shown to be asymptotically efficient.
Abstract: This paper considers efficient estimation of the Euclidean parameter $\theta$ in the proportional odds model $G(1 - G)^{-1} = \theta F(1 - F)^{-1}$ when two independent i.i.d. samples with distributions $F$ and $G$, respectively, are observed. The Fisher information $I(\theta)$ is calculated based on the solution of a pair of integral equations which are derived from a class of more general semiparametric models. A one-step estimate is constructed using an initial $\sqrt N$-consistent estimate and shown to be asymptotically efficient in the sense that its asymptotic risk achieves the corresponding minimax lower bound.
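
Solving the proportional odds relation in the abstract above for $G$ gives $G = \theta F / (1 - F + \theta F)$. The short sketch below only illustrates this algebra at a single point and is not the paper's estimation procedure. The function names are hypothetical.

```python
def odds(p):
    """Odds p / (1 - p) of a probability p in [0, 1)."""
    return p / (1.0 - p)


def proportional_odds_cdf(f, theta):
    """Value of G implied by G / (1 - G) = theta * F / (1 - F) at F = f."""
    return theta * f / (1.0 - f + theta * f)


f_value = 0.5
theta = 2.0
g_value = proportional_odds_cdf(f_value, theta)
print(g_value)                        # about 0.667
print(odds(g_value) / odds(f_value))  # about 2.0, i.e. theta
```

With $\theta = 1$ the two distributions coincide, so $\theta$ measures how far the odds under $G$ are shifted relative to those under $F$.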

Journal ArticleDOI
TL;DR: In this article, the authors use the method of maximum likelihood and regression splines to derive estimates of the parametric and nonparametric components of semiparametric generalized linear models, and the resulting estimators of both components are shown to be consistent.
Abstract: We use the method of maximum likelihood and regression splines to derive estimates of the parametric and nonparametric components of semiparametric generalized linear models. The resulting estimators of both components are shown to be consistent. Also, the asymptotic theory for the estimator of the parametric component is derived, indicating that the parametric component can be estimated efficiently without under-smoothing the nonparametric component.

Journal ArticleDOI
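TL;DR: An extension of the existing threshold models for multivariate extreme values is proposed that removes the constraint that the same thresholds be taken for both the marginal and dependence aspects; the resulting model is semi-parametric and requires computationally intensive techniques for likelihood evaluation.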
Abstract: Threshold methods for multivariate extreme values are based on the use of asymptotically justified approximations of both the marginal distributions and the dependence structure in the joint tail. Models derived from these approximations are fitted to a region of the observed joint tail which is determined by suitably chosen high thresholds. A drawback of the existing methods is the necessity for the same thresholds to be taken for the convergence of both marginal and dependence aspects, which can result in inefficient estimation. In this paper an extension of the existing models, which removes this constraint, is proposed. The resulting model is semi-parametric and requires computationally intensive techniques for likelihood evaluation. The methods are illustrated using a coastal engineering application.