Showing papers on "Semiparametric model" published in 2009


Book
07 Aug 2009
TL;DR: This book covers single-index models, nonparametric additive models and semiparametric partially linear models, binary-response models, statistical inverse problems, and transformation models.
Abstract: Single-Index Models.- Nonparametric Additive Models and Semiparametric Partially Linear Models.- Binary-Response Models.- Statistical Inverse Problems.- Transformation Models.

252 citations


Journal ArticleDOI
TL;DR: A fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology allows more streamlined handling of longitudinal and spatial correlation.
Abstract: Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.

222 citations


Journal ArticleDOI
TL;DR: In this paper, a class of marginal partially linear quantile models with possibly varying coefficients is studied, where the functional coefficients are estimated by basis function approximations, and rank score tests for hypotheses on the coefficients are developed.
Abstract: Semiparametric models are often considered for analyzing longitudinal data for a good balance between flexibility and parsimony. In this paper, we study a class of marginal partially linear quantile models with possibly varying coefficients. The functional coefficients are estimated by basis function approximations. The estimation procedure is easy to implement, and it requires no specification of the error distributions. The asymptotic properties of the proposed estimators are established for the varying coefficients as well as for the constant coefficients. We develop rank score tests for hypotheses on the coefficients, including the hypotheses on the constancy of a subset of the varying coefficients. Hypothesis testing of this type is theoretically challenging, as the dimensions of the parameter spaces under both the null and the alternative hypotheses are growing with the sample size. We assess the finite sample performance of the proposed method by Monte Carlo simulation studies, and demonstrate its value by the analysis of an AIDS data set, where the modeling of quantiles provides more comprehensive information than the usual least squares approach.

203 citations


Journal ArticleDOI
TL;DR: In this paper, the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form that allows us to prove several results which are common to both, and to derive a condition under which they yield identical values.
Abstract: Spline-based approaches to non-parametric and semiparametric regression, as well as to regression of scalar outcomes on functional predictors, entail choosing a parameter controlling the extent to which roughness of the fitted function is penalized. We demonstrate that the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form that allows us to prove several results which are common to both, and to derive a condition under which they yield identical values. These ideas are illustrated by application of functional principal component regression, a method for regressing scalars on functions, to two chemometric data sets.

201 citations
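
A minimal sketch of smoothing parameter selection by generalized cross-validation, one of the two criteria the abstract above compares. Everything here is an illustrative assumption rather than the paper's setup: the smoother is a simple second-difference (Whittaker-type) penalty instead of a spline basis, the data are simulated, and the names `gcv_whittaker` and `lambda_grid` are hypothetical. The criterion used is GCV(λ) = n · RSS(λ) / (n − tr H(λ))², where H(λ) is the smoother matrix.

```python
import numpy as np

def gcv_whittaker(y, lambdas):
    """Pick the roughness penalty weight by minimizing the GCV score.

    The fit is H(lam) @ y with H(lam) = (I + lam * D'D)^{-1}, where D is the
    second-difference matrix (an assumed stand-in for a spline penalty).
    """
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)      # (n-2) x n second differences
    P = D.T @ D                               # roughness penalty matrix
    best = None
    for lam in lambdas:
        H = np.linalg.solve(np.eye(n) + lam * P, np.eye(n))  # smoother matrix
        fit = H @ y
        rss = float(np.sum((y - fit) ** 2))
        edf = float(np.trace(H))              # effective degrees of freedom
        score = n * rss / (n - edf) ** 2      # GCV criterion
        if best is None or score < best[0]:
            best = (score, lam, fit, edf)
    return best[1], best[2], best[3]

# Simulated example: noisy sine curve (illustrative data only).
rng = np.random.default_rng(1)
x_grid = np.linspace(0.0, 1.0, 100)
truth = np.sin(2.0 * np.pi * x_grid)
y_noisy = truth + 0.3 * rng.standard_normal(x_grid.size)

lambda_grid = np.logspace(-2, 4, 25)
lam_hat, fit_hat, edf_hat = gcv_whittaker(y_noisy, lambda_grid)
```

A restricted maximum likelihood version would replace only the `score` line with the REML criterion; it is omitted here because the sketch is meant to show the shape of the selection loop, not to reproduce the paper's comparison.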


Journal ArticleDOI
04 Nov 2009-Test
TL;DR: The authors provide a review on the empirical likelihood method for regression-type inference problems, including parametric, semiparametric, and nonparametric models, and both missing data and censored data are accommodated.
Abstract: We provide a review on the empirical likelihood method for regression-type inference problems. The regression models considered in this review include parametric, semiparametric, and nonparametric models. Both missing data and censored data are accommodated.

152 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider semiparametric efficient estimation of conditional moment models with possibly nonsmooth residuals in unknown parametric components and unknown functions of endogenous variables, and show that the penalized sieve minimum distance (PSMD) estimator can simultaneously achieve root-n asymptotic normality of the parametric components and the nonparametric optimal convergence rate for the unknown functions.

141 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide theoretical justifications for the use of bootstrap as a semiparametric inferential tool, and show that the bootstrap is asymptotically consistent in estimating the distribution of the $M$-estimate of Euclidean parameter.
Abstract: Consider $M$-estimation in a semiparametric model that is characterized by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. As a general purpose approach to statistical inferences, the bootstrap has found wide applications in semiparametric $M$-estimation and, because of its simplicity, provides an attractive alternative to the inference approach based on the asymptotic distribution theory. The purpose of this paper is to provide theoretical justifications for the use of bootstrap as a semiparametric inferential tool. We show that, under general conditions, the bootstrap is asymptotically consistent in estimating the distribution of the $M$-estimate of Euclidean parameter; that is, the bootstrap distribution asymptotically imitates the distribution of the $M$-estimate. We also show that the bootstrap confidence set has the asymptotically correct coverage probability. These general conclusions hold, in particular, when the nuisance parameter is not estimable at root-$n$ rate, and apply to a broad class of bootstrap methods with exchangeable bootstrap weights. This paper provides a first general theoretical study of the bootstrap in semiparametric models.

118 citations
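
A minimal sketch of a bootstrap with exchangeable weights applied to an M-estimate, as discussed in the abstract above. It is a deliberately simple, fully parametric toy (a least-squares slope with no nuisance function), so it illustrates only the reweighting mechanics, not the paper's semiparametric theory. The i.i.d. Exp(1) weights normalized to mean one, the simulated data, and the name `weighted_bootstrap_slope` are assumptions made for illustration.

```python
import numpy as np

def weighted_bootstrap_slope(x, y, n_boot=500, seed=0):
    """Bootstrap draws of a least-squares slope using exchangeable weights.

    Each replicate reweights the observations with Exp(1) weights scaled to
    average one, then re-solves the weighted least-squares problem.
    """
    rng = np.random.default_rng(seed)
    n = x.size
    draws = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(1.0, size=n)
        w /= w.mean()                              # exchangeable, mean-one weights
        xm = np.average(x, weights=w)
        ym = np.average(y, weights=w)
        draws[b] = np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2)
    return draws

# Simulated data (illustrative only): true slope 1.5, unit-variance noise.
rng = np.random.default_rng(7)
x_obs = rng.standard_normal(400)
y_obs = 1.5 * x_obs + rng.standard_normal(400)

xc = x_obs - x_obs.mean()
ols_slope = float(np.dot(xc, y_obs - y_obs.mean()) / np.dot(xc, xc))
boot_draws = weighted_bootstrap_slope(x_obs, y_obs)
boot_se = float(boot_draws.std(ddof=1))            # bootstrap standard error
```

Percentile limits of `boot_draws` would give a bootstrap confidence interval for the slope; in this toy setting the bootstrap standard error should be close to the usual analytic value of roughly 1/√n.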


Journal ArticleDOI
TL;DR: In this paper, a method of inference for general stochastic volatility models containing price jumps is proposed, which is based on treating realized multipower variation statistics calculated from high-frequency data as their unobservable (fill-in) asymptotic limits.

112 citations


Journal ArticleDOI
TL;DR: In this article, a semiparametric profile least-square based estimation procedure is developed for parametric and nonparametric components after calibrating the error-prone covariates, and asymptotic properties of the proposed estimators are established.
Abstract: We study semiparametric varying-coefficient partially linear models when some linear covariates are not observed, but ancillary variables are available. Semiparametric profile least-square based estimation procedures are developed for parametric and nonparametric components after we calibrate the error-prone covariates. Asymptotic properties of the proposed estimators are established. We also propose the profile least-square based ratio test and Wald test to identify significant parametric and nonparametric components. To improve accuracy of the proposed tests for small or moderate sample sizes, a wild bootstrap version is also proposed to calculate the critical values. Intensive simulation experiments are conducted to illustrate the proposed approaches.

94 citations


Journal ArticleDOI
TL;DR: The intrinsic regression model, which is a semiparametric model, uses a link function to map from the Euclidean space of covariates to the Riemannian manifold of positive-definite matrices, and develops an estimation procedure to calculate parameter estimates and establish their limiting distributions.
Abstract: The aim of this paper is to develop an intrinsic regression model for the analysis of positive-definite matrices as responses in a Riemannian manifold and their association with a set of covariates, such as age and gender, in a Euclidean space. The primary motivation and application of the proposed methodology is in medical imaging. Because the set of positive-definite matrices do not form a vector space, directly applying classical multivariate regression may be inadequate in establishing the relationship between positive-definite matrices and covariates of interest, such as age and gender, in real applications. Our intrinsic regression model, which is a semiparametric model, uses a link function to map from the Euclidean space of covariates to the Riemannian manifold of positive-definite matrices. We develop an estimation procedure to calculate parameter estimates and establish their limiting distributions. We develop score statistics to test linear hypotheses on unknown parameters and develop a test procedure based on a resampling method to simultaneously assess the statistical significance of linear hypotheses across a large region of interest. Simulation studies are used to demonstrate the methodology and examine the finite sample performance of the test procedure for controlling the family-wise error rate. We apply our methods to the detection of statistical significance of diagnostic effects on the integrity of white matter in a diffusion tensor study of human immunodeficiency virus. Supplemental materials for this article are available online.

88 citations


Journal ArticleDOI
TL;DR: An efficient Bayesian method is presented for the analysis of semiparametric models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered.
Abstract: We analyze a semiparametric model for data that suffer from the problems of sample selection, where some of the data are observed for only part of the sample with a probability that depends on a selection equation, and of endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits great flexibility in the way covariates affect response variables. We present an efficient Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered. Estimation is by Markov chain Monte Carlo (MCMC) methods. The algorithm we propose does not require simulation of the outcomes that are missing due to the selection mechanism, which reduces the computational load and improves the mixing of the MCMC chain. The approach is applied to a model of women’s labor force participation and log-wage determination. Data and computer code us...

Journal ArticleDOI
TL;DR: A broad class of semiparametric transformation models with random effects is proposed for the joint analysis of recurrent events and a terminal event, and the estimators are shown to be consistent, asymptotically normal, and asymptotically efficient.
Abstract: We propose a broad class of semiparametric transformation models with random effects for the joint analysis of recurrent events and a terminal event. The transformation models include proportional hazards/intensity and proportional odds models. We estimate the model parameters by the nonparametric maximum likelihood approach. The estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Simple and stable numerical algorithms are provided to calculate the parameter estimators and to estimate their variances. Extensive simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two HIV/AIDS studies are presented.

Journal ArticleDOI
TL;DR: In this paper, a two-step semiparametric maximum likelihood estimator is proposed for the coefficients of a single index binary choice model with endogenous regressors when identification is achieved via a control function approach.

Journal ArticleDOI
TL;DR: This article avoids any parametric assumption for the random effects distribution in shared parameter models and leaves it completely unspecified, because violating such assumptions leads to model misspecification with a potential effect on the parameter estimates and standard errors.
Abstract: Longitudinal studies often generate incomplete response patterns according to a missing not at random mechanism. Shared parameter models provide an appealing framework for the joint modelling of the measurement and missingness processes, especially in the nonmonotone missingness case, and assume a set of random effects to induce the interdependence. Parametric assumptions are typically made for the random effects distribution, violation of which leads to model misspecification with a potential effect on the parameter estimates and standard errors. In this article we avoid any parametric assumption for the random effects distribution and leave it completely unspecified. The estimation of the model is then made using a semi-parametric maximum likelihood method. Our proposal is illustrated on a randomized longitudinal study on patients with rheumatoid arthritis exhibiting nonmonotone missingness.

Journal ArticleDOI
TL;DR: A semiparametric Bayesian approach for assessing the relationship between functional predictors and a response is proposed and it is found that the model successfully predicts early pregnancy loss.
Abstract: Motivated by the need to understand and predict early pregnancy loss using hormonal indicators of pregnancy health, this article proposes a semiparametric Bayesian approach for assessing the relationship between functional predictors and a response. A multivariate adaptive spline model is used to describe the functional predictors, and a generalized linear model with a random intercept describes the response. Through specifying the random intercept to follow a Dirichlet process jointly with the random spline coefficients, we obtain a procedure that clusters trajectories according to shape and according to the parameters of the response model for each cluster. This very flexible method allows for the incorporation of covariates in the models for both the response and the trajectory. We apply the method to postovulatory progesterone data from the Early Pregnancy Study and find that the model successfully predicts early pregnancy loss.

Journal ArticleDOI
TL;DR: A semiparametric nonmixture cure model for the regression analysis of interval-censored time-to-event data is presented and the strong consistency of the maximum likelihood estimators under the Hellinger distance is proved.
Abstract: Motivated by medical studies in which patients could be cured of disease but the disease event time may be subject to interval censoring, we present a semiparametric non-mixture cure model for the regression analysis of interval-censored time-to-event data. We develop semiparametric maximum likelihood estimation for the model using the expectation-maximization method for interval-censored data. The maximization step for the baseline function is nonparametric and numerically challenging. We develop an efficient and numerically stable algorithm via modern convex optimization techniques, yielding a self-consistency algorithm for the maximization step. We prove the strong consistency of the maximum likelihood estimators under the Hellinger distance, which is an appropriate metric for the asymptotic property of the estimators for interval-censored data. We assess the performance of the estimators in a simulation study with small to moderate sample sizes. To illustrate the method, we also analyze a real data set from a medical study for the biochemical recurrence of prostate cancer among patients who have undergone radical prostatectomy. Supplemental materials for the computational algorithm are available online.

Journal ArticleDOI
TL;DR: This paper extends the induced smoothing procedure for the semiparametric accelerated failure time model to the case of clustered failure time data and proves that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing.
Abstract: This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton–Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost.

ReportDOI
TL;DR: In this article, the authors considered the efficient estimation of copula-based semiparametric strictly stationary Markov models and proposed a sieve maximum likelihood estimation (MLE) for the copula parameter, the invariant distribution and the conditional quantiles.
Abstract: This paper considers the efficient estimation of copula-based semiparametric strictly stationary Markov models. These models are characterized by nonparametric invariant (one-dimensional marginal) distributions and parametric bivariate copula functions where the copulas capture temporal dependence and tail dependence of the processes. The Markov processes generated via tail dependent copulas may look highly persistent and are useful for financial and economic applications. We first show that Markov processes generated via Clayton, Gumbel and Student’s t copulas and their survival copulas are all geometrically ergodic. We then propose a sieve maximum likelihood estimation (MLE) for the copula parameter, the invariant distribution and the conditional quantiles. We show that the sieve MLEs of any smooth functional is root-n consistent, asymptotically normal and efficient and that their sieve likelihood ratio statistics are asymptotically chi-square distributed. Monte Carlo studies indicate that, even for Markov models generated via tail dependent copulas and fat-tailed marginals, our sieve MLEs perform very well.

Journal ArticleDOI
TL;DR: The approach of [1], which is based on comparing a "true distribution" to a convex mixture of perturbed distributions, is extended to a comparison of two convex mixtures and applied to two examples of semiparametric functionals: the estimation of a mean response when response data are missing at random, and the estimation of an expected conditional covariance functional.
Abstract: We consider the minimax rate of testing (or estimation) of non-linear functionals defined on semiparametric models. Existing methods appear incapable of determining a lower bound on the minimax rate of testing (or estimation) for certain functionals of interest, in particular if the semiparametric model is indexed by several infinite-dimensional parameters. To cover these examples we extend the approach of [1], which is based on comparing a "true distribution" to a convex mixture of perturbed distributions, to a comparison of two convex mixtures. The first mixture is obtained by perturbing a first parameter of the model, and the second by perturbing in addition a second parameter. We apply the new result to two examples of semiparametric functionals: the estimation of a mean response when response data are missing at random, and the estimation of an expected conditional covariance functional.

Journal ArticleDOI
TL;DR: In this article, a likelihood-based estimator for a double-index, semiparametric binary response equation is proposed, which is based on density estimation under local smoothing.
Abstract: This paper formulates a likelihood-based estimator for a double-index, semiparametric binary response equation. A novel feature of this estimator is that it is based on density estimation under local smoothing. While the proofs differ from those based on alternative density estimators, the finite sample performance of the estimator is significantly improved. As binary responses often appear as endogenous regressors in continuous outcome equations, we also develop an optimal instrumental variables estimator in this context. For this purpose, we specialize the double-index model for binary response to one with heteroscedasticity that depends on an index different from that underlying the ‘mean response’. We show that such (multiplicative) heteroscedasticity, whose form is not parametrically specified, effectively induces exclusion restrictions on the outcomes equation. The estimator developed exploits such identifying information. We provide simulation evidence on the favorable performance of the estimators and illustrate their use through an empirical application on the determinants, and effect, of attendance at a government-financed school. Copyright © 2009 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a more realistic semiparametric INAR(p) model where there are essentially no restrictions on the innovation distribution, and provide a (semiparametrically) efficient estimator of both the auto-regression parameters and the innovation distribution.
Abstract: Integer-valued auto-regressive (INAR) processes have been introduced to model non-negative integer-valued phenomena that evolve over time. The distribution of an INAR(p) process is essentially described by two parameters: a vector of auto-regression coefficients and a probability distribution on the non-negative integers, called an immigration or innovation distribution. Traditionally, parametric models are considered where the innovation distribution is assumed to belong to a parametric family. The paper instead considers a more realistic semiparametric INAR(p) model where there are essentially no restrictions on the innovation distribution. We provide a (semiparametrically) efficient estimator of both the auto-regression parameters and the innovation distribution.
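
A minimal sketch of the kind of process the abstract above describes, restricted to order p = 1. It assumes the standard binomial-thinning construction X_t = α∘X_{t−1} + ε_t, in which each of the X_{t−1} previous counts survives independently with probability α and ε_t is a fresh innovation. The Poisson innovation, the parameter values, and the names `simulate_inar1` and `estimate_alpha` are illustrative assumptions. The moment estimator shown is a simple lag-1 autocorrelation, not the efficient estimator developed in the paper.

```python
import numpy as np

def simulate_inar1(alpha, innovation_sampler, n, x0=0, seed=0):
    """Simulate an INAR(1) path via binomial thinning.

    innovation_sampler(rng) must return one non-negative integer draw; any
    distribution on the non-negative integers can be plugged in.
    """
    rng = np.random.default_rng(seed)
    path = np.empty(n, dtype=np.int64)
    prev = x0
    for t in range(n):
        survivors = rng.binomial(prev, alpha)     # thinning: alpha o X_{t-1}
        prev = survivors + innovation_sampler(rng)
        path[t] = prev
    return path

def estimate_alpha(path):
    """Lag-1 sample autocorrelation, which equals alpha for a stationary INAR(1)."""
    centered = path - path.mean()
    return float(np.dot(centered[1:], centered[:-1]) / np.dot(centered, centered))

# Illustrative run: alpha = 0.5 with Poisson(2) innovations (assumed values).
inar_path = simulate_inar1(0.5, lambda rng: rng.poisson(2.0), n=20000)
alpha_hat = estimate_alpha(inar_path)
```

With these assumed values the stationary mean is 2 / (1 − 0.5) = 4, which gives a quick sanity check on the simulated path. Swapping the lambda for another integer-valued sampler changes the innovation distribution without touching the rest of the code.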

Journal ArticleDOI
TL;DR: A bound on how close the solution is to a true sparse signal in the case where the number of covariates is large is established and this is applied to a breast cancer data set with gene expression recordings and to the primary biliary cirrhosis clinical data.
Abstract: This paper considers covariate selection for the additive hazards model. This model is particularly simple to study theoretically and its practical implementation has several major advantages to the similar methodology for the proportional hazards model. One complication compared with the proportional model is, however, that there is no simple likelihood to work with. We here study a least squares criterion with desirable properties and show how this criterion can be interpreted as a prediction error. Given this criterion, we define ridge and Lasso estimators as well as an adaptive Lasso and study their large sample properties for the situation where the number of covariates p is smaller than the number of observations. We also show that the adaptive Lasso has the oracle property. In many practical situations, it is more relevant to tackle the situation with large p compared with the number of observations. We do this by studying the properties of the so-called Dantzig selector in the setting of the additive risk model. Specifically, we establish a bound on how close the solution is to a true sparse signal in the case where the number of covariates is large. In a simulation study, we also compare the Dantzig and adaptive Lasso for a moderate to small number of covariates. The methods are applied to a breast cancer data set with gene expression recordings and to the primary biliary cirrhosis clinical data.

Journal ArticleDOI
TL;DR: In this paper, a variable selection procedure by combining basis function approximations with SCAD penalty for semiparametric varying coefficient partially linear models is presented, which simultaneously selects significant variables in the parametric components and the nonparametric components.

Journal ArticleDOI
TL;DR: In this article, the estimation methods of three-dimensional ROC surfaces with nonparametric and semiparametric estimators are provided as a basis for statistical inference, and simulation studies are performed to assess the validity of their proposed methods in finite samples.

Book ChapterDOI
01 Oct 2009
TL;DR: The intrinsic regression model, which is a semiparametric model, uses a link function to map from the Euclidean space of covariates to the Riemannian manifold of manifold data and develops an estimation procedure to calculate an intrinsic least square estimator and establish its limiting distribution.
Abstract: In medical imaging analysis and computer vision, there is a growing interest in analyzing various manifold-valued data including 3D rotations, planar shapes, oriented or directed directions, the Grassmann manifold, deformation field, symmetric positive definite (SPD) matrices and medial shape representations (m-rep) of subcortical structures. Particularly, the scientific interests of most population studies focus on establishing the associations between a set of covariates (e.g., diagnostic status, age, and gender) and manifold-valued data for characterizing brain structure and shape differences, thus requiring a regression modeling framework for manifold-valued data. The aim of this paper is to develop an intrinsic regression model for the analysis of manifold-valued data as responses in a Riemannian manifold and their association with a set of covariates, such as age and gender, in Euclidean space. Because manifold-valued data do not form a vector space, directly applying classical multivariate regression may be inadequate in establishing the relationship between manifold-valued data and covariates of interest, such as age and gender, in real applications. Our intrinsic regression model, which is a semiparametric model, uses a link function to map from the Euclidean space of covariates to the Riemannian manifold of manifold data. We develop an estimation procedure to calculate an intrinsic least square estimator and establish its limiting distribution. We develop score statistics to test linear hypotheses on unknown parameters. We apply our methods to the detection of the difference in the morphological changes of the left and right hippocampi between schizophrenia patients and healthy controls using medial shape description.

Journal ArticleDOI
TL;DR: In this paper, an alternative individual claim loss model is proposed, which has a semiparametric structure and can be used to fit flexibly the claim loss reserving, and local likelihood is employed to estimate the parametric and nonparametric components of the model.
Abstract: The estimation of loss reserves for incurred but not reported (IBNR) claims presents an important task for insurance companies to predict their liabilities. Conventional methods, such as ladder or separation methods based on aggregated or grouped claims of the so-called “run-off triangle”, have been illustrated to have some drawbacks. Recently, individual claim loss models have attracted a great deal of interest in actuarial literature, which can overcome the shortcomings of aggregated claim loss models. In this paper, we propose an alternative individual claim loss model, which has a semiparametric structure and can be used to fit flexibly the claim loss reserving. Local likelihood is employed to estimate the parametric and nonparametric components of the model, and their asymptotic properties are discussed. Then the prediction of the IBNR claim loss reserving is investigated. A simulation study is carried out to evaluate the performance of the proposed methods.

Journal ArticleDOI
TL;DR: In this article, the semiparametric efficiency bound for finite dimensional parameters identified by models of sequential moment restrictions containing unknown functions was derived for two-stage plug-in problems and an optimally weighted, orthogonalized, sieve minimum distance estimator was presented.

Journal ArticleDOI
TL;DR: The standard error method represents a new method for estimating variability of nonparametric estimators in semiparametric problems, and it is found that for estimating the parametric part of the model, standard bandwidth choices of order $O(n^{-1/5})$ are sufficient to ensure asymptotic normality, and undersmoothing is not required.
Abstract: SIMEX is a general-purpose technique for measurement error correction. There is a substantial literature on the application and theory of SIMEX for purely parametric problems, as well as for purely nonparametric regression problems, but there is neither application nor theory for semiparametric problems. Motivated by an example involving radiation dosimetry, we develop the basic theory for SIMEX in semiparametric problems using kernel-based estimation methods. This includes situations that the mismeasured variable is modeled purely parametrically, purely nonparametrically, or that the mismeasured variable has components that are modeled both parametrically and nonparametrically. Using our asymptotic expansions, easily computed standard error formulae are derived, as are the bias properties of the nonparametric estimator. The standard error method represents a new method for estimating variability of nonparametric estimators in semiparametric problems, and we show in both simulations and in our example that it improves dramatically on first order methods. We find that for estimating the parametric part of the model, standard bandwidth choices of order $O(n^{-1/5})$ are sufficient to ensure asymptotic normality, and undersmoothing is not required. SIMEX has the property that it fits misspecified models, namely ones that ignore the measurement error. Our work thus also more generally describes the behavior of kernel-based methods in misspecified semiparametric problems.
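
A minimal sketch of the basic SIMEX idea referred to in the abstract above, shown for the purely parametric case of a linear regression slope rather than the kernel-based semiparametric setting the paper develops. The steps are: add extra simulated measurement error at several levels ζ, refit the naive model that ignores the error at each level, then extrapolate the fitted trend back to ζ = −1, which corresponds to no measurement error. The known error standard deviation `sigma_u`, the quadratic extrapolant, the simulated data, and the names `ols_slope` and `simex_slope` are all assumptions made for illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Naive least-squares slope that ignores measurement error."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def simex_slope(w, y, sigma_u, zetas=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=100, seed=0):
    """SIMEX estimate of a slope when w = x + u, with u ~ N(0, sigma_u^2).

    Returns (simex_estimate, naive_estimate).
    """
    rng = np.random.default_rng(seed)
    means = []
    for zeta in zetas:
        if zeta == 0.0:
            means.append(ols_slope(w, y))           # no added error
            continue
        # Simulation step: inflate the error variance by a factor (1 + zeta).
        sims = [
            ols_slope(w + np.sqrt(zeta) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        means.append(float(np.mean(sims)))
    # Extrapolation step: quadratic in zeta, evaluated at zeta = -1.
    coef = np.polyfit(zetas, means, deg=2)
    return float(np.polyval(coef, -1.0)), means[0]

# Simulated data (illustrative only): true slope 2, error SD 0.5.
rng = np.random.default_rng(42)
n_obs = 5000
x_true = rng.standard_normal(n_obs)
w_obs = x_true + 0.5 * rng.standard_normal(n_obs)
y_obs = 2.0 * x_true + 0.5 * rng.standard_normal(n_obs)

simex_est, naive_est = simex_slope(w_obs, y_obs, sigma_u=0.5)
```

In this toy setting the naive slope is attenuated toward zero, and the quadratic extrapolation recovers most but not all of the bias, since the true dependence on ζ is not exactly quadratic.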

Journal ArticleDOI
TL;DR: A semiparametric model is introduced to account for varying impacts of factors over clusters by using cluster-level covariates, which achieves the parsimony of parametrization and allows the explorations of nonlinear interactions.
Abstract: In the analysis of cluster data the regression coefficients are frequently assumed to be the same across all clusters. This hampers the ability to study the varying impacts of factors on each cluster. In this paper, a semiparametric model is introduced to account for varying impacts of factors over clusters by using cluster-level covariates. It achieves the parsimony of parametrization and allows the explorations of nonlinear interactions. The random effect in the semiparametric model accounts also for within cluster correlation. Local linear based estimation procedure is proposed for estimating functional coefficients, residual variance, and within cluster correlation matrix. The asymptotic properties of the proposed estimators are established and the method for constructing simultaneous confidence bands are proposed and studied. In addition, relevant hypothesis testing problems are addressed. Simulation studies are carried out to demonstrate the methodological power of the proposed methods in the finite sample. The proposed model and methods are used to analyse the second birth interval in Bangladesh, leading to some interesting findings.

Journal ArticleDOI
TL;DR: In this article, a semi-parametric approach for jointly estimating revealed and stated preference recreation demand models is presented, which allows for correlation across demand equations and incorporates unobserved heterogeneity.