scispace - formally typeset

Showing papers on "Semiparametric model" published in 2000



Journal Article
TL;DR: A general nonparametric mixture model is studied that extends models and improves estimation methods proposed by other researchers, and that extends Cox's proportional hazards regression model by allowing a proportion of event-free patients and investigating covariate effects on that proportion.
Abstract: Nonparametric methods have attracted less attention than their parametric counterparts for cure rate analysis. In this paper, we study a general nonparametric mixture model. The proportional hazards assumption is employed in modeling the effect of covariates on the failure time of patients who are not cured. The EM algorithm, the marginal likelihood approach, and multiple imputations are employed to estimate parameters of interest in the model. This model extends models and improves estimation methods proposed by other researchers. It also extends Cox's proportional hazards regression model by allowing a proportion of event-free patients and investigating covariate effects on that proportion. The model and its estimation method are investigated by simulations. An application to breast cancer data, including comparisons with previous analyses using a parametric model and an existing nonparametric model by other researchers, confirms the conclusions from the parametric model but not those from the existing nonparametric model.
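The mixture structure described above can be written as a population survival function that blends a cured fraction with a proportional hazards model for the uncured. The sketch below only evaluates that formula; the logistic link for the cure probability, the function name, and the user-supplied baseline cumulative hazard are illustrative assumptions, not the paper's estimation procedure (which uses the EM algorithm, marginal likelihood, and multiple imputation).

```python
import numpy as np

def cure_model_survival(t, z, x, gamma, beta, baseline_cum_hazard):
    """Population survival S(t | z, x) under a mixture cure model (sketch).

    S(t) = pi(z) + (1 - pi(z)) * S_u(t | x), where
      pi(z)      = cure probability, assumed logistic: 1 / (1 + exp(-z @ gamma))
      S_u(t | x) = exp(-H0(t) * exp(x @ beta)), proportional hazards for the uncured
    """
    pi = 1.0 / (1.0 + np.exp(-np.dot(z, gamma)))
    s_uncured = np.exp(-baseline_cum_hazard(t) * np.exp(np.dot(x, beta)))
    return pi + (1.0 - pi) * s_uncured
```

As t grows, S(t) levels off at the cure probability pi(z) rather than falling to zero, which is what distinguishes this model from the ordinary Cox model.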

404 citations


Journal Article
TL;DR: In this article, the author proposed a method for estimating home values by non-parametrically incorporating the physical location of the properties, allowing the parameters of the observed covariates to vary in space.
Abstract: This paper presents a method for estimating home values by non-parametrically incorporating the physical location of the properties. Specifically, I allow the parameters of the observed covariates to vary in space. This approach mitigates one of the biggest deficiencies inherent in hedonic pricing models–omitted variables. I demonstrate the advantages of the proposed method using real estate transaction data from Los Angeles County. The estimation finds a substantial spatial variation of the marginal values of the hedonic characteristics and provides an insight into the segmentation of the market. The proposed method is an extension of semi-parametric multi-dimensional k-nearest-neighbor smoothing. It alleviates a fundamental problem known as the curse of dimensionality by incorporating parametric components into a non-parametric estimation.
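The idea of letting hedonic coefficients vary in space can be illustrated with a plain k-nearest-neighbor local regression: for a target location, fit ordinary least squares using only the k closest sales. This is a simplified, uniformly weighted sketch under stated assumptions (Euclidean distance on coordinates, hypothetical function name), not the author's semi-parametric multi-dimensional k-NN estimator.

```python
import numpy as np

def local_knn_coefficients(coords, X, y, target, k):
    """Hedonic coefficients at `target` from OLS on the k nearest sales (sketch).

    coords: (n, 2) property locations; X: (n, p) covariates; y: (n,) prices.
    Returns [intercept, slope_1, ..., slope_p] estimated from the k neighbors.
    """
    dist = np.linalg.norm(coords - target, axis=1)   # Euclidean distance to target
    idx = np.argsort(dist)[:k]                        # k nearest observations
    Xk = np.column_stack([np.ones(k), X[idx]])        # local design with intercept
    coef, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    return coef
```

Repeating the fit over a grid of target locations produces a map of marginal values, which is how spatial variation and market segmentation become visible.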

129 citations


Journal Article
Qi Li
TL;DR: In this paper, the problem of estimating an additive partially linear model using general series estimation methods, with polynomials and splines as two leading cases, was considered, and it was shown that the finite-dimensional parameter is identified under weak conditions.
Abstract: I consider the problem of estimating an additive partially linear model using general series estimation methods with polynomials and splines as two leading cases. I show that the finite-dimensional parameter is identified under weak conditions. I establish the root-n-normality result for the finite-dimensional parameter in the linear part of the model and show that it is asymptotically more efficient than a semiparametric estimator that ignores the additive structure. When the error is conditionally homoskedastic, my finite-dimensional parameter estimator reaches the semiparametric efficiency bound. Efficient estimation when the error is conditionally heteroskedastic is also discussed.
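A series estimator of this kind can be sketched as a single least-squares fit in which each unknown additive component is replaced by a polynomial in its own covariate (no cross terms, which is where the additive structure enters). The function name, the use of raw polynomial powers, and the default degree are assumptions for illustration; the paper also covers splines and establishes the theory, which this sketch does not attempt.

```python
import numpy as np

def additive_plm_beta(y, X, Z, degree=5):
    """Series estimate of beta in y = X beta + g1(z1) + ... + gq(zq) + e (sketch).

    Each g_l is approximated by a polynomial in z_l alone; beta is the coefficient
    on X in the OLS fit of y on [X, 1, z_1, ..., z_1^degree, ..., z_q^degree].
    """
    powers = [Z[:, [l]] ** np.arange(1, degree + 1) for l in range(Z.shape[1])]
    design = np.column_stack([X, np.ones(len(y))] + powers)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: X.shape[1]]
```

Ignoring the additive structure would require a full tensor-product basis in (z_1, ..., z_q), which grows quickly with q; the additive basis keeps the number of terms linear in q.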

113 citations


Journal Article
TL;DR: A semiparametric approach to the proportional hazards regression analysis of interval-censored data is proposed; multiple imputation was found to yield an easily computed variance estimate that appears to be more reliable than asymptotic methods with small to moderately sized data sets.
Abstract: We propose a semiparametric approach to the proportional hazards regression analysis of interval-censored data. An EM algorithm based on an approximate likelihood leads to an M-step that involves maximizing a standard Cox partial likelihood to estimate regression coefficients and then using the Breslow estimator for the unknown baseline hazards. The E-step takes a particularly simple form because all incomplete data appear as linear terms in the complete-data log likelihood. The algorithm of Turnbull (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) is used to determine times at which the hazard can take positive mass. We found multiple imputation to yield an easily computed variance estimate that appears to be more reliable than asymptotic methods with small to moderately sized data sets. In the right-censored survival setting, the approach reduces to the standard Cox proportional hazards analysis, while the algorithm reduces to the one suggested by Clayton and Cuzick (1985, Applied Statistics 34, 148-156). The method is illustrated on data from the breast cancer cosmetics trial, previously analyzed by Finkelstein (1986, Biometrics 42, 845-854) and several subsequent authors.
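The M-step above uses the Breslow estimator for the baseline hazard once regression coefficients are available. A minimal sketch of the Breslow estimator for ordinary right-censored data (the complete-data case, not the interval-censored E-step) is shown below; the function name and argument layout are assumptions.

```python
import numpy as np

def breslow_cumulative_hazard(times, events, X, beta):
    """Breslow estimate of the baseline cumulative hazard at each distinct event time.

    H0(t) = sum over event times t_i <= t of d_i / sum_{j: t_j >= t_i} exp(x_j' beta),
    where d_i is the number of events tied at t_i (right-censored data only).
    """
    risk = np.exp(X @ beta)
    event_times = np.unique(times[events == 1])
    increments = []
    for t in event_times:
        d = np.sum((times == t) & (events == 1))   # events tied at time t
        at_risk = risk[times >= t].sum()           # weighted risk set
        increments.append(d / at_risk)
    return event_times, np.cumsum(increments)
```

With beta = 0 every risk weight equals 1 and the estimator reduces to the Nelson-Aalen estimator.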

102 citations


Reference Book
TL;DR: This reference book collects chapters on topics including empirical Fourier analysis, periodically correlated time series, time series of count data, seasonal and cyclical long memory, nonparametric specification procedures for time series, and parameter estimation and model selection for multistep prediction of a time series.
Abstract (table of contents): Some examples of empirical Fourier analysis in scientific problems; Modeling and inference for periodically correlated time series; Modeling time series of count data; Seasonal and cyclical long memory; Nonparametric specification procedures for time series; Parameter estimation and model selection for multistep prediction of a time series - a review; Nonlinear estimation for time series observed on arrays; Some contributions to multivariate nonlinear time series and to bilinear models; Optimal testing for semiparametric AR models - from Gaussian Lagrange multipliers to autoregression rank scores and adaptive tests; Statistical analysis based on functionals of nonparametric spectral density estimators; Efficient estimation in a semiparametric additive regression model with ARMA errors; Efficient estimation in Markov chain models - an introduction; Nonparametric functional estimation - an overview; Minimum distance and nonparametric dispersion functions; Estimators of changes; On inverse estimation; Approaches for semiparametric Bayesian regression; Consistency issues in Bayesian nonparametrics; Breakdown theory for estimators based on bootstrap and other resampling schemes; On second-order properties of the stationary bootstrap method for studentized statistics; Convergence to equilibrium of random dynamical systems generated by IID monotone maps, with applications to economics; Chi-squared tests of goodness-of-fit for dependent observations; Positive and negative dependence with some statistical applications; Second-order information loss due to nuisance parameters - a simple measure. Appendix: publications of Madan Lal Puri.

101 citations


Journal Article
TL;DR: In this paper, a fully Bayesian approach to regression splines with automatic knot selection in generalized semiparametric models for fundamentally non-Gaussian responses is presented, which allows simultaneous estimation both of the number of knots and the knot placement, together with the unknown basis coefficients determining the shape of the spline.
Abstract: This article presents a fully Bayesian approach to regression splines with automatic knot selection in generalized semiparametric models for fundamentally non-Gaussian responses. In a basis function representation of the regression spline we use a B-spline basis. The reversible jump Markov chain Monte Carlo method allows for simultaneous estimation both of the number of knots and the knot placement, together with the unknown basis coefficients determining the shape of the spline. Since the spline can be represented as design matrix times unknown (basis) coefficients, it is straightforward to additionally include a vector of covariates with fixed effects, yielding a semiparametric model. The method is illustrated with datasets from the literature for curve estimation in generalized linear models, the Tokyo rainfall data, and the coal mining disaster data, and by a credit-scoring problem for generalized semiparametric models.
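The "design matrix times unknown coefficients" representation can be made concrete by building the B-spline design matrix for a given knot vector. The sketch below uses the standard Cox-de Boor recursion with a fixed knot vector; it does not perform the reversible jump knot selection, and the function name is hypothetical.

```python
import numpy as np

def bspline_design(x, knots, degree=3):
    """B-spline design matrix B with B[i, j] = B_j(x_i) via the Cox-de Boor recursion.

    `knots` is the full non-decreasing knot vector (boundary knots repeated
    degree + 1 times); there are len(knots) - degree - 1 basis functions.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicator of [t_j, t_{j+1})
    B = np.zeros((len(x), len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval on the right so x == t[-1] is covered
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time, skipping 0/0 terms from repeated knots
    for k in range(1, degree + 1):
        B_new = np.zeros((len(x), len(t) - k - 1))
        for j in range(len(t) - k - 1):
            left = t[j + k] - t[j]
            right = t[j + k + 1] - t[j + 1]
            if left > 0:
                B_new[:, j] += (x - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + k + 1] - x) / right * B[:, j + 1]
        B = B_new
    return B
```

The spline values are then `bspline_design(x, knots) @ coef`, and fixed-effect covariates can be appended as extra columns of the same design matrix, giving the semiparametric model.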

99 citations


Journal Article
TL;DR: In this article, a nonparametric regression estimator is proposed that uses prior information on regression shape in the form of a parametric model; it is, however, not suitable for binary data.

90 citations


Journal Article
Qi Li
TL;DR: In this review, the author assesses Horowitz's book Semiparametric Methods in Econometrics, which aims to provide a central place for those who want to check out the semiparametric literature and to present technical material in a nontechnical way so that graduate students and applied econometricians without expertise in this area can comprehend the new methods.
Abstract: Semiparametric and nonparametric estimation have attracted a great deal of attention from statisticians and theoretical econometricians in the past decade. Various new models have been proposed, and new methods for estimating those models have been suggested. These new models and methods are scattered in various academic journals and are not easily accessible to other researchers. Moreover, the new methods are technical, not easily understood by nonexperts such as graduate students and applied econometricians. Horowitz has two goals in his book Semiparametric Methods in Econometrics. First, he wants to provide a central place for those who want to check out the semiparametric literature. Second, he wants to present technical materials in a nontechnical way so that graduate students and applied econometricians who often do not have expertise in this area can comprehend the new methods. In my view, Horowitz has achieved both goals.

77 citations


Journal Article
TL;DR: In this article, a semiparametric model which relates the mean of the response variable at each time point proportionally to a function of a time-dependent covariate vector is proposed.
Abstract: In a longitudinal study, suppose that, for each subject, repeated measurements of the response variable and covariates are collected at a set of distinct, irregularly spaced time points. We consider a semiparametric model which relates the mean of the response variable at each time point proportionally to a function of a time-dependent covariate vector to analyse such panel data. Inference procedures for regression parameters are proposed without involving any nonparametric function estimation for the nuisance mean function. A dataset from a recent AIDS clinical trial is used to illustrate the new proposal.

61 citations


Journal Article
TL;DR: In this article, cross-sectional sampling is considered: at some point in time one identifies a random sample from the population under study and registers the survival time up to that time point. Typically, the resulting reduced survival times do not have the same distribution as the true survival times.

Journal Article
Mark Yuying An
TL;DR: In this paper, a semiparametric willingness to pay distribution was proposed and several aspects of statistical inference with dichotomous choice contingent valuation data were discussed, including the likelihood-based estimation of the model parameters with and without controlling for unobserved heterogeneity.
Abstract: This paper proposes a semiparametric willingness to pay distribution and discusses several aspects of statistical inference with dichotomous choice contingent valuation data. We study likelihood-based estimation of the model parameters with and without controlling for unobserved heterogeneity, estimation of the mean and median willingness to pay, and specification tests. These statistical procedures are implemented using a data set. In this application we find that a parametric model is rejected in favor of our semiparametric model, that the heterogeneity can be adequately controlled using a simple density, and that the semiparametric model offers more robust mean willingness to pay estimates.

Journal Article
TL;DR: In this article, the authors developed a new estimation procedure for characteristic-based factor models of stock returns for UK and US common stocks using book-to-price ratio, market capitalization, and dividend yield.
Abstract: This paper develops a new estimation procedure for characteristic-based factor models of stock returns. It describes a factor model in which the factor betas are smooth nonlinear functions of observed security characteristics. It develops an estimation procedure that combines nonparametric kernel methods for constructing mimicking portfolios with parametric nonlinear regression to estimate factor returns and factor betas. Factor models are estimated for UK and US common stocks using book-to-price ratio, market capitalization, and dividend yield.

Journal Article
C. P. Farrington
TL;DR: Diagnostic tools are proposed for proportional hazards models for interval-censored survival data, including counterparts to the Cox-Snell, Lagakos, deviance, and Schoenfeld residuals.
Abstract: We develop diagnostic tools for use with proportional hazards models for interval-censored survival data. We propose counterparts to the Cox-Snell, Lagakos (or martingale), deviance, and Schoenfeld residuals. Many of the properties of these residuals carry over to the interval-censored case. In particular, the interval-censored versions of the Lagakos and Schoenfeld residuals may be derived as components of suitable score statistics. The Lagakos residuals may be used to check regression relationships, while the Schoenfeld residuals can help to detect nonproportional hazards in semiparametric models. The methods apply to parametric models and to the semiparametric model with discrete observation times.

Journal Article
Biao Zhang
TL;DR: In this article, the author considered quantile estimation under a two-sample semi-parametric model in which the log ratio of two unknown density functions has a known parametric form.
Abstract: We consider quantile estimation under a two-sample semi-parametric model in which the log ratio of two unknown density functions has a known parametric form. This two-sample semi-parametric model, arising naturally from case-control studies and logistic discriminant analysis, can be regarded as a biased sampling model. A new quantile estimator is constructed on the basis of the maximum semi-parametric likelihood estimator of the underlying distribution function. It is shown that the proposed quantile estimator is asymptotically normally distributed with smaller asymptotic variance than that of the standard quantile estimator. Also presented are some results on simulation and on analysis of a real data set.

Journal Article
TL;DR: In this article, a data-driven procedure is described and investigated for obtaining parsimonious mixture model estimates or, conversely, kernel estimates with data-driven local smoothing properties; the main idea is to obtain a semiparametric estimate by alternating between the parametric and nonparametric viewpoints.

Journal Article
TL;DR: A scaled chi-squared test for the equality of two nonparametric time functions is developed and it is shown that all model parameters can be easily obtained by fitting a linear mixed model.
Abstract: We consider semiparametric regression for periodic longitudinal data. Parametric fixed effects are used to model the covariate effects and a periodic nonparametric smooth function is used to model the time effect. The within-subject correlation is modeled using subject-specific random effects and a random stochastic process with a periodic variance function. We use maximum penalized likelihood to estimate the regression coefficients and the periodic nonparametric time function, whose estimator is shown to be a periodic cubic smoothing spline. We use restricted maximum likelihood to simultaneously estimate the smoothing parameter and the variance components. We show that all model parameters can be easily obtained by fitting a linear mixed model. A common problem in the analysis of longitudinal data is to compare the time profiles of two groups, e.g., between treatment and placebo. We develop a scaled chi-squared test for the equality of two nonparametric time functions. The proposed model and the test are illustrated by analyzing hormone data collected during two consecutive menstrual cycles and their performance is evaluated through simulations.

Posted Content
TL;DR: The question which is addressed in this paper is what is the best obtainable rate when s is unknown, so that estimators cannot depend on s, and a lower bound for the asymptotic quadratic risk of any such adaptive estimator is obtained.
Abstract: In Giraitis, Robinson, and Samarov (1997), we have shown that the optimal rate for memory parameter estimators in semiparametric long memory models with degree of “local smoothness” s is $n^{-r(s)}$, $r(s)=s/(2s+1)$, and that a log-periodogram regression estimator (a modified Geweke and Porter-Hudak (1983) estimator) with maximum frequency $m=m(s)\asymp n^{2r(s)}$ is rate optimal. The question which we address in this paper is what is the best obtainable rate when s is unknown, so that estimators cannot depend on s. We obtain a lower bound for the asymptotic quadratic risk of any such adaptive estimator, which turns out to be larger than the optimal nonadaptive rate $n^{-r(s)}$ by a logarithmic factor. We then consider a modified log-periodogram regression estimator based on tapered data and with a data-dependent maximum frequency $m=m(\hat{s})$, which depends on an adaptively chosen estimator $\hat{s}$ of s, and show, using methods proposed by Lepskii (1990) in another context, that this estimator attains the lower bound up to a logarithmic factor. On one hand, this means that this estimator has nearly optimal rate among all adaptive (free from s) estimators, and, on the other hand, it shows near optimality of our data-dependent choice of the rate of the maximum frequency for the modified log-periodogram regression estimator. The proofs contain results which are also of independent interest: one result shows that data tapering gives a significant improvement in asymptotic properties of covariances of discrete Fourier transforms of long memory time series, while another gives an exponential inequality for the modified log-periodogram regression estimator.
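For orientation, the basic (unmodified, untapered) Geweke and Porter-Hudak log-periodogram regression with a fixed maximum frequency m can be sketched as below. It uses the approximation that the spectral density behaves like $C\lambda^{-2d}$ near zero, so the slope of log-periodogram on $-2\log\lambda_j$ estimates d; the original GPH regressor $\log(4\sin^2(\lambda_j/2))$ is asymptotically equivalent. Tapering and the adaptive choice $m=m(\hat{s})$ studied in the paper are deliberately omitted, and the function name is hypothetical.

```python
import numpy as np

def gph_estimate(x, m):
    """Basic log-periodogram (GPH-type) estimate of the memory parameter d.

    Regresses log I(lambda_j) on -2 log(lambda_j) over the first m Fourier
    frequencies lambda_j = 2*pi*j/n, j = 1..m; the fitted slope estimates d.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())[1 : m + 1]          # DFT at lambda_1..lambda_m
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    slope, _intercept = np.polyfit(-2.0 * np.log(lam), np.log(periodogram), 1)
    return slope
```

The choice of m governs the bias-variance trade-off: larger m lowers variance but lets short-range dynamics bias the slope, which is exactly why the rate of m matters in the paper.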

Journal Article
TL;DR: In this article, the authors extend the notion of the tangent vector and provide conditions for smoothness, or differentiability, of the parameter of interest as a function of the underlying probability measures.

Journal Article
TL;DR: In this article, a semiparametric regression method is proposed for estimating the regression parameter in the linear model without specifying the distribution of the random error, where the response variable is subject to so-called case 1 interval censoring.
Abstract: In survival analysis, a linear model often provides an adequate approximation after a suitable transformation of the survival times and possibly of the covariates. This article proposes a semiparametric regression method for estimating the regression parameter in the linear model without specifying the distribution of the random error, where the response variable is subject to so-called case 1 interval censoring. The method uses a constructed random-sieve likelihood and constraints, combining the benefits of semiparametric likelihood with estimating equations. The estimation procedure is implemented, and the asymptotic distributions for the estimated regression parameter and for the profile likelihood ratio statistic are obtained. In addition, some model diagnostics aspects are described. Finally, the small-sample operating characteristics of the proposed method are examined via simulations, and its usefulness is illustrated on datasets from an animal tumorigenicity study and from an HIV study.

Journal Article
TL;DR: In this paper, a fully nonparametric model for nonlinear analysis of covariance is proposed, which is useful whenever modelling assumptions such as proportional odds, or linearity and homoscedasticity appear suspect.
Abstract: A fully nonparametric model for nonlinear analysis of covariance is proposed. The term nonlinear means that the covariate influences the response in a possibly nonlinear and nonpolynomial fashion, while the term fully nonparametric implies that the distributions for each factor level combination and covariate value are not restricted to comply with any parametric or semiparametric model. The possibility of different shapes of covariate effect in different factor level combinations is also allowed. This generality is useful whenever modelling assumptions such as proportional odds, or linearity and homoscedasticity, appear suspect. In the context of this nonparametric model, hypotheses of no main effect, no interaction and no simple effect, which adjust for the covariate values, are defined and test statistics are developed. Both the response and the covariate are allowed to be ordinal. The test statistics are based on averages over the covariate values of certain Nadaraya-Watson-type nonparametric regression quantities, and asymptotically they have, under their respective null hypotheses, a central χ²-distribution. Simulation results show that the statistics have good power properties. The procedures are demonstrated on two real datasets.
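The building block named above, the Nadaraya-Watson regression estimator, is a kernel-weighted average of responses. The sketch below uses a Gaussian kernel and a user-chosen bandwidth, both assumptions; it shows only this ingredient, not the paper's averaged test statistics or their χ² calibration.

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, bandwidth):
    """Nadaraya-Watson estimate m(x0) = sum_i K((x0 - x_i)/h) y_i / sum_i K((x0 - x_i)/h)."""
    u = (np.asarray(x_grid)[:, None] - np.asarray(x)[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)  # Gaussian kernel; its normalizing constant cancels
    return weights @ np.asarray(y) / weights.sum(axis=1)
```

Because the weights in each row are normalized to sum to one, the estimator reproduces a constant response exactly, regardless of the bandwidth.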

Journal Article
TL;DR: In this article, a generalized RESET test was used as a test for misspecification in a variety of parametric and semi-parametric micro-econometric models.
Abstract: The letter notes that a generalized RESET test can be employed as a test for misspecification in a variety of parametric and semi-parametric micro-econometric models. The test's performance is illustrated using three models commonly used in labour supply studies: the linear, censored (Tobit), and duration (Weibull) regression models. All are estimated by fully parametric (maximum likelihood) and semiparametric methods. Comments are provided on the finite sample performance of the test.
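For reference, the classic RESET idea in the linear regression case is to augment the fitted model with powers of the fitted values and test whether they add explanatory power. The sketch below computes only that linear-model F statistic; the letter's generalized version for Tobit and Weibull models is not reproduced, and the function name and default of powers 2 and 3 are assumptions. A p-value could be obtained separately, for example with `scipy.stats.f.sf(F, q, df)`.

```python
import numpy as np

def reset_f_statistic(y, X, max_power=3):
    """Classic Ramsey RESET F statistic for a linear regression (sketch).

    Fits y on [1, X], adds yhat^2, ..., yhat^max_power, and compares residual sums
    of squares with an F statistic on (max_power - 1, n - k_full) degrees of freedom.
    """
    n = len(y)
    X0 = np.column_stack([np.ones(n), X])
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    yhat = X0 @ b0
    rss0 = np.sum((y - yhat) ** 2)
    X1 = np.column_stack([X0] + [yhat ** p for p in range(2, max_power + 1)])
    b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss1 = np.sum((y - X1 @ b1) ** 2)
    q = max_power - 1
    df = n - X1.shape[1]
    return ((rss0 - rss1) / q) / (rss1 / df)
```

A large statistic signals that nonlinear functions of the index are missing, which is the sense in which RESET detects misspecification.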

Journal Article
TL;DR: In this article, a Gibbs sampler was used to traverse the model space and predict chlorophyll concentrations in Lake Okeechobee using Bayesian model averaging (BMA) over the sampled models.
Abstract: Long-term eutrophication data along with water quality measurements (total phosphorus and total nitrogen) and other physical environmental factors such as lake level (stage), water temperature, wind speed and direction were used to develop a model to predict chlorophyll a concentrations in Lake Okeechobee. The semiparametric model included each of the potential explanatory variables as linear predictors, regression spline predictors, or product spline interactions allowing for nonlinear relationships. A Gibbs sampler was used to traverse the model space. Predictions that incorporate uncertainty about inclusion of variables and their functional forms were obtained using Bayesian model averaging (BMA) over the sampled models. Semiparametric regression with Bayesian model averaging and spline interactions provides a flexible framework for addressing the problems of nonlinearity and counterintuitive total phosphorus function estimates identified in previous statistical models. The use of regression splines allows nonlinear effects to be manifest, while their extension allows inclusion of interactions for which the mathematical form cannot be specified a priori. Prediction intervals under BMA provided better coverage for new observations than confidence intervals for ordinary least squares models obtained using backwards selection. Also, BMA was more efficient than ordinary least squares in terms of predictive mean squared error for overall lake predictions.

Book Chapter
21 Sep 2000
TL;DR: In this article, a family of infinite-order smoothing kernels is introduced that is characterized by the flatness near the origin of the Fourier transform of each member of the family.
Abstract: The problem of nonparametric estimation of a smooth, real-valued function of a vector argument is addressed. In particular, we focus on a family of infinite-order smoothing kernels that is characterized by the flatness near the origin of the Fourier transform of each member of the family; hence, the term 'flat-top' kernels. Smoothing with the proposed infinite-order flat-top kernels has optimal Mean Squared Error properties. We review some recent advances, as well as give two new results on density estimation in two cases of interest: (i) case of a smooth density over a finite domain, and (ii) case of infinite domain with some discontinuities.
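One simple member of a flat-top family is the trapezoid: equal to 1 for |s| ≤ c and decreasing linearly to 0 at |s| = 1. The univariate sketch below damps the empirical characteristic function with this trapezoid and inverts it to estimate a density at a point. The choice c = 0.5, the fixed bandwidth h, the integration grid, and the function names are all illustrative assumptions; the chapter treats functions of a vector argument and bandwidth choice, which this sketch does not.

```python
import numpy as np

def flat_top_lambda(s, c=0.5):
    """Trapezoidal flat-top function: 1 on |s| <= c, linear down to 0 at |s| = 1."""
    a = np.abs(s)
    return np.clip((1.0 - a) / (1.0 - c), 0.0, 1.0)

def flat_top_density(x0, data, h, c=0.5, n_grid=801):
    """Density estimate at x0 by Fourier inversion of the damped empirical
    characteristic function: (1/pi) * int_0^{1/h} lambda(h t) Re(phi_n(t) e^{-i t x0}) dt."""
    t = np.linspace(0.0, 1.0 / h, n_grid)
    ecf = np.exp(1j * np.outer(t, data)).mean(axis=1)   # empirical characteristic function
    integrand = (flat_top_lambda(h * t, c) * ecf * np.exp(-1j * t * x0)).real
    # Trapezoid rule on [0, 1/h]; symmetry of the integrand in t gives the factor 1/pi
    return np.sum((integrand[1:] + integrand[:-1]) / 2.0 * np.diff(t)) / np.pi
```

Because the damping factor equals 1 near the origin, low-frequency components of the empirical characteristic function pass through unshrunk, which is the source of the reduced bias that distinguishes flat-top kernels from finite-order ones.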

Journal Article
TL;DR: In this article, HITS, a semiparametric model of consumer demand that allows for diversity in tastes, was proposed; consumer demand is modeled by a class of Nearly Ideal Demand Systems parameterized by a unique taste parameter, and the approach is consistent with both the yearly cross-sections of individual choices and the dynamics of aggregate shares.
Abstract: Inspired by the recent literature on aggregation theory, this paper introduces HITS, a semiparametric model of consumer demand that allows for diversity in tastes. The strong variation of budget shares observed across income strata can arise from two economic factors: the individual income effect, and taste differences between poor and rich households. Consumer expenditure surveys that report repeated cross-sections do not permit the direct measurement of these two effects, and the paper solves this difficulty by developing a new microeconometric framework. We model consumer demand by a class of Nearly Ideal Demand Systems parameterized by a unique taste parameter. Linear heterogeneity allows GMM estimation of the structural coefficients on an aggregate time series, and the joint density of spending and tastes is recovered from cross-sections by a nonparametric procedure involving a deconvolution. We develop an asymptotic theory and demonstrate the accuracy of the algorithm by Monte Carlo and bootstrap simulations. The model is estimated on four size groups using the British Family Expenditure Survey (1968-98). We report a strong correlation between income and tastes, which explains most of the observed variation of budget shares with income. Unlike some earlier models, this new approach is consistent with both the yearly cross-sections of individual choices and the dynamics of aggregate shares.

Book Chapter
TL;DR: Credit scoring methods aim to assess credit worthiness of potential borrowers to keep the risk of credit loss low and to minimize the costs of failure over risk groups.
Abstract: Credit scoring methods aim to assess credit worthiness of potential borrowers to keep the risk of credit loss low and to minimize the costs of failure over risk groups. Typical methods which are used for the statistical classification of credit applicants are linear or quadratic discriminant analysis and logistic discriminant analysis. These methods are based on scores which depend on the explanatory variables in a predefined form (usually linear). Recent methods that allow a more flexible modeling are neural networks and classification trees (see e.g. Arminger, Enache and Bonne, 1997) as well as nonparametric approaches (see e.g. Henley and Hand, 1996).

Journal Article
TL;DR: In this paper, the authors analyzed bottled water expenditures data with zero observations by employing parametric and semiparametric models, and the overall results of specification tests indicate that the semi-parametric model outperforms the parametric model significantly.

Journal Article
TL;DR: In this article, a semiparametric estimator for the probability density function of detected distances in line transect sampling is proposed, which affords the advantages of both parametric and nonparametric methods, i.e. accuracy and robustness.
Abstract: A novel semiparametric estimator for the probability density function of detected distances in line transect sampling is proposed. The estimator is obtained using a local likelihood density estimation approach, a technique recently proposed which affords the advantages of both parametric and nonparametric methods, i.e. accuracy and robustness. Moreover, a procedure for the selection of the local likelihood bandwidth is obtained. The performance of the proposed estimator with respect to some existing nonparametric and semiparametric estimators is assessed by means of a Monte Carlo study. Finally, a real data set is analyzed. Copyright © 2000 John Wiley & Sons, Ltd.

Journal Article
TL;DR: In this article, a semiparametric estimator for multiple equations multiple index (MEMI) models is proposed, which minimizes the average distance between the dependent variable unconditional and conditional on an index.
Abstract: This paper proposes a semiparametric estimator for multiple equations multiple index (MEMI) models. Examples of MEMI models include several sample selection models and the multinomial choice model. The proposed estimator minimizes the average distance between the dependent variable unconditional and conditional on an index. The estimator is √N-consistent and asymptotically normally distributed. The paper also provides a Monte Carlo experiment to evaluate the finite-sample performance of the estimator.

Journal Article
TL;DR: In this article, the authors investigated a semiparametric normal model in which an unknown transformation of the adverse response satisfies the linear model and demonstrated that this formulation unifies the two existing approaches and allows for a coherent risk analysis of dose-response data.
Abstract: Various frameworks have been suggested for assessing the risk associated with continuous toxicity outcomes. The first formulates the effect of exposure on the adverse effect via a simple normal model and then computes the risk function using tail probabilities from the standard normal distribution. Because this risk function depends heavily on the assumed model, it may be sensitive to model misspecification. Recently, a semiparametric approach that utilizes an alternative definition of excess risk has been studied. Unfortunately, it is not yet clear how the two approaches relate to one another. In this article, we investigate a semiparametric normal model in which an unknown transformation of the adverse response satisfies the linear model. We demonstrate that this formulation unifies the two existing approaches and allows for a coherent risk analysis of dose-response data. In addition, estimation and inference procedures for the unknown transformation in the semiparametric model for the continuo...