
Showing papers on "Semiparametric model" published in 2012


Reference EntryDOI
31 Aug 2012
TL;DR: A statistical generative model called independent component analysis is discussed, which shows how sparse coding can be interpreted as providing a Bayesian prior, and answers some questions which were not properly answered in the sparse coding framework.
Abstract: Independent component models have gained increasing interest in various fields of applications in recent years. The basic independent component model is a semiparametric model assuming that a p-variate observed random vector is a linear transformation of an unobserved vector of p independent latent variables. This linear transformation is given by an unknown mixing matrix, and one of the main objectives of independent component analysis (ICA) is to estimate an unmixing matrix by means of which the latent variables can be recovered. In this article, we discuss the basic independent component model in detail, define the concepts and analysis tools carefully, and consider two families of ICA estimates. The statistical properties (consistency, asymptotic normality, efficiency, robustness) of the estimates can be analyzed and compared via the so-called gain matrices. Some extensions of the basic independent component model, such as models with additive noise or models with dependent observations, are briefly discussed. The article ends with a short example.
Keywords: blind source separation; fastICA; independent component model; independent subspace analysis; mixing matrix; overcomplete ICA; undercomplete ICA; unmixing matrix

2,976 citations
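
The independent component model in the entry above (an observed vector that is a linear mixture of independent latent variables, recovered through an unmixing matrix) can be illustrated with a small NumPy sketch. This is not the article's own estimator: it assumes a kurtosis-based fixed-point iteration of the FastICA type, and the mixing matrix `a`, the source distributions, and the function names are hypothetical choices made only for illustration.

```python
import numpy as np

def whiten(x):
    """Center and whiten observations (rows = variables, columns = samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xc))
    # Rows of the transform are eigenvectors scaled by 1/sqrt(eigenvalue).
    return (eigvecs / np.sqrt(eigvals)).T @ xc

def fastica_kurtosis(z, n_iter=200, seed=0):
    """Deflation-style fixed-point iteration with the cubic nonlinearity."""
    rng = np.random.default_rng(seed)
    p = z.shape[0]
    w_rows = np.zeros((p, p))
    for k in range(p):
        w = rng.standard_normal(p)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # Fixed-point update: E[z (w'z)^3] - 3 w (valid for whitened z).
            w = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
            # Gram-Schmidt step against the rows already estimated.
            w -= w_rows[:k].T @ (w_rows[:k] @ w)
            w /= np.linalg.norm(w)
        w_rows[k] = w
    return w_rows

rng = np.random.default_rng(1)
n = 5000
# Two independent, non-Gaussian latent sources (illustrative choice).
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(size=n)])
a = np.array([[1.0, 0.5], [0.3, 1.0]])  # hypothetical mixing matrix
x = a @ s                               # observed mixtures
z = whiten(x)
s_hat = fastica_kurtosis(z) @ z         # recovered sources
# Absolute correlations between true sources (rows) and estimates (columns).
corr = np.abs(np.corrcoef(np.vstack([s, s_hat]))[:2, 2:])
```

The sources are recovered only up to order, sign, and scale, which is why the check is made on absolute correlations rather than on the values themselves.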


Journal ArticleDOI
TL;DR: In this paper, a semiparametric frontier model that combines the DEA-type nonparametric frontier, which satisfies monotonicity and concavity, with the SFA-style stochastic homoskedastic composite error term is proposed.
Abstract: The field of productive efficiency analysis is currently divided between two main paradigms: the deterministic, nonparametric Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). This paper examines an encompassing semiparametric frontier model that combines the DEA-type nonparametric frontier, which satisfies monotonicity and concavity, with the SFA-style stochastic homoskedastic composite error term. To estimate this model, a new two-stage method is proposed, referred to as Stochastic Non-smooth Envelopment of Data (StoNED). The first stage of the StoNED method applies convex nonparametric least squares (CNLS) to estimate the shape of the frontier without any assumptions about its functional form or smoothness. In the second stage, the conditional expectations of inefficiency are estimated based on the CNLS residuals, using the method of moments or pseudolikelihood techniques. Although in a cross-sectional setting distinguishing inefficiency from noise in general requires distributional assumptions, we also show how these can be relaxed in our approach if panel data are available. Performance of the StoNED method is examined using Monte Carlo simulations.

285 citations
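
The second stage described in the entry above (estimating expected inefficiency from the CNLS residuals by the method of moments) can be sketched as follows. The sketch assumes a normal noise term and a half-normal inefficiency term, which the abstract does not state, and it takes the residuals as given rather than solving the CNLS problem; the function name and the simulated values are hypothetical.

```python
import numpy as np

def half_normal_moments(residuals):
    """Method-of-moments estimates for eps = v - u.

    Assumes v ~ N(0, sigma_v^2) and u ~ |N(0, sigma_u^2)| (half-normal).
    """
    e = residuals - residuals.mean()
    m2 = np.mean(e ** 2)
    m3 = np.mean(e ** 3)  # negative when inefficiency is present
    # Third central moment of eps: sqrt(2/pi) * (1 - 4/pi) * sigma_u^3.
    sigma_u = float(np.cbrt(m3 / (np.sqrt(2 / np.pi) * (1 - 4 / np.pi))))
    sigma_u = max(sigma_u, 0.0)  # wrong-signed skewness gives no inefficiency
    # Second central moment of eps: sigma_v^2 + (1 - 2/pi) * sigma_u^2.
    sigma_v2 = m2 - (1 - 2 / np.pi) * sigma_u ** 2
    expected_inefficiency = sigma_u * np.sqrt(2 / np.pi)
    return sigma_u, sigma_v2, expected_inefficiency

# Simulated composite errors standing in for first-stage residuals.
rng = np.random.default_rng(0)
n = 200_000
u = np.abs(rng.normal(0.0, 0.5, n))  # inefficiency, sigma_u = 0.5
v = rng.normal(0.0, 0.2, n)          # noise, sigma_v = 0.2
sigma_u_hat, sigma_v2_hat, mu_hat = half_normal_moments(v - u)
```

Centering inside the function mirrors the fact that least-squares residuals have mean zero, so only the second and third central moments carry information about the two variance components.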


Journal ArticleDOI
TL;DR: The semiparametric approach reveals that in the inverse regression context while keeping the estimation structure intact, the common assumption of linearity and/or constant variance on the covariates can be removed at the cost of performing additional nonparametric regression.
Abstract: We provide a novel and completely different approach to dimension-reduction problems from the existing literature. We cast the dimension-reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing this problem from the new angle allows us to derive a rich class of estimators, and obtain the classical dimension reduction techniques as special cases in this class. The semiparametric approach also reveals that in the inverse regression context while keeping the estimation structure intact, the common assumption of linearity and/or constant variance on the covariates can be removed at the cost of performing additional nonparametric regression. The semiparametric estimators without these common assumptions are illustrated through simulation studies and a real data example. This article has online supplementary material.

184 citations


Journal ArticleDOI
TL;DR: In this article, a weighted additive nonparametric regression model was proposed to estimate the factor returns and the characteristic-beta functions of a factor model, with factor returns serving as time-varying weights and a set of univariate non-parametric functions relating security characteristic to the associated factor betas.
Abstract: This paper develops a new estimation procedure for characteristic-based factor models of stock returns. We treat the factor model as a weighted additive nonparametric regression model, with the factor returns serving as time-varying weights and a set of univariate nonparametric functions relating security characteristic to the associated factor betas. We use a time-series and cross-sectional pooled weighted additive nonparametric regression methodology to simultaneously estimate the factor returns and characteristic-beta functions. By avoiding the curse of dimensionality, our methodology allows for a larger number of factors than existing semiparametric methods. We apply the technique to the three-factor Fama–French model, Carhart’s four-factor extension of it that adds a momentum factor, and a five-factor extension that adds an own-volatility factor. We find that momentum and own-volatility factors are at least as important, if not more important, than size and value in explaining equity return comovements. We test the multifactor beta pricing theory against a general alternative using a new nonparametric test

127 citations


Journal ArticleDOI
TL;DR: In this paper, a series of Monte Carlo experiments demonstrates that nonparametric predicted values and marginal effect estimates are much more accurate than spatial AR models when the contiguity matrix is misspecified.
Abstract: Though standard spatial econometric models may be useful for specification testing, they rely heavily on a parametric structure that is highly sensitive to model misspecification. The commonly used spatial AR model is a form of spatial smoothing with a structure that closely resembles a semiparametric model. Nonparametric and semiparametric models are generally a preferable approach for more descriptive spatial analysis. Estimated population density functions illustrate the differences between the spatial AR model and nonparametric approaches to data smoothing. A series of Monte Carlo experiments demonstrates that nonparametric predicted values and marginal effect estimates are much more accurate than spatial AR models when the contiguity matrix is misspecified.

102 citations


Journal ArticleDOI
TL;DR: In this article, a difference based ridge regression estimator and a Liu type estimator of the regression parameters in the partial linear semiparametric regression model are analyzed and compared in the sense of mean-squared error.

101 citations


Journal ArticleDOI
TL;DR: This paper studies the so-called Bernstein–von Mises theorem in a semiparametric framework where the unknown quantity is (θ, f), with θ the parameter of interest and f an infinite-dimensional nuisance parameter.
Abstract: This paper is a contribution to the Bayesian theory of semiparametric estimation. We are interested in the so-called Bernstein–von Mises theorem, in a semiparametric framework where the unknown quantity is (θ, f), with θ the parameter of interest and f an infinite-dimensional nuisance parameter. Two theorems are established, one in the case with no loss of information and one in the information loss case with Gaussian process priors. The general theory is applied to three specific models: the estimation of the center of symmetry of a symmetric function in Gaussian white noise, a time-discrete functional data analysis model and Cox’s proportional hazards model. In all cases, the range of application of the theorems is investigated by using a family of Gaussian priors parametrized by a continuous parameter.

101 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a semi-parametric GMM estimator for the parameter in the SAR error process and derived the joint asymptotic distribution for both spatial parameters.

98 citations


Journal ArticleDOI
TL;DR: In this paper, the authors develop techniques to simplify semiparametric inference by deriving a number of numerical equivalence results, which show that in many cases semiparametric variances can be estimated using standard formulas from the well-known parametric literature.
Abstract: The goal of this paper is to develop techniques to simplify semiparametric inference. We do this by deriving a number of numerical equivalence results. These illustrate that in many cases, one can obtain estimates of semiparametric variances using standard formulas derived in the well-known parametric literature. This means that for computational purposes, an empirical researcher can ignore the semiparametric nature of the problem and do all calculations as if it were a parametric situation. We hope that this simplicity will promote the use of semiparametric procedures.

95 citations


Journal ArticleDOI
TL;DR: In this article, a one-step backfitting estimation procedure is proposed to achieve the optimal convergence rates for both the regression parameters and the nonparametric functions of mixing proportions, and a modified expectation-maximization-type (EM-type) estimation procedure is also investigated.
Abstract: In this article, we study a class of semiparametric mixtures of regression models, in which the regression functions are linear functions of the predictors, but the mixing proportions are smoothing functions of a covariate. We propose a one-step backfitting estimation procedure to achieve the optimal convergence rates for both regression parameters and the nonparametric functions of mixing proportions. We derive the asymptotic bias and variance of the one-step estimate, and further establish its asymptotic normality. A modified expectation-maximization-type (EM-type) estimation procedure is investigated. We show that the modified EM algorithms preserve the asymptotic ascent property. Numerical simulations are conducted to examine the finite sample performance of the estimation procedures. The proposed methodology is further illustrated via an analysis of a real dataset.

71 citations


Journal ArticleDOI
TL;DR: This work derives an optimal estimator of f-divergence in the sense of the asymptotic variance in a semiparametric setting, and provides a statistic for two-sample homogeneity test based on the optimal estimators.
Abstract: A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the f-divergence between two probability densities is estimated using a density-ratio estimator. The f-divergence estimator is then exploited for the two-sample homogeneity test. We derive an optimal estimator of f-divergence in the sense of the asymptotic variance in a semiparametric setting, and provide a statistic for two-sample homogeneity test based on the optimal estimator. We prove that the proposed test dominates the existing empirical likelihood score test. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference.

Journal ArticleDOI
TL;DR: This article described Robinson's double residual semiparametric regression estimator and Hardle and Mammen's (1993, Annals of Statistics 21: 1926-1947) spec...
Abstract: In this article, we describe Robinson's (1988, Econometrica 56: 931– 954) double residual semiparametric regression estimator and Hardle and Mammen's (1993, Annals of Statistics 21: 1926–1947) spec...
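
The double residual idea named in the entry above can be sketched for the partially linear model y = xβ + g(z) + ε: smooth y and x on z nonparametrically, then regress residual on residual. The sketch assumes a single scalar covariate x, a Gaussian-kernel Nadaraya-Watson smoother, and a fixed bandwidth; these choices, the function names, and the simulated data are illustrative assumptions, not taken from the article.

```python
import numpy as np

def nw_smooth(z, target, bandwidth):
    """Nadaraya-Watson estimate of E[target | z] at the sample points."""
    diffs = (z[:, None] - z[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)  # Gaussian kernel weights
    return (weights @ target) / weights.sum(axis=1)

def double_residual_beta(y, x, z, bandwidth=0.05):
    """Remove E[y|z] and E[x|z], then run OLS of residual on residual."""
    ey = y - nw_smooth(z, y, bandwidth)
    ex = x - nw_smooth(z, x, bandwidth)
    return float(ex @ ey / (ex @ ex))

# Simulated partially linear data with true beta = 2 (hypothetical design).
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(0, 1, n)
x = np.sin(2 * np.pi * z) + rng.standard_normal(n)  # x depends on z
y = 2.0 * x + z ** 2 + 0.1 * rng.standard_normal(n)
beta_hat = double_residual_beta(y, x, z)
```

Because x is correlated with z here, an ordinary regression of y on x alone would confound β with the unknown function g; partialling z out of both variables first is what removes that dependence.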

Journal ArticleDOI
TL;DR: A Bayesian semiparametric model for capturing spatio-temporal heterogeneity within the proportional hazards framework is proposed and an autoregressive dependent tailfree process is introduced.
Abstract: Incorporating temporal and spatial variation could potentially enhance information gathered from survival data. This paper proposes a Bayesian semiparametric model for capturing spatio-temporal heterogeneity within the proportional hazards framework. The spatial correlation is introduced in the form of county-level frailties. The temporal effect is introduced by considering the stratification of the proportional hazards model, where the time-dependent hazards are indirectly modeled using a probability model for related probability distributions. With this aim, an autoregressive dependent tailfree process is introduced. The full Kullback-Leibler support of the proposed process is provided. The approach is illustrated using simulated data and data from the Surveillance Epidemiology and End Results database of the National Cancer Institute on patients in Iowa diagnosed with breast cancer.

Journal ArticleDOI
TL;DR: In this paper, the authors extend the parametric, asymmetric, stochastic volatility model (ASV), where returns are correlated with volatility, by flexibly modeling the bivariate distribution of the return and volatility innovations nonparametrically.
Abstract: In this paper, we extend the parametric, asymmetric, stochastic volatility model (ASV), where returns are correlated with volatility, by flexibly modeling the bivariate distribution of the return and volatility innovations nonparametrically. Its novelty is in modeling the joint, conditional, return-volatility distribution with an infinite mixture of bivariate Normal distributions with mean zero vectors, but having unknown mixture weights and covariance matrices. This semiparametric ASV model nests stochastic volatility models whose innovations are distributed as either Normal or Student-t distributions, plus the response in volatility to unexpected return shocks is more general than the fixed asymmetric response with the ASV model. The unknown mixture parameters are modeled with a Dirichlet process prior. This prior ensures a parsimonious, finite, posterior mixture that best represents the distribution of the innovations and a straightforward sampler of the conditional posteriors. We develop a Bayesian Markov chain Monte Carlo sampler to fully characterize the parametric and distributional uncertainty. Nested model comparisons and out-of-sample predictions with the cumulative marginal-likelihoods, and one-day-ahead, predictive log-Bayes factors between the semiparametric and parametric versions of the ASV model show the semiparametric model projecting more accurate empirical market returns. A major reason is how volatility responds to an unexpected market movement. When the market is tranquil, expected volatility reacts to a negative (positive) price shock by rising (initially declining, but then rising when the positive shock is large). However, when the market is volatile, the degree of asymmetry and the size of the response in expected volatility are muted. In other words, when times are good, no news is good news, but when times are bad, neither good nor bad news matters with regard to volatility.

Journal ArticleDOI
TL;DR: A new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model, is proposed.
Abstract: We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.

Journal ArticleDOI
TL;DR: The question of how to design SVMs by choosing the reproducing kernel Hilbert space (RKHS) or its corresponding kernel to obtain consistent and statistically robust estimators in additive models is addressed and an explicit construction of such RKHSs and their kernels, which will be called additive kernels, is given.

Journal ArticleDOI
TL;DR: In this paper, a semiparametric proportional likelihood ratio model is proposed for modeling a nonlinear monotonic relationship between the outcome variable and a covariate, and a maximum likelihood estimator is obtained for the new model.
Abstract: We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.

Journal ArticleDOI
TL;DR: A general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables.
Abstract: There has been great interest in developing nonlinear structural equation models and associated statistical inference procedures, including estimation and model selection methods. In this paper a general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables. A basis representation is used to approximate these nonparametric functions in the structural equation and the Bayesian Lasso method coupled with a Markov Chain Monte Carlo (MCMC) algorithm is used for simultaneous estimation and model selection. The proposed method is illustrated using a simulation study and data from the Affective Dynamics and Individual Differences (ADID) study. Results demonstrate that our method can accurately estimate the unknown parameters and correctly identify the true underlying model.

Journal ArticleDOI
TL;DR: In this paper, a new semiparametric estimator for an empirical asset pricing model with general nonparametric risk-return tradeoff and GARCH-type underlying volatility is introduced.

Book ChapterDOI
30 Jan 2012
TL;DR: Recent advances in dynamic modelling of non-Gaussian data, in particular discrete-valued, time series and longitudinal data, are focused on, and ideas from dynamic models can be adopted for Bayesian semiparametric inference in generalized additive and varying coefficient models.
Abstract: This paper surveys dynamic or state space models and their relationship to non- and semiparametric models that are based on the roughness penalty approach. We focus on recent advances in dynamic modelling of non-Gaussian, in particular discrete-valued, time series and longitudinal data, make the close correspondence to semiparametric smoothing methods evident, and show how ideas from dynamic models can be adopted for Bayesian semiparametric inference in generalized additive and varying coefficient models. Basic tools for corresponding inference techniques are penalized likelihood estimation, Kalman filtering and smoothing and Markov chain Monte Carlo (MCMC) simulation. Similarities, relative merits, advantages and disadvantages of these methods are illustrated through several applications.
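
Kalman filtering, listed in the entry above as one of the basic inference tools, can be sketched for the simplest state space model. The chapter focuses on non-Gaussian data; the sketch below assumes the Gaussian local-level special case (a random-walk state observed with noise), whose random-walk prior plays the role of a first-order roughness penalty. The variances `q` and `r`, the diffuse initial variance, and the function name are illustrative assumptions.

```python
import numpy as np

def local_level_filter(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q) is the state noise and v_t ~ N(0, r) the observation noise.
    """
    m, p = m0, p0
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        p = p + q                 # predict: variance grows by the state noise
        gain = p / (p + r)        # Kalman gain
        m = m + gain * (obs - m)  # update the mean with the innovation
        p = (1.0 - gain) * p      # update the variance
        filtered[t] = m
    return filtered

rng = np.random.default_rng(0)
n, q, r = 500, 0.01, 1.0
x = np.cumsum(np.sqrt(q) * rng.standard_normal(n))  # latent random walk
y = x + np.sqrt(r) * rng.standard_normal(n)         # noisy observations
x_filt = local_level_filter(y, q, r)
mse_raw = np.mean((y - x) ** 2)        # error of the raw observations
mse_filt = np.mean((x_filt - x) ** 2)  # error of the filtered estimates
```

The ratio q / r acts like a smoothing parameter: a small ratio produces a stiff, heavily smoothed estimate, and a large ratio lets the estimate follow the observations closely.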

Journal ArticleDOI
01 Dec 2012
TL;DR: In this paper, a semiparametric model is considered where the functional of interest is a shift parameter between two curves, and a surprising example is provided where two at first sight indistinguishable Gaussian priors lead to quite different behaviours of the posterior distribution of the functional of interest.
Abstract: A semiparametric model is considered where the functional of interest is a shift parameter between two curves. A surprising example is provided where two at first sight indistinguishable Gaussian priors lead to quite different behaviours of the posterior distribution of the functional of interest. This phenomenon also illustrates that a condition introduced in Castillo (2012) of the approximation of the least favourable direction by the Gaussian prior is almost necessary for the Bernstein–von Mises theorem to hold.

Journal ArticleDOI
TL;DR: It is demonstrated that shrinkage estimators which combine two semiparametric estimators computed for the full model and the reduced model outperform the semiparametric estimator for the full model.

Journal ArticleDOI
TL;DR: Comparing the results of the two approaches, it was found that both methods opportunely captured, in terms of signs, the relationships under investigation, but the use of a more flexible approach has allowed us to uncover some interesting non-linearities that are usually not assumed a priori, thus improving the interpretation of the results.
Abstract: In this paper, we investigate the impact that spatial and micro-economic variables have on the probability that a household goes on holiday. In doing so, we propose two alternative modelling specifications: a classic discrete choice model and a semiparametric logistic model. The semiparametric model extends the classic logistic model, usually employed in studies on participation in tourism, allowing modelling in a flexible manner for continuous predictors without making any a priori assumption. This is achieved via the use of penalized regression splines. A sample of Italian households was considered for our study. Comparing the results of the two approaches, we found that both methods opportunely captured, in terms of signs, the relationships under investigation. However, the use of a more flexible approach has allowed us to uncover some interesting non-linearities that are usually not assumed a priori, thus improving the interpretation of the results. Copyright © 2011 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the covariates in the linear part are measured with additive error and some additional linear restrictions on the parametric component are available; the authors propose a restricted modified profile least-squares estimator for the parametric component and prove the asymptotic normality of the proposed estimator.

Journal ArticleDOI
TL;DR: A new nonparametric regression model for the conditional hazard rate using a suitable sieve of Bernstein polynomials is presented and empirical results indicate that the proposed model has reasonably robust performance compared to other semiparametric models.

Proceedings Article
03 Dec 2012
TL;DR: It is proved that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size.
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
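
The rank-based estimation step in the entry above can be sketched in NumPy: compute Spearman's rho, map it to the correlation of the latent Gaussian, and take the leading eigenvector. The mapping 2 sin(πρ/6) is the standard identity for Gaussian copulas and is assumed here rather than quoted from the abstract; the sketch covers only the correlation-matrix variant and omits the sparsity (feature selection) part. Function names, the true correlation matrix, and the marginal transformations are hypothetical.

```python
import numpy as np

def spearman_matrix(data):
    """Spearman's rho between the columns of `data` (assumes no ties)."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def latent_correlation(data):
    """Map Spearman's rho to the latent Gaussian correlation matrix."""
    r = 2.0 * np.sin(np.pi / 6.0 * spearman_matrix(data))
    np.fill_diagonal(r, 1.0)
    return r

def leading_eigenvector(sym):
    """Eigenvector of a symmetric matrix with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    return vecs[:, -1]

rng = np.random.default_rng(0)
true_corr = np.array([[1.0, 0.8, 0.0],
                      [0.8, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
latent = rng.multivariate_normal(np.zeros(3), true_corr, size=4000)
# Unspecified monotone marginal transformations (illustrative choices).
observed = np.column_stack([np.exp(latent[:, 0]),
                            latent[:, 1] ** 3,
                            latent[:, 2]])
r_hat = latent_correlation(observed)
v_hat = leading_eigenvector(r_hat)
v_true = leading_eigenvector(true_corr)
```

Ranks are unchanged by strictly increasing transformations, so the estimate depends only on the latent Gaussian structure and not on the marginal distributions of the observed variables.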

Journal ArticleDOI
TL;DR: A maximum pseudo-profile likelihood estimator is proposed, which can handle time-dependent covariates and is consistent under covariate-dependent censoring and is more efficient than its competitors.
Abstract: This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory.

Book
12 Feb 2012
TL;DR: This book surveys the basic theory of estimation in semiparametric models, covering tangent spaces and gradients, asymptotic bounds for the concentration of estimator-sequences, the construction of estimator-sequences, estimating equations, and mixture models.
Abstract: A Survey of basic theory.- 1. Tangent spaces and gradients.- 2. Asymptotic bounds for the concentration of estimator-sequences.- 3. Constructing estimator-sequences.- 4. Estimation in semiparametric models.- 5. Families of gradients.- 6. Estimating equations.- B Semiparametric families admitting a sufficient statistic.- 7. A special semiparametric model.- 8. Mixture models.- 9. Examples of mixture models.- Example.- Example.- Example.- L Auxiliary results.- References.- Notation index.

Journal ArticleDOI
TL;DR: A penalized variable selection approach is developed for prognosis studies with right censored response variables whose covariate effects have two parts: a nonparametric part for low-dimensional covariates, and a parametric part for high-dimensional covariates.
Abstract: Recent biomedical studies often measure two distinct sets of risk factors: low-dimensional clinical and environmental measurements, and high-dimensional gene expression measurements. For prognosis studies with right censored response variables, we propose a semiparametric regression model whose covariate effects have two parts: a nonparametric part for low-dimensional covariates, and a parametric part for high-dimensional covariates. A penalized variable selection approach is developed. The selection of parametric covariate effects is achieved using an iterated Lasso approach, for which we prove the selection consistency property. The nonparametric component is estimated using a sieve approach. An empirical model selection tool for the nonparametric component is derived based on the Kullback-Leibler geometry. Numerical studies show that the proposed approach has satisfactory performance. Application to a lymphoma study illustrates the proposed method.

Journal ArticleDOI
TL;DR: A semiparametric transformation model is developed in which the transformed variables can be fitted to a general nonlinear mixed model, including linear or nonlinear regression models, mixed effect models, factor analysis models, and other latent variable models as special cases.
Abstract: In this paper, we aim to develop a semiparametric transformation model. Nonparametric transformation functions are modeled with Bayesian P-splines. The transformed variables can be fitted to a general nonlinear mixed model, including linear or nonlinear regression models, mixed effect models, factor analysis models, and other latent variable models as special cases. Markov chain Monte Carlo algorithms are implemented to estimate transformation functions and unknown quantities in the model. The performance of the developed methodology is demonstrated with a simulation study. Its application to a real study on polydrug use is presented.