
Showing papers on "Semiparametric model" published in 2014



Other
29 Sep 2014
TL;DR: This article reviews common nonparametric approaches to incorporating time and other covariate effects for longitudinally observed response data; for the closely related case of functional data, random effects are mainly modeled through functional principal components analysis and B-splines.
Abstract: Nonparametric approaches have recently emerged as a flexible way to model longitudinal data. This entry reviews some of the common nonparametric approaches to incorporate time and other covariate effects for longitudinally observed response data. Smoothing procedures are invoked to estimate the associated nonparametric functions, but the choice of smoothers can vary and is often subjective. Both fixed and random effects may be included for vector or longitudinal covariates. A closely related type of data is functional data, where the prevailing approaches to model random effects are through functional principal components analysis and B-splines. Related semiparametric regression models also play an increasingly important role. Keywords: functional data analysis; scatter-plot smoother; mean curve; fixed effects; random effects; principal components analysis; semiparametric regression

102 citations


Journal Article
TL;DR: In this article, the authors investigated the semiparametric inference of the simple Gamma-process model and a random effects variant, where the maximum likelihood estimates of the parameters were obtained through the EM algorithm and the bootstrap was used to construct confidence intervals.
Abstract: This article investigates the semiparametric inference of the simple Gamma-process model and a random-effects variant. Maximum likelihood estimates of the parameters are obtained through the EM algorithm. The bootstrap is used to construct confidence intervals. A simulation study reveals that an estimation based on the full likelihood method is more efficient than the pseudo likelihood method. In addition, a score test is developed to examine the existence of random effects under the semiparametric scenario. A comparison study using a fatigue-crack growth dataset shows that performance of a semiparametric estimation is comparable to the parametric counterpart. This article has supplementary material online.

85 citations


Journal Article
TL;DR: In this paper, the authors present a temporal multi-scale model that combines three components: a long-term trend estimated by means of nonparametric smoothing, a medium-term component describing the sensitivity of the electricity demand to the temperature at each time step, and a short-term component that models local behaviours.

85 citations


Posted Content
TL;DR: In this paper, a rank-based approach is proposed for both latent graph estimation and latent principal component analysis, which achieves the same rates of convergence for both precision matrix estimation and eigenvector estimation, as if the latent variables were observed.
Abstract: Graphical models are commonly used tools for modeling multivariate random variables. While there exist many convenient multivariate distributions such as Gaussian distribution for continuous data, mixed data with the presence of discrete variables or a combination of both continuous and discrete variables poses new challenges in statistical modeling. In this paper, we propose a semiparametric model named latent Gaussian copula model for binary and mixed data. The observed binary data are assumed to be obtained by dichotomizing a latent variable satisfying the Gaussian copula distribution or the nonparanormal distribution. The latent Gaussian model with the assumption that the latent variables are multivariate Gaussian is a special case of the proposed model. A novel rank-based approach is proposed for both latent graph estimation and latent principal component analysis. Theoretically, the proposed methods achieve the same rates of convergence for both precision matrix estimation and eigenvector estimation, as if the latent variables were observed. Under similar conditions, the consistency of graph structure recovery and feature selection for leading eigenvectors is established. The performance of the proposed methods is numerically assessed through simulation studies, and the usage of our methods is illustrated by a genetic dataset.

84 citations


Journal Article
TL;DR: In this paper, a uniform expansion for sums of weighted kernel-based regression residuals from nonparametric or semiparametric models is introduced, which is useful for deriving asymptotic properties of semi-parametric estimators and test statistics with data-dependent bandwidth, random trimming, and estimated weights.

73 citations


Journal Article
TL;DR: In this article, a semiparametric model for the situation where several multivariate extremal distributions are linked through the action of a covariate on an unspecified baseline distribution, through a so-called density ratio model, is presented.
Abstract: The modeling of multivariate extremes has received increasing recent attention because of its importance in risk assessment. In classical statistics of extremes, the joint distribution of two or more extremes has a nonparametric form, subject to moment constraints. This article develops a semiparametric model for the situation where several multivariate extremal distributions are linked through the action of a covariate on an unspecified baseline distribution, through a so-called density ratio model. Theoretical and numerical aspects of empirical likelihood inference for this model are discussed, and an application is given to pairs of extreme forest temperatures. Supplementary materials for this article are available online.

57 citations


Journal Article
TL;DR: The authors propose a semiparametric spatial dynamic model, which extends the ordinary spatial autoregressive models to accommodate the effects of some covariates associated with the house price.
Abstract: Stimulated by the Boston house price data, in this paper, we propose a semiparametric spatial dynamic model, which extends the ordinary spatial autoregressive models to accommodate the effects of some covariates associated with the house price. A profile likelihood based estimation procedure is proposed. The asymptotic normality of the proposed estimators is derived. We also investigate how to identify the parametric/nonparametric components in the proposed semiparametric model. We show how many unknown parameters an unknown bivariate function amounts to, and propose a nonparametric version of AIC/BIC for model selection. Simulation studies are conducted to examine the performance of the proposed methods. The simulation results show our methods work very well. We finally apply the proposed methods to analyze the Boston house price data, which leads to some interesting findings.

55 citations


Journal Article
TL;DR: In this article, the authors develop algorithms for performing semiparametric regression analysis in real time, with data processed as it is collected and made immediately available via modern telecommunications technologies, and demonstrate the methodology for continually arriving stock market, real estate, and airline data.
Abstract: We develop algorithms for performing semiparametric regression analysis in real time, with data processed as it is collected and made immediately available via modern telecommunications technologies. Our definition of semiparametric regression is quite broad and includes, as special cases, generalized linear mixed models, generalized additive models, geostatistical models, wavelet nonparametric regression models and their various combinations. Fast updating of regression fits is achieved by couching semiparametric regression into a Bayesian hierarchical model or, equivalently, graphical model framework and employing online mean field variational ideas. An Internet site attached to this article, realtime-semiparametric-regression.net, illustrates the methodology for continually arriving stock market, real estate, and airline data. Flexible real-time analyses based on increasingly ubiquitous streaming data sources stand to benefit. This article has online supplementary material.

48 citations


Journal Article
TL;DR: In this article, the authors characterize the semiparametric efficiency bound for structural economics models with non-parametric conditional moment restrictions with possibly non-nested or overlapping conditioning sets, and the finite dimensional parameters of interest are over-identified via unconditional moment restrictions involving the nuisance functions.
Abstract: Many structural economics models are semiparametric ones in which the unknown nuisance functions are identified via non-parametric conditional moment restrictions with possibly non-nested or overlapping conditioning sets, and the finite dimensional parameters of interest are over-identified via unconditional moment restrictions involving the nuisance functions. In this article we characterize the semiparametric efficiency bound for this class of models. We show that semiparametric two-step optimally weighted GMM estimators achieve the efficiency bound, where the nuisance functions could be estimated via any consistent non-parametric methods in the first step. Regardless of whether the efficiency bound has a closed form expression or not, we provide easy-to-compute sieve-based optimal weight matrices that lead to asymptotically efficient two-step GMM estimators.

48 citations


Journal Article
TL;DR: Asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement and a set of empirical process tools are developed including a Glivenko-Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes.
Abstract: We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLE's involving estimated weights and calibration. A set of empirical process tools are developed including a Glivenko-Cantelli theorem, a theorem for rates of convergence of Z-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite dimensional parameter in a general semiparametric model where an estimator of a nuisance parameter is estimable either at regular or non-regular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase. AMS 2000 subject classifications: Primary 62E20; secondary 62G20, 62D99, 62N01.

Journal Article
TL;DR: It is shown that the standard EM algorithm can be adapted to infer the model parameters and a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility.
Abstract: In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination between the combining criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.

Journal Article
TL;DR: In this article, the authors extend the parametric, asymmetric, stochastic volatility model (ASV), where returns are correlated with volatility, by flexibly modeling the bivariate distribution of the return and volatility innovations nonparametrically.

Journal Article
TL;DR: A heterogeneous Bayesian semiparametric approach for modeling choice endogeneity that offers a flexible and robust alternative to parametric methods is proposed and results show that parameter and elasticity estimates are sensitive to the choice of distributional forms.
Abstract: Marketing variables that are included in consumer discrete choice models are often endogenous. Extant treatments using likelihood-based estimators impose parametric distributional assumptions, such as normality, on the source of endogeneity. These assumptions are restrictive because misspecified distributions have an impact on parameter estimates and associated elasticities. The normality assumption for endogeneity can be inconsistent with some marginal cost specifications given a price-setting process, although they are consistent with other specifications. In this paper, we propose a heterogeneous Bayesian semiparametric approach for modeling choice endogeneity that offers a flexible and robust alternative to parametric methods. Specifically, we construct centered Dirichlet process mixtures (CDPM) to allow uncertainty over the distribution of endogeneity errors. In a similar vein, we also model consumer preference heterogeneity nonparametrically via a CDPM. Results on simulated data show that incorrect distributional assumptions can lead to poor recovery of model parameters and price elasticities, whereas the proposed semiparametric model is able to robustly recover the true parameters in an efficient fashion. In addition, the CDPM offers the benefits of automatically inferring the number of mixture components that are appropriate for a given data set and is able to reconstruct the shape of the underlying distributions for endogeneity and heterogeneity errors. We apply our approach to two scanner panel data sets. Model comparison statistics indicate the superiority of the semiparametric specification and the results show that parameter and elasticity estimates are sensitive to the choice of distributional forms. Moreover, the CDPM specification yields evidence of multimodality, skewness, and outlying observations in these real data sets. Data, as supplemental material, are available at http://dx.doi.org/10.1287/mnsc.2013.1811.
This paper was accepted by J. Miguel Villas-Boas, marketing.

Journal Article
TL;DR: This work proposes a semiparametric method for estimating a precision matrix of high-dimensional elliptical distributions that naturally handles heavy-tailedness and conducts parameter estimation under a calibration framework, thus achieving improved theoretical rates of convergence and finite-sample performance on heavy-tailed applications.
Abstract: We propose a semiparametric method for estimating a precision matrix of high-dimensional elliptical distributions. Unlike most existing methods, our method naturally handles heavy-tailedness and conducts parameter estimation under a calibration framework, thus achieving improved theoretical rates of convergence and finite-sample performance on heavy-tailed applications. We further demonstrate the performance of the proposed method using thorough numerical experiments.

Journal Article
TL;DR: This work proposes a semiparametric g-prior which incorporates an unknown matrix of cluster allocation indicators and Bayes’ factor and variable selection consistency is shown to result under a class of proper priors on g even when the number of candidate predictors p is allowed to increase much faster than sample size n, while making sparsity assumptions on the true model size.
Abstract: There is a rich literature on Bayesian variable selection for parametric models. Our focus is on generalizing methods and asymptotic theory established for mixtures of g-priors to semiparametric linear regression models having unknown residual densities. Using a Dirichlet process location mixture for the residual density, we propose a semiparametric g-prior which incorporates an unknown matrix of cluster allocation indicators. For this class of priors, posterior computation can proceed via a straightforward stochastic search variable selection algorithm. In addition, Bayes’ factor and variable selection consistency is shown to result under a class of proper priors on g even when the number of candidate predictors p is allowed to increase much faster than sample size n, while making sparsity assumptions on the true model size.

Journal Article
TL;DR: In this article, a semiparametric single index panel data model with cross-sectional dependence, high-dimensionality and stationarity is considered, and rates of convergence and asymptotic normality are established for the proposed estimates.
Abstract: In this paper, we consider a semiparametric single index panel data model with cross-sectional dependence, high-dimensionality and stationarity. Meanwhile, we allow fixed effects to be correlated with the regressors to capture unobservable heterogeneity. Under a general spatial error dependence structure, we then establish some consistent closed-form estimates for both the unknown parameters and a link function for the case where both N and T go to infinity. Rates of convergence and asymptotic normality are established for the proposed estimates. Our experience suggests that the proposed estimation method is simple and thus attractive for finite-sample studies and empirical implementations. Moreover, both the finite-sample performance and the empirical applications show that the proposed estimation method works well when cross-sectional dependence exists in the data set.

Posted Content
01 Jan 2014
TL;DR: This article developed alternative asymptotic results for a large class of two-step semiparametric estimators and showed that the bootstrap provides an automatic method of correcting for the bias even when it is non-negligible.
Abstract: This paper develops alternative asymptotic results for a large class of two-step semiparametric estimators. The first main result is an asymptotic distribution result for such estimators and differs from those obtained in earlier work on classes of semiparametric two-step estimators by accommodating a non-negligible bias. A noteworthy feature of the assumptions under which the result is obtained is that reliance on a commonly employed stochastic equicontinuity condition is avoided. The second main result shows that the bootstrap provides an automatic method of correcting for the bias even when it is non-negligible.

Journal Article
TL;DR: In this article, the authors consider a semiparametric mixture of two unknown distributions, where the mixed distribution is assumed to be zero-symmetric and the model is defined by the mixing proportion, two location parameters, and the probability density function.
Abstract: We consider in this paper the semiparametric mixture of two unknown distributions equal up to a shift parameter. The model is said to be semiparametric in the sense that the mixed distribution is not supposed to belong to a parametric family. In order to ensure the identifiability of the model it is assumed that the mixed distribution is zero-symmetric, the model being then defined by the mixing proportion, two location parameters, and the probability density function of the mixed distribution. We propose a new class of M-estimators of these parameters based on a Fourier approach, and prove that they are √n-consistent under mild regularity conditions. Their finite-sample properties are illustrated by a Monte Carlo study and a benchmark real dataset is also studied with our method.

Journal Article
TL;DR: In this article, higher order tangent spaces and influence functions are reviewed and their use to construct minimax efficient estimators for parameters in high-dimensional semiparametric models is discussed.
Abstract: We review higher order tangent spaces and influence functions and their use to construct minimax efficient estimators for parameters in high-dimensional semiparametric models.

Journal Article
TL;DR: In this article, a semiparametric regression model is considered in which the response is assumed to be proportional to one set of predictor variables (the parametric component), while its relationship with the other predictor variables is left unspecified (the nonparametric component); the nonparametric component is approximated by a Fourier series.
Abstract: Consider data pairs $(x_{i1},\ldots,x_{ir},t_{i1},\ldots,t_{ip},y_i)$ involved in a semiparametric regression model $y_i = \mu(x_i,t_i) + \varepsilon_i$, where $\mu(x_i,t_i) = x_i'\beta + \sum_{j=1}^{p} g_j(t_{ji})$, $i = 1,\ldots,n$; $j = 1,\ldots,p$, is the semiparametric regression curve. Response variable $y_i$ is assumed to be proportional to predictor variable $x_i' = (x_{i1},\ldots,x_{ir})$, but at the same time, its relationship with other predictor variables $t_i = (t_{i1},\ldots,t_{ip})$ is unidentified. The $x_i'\beta$ and $g_j(t_{ji})$ are the parametric and nonparametric components, respectively. In this study, the nonparametric component is approximated by Fourier series which is expressed by

Journal Article
TL;DR: In this article, the authors consider the problem of estimating the proportion of true null hypotheses in a multiple testing context, where the setup is classically modeled through a semiparametric mixture with two components: a uniform distribution on the interval $[0,1]$ with prior probability $\theta$ and a nonparametric density $f$.
Abstract: We consider the problem of estimating the proportion $\theta$ of true null hypotheses in a multiple testing context. The setup is classically modeled through a semiparametric mixture with two components: a uniform distribution on the interval $[0,1]$ with prior probability $\theta$ and a nonparametric density $f$. We discuss asymptotic efficiency results and establish that two different cases occur, depending on whether or not $f$ vanishes on a set with non-null Lebesgue measure. In the first case, we exhibit estimators converging at parametric rate, compute the optimal asymptotic variance and conjecture that no estimator is asymptotically efficient (i.e., attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at parametric rate. We illustrate those results on simulated data.
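As a hedged illustration of the setup this abstract describes, the two-component mixture can be written out as below. The symbol $g$ for the density of the observations (typically p-values) is an assumption introduced here for illustration, not notation taken from the paper.

$$ g(x) = \theta \, \mathbf{1}_{[0,1]}(x) + (1-\theta)\, f(x), \qquad x \in [0,1]. $$

Here $\theta$ is the proportion of true null hypotheses, the indicator term is the uniform density on $[0,1]$, and $f$ is the unspecified density of the remaining observations.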

Journal Article
TL;DR: This work adopted a two-part model to describe the overall survival experience for interval censored data with a cured proportion and constructed a BIC-type model selection method to recommend an appropriate specification of parametric and nonparametric components in the model.
Abstract: Varying-coefficient models have claimed an increasing portion of statistical research and are now applied to censored data analysis in medical studies. We incorporate such flexible semiparametric regression tools for interval censored data with a cured proportion. We adopt a two-part model to describe the overall survival experience for such complicated data. To fit the unknown functional components in the model, we take the local polynomial approach with bandwidth chosen by cross-validation. We establish consistency and asymptotic distribution of the estimation and propose to use bootstrap for inference. We construct a BIC-type model selection method to recommend an appropriate specification of parametric and nonparametric components in the model. We conduct extensive simulations to assess the performance of our methods. An application to a decompression sickness data set illustrates our methods.

Journal Article
TL;DR: In this article, a non-parametric weighted estimator for group testing data is proposed, where the individuals are pooled randomly into groups, and only the pooled data are available.
Abstract: We consider non- and semi-parametric estimation of a conditional probability curve in the case of group testing data, where the individuals are pooled randomly into groups, and only the pooled data are available. We derive a nonparametric weighted estimator that has optimality properties accounting for group sizes, and show how to extend it to multivariate settings, including the partially linear model. In the group testing context, it is natural to assume that the probability curve depends on the covariates only through a linear combination of them. Motivated by this, we develop a nonparametric estimator based on the single-index model. We study theoretical properties of the suggested estimators, and derive data-driven procedures. Practical properties of the methods are demonstrated via real and simulated examples and shown to have smaller median integrated square error than existing competitors.

Journal Article
TL;DR: This article proposed a semiparametric Bayesian latent variable model for multivariate data of arbitrary type that does not require specification of conditional distributions, and employed this model to investigate the association between cognitive outcomes and MRI-measured regional brain volumes.
Abstract: Multivariate data that combine binary, categorical, count and continuous outcomes are common in the social and health sciences. We propose a semiparametric Bayesian latent variable model for multivariate data of arbitrary type that does not require specification of conditional distributions. Drawing on the extended rank likelihood method by Hoff [Ann. Appl. Stat. 1 (2007) 265-283], we develop a semiparametric approach for latent variable modeling with mixed outcomes and propose associated Markov chain Monte Carlo estimation methods. Motivated by cognitive testing data, we focus on bifactor models, a special case of factor analysis. We employ our semiparametric Bayesian latent variable model to investigate the association between cognitive outcomes and MRI-measured regional brain volumes.

Journal Article
TL;DR: In this article, a semi-parametric approach to selectivity was proposed, which specifies a penalty on differences between estimated selectivity at age and a pre-specified parametric model whose parameters are freely estimated.

Journal Article
TL;DR: An efficient Bayesian approach under a proportional hazards frailty model to analyze interval-censored survival data with spatial correlation is proposed, using a linear combination of monotonic splines to model the unknown baseline cumulative hazard function.

Journal Article
TL;DR: This paper introduces several goodness-of-fit tests for the parametric specification of the truncation-time distribution on which a semiparametric estimator for doubly truncated data is based, and applies them to data on the induction time to acquired immune deficiency syndrome for blood transfusion patients.
Abstract: Doubly truncated data are commonly encountered in areas like medicine, astronomy, economics, among others. A semiparametric estimator of a doubly truncated random variable may be computed based on a parametric specification of the distribution function of the truncation times. This semiparametric estimator outperforms the nonparametric maximum likelihood estimator when the parametric information is correct, but might behave badly when the assumed parametric model is far off. In this paper we introduce several goodness-of-fit tests for the parametric model. The proposed tests are investigated through simulations. For illustration purposes, the tests are also applied to data on the induction time to acquired immune deficiency syndrome for blood transfusion patients.

Journal Article
TL;DR: The proposed model has a proportionality parameter for the speed of each test taker, for the time intensity of each item, and for subject or item characteristics of interest, and it is shown how all these parameters can be estimated by Markov chain Monte Carlo methods.
Abstract: The semi-parametric proportional hazards model with crossed random effects has two important characteristics: it avoids explicit specification of the response time distribution by using semi-parametric models, and it captures heterogeneity that is due to subjects and items. The proposed model has a proportionality parameter for the speed of each test taker, for the time intensity of each item, and for subject or item characteristics of interest. It is shown how all these parameters can be estimated by Markov chain Monte Carlo methods (Gibbs sampling). The performance of the estimation procedure is assessed with simulations and the model is further illustrated with the analysis of response times from a visual recognition task.

Journal Article
TL;DR: In this article, a semiparametric mixture of generalized linear models and a nonparametric mixture are proposed, and identifiability results are established under mild conditions.