
Showing papers on "Semiparametric model published in 2017"


Journal ArticleDOI
TL;DR: In this article, a class of semiparametric transformation models is presented for regression analysis of bivariate interval-censored data, and a sieve maximum likelihood approach based on Bernstein polynomials is developed for inference; the model class provides great flexibility, in particular including the commonly used proportional hazards model as a special case.
Abstract: Interval-censored failure time data arise in a number of fields, and many authors have discussed various issues related to their analysis. However, most of the existing methods are for univariate data, and there is only limited research on bivariate data, especially on regression analysis of bivariate interval-censored data. We present a class of semiparametric transformation models for the problem and develop a sieve maximum likelihood approach for inference. The models provide great flexibility, in particular including the commonly used proportional hazards model as a special case, and Bernstein polynomials are employed in the approach. The strong consistency and asymptotic normality of the resulting estimators of the regression parameters are established, and furthermore, the estimators are shown to be asymptotically efficient. Extensive simulation studies are conducted and indicate that the proposed method works well for practical situations. Supplementary materials for this article are available online.
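
As a schematic illustration of the kind of specification described above (my own hedged reading, not the authors' exact formulation), a transformation model for a failure time T given covariates X can be written through its cumulative hazard, with the unknown baseline approximated by a Bernstein-polynomial sieve:

$$ \Lambda(t \mid X) = G\{\Lambda_0(t)\, e^{\beta' X}\}, \qquad \Lambda_0(t) \approx \sum_{k=0}^{m} \phi_k\, B_k(t;\, m, l, u), $$

where G is a known increasing transformation (G(x) = x recovers the proportional hazards model), the B_k are Bernstein basis polynomials on the observed time range [l, u], and monotonicity of the baseline is enforced through constraints on the coefficients φ_k.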

65 citations


Journal ArticleDOI
TL;DR: A change-plane technique is adopted to first test the existence of a subgroup, and then to identify the subgroup if the null hypothesis of nonexistence of such a subgroup is rejected.
Abstract: We propose a systematic method for testing and identifying a subgroup with an enhanced treatment effect. We adopt a change-plane technique to first test the existence of a subgroup, and then identify the subgroup if the null hypothesis on nonexistence of such a subgroup is rejected. A semiparametric model is considered for the response with an unspecified baseline function and an interaction between a subgroup indicator and treatment. A doubly robust test statistic is constructed based on this model, and asymptotic distributions of the test statistic under both the null and local alternative hypotheses are derived. Moreover, a sample size calculation method for subgroup detection is developed based on the proposed statistic. The finite sample performance of the proposed test is evaluated via simulations. Finally, the proposed methods for subgroup identification and sample size calculation are applied to data from an AIDS study.
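
A stylized version of such a model (illustrative notation of my own, not necessarily the authors' exact parameterization) writes the response Y for treatment indicator A and covariates X, Z as

$$ Y = b(X) + \tau\, A\, \mathbf{1}\{\gamma' Z \ge c\} + \varepsilon, $$

where b(·) is the unspecified baseline function and {γ'Z ≥ c} is the candidate subgroup defined by a change plane; testing for a subgroup with an enhanced treatment effect then amounts to testing H0: τ = 0, which is nonstandard because γ and c are unidentified under the null.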

43 citations


Journal ArticleDOI
TL;DR: Several nonparametric estimators outperform commonly used treatment effect estimators based on parametric propensity scores in terms of root mean squared error (RMSE), even though the average RMSEs over the 16 simulation designs considered are not statistically significantly different across the estimators investigated.

43 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed estimators for the finite‐dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood.
Abstract: Interval-censored multivariate failure time data arise when there are multiple types of failure or there is clustering of study subjects and each failure time is known only to lie in a certain interval. We investigate the effects of possibly time-dependent covariates on multivariate failure times by considering a broad class of semiparametric transformation models with random effects, and we study nonparametric maximum likelihood estimation under general interval-censoring schemes. We show that the proposed estimators for the finite-dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we develop an EM algorithm that converges stably for arbitrary datasets. Finally, we assess the performance of the proposed methods in extensive simulation studies and illustrate their application using data derived from the Atherosclerosis Risk in Communities Study.

39 citations


Journal ArticleDOI
TL;DR: This article proposes the conditional log odds-product as a preferred nuisance model, which not only facilitates maximum-likelihood estimation but also permits doubly robust estimation of the parameters of interest.
Abstract: A common problem in formulating models for the relative risk and risk difference is the variation dependence between these parameters and the baseline risk, which is a nuisance model. We address this problem by proposing the conditional log odds-product as a preferred nuisance model. This novel nuisance model not only facilitates maximum-likelihood estimation, but also permits doubly robust estimation of the parameters of interest. Our approach is illustrated via simulations and a data analysis. An R package implementing the proposed methods is available on CRAN. Supplementary materials for this article are available online.
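
To make the construction concrete (a sketch under my reading of the abstract, with p_a(v) = P(Y = 1 | A = a, V = v) for a binary treatment A and covariates V; the linear link functions are an assumption), the target and nuisance models can be written as

$$ \log \mathrm{RR}(v) = \log \frac{p_1(v)}{p_0(v)} = \alpha' v, \qquad \log \mathrm{OP}(v) = \log \frac{p_0(v)\, p_1(v)}{\{1 - p_0(v)\}\{1 - p_1(v)\}} = \theta' v. $$

The appeal is that the relative risk and the odds-product are variation independent, so any pair of values maps back to valid probabilities (p_0, p_1), unlike pairing the relative risk (or risk difference) with the baseline risk itself.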

37 citations


Posted Content
TL;DR: In this paper, a flexible semiparametric spatial autoregressive (mixed-regressive) model is considered, where unknown coefficients are permitted to be nonparametric functions of some contextual variables to allow for potential nonlinearities and parameter heterogeneity in the spatial relationship.
Abstract: This paper considers a flexible semiparametric spatial autoregressive (mixed-regressive) model in which unknown coefficients are permitted to be nonparametric functions of some contextual variables to allow for potential nonlinearities and parameter heterogeneity in the spatial relationship. Unlike other semiparametric spatial dependence models, ours permits the spatial autoregressive parameter to meaningfully vary across units and thus allows the identification of a neighborhood-specific spatial dependence measure conditional on the vector of contextual variables. We propose several (locally) nonparametric GMM estimators for our model. The developed two-stage estimators incorporate both the linear and quadratic orthogonality conditions and are capable of accommodating a variety of data generating processes, including the instance of a pure spatially autoregressive semiparametric model with no relevant regressors as well as multiple partially linear specifications. All proposed estimators are shown to be consistent and asymptotically normal. We also contribute to the literature by putting forward two test statistics to test for parameter constancy in our model. Both tests are consistent.
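
A schematic form of such a model (my illustrative notation, not necessarily the authors') lets both the spatial dependence parameter and the slopes vary with contextual variables z_i:

$$ y_i = \rho(z_i) \sum_{j=1}^{n} w_{ij}\, y_j + x_i' \beta(z_i) + \varepsilon_i, \qquad i = 1, \dots, n, $$

where the w_{ij} are known spatial weights and ρ(·), β(·) are unknown smooth functions; the neighborhood-specific spatial dependence measure mentioned in the abstract is then ρ(z) evaluated at a given value of the contextual variables.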

36 citations


Journal ArticleDOI
TL;DR: In this paper, a flexible semiparametric spatial autoregressive (mixed-regressive) model is considered, where unknown coefficients are permitted to be nonparametric functions of some contextual variables to allow for potential nonlinearities and parameter heterogeneity in the spatial relationship.

31 citations


Journal ArticleDOI
TL;DR: This work investigates general structures conducive to the construction of so‐called multiply robust estimating functions, whose computation requires postulating several dimension‐reducing models but which have mean zero at the true parameter value provided one of these models is correct.
Abstract: We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.
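
The canonical special case of such a structure is the familiar doubly robust estimating function for a mean outcome β = E(Y) when Y is missing at random given covariates X, with observation indicator R, propensity model π(X) and outcome model m(X):

$$ U(\beta;\, \pi, m) = \frac{R\, Y}{\pi(X)} - \Big(\frac{R}{\pi(X)} - 1\Big)\, m(X) - \beta, $$

which has mean zero at the true β provided either π or m is correctly specified; the article investigates general structures under which this kind of protection extends to several working models.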

30 citations


ReportDOI
TL;DR: In this paper, a new higher-order influence function (HOIF) estimator is proposed that has the same asymptotic properties as the Robins et al. (2008, 2016) estimator but does not require nonparametric estimation of a multivariate density, which is important because accurate estimation of a high-dimensional density is not feasible at the moderate sample sizes often encountered in applications.
Abstract: Robins et al. (2008, 2016) applied the theory of higher order influence functions (HOIFs) to derive an estimator of the mean of an outcome Y in a missing data model with Y missing at random conditional on a vector X of continuous covariates; their estimator, in contrast to previous estimators, is semiparametric efficient under minimal conditions. However, the Robins et al. (2008, 2016) estimator depends on a non-parametric estimate of the density of X. In this paper, we introduce a new HOIF estimator that has the same asymptotic properties as their estimator but does not require nonparametric estimation of a multivariate density, which is important because accurate estimation of a high dimensional density is not feasible at the moderate sample sizes often encountered in applications. We also show that our estimator can be generalized to the entire class of functionals considered by Robins et al. (2008) which include the average effect of a treatment on a response Y when a vector X suffices to control confounding and the expected conditional variance of a response Y given a vector X.

29 citations


Journal ArticleDOI
TL;DR: In this article, a broad class of semiparametric transformation models that extends the Fine and Gray model and allows for unknown causes of failure is presented, and the nonparametric maximum likelihood estimators are derived using the profile likelihood.
Abstract: Summary The cumulative incidence is the probability of failure from the cause of interest over a certain time period in the presence of other risks. A semiparametric regression model proposed by Fine and Gray has become the method of choice for formulating the effects of covariates on the cumulative incidence. Its estimation, however, requires modelling of the censoring distribution and is not statistically efficient. We present a broad class of semiparametric transformation models which extends the Fine and Gray model, and we allow for unknown causes of failure. We derive the non-parametric maximum likelihood estimators and develop simple and fast numerical algorithms using the profile likelihood. We establish the consistency, asymptotic normality and semiparametric efficiency of the non-parametric maximum likelihood estimators. In addition, we construct graphical and numerical procedures to evaluate and select models. Finally, we demonstrate the advantages of the proposed methods over the existing methods through extensive simulation studies and an application to a major study on bone marrow transplantation.
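
For orientation (a hedged schematic; the authors' exact class may differ in details), the Fine and Gray model specifies the cumulative incidence of the cause of interest given covariates Z as F_1(t | Z) = 1 - exp{-Λ_1(t) e^{β'Z}}, and a transformation-model extension replaces the outer link with a known increasing transformation G:

$$ F_1(t \mid Z) = 1 - \exp\!\big[ -G\{\Lambda_1(t)\, e^{\beta' Z}\} \big], $$

so that different choices of G generate the proportional subdistribution hazards model and other members of the class.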

26 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop a complete methodology for detecting time-varying or non-time-varying parameters in auto-regressive conditional heteroscedasticity (ARCH) processes.
Abstract: Summary We develop a complete methodology for detecting time varying or non-time-varying parameters in auto-regressive conditional heteroscedasticity (ARCH) processes. For this, we estimate and test various semiparametric versions of time varying ARCH models which include two well-known non-stationary ARCH-type models introduced in the econometrics literature. Using kernel estimation, we show that non-time-varying parameters can be estimated at the usual parametric rate of convergence and, for Gaussian noise, we construct estimates that are asymptotically efficient in a semiparametric sense. Then we introduce two statistical tests which can be used for detecting non-time-varying parameters or for testing the second-order dynamics. An information criterion for selecting the number of lags is also provided. We illustrate our methodology with several real data sets.
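
A standard time-varying ARCH(p) specification of the kind alluded to above (my sketch; the semiparametric versions in the paper restrict some of these coefficients to be constant) is

$$ X_t = \sigma_t Z_t, \qquad \sigma_t^2 = a_0\!\left(\tfrac{t}{n}\right) + \sum_{j=1}^{p} a_j\!\left(\tfrac{t}{n}\right) X_{t-j}^2, $$

with Z_t i.i.d. noise and coefficient functions a_j(·) defined on rescaled time; treating a subset of these functions as unknown constants is what allows estimation at the usual parametric rate.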

Journal ArticleDOI
TL;DR: Results of the simulation studies show that the LOD/√2 and ROS methods give smaller deviations from the mean than the other methods across the different sample sizes and censoring rates, while ROS gives smaller deviations from the median than the other methods in almost all sample sizes and at almost all censoring rates.
Abstract: In this study, an attempt was made to determine the degrees of bias in particular sampling sizes and methods. The aim of the study was to determine deviations from the median, the mean, and the standard deviation (SD) in different sample sizes and at different censoring rates for log-normal, exponential, and Weibull distributions in the case of full and censored data sampling. Thus, the concept of "censoring" and censoring types was handled in the first place. Then substitution, parametric (MLE), nonparametric (KM), and semi-parametric (ROS) methods were introduced for the evaluation of left-censored observations. Within the scope of the present study, the data were produced uncensored based on the different parameters of each distribution. Then the datasets were left-censored at the ratios of 5, 25, 45, and 65%. The censored data were estimated through substitution (LOD and LOD/√2), parametric (MLE), semi-parametric (ROS), and nonparametric (KM) methods. In addition, evaluation was made by increasing the sample size from 20 to 300 by tens. Performance comparison was made between the uncensored dataset and the censored dataset on the basis of deviations from the median, the mean, and the SD. The results of simulation studies show that LOD/√2 and ROS methods give better results than other methods in deviation from the mean in different sample sizes and at different censoring rates, while ROS gives better results than other methods in deviation from the median in almost all sample sizes and at almost all censoring rates.
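
A minimal Python sketch of the substitution idea (purely illustrative: the distribution parameters, censoring rate, and sample size are my own choices, and this is not the authors' simulation code):

import numpy as np

rng = np.random.default_rng(1)
n = 100                                          # illustrative sample size
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # uncensored "true" data

lod = np.quantile(x, 0.25)                       # detection limit giving about 25% left-censoring
censored = x < lod

x_sub_lod = np.where(censored, lod, x)               # substitute LOD for censored values
x_sub_half = np.where(censored, lod / np.sqrt(2), x) # substitute LOD/sqrt(2)

# Deviation of each substituted mean from the full-data mean
print(abs(x_sub_lod.mean() - x.mean()), abs(x_sub_half.mean() - x.mean()))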

Journal ArticleDOI
TL;DR: A hybrid semi-parametric modelling framework implemented using mixed integer linear programming (MILP) is used to extract (coupled) nonlinear ordinary differential equations (ODEs) from process data to demonstrate a principled approach to hybrid model development.

Journal ArticleDOI
TL;DR: In this article, a semiparametric copula-based estimator for conditional quantiles is investigated for both complete and right-censored data; the proposed quantile regression estimator has the valuable property of being automatically monotonic across quantile levels.
Abstract: When facing multivariate covariates, general semiparametric regression techniques come in handy to propose flexible models that are unexposed to the curse of dimensionality. In this work a semiparametric copula-based estimator for conditional quantiles is investigated for both complete and right-censored data. In spirit, the methodology extends the recent work of Noh, El Ghouch and Bouezmarni [34] and Noh, El Ghouch and Van Keilegom [35], as the main idea consists in appropriately defining the quantile regression in terms of a multivariate copula and marginal distributions. Prior estimation of the latter and simple plug-in lead to an easily implementable estimator expressed, for both contexts with and without censoring, as a weighted quantile of the observed response variable. In addition, and contrary to the initial suggestion in the literature, a semiparametric estimation scheme for the multivariate copula density is studied, motivated by the possible shortcomings of a purely parametric approach and driven by the regression context. The resulting quantile regression estimator has the valuable property of being automatically monotonic across quantile levels. Additionally, the copula-based approach allows the analyst to naturally take into account common regression concerns such as interactions between covariates or possible transformations of the latter. From a theoretical perspective, asymptotic normality for both complete and censored data is obtained under classical regularity conditions. Finally, numerical examples as well as a real data application are used to illustrate the validity and finite sample performance of the proposed procedure.
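
The core representation (as I read the abstract and the cited Noh et al. line of work; treat the exact form as an assumption) expresses the conditional law of Y given covariates X = (X_1, ..., X_d) through the copula density c of (Y, X), so that the conditional quantile becomes a copula-weighted quantile of the observed responses:

$$ f_{Y \mid X}(y \mid x) = c\{F_Y(y), F_{X_1}(x_1), \dots, F_{X_d}(x_d)\}\, f_Y(y), $$

$$ \widehat{q}_\tau(x) = \arg\min_{a} \sum_{i=1}^{n} \widehat{c}\{\widehat{F}_Y(Y_i), \widehat{F}_{X_1}(x_1), \dots, \widehat{F}_{X_d}(x_d)\}\; \rho_\tau(Y_i - a), $$

with ρ_τ the usual check loss; estimating the copula density semiparametrically, rather than fully parametrically, is the refinement studied in this article.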

Journal ArticleDOI
TL;DR: This review article provides a summary of some newly developed methods, as well as established methods, for analyzing length-biased, right-censored survival data.
Abstract: For the past several decades, nonparametric and semiparametric modeling for conventional right-censored survival data has been investigated intensively under a noninformative censoring mechanism. However, these methods may not be applicable for analyzing right-censored survival data that arise from prevalent cohorts when the failure times are subject to length-biased sampling. This review article is intended to provide a summary of some newly developed methods as well as established methods for analyzing length-biased data.

Journal ArticleDOI
TL;DR: In this paper, rank-based estimation is developed as an alternative to Gaussian quasi-likelihood and standard semiparametric estimation in time series models in which conditional location and/or scale depend on a Euclidean parameter of interest, while the unspecified innovation density is a nuisance.

Journal ArticleDOI
TL;DR: In this article, a semivarying coefficient model in which the regressors are generated by multivariate unit root I(1) processes is studied; the influence of the explanatory vectors on the response variable follows a semiparametric partially linear structure with the nonlinear component given by functional coefficients.
Abstract: We study a semivarying coefficient model where the regressors are generated by the multivariate unit root I(1) processes. The influence of the explanatory vectors on the response variable satisfies the semiparametric partially linear structure with the nonlinear component being functional coefficients. A semiparametric estimation methodology with the first-stage local polynomial smoothing is applied to estimate both the constant coefficients in the linear component and the functional coefficients in the nonlinear component. The asymptotic distribution theory for the proposed semiparametric estimators is established under some mild conditions, from which both the parametric and nonparametric estimators are shown to enjoy the well-known super-consistency property. Furthermore, a simulation study is conducted to investigate the finite sample performance of the developed methodology and results.
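
In schematic notation (my own, hedged reading of the abstract), such a model takes the form

$$ Y_t = X_t' \beta + Z_t'\, g(U_t) + e_t, $$

where X_t and Z_t are generated by unit root I(1) processes, β is a constant coefficient vector, and g(·) is a vector of unknown functional coefficients of the covariate U_t; the first-stage local polynomial smoothing targets g(·), after which the constant part is estimated, and super-consistency refers to these estimators converging faster than the usual root-n rate because of the integrated regressors.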

Journal ArticleDOI
TL;DR: In this article, the authors propose a joint approach for model selection based on several asymmetries simultaneously, to deal with the special feature that expectile regression estimates the complete distribution of the response.
Abstract: Ordinary least squares regression focuses on the expected response and strongly depends on the assumption of normally distributed errors for inferences. An approach to overcome these restrictions is expectile regression, where no distributional assumption is made but rather the whole distribution of the response is described in terms of covariates. This is similar to quantile regression, but expectiles provide a convenient generalization of the arithmetic mean while quantiles are a generalization of the median. To analyze more complex data structures where purely linear predictors are no longer sufficient, semiparametric regression methods have been introduced for both ordinary least squares and expectile regression. However, with increasing complexity of the data and the regression structure, the selection of the true covariates and their effects becomes even more important than in standard regression models. Therefore we introduce several approaches depending on selection criteria and shrinkage methods to perform model selection in semiparametric expectile regression. Moreover, we propose a joint approach for model selection based on several asymmetries simultaneously to deal with the special feature that expectile regression estimates the complete distribution of the response. Furthermore, to distinguish between linear and smooth predictors, we split nonlinear effects into the purely linear trend and the deviation from this trend. All selection methods are compared with the benchmark of functional gradient descent boosting in a simulation study and applied to determine the relevant covariates when studying childhood malnutrition in Peru.
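
For reference (this is the standard definition rather than anything specific to this article), the τ-expectile of the response given covariates minimizes an asymmetrically weighted squared loss,

$$ \mu_\tau(x) = \arg\min_{m}\; \mathbb{E}\big[\, |\tau - \mathbf{1}\{Y < m\}|\, (Y - m)^2 \,\big|\, X = x \big], $$

which reduces to the conditional mean at τ = 0.5, in the same way that the asymmetrically weighted absolute loss yields quantiles and, at τ = 0.5, the median.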

Journal ArticleDOI
TL;DR: In this paper, the authors provide an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term.
Abstract: In this paper, we provide an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term. We analyze these developments under two alternative model specifications: fixed and random effects panel data models. More precisely, in the random effects setting, we focus our attention on the analysis of some efficiency issues that have to do with the so-called working independence condition. This assumption is introduced when estimating the asymptotic variance–covariance matrix of nonparametric estimators. In the fixed effects setting, to cope with the so-called incidental parameters problem, we consider two different estimation approaches: profiling techniques and differencing methods. Furthermore, we are also interested in the endogeneity problem and how instrumental variables are used in this context. In addition, for practitioners, we also show different ways of avoiding the so-called curse of dimensionality problem in pure nonparametric models. In this way, semiparametric and additive models appear as a solution when the number of explanatory variables is large.

Journal ArticleDOI
TL;DR: A novel class of transformation models for semi-competing risks analysis that permits nonparametric specification of the frailty distribution is proposed; the approach is broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation.
Abstract: In the analysis of semi-competing risks data, interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ². When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer.

Journal ArticleDOI
TL;DR: In this paper, the conditional autoregressive expectile class of models, used to implicitly model ES and previously extended to allow the intra-day range, not just the daily return, as an input, is further extended to incorporate information on realized measures of volatility, including realized variance and realized range (RR), as well as scaled and smoothed versions of these.
Abstract: Realized measures employing intra-day sources of data have proven effective for dynamic volatility and tail-risk estimation and forecasting. Expected shortfall (ES) is a tail risk measure, now recommended by the Basel Committee, involving a conditional expectation that can be semi-parametrically estimated via an asymmetric sum of squares function. The conditional autoregressive expectile class of models, used to implicitly model ES, has been extended to allow the intra-day range, not just the daily return, as an input. This model class is here further extended to incorporate information on realized measures of volatility, including realized variance and realized range (RR), as well as scaled and smoothed versions of these. An asymmetric Gaussian density error formulation allows a likelihood that leads to direct estimation and one-step-ahead forecasts of quantiles and expectiles, and subsequently of ES. A Bayesian adaptive Markov chain Monte Carlo method is developed and employed for estimation and forecasting.
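
As a rough sketch of the model family (hedged: this is the generic conditional autoregressive expectile recursion from the earlier literature, and the exact specifications with realized measures in this paper may differ), the conditional τ-expectile of the daily return evolves as

$$ \mu_t(\tau) = \beta_0 + \beta_1\, \mu_{t-1}(\tau) + \beta_2\, x_{t-1}, $$

where x_{t-1} is an observable volatility proxy (the absolute daily return in the original formulation; here the intra-day range or a realized measure such as realized variance or the realized range), and the fitted expectile series is then mapped into expected shortfall forecasts.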

Journal ArticleDOI
TL;DR: In this paper, a semiparametric negative binomial count data model is proposed which is based on the local likelihood approach and generalized product kernels, allowing to leave unspecified the functional form of the conditional mean, while still exploiting basic assumptions of count data models.

Journal ArticleDOI
TL;DR: A semiparametric model and a novel pairwise conditional likelihood ratio test are proposed to identify combined differences in higher moments among genotypic groups; the test has a simple asymptotic chi-square distribution and does not require permutation or bootstrap procedures.
Abstract: Summary Quantitative trait locus analysis has been used as an important tool to identify markers where the phenotype or quantitative trait is linked with the genotype. Most existing tests for single locus association with quantitative traits aim at the detection of mean differences across genotypic groups. However, recent research has revealed functional genetic loci that affect the variance of traits, known as variability-controlling quantitative trait loci. In addition, it has been suggested that many genotypes have both mean and variance effects, while the mean effects or variance effects alone may not be strong enough to be detected. The existing methods accounting for unequal variances include Levene's test, the Lepage test, and the D-test, but these suffer from lack of robustness or lack of power. We propose a semiparametric model and a novel pairwise conditional likelihood ratio test. Specifically, the semiparametric model is designed to identify the combined differences in higher moments among genotypic groups. The pairwise likelihood is constructed based on a conditioning procedure, whereby the unknown reference distribution is eliminated. We show that the proposed pairwise likelihood ratio test has a simple asymptotic chi-square distribution, which does not require permutation or bootstrap procedures. Simulation studies show that the proposed test performs well in controlling Type I errors and has competitive power in identifying differences across genotypic groups. In addition, the proposed test has certain robustness to model mis-specifications. The proposed test is illustrated by an example of identifying both mean and variance effects on body mass index using the Framingham Heart Study data.

Journal ArticleDOI
TL;DR: A new test is developed for the parametric form of the regression function m; it has power against local directional alternatives that converge to the null model at the parametric rate, and its performance is compared to that of the test proposed by Colling and Van Keilegom (2016).

Journal ArticleDOI
TL;DR: A Bayesian decision theoretic approach is presented that describes the failure characteristics of systems by specifying a nonparametric form for the cumulative intensity function and accounting for the effect of covariates through a parametric form.
Abstract: We present a Bayesian decision theoretic approach for developing replacement strategies. In so doing, we consider a semiparametric model to describe the failure characteristics of systems by specifying a nonparametric form for the cumulative intensity function and by taking into account the effect of covariates in a parametric form. Use of a gamma process prior for the cumulative intensity function complicates the Bayesian analysis when the updating is based on failure count data. We develop a Bayesian analysis of the model using Markov chain Monte Carlo methods and determine replacement strategies. Adoption of Markov chain Monte Carlo methods involves a data augmentation algorithm. We show the implementation of our approach using actual data from railroad tracks.
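
In outline (my schematic of this kind of model; the exact prior parameterization is an assumption), the failure counts N_i(t) of system i with covariates x_i follow a nonhomogeneous Poisson process whose mean function combines a nonparametric baseline with a parametric covariate effect,

$$ \mathbb{E}\{N_i(t) \mid \Lambda_0, \beta\} = \Lambda_0(t)\, e^{x_i' \beta}, \qquad \Lambda_0 \sim \mathrm{GammaProcess}\big(c\, \Lambda^{*}(\cdot),\, c\big), $$

where Λ* is a prior guess at the baseline cumulative intensity and c controls prior precision; replacement strategies are then evaluated against the posterior obtained by MCMC with data augmentation.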

Journal ArticleDOI
TL;DR: In this paper, a nonparametric estimation technique is developed for semiparametric transformation models of the form H(Y) = φ(Z) + X′β + U, where H and φ are unknown functions, β is an unknown finite-dimensional parameter vector, and the variables (Y, Z) are endogenous.
Abstract: In this paper we develop a nonparametric estimation technique for semiparametric transformation models of the form H(Y) = φ(Z) + X′β + U, where H and φ are unknown functions, β is an unknown finite-dimensional parameter vector and the variables (Y, Z) are endogenous. Identification of the model and asymptotic properties of the estimator are analyzed under the mean independence assumption between the error term and the instruments. We show that the estimators are consistent, and that a root-n convergence rate and asymptotic normality can be attained for the estimator of β. The simulations demonstrate that our nonparametric estimates fit the data well.
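
Written out (the equation is taken directly from the abstract; the instrument vector W is my notation), the model and the identifying restriction are

$$ H(Y) = \varphi(Z) + X' \beta + U, \qquad \mathbb{E}[\, U \mid W\,] = 0, $$

where H and φ are unknown functions, (Y, Z) are endogenous, and W collects the instruments assumed mean-independent of the error term.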

Journal ArticleDOI
TL;DR: In this article, a time-varying coefficient (TVC) mortality model is proposed that aims to combine the good characteristics of existing models with efficient model calibration methods based on kernel smoothing, which can significantly improve the forecasting performance of mortality models.
Abstract: Over the last few decades, there has been an enormous growth in mortality modeling as the field of mortality risk and longevity risk has attracted great attention from academic, government and private sectors. In this paper, we propose a time-varying coefficient (TVC) mortality model aiming to combine the good characteristics of existing models with efficient model calibration methods. Nonparametric kernel smoothing techniques have been applied in the literature of mortality modeling and based on the findings from Li et al.’s (2015) study, such techniques can significantly improve the forecasting performance of mortality models. In this study we take the same path and adopt a kernel smoothing approach along the time dimension. Since we follow the model structure of the Cairns–Blake–Dowd (CBD) model, the TVC model we propose can be seen as a semi-parametric extension of the CBD model and it gives specific model design according to different countries’ mortality experience. Our empirical study presented here includes Great Britain, the United States, and Australia amongst other developed countries. Fitting and forecasting results from the empirical study have shown superior performances of the model over a selection of well-known mortality models in the current literature.
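
For context (the CBD structure below is standard; reading the time-varying coefficient extension into it is my own hedged interpretation of the abstract), the Cairns–Blake–Dowd model describes the death probability q_{x,t} at age x in year t through

$$ \operatorname{logit} q_{x,t} = \kappa_t^{(1)} + \kappa_t^{(2)}\,(x - \bar{x}), $$

and a TVC-style extension treats the coefficients as smooth functions of time that are estimated by kernel smoothing along the time dimension, rather than by projecting fitted period indices forward with a time series model.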

Posted Content
TL;DR: The R package frailtySurv, introduced in this paper, implements semi-parametric consistent estimators for shared frailty models with a variety of frailty distributions, including gamma, log-normal, inverse Gaussian and power variance function.
Abstract: The R package frailtySurv for simulating and fitting semi-parametric shared frailty models is introduced. Package frailtySurv implements semi-parametric consistent estimators for a variety of frailty distributions, including gamma, log-normal, inverse Gaussian and power variance function, and provides consistent estimators of the standard errors of the parameters' estimators. The parameters' estimators are asymptotically normally distributed, and therefore statistical inference based on the results of this package, such as hypothesis testing and confidence intervals, can be performed using the normal distribution. Extensive simulations demonstrate the flexibility and correct implementation of the estimator. Two case studies performed with publicly available datasets demonstrate applicability of the package. In the Diabetic Retinopathy Study, the onset of blindness is clustered by patient, and in a large hard drive failure dataset, failure times are thought to be clustered by the hard drive manufacturer and model.
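
To illustrate the kind of data structure such a package targets, here is a minimal Python sketch of simulating clustered survival times under a shared gamma frailty with an exponential baseline hazard (illustrative only; it does not use or mimic the frailtySurv API, and all parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n_clusters, cluster_size = 200, 2        # e.g. two eyes per patient
theta, beta, base_rate = 0.5, 0.8, 0.1   # frailty variance, covariate effect, baseline hazard rate

# Shared gamma frailty with mean 1 and variance theta, one draw per cluster
w = rng.gamma(shape=1.0 / theta, scale=theta, size=n_clusters)

records = []
for i in range(n_clusters):
    for _ in range(cluster_size):
        x = rng.binomial(1, 0.5)                     # binary covariate
        rate = base_rate * w[i] * np.exp(beta * x)   # conditional hazard given the frailty
        t = rng.exponential(1.0 / rate)              # latent event time
        c = rng.exponential(1.0 / base_rate)         # independent censoring time
        records.append((i, min(t, c), int(t <= c), x))  # (cluster, observed time, event indicator, covariate)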

Posted Content
TL;DR: Identifiability results are established, backfitting estimates and modified EM algorithms are proposed to achieve optimal convergence rates for both the parametric and nonparametric parts, and the asymptotic properties of the proposed estimation procedures are investigated.
Abstract: In this article, we propose two classes of semiparametric mixture regression models with single-index for model-based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models are indeed special cases of the new models. Backfitting estimates and the corresponding modified EM algorithms are proposed to achieve optimal convergence rates for both the parametric and nonparametric parts. We establish the identifiability results of the proposed two models and investigate the asymptotic properties of the proposed estimation procedures. Simulation studies are conducted to demonstrate the finite sample performance of the proposed models. An application of the new models to NBA data reveals some new findings.
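
Schematically (my notation; a hedged reading of the abstract rather than the authors' exact specification), such a model describes the conditional density of Y given a possibly high-dimensional X through a single index α'X:

$$ f(y \mid x) = \sum_{k=1}^{K} \pi_k(\alpha' x)\; \phi\{y;\, m_k(\alpha' x),\, \sigma_k^2(\alpha' x)\}, $$

where φ(·; m, σ²) is a normal density and the mixing proportions π_k(·), component means m_k(·) and variances σ_k²(·) are unknown smooth functions of the index, so the nonparametric parts stay one-dimensional even when X is high dimensional.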

Journal ArticleDOI
TL;DR: In this paper, an estimator of the parametric component of a semiparametric single-index model with endogenous explanatory variables is proposed; the estimator is the solution of an ill-posed inverse problem and is shown to be asymptotically normal under certain regularity conditions.
Abstract: We consider a semiparametric single-index model, and suppose that endogeneity is present in the explanatory variables. The presence of an instrument is assumed that is non-correlated with the error term. We propose an estimator of the parametric component of the model, which is the solution of an ill-posed inverse problem. The estimator is shown to be asymptotically normal under certain regularity conditions. A simulation study is conducted to illustrate the finite sample performance of the proposed estimator.
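
In outline (my schematic with hypothetical notation; the abstract only states that the instrument is uncorrelated with the error, so the exact moment restriction used is an assumption), the model is a single-index regression with an instrument W:

$$ Y = g(X' \theta) + U, \qquad \mathbb{E}[\, U \mid W\,] = 0, $$

where g is unknown and X may be endogenous; identifying θ and g from such a conditional moment restriction involves inverting a conditional expectation operator, which is what makes the estimation problem an ill-posed inverse problem.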