
Showing papers on "Semiparametric model published in 2007"


Journal ArticleDOI
TL;DR: In this paper, a coherent data-generating process (DGP) is described for two-stage procedures in which nonparametric estimates of productive efficiency are regressed on environmental variables to account for exogenous factors that might affect firms’ performance.

2,915 citations


Journal ArticleDOI
TL;DR: This book deals with probability distributions, discrete and continuous densities, distribution functions, bivariate distributions, means, variances, covariance, correlation, and some random process material.
Abstract: Chapter 3 deals with probability distributions, discrete and continuous densities, distribution functions, bivariate distributions, means, variances, covariance, correlation, and some random process material. Chapter 4 is a detailed study of the concept of utility including the psychological aspects, risk, attributes, rules for utilities, multidimensional utility, and normal form of analysis. Chapter 5 treats games and optimization, linear optimization, and mixed strategies. Entropy is the topic of Chapter 6 with sections devoted to entropy, disorder, information, Shannon’s theorem, demon’s roulette, Maxwell–Boltzmann distribution, Schrödinger’s nutshell, maximum entropy probability distributions, blackbodies, and Bose–Einstein distribution. Chapter 7 is standard statistical fare including transformations of random variables, characteristic functions, generating functions, and the classic limit theorems such as the central limit theorem and the laws of large numbers. Chapter 8 is about exchangeability and inference with sections on Bayesian techniques and classical inference. Partial exchangeability is also treated. Chapter 9 considers such things as order statistics, extreme value, intensity, hazard functions, and Poisson processes. Chapter 10 covers basic elements of risk and reliability, while Chapter 11 is devoted to curve fitting, regression, and Monte Carlo simulation. There is an ample number of exercises at the ends of the chapters with answers or comments on many of them in an appendix in the back of the book. Other appendices are on the common discrete and continuous distributions and mathematical aspects of integration.

539 citations


Journal ArticleDOI
TL;DR: It is shown that the LSKM semiparametric regression can be formulated using a linear mixed model, and both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation.
Abstract: We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.

334 citations
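The mixed-model connection above can be sketched with plain ridge algebra: for a fixed smoothing parameter, the pathway effect is the kernel-ridge/BLUP smooth of the residuals, and the covariate coefficients come from a profiled least-squares step. A minimal numerical sketch (the function name and the profiling shortcut are illustrative, not the authors' code):

```python
import numpy as np

def lskm_fit(K, X, y, lam):
    """Sketch of a least-squares kernel machine fit.

    Minimizes ||y - X beta - h||^2 + lam * h' K^{-1} h, whose solution
    matches the BLUP in the equivalent linear mixed model: for fixed
    beta, h = K (K + lam I)^{-1} (y - X beta), and beta is obtained
    from a profiled least-squares step.
    """
    n = len(y)
    S = K @ np.linalg.inv(K + lam * np.eye(n))   # smoother matrix
    Xs, ys = X - S @ X, y - S @ y                # profile out h
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    h = S @ (y - X @ beta)                       # pathway effect estimate
    return beta, h
```

As lam grows the smoother vanishes and the fit collapses to ordinary least squares on the covariates alone, which is the mixed-model variance-component interpretation of the smoothing parameter.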


Journal ArticleDOI
TL;DR: In this paper, the authors present several classes of semiparametric regression models, which extend the existing models in important directions, and construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters.
Abstract: Summary. Semiparametric regression models play a central role in formulating the effects of covariates on potentially censored failure times and in the joint modelling of incomplete repeated measures and failure times in longitudinal studies. The presence of infinite dimensional parameters poses considerable theoretical and computational challenges in the statistical analysis of such models. We present several classes of semiparametric regression models, which extend the existing models in important directions. We construct appropriate likelihood functions involving both finite dimensional and infinite dimensional parameters. The maximum likelihood estimators are consistent and asymptotically normal with efficient variances. We develop simple and stable numerical techniques to implement the corresponding inference procedures. Extensive simulation experiments demonstrate that the inferential and computational methods proposed perform well in practical settings. Applications to three medical studies yield important new insights. We conclude that there is no reason, theoretical or numerical, not to use maximum likelihood estimation for semiparametric regression models. We discuss several areas that need further research.

314 citations


Journal ArticleDOI
TL;DR: In this article, a class of semiparametric models for the covariance function that imposes a parametric correlation structure while allowing a nonparametric variance function is proposed, and a kernel estimator is developed.
Abstract: Improving efficiency for regression coefficients and predicting trajectories of individuals are two important aspects in the analysis of longitudinal data. Both involve estimation of the covariance function. Yet challenges arise in estimating the covariance function of longitudinal data collected at irregular time points. A class of semiparametric models for the covariance function that imposes a parametric correlation structure while allowing a nonparametric variance function is proposed. A kernel estimator for estimating the nonparametric variance function is developed. Two methods for estimating parameters in the correlation structure—a quasi-likelihood approach and a minimum generalized variance method—are proposed. A semiparametric varying coefficient partially linear model for longitudinal data is introduced, and an estimation procedure for model coefficients using a profile weighted least squares approach is proposed. Sampling properties of the proposed estimation procedures are studied, and asy...

240 citations


Journal ArticleDOI
TL;DR: In this article, the mean of a counting process with panel count data is estimated using a semiparametric regression model, where the conditional mean function of the counting process is of the form E{N(t) | Z} = exp(βᵀZ)Λ₀(t), where Z is a vector of covariates and Λ₀ is the baseline mean function.
Abstract: We consider estimation in a particular semiparametric regression model for the mean of a counting process with "panel count" data. The basic model assumption is that the conditional mean function of the counting process is of the form E{N(t) | Z} = exp(βᵀZ)Λ₀(t), where Z is a vector of covariates and Λ₀ is the baseline mean function. The "panel count" observation scheme involves observation of the counting process N for an individual at a random number K of random time points; both the number and the locations of these time points may differ across individuals. We study semiparametric maximum pseudo-likelihood and maximum likelihood estimators of the unknown parameters (β₀, Λ₀) derived on the basis of a nonhomogeneous Poisson process assumption. The pseudo-likelihood estimator is fairly easy to compute, while the maximum likelihood estimator poses more challenges from the computational perspective. We study asymptotic properties of both estimators assuming that the proportional mean model holds, but dropping the Poisson process assumption used to derive the estimators. In particular we establish asymptotic normality for the estimators of the regression parameter β₀ under appropriate hypotheses. The results show that our estimation procedures are robust in the sense that the estimators converge to the truth regardless of the underlying counting process.

148 citations
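Under the working Poisson assumption, the pseudo-likelihood for the regression parameter reduces to a Poisson log-linear model in which the log baseline mean enters as an offset. A hedged sketch with a known baseline Λ₀(t) = t (the simulation setup and function name are illustrative only, not the authors' procedure):

```python
import numpy as np

def poisson_glm(X, y, offset, n_iter=25):
    """Newton-Raphson for Poisson regression with a known offset,
    log E[y] = offset + X beta. With offset = log Lambda0(t) this is
    the parametric backbone of the panel-count pseudo-likelihood
    when the baseline mean function is plugged in."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        grad = X.T @ (y - mu)                 # score
        hess = X.T @ (X * mu[:, None])        # observed information
        beta += np.linalg.solve(hess, grad)
    return beta
```

In the actual estimators Λ₀ is unknown and is maximized jointly (or profiled out) rather than plugged in, which is where the computational challenges noted above arise.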


Journal ArticleDOI
TL;DR: In this article, the authors consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area, and propose a penalized spline formulation of the model that relates to generalized kriging of the latent traffic pollution variable and leads to a natural Bayesian Markov chain Monte Carlo algorithm for model fitting.
Abstract: Summary. Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modelling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies that were conducted at specific household locations as well as 15 ambient monitoring sites in the area. The models allow for both flexible non-linear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic particles, with some recording only outdoor concentrations of black or elemental carbon, some recording indoor concentrations of black carbon and others recording both indoor and outdoor concentrations of black carbon. A joint model for outdoor and indoor exposure that specifies a spatially varying latent variable provides greater spatial coverage in the area of interest. We propose a penalized spline formulation of the model that relates to generalized kriging of the latent traffic pollution variable and leads to a natural Bayesian Markov chain Monte Carlo algorithm for model fitting. We propose methods that allow us to control the degrees of freedom of the smoother in a Bayesian framework. Finally, we present results from an analysis that applies the model to data from summer and winter separately.

132 citations
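The penalized-spline idea can be illustrated with a truncated-line basis and a ridge penalty on the knot coefficients, which is the standard mixed-model representation of a smoother. This is a generic one-dimensional sketch, not the paper's latent-variable kriging formulation:

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized spline fit with a truncated-line basis:
    f(x) = b0 + b1*x + sum_k u_k (x - knot_k)_+ with a ridge penalty
    lam * ||u||^2 on the knot coefficients only (the polynomial part
    is unpenalized), i.e. the mixed-model view of smoothing."""
    B = np.hstack([np.ones((len(x), 1)), x[:, None],
                   np.maximum(x[:, None] - knots[None, :], 0.0)])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalize knots only
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ coef
```

Treating the knot coefficients as random effects with variance controlled by lam is what makes the degrees of freedom of the smoother controllable within a Bayesian or mixed-model fit, as the paper does via MCMC.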


Journal ArticleDOI
TL;DR: Four hierarchical regression methods are compared, and a semiparametric model with a variable-selection prior is presented to allow clustering of coefficients at 0.
Abstract: Studies that include individuals with multiple highly correlated exposures are common in epidemiology. Because standard maximum likelihood techniques often fail to converge in such instances, hierarchical regression methods have seen increasing use. Bayesian hierarchical regression places prior distributions on exposure-specific regression coefficients to stabilize estimation and incorporate prior knowledge, if available. A common parametric approach in epidemiology is to treat the prior mean and variance as fixed constants. An alternative parametric approach is to place distributions on the prior mean and variance to allow the data to help inform their values. As a more flexible semiparametric option, one can place an unknown distribution on the coefficients that simultaneously clusters exposures into groups using a Dirichlet process prior. We also present a semiparametric model with a variable-selection prior to allow clustering of coefficients at 0. We compare these 4 hierarchical regression methods and demonstrate their application in an example estimating the association of herbicides with retinal degeneration among wives of pesticide applicators.

130 citations
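The Dirichlet process prior that induces clustering of coefficients can be simulated with Sethuraman's stick-breaking construction. The sketch below is a generic truncated draw from DP(alpha, G0), not the authors' sampler:

```python
import numpy as np

def stick_breaking(alpha, n_atoms, base_draw, rng):
    """Truncated stick-breaking draw from a Dirichlet process prior
    DP(alpha, G0): stick proportions from Beta(1, alpha), atoms from
    the base measure G0. Draws from the resulting discrete measure
    repeat atoms, which is what clusters coefficients into groups."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the truncation so the weights sum to one
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    atoms = base_draw(n_atoms)
    return w, atoms
```

Smaller alpha concentrates mass on a few sticks (few clusters of exposures); larger alpha spreads it out. The variable-selection variant additionally places an atom exactly at zero.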


Journal ArticleDOI
TL;DR: In this article, Newey and Powell proposed a modified sieve minimum distance (SMD) estimation of both the finite dimensional parameter (θ) and the infinite dimensional parameter (h), which are identified through a conditional moment restriction model in which h may depend on endogenous variables.

128 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a mixed model methodology for Cox-type hazard regression models where the usual linear predictor is generalized to a geoadditive predictor incorporating non-parametric terms for the (log-)baseline hazard rate, time-varying coefficients and non-linear effects of continuous covariates, a spatial component, and additional cluster-specific frailties.
Abstract: . Mixed model based approaches for semiparametric regression have gained much interest in recent years, both in theory and application. They provide a unified and modular framework for penalized likelihood and closely related empirical Bayes inference. In this article, we develop mixed model methodology for a broad class of Cox-type hazard regression models where the usual linear predictor is generalized to a geoadditive predictor incorporating non-parametric terms for the (log-)baseline hazard rate, time-varying coefficients and non-linear effects of continuous covariates, a spatial component, and additional cluster-specific frailties. Non-linear and time-varying effects are modelled through penalized splines, while spatial components are treated as correlated random effects following either a Markov random field or a stationary Gaussian random field prior. Generalizing existing mixed model methodology, inference is derived using penalized likelihood for regression coefficients and (approximate) marginal likelihood for smoothing parameters. In a simulation we study the performance of the proposed method, in particular comparing it with its fully Bayesian counterpart using Markov chain Monte Carlo methodology, and complement the results by some asymptotic considerations. As an application, we analyse leukaemia survival data from northwest England.

118 citations


Journal ArticleDOI
TL;DR: The results demonstrate that the likelihood-based parametric analyses for the cumulative incidence function are a practically useful alternative to the semiparametric analyses.
Abstract: We propose parametric regression analysis of cumulative incidence function with competing risks data. A simple form of Gompertz distribution is used for the improper baseline subdistribution of the event of interest. Maximum likelihood inferences on regression parameters and associated cumulative incidence function are developed for parametric models, including a flexible generalized odds rate model. Estimation of the long-term proportion of patients with cause-specific events is straightforward in the parametric setting. Simple goodness-of-fit tests are discussed for evaluating a fixed odds rate assumption. The parametric regression methods are compared with an existing semiparametric regression analysis on a breast cancer data set where the cumulative incidence of recurrence is of interest. The results demonstrate that the likelihood-based parametric analyses for the cumulative incidence function are a practically useful alternative to the semiparametric analyses.
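One common parameterization of the improper Gompertz subdistribution (our notation, which may differ from the paper's) makes the long-term proportion of cause-specific events explicit: for negative shape the curve plateaus below one.

```python
import numpy as np

def gompertz_cif(t, a, b):
    """Gompertz subdistribution F(t) = 1 - exp(-(b/a) * (exp(a*t) - 1)),
    shape a, rate b > 0. For a < 0, F plateaus below one as t grows,
    so it is an improper distribution suitable as a baseline cumulative
    incidence function for a competing risk."""
    t = np.asarray(t, dtype=float)
    return 1.0 - np.exp(-(b / a) * np.expm1(a * t))

def long_term_proportion(a, b):
    """Limit of F(t) as t -> infinity when a < 0: the long-term
    proportion of patients experiencing the event of interest."""
    return 1.0 - np.exp(b / a)
```

This closed-form plateau is why, as the abstract notes, estimating the long-term proportion of cause-specific events is straightforward in the parametric setting.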

Journal ArticleDOI
TL;DR: In this article, a groupwise empirical likelihood procedure was proposed to handle the inter-series dependence for the longitudinal semiparametric regression model, and employed bias correction to construct the empirical likelihood ratio functions for the parameters of interest.
Abstract: A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals.

Journal ArticleDOI
TL;DR: In this article, a fully nonparametric model that captures nonlinearity in both continuous and categorical variables was employed to estimate a hedonic price function, which gave more intuitive and meaningful results than the semiparametric procedure.
Abstract: In this paper we attempt to replicate the results of an article (Anglin and Gencay 1996) published in this journal which applied semiparametric procedures to estimate a hedonic price function. To relax additional restrictive assumptions, we also employ a fully nonparametric model that captures nonlinearity in both continuous and categorical variables. We find that the nonparametric procedure gives more intuitive and meaningful results.

Journal ArticleDOI
TL;DR: In this paper, a new consistent variable selection method, called separated cross-validation, is proposed, which leads to single-index models with selected variables that have better prediction capability than models based on all the covariates.
Abstract: Summary. We consider variable selection in the single-index model. We prove that the popular leave-m-out cross-validation method has different behaviour in the single-index model from that in linear regression models or nonparametric regression models. A new consistent variable selection method, called separated cross-validation, is proposed. Further analysis suggests that the method has better finite-sample performance and is computationally easier than leave-m-out cross-validation. Separated cross-validation, applied to the Swiss banknotes data and the ozone concentration data, leads to single-index models with selected variables that have better prediction capability than models based on all the covariates.

Journal ArticleDOI
TL;DR: A generalization of the EM algorithm to semiparametric mixture models is proposed, and the behavior of the proposed EM type estimators is studied numerically not only through several Monte-Carlo experiments but also through comparison with alternative methods existing in the literature.
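For intuition, the fully parametric special case, EM for a two-component Gaussian mixture, is easy to write down; the semiparametric generalization replaces the parametric component densities with nonparametric estimates inside the same E/M iteration. A generic sketch (not the paper's algorithm):

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """Classical EM for a univariate two-component Gaussian mixture,
    the fully parametric special case that semiparametric EM-type
    estimators extend. Returns weights, means, standard deviations."""
    mu = np.array([x.min(), x.max()])      # crude but separated init
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities under current parameters
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
               / (sd * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted moment updates
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd
```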

Journal Article
TL;DR: Although it seems that there may not be a single model that is substantially better than the others, in univariate analysis the data strongly supported the lognormal regression among the parametric models, and it can lead to more precise results as an alternative to Cox regression.
Abstract: Background Researchers in medical sciences often prefer the Cox semi-parametric model over parametric models for survival analysis because it requires fewer assumptions, but under certain circumstances parametric models give more precise estimates. The objective of this study was to compare two survival regression methods - Cox regression and parametric models - in patients with gastric adenocarcinomas who registered at Taleghani hospital, Tehran. Methods We retrospectively studied 746 cases from February 2003 through January 2007. Gender, age at diagnosis, family history of cancer, tumor size and pathologic distant metastasis were selected as potential prognostic factors and entered into the parametric and semi-parametric models. Weibull, exponential and lognormal regressions were performed as parametric models, with the Akaike Information Criterion (AIC) and standardized parameter estimates used to compare the efficiency of the models. Results The survival results from both the Cox and parametric models showed that patients who were older than 45 years at diagnosis had an increased risk of death, followed by greater tumor size and presence of pathologic distant metastasis. Conclusion In multivariate analysis the Cox and exponential models are similar. Although it seems that there may not be a single model that is substantially better than the others, in univariate analysis the data strongly supported the lognormal regression among the parametric models, and it can lead to more precise results as an alternative to Cox regression.
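The parametric comparison rests on AIC = 2k - 2 log L. For the exponential model with right censoring, the MLE and AIC are available in closed form; a simplified sketch (not the study's code), where `event` is 1 for deaths and 0 for censored cases:

```python
import numpy as np

def exponential_aic(time, event):
    """MLE and AIC for an exponential survival model with right
    censoring: lambda_hat = total events / total follow-up time,
    log L = d * log(lambda) - lambda * sum(time), k = 1 parameter."""
    d, total = event.sum(), time.sum()
    lam = d / total
    loglik = d * np.log(lam) - lam * total
    return lam, 2 * 1 - 2 * loglik
```

Fitting Weibull and lognormal models the same way and ranking the three AIC values is the model-comparison step the abstract describes.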

Journal ArticleDOI
TL;DR: In this article, a class of semiparametric transformation models with random effects for the intensity function of the counting process is studied and the nonparametric maximum likelihood estimators (NPMLEs) for the parameters of these models are consistent and asymptotically normal.
Abstract: In this article we study a class of semiparametric transformation models with random effects for the intensity function of the counting process. These models provide considerable flexibility in formulating the effects of possibly time-dependent covariates on the developments of recurrent events while accounting for the dependence of the recurrent event times within the same subject. We show that the nonparametric maximum likelihood estimators (NPMLEs) for the parameters of these models are consistent and asymptotically normal. The limiting covariance matrices for the estimators of the regression parameters achieve the semiparametric efficiency bounds and can be consistently estimated. The limiting covariance function for the estimator of any smooth functional of the cumulative intensity function also can be consistently estimated. We develop a simple and stable EM algorithm to compute the NPMLEs as well as the variance and covariance estimators. Simulation studies demonstrate that the proposed methods per...

Journal ArticleDOI
TL;DR: The aim of this paper is to propose a SAS macro to estimate parametric and semiparametric mixture cure models with covariates and an example in the field of cancer clinical trials is shown.

Journal ArticleDOI
TL;DR: A latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model is introduced.
Abstract: In this paper we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model. We extend existing LVMs with the usual linear covariate effects by including nonparametric components for nonlinear effects of continuous covariates and interactions with other covariates as well as spatial effects. Full Bayesian modelling is based on penalized spline and Markov random field priors and is performed by computationally efficient Markov chain Monte Carlo (MCMC) methods. We apply our approach to a German social science survey which motivated our methodological development.

Journal ArticleDOI
TL;DR: The results indicate that the flexibility of this general class of models provides a safeguard for analyzing recurrent event data, even data possibly arising from a frailtyless mechanism.

Journal ArticleDOI
TL;DR: In this article, the authors considered binary response models where errors are uncorrelated with a set of instrumental variables and are independent of a continuous regressor v, conditional on all other variables.

Journal ArticleDOI
TL;DR: A semiparametric efficient estimator under minimal assumptions when the panel model contains a lagged dependent variable is developed and applied to analyze the structure of demand between city pairs for selected U.S. airlines during the period 1979 I–1992 IV.

Journal ArticleDOI
TL;DR: In this article, the authors consider a partially linear model where the vector of coefficients β in the linear part can be partitioned as (β1, β2), where β1 is the coefficient vector for main effects (e.g. treatment effect, genetic effects) and β2 is a vector for nuisance effects.
Abstract: Summary We consider a partially linear model in which the vector of coefficients β in the linear part can be partitioned as (β1, β2), where β1 is the coefficient vector for main effects (e.g. treatment effect, genetic effects) and β2 is a vector for ‘nuisance’ effects (e.g. age, laboratory). In this situation, inference about β1 may benefit from moving the least squares estimate for the full model in the direction of the least squares estimate without the nuisance variables (Steinian shrinkage), or from dropping the nuisance variables if there is evidence that they do not provide useful information (pretesting). We investigate the asymptotic properties of Stein-type and pretest semiparametric estimators under quadratic loss and show that, under general conditions, a Stein-type semiparametric estimator improves on the full model conventional semiparametric least squares estimator. The relative performance of the estimators is examined using asymptotic analysis of quadratic risk functions and it is found that the Stein-type estimator outperforms the full model estimator uniformly. By contrast, the pretest estimator dominates the least squares estimator only in a small part of the parameter space, which is consistent with the theory. We also consider an absolute penalty-type estimator for partially linear models and give a Monte Carlo simulation comparison of shrinkage, pretest and the absolute penalty-type estimators. The comparison shows that the shrinkage method performs better than the absolute penalty-type estimation method when the dimension of the β2 parameter space is large.
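A purely linear caricature of the Stein-type estimator can be written in a few lines: shrink the full-model estimate of the main effects toward the restricted estimate that drops the nuisance block, with a positive-part weight based on the Wald statistic for the nuisance coefficients. All names and the specific weight are our own illustration (and assume the nuisance block has more than two columns), not the paper's semiparametric estimator:

```python
import numpy as np

def stein_shrink(X1, X2, y):
    """Positive-part Stein-type shrinkage for beta1 in the linear model
    y = X1 beta1 + X2 beta2 + e: moves the restricted estimate
    (dropping X2) toward the full estimate by 1 - (p2 - 2)/T, where T
    is the Wald statistic for beta2 = 0. Requires p2 > 2."""
    X = np.hstack([X1, X2])
    p1, p2 = X1.shape[1], X2.shape[1]
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_restr = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X @ beta_full
    s2 = resid @ resid / (len(y) - p1 - p2)
    cov = s2 * np.linalg.inv(X.T @ X)
    b2 = beta_full[p1:]
    T = b2 @ np.linalg.solve(cov[p1:, p1:], b2)   # Wald statistic
    shrink = max(0.0, 1.0 - (p2 - 2) / T)         # positive-part weight
    return beta_restr + shrink * (beta_full[:p1] - beta_restr)
```

When the nuisance effects are strong, T is large and the estimator stays close to the full-model fit; when they are negligible, it moves toward the restricted fit, which is the intuition behind the uniform risk improvement discussed above.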

Journal ArticleDOI
TL;DR: In this article, the authors consider a semiparametric version of the problem, where the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model.
Abstract: Summary. Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam’s contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and also have implications for pointwise-consistent model selectors. Some keywords: Akaike information criterion; Bayes information criterion; Efficient semiparametric estimation; Frequentist model averaging; Model averaging; Model selection; Profile likelihood; Semiparametric model.

Journal ArticleDOI
TL;DR: In this paper, the authors consider marginal semiparametric partially linear models for longitudinal/clustered data and propose an estimation procedure based on a spline approximation of the non-parametric part of the model and an extension of the parametric marginal generalized estimating equations (GEE).
Abstract: . We consider marginal semiparametric partially linear models for longitudinal/clustered data and propose an estimation procedure based on a spline approximation of the non-parametric part of the model and an extension of the parametric marginal generalized estimating equations (GEE). Our estimates of both parametric part and non-parametric part of the model have properties parallel to those of parametric GEE, that is, the estimates are efficient if the covariance structure is correctly specified and they are still consistent and asymptotically normal even if the covariance structure is misspecified. By showing that our estimate achieves the semiparametric information bound, we actually establish the efficiency of estimating the parametric part of the model in a stronger sense than what is typically considered for GEE. The semiparametric efficiency of our estimate is obtained by assuming only conditional moment restrictions instead of the strict multivariate Gaussian error assumption.

Journal ArticleDOI
TL;DR: In this paper, a large number of functions differing from each other only by a translation parameter are observed, and the shift parameters are estimated using the Fourier transform, which enables to transform this statistical problem into a semi-parametric framework.
Abstract: We observe a large number of functions differing from each other only by a translation parameter. While the main pattern is unknown, we propose to estimate the shift parameters using $M$-estimators. The Fourier transform enables us to recast this statistical problem in a semiparametric framework. We study the convergence of the estimator and provide its asymptotic behavior. Moreover, we use the method in the applied case of velocity curve forecasting.
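For equally spaced samples and an integer shift, the Fourier route amounts to maximizing the circular cross-correlation, computed in O(n log n) via the FFT. This is a toy sketch of the idea, not the paper's M-estimator:

```python
import numpy as np

def estimate_shift(f, g):
    """Estimate the integer circular translation s with g[n] = f[n - s]
    by maximizing the circular cross-correlation of f and g, computed
    in the Fourier domain via the cross-correlation theorem."""
    F, G = np.fft.fft(f), np.fft.fft(g)
    xcorr = np.fft.ifft(F.conj() * G).real   # xcorr[k] = sum_n f[n] g[n+k]
    return int(np.argmax(xcorr))
```

Sub-sample shifts would require interpolating the correlation peak or working with the phase slope directly, which is closer to what the semiparametric formulation handles.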


Posted Content
01 Jan 2007
TL;DR: In this article, the authors analyzed several volatility models by examining their ability to forecast the Value-at-Risk (VaR) for two different time periods and two capitalization weighting schemes.
Abstract: This paper analyses several volatility models by examining their ability to forecast the Value-at-Risk (VaR) for two different time periods and two capitalization weighting schemes. Specifically, VaR is calculated for large and small capitalization stocks, based on Dow Jones (DJ) Euro Stoxx indices, and is modeled for long and short trading positions by using non-parametric, semi-parametric and parametric methods. In order to choose one model among the various forecasting methods, a two-stage backtesting procedure is implemented. In the first stage the unconditional coverage test is used to examine the statistical accuracy of the models. In the second stage a loss function is applied to investigate whether the differences between the models that calculated the VaR accurately are statistically significant. Under this framework, the combination of a parametric model with historical simulation produced robust results across the sample periods, market capitalization schemes, trading positions and confidence levels, and therefore provides a reliable risk measure.
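The first-stage unconditional coverage test (Kupiec's test) compares the observed VaR violation rate with the nominal level; under the null the likelihood-ratio statistic is asymptotically chi-squared with one degree of freedom, so values above about 3.84 reject at the 5% level. A minimal implementation (assuming at least one, and not all, observations are violations):

```python
import numpy as np

def kupiec_lr(n_obs, n_viol, p):
    """Kupiec unconditional coverage statistic: likelihood ratio
    comparing the observed violation rate n_viol / n_obs with the
    nominal VaR level p, under a Bernoulli model for violations.
    Requires 0 < n_viol < n_obs."""
    pi = n_viol / n_obs
    def ll(q):  # Bernoulli log-likelihood at violation probability q
        return n_viol * np.log(q) + (n_obs - n_viol) * np.log(1 - q)
    return -2.0 * (ll(p) - ll(pi))
```

The second backtesting stage then ranks the models that pass this test using a loss function, as described above.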

Journal ArticleDOI
TL;DR: The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy, and achieves the desired classification with a semiparametric hierarchical model.
Abstract: We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods.

Journal ArticleDOI
TL;DR: In this paper, the authors propose two ways of dealing with the problem: (1) Estimate Lorenz curves using parametric models and (2) combine empirical estimation with a parametric (robust) estimation of the upper tail of the distribution using the Pareto model.
Abstract: Lorenz curves and second-order dominance criteria, the fundamental tools for stochastic dominance, are known to be sensitive to data contamination in the tails of the distribution. We propose two ways of dealing with the problem: (1) Estimate Lorenz curves using parametric models and (2) combine empirical estimation with a parametric (robust) estimation of the upper tail of the distribution using the Pareto model. Approach (2) is preferred because of its flexibility. Using simulations we show the dramatic effect of a few contaminated data on the Lorenz ranking and the performance of the robust semi-parametric approach (2). Since estimation is only a first step for statistical inference and since semi-parametric models are not straightforward to handle, we also derive asymptotic covariance matrices for our semi-parametric estimators.
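Approach (2) can be sketched as an empirical Lorenz curve combined with a parametric fit of the Pareto tail index; here the tail index is obtained with a Hill-type estimator for illustration only, whereas the paper uses a robust estimator:

```python
import numpy as np

def lorenz_curve(x):
    """Empirical Lorenz curve: cumulative income share L(p) at each
    cumulative population share p, for nonnegative incomes x."""
    xs = np.sort(x)
    cum = np.cumsum(xs)
    return np.arange(1, len(xs) + 1) / len(xs), cum / cum[-1]

def hill_alpha(x, k):
    """Hill estimator of the Pareto tail index alpha from the k
    largest observations: the parametric tail piece of approach (2)."""
    xs = np.sort(x)[::-1]
    return k / np.sum(np.log(xs[:k] / xs[k]))
```

Replacing the empirical upper tail with the fitted Pareto piece before computing the Lorenz curve is what shields the ranking from a few contaminated observations in the tail.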