Showing papers on "Coverage probability published in 2006"


Journal ArticleDOI
TL;DR: This review describes for the practitioner the essential features of line‐fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions.
Abstract: Fitting a line to a bivariate dataset can be a deceptively complex problem, and there has been much debate on this issue in the literature. In this review, we describe for the practitioner the essential features of line-fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions. A particularly important point for line-fitting in allometry is that usually, two sources of error are present (which we call measurement and equation error), and these have quite different implications for choice of line-fitting method. As a consequence, the approach in this review and the methods presented have subtle but important differences from previous reviews in the biology literature. Linear regression, major axis and standardised major axis are alternative methods that can be appropriate when there is no measurement error. When there is measurement error, this often needs to be estimated and used to adjust the variance terms in formulae for line-fitting. We also review line-fitting methods for phylogenetic analyses. Methods of inference are described for the line-fitting techniques discussed in this paper. The types of inference considered here are testing if the slope or elevation equals a given value, constructing confidence intervals for the slope or elevation, comparing several slopes or elevations, and testing for shift along the axis amongst several groups. In some cases several methods have been proposed in the literature. These are discussed and compared. In other cases there is little or no previous guidance available in the literature. Simulations were conducted to check whether the methods of inference proposed have the intended coverage probability or Type I error. We identified the methods of inference that perform well and recommend the techniques that should be adopted in future work.

1,952 citations
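
As a rough illustration of the three line-fitting methods discussed above, the sketch below compares ordinary least squares, major axis and standardised major axis slope estimates on simulated log-scale allometric data. It is a minimal sketch assuming no measurement error (so no variance adjustment is applied); the simulated data and variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bivariate (log-transformed) allometric data with equation error only.
n = 200
x = rng.normal(2.0, 0.5, n)
y = 0.75 * x + 0.3 + rng.normal(0.0, 0.2, n)

sx, sy = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]

# Ordinary least squares slope: minimises vertical residuals.
b_ols = r * sy / sx

# Major axis slope: first principal axis of the covariance matrix
# (minimises perpendicular residuals).
cov = np.cov(x, y)
eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, np.argmax(eigvals)]      # eigenvector of the largest eigenvalue
b_ma = v[1] / v[0]

# Standardised major axis (reduced major axis) slope: sign(r) * sy / sx.
b_sma = np.sign(r) * sy / sx

print(f"OLS slope: {b_ols:.3f}  MA slope: {b_ma:.3f}  SMA slope: {b_sma:.3f}")
```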


Journal ArticleDOI
TL;DR: Two methods are provided to correct relative risk estimates obtained from logistic regression models for measurement errors in continuous exposures within cohort studies that may be due to either random (unbiased) within-person variation or to systematic errors for individual subjects.
Abstract: Errors in the measurement of exposure that are independent of disease status tend to bias relative risk estimates and other measures of effect in epidemiologic studies toward the null value. Two methods are provided to correct relative risk estimates obtained from logistic regression models for measurement errors in continuous exposures within cohort studies that may be due to either random (unbiased) within-person variation or to systematic errors for individual subjects. These methods require a separate validation study to estimate the regression coefficient lambda relating the surrogate measure to true exposure. In the linear approximation method, the true logistic regression coefficient beta* is estimated by beta/lambda, where beta is the observed logistic regression coefficient based on the surrogate measure. In the likelihood approximation method, a second-order Taylor series expansion is used to approximate the logistic function, enabling closed-form likelihood estimation of beta*. Confidence intervals for the corrected relative risks are provided that include a component representing error in the estimation of lambda. Based on simulation studies, both methods perform well for true odds ratios up to 3.0; for higher odds ratios the likelihood approximation method was superior with respect to both bias and coverage probability. An example is provided based on data from a prospective study of dietary fat intake and risk of breast cancer and a validation study of the questionnaire used to assess dietary fat intake.

649 citations
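
A minimal sketch of the linear approximation correction described above, assuming a simulated main study and validation study: lambda is estimated by regressing the true exposure on the surrogate, and the observed logistic coefficient is divided by lambda. All data and names here are illustrative; the paper's confidence intervals incorporating uncertainty in the estimation of lambda are not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# --- Main study: outcome y and an error-prone surrogate exposure w ---
n = 5000
x_true = rng.normal(0.0, 1.0, n)                  # unobserved true exposure
w = x_true + rng.normal(0.0, 0.8, n)              # surrogate with random within-person error
p = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * x_true)))  # true log odds depend on x_true
y = rng.binomial(1, p)

beta_obs = sm.Logit(y, sm.add_constant(w)).fit(disp=0).params[1]

# --- Validation study: both the surrogate and the true exposure are measured ---
m = 500
xv = rng.normal(0.0, 1.0, m)
wv = xv + rng.normal(0.0, 0.8, m)
lam = sm.OLS(xv, sm.add_constant(wv)).fit().params[1]   # slope of true exposure on surrogate

# Linear approximation correction: beta* = beta / lambda
beta_corrected = beta_obs / lam
print(f"observed beta = {beta_obs:.3f}, lambda = {lam:.3f}, corrected beta = {beta_corrected:.3f}")
print(f"corrected odds ratio per unit of true exposure ~ {np.exp(beta_corrected):.2f}")
```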


Journal ArticleDOI
TL;DR: This work provides some examples of separation and near-separation in clinical data sets and discusses some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach.
Abstract: In logistic regression analysis of small or sparse data sets, results obtained by classical maximum likelihood methods cannot be generally trusted. In such analyses it may even happen that the likelihood meets the convergence criteria while at least one parameter estimate diverges to +/-infinity. This situation has been termed 'separation', and it typically occurs whenever no events are observed in one of the two groups defined by a dichotomous covariate. More generally, separation is caused by a linear combination of continuous or dichotomous covariates that perfectly separates events from non-events. Separation implies infinite or zero maximum likelihood estimates of odds ratios, which are usually considered unrealistic. I provide some examples of separation and near-separation in clinical data sets and discuss some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach. Both methods supply finite point estimates in case of separation. Profile penalized likelihood confidence intervals for parameters show excellent behaviour in terms of coverage probability and provide higher power than exact confidence intervals. General advantages of the penalized likelihood approach are discussed.

334 citations
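
One of the options mentioned above, the penalized likelihood (Firth-type) approach, can be sketched with bare-bones Fisher-scoring iterations on the Jeffreys-prior-modified score. This is only a sketch under my own toy data (no step-halving, no profile penalized likelihood confidence intervals), not the paper's full procedure.

```python
import numpy as np

def firth_logit(X, y, n_iter=50, tol=1e-8):
    """Firth-type penalised-likelihood logistic regression (Jeffreys prior penalty).

    Returns finite coefficient estimates even under complete separation.
    X should include an intercept column. Step-halving is omitted for brevity.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        eta = X @ beta
        pi = 1.0 / (1.0 + np.exp(-eta))
        W = pi * (1.0 - pi)                       # logistic weights
        XtWX = X.T @ (X * W[:, None])             # Fisher information
        XtWX_inv = np.linalg.inv(XtWX)
        # Hat-matrix diagonals h_i of W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = np.einsum('ij,jk,ik->i', X * W[:, None], XtWX_inv, X)
        # Firth-modified score: add h_i * (1/2 - pi_i) to each residual
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = XtWX_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# A completely separated toy data set: x perfectly predicts y,
# so ordinary maximum likelihood drives the slope towards infinity.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
print(firth_logit(X, y))   # finite intercept and slope
```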


Journal ArticleDOI
TL;DR: In this article, a subclass of generalized pivotal quantities, called fiducial generalized pivotal quantities (FGPQs), is proposed and shown to have correct frequentist coverage, at least asymptotically.
Abstract: Generalized pivotal quantities (GPQs) and generalized confidence intervals (GCIs) have proven to be useful tools for making inferences in many practical problems. Although GCIs are not guaranteed to have exact frequentist coverage, a number of published and unpublished simulation studies suggest that the coverage probabilities of such intervals are sufficiently close to their nominal value so as to be useful in practice. In this article we single out a subclass of generalized pivotal quantities, which we call fiducial generalized pivotal quantities (FGPQs), and show that under some mild conditions, GCIs constructed using FGPQs have correct frequentist coverage, at least asymptotically. We describe three general approaches for constructing FGPQs—a recipe based on invertible pivotal relationships, and two extensions of it—and demonstrate their usefulness by deriving some previously unknown GPQs and GCIs. It is fair to say that nearly every published GCI can be obtained using one of these recipes. As an inte...

286 citations
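
As a concrete illustration of a generalized confidence interval built from a fiducial generalized pivotal quantity, the sketch below uses the textbook GPQ for a lognormal mean; this example is standard in the GPQ literature and is not necessarily one of the new GPQs derived in the article. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed log-scale summaries from a lognormal sample.
data = rng.lognormal(mean=1.0, sigma=0.6, size=25)
logx = np.log(data)
n, ybar, s2 = len(logx), logx.mean(), logx.var(ddof=1)

# Fiducial GPQ for theta = exp(mu + sigma^2 / 2), the lognormal mean:
#   R_sigma2 = (n - 1) s^2 / U,             U ~ chi^2_{n-1}
#   R_mu     = ybar - Z sqrt(R_sigma2 / n), Z ~ N(0, 1)
#   R_theta  = exp(R_mu + R_sigma2 / 2)
B = 100_000
U = rng.chisquare(n - 1, B)
Z = rng.standard_normal(B)
R_sigma2 = (n - 1) * s2 / U
R_mu = ybar - Z * np.sqrt(R_sigma2 / n)
R_theta = np.exp(R_mu + R_sigma2 / 2.0)

lo, hi = np.percentile(R_theta, [2.5, 97.5])
print(f"95% generalized CI for the lognormal mean: ({lo:.3f}, {hi:.3f})")
```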


Journal ArticleDOI
TL;DR: In this paper, a flexible class of zero-inflated models, such as the zero-inflated Poisson (ZIP) model, is introduced as an alternative to traditional maximum-likelihood-based methods to analyze defect counts.

180 citations


Journal ArticleDOI
TL;DR: In this paper, Monte Carlo sampling-based procedures for assessing solution quality in stochastic programs are developed. Quality is defined via the optimality gap, and the procedures' output is a confidence interval on this gap.
Abstract: Determining whether a solution is of high quality (optimal or near optimal) is fundamental in optimization theory and algorithms. In this paper, we develop Monte Carlo sampling-based procedures for assessing solution quality in stochastic programs. Quality is defined via the optimality gap and our procedures' output is a confidence interval on this gap. We review a multiple-replications procedure that requires solution of, say, 30 optimization problems and then, we present a result that justifies a computationally simplified single-replication procedure that only requires solving one optimization problem. Even though the single replication procedure is computationally significantly less demanding, the resulting confidence interval might have low coverage probability for small sample sizes for some problems. We provide variants of this procedure that require two replications instead of one and that perform better empirically. We present computational results for a newsvendor problem and for two-stage stochastic linear programs from the literature. We also discuss when the procedures perform well and when they fail, and we propose using ɛ-optimal solutions to strengthen the performance of our procedures.

140 citations
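
A simplified sketch of the multiple-replications idea for a newsvendor problem: each replication draws a demand sample, solves the sampled problem exactly (its optimum is a sample quantile), and records a nonnegative gap estimate for a candidate order quantity; a one-sided t interval on the mean gap then bounds the optimality gap. The cost parameters, demand distribution and candidate solution are assumptions of mine, and the paper's single- and two-replication variants are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

c, r = 1.0, 2.5                        # unit cost and selling price (no salvage value)
crit_ratio = (r - c) / r               # the optimal order quantity is this demand quantile

def cost(x, d):
    """Newsvendor cost for order quantity x and demand d (lower is better)."""
    return c * x - r * np.minimum(x, d)

x_hat = 6.0                            # candidate solution whose quality we want to assess
k, n = 30, 500                         # number of replications and sample size per replication

gaps = np.empty(k)
for j in range(k):
    d = rng.exponential(scale=10.0, size=n)                  # sampled demand scenarios
    d_sorted = np.sort(d)
    x_star_n = d_sorted[int(np.ceil(n * crit_ratio)) - 1]    # optimum of the sampled problem
    gaps[j] = np.mean(cost(x_hat, d) - cost(x_star_n, d))    # nonnegative gap estimate

gbar, sg = gaps.mean(), gaps.std(ddof=1)
upper = gbar + stats.t.ppf(0.95, k - 1) * sg / np.sqrt(k)
print(f"estimated gap = {gbar:.4f}, one-sided 95% bound on the optimality gap = {upper:.4f}")
```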


Journal ArticleDOI
TL;DR: This article uses a receiver operating characteristic (ROC) surface to describe the probabilities of correct classifications into three diagnostic groups based on various sets of diagnostic thresholds of a test and proposes to use the entire and the partial volume under the surface to measure the diagnostic accuracy.
Abstract: This article studies the problem of measuring and estimating the diagnostic accuracy when there are three ordinal diagnostic groups. We use a receiver operating characteristic (ROC) surface to describe the probabilities of correct classifications into three diagnostic groups based on various sets of diagnostic thresholds of a test and propose to use the entire and the partial volume under the surface to measure the diagnostic accuracy. Mathematical properties and probabilistic interpretations of the proposed measure of diagnostic accuracy are discussed. Under the assumption of normal distributions of the diagnostic test in the three diagnostic groups, we present the maximum likelihood estimate of the volume under the ROC surface and give the asymptotic variance of the estimate. We further propose several asymptotic confidence interval estimates of the volume under the ROC surface. The performance of these confidence interval estimates is evaluated in terms of attaining the nominal coverage probability based on a simulation study. In addition, we develop a method of sample size determination to achieve an adequate accuracy of the confidence interval estimate. Finally, we demonstrate the proposed methodology by applying it to the clinical diagnosis of early stage Alzheimer's disease based on the neuropsychological database of the Washington University Alzheimer's Disease Research Center.

99 citations
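
The volume under the ROC surface for three normal groups, VUS = P(X1 < X2 < X3), can be computed by one-dimensional numerical integration as sketched below; this illustrates only the accuracy measure itself, not the paper's maximum likelihood estimation, variance formulas or confidence intervals. The group means and standard deviations are made up.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def vus_normal(mu, sigma):
    """Volume under the ROC surface, P(X1 < X2 < X3), for three normal groups.

    mu, sigma: length-3 sequences of group means and standard deviations,
    ordered from the 'lowest' to the 'highest' diagnostic group.
    """
    m1, m2, m3 = mu
    s1, s2, s3 = sigma
    # Condition on the middle group's value x:
    #   P(X1 < x) * P(X3 > x) * f_2(x), integrated over x.
    integrand = lambda x: (stats.norm.cdf(x, m1, s1)
                           * stats.norm.sf(x, m3, s3)
                           * stats.norm.pdf(x, m2, s2))
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# Chance level is 1/6; well-separated groups give a VUS close to 1.
print(vus_normal([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))   # ~0.1667
print(vus_normal([0.0, 1.5, 3.0], [1.0, 1.0, 1.0]))   # clearly above chance
```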


Journal ArticleDOI
TL;DR: In this paper, a large-sample analysis of the minimal coverage probability of the usual confidence intervals for regression parameters when the underlying model is chosen by a conservative (or overconsistent) model selection procedure is given.
Abstract: We give a large-sample analysis of the minimal coverage probability of the usual confidence intervals for regression parameters when the underlying model is chosen by a “conservative” (or “overconsistent”) model selection procedure. We derive an upper bound for the large-sample limit minimal coverage probability of such intervals that applies to a large class of model selection procedures including the Akaike information criterion as well as various pretesting procedures. This upper bound can be used as a safeguard to identify situations where the actual coverage probability can be far below the nominal level. We illustrate that the (asymptotic) upper bound can be statistically meaningful even in rather small samples.

91 citations


Journal ArticleDOI
TL;DR: In this article, an adjusted pseudo-empirical likelihood ratio statistic that is asymptotically distributed as a chi-square random variable is used to construct confidence intervals for a finite population mean or finite population distribution function.
Abstract: The authors show how an adjusted pseudo-empirical likelihood ratio statistic that is asymptotically distributed as a chi-square random variable can be used to construct confidence intervals for a finite population mean or a finite population distribution function from complex survey samples. They consider both non-stratified and stratified sampling designs, with or without auxiliary information. They examine the behaviour of estimates of the mean and the distribution function at specific points using simulations calling on the Rao-Sampford method of unequal probability sampling without replacement. They conclude that the pseudo-empirical likelihood ratio confidence intervals are superior to those based on the normal approximation, whether in terms of coverage probability, tail error rates or average length of the intervals.

82 citations


Journal ArticleDOI
TL;DR: This paper gives an overview of point and interval estimates for flexible designs and proposes confidence intervals that have nominal coverage probability even after an unforeseen design adaptation and that contain the maximum likelihood estimate and the usual unadjusted confidence interval.
Abstract: Adaptive test designs for clinical trials allow for a wide range of data-driven design adaptations using all information gathered until an interim analysis. The basic principle is to use a test statistic which is invariant with respect to the design adaptations under the null hypothesis. This allows the type I error rate for the primary hypothesis to be controlled even for adaptations not specified a priori in the study protocol. Estimation is usually another important part of a clinical trial; however, it is more difficult in adaptive designs. In this paper we give an overview of point and interval estimates for flexible designs and compare methods for typical sample size rules. We also propose confidence intervals that have nominal coverage probability even after an unforeseen design adaptation and that contain the maximum likelihood estimate and the usual unadjusted confidence interval.

73 citations


Journal ArticleDOI
TL;DR: Adaptive confidence balls are constructed for individual resolution levels as well as the entire mean vector in a multiresolution framework; in addition, honest confidence balls are constructed which have guaranteed coverage probability over all of $\mathbb{R}^N$ and expected squared radius adapting over a maximum range of Besov bodies.
Abstract: Adaptive confidence balls are constructed for individual resolution levels as well as the entire mean vector in a multiresolution framework. Finite sample lower bounds are given for the minimum expected squared radius for confidence balls with a prespecified confidence level. The confidence balls are centered on adaptive estimators based on special local block thresholding rules. The radius is derived from an analysis of the loss of this adaptive estimator. In addition adaptive honest confidence balls are constructed which have guaranteed coverage probability over all of $\mathbb{R}^N$ and expected squared radius adapting over a maximum range of Besov bodies.

Journal ArticleDOI
TL;DR: In this article, the authors compared three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic: the noncentral-t-based, percentile (PERC) bootstrap, and bias-corrected and accelerated (BCa) bootstrap methods, under three conditions of nonnormality, eight cases of sample size, and six cases of population effect size (ES) magnitude.
Abstract: Kelley compared three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic: the noncentral-t-based, percentile (PERC) bootstrap, and bias-corrected and accelerated (BCa) bootstrap methods under three conditions of nonnormality, eight cases of sample size, and six cases of population effect size (ES) magnitude. Kelley recommended the BCa bootstrap method. The authors expand on his investigation by including additional cases of nonnormality. Like Kelley, they find that under many conditions, the BCa bootstrap method works best; however, they also find that in some cases of nonnormality, the method does not control probability coverage. The authors also define a robust parameter for ES and a robust sample statistic, based on trimmed means and Winsorized variances, and cite evidence that coverage probability for this parameter is good over the range of nonnormal distributions investigated when the PERC bootstrap method is used to set CIs for the robust ES.
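
A minimal sketch of a percentile-bootstrap confidence interval for a robust effect size based on trimmed means and Winsorized variances is shown below. The rescaling constant 0.642 (intended to make the statistic comparable to Cohen's d under normality with 20% trimming) and the simulated heavy-tailed data are assumptions on my part, not taken from the article.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

def robust_es(x, y, trim=0.2, scale=0.642):
    """Robust standardized mean difference: difference in trimmed means over a
    pooled Winsorized SD, rescaled (0.642 assumed for 20% trimming) to be
    roughly comparable to Cohen's d under normality."""
    xt, yt = stats.trim_mean(x, trim), stats.trim_mean(y, trim)
    vx = np.var(np.asarray(winsorize(x, limits=(trim, trim))), ddof=1)
    vy = np.var(np.asarray(winsorize(y, limits=(trim, trim))), ddof=1)
    s_w = np.sqrt(((len(x) - 1) * vx + (len(y) - 1) * vy) / (len(x) + len(y) - 2))
    return scale * (xt - yt) / s_w

rng = np.random.default_rng(4)
x = rng.standard_t(df=3, size=40) + 0.6     # heavy-tailed group with a location shift
y = rng.standard_t(df=3, size=40)

# Percentile (PERC) bootstrap CI: resample within each group and recompute the ES.
B = 2000
boot = np.array([robust_es(rng.choice(x, len(x), replace=True),
                           rng.choice(y, len(y), replace=True)) for _ in range(B)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"robust ES = {robust_es(x, y):.3f}, 95% percentile bootstrap CI = ({lo:.3f}, {hi:.3f})")
```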

Journal ArticleDOI
TL;DR: The authors show that replacement values below the limit of detection, including those suggested, result in the same biased area under the curve when properly accounted for, but they also provide guidance on the usefulness of these values in limited situations.
Abstract: The receiver operating characteristic curve is a commonly used tool for evaluating biomarker usefulness in clinical diagnosis of disease. Frequently, biomarkers being assessed have immeasurable or unreportable samples below some limit of detection. Ignoring observations below the limit of detection leads to negatively biased estimates of the area under the curve. Several correction methods are suggested in the areas of mean estimation and testing but nothing regarding the receiver operating characteristic curve or its summary measures. In this paper, the authors show that replacement values below the limit of detection, including those suggested, result in the same biased area under the curve when properly accounted for, but they also provide guidance on the usefulness of these values in limited situations. The authors demonstrate maximum likelihood techniques leading to asymptotically unbiased estimators of the area under the curve for both normally and gamma distributed biomarker levels. Confidence intervals are proposed, the coverage probability of which is scrutinized by simulation study. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods.
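
A hedged sketch of the maximum likelihood idea for normally distributed biomarkers with a lower limit of detection: values below the LOD enter the likelihood as left-censored observations, and the fitted means and standard deviations are plugged into the binormal AUC formula Phi((mu1 - mu0) / sqrt(sigma0^2 + sigma1^2)). The data, LOD and function names are illustrative, and the paper's confidence intervals are not reproduced.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def fit_censored_normal(x, lod):
    """ML fit of a normal mean/SD when values below the limit of detection
    are only known to be < lod (left-censored)."""
    observed = x[x >= lod]
    n_cens = np.sum(x < lod)

    def negloglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)                 # keeps sigma positive
        ll = stats.norm.logpdf(observed, mu, sigma).sum()
        ll += n_cens * stats.norm.logcdf(lod, mu, sigma)
        return -ll

    start = np.array([observed.mean(), np.log(observed.std(ddof=1))])
    res = minimize(negloglik, start, method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])

rng = np.random.default_rng(5)
lod = -0.5
controls = rng.normal(0.0, 1.0, 200)   # e.g. log biomarker levels, non-diseased group
cases = rng.normal(1.0, 1.2, 200)      # diseased group

mu0, s0 = fit_censored_normal(controls, lod)
mu1, s1 = fit_censored_normal(cases, lod)
auc = stats.norm.cdf((mu1 - mu0) / np.sqrt(s0**2 + s1**2))
print(f"ML binormal AUC accounting for the LOD: {auc:.3f}")
```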

Journal ArticleDOI
TL;DR: This paper derives five first-order likelihood-based confidence intervals for a population proportion parameter based on binary data subject to false-positive misclassification and obtained using a double sampling plan, and determines that an interval estimator derived from inverting a score-type statistic is superior, in terms of coverage probabilities, to three competing interval estimators for the parameter configurations examined here.

Journal ArticleDOI
TL;DR: In this paper, exact simultaneous confidence sets based on the multivariate t-distribution with estimated correlation matrix and a resampling approach are discussed, and approximate simultaneous confidence intervals are applied to ratios of linear combinations of the means in the one-way layout and ratios of parameter combinations in the general linear model.

Journal ArticleDOI
TL;DR: The ISO-GUM suggests that the distribution for the value of the measurand may be approximated by a scaled-and-shifted t-distribution with effective degrees of freedom obtained from the Welch–Satterthwaite (W–S) formula; in this paper, an approximate normal distribution based on a Bayesian uncertainty is proposed as an alternative to the t-distribution based on the W–S formula.
Abstract: In certain disciplines, uncertainty is traditionally expressed as an interval about an estimate for the value of the measurand. Development of such uncertainty intervals with a stated coverage probability based on the International Organization for Standardization (ISO) Guide to the Expression of Uncertainty in Measurement (GUM) requires a description of the probability distribution for the value of the measurand. The ISO-GUM propagates the estimates and their associated standard uncertainties for various input quantities through a linear approximation of the measurement equation to determine an estimate and its associated standard uncertainty for the value of the measurand. This procedure does not yield a probability distribution for the value of the measurand. The ISO-GUM suggests that under certain conditions motivated by the central limit theorem the distribution for the value of the measurand may be approximated by a scaled-and-shifted t-distribution with effective degrees of freedom obtained from the Welch–Satterthwaite (W–S) formula. The approximate t-distribution may then be used to develop an uncertainty interval with a stated coverage probability for the value of the measurand. We propose an approximate normal distribution based on a Bayesian uncertainty as an alternative to the t-distribution based on the W–S formula. A benefit of the approximate normal distribution based on a Bayesian uncertainty is that it greatly simplifies the expression of uncertainty by eliminating altogether the need for calculating effective degrees of freedom from the W–S formula. In the special case where the measurand is the difference between two means, each evaluated from statistical analyses of independent normally distributed measurements with unknown and possibly unequal variances, the probability distribution for the value of the measurand is known to be a Behrens–Fisher distribution. We compare the performance of the approximate normal distribution based on a Bayesian uncertainty and the approximate t-distribution based on the W–S formula with respect to the Behrens–Fisher distribution. The approximate normal distribution is simpler and better in this case. A thorough investigation of the relative performance of the two approximate distributions would require comparison for a range of measurement equations by numerical methods.
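
The sketch below illustrates the Welch–Satterthwaite effective-degrees-of-freedom calculation and contrasts the resulting t-based coverage factor with a simple normal factor. It does not implement the paper's Bayesian standard uncertainties, which would additionally rescale the Type A components; the uncertainty components and degrees of freedom are invented for illustration.

```python
import numpy as np
from scipy import stats

def ws_effective_dof(u, dof):
    """Welch-Satterthwaite effective degrees of freedom for uncertainty
    components u_i(y) = |c_i| u(x_i) with degrees of freedom nu_i."""
    u = np.asarray(u, dtype=float)
    dof = np.asarray(dof, dtype=float)
    u_c = np.sqrt(np.sum(u**2))                   # combined standard uncertainty
    return u_c**4 / np.sum(u**4 / dof)

# Two Type A components evaluated from 5 and 8 repeated readings,
# plus one Type B component treated as having effectively infinite dof.
u = [0.12, 0.08, 0.05]
dof = [4, 7, np.inf]

u_c = np.sqrt(np.sum(np.square(u)))
nu_eff = ws_effective_dof(u, dof)

# ISO-GUM style 95% expanded uncertainty using the t coverage factor ...
U_t = stats.t.ppf(0.975, nu_eff) * u_c
# ... versus the simpler normal (k = 1.96) factor discussed as an alternative.
U_norm = stats.norm.ppf(0.975) * u_c
print(f"nu_eff = {nu_eff:.1f}, U_t = {U_t:.3f}, U_normal = {U_norm:.3f}")
```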

Journal ArticleDOI
TL;DR: Four interval estimation methods for the ratio of marginal binomial proportions are compared in terms of expected interval width and exact coverage probability, and two new methods based on combining two Wilson score intervals are proposed.
Abstract: Four interval estimation methods for the ratio of marginal binomial proportions are compared in terms of expected interval width and exact coverage probability. Two new methods are proposed that are based on combining two Wilson score intervals. The new methods are easy to compute and perform as well or better than the method recently proposed by Nam and Blackwelder. Two sample size formulas are proposed to approximate the sample size required to achieve an interval estimate with desired confidence level and width.
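
For reference, a minimal sketch of the Wilson score interval for a single proportion, the building block the two new combination methods rely on; the combination itself and the sample size formulas are not reproduced here.

```python
import numpy as np
from scipy import stats

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a single binomial proportion."""
    z = stats.norm.ppf(0.5 + conf / 2)
    p_hat = x / n
    centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

print(wilson_interval(12, 80))   # interval for 12 successes out of 80
```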

Journal ArticleDOI
TL;DR: An extensive simulation study suggests that the proposed approach based on the adjusted signed log‐likelihood ratio statistic outperforms all the existing methods in terms of coverage probabilities and symmetry of upper and lower tail error probabilities.
Abstract: In this paper, we consider an approach based on the adjusted signed log-likelihood ratio statistic for constructing a confidence interval for the mean of lognormal data with excess zeros. An extensive simulation study suggests that the proposed approach outperforms all the existing methods in terms of coverage probabilities and symmetry of upper and lower tail error probabilities. Finally, we analyzed two real-life datasets using the proposed approach.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric exact quantile confidence interval is developed for ranked-set samples; it provides higher coverage probability and shorter expected length than its simple random sample analog, and, to achieve the desired confidence level, a distribution-free confidence interval that interpolates the adjacent order statistics is constructed.

Journal ArticleDOI
TL;DR: In this paper, a constrained empirical likelihood confidence region for a parameter in the semi-linear errors-in-variables model is proposed; it is constructed by combining the score function corresponding to the squared orthogonal distance with a constraint on the parameter, and it overcomes the problem that the solution of the limiting mean estimation equations is not unique.
Abstract: This paper proposes a constrained empirical likelihood confidence region for a parameter in the semi-linear errors-in-variables model. The confidence region is constructed by combining the score function corresponding to the squared orthogonal distance with a constraint on the parameter, and it overcomes the problem that the solution of the limiting mean estimation equations is not unique. It is shown that the empirical log likelihood ratio at the true parameter converges to the standard chi-square distribution. Simulations show that, in most cases, the proposed confidence region has coverage probability closer to the nominal level and is narrower than the region based on the normal approximation of the generalized least squares estimator. A real data example is given.

Journal ArticleDOI
TL;DR: In this article, the coverage probability errors of both delta method and parametric bootstrap confidence intervals (CIs) for the covariance parameters of stationary long-memory Gaussian time series are determined.

Journal ArticleDOI
TL;DR: In this article, the authors study the large deviation principle for M-estimators (and maximum likelihood estimators in particular) and obtain the rate function of the large deviation principle for M-estimators.
Abstract: We study the large deviation principle for M-estimators (and maximum likelihood estimators in particular). We obtain the rate function of the large deviation principle for M-estimators. For exponential families, this rate function agrees with the Kullback–Leibler information number. However, for location or scale families this rate function is smaller than the Kullback–Leibler information number. We apply our results to obtain confidence regions of minimum size whose coverage probability converges to one exponentially. In the case of full exponential families, the constructed confidence regions agree with the ones obtained by inverting the likelihood ratio test with a simple null hypothesis.

Journal ArticleDOI
TL;DR: In this article, a Bayesian method that models the propensity score as a latent variable was proposed to reduce confounding from measured variables in the analysis of observational data, stratifying patients on the estimated propensity scores, and the impact of modelling uncertainty in the propensity scores in a case study investigating the effect of statin therapy on mortality in Ontario patients discharged from hospital following acute myocardial infarction.
Abstract: In the analysis of observational data, stratifying patients on the estimated propensity scores reduces confounding from measured variables. Confidence intervals for the treatment effect are typically calculated without acknowledging uncertainty in the estimated propensity scores, and intuitively this may yield inferences, which are falsely precise. In this paper, we describe a Bayesian method that models the propensity score as a latent variable. We consider observational studies with a dichotomous treatment, dichotomous outcome, and measured confounders where the log odds ratio is the measure of effect. Markov chain Monte Carlo is used for posterior simulation. We study the impact of modelling uncertainty in the propensity scores in a case study investigating the effect of statin therapy on mortality in Ontario patients discharged from hospital following acute myocardial infarction. Our analysis reveals that the Bayesian credible interval for the treatment effect is 10 per cent wider compared with a conventional propensity score analysis. Using simulations, we show that when the association between treatment and confounders is weak, then this increases uncertainty in the estimated propensity scores. Bayesian interval estimates for the treatment effect are longer on average, though there is little improvement in coverage probability. A novel feature of the proposed method is that it fits models for the treatment and outcome simultaneously rather than one at a time. The method uses the outcome variable to inform the fit of the propensity model. We explore the performance of the estimated propensity scores using cross-validation.

Journal ArticleDOI
TL;DR: In this paper, the problem of randomised intervals possibly being empty is solved using a new technique involving tail functions, with the offshoot being a new class of randomised and Clopper–Pearson intervals.
Abstract: In this paper we develop some new confidence intervals for the binomial proportion. The Clopper–Pearson interval is interpreted as an outcome of randomised confidence interval theory. The problem of randomised intervals possibly being empty is solved using a new technique involving ‘tail functions’, with the offshoot being a new class of randomised and Clopper–Pearson intervals. Some of the new intervals are investigated and shown to have attractive frequentist properties. Coverage probabilities and expected widths are compared and guidelines are established for constructing the optimal generalised Clopper–Pearson interval in any given situation.
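
A minimal sketch of the standard (non-randomised) Clopper–Pearson interval, the starting point for the randomised and tail-function constructions described above, together with its exact coverage probability at a chosen true proportion; the parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

def clopper_pearson(x, n, conf=0.95):
    """Standard (non-randomised) Clopper-Pearson interval for a binomial proportion."""
    alpha = 1.0 - conf
    lower = 0.0 if x == 0 else stats.beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else stats.beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

def coverage(p, n, conf=0.95):
    """Exact coverage probability of the interval at a given true p."""
    xs = np.arange(n + 1)
    pmf = stats.binom.pmf(xs, n, p)
    covered = np.array([lo <= p <= hi
                        for lo, hi in (clopper_pearson(x, n, conf) for x in xs)])
    return float(np.sum(pmf[covered]))

print(clopper_pearson(3, 20))   # interval for 3 successes out of 20
print(coverage(0.15, 20))       # at least 0.95, typically conservative
```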

Journal ArticleDOI
TL;DR: In this paper, the authors specify three classes of one-sided and two-sided 1 − α confidence intervals with certain monotonicity and symmetry on the confidence limits for the probability of success, the parameter in a binomial distribution.

Journal ArticleDOI
TL;DR: In this paper, an empirical likelihood (EL) inference procedure for the mean residual life (MRL) function is proposed, and the limiting distribution of the EL ratio for the MRL function is derived.
Abstract: In addition to the distribution function, the mean residual life (MRL) function is the other important function which can be used to characterize a lifetime in survival analysis and reliability. For inference on the MRL function, some procedures have been proposed in the literature. However, the coverage accuracy of such procedures may be low when the sample size is small. In this article, an empirical likelihood (EL) inference procedure for the MRL function is proposed and the limiting distribution of the EL ratio for the MRL function is derived. Based on this result, we obtain a confidence interval/band for the MRL function. The proposed method is compared with the normal approximation based method through a simulation study in terms of coverage probability.

Journal ArticleDOI
TL;DR: In this article, the difference between two multivariate normal mean vectors based on incomplete data matrices with different monotone patterns is investigated using Monte Carlo simulation and a multiple comparison procedure is outlined.

Journal ArticleDOI
TL;DR: The batch-means procedure ASAP3 and the spectral procedure WASSP are sequential procedures designed to produce a confidence-interval estimator for the mean response that satisfies user-specified half-length and coverage-probability requirements as discussed by the authors.
Abstract: The performance of the batch-means procedure ASAP3 and the spectral procedure WASSP is evaluated on test problems with characteristics typical of practical applications of steady-state simulation analysis procedures. ASAP3 and WASSP are sequential procedures designed to produce a confidence-interval estimator for the mean response that satisfies user-specified half-length and coverage-probability requirements. ASAP3 is based on an inverse Cornish-Fisher expansion for the classical batch-means t-ratio, whereas WASSP is based on a wavelet estimator of the batch-means power spectrum. Regarding closeness of the empirical coverage probability and average half-length of the delivered confidence intervals to their respective nominal levels, both procedures compared favorably with the Law-Carson procedure and the original ASAP algorithm. Regarding the average sample sizes required for decreasing levels of maximum confidence-interval half-length, ASAP3 and WASSP exhibited reasonable efficiency in the test problems.
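
As background for the procedures above, the sketch below shows the classical nonoverlapping batch-means t interval for a steady-state mean on a simulated AR(1) process; ASAP3's Cornish-Fisher correction, WASSP's wavelet spectral estimator and the sequential half-length control are not reproduced, and the test process is my own choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Steady-state AR(1) test process: X_t = phi * X_{t-1} + eps_t, true mean 0.
phi, n = 0.8, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi**2)       # start in (approximate) steady state
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def batch_means_ci(x, n_batches=30, conf=0.95):
    """Classical nonoverlapping batch-means t interval for the steady-state mean."""
    m = len(x) // n_batches                            # batch size
    batches = x[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    xbar, s = batches.mean(), batches.std(ddof=1)
    half = stats.t.ppf(0.5 + conf / 2, n_batches - 1) * s / np.sqrt(n_batches)
    return xbar - half, xbar + half

print(batch_means_ci(x))    # should cover the true steady-state mean of 0
```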

Journal ArticleDOI
TL;DR: In this article, the authors propose a new method of calibration for the empirical log-likelihood ratio, which corrects the undercoverage problem of the χ² approximation.
Abstract: Empirical likelihood has attracted much attention in the literature as a nonparametric method. A recent paper by Lu & Peng (2002) [Likelihood based confidence intervals for the tail index. Extremes 5, 337–352] applied this method to construct a confidence interval for the tail index of a heavy-tailed distribution. It turns out that the empirical likelihood method, as well as other likelihood-based methods, performs better than the normal approximation method in terms of coverage probability. However, when the sample size is small, the confidence interval computed using the χ² approximation has a serious undercoverage problem. Motivated by Tsao (2004) [A new method of calibration for the empirical loglikelihood ratio. Statist. Probab. Lett. 68, 305–314], this paper proposes a new method of calibration, which corrects the undercoverage problem.