Showing papers on "U-statistic" published in 2009


Journal ArticleDOI
TL;DR: A recursive optimal filter, globally optimal in the sense of unbiased minimum variance over all linear unbiased estimators, is provided, which is milder than existing approaches.

132 citations


Journal ArticleDOI
TL;DR: In this paper, a nonparametric estimator for the area under the receiver operating characteristic (ROC) curve (AUC) with binary time-varying failure status is proposed.
Abstract: The performance of clinical tests for disease screening is often evaluated using the area under the receiver-operating characteristic (ROC) curve (AUC). Recent developments have extended the traditional setting to the AUC with binary time-varying failure status. Without considering covariates, our first theme is to propose a simple and easily computed nonparametric estimator for the time-dependent AUC. Moreover, we use generalized linear models with time-varying coefficients to characterize the time-dependent AUC as a function of covariate values. The corresponding estimation procedures are proposed to estimate the parameter functions of interest. The derived limiting Gaussian processes and the estimated asymptotic variances enable us to construct the approximated confidence regions for the AUCs. The finite sample properties of our proposed estimators and inference procedures are examined through extensive simulations. An analysis of the AIDS Clinical Trials Group (ACTG) 175 data is further presented to show the applicability of the proposed methods. The Canadian Journal of Statistics 38:8–26; 2010 © 2009 Statistical Society of Canada
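
In the classical setting without time-varying status or censoring, the nonparametric AUC estimate is the Mann-Whitney two-sample U-statistic: the fraction of (case, control) pairs in which the case has the higher marker value, with ties counted as one half. The sketch below illustrates only that static estimator. It is not the time-dependent estimator proposed in the paper, and the function name `empirical_auc` and the toy data are assumptions made for illustration.

```python
from itertools import product


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of P(case > control) + 0.5 * P(case == control)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    # Visit every (case, control) pair once and score its concordance.
    for x, y in product(case_scores, control_scores):
        if x > y:
            total += 1.0   # concordant pair
        elif x == y:
            total += 0.5   # tie counts as half
    return total / (len(case_scores) * len(control_scores))


# Hypothetical marker values, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.4]
print(empirical_auc(cases, controls))
```

The double loop costs O(mn) for m cases and n controls, which is fine for small samples; a rank-based formulation would be the usual choice for large ones.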

118 citations


Journal ArticleDOI
TL;DR: In this paper, the empirical difference estimator is used to estimate the average of forest attributes for the whole study area or sub-areas, and its performance is evaluated by an extensive simulation study performed on several populations whose dimensions and covariate values are taken from a real case study.

98 citations


Journal ArticleDOI
TL;DR: A coupling idea of Dehling and Mikosch is used to show that the bootstrap counterpart converges to the same distribution and is applied to a goodness-of-fit test based on the empirical characteristic function.

38 citations


Journal ArticleDOI
01 Jan 2009
TL;DR: This work designs an unbiased estimator for a query and proves that it is indeed unbiased, and provides a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate.
Abstract: We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments.

30 citations


Journal ArticleDOI
TL;DR: In this paper, several estimators of the expectation, median and mode of the lognormal distribution are derived, which aim to be approximately unbiased, efficient, or have a minimax property in the class of estimators they introduce.

24 citations


Journal ArticleDOI
TL;DR: By embedding the missing covariate data into a left-truncated and right-censored survival model, a new class of weighted estimating functions for the Cox regression model with missing covariates is proposed; the resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal.
Abstract: By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported.

23 citations


Journal ArticleDOI
TL;DR: The cornerstone of the procedures here introduced is the connection between cumulants of a random variable and a suitable compound Poisson random variable, which holds also for multivariate random variables.
Abstract: We propose new algorithms for generating k-statistics, multivariate k-statistics, polykays and multivariate polykays. The resulting computational times are very fast compared with procedures existing in the literature. This speed-up is obtained by means of a symbolic method arising from the classical umbral calculus. The classical umbral calculus is a light syntax that involves only elementary rules for managing sequences of numbers or polynomials. The cornerstone of the procedures introduced here is the connection between cumulants of a random variable and a suitable compound Poisson random variable. Such a connection also holds for multivariate random variables.
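
For orientation, the k-statistics are the unbiased estimators of the cumulants, and the first four have well-known closed forms in terms of the power sums of the sample. The sketch below evaluates those textbook expressions directly. It is not the umbral algorithm described in the abstract, and the function name `k_statistics` is an assumption made for illustration.

```python
def k_statistics(sample):
    """Return (k1, k2, k3, k4), the unbiased estimators of the first four cumulants."""
    n = len(sample)
    if n < 4:
        raise ValueError("need at least 4 observations for k4")
    # Power sums S_r = sum of x**r over the sample.
    s1, s2, s3, s4 = (sum(x ** r for x in sample) for r in (1, 2, 3, 4))
    k1 = s1 / n
    k2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    k3 = (2 * s1 ** 3 - 3 * n * s1 * s2 + n ** 2 * s3) / (n * (n - 1) * (n - 2))
    k4 = (
        -6 * s1 ** 4
        + 12 * n * s1 ** 2 * s2
        - 3 * n * (n - 1) * s2 ** 2
        - 4 * n * (n + 1) * s1 * s3
        + n ** 2 * (n + 1) * s4
    ) / (n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4


print(k_statistics([1, 2, 3, 4]))
```

Here `k1` is the sample mean and `k2` is the usual unbiased sample variance. Raw power sums can lose precision when the data are far from zero, so centering the sample first is a sensible precaution in practice.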

23 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that the unbiased estimator of a certain parameter of the selected population does not exist and that it is a function of order statistics, which is a known result in the literature.

20 citations


Journal ArticleDOI
TL;DR: A consistent efficient estimator of the fourth-order cumulant for real discrete-time random i.i.d. (at least up to order 8) zero-mean signals is proposed, in both batch and adaptive versions.
Abstract: In this paper, a consistent efficient estimator of the fourth-order cumulant for real discrete-time random i.i.d. (at least up to order 8) zero-mean signals is proposed, in both batch and adaptive versions. In the batch version, the proposed estimator is not only consistent, but also unbiased and efficient. Systematic theoretical and experimental studies, with comparisons between the proposed estimator and three other estimators of the fourth-order cumulant (the natural or traditional one, the trivial unbiased estimator for the known power case, and the fourth k-statistic), are undertaken for both normal and uniform processes. Then, the adaptive versions of the estimators (all except the fourth k-statistic) are given and studied in detail. The convergence in mean and the convergence in mean square analyses are performed for them, first theoretically, then empirically. Finally, the whole set of analyses carried out for both batch and adaptive versions shows that, from many points of view, the proposed estimator is of interest for versatile signal processing applications, especially real-time and short-term ones.

17 citations


Journal ArticleDOI
TL;DR: A new formula for linear unbiased prediction of the local clock timescales is proposed and a new gain is derived for the p-step ramp unbiased finite impulse response predictor that gives the best linear unbiased fit suitable for forming the prediction vector.
Abstract: In this paper, we propose a new formula for linear unbiased prediction of the local clock timescales. To predict future errors over all the measurement data, a new gain is derived for the p-step ramp unbiased finite impulse response (FIR) predictor. We then show that this gain gives the best linear unbiased fit suitable for forming the prediction vector. The proposed predictor is consistent with linear regression and the best linear unbiased estimator. Applications are given for a crystal clock and the USNO Master Clock.

Journal ArticleDOI
TL;DR: In this article, it was shown that the class of U-statistics (up to negligible error terms) is closed under studentization (or self-normalization), for example, using the jackknife estimators of variances, which leads to error bounds of normal approximations for non-deterministically normalized U-statistics.
Abstract: Let $X_1, \dots, X_n$ be i.i.d. random observations, taking their values in a measurable space. Consider a U-statistic of order $k$, say $S = S(X_1, \dots, X_n)$, based on the sample $X_1, \dots, X_n$. Assume that $S$ can be represented in the form $S = L + T$ with $\mathbb{E}L = \mathbb{E}T = 0$, where $L$ is a linear statistic, and $T$ is a (stochastically smaller) statistic. Assume further that the linear statistic $L$ is approximately normal and that we have a bound for the error of such approximation. Our objective is to show that a similar bound holds for the distribution of $S$ provided that we correct it by adding $(\operatorname{var} T)\,\ln^2(3 + 1/\operatorname{var} T)$, where $\operatorname{var} T = \mathbb{E}T^2$ is the variance of $T$. It occurs that one has to correct the normal approximation as well, replacing it by some asymptotically (as $n \to \infty$) equivalent distribution. Furthermore, we show that the class of U-statistics (up to negligible error terms) is closed under studentization (or self-normalization), for example, using the jackknife estimators of variances. This leads to error bounds of normal approximations for non-deterministically normalized U-statistics. Our results extend, refine and yield a number of related known results. The bounds have a simple and convenient form for applications and analysis since they involve only the variance of the noise term $T$. In the case $k \ge 3$ of higher order statistics, it remains open whether one can improve the bounds by removing the logarithmic factor $\ln^2(3 + 1/\operatorname{var} T)$. We provide as well an optimal bound under a lower moment assumption $\mathbb{E}|T|^{\alpha} < \infty$, $\alpha < 2$.

Journal Article
TL;DR: In this article, a confidence region for the time-dependent area under the receiver operating characteristic curve (AUC) can be constructed based on the asymptotic normality of a nonparametric estimator.
Abstract: A confidence region for the time-dependent area under the receiver operating characteristic curve (AUC) can be constructed based on the asymptotic normality of a nonparametric estimator. In numerical studies, it was found that the performance of the normal approximated confidence interval is dramatically affected by small sample size and high censoring rate. To improve the accuracy of coverage probabilities as well as interval estimators, the random weighted bootstrap distribution and the Edgeworth expansion with remainder term $o(n^{-1/2})$ are proposed to approximate the sampling distribution of the estimator. The asymptotic properties of the random weighted bootstrap analogue and the one-term Edgeworth expansion are developed in this article. The usefulness of the proposed procedures is confirmed by a class of simulations with different sample sizes and censoring rates. Moreover, our methods are demonstrated using the ACTG 175 data.

Journal ArticleDOI
Abstract: Depth functions are increasingly being used in building nonparametric outlier detectors and in constructing useful nonparametric statistics such as depth-weighted L-statistics (DL-statistics). Robustness of a depth function is an essential property for such applications. Here, robustness of three key depth functions, spatial, simplicial, and generalised Tukey, is explored via the influence function (IF) approach. For all three depths, the IFs are derived and found to be bounded, an important robustness property, and are applied to evaluate two other robustness features, gross error sensitivity and local shift sensitivity. These IFs are also used as components of the IFs of associated DL-statistics, for which through a standard approach consistency and asymptotic normality are then derived. In turn, the asymptotic normality is applied to obtain asymptotic relative efficiencies (ARE). For spatial depth, two forms of weight function suggested in the recent literature are considered and AREs in comparison wit...

Journal ArticleDOI
01 Jun 2009
TL;DR: In this article, alternative estimators of the population variance, the minimum mean-squared error biased estimators (MBBEs), were proposed and compared with the unbiased estimator; the MBBEs are biased but have the minimum possible mean-squared error.
Abstract: The unbiased estimator $S^2$ of a population variance $\sigma^2$ has traditionally been overemphasized, regardless of sample size. In this paper, alternative estimators of population variance are developed. These estimators are biased and have the minimum possible mean-squared error [and we define them as the “minimum mean-squared error biased estimators” (MBBE)]. The comparative merit of these estimators over the unbiased estimator is explored using relative efficiency (RE) (a ratio of mean-squared error values). It is found that, across all population distributions investigated, the RE of the MBBE is much higher for small samples and progressively diminishes to 1 with increasing sample size. The paper gives two applications involving the normal and exponential distributions.
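
As a rough illustration of the trade-off in the normal case, the textbook result is that, among estimators of the form c times the sum of squared deviations from the sample mean, the mean-squared error is minimized by c = 1/(n + 1) rather than the unbiased choice c = 1/(n - 1). The sketch below checks this with the exact formula and a small simulation. It assumes normal data and that specific family of estimators, which may differ from the exact MBBE definitions in the paper; the function names are hypothetical.

```python
import random


def mse_scaled_variance(n, c, sigma2=1.0):
    """Exact MSE of c * sum((x - mean)^2) for n i.i.d. normal observations."""
    # sum((x - mean)^2) / sigma2 is chi-square with n - 1 degrees of freedom.
    bias = c * (n - 1) * sigma2 - sigma2
    variance = 2.0 * (n - 1) * (c * sigma2) ** 2
    return variance + bias ** 2


def simulated_mse(n, divisor, trials=20000, seed=0):
    """Monte Carlo MSE of sum((x - mean)^2) / divisor for standard normal data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        total += (ss / divisor - 1.0) ** 2  # true variance is 1
    return total / trials


n = 5
print(mse_scaled_variance(n, 1 / (n - 1)), mse_scaled_variance(n, 1 / (n + 1)))
print(simulated_mse(n, n - 1), simulated_mse(n, n + 1))
```

Under these assumptions the exact MSEs are 2σ⁴/(n - 1) and 2σ⁴/(n + 1), so their ratio is (n + 1)/(n - 1), which is large for small n and tends to 1 as n grows, in line with the behaviour of the relative efficiency described in the abstract.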

Journal ArticleDOI
TL;DR: The stationary density of a centered invertible linear process can be represented as a convolution of innovation-based densities, and it can be estimated at the parametric rate by plugging residual-based kernel estimators into the convolution representation.
Abstract: The stationary density of a centered invertible linear process can be represented as a convolution of innovation-based densities, and it can be estimated at the parametric rate by plugging residual-based kernel estimators into the convolution representation. We have shown elsewhere that a functional central limit theorem holds both in the space of continuous functions vanishing at infinity, and in weighted $L_1$-spaces. Here, we show that we can improve the plug-in estimator considerably, exploiting the information that the innovations are centered, and replacing the kernel estimators by weighted versions, using the empirical likelihood approach.

Journal ArticleDOI
TL;DR: In this article, the authors explore the use of small bias kernel-based methods to construct confidence intervals, in particular using a geometric density estimator that seems better suited for this purpose.
Abstract: Confidence intervals for densities built on the basis of standard nonparametric theory are doomed to have poor coverage rates due to bias. Studies on coverage improvement exist, but reasonably behaved interval estimators are needed. We explore the use of small bias kernel-based methods to construct confidence intervals, in particular using a geometric density estimator that seems better suited for this purpose.

Posted Content
TL;DR: In this paper, the authors provide an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values, and they show that these estimators perform considerably better than the usual estimators.
Abstract: This note provides an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values. Simulations indicate that these estimators perform considerably better than the usual estimators, especially for small sample sizes.

Journal ArticleDOI
TL;DR: In this article, the authors constructed minimum distance estimators for α by minimizing the Kolmogorov distance or the Cramér–von Mises distance between the empirical distribution function $G_n$ and a class of distributions defined based on the sum-preserving property of stable random variables.
Abstract: Assume that $X_1, X_2, \dots, X_n$ is a sequence of i.i.d. random variables with α-stable distribution (α ∈ (0,2], the stable exponent, is the unknown parameter). We construct minimum distance estimators for α by minimizing the Kolmogorov distance or the Cramér–von Mises distance between the empirical distribution function $G_n$ and a class of distributions defined based on the sum-preserving property of stable random variables. The minimum distance estimators can also be obtained by minimizing a U-statistic estimate of an empirical distribution function involving the stable exponent. They share the same invariance property with the maximum likelihood estimates. In this article, we prove the strong consistency and the asymptotic normality of the minimum distance estimators. A simulation study shows that the new estimators are competitive with the existing ones and perform very close even to the maximum likelihood estimator.

Journal ArticleDOI
S. Sampath
TL;DR: In this paper, an unbiased estimator for finite population variance is developed under linear systematic sampling with two random starts and an explicit expression for its variance is also obtained, supported by two real life situations.
Abstract: In this article, an unbiased estimator for finite population variance is developed under linear systematic sampling with two random starts and an explicit expression for its variance is also obtained. The study is supported by two real life situations. A detailed numerical comparative study has been carried out to compare its average variance with the average variance of the conventional unbiased estimator for finite population variance under simple random sampling for a wide variety of populations. Results based on the study strongly favor the use of the developed estimator for such populations.

Journal ArticleDOI
TL;DR: In this article, weak convergence of U-statistics via approximation in probability is investigated, with the conditional expectation of the kernel assumed only to be in the domain of attraction of the normal law (instead of the classical two-moment condition).

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating the invariant distribution function of an ergodic diffusion process when the drift coefficient is unknown and study the properties of optimality for another kind of estimator recently proposed.
Abstract: We consider the problem of the estimation of the invariant distribution function of an ergodic diffusion process when the drift coefficient is unknown. The empirical distribution function is a natural estimator which is unbiased, uniformly consistent and efficient in different metrics. Here we study the properties of optimality for another kind of estimator recently proposed. We consider a class of unbiased estimators and we show that they are also efficient in the sense that their asymptotic risk, defined as the integrated mean square error, attains the same asymptotic minimax lower bound of the empirical distribution function.

Posted Content
TL;DR: In this paper, a mixed random-systematic sample is used to obtain an unbiased estimator of the population variance, as an alternative to assuming that the variable of interest has a random order in the population and using the sample variance of simple random sampling without replacement.
Abstract: Systematic sampling is a commonly used technique due to its simplicity and ease of implementation. The drawback of this simplicity is that it is not possible to estimate the design variance without bias. There are several ways to circumvent this problem. One method is to suppose that the variable of interest has a random order in the population, so the sample variance of simple random sampling without replacement is used. By means of a mixed random-systematic sample, an unbiased estimator of the population variance for simple

Posted Content
TL;DR: In this paper, the authors proposed a new model with random variance components for estimating small area characteristics, and derived the empirical best linear unbiased estimator, an approximation of its mean squared error to terms of order o(1/m), and an estimator of that mean squared error whose bias is of order o(1/m), where m is the number of small areas in the population.
Abstract: In this paper, we propose a new model with random variance components for estimating small area characteristics. Under the proposed model, we derive the empirical best linear unbiased estimator, an approximation of its mean squared error to terms of order o(1/m), and an estimator of that mean squared error whose bias is of order o(1/m), where m is the number of small areas in the population.


Journal ArticleDOI
TL;DR: In this article, a class of the best linear unbiased estimators (BLUE) of the linear parametric functions of a family of multivariate growth curve models is considered, and the results are expressed in a convenient computational form by using the coordinate-free approach and the usual parametric representations.
Abstract: The purpose of this article is to build a class of the best linear unbiased estimators (BLUE) of the linear parametric functions, to prove some necessary and sufficient conditions for their existence and to derive them from the corresponding normal equations, when a family of multivariate growth curve models is considered. It is shown that the classical BLUE known for this family of models is the element of a particular class of BLUE built in the proposed manner. The results are expressed in a convenient computational form by using the coordinate-free approach and the usual parametric representations.

Journal ArticleDOI
TL;DR: In this paper, the problem of unbiased estimation of the mean life and the reliability for an exponential life distribution using time censored sample data is considered, and necessary and sufficient conditions for the existence of an unbiased estimator (i.e., the uniformly minimum variance unbiased estimators) are given.

Posted Content
Abstract: In 1948, W. Hoeffding introduced a large class of unbiased estimators called U-statistics, defined as the average value of a real-valued m-variate function h calculated at all possible sets of m points from a random sample. In the present paper, we investigate the corresponding robust analogue which we call U-quantile-statistics. We are concerned with the asymptotic behavior of the sample p-quantile of such function h instead of its average. Alternatively, U-quantile-statistics can be viewed as quantile estimators for a certain class of dependent random variables. Examples are given by a slightly modified Hodges-Lehmann estimator of location and the median interpoint distance among random points in space.
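
The definition in the abstract translates almost directly into code: evaluate a symmetric kernel h on every size-m subset of the sample, then take the average (a U-statistic) or a sample quantile (a U-quantile-statistic). The sketch below is a brute-force illustration with the median as the quantile. The function names and the example kernels are assumptions made for illustration, and the pairwise-mean example is the common textbook form of the Hodges-Lehmann estimator over distinct pairs, which may not match the paper's "slightly modified" version exactly.

```python
from itertools import combinations
from statistics import median


def kernel_values(sample, kernel, m):
    """Evaluate a symmetric m-variate kernel on every size-m subset of the sample."""
    values = [kernel(*subset) for subset in combinations(sample, m)]
    if not values:
        raise ValueError("sample must contain at least m points")
    return values


def u_statistic(sample, kernel, m):
    """Average of the kernel over all size-m subsets (Hoeffding's U-statistic)."""
    values = kernel_values(sample, kernel, m)
    return sum(values) / len(values)


def u_median(sample, kernel, m):
    """Median (p = 1/2 quantile) of the kernel over all size-m subsets."""
    return median(kernel_values(sample, kernel, m))


data = [1, 2, 3, 4, 10]
# Kernel (x - y)^2 / 2 reproduces the unbiased sample variance.
print(u_statistic(data, lambda x, y: (x - y) ** 2 / 2, 2))
# Median of pairwise means: a Hodges-Lehmann-type location estimate.
print(u_median(data, lambda x, y: (x + y) / 2, 2))
# Median interpoint distance for one-dimensional points.
print(u_median(data, lambda x, y: abs(x - y), 2))
```

Enumerating all subsets costs on the order of n choose m kernel evaluations, so this form is only practical for small samples or small m.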

01 Jan 2009
TL;DR: In this article, the authors considered biased initial estimators and constructed a new restriction estimator, extending the general restriction (GR) estimator proposed by Knottnerus (2003), which is based on unbiased initial estimators.
Abstract: Nowadays, users of official statistics often require that estimates satisfy certain restrictions. For example, in the case of domains, the requirement is that the estimators of the domain totals sum to the population total or to its estimate. Another example is that quarterly estimates have to sum to the yearly total. Such relationships naturally hold for the true population parameters, so they can be treated and used as a kind of auxiliary information. Incorporating this information into the estimation process can improve the estimates. One solution to the described situation is the general restriction (GR) estimator proposed by Knottnerus (2003), which is based on unbiased initial estimators. The advantage of this GR estimator is its variance-minimizing property among linear estimators that satisfy the same restrictions and use the same initial estimators in their construction. However, it is well known that many good estimators are unbiased only asymptotically. We consider biased initial estimators and construct a new restriction estimator.