
Showing papers in "Annals of Statistics in 2002"


Journal ArticleDOI
TL;DR: A new graphical model, called a vine, is introduced for dependent random variables; vines generalize the Markov trees often used in modelling high-dimensional distributions by weakening conditional independence to allow for various forms of conditional dependence.
Abstract: A new graphical model, called a vine, for dependent random variables is introduced. Vines generalize the Markov trees often used in modelling high-dimensional distributions. They differ from Markov trees and Bayesian belief nets in that the concept of conditional independence is weakened to allow for various forms of conditional dependence.

1,247 citations
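
As a point of reference for the tree structure that vines build on, here is a minimal sketch (not from the paper) of sampling from a three-variable Markov tree 1-2-3 in which each edge carries a bivariate Gaussian copula; a vine generalizes this construction by letting the conditional dependence of variables 1 and 3 given 2 differ from independence. The correlations 0.8 and 0.5 are arbitrary illustration values.

```python
import numpy as np
from scipy.stats import norm

def sample_markov_tree(n, rho12, rho23, rng):
    """Sample (U1, U2, U3) with uniform margins from the Markov tree 1-2-3,
    with a bivariate Gaussian copula on each edge; variables 1 and 3 are
    conditionally independent given 2 (the assumption a vine relaxes)."""
    z1 = rng.standard_normal(n)
    z2 = rho12 * z1 + np.sqrt(1 - rho12 ** 2) * rng.standard_normal(n)
    z3 = rho23 * z2 + np.sqrt(1 - rho23 ** 2) * rng.standard_normal(n)
    return norm.cdf(np.column_stack([z1, z2, z3]))

U = sample_markov_tree(100_000, rho12=0.8, rho23=0.5, rng=np.random.default_rng(0))
print(np.corrcoef(norm.ppf(U), rowvar=False).round(2))  # implied corr(1,3) is about 0.4
```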


Journal ArticleDOI
TL;DR: A class of graphical independence models, called maximal ancestral graphs, is introduced that is closed under marginalization and conditioning yet contains all DAG independence models, and that leads to a simple parametrization of the corresponding set of distributions in the Gaussian case.
Abstract: This paper introduces a class of graphical independence models that is closed under marginalization and conditioning but that contains all DAG independence models. This class of graphs, called maximal ancestral graphs, has two attractive features: there is at most one edge between each pair of vertices; every missing edge corresponds to an independence relation. These features lead to a simple parameterization of the corresponding set of distributions in the Gaussian case.

632 citations


Journal ArticleDOI
TL;DR: The nonconcave penalized likelihood approach of Fan and Li is extended to the Cox proportional hazards model and the Cox proportional hazards frailty model, two commonly used semiparametric models in survival analysis, and new variable selection procedures are proposed for these two models.
Abstract: A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed in Fan and Li (2001a). It has been shown there that the resulting procedures perform as well as if the subset of significant variables were known in advance. Such a property is called an oracle property. The proposed procedures were illustrated in the context of linear regression, robust linear regression and generalized linear models. In this paper, the nonconcave penalized likelihood approach is extended further to the Cox proportional hazards model and the Cox proportional hazards frailty model, two commonly used semi-parametric models in survival analysis. As a result, new variable selection procedures for these two commonly-used models are proposed. It is demonstrated how the rates of convergence depend on the regularization parameter in the penalty function. Further, with a proper choice of the regularization parameter and the penalty function, the proposed estimators possess an oracle property. Standard error formulae are derived and their accuracies are empirically tested. Simulation studies show that the proposed procedures are more stable in prediction and more effective in computation than the best subset variable selection, and they reduce model complexity as effectively as the best subset variable selection. Compared with the LASSO, which is the penalized likelihood method with the $L_1$ -penalty, proposed by Tibshirani, the newly proposed approaches have better theoretic properties and finite sample performance.

570 citations
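
The penalty behind these procedures is the smoothly clipped absolute deviation (SCAD) function of Fan and Li. The sketch below, with the commonly recommended tuning constant a = 3.7, is only an illustration of the penalty itself, not of the full penalized partial likelihood fit for the Cox models.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated elementwise on |theta|: linear (lasso-like) near
    zero, quadratic in between, and constant beyond a*lam, so large
    coefficients are left essentially unpenalized."""
    t = np.abs(np.asarray(theta, dtype=float))
    out = np.empty_like(t)
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    big = t > a * lam
    out[small] = lam * t[small]
    out[mid] = (2 * a * lam * t[mid] - t[mid] ** 2 - lam ** 2) / (2 * (a - 1))
    out[big] = lam ** 2 * (a + 1) / 2
    return out

print(scad_penalty([0.1, 0.5, 1.0, 5.0], lam=0.5))
```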


Journal ArticleDOI
TL;DR: In this paper, the authors prove new probabilistic upper bounds on the generalization error of complex classifiers that are combinations of simple classifiers, such as those produced by boosting and bagging.
Abstract: We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of $\ell_1$-norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Levy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.

475 citations


Journal ArticleDOI
TL;DR: In this article, the authors analyze whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size, and find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix.
Abstract: This paper analyzes whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size. In the latter case, the singularity of the sample covariance matrix makes likelihood ratio tests degenerate, but other tests based on quadratic forms of sample covariance matrix eigenvalues remain well-defined. We study the consistency property and limiting distribution of these tests as dimensionality and sample size go to infinity together, with their ratio converging to a finite nonzero limit. We find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix. For the latter test, we develop a new correction to the existing test statistic that makes it robust against high dimensionality.

413 citations
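
One eigenvalue-based quadratic-form statistic of the kind discussed here is John's sphericity statistic, which remains well defined even when p exceeds n. The sketch below only computes the statistic; the high-dimensional null distribution and the corrected test are the paper's contribution and are not reproduced here.

```python
import numpy as np

def john_sphericity_statistic(X):
    """John's U statistic: a quadratic form in the eigenvalues of the sample
    covariance matrix, measuring its departure from a multiple of the identity."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)          # p x p sample covariance
    R = S / (np.trace(S) / p) - np.eye(p)
    return np.trace(R @ R) / p

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 200))                  # p = 200 larger than n = 50
print(john_sphericity_statistic(X))
```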


Journal ArticleDOI
TL;DR: The notion of the Central Mean Subspace (CMS), a natural inferential object for dimension reduction when the mean function is of interest, is introduced and methods to estimate it are developed.
Abstract: In many situations regression analysis is mostly concerned with inferring about the conditional mean of the response given the predictors, and less concerned with the other aspects of the conditional distribution. In this paper we develop dimension reduction methods that incorporate this consideration. We introduce the notion of the Central Mean Subspace (CMS), a natural inferential object for dimension reduction when the mean function is of interest. We study properties of the CMS, and develop methods to estimate it. These methods include a new class of estimators which requires fewer conditions than pHd, and which displays a clear advantage when one of the conditions for pHd is violated. CMS also reveals a transparent distinction among the existing methods for dimension reduction: OLS, pHd, SIR and SAVE. We apply the new methods to a data set involving recumbent cows.

363 citations
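
For context, the sketch below implements sliced inverse regression (SIR), one of the existing dimension-reduction methods the abstract compares against; it is not the paper's new estimator of the central mean subspace. The simulated single-index example is an assumption made up for illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression: slice the ordered response, average the
    standardized predictors within slices, and eigen-decompose the covariance
    of those slice means."""
    n, p = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(Sigma))     # whitening: cov((X - mu) @ L) = I
    Z = (X - mu) @ L
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum((len(s) / n) * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
    _, vecs = np.linalg.eigh(M)
    dirs = L @ vecs[:, ::-1][:, :n_dirs]             # map back to the original scale
    return dirs / np.linalg.norm(dirs, axis=0)

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5))
y = np.exp(X[:, 0]) + 0.1 * rng.standard_normal(500)   # single-index model along e1
print(sir_directions(X, y).ravel())                     # loads mostly on the first coordinate
```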


Journal ArticleDOI
TL;DR: It is proved that the curvelet shrinkage can be tuned so that the estimator will attain, within logarithmic factors, the MSE $O(\varepsilon^{4/5})$ as noise level $\varepsilon\to 0$.
Abstract: We consider a model problem of recovering a function $f(x_1,x_2)$ from noisy Radon data. The function $f$ to be recovered is assumed smooth apart from a discontinuity along a $C^2$ curve, that is, an edge. We use the continuum white-noise model, with noise level $\varepsilon$. Traditional linear methods for solving such inverse problems behave poorly in the presence of edges. Qualitatively, the reconstructions are blurred near the edges; quantitatively, they give in our model mean squared errors (MSEs) that tend to zero with noise level $\varepsilon$ only as $O(\varepsilon^{1/2})$ as $\varepsilon\to 0$. A recent innovation--nonlinear shrinkage in the wavelet domain--visually improves edge sharpness and improves MSE convergence to $O(\varepsilon^{2/3})$. However, as we show here, this rate is not optimal. In fact, essentially optimal performance is obtained by deploying the recently-introduced tight frames of curvelets in this setting. Curvelets are smooth, highly anisotropic elements ideally suited for detecting and synthesizing curved edges. To deploy them in the Radon setting, we construct a curvelet-based biorthogonal decomposition of the Radon operator and build "curvelet shrinkage" estimators based on thresholding of the noisy curvelet coefficients. In effect, the estimator detects edges at certain locations and orientations in the Radon domain and automatically synthesizes edges at corresponding locations and directions in the original domain. We prove that the curvelet shrinkage can be tuned so that the estimator will attain, within logarithmic factors, the MSE $O(\varepsilon^{4/5})$ as noise level $\varepsilon\to 0$. This rate of convergence holds uniformly over a class of functions which are $C^2$ except for discontinuities along $C^2$ curves, and (except for log terms) is the minimax rate for that class. Our approach is an instance of a general strategy which should apply in other inverse problems; we sketch a deconvolution example.

347 citations


Journal ArticleDOI
TL;DR: The coverage properties of the standard Wald interval and four alternative interval methods are compared by asymptotic expansions of their coverage probabilities and expected lengths, supporting the recommendations of Brown, Cai and DasGupta.
Abstract: We address the classic problem of interval estimation of a binomial proportion. The Wald interval $\hat{p}\pm z_{\alpha/2} n^{-1/2} (\hat{p} (1 - \hat{p}))^{1/2}$ is currently in near universal use. We first show that the coverage properties of the Wald interval are persistently poor and defy virtually all conventional wisdom. We then proceed to a theoretical comparison of the standard interval and four alternative intervals by asymptotic expansions of their coverage probabilities and expected lengths. The four alternative interval methods we study in detail are the score-test interval (Wilson), the likelihood-ratio-test interval, a Jeffreys prior Bayesian interval and an interval suggested by Agresti and Coull. The asymptotic expansions for coverage show that the first three of these alternative methods have coverages that fluctuate about the nominal value, while the Agresti–Coull interval has a somewhat larger and more nearly conservative coverage function. For the five interval methods we also investigate asymptotically their average coverage relative to distributions for $p$ supported within $(0, 1)$. In terms of expected length, asymptotic expansions show that the Agresti–Coull interval is always the longest of these. The remaining three are rather comparable and are shorter than the Wald interval except for $p$ near 0 or 1. These analytical calculations support and complement the findings and the recommendations in Brown, Cai and DasGupta (Statist. Sci. (2001) 16 101–133).

299 citations
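
The three closed-form intervals discussed above (Wald, Wilson score, Agresti-Coull) are easy to state in code; the sketch below uses the standard textbook formulas, with x successes out of n trials and nominal level 1 - alpha.

```python
import numpy as np
from scipy.stats import norm

def wald_interval(x, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    p = x / n
    half = z * np.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(x, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    p = x / n
    center = (p + z ** 2 / (2 * n)) / (1 + z ** 2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / (1 + z ** 2 / n)
    return center - half, center + half

def agresti_coull_interval(x, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2                       # add z^2 pseudo-trials
    p_tilde = (x + z ** 2 / 2) / n_tilde       # and z^2/2 pseudo-successes
    half = z * np.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half

print(wald_interval(2, 20), wilson_interval(2, 20), agresti_coull_interval(2, 20))
```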


Journal ArticleDOI
TL;DR: Theoretical results on the false discovery rate (FDR) for stepwise multiple testing procedures with dependent test statistics are obtained, showing that the critical values of the Benjamini–Hochberg step-up procedure can be used in a much more general stepwise procedure under similar positive dependency.
Abstract: The concept of false discovery rate (FDR) has been receiving increasing attention by researchers in multiple hypotheses testing. This paper produces some theoretical results on the FDR in the context of stepwise multiple testing procedures with dependent test statistics. It was recently shown by Benjamini and Yekutieli that the Benjamini–Hochberg step-up procedure controls the FDR when the test statistics are positively dependent in a certain sense. This paper strengthens their work by showing that the critical values of that procedure can be used in a much more general stepwise procedure under similar positive dependency. It is also shown that the FDR-controlling Benjamini–Liu step-down procedure originally developed for independent test statistics works even when the test statistics are positively dependent in some sense. An explicit expression for the FDR of a generalized stepwise procedure and an upper bound to the FDR of a step-down procedure are obtained in terms of probability distributions of ordered components of dependent random variables before establishing the main results.

298 citations
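
For reference, the Benjamini-Hochberg step-up procedure whose critical values are reused here can be written in a few lines; the generalized stepwise procedures studied in the paper use the same critical constants i*q/m.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the hypotheses with the k smallest
    p-values, where k is the largest i with p_(i) <= i*q/m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])         # last index meeting its threshold
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.9]
print(benjamini_hochberg(pvals, q=0.05))
```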


Journal ArticleDOI
TL;DR: In this paper, an adjusted empirical likelihood approach to inference for the mean of the response variable is developed, and a nonparametric version of Wilks' theorem is proved for the adjusted empirical log-likelihood ratio by showing that it has an asymptotic standard chi-squared distribution.
Abstract: Inference under kernel regression imputation for missing response data is considered. An adjusted empirical likelihood approach to inference for the mean of the response variable is developed. A nonparametric version of Wilks' theorem is proved for the adjusted empirical log-likelihood ratio by showing that it has an asymptotic standard chi-squared distribution, and the corresponding empirical likelihood confidence interval for the mean is constructed. With auxiliary information, an empirical likelihood-based estimator is defined and an adjusted empirical log-likelihood ratio is derived. Asymptotic normality of the estimator is proved. Also, it is shown that the adjusted empirical log-likelihood ratio obeys Wilks' theorem. A simulation study is conducted to compare the adjusted empirical likelihood and the normal approximation methods in terms of coverage accuracies and average lengths of confidence intervals. Based on biases and standard errors, a comparison is also made by simulation between the empirical likelihood-based estimator and related estimators. Our simulation indicates that the adjusted empirical likelihood method performs competitively and that the use of auxiliary information provides improved inferences.

267 citations
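
The kernel regression imputation step that this inference builds on can be sketched as below with a Nadaraya-Watson estimator fitted to the complete cases; the adjusted empirical likelihood ratio constructed on top of it is not reproduced. The bandwidth and the missingness rate are illustrative assumptions.

```python
import numpy as np

def nw_impute_mean(x, y, observed, h):
    """Impute missing responses by Nadaraya-Watson kernel regression on the
    observed pairs, then average observed and imputed values."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    obs = np.asarray(observed, bool)

    def m_hat(x0):
        w = np.exp(-0.5 * ((x[obs] - x0) / h) ** 2)      # Gaussian kernel weights
        return np.sum(w * y[obs]) / np.sum(w)

    y_filled = y.copy()
    y_filled[~obs] = [m_hat(x0) for x0 in x[~obs]]
    return y_filled.mean()

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(300)
observed = rng.uniform(size=300) > 0.3                   # roughly 30% missing responses
print(nw_impute_mean(x, y, observed, h=0.1))
```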


Journal ArticleDOI
TL;DR: In this article, the authors consider a sequence space model of statistical linear inverse problems where the objective is to mimic the estimator in the set of linear estimators that has the smallest risk on the true function.
Abstract: We consider a sequence space model of statistical linear inverse problems where we need to estimate a function $f$ from indirect noisy observations. Let a finite set $\Lambda$ of linear estimators be given. Our aim is to mimic the estimator in $\Lambda$ that has the smallest risk on the true $f$. Under general conditions, we show that this can be achieved by simple minimization of an unbiased risk estimator, provided the singular values of the operator of the inverse problem decrease as a power law. The main result is a nonasymptotic oracle inequality that is shown to be asymptotically exact. This inequality can also be used to obtain sharp minimax adaptive results. In particular, we apply it to show that minimax adaptation on ellipsoids in the multivariate anisotropic case is realized by minimization of unbiased risk estimator without any loss of efficiency with respect to optimal nonadaptive procedures.
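
A minimal version of the selection rule described here, under assumed power-law singular values b_k = k^{-1} and noise level eps: compute an unbiased risk estimate (up to a constant not depending on the estimator) for each candidate linear estimator in a finite family, here spectral cutoffs, and keep the minimizer.

```python
import numpy as np

rng = np.random.default_rng(4)
N, eps = 500, 0.05
k = np.arange(1, N + 1)
b = k ** -1.0                        # singular values decaying as a power law
theta = k ** -1.5                    # unknown coefficients (for the simulation only)
y = b * theta + eps * rng.standard_normal(N)

def ure(lam):
    """Unbiased estimate, up to an additive constant, of the risk of the
    linear estimator theta_hat_k = lam_k * y_k / b_k in the sequence model."""
    return np.sum((lam ** 2 - 2 * lam) * (y ** 2 / b ** 2) + 2 * lam * eps ** 2 / b ** 2)

# finite family Lambda: projection (spectral cutoff) estimators
cutoffs = [5, 10, 20, 50, 100, 200]
risks_hat = [ure((k <= m).astype(float)) for m in cutoffs]
print("selected cutoff:", cutoffs[int(np.argmin(risks_hat))])
```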

Journal ArticleDOI
TL;DR: In this article, it is argued that inference on the basis of a model is not possible unless the model admits a natural extension that includes the domain for which inference is required, and examples are given to show why such an extension is necessary and why a formal theory is required.
Abstract: This paper addresses two closely related questions, "What is a statistical model?" and "What is a parameter?" The notions that a model must "make sense," and that a parameter must "have a well-defined meaning" are deeply ingrained in applied statistical work, reasonably well understood at an instinctive level, but absent from most formal theories of modelling and inference. In this paper, these concepts are defined in algebraic terms, using morphisms, functors and natural transformations. It is argued that inference on the basis of a model is not possible unless the model admits a natural extension that includes the domain for which inference is required. For example, prediction requires that the domain include all future units, subjects or time points. Although it is usually not made explicit, every sensible statistical model admits such an extension. Examples are given to show why such an extension is necessary and why a formal theory is required. In the definition of a subparameter, it is shown that certain parameter functions are natural and others are not. Inference is meaningful only for natural parameters. This distinction has important consequences for the construction of prior distributions and also helps to resolve a controversy concerning the Box-Cox model.

Journal ArticleDOI
TL;DR: In this article, a new class of robust estimators for the linear regression model is introduced, weighted least squares estimators, with weights adaptively computed using the empirical distribution of the residuals of an initial robust estimator.
Abstract: This paper introduces a new class of robust estimators for the linear regression model. They are weighted least squares estimators, with weights adaptively computed using the empirical distribution of the residuals of an initial robust estimator. It is shown that under certain general conditions the asymptotic breakdown points of the proposed estimators are not less than that of the initial estimator, and the finite sample breakdown point can be at most $1/n$ less. For the special case of the least median of squares as initial estimator, hard rejection weights and normal errors and carriers, the maximum bias function of the proposed estimators for point-mass contaminations is numerically computed, with the result that there is almost no worsening of bias. Moreover–and this is the original contribution of this paper–if the errors are normally distributed and under fairly general conditions on the design the proposed estimators have full asymptotic efficiency. A Monte Carlo study shows that they have better behavior than the initial estimators for finite sample sizes.
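
A simplified sketch of the reweighting idea: fit an initial robust regression (here a crude L1 fit rather than the least median of squares used in the paper), turn its scaled residuals into hard-rejection weights with a fixed cutoff, and refit by weighted least squares. The adaptive choice of the cutoff, which drives the full-efficiency result, is omitted.

```python
import numpy as np

def reweighted_ls(X, y, cutoff=2.5):
    """One reweighting step: initial robust fit, hard-rejection weights from
    scaled residuals, then ordinary least squares on the retained points."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])
    # crude initial robust fit: iteratively reweighted least squares for L1 loss
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    for _ in range(50):
        r = y - Xc @ beta
        w = 1.0 / np.maximum(np.abs(r), 1e-6)
        beta = np.linalg.lstsq(Xc * w[:, None] ** 0.5, y * w ** 0.5, rcond=None)[0]
    r = y - Xc @ beta
    s = 1.4826 * np.median(np.abs(r - np.median(r)))     # robust residual scale (MAD)
    keep = np.abs(r / s) <= cutoff                       # hard-rejection weights
    return np.linalg.lstsq(Xc[keep], y[keep], rcond=None)[0]

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 2))
y = 1 + X @ np.array([2.0, -1.0]) + 0.5 * rng.standard_normal(200)
y[:20] += 15                                             # gross outliers
print(reweighted_ls(X, y))                               # close to (1, 2, -1)
```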

Journal ArticleDOI
TL;DR: In this article, the auxiliary scale estimate is included in the reweighted representation of the estimates to obtain a bootstrap method that is asymptotically correct, and the breakdown points of the quantile estimates derived with this method are higher than those obtained with the classical bootstrap.
Abstract: We introduce a new computer-intensive method to estimate the distribution of robust regression estimates. The basic idea behind our method is to bootstrap a reweighted representation of the estimates. To obtain a bootstrap method that is asymptotically correct, we include the auxiliary scale estimate in our reweighted representation of the estimates. Our method is computationally simple because for each bootstrap sample we only have to solve a linear system of equations. The weights we use are decreasing functions of the absolute value of the residuals and hence outlying observations receive small weights. This results in a bootstrap method that is resistant to the presence of outliers in the data. The breakdown points of the quantile estimates derived with this method are higher than those obtained with the bootstrap. We illustrate our method on two datasets and we report the results of a Monte Carlo experiment on confidence intervals for the parameters of the linear model.

Journal ArticleDOI
TL;DR: In this paper, a minimum discrepancy approach is proposed to estimate the dimension of the partial central subspace of a SIR regression with both continuous and categorical predictors without the need of homogeneous predictor covariances across the subpopulations.
Abstract: Though partial sliced inverse regression (partial SIR: Chiaromonte et al. [2002. Sufficient dimension reduction in regressions with categorical predictors. Ann. Statist. 30, 475–497]) extended the scope of sufficient dimension reduction to regressions with both continuous and categorical predictors, its requirement of homogeneous predictor covariances across the subpopulations restricts its application in practice. When this condition fails, partial SIR may provide misleading results. In this article, we propose a new estimation method via a minimum discrepancy approach without this restriction. Our method is optimal in terms of asymptotic efficiency and its test statistic for testing the dimension of the partial central subspace always has an asymptotic chi-squared distribution. It also gives us the ability to test predictor effects. An asymptotic chi-squared test of the conditional independence hypothesis that the response is independent of a selected subset of the continuous predictors given the remaining predictors is obtained.

Journal ArticleDOI
TL;DR: In this paper, a unified jackknife theory for a fairly general class of mixed models is presented, which includes some of the widely used mixed linear models and generalized linear mixed models as special cases.
Abstract: The paper presents a unified jackknife theory for a fairly general class of mixed models which includes some of the widely used mixed linear models and generalized linear mixed models as special cases. The paper develops jackknife theory for the important, but so far neglected, prediction problem for the general mixed model. For estimation of fixed parameters, a jackknife method is considered for a general class of M-estimators which includes the maximum likelihood, residual maximum likelihood and ANOVA estimators for mixed linear models and the recently developed method of simulated moments estimators for generalized linear mixed models. For both the prediction and estimation problems, a jackknife method is used to obtain estimators of the mean squared errors (MSE). Asymptotic unbiasedness of the MSE estimators is shown to hold essentially under certain moment conditions. Simulation studies undertaken support our theoretical results.

Journal ArticleDOI
TL;DR: In this article, an estimator of the additive components of a nonparametric additive model with a known link function is presented, and the asymptotic distribution of each additive component is shown to be the same as it would be if the other components were known with certainty.
Abstract: This paper describes an estimator of the additive components of a nonparametric additive model with a known link function. When the additive components are twice continuously differentiable, the estimator is asymptotically normally distributed with a rate of convergence in probability of $n^{-2/5}$. This is true regardless of the (finite) dimension of the explanatory variable. Thus, in contrast to the existing asymptotically normal estimator, the new estimator has no curse of dimensionality. Moreover, the asymptotic distribution of each additive component is the same as it would be if the other components were known with certainty.

Journal ArticleDOI
TL;DR: It is shown that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normal-Wishart distribution.
Abstract: We develop simple methods for constructing parameter priors for model choice among directed acyclic graphical (DAG) models. In particular, we introduce several assumptions that permit the construction of parameter priors for a large number of DAG models from a small set of assessments. We then present a method for directly computing the marginal likelihood of every DAG model given a random sample with no missing observations. We apply this methodology to Gaussian DAG models which consist of a recursive set of linear regression models. We show that the only parameter prior for complete Gaussian DAG models that satisfies our assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let $W$ be an $n \times n$, $n \ge 3$, positive definite symmetric matrix of random variables and $f(W)$ be a pdf of $W$. Then, $f(W)$ is a Wishart distribution if and only if $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ is independent of $\{W_{12},W_{22}\}$ for every block partitioning $W_{11},W_{12}, W'_{12}, W_{22}$ of $W$. Similar characterizations of the normal and normal-Wishart distributions are provided as well.

Journal ArticleDOI
TL;DR: Nonasymptotic risk bounds are provided for maximum likelihood-type isotonic estimators of an unknown nondecreasing regression function, with general average loss at design points, and they imply uniform $n^{-1/3}$-consistency of the $\ell_p$ risk for unknown regression functions of uniformly bounded variation.
Abstract: Nonasymptotic risk bounds are provided for maximum likelihood-type isotonic estimators of an unknown nondecreasing regression function, with general average loss at design points. These bounds are optimal up to scale constants, and they imply uniform $n^{-1/3}$-consistency of the $\ell_p$ risk for unknown regression functions of uniformly bounded variation, under mild assumptions on the joint probability distribution of the data, with possibly dependent observations.
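
The maximum likelihood-type isotonic estimator under squared error loss is the pool-adjacent-violators fit; a minimal sketch:

```python
import numpy as np

def pava(y, w=None):
    """Pool adjacent violators: the weighted least squares nondecreasing fit."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); sizes.append(1)
        # merge the last two blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tot
            wts[-2] = tot
            sizes[-2] += sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
    return np.repeat(vals, sizes)

rng = np.random.default_rng(6)
y = np.linspace(0, 1, 50) ** 2 + 0.1 * rng.standard_normal(50)
fit = pava(y)
print(bool(np.all(np.diff(fit) >= 0)))   # the fitted values are nondecreasing
```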

Journal ArticleDOI
TL;DR: In this paper, a partitioning principle (PP), based on a partition of the parameter space, is introduced for the construction of multiple decision procedures, in particular multiple test procedures and selection procedures, and may lead to more powerful procedures than a formal application of the closure principle.
Abstract: The first general principle, and nowadays the state of the art, for the construction of powerful multiple test procedures controlling a multiple level $\alpha$ is the so-called closure principle. In this article we introduce another powerful tool for the construction of multiple decision procedures, especially for the construction of multiple test procedures and selection procedures. This tool is based on a partition of the parameter space and will be called the partitioning principle (PP). In the first part of the paper we review basic concepts of multiple hypotheses testing and discuss a slight generalization of the current theory. In the second part we present various variants of the PP for the construction of multiple test procedures: a general PP (GPP), a weak PP (WPP) and a strong PP (SPP). It will be shown that, depending on the underlying decision problem, a PP may lead to more powerful test procedures than a formal application of the closure principle (FCP). Moreover, the more complex SPP may be more powerful than the WPP. Based on a duality between testing and selecting, PPs can also be applied to the construction of more powerful selection procedures. In the third part of the paper FCP, WPP and SPP are applied and compared in some examples.

Journal ArticleDOI
TL;DR: A family of tests for the multivariate one-sample problem under elliptical symmetry is proposed, based on Randles' concept of interdirections and the ranks of pseudo-Mahalanobis distances computed with respect to Tyler's (1987) multivariate M-estimator of scatter.
Abstract: We propose a family of tests, based on Randles' (1989) concept of interdirections and the ranks of pseudo-Mahalanobis distances computed with respect to a multivariate M-estimator of scatter due to Tyler (1987), for the multivariate one-sample problem under elliptical symmetry. These tests, which generalize the univariate signed-rank tests, are affine-invariant. Depending on the score function considered (van der Waerden, Laplace,...), they allow for locally asymptotically maximin tests at selected densities (multivariate normal, multivariate double-exponential,...). Local powers and asymptotic relative efficiencies are derived--with respect to Hotelling's test, Randles' (1989) multivariate sign test, Peters and Randles' (1990) Wilcoxon-type test, and with respect to the Oja median tests. We, moreover, extend to the multivariate setting two famous univariate results: the traditional Chernoff-Savage (1958) property, showing that Hotelling's traditional procedure is uniformly dominated, in the Pitman sense, by the van der Waerden version of our tests, and the celebrated Hodges-Lehmann (1956) ".864 result," providing, for any fixed space dimension $k$, the lower bound for the asymptotic relative efficiency of Wilcoxon-type tests with respect to Hotelling's. These asymptotic results are confirmed by a Monte Carlo investigation and by an application to a real data set.

Journal ArticleDOI
TL;DR: In this article, Laplace approximations for the Type I confluent hypergeometric function and the Gauss hypergeometrical function are presented, which have excellent numerical accuracy.
Abstract: In this paper we present Laplace approximations for two functions of matrix argument: the Type I confluent hypergeometric function and the Gauss hypergeometric function. Both of these functions play an important role in distribution theory in multivariate analysis, but from a practical point of view they have proved challenging, and they have acquired a reputation for being difficult to approximate. Appealing features of the approximations we present are: (i) they are fully explicit (and simple to evaluate in practice); and (ii) typically, they have excellent numerical accuracy. The excellent numerical accuracy is demonstrated in the calculation of noncentral moments of Wilks' $\Lambda$ and the likelihood ratio statistic for testing block independence, and in the calculation of the CDF of the noncentral distribution of Wilks' $\Lambda$ via a sequential saddlepoint approximation. Relative error properties of these approximations are also studied, and it is noted that the approximations have uniformly bounded relative errors in important cases.

Journal ArticleDOI
TL;DR: In this paper, a lower bound for maximal regression depth is proved in the general multidimensional case, as conjectured by Rousseeuw and Hubert, and is demonstrated to have an impact on the breakdown point of the maximum depth estimator.
Abstract: For a general definition of depth in data analysis a differential-like calculus is constructed in which the location case (the framework of Tukey's median) plays a fundamental role similar to that of linear functions in the mathematical analysis. As an application, a lower bound for maximal regression depth is proved in the general multidimensional case--as conjectured by Rousseeuw and Hubert and others. This lower bound is demonstrated to have an impact on the breakdown point of the maximum depth estimator.

Journal ArticleDOI
TL;DR: In this article, the authors provide a detailed characterization of the asymptotic behavior of kernel density estimators for one-sided linear processes under short-range and long-range dependence.
Abstract: In this paper we provide a detailed characterization of the asymptotic behavior of kernel density estimators for one-sided linear processes. The conjecture that asymptotic normality for the kernel density estimator holds under short-range dependence is proved under minimal assumptions on bandwidths. We also depict the dichotomous and trichotomous phenomena for various choices of bandwidths when the process is long-range dependent.
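
A minimal sketch of the object studied: a Gaussian kernel density estimate computed from a short-range dependent one-sided linear process, here an AR(1) series, with an illustrative bandwidth of the usual n^{-1/5} order.

```python
import numpy as np

def kde(sample, grid, h):
    """Gaussian kernel density estimate evaluated on a grid of points."""
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

# one-sided linear process: a short-range dependent AR(1) series
rng = np.random.default_rng(7)
n, phi = 5000, 0.5
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

grid = np.linspace(-4, 4, 9)
print(kde(x, grid, h=n ** (-1 / 5)))
```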

Journal ArticleDOI
TL;DR: In this article, the authors investigated the behavior of the expected number of type I errors as a characteristic of certain multiple tests controlling the familywise error rate (FWER) or the false discovery rate (FDR) at a prespecified level.
Abstract: The performance of multiple test procedures with respect to error control is an old issue. Assuming that all hypotheses are true we investigate the behavior of the expected number of type I errors (ENE) as a characteristic of certain multiple tests controlling the familywise error rate (FWER) or the false discovery rate (FDR) at a prespecified level. We derive explicit formulas for the distribution of the number of false rejections as well as for the ENE for single-step, step-down and step-up procedures based on independent $p$-values. Moreover, we determine the corresponding asymptotic distributions of the number of false rejections as well as explicit formulae for the ENE if the number of hypotheses tends to infinity. In case of FWER-control we mostly obtain Poisson distributions and in one case a geometric distribution as limiting distributions; in case of FDR control we obtain limiting distributions which are apparently not named in the literature. Surprisingly, the ENE is bounded by a small number regardless of the number of hypotheses under consideration. Finally, it turns out that in case of dependent test statistics the ENE behaves completely differently compared to the case of independent test statistics.
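
A small Monte Carlo illustration (not the paper's analytic formulas) of the quantity studied: with all m hypotheses true and independent p-values, the expected number of type I errors of the single-step Bonferroni test equals alpha, and that of the Benjamini-Hochberg step-up procedure also stays small regardless of m.

```python
import numpy as np

rng = np.random.default_rng(8)
m, alpha, reps = 1000, 0.05, 2000
bonf, bh = [], []
for _ in range(reps):
    p = rng.uniform(size=m)                      # all m null hypotheses are true
    bonf.append(np.sum(p <= alpha / m))          # single-step Bonferroni rejections
    below = np.sort(p) <= alpha * np.arange(1, m + 1) / m
    bh.append(below.nonzero()[0].max() + 1 if below.any() else 0)
print("Bonferroni ENE:", np.mean(bonf))          # close to alpha = 0.05
print("Benjamini-Hochberg ENE:", np.mean(bh))
```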

Journal ArticleDOI
TL;DR: In this article, a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates is proposed, and appropriate randomization is used to select a good arm to play for a greater expected reward.
Abstract: We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
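
A toy epsilon-greedy version of the strategy described above, using a histogram (binned covariate) estimate of each arm's mean reward; the two reward functions, the bin count, and the exploration rate are illustrative assumptions, and the paper's randomization scheme differs in detail.

```python
import numpy as np

rng = np.random.default_rng(9)
K, n_bins, T, eps_explore = 2, 10, 20000, 0.05
reward_fn = [lambda x: x, lambda x: 1 - x]                 # arm 0 is best for x > 0.5
sums = np.zeros((K, n_bins))
counts = np.zeros((K, n_bins))

total = 0.0
for t in range(T):
    x = rng.uniform()
    b = min(int(x * n_bins), n_bins - 1)                   # histogram bin of the covariate
    if rng.uniform() < eps_explore or counts[:, b].min() == 0:
        arm = rng.integers(K)                              # forced exploration
    else:
        arm = int(np.argmax(sums[:, b] / counts[:, b]))    # play the estimated best arm
    r = reward_fn[arm](x) + 0.1 * rng.standard_normal()
    sums[arm, b] += r
    counts[arm, b] += 1
    total += r

print("average reward:", total / T)   # roughly E[max(x, 1 - x)] = 0.75, minus exploration cost
```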

Journal ArticleDOI
TL;DR: This article establishes the global asymptotic equivalence between nonparametric regression with random design and the white noise model under sharp smoothness conditions on an unknown regression or drift function.
Abstract: This paper establishes the global asymptotic equivalence between the nonparametric regression with random design and the white noise under sharp smoothness conditions on an unknown regression or drift function. The asymptotic equivalence is established by constructing explicit equivalence mappings between the nonparametric regression and the white-noise experiments, which provide synthetic observations and synthetic asymptotic solutions from any one of the two experiments with asymptotic properties identical to the true observations and given asymptotic solutions from the other. The impact of such asymptotic equivalence results is that an investigation in one nonparametric problem automatically yields asymptotically analogous results in all other asymptotically equivalent nonparametric problems.

Journal ArticleDOI
TL;DR: In this paper, the authors derived the limiting distribution of the urn composition under staggered entry and delayed response for adaptive clinical trials using a generalized Friedman's urn design, and showed that maximum likelihood estimators from such a trial have the usual asymptotic properties.
Abstract: For adaptive clinical trials using a generalized Friedman’s urn design, we derive the limiting distribution of the urn composition under staggered entry and delayed response. The stochastic delay mechanism is assumed to depend on both the treatment assigned and the patient’s response. A very general setup is employed with $K$ treatments and $L$ responses. When $L = K =2$, one example of a generalized Friedman’s urn design is the randomized play-the-winner rule. An application of this rule occurred in a clinical trial of depression, which had staggered entry and delayed response. We show that maximum likelihood estimators from such a trial have the usual asymptotic properties.
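
With K = L = 2 and immediate responses, the randomized play-the-winner rule mentioned above can be simulated in a few lines (staggered entry and delayed response, the paper's focus, are omitted here). The success probabilities are made-up illustration values.

```python
import numpy as np

rng = np.random.default_rng(10)
success_prob = [0.7, 0.4]                    # unknown to the experimenter
urn = np.array([1.0, 1.0])                   # start with one ball per treatment
assigned = np.zeros(2)
for patient in range(5000):
    arm = rng.choice(2, p=urn / urn.sum())   # draw a ball, assign that treatment
    assigned[arm] += 1
    if rng.uniform() < success_prob[arm]:
        urn[arm] += 1                        # success: add a ball of the same type
    else:
        urn[1 - arm] += 1                    # failure: add a ball of the other type
print(assigned / assigned.sum())             # allocation drifts toward the better arm
```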

Journal ArticleDOI
TL;DR: For linear predictors with observations on a regular grid, it is shown that there is a screening effect provided that, at high frequencies, the spectral density of the random field does not decay faster than algebraically and does not vary too quickly.
Abstract: When predicting the value of a stationary random field at a location x in some region in which one has a large number of observations, it may be difficult to compute the optimal predictor. One simple way to reduce the computational burden is to base the predictor only on those observations nearest to x. As long as the number of observations used in the predictor is sufficiently large, one might generally expect the best predictor based on these observations to be nearly optimal relative to the best predictor using all observations. Indeed, this phenomenon has been empirically observed in numerous circumstances and is known as the screening effect in the geostatistical literature. For linear predictors, when observations are on a regular grid, this work proves that there generally is a screening effect as the grid becomes increasingly dense. This result requires that, at high frequencies, the spectral density of the random field not decay faster than algebraically and not vary too quickly. Examples demonstrate that there may be no screening effect if these conditions on the spectral density are violated.
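
A small numerical illustration of the screening effect, assuming a mean-zero stationary field with an exponential covariance on a dense one-dimensional grid: the simple kriging error variance based on the ten nearest observations is essentially the same as the one based on all observations.

```python
import numpy as np

def kriging_variance(obs_x, x0, cov):
    """Error variance of the best linear predictor of a mean-zero field at x0."""
    K = cov(np.abs(obs_x[:, None] - obs_x[None, :]))
    k0 = cov(np.abs(obs_x - x0))
    return cov(0.0) - k0 @ np.linalg.solve(K, k0)

cov = lambda d: np.exp(-np.abs(d))            # exponential covariance function
grid = np.linspace(0, 10, 201)                # dense regular grid of observations
x0 = 5.025                                    # prediction point off the grid
near = np.argsort(np.abs(grid - x0))[:10]     # ten nearest observations

print(kriging_variance(grid[near], x0, cov))  # nearest observations only
print(kriging_variance(grid, x0, cov))        # all observations: nearly identical
```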

Journal ArticleDOI
TL;DR: The distribution of a mean or, more generally, of a vector of means of a Dirichlet process is considered in this paper, and the sharpest condition sufficient for the distribution of such a mean to be symmetric is given.
Abstract: The distribution of a mean or, more generally, of a vector of means of a Dirichlet process is considered. Some characterizing aspects of this paper are: (i) a review of a few basic results, providing new formulations free from many of the extra assumptions considered to date in the literature, and giving essentially new, simpler and more direct proofs; (ii) new numerical evaluations, with any prescribed error of approximation, of the distribution at issue; (iii) a new form for the law of a vector of means. Moreover, as applications of these results, we give: (iv) the sharpest condition sufficient for the distribution of a mean to be symmetric; (v) forms for the probability distribution of the variance of the Dirichlet random measure; (vi) some hints for determining the finite-dimensional distributions of a random function connected with Bayesian methods for queuing models.
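
The paper's numerical evaluations are exact with a prescribed approximation error; as a rough complement, here is a hedged Monte Carlo sketch that approximates the law of a Dirichlet process mean by truncated stick-breaking, with a Uniform(0, 1) base measure and total mass alpha = 1 chosen purely for illustration.

```python
import numpy as np

def dp_mean_draws(alpha, base_sampler, n_draws=2000, trunc=500, seed=0):
    """Monte Carlo draws of the mean of a Dirichlet process via truncated
    stick-breaking: mean = sum_k w_k * theta_k, theta_k i.i.d. from the base."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        v = rng.beta(1.0, alpha, size=trunc)
        w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
        w /= w.sum()                              # renormalize the truncated weights
        draws[i] = w @ base_sampler(rng, trunc)
    return draws

uniform_base = lambda rng, k: rng.uniform(0.0, 1.0, size=k)
draws = dp_mean_draws(alpha=1.0, base_sampler=uniform_base)
print(draws.mean(), draws.std())                  # the average of these draws is close to 0.5
```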