
Showing papers on "Mathematical statistics published in 2008"


Book
25 Aug 2008
TL;DR: This book gives a short excursion into matrix algebra, then develops multivariate distributions, the theory of the multinormal, the theory of estimation and hypothesis testing, before turning to multivariate techniques such as principal components, factor, cluster and discriminant analysis.
Abstract: I Descriptive Techniques: Comparison of Batches.- II Multivariate Random Variables: A Short Excursion into Matrix Algebra Moving to Higher Dimensions Multivariate Distributions Theory of the Multinormal Theory of Estimation Hypothesis Testing.- III Multivariate Techniques: Decomposition of Data Matrices by Factors Principal Components Analysis Factor Analysis Cluster Analysis Discriminant Analysis.- Correspondence Analysis.- Canonical Correlation Analysis.- Multidimensional Scaling.- Conjoint Measurement Analysis.- Application in Finance.- Computationally Intensive Techniques.- A: Symbols and Notations.- B: Data.- Bibliography.- Index.

1,081 citations


Book
12 Aug 2008
TL;DR: This book covers convergence concepts, central limit theorems, asymptotic properties of estimators and tests, classical nonparametrics including the bootstrap, jackknife and permutation tests, and closes with a collection of inequalities in probability, linear algebra, and analysis.
Abstract: Basic Convergence Concepts and Theorems.- Metrics, Information Theory, Convergence, and Poisson Approximations.- More General Weak and Strong Laws and the Delta Theorem.- Transformations.- More General Central Limit Theorems.- Moment Convergence and Uniform Integrability.- Sample Percentiles and Order Statistics.- Sample Extremes.- Central Limit Theorems for Dependent Sequences.- Central Limit Theorem for Markov Chains.- Accuracy of Central Limit Theorems.- Invariance Principles.- Edgeworth Expansions and Cumulants.- Saddlepoint Approximations.- U-statistics.- Maximum Likelihood Estimates.- M Estimates.- The Trimmed Mean.- Multivariate Location Parameter and Multivariate Medians.- Bayes Procedures and Posterior Distributions.- Testing Problems.- Asymptotic Efficiency in Testing.- Some General Large-Deviation Results.- Classical Nonparametrics.- Two-Sample Problems.- Goodness of Fit.- Chi-square Tests for Goodness of Fit.- Goodness of Fit with Estimated Parameters.- The Bootstrap.- Jackknife.- Permutation Tests.- Density Estimation.- Mixture Models and Nonparametric Deconvolution.- High-Dimensional Inference and False Discovery.- A Collection of Inequalities in Probability, Linear Algebra, and Analysis.

738 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: In this article, the authors consider the least-square linear regression problem with regularization by the l1-norm, a problem usually referred to as the Lasso, and present a detailed asymptotic analysis of model consistency.
Abstract: We consider the least-square linear regression problem with regularization by the l1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, is compared favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository.

429 citations
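
The Bolasso procedure described above is simple to sketch: run the Lasso on bootstrap resamples and intersect the selected supports. Below is a minimal illustration using scikit-learn's Lasso; the regularization level, number of replications and synthetic data are illustrative choices, not the paper's settings.

```python
# Sketch of the Bolasso idea: run the Lasso on bootstrap resamples and
# intersect the selected supports. Assumes scikit-learn; alpha and the
# number of replications are illustrative choices, not the paper's.
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_support(X, y, alpha=0.1, n_boot=128, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    support = np.ones(p, dtype=bool)          # start with all variables
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # bootstrap resample
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        support &= np.abs(coef) > tol         # intersect supports
    return np.flatnonzero(support)

# Example on synthetic data: only the first 3 of 10 variables are relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.5 * rng.normal(size=200)
print(bolasso_support(X, y))
```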


Journal ArticleDOI
TL;DR: A general approach for establishing identifiability using algebraic arguments is demonstrated; in particular, it sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters.
Abstract: While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.

416 citations


Book
16 Sep 2008
TL;DR: This book develops Strict Minimum Message Length (SMML) inference, quadratic approximations to SMML, and MML as a descriptive theory.
Abstract: Inductive Inference.- Information.- Strict Minimum Message Length (SMML).- Approximations to SMML.- MML: Quadratic Approximations to SMML.- MML Details in Some Interesting Cases.- Structural Models.- The Feathers on the Arrow of Time.- MML as a Descriptive Theory.- Related Work.

323 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider two models, Poisson and Binomial, for the training samples and show that the corresponding risks of misclassification are asymptotically equivalent to first order.
Abstract: The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of $k$.

276 citations
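
The paper's proposal for choosing k rests on the asymptotic expansion of the misclassification error; as a generic practical stand-in (not the authors' method), k is often chosen by cross-validated error, as in the sketch below. The data and candidate grid are illustrative.

```python
# Choosing k for the k-nearest-neighbour classifier by cross-validated
# misclassification error. This is a generic stand-in, not the paper's
# asymptotics-based choice; the data and candidate grid are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
# Two populations with different means, mimicking the two-class setting.
y = (X[:, 0] + X[:, 1] + rng.normal(size=n) > 0).astype(int)

candidate_k = list(range(1, 52, 2))           # odd k avoids voting ties
cv_error = [1 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                X, y, cv=5).mean()
            for k in candidate_k]
best_k = candidate_k[int(np.argmin(cv_error))]
print("chosen k:", best_k)
```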


Posted Content
TL;DR: In this paper, two iterative procedures are proposed to jointly estimate the slope parameters and the stochastic trends; the resulting estimators are referred to respectively as the CupBC (continuously updated and bias-corrected) and the CupFM (continuously updated and fully modified) estimators.
Abstract: This paper studies estimation of panel cointegration models with cross-sectional dependence generated by unobserved global stochastic trends. The standard least squares estimator is, in general, inconsistent owing to the spuriousness induced by the unobservable I(1) trends. We propose two iterative procedures that jointly estimate the slope parameters and the stochastic trends. The resulting estimators are referred to respectively as CupBC (continuously-updated and bias-corrected) and the CupFM (continuously-updated and fully-modified) estimators. We establish their consistency and derive their limiting distributions. Both are asymptotically unbiased and asymptotically mixed normal and permit inference to be conducted using standard test statistics. The estimators are also valid when there are mixed stationary and non-stationary factors, as well as when the factors are all stationary.

229 citations


Book
04 Aug 2008
TL;DR: This book covers preliminary data analysis, probability concepts, random variables and their properties, probability distributions, model estimation and testing, regression and multivariate analysis, frequency analysis of extreme events, simulation techniques for design, risk and reliability analysis, and Bayesian decision methods under parameter uncertainty.
Abstract: Preface. Introduction. Preliminary data analysis. Basic probability concepts. Random variables and their properties. Probability distributions. Model estimation and testing. Methods of regression and multivariate analysis. Frequency analysis of extreme events. Simulation techniques for design. Risk and reliability analysis. Bayesian decision methods and parameter uncertainty. Appendixes: Further mathematics. Glossary of symbols. Tables of selected distributions. Brief answers to selected problems. Data lists. Index

229 citations


Journal ArticleDOI
TL;DR: In this article, a figure is presented that shows properties that individual probability distributions possess and many of the relationships between these distributions, as treated in introductory mathematical statistics textbooks.
Abstract: Probability distributions are traditionally treated separately in introductory mathematical statistics textbooks. A figure is presented here that shows properties that individual distributions possess and many of the relationships between these distributions.

169 citations
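
One relationship such a figure typically records is that the sum of nu squared standard normals is chi-square with nu degrees of freedom; below is a quick numerical check of this standard fact (my example, not taken from the article).

```python
# Numerical check of one textbook relationship between distributions:
# the sum of nu squared standard normals is chi-square with nu degrees
# of freedom. This comparison is illustrative, not from the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 4, 100_000
z = rng.normal(size=(n, nu))
samples = (z ** 2).sum(axis=1)                 # should be chi2(nu)

# Kolmogorov-Smirnov comparison against the chi-square CDF.
ks = stats.kstest(samples, stats.chi2(df=nu).cdf)
print(f"KS statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}")
```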


Book
30 Jun 2008
TL;DR: In this book, the authors present a unified and systematic exposition of the large deviation theory for heavy-tailed random walks with regularly varying, sub- and semiexponential jump distributions.
Abstract: This book focuses on the asymptotic behaviour of the probabilities of large deviations of the trajectories of random walks with 'heavy-tailed' (in particular, regularly varying, sub- and semiexponential) jump distributions. Large deviation probabilities are of great interest in numerous applied areas, typical examples being ruin probabilities in risk theory, error probabilities in mathematical statistics, and buffer-overflow probabilities in queueing theory. The classical large deviation theory, developed for distributions decaying exponentially fast (or even faster) at infinity, mostly uses analytical methods. If the fast decay condition fails, which is the case in many important applied problems, then direct probabilistic methods usually prove to be efficient. This monograph presents a unified and systematic exposition of the large deviation theory for heavy-tailed random walks. Most of the results presented in the book are appearing in a monograph for the first time. Many of them were obtained by the authors.
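
For subexponential (e.g. regularly varying) jump distributions, the classical first-order asymptotics take the "single big jump" form P(S_n > x) ~ n P(X_1 > x) as x grows. The simulation below illustrates this standard fact for Pareto jumps; it is an illustration of the general theory, not a result reproduced from the monograph, and all parameters are arbitrary.

```python
# Illustration of the 'single big jump' heavy-tail asymptotics
# P(S_n > x) ~ n * P(X_1 > x) for regularly varying jumps.
# Pareto parameters and thresholds are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
alpha, n, n_sim = 1.5, 20, 200_000            # tail index, walk length, replications

# Pareto(alpha) jumps with scale 1 (support [1, inf), mean alpha/(alpha-1)), centered.
jumps = 1.0 + rng.pareto(alpha, size=(n_sim, n))
S_n = (jumps - alpha / (alpha - 1.0)).sum(axis=1)

for x in (50.0, 100.0, 200.0):
    mc = (S_n > x).mean()                     # Monte Carlo estimate of P(S_n > x)
    approx = n * x ** (-alpha)                # first-order approximation n * P(X_1 > x)
    print(f"x={x:6.0f}  Monte Carlo={mc:.5f}  n*P(X1>x)={approx:.5f}")
```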

Posted Content
TL;DR: Design and Analysis of Simulation Experiments (DASE) is a textbook and reference work focusing on statistical methods for discrete-event simulation (such as queuing and inventory simulations) as well as for deterministic simulation.
Abstract: Design and Analysis of Simulation Experiments (DASE) focuses on statistical methods for discrete-event simulation (such as queuing and inventory simulations). In addition, the book discusses DASE for deterministic simulation (such as engineering and physics simulations). The text presents both classic and modern statistical designs. Classic designs (e.g., fractional factorials) assume only a few factors with a few values per factor. The resulting input/output data of the simulation experiment are analyzed through low-order polynomials, which are linear regression (meta)models. Modern designs allow many more factors, possibly with many values per factor. These designs include group screening (e.g., Sequential Bifurcation, SB) and space-filling designs (e.g., Latin Hypercube Sampling, LHS). The data resulting from these modern designs may be analyzed through low-order polynomials for group screening and various metamodel types (e.g., Kriging) for LHS. Design and Analysis of Simulation Experiments is an authoritative textbook and reference work for researchers, graduate students, and technical practitioners in simulation. Basic knowledge of simulation and mathematical statistics is expected; however, the book does summarize these basics for the readers' convenience. In addition, the book provides relatively simple solutions for (a) selecting which problems to simulate, (b) analyzing the resulting simulation data, and (c) computationally challenging simulation problems.
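
A minimal sketch of the modern-design workflow described above: generate a Latin hypercube design, evaluate a (toy, deterministic) simulator at the design points, and fit a low-order polynomial metamodel by least squares. The design size, the stand-in simulator and the polynomial form are my illustrative assumptions, not examples from the book.

```python
# Sketch of a DASE-style workflow: Latin hypercube design -> simulation runs
# -> low-order polynomial (linear regression) metamodel. The toy 'simulator'
# and all sizes are illustrative assumptions, not from the book.
import numpy as np

def latin_hypercube(n_points, n_factors, rng):
    # One stratified uniform draw per stratum, independently permuted per factor.
    u = rng.random((n_points, n_factors))
    perms = np.column_stack([rng.permutation(n_points) for _ in range(n_factors)])
    return (perms + u) / n_points              # design points in [0, 1)^n_factors

def simulator(x):
    # Stand-in deterministic simulation response (unknown to the metamodel).
    return 3.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 1]

rng = np.random.default_rng(0)
X = latin_hypercube(40, 2, rng)
y = simulator(X)

# Second-order polynomial metamodel fitted by least squares.
design = np.column_stack([np.ones(len(X)), X, X ** 2, X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("fitted metamodel coefficients:", np.round(beta, 3))
```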

Journal ArticleDOI
TL;DR: It is shown that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1.
Abstract: Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
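
A rough illustration of the kind of comparison studied: repeated data splitting used to choose between a parametric (linear) fit and a nonparametric (Nadaraya-Watson kernel) fit by held-out squared error. The splitting ratio, bandwidth and simulated data are illustrative assumptions, not the paper's conditions.

```python
# Cross-validation used to compare a parametric (linear) procedure with a
# nonparametric (Nadaraya-Watson kernel) procedure by held-out squared error.
# Splitting ratio, bandwidth and data are illustrative, not the paper's setup.
import numpy as np

def nw_predict(x_train, y_train, x_test, bandwidth=0.2):
    # Gaussian-kernel Nadaraya-Watson regression estimate.
    w = np.exp(-0.5 * ((x_test[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)   # the truth is nonlinear

split_ratio, n_splits = 0.5, 50                        # evaluation proportion
err_lin, err_nw = [], []
for _ in range(n_splits):
    idx = rng.permutation(n)
    n_train = int((1 - split_ratio) * n)
    tr, te = idx[:n_train], idx[n_train:]
    coef = np.polyfit(x[tr], y[tr], deg=1)             # parametric: linear fit
    err_lin.append(np.mean((np.polyval(coef, x[te]) - y[te]) ** 2))
    err_nw.append(np.mean((nw_predict(x[tr], y[tr], x[te]) - y[te]) ** 2))

print("mean held-out MSE  linear:", np.mean(err_lin), " kernel:", np.mean(err_nw))
```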

Journal ArticleDOI
TL;DR: In this paper, the authors derive semiparametric efficiency bounds and efficient estimators for parameters defined through general moment restrictions with missing data, covering the cases where the conditional probability of missing data given the proxy variables is unknown, known, or belongs to a correctly specified parametric family, and where the auxiliary sample is independent of the primary sample or a subset of it.
Abstract: We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when such distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample, or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, or known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.
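
To fix ideas with the simplest moment restriction (a population mean, with the outcome missing at random given an observed proxy), the sketch below contrasts the two generic estimator families mentioned above: inverse probability weighting and a conditional-expectation (regression) projection. The simulation design and the fitted nuisance models are illustrative assumptions, not the paper's efficient estimators.

```python
# Two generic estimators of E[Y] when Y is missing at random given a proxy X:
# (a) inverse probability weighting (IPW), (b) conditional-expectation
# projection (regression imputation). Illustrative sketch only; the simulation
# design and nuisance models are assumptions, not the paper's estimators.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)                       # proxy observed for everyone
y = 1.0 + 2.0 * x + rng.normal(size=n)       # outcome, partially missing
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))     # missingness depends only on x
r = rng.random(n) < p_obs                    # r = True if y is observed

X = x.reshape(-1, 1)
# (a) IPW with an estimated propensity score.
p_hat = LogisticRegression().fit(X, r).predict_proba(X)[:, 1]
ipw = np.mean(np.where(r, y / p_hat, 0.0))
# (b) Projection: regress y on x in the observed subsample, average the fit.
proj = LinearRegression().fit(X[r], y[r]).predict(X).mean()

print(f"true E[Y] = 1.000, IPW = {ipw:.3f}, projection = {proj:.3f}")
```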

BookDOI
01 Jan 2008
TL;DR: This book treats statistical models, tests in models with monotonicity properties, statistical decision theory, comparison and reduction of models, invariant decision models, large sample approximations of models and decisions, estimation, testing and selection.
Abstract: Statistical Models- Tests in Models with Monotonicity Properties- Statistical Decision Theory- Comparison of Models, Reduction by- Invariant Statistical Decision Models- Large Sample Approximations of Models and Decisions- Estimation- Testing- Selection

Journal ArticleDOI
TL;DR: In this paper, a multiscale test statistic based on local order statistics and spacings is introduced that provides simultaneous confidence statements for the existence and location of local increases and decreases of a density or a failure rate.
Abstract: We introduce a multiscale test statistic based on local order statistics and spacings that provides simultaneous confidence statements for the existence and location of local increases and decreases of a density or a failure rate. The procedure provides guaranteed finite-sample significance levels, is easy to implement and possesses certain asymptotic optimality and adaptivity properties.

Journal Article
TL;DR: This study discusses how to assess practical significance of DIF at both item and test levels and reviews three methods of establishing a common metric: the equal-mean-difficulty method, the all-other-item method, and the constant-item (CI) method.
Abstract: This study addresses several important issues in assessment of differential item functioning (DIF). It starts with the definition of DIF, effectiveness of using item fit statistics to detect DIF, and linear modeling of DIF in dichotomous items, polytomous items, facets, and testlet-based items. Because a common metric over groups of test-takers is a prerequisite in DIF assessment, this study reviews three such methods of establishing a common metric: the equal-mean-difficulty method, the all-other-item method, and the constant-item (CI) method. A small simulation demonstrates the superiority of the CI method over the others. As the CI method relies on a correct specification of DIF-free items to serve as anchors, a method of identifying such items is recommended and its effectiveness is illustrated through a simulation. Finally, this study discusses how to assess practical significance of DIF at both item and test levels.

Journal ArticleDOI
TL;DR: A general theorem is presented on the rate of contraction of the resulting posterior distribution as n tends to infinity; its conditions show that the exact rates can depend in a complicated way on the priors, but also that the rate is fairly robust to the specification of the prior weights.
Abstract: We consider nonparametric Bayesian estimation of a probability density $p$ based on a random sample of size $n$ from this density using a hierarchical prior. The prior consists, for instance, of prior weights on the regularity of the unknown density combined with priors that are appropriate given that the density has this regularity. More generally, the hierarchy consists of prior weights on an abstract model index and a prior on a density model for each model index. We present a general theorem on the rate of contraction of the resulting posterior distribution as $n\to \infty$, which gives conditions under which the rate of contraction is the one attached to the model that best approximates the true density of the observations. This shows that, for instance, the posterior distribution can adapt to the smoothness of the underlying density. We also study the posterior distribution of the model index, and find that under the same conditions the posterior distribution gives negligible weight to models that are bigger than the optimal one, and thus selects the optimal model or smaller models that also approximate the true density well. We apply these results to log spline density models, where we show that the prior weights on the regularity index interact with the priors on the models, making the exact rates depend in a complicated way on the priors, but also that the rate is fairly robust to specification of the prior weights.

Book ChapterDOI
TL;DR: In this paper, the authors present certain recent methodologies and some new results for the statistical analysis of probability distributions on manifolds, including the 2-D shape space of k-ads, comprising all configurations of k planar landmarks.
Abstract: This article presents certain recent methodologies and some new results for the statistical analysis of probability distributions on manifolds. An important example considered in some detail here is the 2-D shape space of k-ads, comprising all configurations of k planar landmarks (k > 2), modulo translation, scaling and rotation.

BookDOI
01 Sep 2008
TL;DR: This book gathers contributions to the 4th International Conference on Soft Methods in Probability and Statistics, illustrating new trends that enlarge the statistical and uncertainty modeling traditions and establishing a dialogue between fuzzy random variables and imprecise probability theories.
Abstract: Probability theory has been the only well-founded theory of uncertainty for a long time. It was viewed either as a powerful tool for modelling random phenomena, or as a rational approach to the notion of degree of belief. During the last thirty years, in areas centered around decision theory, artificial intelligence and information processing, numerous approaches extending or orthogonal to the existing theory of probability and mathematical statistics have come to the front. The common feature of those attempts is to allow for softer or wider frameworks for taking into account the incompleteness or imprecision of information. Many of these approaches come down to blending interval or fuzzy interval analysis with probabilistic methods. This book gathers contributions to the 4th International Conference on Soft methods in Probability and Statistics. Its aim is to present recent results illustrating such new trends that enlarge the statistical and uncertainty modeling traditions, towards the handling of incomplete or subjective information. It covers a broad scope ranging from philosophical and mathematical underpinnings of new uncertainty theories, with a stress on their impact in the area of statistics and data analysis, to numerical methods and applications to environmental risk analysis and mechanical engineering. A unique feature of this collection is to establish a dialogue between fuzzy random variables and imprecise probability theories.

Posted Content
TL;DR: New adaptive proposal strategies for sequential Monte Carlo algorithms (also known as particle filters) are discussed, relying on criteria that evaluate the quality of the proposed particles; in addition, an empirical estimate of the Kullback-Leibler divergence between the involved distributions, computable with linear complexity, is established.
Abstract: In this paper we discuss new adaptive proposal strategies for sequential Monte Carlo algorithms--also known as particle filters--relying on criteria evaluating the quality of the proposed particles. The choice of the proposal distribution is a major concern and can dramatically influence the quality of the estimates. Thus, we show how the long-used coefficient of variation of the weights can be used for estimating the chi-square distance between the target and instrumental distributions of the auxiliary particle filter. As a by-product of this analysis we obtain an auxiliary adjustment multiplier weight type for which this chi-square distance is minimal. Moreover, we establish an empirical estimate of linear complexity of the Kullback-Leibler divergence between the involved distributions. Guided by these results, we discuss adaptive designing of the particle filter proposal distribution and illustrate the methods on a numerical example.
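
The driving quantity is elementary to compute: the squared coefficient of variation of the importance weights estimates the chi-square divergence between the target and instrumental distributions and yields the usual effective sample size. Below is a generic importance-sampling sketch (not the authors' auxiliary particle filter; the two Gaussian densities are illustrative).

```python
# The squared coefficient of variation (CV^2) of importance weights estimates
# the chi-square divergence between target and instrumental distributions and
# gives the usual effective sample size. Generic importance-sampling sketch,
# not the paper's auxiliary particle filter; both densities are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(loc=0.0, scale=2.0, size=n)     # draws from the instrumental N(0, 2)
w = stats.norm.pdf(x, loc=1.0, scale=1.0) / stats.norm.pdf(x, loc=0.0, scale=2.0)

cv2 = w.var() / w.mean() ** 2                  # squared coefficient of variation
ess = n / (1.0 + cv2)                          # effective sample size
print(f"estimated chi-square divergence ~ CV^2 = {cv2:.3f}, ESS = {ess:.0f}")
```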

Journal ArticleDOI
TL;DR: In this paper, the authors give an overview of two approaches to probability theory where lower and upper probabilities, rather than probabilities, are used: Walley's behavioural theory of imprecise probabilities, and Shafer and Vovk's game-theoretic account of probability.

Journal ArticleDOI
TL;DR: This paper generalizes the random effects model to a conditional dependence model which allows dependence between null hypotheses and shows that this dependence can be useful for characterizing the spatial structure of the null hypotheses.
Abstract: A popular framework for false discovery control is the random effects model in which the null hypotheses are assumed to be independent. This paper generalizes the random effects model to a conditional dependence model which allows dependence between null hypotheses. The dependence can be useful to characterize the spatial structure of the null hypotheses. Asymptotic properties of false discovery proportions and numbers of rejected hypotheses are explored and a large-sample distributional theory is obtained.

Journal ArticleDOI
TL;DR: In this article, lower bounds for the rate of convergence of posterior distributions associated with Gaussian process priors are obtained, complementing known upper bounds expressed in terms of a concentration function involving the Reproducing Kernel Hilbert Space of the Gaussian prior.
Abstract: Upper bounds for rates of convergence of posterior distributions associated to Gaussian process priors are obtained by van der Vaart and van Zanten in [14] and expressed in terms of a concentration function involving the Reproducing Kernel Hilbert Space of the Gaussian prior. Here lower-bound counterparts are obtained. As a corollary, we obtain the precise rate of convergence of posteriors for Gaussian priors in various settings. Additionally, we extend the upper-bound results of [14] about Riemann-Liouville priors to a continuous family of parameters.

Book
01 Jan 2008
TL;DR: The main motivation for this book lies in the breadth of applications in which a statistical model is used to represent small departures from, for example, a Poisson process.
Abstract: The main motivation for this book lies in the breadth of applications in which a statistical model is used to represent small departures from, for example, a Poisson process. Our approach uses information geometry to provide a common context but we need only rather elementary material from differential geometry, information theory and mathematical statistics. Introductory sections serve together to help those interested from the applications side in making use of our methods and results. Reported in this monograph is a body of results, and computer-algebraic methods that seem to have quite general applicability to statistical models admitting representation through parametric families of probability density functions. Some illustrations are given from a variety of contexts for geometric characterization of statistical states near to the three important standard basic reference states: (Poisson) randomness, uniformity, independence. The individual applications are somewhat heuristic models from various fields and we incline more to terminology and notation from the applications rather than from formal statistics. However, a common thread is a geometrical representation for statistical perturbations of the basic standard states, and hence results gain qualitative stability. Moreover, the geometry is controlled by a metric structure that owes its heritage through maximum likelihood to information theory so the quantitative features---lengths of curves, geodesics, scalar curvatures etc.---have some respectable authority. We see in the applications simple models for galactic void distributions and galaxy clustering, amino acid clustering along protein chains, cryptographic protection, stochastic fibre networks, coupled geometric features in hydrology and quantum chaotic behaviour.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate estimation of the extreme value index when the data are subject to random censorship, prove asymptotic normality for various estimators, and use these estimators as building blocks for estimators of extreme quantiles.
Abstract: We investigate the estimation of the extreme value index when the data are subject to random censorship. We prove, in a unified way, detailed asymptotic normality results for various estimators of the extreme value index and use these estimators as the main building block for estimators of extreme quantiles. We illustrate the quality of these methods by a small simulation study and apply the estimators to medical data.
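
For context, the classical uncensored Hill estimator of a positive extreme value index is sketched below; the censoring adjustments analysed in the paper are not reproduced here, and the simulated data and choice of k are illustrative.

```python
# Classical (uncensored) Hill estimator of a positive extreme value index,
# shown only as the baseline that censoring-adjusted estimators build on;
# the paper's censored versions are not reproduced. Data and k are illustrative.
import numpy as np

def hill_estimator(x, k):
    # Mean log-excess of the k largest observations over the (k+1)-th largest.
    order = np.sort(x)[::-1]                  # descending order statistics
    return np.mean(np.log(order[:k]) - np.log(order[k]))

rng = np.random.default_rng(0)
sample = 1.0 + rng.pareto(2.0, size=2_000)    # true extreme value index = 1/2
print("Hill estimate:", hill_estimator(sample, k=100))
```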

BookDOI
11 Apr 2008
TL;DR: This comprehensive text presents extensive treatments of data analysis using parametric and nonparametric techniques and effectively links statistical concepts with R procedures, enabling the application of the language to the vast world of statistics.
Abstract: Designed for an intermediate undergraduate course, Probability and Statistics with R shows students how to solve various statistical problems using both parametric and nonparametric techniques via the open source software R. It provides numerous real-world examples, carefully explained proofs, end-of-chapter problems, and illuminating graphs to facilitate hands-on learning. Integrating theory with practice, the text briefly introduces the syntax, structures, and functions of the S language, before covering important graphically and numerically descriptive methods. The next several chapters elucidate probability and random variables topics, including univariate and multivariate distributions. After exploring sampling distributions, the authors discuss point estimation, confidence intervals, hypothesis testing, and a wide range of nonparametric methods. With a focus on experimental design, the book also presents fixed- and random-effects models as well as randomized block and two-factor factorial designs. The final chapter describes simple and multiple regression analyses. Demonstrating that R can be used as a powerful teaching aid, this comprehensive text presents extensive treatments of data analysis using parametric and nonparametric techniques. It effectively links statistical concepts with R procedures, enabling the application of the language to the vast world of statistics.

Posted Content
TL;DR: In this paper, a penalized likelihood method for estimating the mixing distribution was proposed and extensive simulations were conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators.
Abstract: Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.
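
A univariate sketch of the penalized-likelihood EM idea: each variance update in the M-step is shrunk towards the overall sample variance, which keeps the likelihood bounded away from degenerate zero-variance solutions. The penalty form, its strength and the simulated data are illustrative assumptions, not the paper's multivariate penalty.

```python
# Penalized-likelihood EM for a two-component univariate normal mixture.
# Each variance update is shrunk towards the overall sample variance s2,
# which keeps the likelihood bounded away from degenerate zero-variance
# solutions. The penalty form and its strength `a` are illustrative choices,
# not the paper's multivariate penalty.
import numpy as np
from scipy.stats import norm

def penalized_em(x, a=1.0, n_iter=200):
    n = len(x)
    s2 = x.var()                              # penalty target for the variances
    pi, mu, var = 0.5, np.array([x.min(), x.max()]), np.array([s2, s2])
    for _ in range(n_iter):
        # E-step: responsibilities of component 1.
        d1 = pi * norm.pdf(x, mu[0], np.sqrt(var[0]))
        d2 = (1 - pi) * norm.pdf(x, mu[1], np.sqrt(var[1]))
        r = d1 / (d1 + d2)
        resp = np.column_stack([r, 1 - r])
        # M-step with penalized variance updates.
        nk = resp.sum(axis=0)
        pi = nk[0] / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sq = (resp * (x[:, None] - mu) ** 2).sum(axis=0)
        var = (sq + 2 * a * s2) / (nk + 2 * a)
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 0.5, 200)])
print(penalized_em(x))
```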

Journal ArticleDOI
TL;DR: There are tantalizing similarities between the Dantzig Selector (DS) and the LARS methods, but they are not the same and produce somewhat different models.
Abstract: Discussion of "The Dantzig selector: Statistical estimation when $p$ is much larger than $n$" [math/0506081].

Journal ArticleDOI
TL;DR: In this paper, the authors develop the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities.
Abstract: Regression problems are traditionally analyzed via univariate characteristics like the regression function, scale function and marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches performance of an oracle that knows both an underlying model and an estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus a bivariate conditional density becomes a univariate density; (iv) is adaptive to an underlying design (fixed or random) of predictors.