
Showing papers on "Central limit theorem published in 2018"


Posted Content
TL;DR: A Law of Large Numbers and a Central Limit Theorem for the empirical distribution are established, which together show that the approximation error of the network universally scales as O(n^{-1}), and the scale and nature of the noise introduced by stochastic gradient descent are quantified.
Abstract: Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, potentially of great use in computational and applied mathematics. That said, there are few rigorous results about the representation error and trainability of neural networks, as well as how they scale with the network size. Here we characterize both the error and scaling by reinterpreting the standard optimization algorithm used in machine learning applications, stochastic gradient descent, as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number $n$ of parameters is large, the empirical distribution of the particles descends on a convex landscape towards a minimizer at a rate independent of $n$. We establish a Law of Large Numbers and a Central Limit Theorem for the empirical distribution, which together show that the approximation error of the network universally scales as $o(n^{-1})$. Remarkably, these properties do not depend on the dimensionality of the domain of the function that we seek to represent. Our analysis also quantifies the scale and nature of the noise introduced by stochastic gradient descent and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural networks to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as $d=25$.

187 citations


Posted Content
TL;DR: Conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), are established and the scaling of its error with the size of the network is quantified.
Abstract: Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or "loss" function used to train the network. We show that, when the number $n$ of units is large, the empirical distribution of the particles descends on a convex landscape towards the global minimum at a rate independent of $n$, with a resulting approximation error that universally scales as $O(n^{-1})$. These properties are established in the form of a Law of Large Numbers and a Central Limit Theorem for the empirical distribution. Our analysis also quantifies the scale and nature of the noise introduced by SGD and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural networks to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as $d=25$.

144 citations
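
This entry and the one above are versions of the same analysis, and the mean-field picture they describe is easy to prototype. Below is a minimal sketch, not the papers' experiments (which use the 3-spin model): the target function, tanh units, widths, and step sizes are all arbitrary choices. The network is written as an average over n particle-like units and trained with plain SGD.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x)  # toy function to represent (an assumption, not from the paper)

def fit_mean_field_net(n_units, n_steps=10_000, lr=0.5, batch=32):
    """Train f(x) = (1/n) * sum_i c_i * tanh(a_i x + b_i) with plain SGD.

    The 1/n prefactor is the mean-field normalization under which the
    parameters behave like n interacting particles."""
    a = rng.normal(size=n_units)
    b = rng.normal(size=n_units)
    c = rng.normal(size=n_units)
    for _ in range(n_steps):
        x = rng.uniform(-1, 1, size=batch)
        h = np.tanh(np.outer(x, a) + b)        # (batch, n)
        f = h @ c / n_units                    # network output
        err = f - target(x)
        # gradients of 0.5 * mean(err^2); the 1/n factor from f is deliberately
        # dropped, which corresponds to the mean-field time scaling in which
        # each particle moves at an O(1) rate
        gc = err @ h / batch
        gh = (err[:, None] * c) * (1 - h ** 2) / batch
        a -= lr * (gh.T @ x)
        b -= lr * gh.sum(axis=0)
        c -= lr * gc
    xs = np.linspace(-1, 1, 512)
    pred = np.tanh(np.outer(xs, a) + b) @ c / n_units
    return np.mean((pred - target(xs)) ** 2)

for n in [10, 100, 1000]:
    print(n, fit_mean_field_net(n))
```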


Journal ArticleDOI
TL;DR: The asymptotic distribution of empirical Wasserstein distances is derived as the optimal value of a linear programme with random objective function, which facilitates statistical inference in large generality.
Abstract: Summary The Wasserstein distance is an attractive tool for data analysis but statistical inference is hindered by the lack of distributional limits. To overcome this obstacle, for probability measures supported on finitely many points, we derive the asymptotic distribution of empirical Wasserstein distances as the optimal value of a linear programme with random objective function. This facilitates statistical inference (e.g. confidence intervals for sample-based Wasserstein distances) in large generality. Our proof is based on directional Hadamard differentiability. Failure of the classical bootstrap and alternatives are discussed. The utility of the distributional results is illustrated on two data sets.

110 citations
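
The "optimal value of a linear programme" formulation is concrete enough to compute directly. A minimal sketch, assuming SciPy's HiGHS-based linprog (the support points, sample sizes, and p = 1 below are arbitrary choices): the transport plan pi is the LP variable, with its row and column sums pinned to the two empirical weight vectors.

```python
import numpy as np
from scipy.optimize import linprog

def empirical_wasserstein(x, y, support, p=1):
    """Wasserstein distance between two empirical measures supported on a
    finite point set, computed as the optimal value of the transport LP."""
    support = np.asarray(support, dtype=float)
    k = len(support)
    r = np.array([np.mean(x == s) for s in support])  # empirical weights of x
    c = np.array([np.mean(y == s) for s in support])  # empirical weights of y
    cost = np.abs(support[:, None] - support[None, :]) ** p
    A_eq, b_eq = [], []
    for i in range(k):                 # row marginals: sum_j pi_ij = r_i
        row = np.zeros(k * k); row[i * k:(i + 1) * k] = 1
        A_eq.append(row); b_eq.append(r[i])
    for j in range(k):                 # column marginals: sum_i pi_ij = c_j
        col = np.zeros(k * k); col[j::k] = 1
        A_eq.append(col); b_eq.append(c[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * (k * k), method="highs")
    return res.fun ** (1 / p)

rng = np.random.default_rng(1)
support = np.arange(5)
x = rng.choice(support, size=200, p=[.1, .2, .4, .2, .1])
y = rng.choice(support, size=200, p=[.2, .2, .2, .2, .2])
print(empirical_wasserstein(x, y, support))
```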


Posted Content
TL;DR: In this paper, the central limit theorem for neural networks with a single hidden layer was proved in the asymptotic regime of simultaneously (a) large numbers of hidden units and (b) large numbers of stochastic gradient descent training iterations.
Abstract: We rigorously prove a central limit theorem for neural network models with a single hidden layer. The central limit theorem is proven in the asymptotic regime of simultaneously (A) large numbers of hidden units and (B) large numbers of stochastic gradient descent training iterations. Our result describes the neural network's fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. The proof relies upon weak convergence methods from stochastic analysis. In particular, we prove relative compactness for the sequence of processes and uniqueness of the limiting process in a suitable Sobolev space.

106 citations


Journal ArticleDOI
TL;DR: In this paper, the authors prove a central limit theorem for the components of the eigenvectors corresponding to the largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph.
Abstract: We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and, furthermore, the mean and the covariance matrix of each row are functions of the associated vertex’s block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.

85 citations
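
A quick way to see the result empirically is to embed a simulated stochastic blockmodel and look at the block-conditional scatter of the embedded rows. This is only a sketch under the usual conventions (embedding = top-d eigenvectors of the normalized matrix, scaled by the square roots of their eigenvalues); the block probabilities and sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-block stochastic blockmodel
n = 1000
B = np.array([[0.5, 0.1],
              [0.1, 0.3]])
z = rng.integers(0, 2, size=n)                  # latent block memberships
A = rng.random((n, n)) < B[z][:, z]
A = np.triu(A, 1)
A = (A + A.T).astype(float)                     # symmetric, hollow adjacency

# Spectral embedding of D^{-1/2} A D^{-1/2}
deg = A.sum(axis=1)
L = A / np.sqrt(np.outer(deg, deg))
vals, vecs = np.linalg.eigh(L)
d = 2
idx = np.argsort(vals)[-d:][::-1]               # d largest eigenvalues
X = vecs[:, idx] * np.sqrt(vals[idx])           # one embedded row per vertex

# Per the CLT, rows scatter around a block-specific mean with a
# block-specific covariance; inspect the block-conditional moments
for k in (0, 1):
    rows = X[z == k]
    print("block", k, "mean:", rows.mean(axis=0).round(3),
          "sd:", rows.std(axis=0).round(3))
```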


Journal ArticleDOI
TL;DR: The Central Limit Theorem for the linear statistics of two-dimensional Coulomb gases, with arbitrary inverse temperature and general confining potential, at the macroscopic and mesoscopic scales and possibly near the boundary of the support of the equilibrium measure was proved in this paper.
Abstract: We prove a Central Limit Theorem for the linear statistics of two-dimensional Coulomb gases, with arbitrary inverse temperature and general confining potential, at the macroscopic and mesoscopic scales and possibly near the boundary of the support of the equilibrium measure. This can be stated in terms of convergence of the random electrostatic potential to a Gaussian Free Field. Our result is the first to be valid at arbitrary temperature and at the mesoscopic scales, and we recover previous results of Ameur-Hedenmalm-Makarov and Rider-Virag concerning the determinantal case, with weaker assumptions near the boundary. We also prove moderate deviations upper bounds, or rigidity estimates, for the linear statistics and a convergence result for those corresponding to energy-minimizers. The method relies on a change of variables, a perturbative expansion of the energy, and the comparison of partition functions deduced from our previous work. Near the boundary, we use recent quantitative stability estimates on the solutions to the obstacle problem obtained by Serra and the second author.

81 citations


Posted Content
TL;DR: These results extract a part sub-Gaussian tail behavior in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm - the Generalized Bernstein-Orlicz norm - that typifies such tail behaviors.
Abstract: Concentration inequalities form an essential toolkit in the study of high dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much weaker exponential type (namely sub-Weibull) tail assumptions. These results extract a part sub-Gaussian tail behavior in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm - the Generalized Bernstein-Orlicz norm - that typifies such tail behaviors. We illustrate the usefulness of these inequalities through the analysis of four fundamental problems in HD statistics. In the first two problems, we study the rate of convergence of the sample covariance matrix in terms of the maximum elementwise norm and the maximum k-sub-matrix operator norm which are key quantities of interest in bootstrap, HD covariance matrix estimation and HD inference. The third example concerns the restricted eigenvalue condition, required in HD linear regression, which we verify for all sub-Weibull random vectors through a unified analysis, and also prove a more general result related to restricted strong convexity in the process. In the final example, we consider the Lasso estimator for linear regression and establish its rate of convergence under much weaker than usual tail assumptions (on the errors as well as the covariates), while also allowing for misspecified models and both fixed and random design. To our knowledge, these are the first such results for Lasso obtained in this generality. The common feature in all our results over all the examples is that the convergence rates under most exponential tails match the usual ones under sub-Gaussian assumptions.

81 citations
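
For reference, the sub-Weibull condition the abstract appeals to is conventionally expressed through an Orlicz quasi-norm; the following states only the standard definition (the paper's Generalized Bernstein-Orlicz norm is a refinement whose exact form is given there):

```latex
\|X\|_{\psi_\alpha} = \inf\bigl\{\eta > 0 : \mathbb{E}\,\psi_\alpha\bigl(|X|/\eta\bigr) \le 1\bigr\},
\qquad \psi_\alpha(x) = e^{x^\alpha} - 1,
```

so that $\|X\|_{\psi_\alpha} < \infty$ is equivalent (up to constants) to a tail bound $\mathbb{P}(|X| \ge t) \le 2\exp(-(t/K)^\alpha)$. Here $\alpha = 2$ recovers sub-Gaussian and $\alpha = 1$ sub-exponential tails, while $\alpha < 1$ gives the heavier sub-Weibull regime the paper targets.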


Journal ArticleDOI
TL;DR: This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met.
Abstract: Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.

79 citations
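
The bootstrap check described in the abstract is easy to replicate. A minimal sketch (simulated lognormal observations stand in for real experimental data; the sample size and skewness are arbitrary): bootstrap the sample mean before and after the log transformation and compare the skewness of the two bootstrap distributions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Skewed observations, as in the scenario where the SD grows with the mean
y = rng.lognormal(mean=1.0, sigma=0.8, size=40)

def bootstrap_means(data, reps=10_000):
    """Bootstrap distribution of the sample mean."""
    idx = rng.integers(0, len(data), size=(reps, len(data)))
    return data[idx].mean(axis=1)

raw = bootstrap_means(y)
logged = bootstrap_means(np.log(y))

def skew(v):
    """Standardized third moment; near 0 for a roughly normal distribution."""
    return np.mean(((v - v.mean()) / v.std()) ** 3)

print("skewness of the mean, raw observations:", round(skew(raw), 3))
print("skewness of the mean, log-transformed :", round(skew(logged), 3))
```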


Journal ArticleDOI
TL;DR: In this paper, the persistence diagram of a stationary point process was studied; the strong law of large numbers for persistence diagrams was shown to hold as the window size tends to infinity, and a sufficient condition was given for the support of the limiting persistence diagram to coincide with the geometrically realizable region.
Abstract: The persistent homology of a stationary point process on $\mathbf{R}^{N}$ is studied in this paper. As a generalization of continuum percolation theory, we study higher dimensional topological features of the point process such as loops, cavities, etc. in a multiscale way. The key ingredient is the persistence diagram, which is an expression of the persistent homology. We prove the strong law of large numbers for persistence diagrams as the window size tends to infinity and give a sufficient condition for the support of the limiting persistence diagram to coincide with the geometrically realizable region. We also discuss a central limit theorem for persistent Betti numbers.

75 citations


Journal ArticleDOI
TL;DR: In research-related hypothesis testing, the term “statistically significant” is used to describe when an observed difference or association has met a certain threshold, which is denoted as alpha (α) and is typically set at .05.
Abstract: Inferential statistics relies heavily on the central limit theorem and the related law of large numbers. According to the central limit theorem, regardless of the distribution of the source population, a sample estimate of that population will have a normal distribution, but only if the sample is large enough. The related law of large numbers holds that the central limit theorem is valid as random samples become large enough, usually defined as an n ≥ 30. In research-related hypothesis testing, the term "statistically significant" is used to describe when an observed difference or association has met a certain threshold. This significance threshold or cut-point is denoted as alpha (α) and is typically set at .05. When the observed P value is less than α, one rejects the null hypothesis (Ho) and accepts the alternative. Clinical significance is even more important than statistical significance, so treatment effect estimates and confidence intervals should be regularly reported. A type I error occurs when the Ho of no difference or no association is rejected, when in fact the Ho is true. A type II error occurs when the Ho is not rejected, when in fact there is a true population effect. Power is the probability of detecting a true difference, effect, or association if it truly exists. Sample size justification and power analysis are key elements of a study design. Ethical concerns arise when studies are poorly planned or underpowered. When calculating sample size for comparing groups, 4 quantities are needed: α, type II error, the difference or effect of interest, and the estimated variability of the outcome variable. Sample size increases for increasing variability and power, and for decreasing α and decreasing difference to detect. Sample size for a given relative reduction in proportions depends heavily on the proportion in the control group itself, and increases as the proportion decreases. Sample size for single-group studies estimating an unknown parameter is based on the desired precision of the estimate. Interim analyses assessing for efficacy and/or futility are great tools to save time and money, as well as allow science to progress faster, but are only 1 component considered when a decision to stop or continue a trial is made.

75 citations
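
The four-quantity sample-size recipe in the abstract reduces, for comparing two means, to the standard normal-approximation formula n per group = 2σ²(z₁₋α/₂ + z_power)²/Δ². The sketch below is that textbook formula, not code from the article, and the numbers in the example are made up.

```python
import math
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # controls the type I error
    z_beta = norm.ppf(power)            # controls the type II error
    return 2 * (sd * (z_alpha + z_beta) / delta) ** 2

# Halving the difference to detect roughly quadruples the required n
print(math.ceil(n_per_group(delta=5.0, sd=10.0)))   # ~63 per group
print(math.ceil(n_per_group(delta=2.5, sd=10.0)))   # ~252 per group
```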


Journal ArticleDOI
TL;DR: In this paper, a new toolbox for the analysis of the global behavior of stochastic discrete particle systems has been developed, based on the notion of the Schur generating function of a random discrete configuration.

Journal ArticleDOI
Abstract: We prove quenched versions of (i) a large deviations principle (LDP), (ii) a central limit theorem (CLT), and (iii) a local central limit theorem (LCLT) for non-autonomous dynamical systems. A key advance is the extension of the spectral method, commonly used in limit laws for deterministic maps, to the general random setting. We achieve this via multiplicative ergodic theory and the development of a general framework to control the regularity of Lyapunov exponents of twisted transfer operator cocycles with respect to a twist parameter. While some versions of the LDP and CLT have previously been proved with other techniques, the local central limit theorem is, to our knowledge, a completely new result, and one that demonstrates the strength of our method. Applications include non-autonomous (piecewise) expanding maps, defined by random compositions of the form $T_{\sigma^{n-1}\omega} \circ \cdots \circ T_{\sigma\omega} \circ T_{\omega}$. An important aspect of our results is that we only assume ergodicity and invertibility of the random driving $\sigma : \Omega \to \Omega$; in particular no expansivity or mixing properties are required.

Journal ArticleDOI
TL;DR: In this paper, simultaneous confidence bands are constructed for general moment condition models with high-dimensional parameters, based on score functions that approximately satisfy the Neyman orthogonality condition.
Abstract: In this paper, we develop procedures to construct simultaneous confidence bands for $\tilde p$ potentially infinite-dimensional parameters after model selection for general moment condition models where $\tilde p$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde p$ parameters is a function. The procedure is based on the construction of score functions that satisfy the Neyman orthogonality condition approximately. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments, as we allow for $\tilde p \gg n$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require resolving the high-dimensional optimization problems). We formally apply the general theory to inference on the regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.

Journal ArticleDOI
TL;DR: In this paper, a quantitative central limit theorem (in Wasserstein distance) was established for the Euler–Poincare characteristic of excursion sets of random spherical eigenfunctions in dimension 2, based on a decomposition of the Euler–Poincare characteristic into different Wiener-chaos components; the asymptotic behaviour is dominated by a single term corresponding to the chaotic component of order two.
Abstract: We establish here a quantitative central limit theorem (in Wasserstein distance) for the Euler–Poincare characteristic of excursion sets of random spherical eigenfunctions in dimension 2. Our proof is based upon a decomposition of the Euler–Poincare characteristic into different Wiener-chaos components: we prove that its asymptotic behaviour is dominated by a single term, corresponding to the chaotic component of order two. As a consequence, we show how the asymptotic dependence on the threshold level $u$ is fully degenerate, that is, the Euler–Poincare characteristic converges to a single random variable times a deterministic function of the threshold. This deterministic function has a zero at the origin, where the variance is thus asymptotically of smaller order. We discuss also a possible unifying framework for the Lipschitz–Killing curvatures of the excursion sets for Gaussian spherical harmonics.

Journal ArticleDOI
TL;DR: In this article, self-norming central limit theorems for non-stationary time series arising as observations on sequential maps possessing an indifferent fixed point were established by perturbing the slope in the Pomeau-Manneville map.
Abstract: We establish self-norming central limit theorems for non-stationary time series arising as observations on sequential maps possessing an indifferent fixed point. These transformations are obtained by perturbing the slope in the Pomeau–Manneville map. We also obtain quenched central limit theorems for random compositions of these maps.

Journal ArticleDOI
TL;DR: In this article, asymptotics of a domino tiling model on a class of domains which are called rectangular Aztec diamonds are considered and the Law of Large Numbers for the corresponding height functions and explicit formulas for the limit are provided.
Abstract: We consider asymptotics of a domino tiling model on a class of domains which we call rectangular Aztec diamonds. We prove the Law of Large Numbers for the corresponding height functions and provide explicit formulas for the limit. For a special class of examples, the explicit parametrization of the frozen boundary is given. It turns out to be an algebraic curve with very special properties. Moreover, we establish the convergence of the fluctuations of the height functions to the Gaussian Free Field in appropriate coordinates. Our main tool is a recently developed moment method for discrete particle systems.

Journal ArticleDOI
Xing He, Lei Chu, Robert C. Qiu, Qian Ai, Zenan Ling
TL;DR: This paper, based on random matrix theory, proposes a data-driven approach that models massive datasets as large random matrices; it is model-free and requires no knowledge about physical model parameters.
Abstract: Data-driven approaches, when tasked with situation awareness, are suitable for complex grids with massive datasets. It is a challenge, however, to efficiently turn these massive datasets into useful big data analytics. To address such a challenge, this paper, based on random matrix theory, proposes a data-driven approach. The approach models massive datasets as large random matrices; it is model-free and requires no knowledge about physical model parameters. In particular, the large data dimension $N$ and the large time span $T$, from the spatial aspect and the temporal aspect, respectively, lead to favorable results. Remarkably, the linear eigenvalue statistics (LESs) built from these data matrices follow Gaussian distributions under very general conditions, owing to recent breakthroughs in probability on central limit theorems for LESs. Numerous case studies, with both simulated data and field data, are given to validate the proposed new algorithms.
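
The Gaussianity of linear eigenvalue statistics that the approach leans on can be checked numerically in a toy setting; the sketch below uses white-noise data and f(x) = x², with dimensions chosen arbitrarily rather than taken from any power grid.

```python
import numpy as np

rng = np.random.default_rng(10)

# Linear eigenvalue statistic sum_i f(lambda_i) of a sample covariance matrix;
# its fluctuations stay O(1) and are asymptotically Gaussian
N, T, reps = 100, 400, 2000
stats = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(N, T))
    lam = np.linalg.eigvalsh(X @ X.T / T)   # eigenvalues of the sample covariance
    stats[r] = np.sum(lam ** 2)             # f(x) = x^2

print("mean:", stats.mean().round(1), " sd:", stats.std().round(2))
# Normality check: the standardized third moment should be near 0
z = (stats - stats.mean()) / stats.std()
print("skewness:", np.mean(z ** 3).round(3))
```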


Journal ArticleDOI
TL;DR: In this paper, the vertex reinforced jump process (VRJP), the edge reinforced random walk (ERRW), and their relation to a random Schrodinger operator were studied on infinite graphs.
Abstract: This paper concerns the vertex reinforced jump process (VRJP), the edge reinforced random walk (ERRW), and their relation to a random Schrodinger operator. On infinite graphs, we define a 1-dependent random potential $\beta$ extending that defined by Sabot, Tarres, and Zeng on finite graphs, and consider its associated random Schrodinger operator $H_\beta$. We construct a random function $\psi$ as a limit of martingales, such that $\psi = 0$ when the VRJP is recurrent, and $\psi$ is a positive generalized eigenfunction of the random Schrodinger operator with eigenvalue 0 when the VRJP is transient. Then we prove a representation of the VRJP on infinite graphs as a mixture of Markov jump processes involving the function $\psi$, the Green function of the random Schrodinger operator, and an independent Gamma random variable. On ${\Bbb Z}^d$, we deduce from this representation a zero-one law for recurrence or transience of the VRJP and the ERRW, and a functional central limit theorem for the VRJP and the ERRW at weak reinforcement in dimension $d \ge 3$, using estimates of Disertori, Sabot, and Tarres and of Disertori, Spencer, and Zirnbauer. Finally, we deduce recurrence of the ERRW in dimension $d = 2$ for any initial constant weights (using the estimates of Merkl and Rolles), thus giving a full answer to the question raised by Diaconis. We also raise some questions on the links between recurrence/transience of the VRJP and localization/delocalization of the random Schrodinger operator $H_\beta$.

Journal ArticleDOI
TL;DR: In this paper, a set of survey data is used to verify the central limit theorem (CLT) empirically: sample means computed for a range of sample sizes behaved consistently with the theorem.
Abstract: It is very important to determine a proper or accurate sample size in any field of research. Sometimes researchers cannot decide how many individuals or objects to select for their study. Here, a set of survey data is also used to verify the central limit theorem (CLT) for different sample sizes. From the data of 1348 students, the average weight of our population of BRAC University students is 62.62 kg with standard deviation 11.79 kg. We observed that our sample means became better estimators of the true population mean, and that the shape of their distribution became more normal, as the sample size increased. We therefore conclude that our simulation results are consistent with the central limit theorem.
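
A simulation in the spirit of this study shows both effects the authors report: the sample mean concentrates at the true population mean, and its spread tracks σ/√n. The original 1348 student weights are not available here, so a skewed synthetic population is used as a stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in population of "weights": skewed, mean ~62, like the survey data
population = 50 + rng.exponential(scale=12.0, size=100_000)

for n in [5, 30, 100]:
    # 10,000 random samples of size n; their means form the sampling distribution
    means = population[rng.integers(0, population.size, size=(10_000, n))].mean(axis=1)
    print(f"n={n:4d}  mean of means={means.mean():6.2f}  "
          f"sd of means={means.std():5.2f}  "
          f"sigma/sqrt(n)={population.std() / np.sqrt(n):5.2f}")
```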

Journal ArticleDOI
19 Apr 2018-Entropy
TL;DR: The proportion, rather than the number, of arrangements with a given amount of edge length provides a means to calculate unbiased relative configurational entropy, obviating the need to compute all possible configurations of a landscape lattice.
Abstract: Entropy and the second law of thermodynamics are fundamental concepts that underlie all natural processes and patterns. Recent research has shown how the entropy of a landscape mosaic can be calculated using the Boltzmann equation, with the entropy of a lattice mosaic equal to the logarithm of the number of ways a lattice with a given dimensionality and number of classes can be arranged to produce the same total amount of edge between cells of different classes. However, that work seemed to also suggest that the feasibility of applying this method to real landscapes was limited due to intractably large numbers of possible arrangements of raster cells in large landscapes. Here I extend that work by showing that: (1) the proportion of arrangements rather than the number with a given amount of edge length provides a means to calculate unbiased relative configurational entropy, obviating the need to compute all possible configurations of a landscape lattice; (2) the edge lengths of randomized landscape mosaics are normally distributed, following the central limit theorem; and (3) given this normal distribution it is possible to fit parametric probability density functions to estimate the expected proportion of randomized configurations that have any given edge length, enabling the calculation of configurational entropy on any landscape regardless of size or number of classes. (4) I evaluate the boundary limits of this normal approximation for small landscapes with a small proportion of a minority class and show that it holds under all realistic landscape conditions. I further (5) demonstrate that this relationship holds for a sample of real landscapes that vary in size, patch richness, and evenness of area in each cover type, and (6) I show that the mean and standard deviation of the normally distributed edge lengths can be predicted nearly perfectly as a function of the size, patch richness and diversity of a landscape. Finally, (7) I show that the configurational entropy of a landscape is highly related to the dimensionality of the landscape, the number of cover classes, the evenness of landscape composition across classes, and landscape heterogeneity. These advances provide a means for researchers to directly estimate the frequency distribution of all possible macrostates of any observed landscape, and then directly calculate the relative configurational entropy of the observed macrostate, and to understand the ecological meaning of different amounts of configurational entropy. These advances enable scientists to take configurational entropy from a concept to an applied tool to measure and compare the disorder of real landscapes with an objective and unbiased measure based on entropy and the second law.
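
A small simulation makes points (2) and (3) concrete: shuffle a lattice with fixed class composition, record the total edge length of each arrangement, and fit a normal to the results. The lattice size, composition, and replicate count below are arbitrary choices, not those of the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def edge_length(grid):
    """Total edge between cells of different classes (4-neighbour adjacency)."""
    return (np.count_nonzero(grid[1:, :] != grid[:-1, :])
            + np.count_nonzero(grid[:, 1:] != grid[:, :-1]))

# 32 x 32 lattice, ~30% minority class; composition held fixed while shuffling
cells = np.zeros(32 * 32, dtype=int)
cells[: int(0.3 * cells.size)] = 1

samples = np.empty(5000)
for k in range(samples.size):
    rng.shuffle(cells)
    samples[k] = edge_length(cells.reshape(32, 32))

mu, sd = samples.mean(), samples.std()
print(f"randomized edge lengths: mean={mu:.1f}, sd={sd:.1f}")  # point (2): ~normal

# Point (3): the fitted normal estimates the proportion of arrangements with a
# given edge length; its log gives a relative configurational entropy. A highly
# clustered arrangement (minority class in one solid block) scores far below mu:
clustered = np.zeros((32, 32), dtype=int)
clustered[:16, :20] = 1            # a contiguous block of ~30% of the cells
e = edge_length(clustered)
print(f"clustered edge length={e}, log-proportion ~ {norm.logpdf(e, mu, sd):.1f}")
```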

Journal ArticleDOI
TL;DR: In this paper, the asymptotic variance of estimators obtained using approximate Bayesian computation in a large-data limit is studied, under the key assumption that the data is summarized by a fixed-dimensional summary statistic that obeys a central limit theorem.
Abstract: Many statistical applications involve models for which it is difficult to evaluate the likelihood, but from which it is relatively easy to sample. Approximate Bayesian computation is a likelihood-free method for implementing Bayesian inference in such cases. We present results on the asymptotic variance of estimators obtained using approximate Bayesian computation in a large-data limit. Our key assumption is that the data is summarized by a fixed-dimensional summary statistic that obeys a central limit theorem. We prove asymptotic normality of the mean of the approximate Bayesian computation posterior. This result also shows that, in terms of asymptotic variance, we should use a summary statistic that is the same dimension as the parameter vector, p; and that any summary statistic of higher dimension can be reduced, through a linear transformation, to dimension p in a way that can only reduce the asymptotic variance of the posterior mean. We look at how the Monte Carlo error of an importance sampling algorithm that samples from the approximate Bayesian computation posterior affects the accuracy of estimators. We give conditions on the importance sampling proposal distribution such that the variance of the estimator will be the same order as that of the maximum likelihood estimator based on the summary statistics used. This suggests an iterative importance sampling algorithm, which we evaluate empirically on a stochastic volatility model.
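
A minimal rejection-ABC sketch for a toy model, with normal data and the sample mean as the fixed-dimensional, CLT-obeying summary. The prior, tolerance, and sizes below are arbitrary, and the summary is simulated directly from its exact sampling distribution for speed (legitimate here, since the mean of n normals is itself normal).

```python
import numpy as np

rng = np.random.default_rng(6)

# Observed data: Normal(theta, 1); summary statistic = sample mean
n_obs = 200
theta_true = 1.5
s_obs = rng.normal(theta_true, 1, size=n_obs).mean()

def abc_rejection(n_draws=200_000, eps=0.02):
    """Basic rejection ABC: keep prior draws whose simulated summary
    lands within eps of the observed one."""
    theta = rng.normal(0, 10, size=n_draws)          # wide Gaussian prior
    s_sim = rng.normal(theta, 1 / np.sqrt(n_obs))    # sampling dist. of the mean
    keep = np.abs(s_sim - s_obs) < eps
    return theta[keep]

post = abc_rejection()
print("ABC posterior mean:", post.mean().round(3), " (true theta =", theta_true, ")")
print("acceptances:", post.size)
```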

Journal ArticleDOI
TL;DR: For sums of independent random variables, this article derived Berry-Esseen-type bounds for the power transport distances in terms of Lyapunov coefficients for identically distributed summands under Cramer's condition.
Abstract: For sums of independent random variables $S_n = X_1 + \cdots + X_n$, Berry–Esseen-type bounds are derived for the power transport distances $W_p$ in terms of Lyapunov coefficients $L_{p+2}$. In the case of identically distributed summands, the rates of convergence are refined under Cramer's condition.
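
For comparison, the classical Berry–Esseen theorem bounds the Kolmogorov distance by the third-order Lyapunov coefficient, with $C$ an absolute constant; the paper's results replace the left-hand side by the transport distances $W_p$ and the coefficient by $L_{p+2}$:

```latex
\sup_{x \in \mathbb{R}} \left| \mathbb{P}\!\left( \frac{S_n - \mathbb{E} S_n}{\sqrt{\mathrm{Var}(S_n)}} \le x \right) - \Phi(x) \right|
\le C \, L_3,
\qquad
L_3 = \frac{\sum_{k=1}^{n} \mathbb{E}\,|X_k - \mathbb{E} X_k|^3}{\mathrm{Var}(S_n)^{3/2}}.
```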

Journal ArticleDOI
TL;DR: In this paper, the authors studied the probability distribution of the total displacement of an N-step run and tumble particle on a line, in the presence of a constant nonzero drive.
Abstract: We study the probability distribution $P(X_N=X,N)$ of the total displacement $X_N$ of an $N$-step run and tumble particle on a line, in the presence of a constant nonzero drive $E$. While the central limit theorem predicts a standard Gaussian form for $P(X,N)$ near its peak, we show that for large positive and negative $X$, the distribution exhibits anomalous large deviation forms. For large positive $X$, the associated rate function is nonanalytic at a critical value of the scaled distance from the peak, where its first derivative is discontinuous. This signals a first-order dynamical phase transition from a homogeneous `fluid' phase to a `condensed' phase that is dominated by a single large run. A similar first-order transition occurs for negative large fluctuations as well. Numerical simulations are in excellent agreement with our analytical predictions.

Journal ArticleDOI
TL;DR: In this article, a functional central limit theorem for stationary Hawkes processes in the asymptotic regime where the baseline intensity is large is proved. And the authors use the resulting approximation to study an infinite-server queue with high-volume Hawkes traffic, for which they compute explicitly the covariance function and the steady-state distribution.
Abstract: A univariate Hawkes process is a simple point process that is self-exciting and has a clustering effect. The intensity of this point process is given by the sum of a baseline intensity and another term that depends on the entire past history of the point process. Hawkes processes have wide applications in finance, neuroscience, social networks, criminology, seismology, and many other fields. In this paper, we prove a functional central limit theorem for stationary Hawkes processes in the asymptotic regime where the baseline intensity is large. The limit is a non-Markovian Gaussian process with dependent increments. We use the resulting approximation to study an infinite-server queue with high-volume Hawkes traffic. We show that the queue length process can be approximated by a Gaussian process, for which we compute explicitly the covariance function and the steady-state distribution. We also extend our results to multivariate stationary Hawkes processes and establish limit theorems for infinite-server queues with multivariate Hawkes traffic.
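
The large-baseline regime is easy to explore by simulation. Below is a sketch using Ogata-style thinning with an exponential kernel α·e^(−βt); the parameter values are arbitrary, stationarity needs α/β < 1, and the stationary rate is μ/(1 − α/β).

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_hawkes(mu, alpha, beta, T):
    """Univariate Hawkes process with baseline mu and exponential kernel
    alpha * exp(-beta * t), simulated by thinning. Needs alpha / beta < 1."""
    t, g, events = 0.0, 0.0, []        # g = excitation part of the intensity
    while True:
        lam_bar = mu + g               # valid upper bound: excitation only decays
        w = rng.exponential(1 / lam_bar)
        t += w
        if t > T:
            break
        g *= np.exp(-beta * w)         # kernel decay since the last candidate
        if rng.random() <= (mu + g) / lam_bar:
            events.append(t)
            g += alpha                 # self-excitation jump at an accepted event
    return np.array(events)

# Large baseline intensity: the regime of the paper's functional CLT
ev = simulate_hawkes(mu=100.0, alpha=1.0, beta=2.0, T=10.0)
print("events:", ev.size, " empirical rate:", ev.size / 10.0,
      " stationary rate mu/(1-alpha/beta):", 100.0 / (1 - 0.5))
```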

Journal ArticleDOI
TL;DR: In this paper, a class of multivariate spectral variance estimators for the asymptotic covariance matrix in the Markov chain central limit theorem and conditions for strong consistency are provided.
Abstract: Markov chain Monte Carlo (MCMC) algorithms are used to estimate features of interest of a distribution. The Monte Carlo error in estimation has an asymptotic normal distribution whose multivariate nature has so far been ignored in the MCMC community. We present a class of multivariate spectral variance estimators for the asymptotic covariance matrix in the Markov chain central limit theorem and provide conditions for strong consistency. We examine the finite sample properties of the multivariate spectral variance estimators and their eigenvalues in the context of a vector autoregressive process of order 1.
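
A minimal version of such an estimator, with a Bartlett lag window and a √n truncation point (both common but here arbitrary choices; the paper treats a general class of windows and gives the consistency conditions), tested on a VAR(1) chain like the one in the paper's finite-sample study:

```python
import numpy as np

def multivariate_spectral_variance(chain, b=None):
    """Spectral variance estimate of the asymptotic covariance matrix in the
    Markov chain CLT, using a Bartlett (triangular) lag window.
    chain: (n, p) array of MCMC output."""
    chain = np.asarray(chain, dtype=float)
    n, p = chain.shape
    if b is None:
        b = int(n ** 0.5)                 # truncation point (a common choice)
    x = chain - chain.mean(axis=0)
    sigma = x.T @ x / n                   # lag-0 autocovariance
    for k in range(1, b + 1):
        gamma = x[:-k].T @ x[k:] / n      # lag-k autocovariance
        sigma += (1 - k / (b + 1)) * (gamma + gamma.T)   # Bartlett weights
    return sigma

# Demo on a VAR(1) chain; each diagonal entry of the true asymptotic
# covariance is 1 / (1 - phi)^2 ~ 11.1 for these settings
rng = np.random.default_rng(8)
p, n, phi = 3, 50_000, 0.7
y = np.zeros((n, p))
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal(size=p)
print(multivariate_spectral_variance(y).round(2))
```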

Journal ArticleDOI
TL;DR: In this paper, the authors study the fluctuations of certain biorthogonal ensembles for which the underlying family { P, Q } satisfies a finite-term recurrence relation of the form x P (x ) = J P(x ).

Posted Content
TL;DR: In this article, the authors propose a local linear regression adjustment to better capture smoothness in random forests, and prove a central limit theorem valid under regularity conditions on the forest and smoothness constraints, and propose a computationally efficient construction for confidence intervals.
Abstract: Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence for random forests with smooth signals, and provides substantial gains in accuracy on both real and simulated data. We prove a central limit theorem valid under regularity conditions on the forest and smoothness constraints, and propose a computationally efficient construction for confidence intervals. Moving to a causal inference application, we discuss the merits of local regression adjustments for heterogeneous treatment effect estimation, and give an example on a dataset exploring the effect word choice has on attitudes to the social safety net. Last, we include simulation results on real and generated data.
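
The authors' procedure ships in their own software; the sketch below only illustrates the underlying idea with scikit-learn. It reads the forest as an adaptive kernel (leaf co-membership weights) and replaces the weighted average with a ridge-regularized weighted local linear fit; the data, hyperparameters, and ridge penalty are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)

# Smooth linear signal, where plain forests show boundary bias
X = rng.uniform(-1, 1, size=(2000, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=2000)

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20).fit(X, y)

def forest_weights(x0):
    """Adaptive-kernel weights: how often training point i shares a leaf with x0."""
    leaves_train = forest.apply(X)                # (n, n_trees) leaf indices
    leaves_x0 = forest.apply(x0.reshape(1, -1))   # (1, n_trees)
    same = leaves_train == leaves_x0              # leaf co-membership
    return (same / same.sum(axis=0)).mean(axis=1) # average over trees

def local_linear_forest_predict(x0, ridge=1e-4):
    """Weighted local linear regression at x0 using the forest kernel."""
    w = forest_weights(x0)
    Z = np.hstack([np.ones((len(X), 1)), X - x0]) # centred local design
    A = Z.T @ (w[:, None] * Z) + ridge * np.eye(Z.shape[1])
    theta = np.linalg.solve(A, Z.T @ (w * y))
    return theta[0]                               # intercept = fitted value at x0

x0 = np.array([0.9])
print("plain forest :", forest.predict(x0.reshape(1, -1))[0].round(3))
print("local linear :", local_linear_forest_predict(x0).round(3), " truth: 1.8")
```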

Journal ArticleDOI
TL;DR: In this article, the authors studied mesoscopic linear statistics for a class of determinantal point processes which interpolate between Poisson and random matrix statistics, by modifying the spectrum of the correlation kernel of the Gaussian Unitary Ensemble eigenvalue process.
Abstract: We study mesoscopic linear statistics for a class of determinantal point processes which interpolate between Poisson and random matrix statistics. These processes are obtained by modifying the spectrum of the correlation kernel of the Gaussian Unitary Ensemble (GUE) eigenvalue process. An example of such a system comes from considering the distribution of noncolliding Brownian motions in a cylindrical geometry, or a grand canonical ensemble of free fermions in a quadratic well at positive temperature. When the scale of the modification of the spectrum of the correlation kernel, related to the size of the cylinder or the temperature, is different from the scale in the mesoscopic linear statistic, we obtain a central limit theorem (CLT) of either Poisson or GUE type. On the other hand, in the critical regime where the scales are the same, we observe a non-Gaussian process in the limit. Its distribution is characterized by explicit but complicated formulae for the cumulants of smooth linear statistics. These results rely on an asymptotic sine-kernel approximation of the GUE kernel which is valid at all mesoscopic scales, and a generalization of cumulant computations of Soshnikov for the sine process. Analogous determinantal processes on the circle are also considered with similar results.

Posted Content
TL;DR: In this article, the authors introduced a new method for obtaining quantitative convergence rates for the central limit theorem (CLT) in a high dimensional setting and obtained several new bounds for convergence in transportation distance and entropy.
Abstract: We introduce a new method for obtaining quantitative convergence rates for the central limit theorem (CLT) in a high dimensional setting. Using our method, we obtain several new bounds for convergence in transportation distance and entropy, and in particular: (a) We improve the best known bound, obtained by the third named author, for convergence in quadratic Wasserstein transportation distance for bounded random vectors; (b) We derive the first non-asymptotic convergence rate for the entropic CLT in arbitrary dimension, for general log-concave random vectors; (c) We give an improved bound for convergence in transportation distance under a log-concavity assumption and improvements for both metrics under the assumption of strong log-concavity. Our method is based on martingale embeddings and specifically on the Skorokhod embedding constructed by the first named author.