Showing papers on "Sampling distribution published in 2023"


Journal ArticleDOI
TL;DR: The Global Likelihood Sampler (GLS) as discussed by the authors uses the GL bootstrap to assess the Monte Carlo error and shows that the empirical cumulative distribution function of the samples uniformly converges to the target distribution under some conditions.
Abstract: Drawing samples from a target distribution is essential for statistical computations when the analytical solution is infeasible. Many existing sampling methods may be easy to fall into the local mode or strongly depend on the proposal distribution when the target distribution is complicated. In this article, the Global Likelihood Sampler (GLS) is proposed to tackle these problems and the GL bootstrap is used to assess the Monte Carlo error. GLS takes the advantage of the randomly shifted low-discrepancy point set to sufficiently explore the structure of the target distribution. It is efficient for multimodal and high-dimensional distributions and easy to implement. It is shown that the empirical cumulative distribution function of the samples uniformly converges to the target distribution under some conditions. The convergence for the approximate sampling distribution of the sample mean based on the GL bootstrap is also obtained. Moreover, numerical experiments and a real application are conducted to show the effectiveness, robustness, and speediness of GLS compared with some common methods. It illustrates that GLS can be a competitive alternative to existing sampling methods. Supplementary materials for this article are available online.
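The core idea described in the abstract, evaluating the target on a randomly shifted low-discrepancy point set and then resampling points in proportion to their likelihood, can be illustrated with a minimal Python sketch. This is not the authors' exact GLS procedure; the bimodal target, the bounds, and the resampling step are illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc  # Sobol low-discrepancy sequences

rng = np.random.default_rng(0)

def log_target(x):
    # Illustrative bimodal target: mixture of two Gaussians in 2-D.
    d1 = -0.5 * np.sum((x - 2.0) ** 2, axis=1)
    d2 = -0.5 * np.sum((x + 2.0) ** 2, axis=1)
    return np.logaddexp(d1, d2)

# Randomly shifted Sobol points scaled to [-6, 6]^2.
sobol = qmc.Sobol(d=2, scramble=False, seed=0)
u = sobol.random(2 ** 12)
u = (u + rng.uniform(size=2)) % 1.0            # random shift, modulo 1
grid = qmc.scale(u, l_bounds=[-6, -6], u_bounds=[6, 6])

# Resample grid points with probability proportional to the target density.
logw = log_target(grid)
w = np.exp(logw - logw.max())
w /= w.sum()
idx = rng.choice(len(grid), size=1000, p=w, replace=True)
samples = grid[idx]
print(samples.mean(axis=0))   # near zero for this symmetric mixture
```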

3 citations


Journal ArticleDOI
04 Jul 2023-Minerals
TL;DR: In this paper, the authors extend the theory of sampling for the mineral industries, first integrated by Gy, who brought together means of estimating the sampling variance due to both the particulate nature of a broken ore and the time variation of the composition of process streams.
Abstract: The concepts currently applied to the sampling of broken minerals and particulate materials were recognised over 100 years ago. The first integrated theory of sampling for the mineral industries was proposed by Gy, who brought together means of estimation of the variance of sampling due to both the particulate nature of a broken ore and the time variation of composition of process streams. However, his theory deals only with sampling variance and cannot determine the full sampling distribution. The full distribution of the composition of potential samples arising from their particulate nature is developed in this paper based solely on the assumption that particle numbers in correct samples follow independent Poisson distributions. The enabling mathematical method is the use of the characteristic function of the sampling distribution. The practical means of calculating the full distribution completes the theory of sampling of particulate materials. The method is applied to a sample of gold ore to calculate the full sampling distribution and effectively clarifies several difficulties when sampling this material. The method has general applicability to all particulate materials, including foodstuffs.
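The enabling step, working with the characteristic function of a sum of independent Poisson particle counts and inverting it numerically, can be sketched as follows. This is a simplified, hedged illustration (two hypothetical particle classes, integer mass lattice), not the paper's full treatment of assay distributions.

```python
import numpy as np

# Hypothetical particle classes: gold-bearing grains vs barren gangue.
lam = np.array([3.0, 5000.0])      # expected particle counts in a correct sample
metal = np.array([50.0, 0.0])      # metal content per particle (micrograms)

# Work on an integer lattice of total metal mass (1 microgram resolution).
K = 2048                            # lattice size; must exceed plausible totals
k = np.arange(K)
t = 2.0 * np.pi * k / K             # DFT frequencies

# Characteristic function of a sum of independent scaled Poisson counts:
#   phi(t) = exp( sum_j lam_j * (exp(i * t * m_j) - 1) )
phi = np.exp(np.sum(lam[:, None] * (np.exp(1j * np.outer(metal, t)) - 1.0), axis=0))

# Invert on the lattice to recover the full sampling distribution of metal mass.
pmf = np.real(np.fft.fft(phi)) / K
pmf = np.clip(pmf, 0.0, None)
pmf /= pmf.sum()

print("mean metal mass (ug):", np.sum(k * pmf))      # ~ 3 * 50 = 150
print("P(no gold particle in sample):", pmf[0])      # ~ exp(-3) ~ 0.05
```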

1 citation


Book ChapterDOI
14 Feb 2023
TL;DR: In this paper, the authors focus on making inferences regarding a population from a sample, distinguishing between point estimates and interval estimates, highlighting the importance of sampling error, and introducing the concept of a sampling distribution and the notion of standard error.
Abstract: This chapter focuses on making inferences regarding a population from a sample. It begins by distinguishing between point estimates and interval estimates and highlights the importance of sampling error. Next, it introduces the concept of a sampling distribution and the notion of standard error. It then considers several issues associated with the construction and interpretation of confidence intervals. Finally, it provides illustrations involving different population parameters and different sampling distributions.
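The chapter's central quantities, a point estimate, its standard error, and an interval estimate, can be illustrated with a short numeric example. The data here are simulated and purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=100.0, scale=15.0, size=50)   # hypothetical measurements

xbar = sample.mean()                                   # point estimate of the mean
se = sample.std(ddof=1) / np.sqrt(len(sample))         # estimated standard error
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)        # critical value for a 95% CI
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"point estimate: {xbar:.2f}")
print(f"standard error: {se:.2f}")
print(f"95% interval estimate: ({ci[0]:.2f}, {ci[1]:.2f})")
```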

Posted ContentDOI
05 Jun 2023
TL;DR: This paper proposed a principled approach for sampling from language models with gradient-based methods, based on Hamiltonian Monte Carlo, that generates fluent and diverse samples while following control targets significantly better than other methods.
Abstract: Recently, there has been a growing interest in the development of gradient-based sampling algorithms for text generation, especially in the context of controlled generation. However, there exists a lack of theoretically grounded and principled approaches for this task. In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods. We use discrete distributions given by language models to define densities and develop an algorithm based on Hamiltonian Monte Carlo to sample from them. We name our gradient-based technique Structured Voronoi Sampling (SVS). In an experimental setup where the reference distribution is known, we show that the empirical distribution of SVS samples is closer to the reference distribution compared to alternative sampling schemes. Furthermore, in a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.
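The sampler underlying SVS is Hamiltonian Monte Carlo over a density defined from the language model. The sketch below shows a generic HMC transition with a leapfrog integrator on a stand-in differentiable log-density; it is not the authors' Structured Voronoi Sampling, and the Gaussian target is an assumption used only to keep the example self-contained.

```python
import numpy as np

def log_density(x):
    # Stand-in log-density (standard Gaussian); in SVS this would instead come
    # from the language model over a continuous relaxation of token sequences.
    return -0.5 * np.sum(x ** 2)

def grad_log_density(x):
    return -x

def hmc_step(x, step_size=0.1, n_leapfrog=20, rng=np.random.default_rng()):
    """One Hamiltonian Monte Carlo transition using a leapfrog integrator."""
    p = rng.normal(size=x.shape)                       # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_density(x_new)  # initial half step
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new += step_size * grad_log_density(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_density(x_new)  # final half step
    # Metropolis correction keeps the target distribution invariant.
    log_accept = (log_density(x_new) - 0.5 * p_new @ p_new) - (log_density(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

x = np.zeros(8)
chain = []
for _ in range(1000):
    x = hmc_step(x)
    chain.append(x.copy())
print(np.mean(chain, axis=0))   # near zero for the Gaussian stand-in
```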

Journal ArticleDOI
TL;DR: In this paper, an acceptance sampling plan under the neutrosophic statistical interval method (ASP-NSIM) is developed based on the gamma distribution (GD), the Burr type XII distribution (BXIID), and the Birnbaum-Saunders distribution (BSD).
Abstract: This research work appertains to the acceptance sampling plan under the neutrosophic statistical interval method (ASP-NSIM) based on gamma distribution (GD), Burr type XII distribution (BXIID) and the Birnbaum-Saunders distribution (BSD). The plan parameters will be determined using the neutrosophic non-linear optimization problem. We will provide numerous tables for the three distributions using various values of shape parameters and degree of indeterminacy. The efficiency of the proposed ASP-NSIM will be discussed over the existing sampling plan in terms of sample size. The application of the proposed ASP-NSIM will be given with the aid of industrial data.

Journal ArticleDOI
TL;DR: In this paper, it is shown that under the null hypothesis of vanishing Granger causality, the single-regression estimator converges to a generalized χ2 distribution, which is well approximated by a Γ distribution; this holds too for Geweke's spectral causality averaged over a given frequency band.
Abstract: Summary The single-regression Granger–Geweke causality estimator has previously been shown to solve known problems associated with the more conventional likelihood ratio estimator; however, its sampling distribution has remained unknown. We show that, under the null hypothesis of vanishing Granger causality, the single-regression estimator converges to a generalized χ2 distribution, which is well approximated by a Γ distribution. We show that this holds too for Geweke’s spectral causality averaged over a given frequency band, and derive explicit expressions for the generalized χ2 and Γ-approximation parameters in both cases. We present a Neyman–Pearson test based on the single-regression estimators, and discuss how it may be deployed in empirical scenarios. We outline how our analysis may be extended to the conditional case, point-frequency spectral Granger causality and the important case of state-space Granger causality.
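One standard way to obtain a Γ approximation to a generalized χ2 variable (a weighted sum of independent χ2(1) terms) is moment matching on the mean and variance. The sketch below uses that generic recipe with illustrative weights; it is not the paper's explicit parameter expressions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Generalized chi-squared variable: weighted sum of independent chi2(1) terms.
w = np.array([0.9, 0.5, 0.2, 0.05])          # illustrative weights

# Moment-match a Gamma distribution to its mean and variance.
mean = w.sum()
var = 2.0 * np.sum(w ** 2)
shape = mean ** 2 / var
scale = var / mean

# Compare the Gamma approximation with Monte Carlo draws of the exact variable.
draws = (w * rng.chisquare(df=1, size=(100_000, len(w)))).sum(axis=1)
print("MC 95th percentile:   ", np.quantile(draws, 0.95))
print("Gamma 95th percentile:", stats.gamma.ppf(0.95, a=shape, scale=scale))
```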

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a score-matching-based attack method to perform adversarial sample attacks by manipulating the probability distribution of the samples; the method showed good transferability across different datasets and models and provided reasonable explanations from the perspective of mathematical theory and feature space.
Abstract: In recent years, with the rapid development of technology, artificial intelligence (AI) security issues represented by adversarial sample attack have aroused widespread concern in society. Adversarial samples are often generated by surrogate models and then transfer to attack the target model, and most AI models in real-world scenarios belong to black boxes; thus, transferability becomes a key factor to measure the quality of adversarial samples. The traditional method relies on the decision boundary of the classifier and takes the boundary crossing as the only judgment metric without considering the probability distribution of the sample itself, which results in an irregular way of adding perturbations to the adversarial sample, an unclear path of generation, and a lack of transferability and interpretability. In the probabilistic generative model, after learning the probability distribution of the samples, a random term can be added to the sampling to gradually transform the noise into a new independent and identically distributed sample. Inspired by this idea, we believe that by removing the random term, the adversarial sample generation process can be regarded as the static sampling of the probabilistic generative model, which guides the adversarial samples out of the original probability distribution and into the target probability distribution and helps to boost transferability and interpretability. Therefore, we proposed a score-matching-based attack (SMBA) method to perform adversarial sample attacks by manipulating the probability distribution of the samples, which showed good transferability in the face of different datasets and models and provided reasonable explanations from the perspective of mathematical theory and feature space. Compared with the current best methods based on the decision boundary of the classifier, our method increased the attack success rate by 51.36% and 30.54% to the maximum extent in non-targeted and targeted attack scenarios, respectively. In conclusion, our research established a bridge between probabilistic generative models and adversarial samples, provided a new entry angle for the study of adversarial samples, and brought new thinking to AI security.

Journal ArticleDOI
13 Jan 2023-Stat
TL;DR: In this article, the authors proposed a new type of incomplete U-statistic called ICUDO, which requires substantially less computing time than existing methods, and studied the asymptotic distributions of ICUDO to facilitate the corresponding statistical inference.
Abstract: The U-statistic has been an important part of the arsenal of statistical tools. Meanwhile, its computation can easily become expensive. As a remedy, the idea of incomplete U-statistics has been adopted in practice, where only a small fraction of combinations of units are evaluated. Recently, researchers proposed a new type of incomplete U-statistics called ICUDO, which requires substantially less computing time than existing methods. This paper aims to study the asymptotic distributions of ICUDO to facilitate the corresponding statistical inference. This is a non-trivial task due to the restricted randomization in the sampling scheme of ICUDO. The bootstrap approach for the finite sample distribution of ICUDO is also discussed. Lastly, we observe some intrinsic connections between U-statistics and computer experiments in the context of integration approximation. This allows us to generalize some existing theoretical results in the latter topic.
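The contrast between a complete and an incomplete U-statistic can be shown with a small example. The sketch uses a plain random subset of pairs, not the ICUDO design itself; the variance kernel is an illustrative choice.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
x = rng.normal(size=200)

def kernel(a, b):
    # Order-2 kernel whose U-statistic estimates the population variance.
    return 0.5 * (a - b) ** 2

# Complete U-statistic: average over all n-choose-2 pairs (expensive for large n).
pairs = list(combinations(range(len(x)), 2))
u_complete = np.mean([kernel(x[i], x[j]) for i, j in pairs])

# Incomplete U-statistic: average over a small random subset of pairs.
m = 500
subset = rng.choice(len(pairs), size=m, replace=False)
u_incomplete = np.mean([kernel(x[pairs[k][0]], x[pairs[k][1]]) for k in subset])

print(u_complete, u_incomplete, x.var(ddof=1))   # all close to 1
```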

Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel alternative to the CRT using nearest-neighbor sampling, without assuming the exact form of the distribution of X given Z. They showed that the distribution of the generated samples is very close to the true conditional distribution in terms of total variation distance.
Abstract: The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z. The CRT assumes that the conditional distribution of X given Z is known under the null hypothesis and then it is compared to the distribution of the observed samples of the original data. The aim of this paper is to develop a novel alternative to the CRT by using nearest-neighbor sampling without assuming the exact form of the distribution of X given Z. Specifically, we utilize the computationally efficient 1-nearest-neighbor approach to approximate the conditional distribution that encodes the null hypothesis. Then, theoretically, we show that the distribution of the generated samples is very close to the true conditional distribution in terms of total variation distance. Furthermore, we take the classifier-based conditional mutual information estimator as our test statistic. The test statistic, an empirical information-theoretic quantity, is able to capture the conditional-dependence feature well. We show that our proposed test is computationally very fast, while controlling type I and II errors quite well. Finally, we demonstrate the efficiency of our proposed test in both synthetic and real data analyses.
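The resampling idea can be sketched in a few lines: build pseudo-copies of X by borrowing the X value of a near neighbour in Z-space, then compare an observed statistic with its resampling distribution. For illustration the sketch draws from the k nearest neighbours and uses a simple partial-correlation statistic; the paper itself uses 1-nearest-neighbour sampling and a classifier-based conditional mutual information estimator.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)      # X and Y independent given Z (H0 true)

def statistic(x, y, z):
    # Stand-in statistic: absolute partial correlation of X and Y given Z.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return abs(np.corrcoef(rx, ry)[0, 1])

def knn_resample(x, z, k, rng):
    # Replace each X_i by the X of one of Z_i's k nearest neighbours.
    d = np.abs(z[:, None] - z[None, :])
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    picks = nbrs[np.arange(len(z)), rng.integers(0, k, size=len(z))]
    return x[picks]

t_obs = statistic(x, y, z)
null = [statistic(knn_resample(x, z, 5, rng), y, z) for _ in range(200)]
p_value = (1 + sum(t >= t_obs for t in null)) / (1 + len(null))
print("p-value:", p_value)            # large here, since H0 holds
```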

Book ChapterDOI
Warwick Stent
13 Apr 2023
TL;DR: In this article, the central limit theorem is applied to the sampling distribution of the sample mean of a dataset; applying it to this type of dataset allows researchers to make predictions about future datasets as well as to better understand current ones.
Abstract: Sampling from a distribution, as well as how to treat the sampling distribution of the sample mean, is an important topic for understanding statistics. The central limit theorem, when applied to this type of dataset, allows researchers to make predictions about future datasets as well as to better understand current ones. Normally distributed populations are important for correctly treating the data occurring within the dataset.
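A short simulation makes the central limit theorem concrete: even for a skewed population, the sampling distribution of the mean is approximately normal with standard deviation close to σ/√n. The population below is simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
population = rng.exponential(scale=2.0, size=100_000)   # skewed population

n = 40
means = rng.choice(population, size=(10_000, n)).mean(axis=1)   # 10,000 sample means

print("population mean:", population.mean())
print("mean of sample means:", means.mean())                # close to population mean
print("SE predicted by CLT:", population.std() / np.sqrt(n))
print("SD of sample means:", means.std())                   # close to the prediction
```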

Posted ContentDOI
11 May 2023
TL;DR: In this article, a multi-dimensional supercritical branching process with offspring distribution in a parametric family is considered, where each vector coordinate corresponds to the number of offspring of a given type; it is shown that the sampling distribution of the observed sizes converges to a product of identical distributions.
Abstract: Consider a multi-dimensional supercritical branching process with offspring distribution in a parametric family. Here, each vector coordinate corresponds to the number of offspring of a given type. The process is observed under family-size sampling: a random sample is drawn, each individual reporting its vector of brood sizes. In this work, we show that the set in which no siblings are sampled (so that the sample can be considered independent) has probability converging to one under certain conditions on the sampling size. Furthermore, we show that the sampling distribution of the observed sizes converges to the product of identical distributions, hence developing a framework for which the process can be considered iid, and the usual methods for parameter estimation apply. We provide asymptotic distributions for the resulting estimators.
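The practical upshot, that a family-size sample can be treated as approximately iid so that standard estimators apply, can be illustrated with a one-type simplification: a Galton-Watson process with Poisson offspring, where the offspring mean is estimated from sampled brood sizes. The multi-type structure and the sampling-size conditions of the paper are not modelled here.

```python
import numpy as np

rng = np.random.default_rng(6)
lam = 1.6                      # supercritical Poisson offspring mean

# Simulate a single-type Galton-Watson process, recording brood sizes
# of the parents in the last simulated generation.
generations = 10
pop = 20
broods = np.array([])
for g in range(generations):
    offspring = rng.poisson(lam, size=pop)   # brood size of each current parent
    if g == generations - 1:
        broods = offspring
    pop = int(offspring.sum())
    if pop == 0:
        raise RuntimeError("population died out; rerun with another seed")

# Family-size sampling: draw parents at random, each reporting its brood size.
sample = rng.choice(broods, size=min(200, len(broods)), replace=False)
print("estimated offspring mean:", sample.mean())   # close to lam when ~iid
```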

Journal ArticleDOI
TL;DR: In this paper, the authors describe how to use simulation in the R programming language to perform a chi-square test and show the distribution of the most commonly used chi-square statistics found in statistical methods, in both derivation and simulation.
Abstract: Computer simulation has become an important tool in teaching statistics. Teaching with computer simulation enhances understanding of a concept through visual illustration. This paper describes how to use simulation in the R programming language to perform a chi-square test. We show the distribution of the most commonly used chi-square statistics found in statistical methods, in both derivation and simulation. The chi-square statistic is used in statistical methods such as tests of independence, tests of goodness of fit, significance tests, log-likelihood ratio tests, and model selection. The approach of the paper will enhance students' and researchers' ability to understand simulation and sampling distributions. The paper contains an expository discussion of the chi-square statistic, its derivation and distribution, and its derivatives such as the t-distribution and F-distribution. We consider two chi-squares: the empirical chi-square statistic and the theoretical chi-square distribution. The empirical distribution of the chi-square statistic agrees closely with the theoretical chi-square distribution for large numbers of simulations; only near zero does the empirical distribution have lower density than the theoretical one for one degree of freedom. This is because the theoretical chi-square distribution with 1 degree of freedom has infinite density near zero, whereas for any finite number of simulations the empirical distribution has finite density near zero. The chi-square distribution itself tends to a normal distribution as the degrees of freedom become large.
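The paper works in R; a Python analogue of the kind of simulation it describes is sketched below, comparing empirical goodness-of-fit chi-square statistics with the theoretical chi-square distribution. The fair-die setup is an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Empirical chi-square statistics: goodness-of-fit statistic for a fair die.
probs = np.full(6, 1 / 6)
n, reps = 600, 20_000
counts = rng.multinomial(n, probs, size=reps)
expected = n * probs
chi2_stats = ((counts - expected) ** 2 / expected).sum(axis=1)

# Compare with the theoretical chi-square distribution with 5 degrees of freedom.
print("empirical 95th percentile:  ", np.quantile(chi2_stats, 0.95))
print("theoretical 95th percentile:", stats.chi2.ppf(0.95, df=5))
```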

Book ChapterDOI
01 Jan 2023
TL;DR: A second stream of resampling methods was introduced by Quenouille (1949, 1956) and Tukey (1958) under the name jackknife, as discussed by the authors; these methods estimate aspects of the sampling distribution of random quantities such as the sample mean, sample median, correlation coefficient, hypothesis-testing statistics, and pivotal quantities.
Abstract: Resampling in statistics started with the fundamental permutation methods of R. A. Fisher in the 1930s. A second stream of resampling methods was introduced by Quenouille (1949, 1956) and Tukey (1958) under the name jackknife. However, the real explosion of resampling use in statistics started with Brad Efron's elegant and innovative first bootstrap paper, Efron (1979). All three methods estimate aspects of the sampling distribution of random quantities like the sample mean, sample median, correlation coefficient, hypothesis testing statistics, and pivotal quantities. The bootstrap is the most general method and most widely used, but permutation methods have a unique role in producing exact level-α tests, and the jackknife has a solid niche producing variance estimates.
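The bootstrap's typical use, estimating an aspect of a sampling distribution such as the standard error of a statistic, fits in a few lines. The lognormal data and the choice of the median are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.lognormal(mean=0.0, sigma=1.0, size=80)

# Bootstrap estimate of the standard error of the sample median.
boot_medians = np.array([
    np.median(rng.choice(x, size=len(x), replace=True)) for _ in range(5000)
])
print("sample median:", np.median(x))
print("bootstrap standard error:", boot_medians.std(ddof=1))
```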

Posted ContentDOI
09 Apr 2023
TL;DR: In this paper, a 1-nearest-neighbor sampling method was proposed to approximate the conditional distribution that encodes the null hypothesis; theoretically, the distribution of the generated samples is shown to be very close to the true conditional distribution in terms of total variation distance.
Abstract: The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given random variables Z. The CRT assumes that the conditional distribution of X given Z is known under the null hypothesis and then it is compared to the distribution of the observed samples of the original data. The aim of this paper is to develop a novel alternative to the CRT by using nearest-neighbor sampling without assuming the exact form of the distribution of X given Z. Specifically, we utilize the computationally efficient 1-nearest-neighbor approach to approximate the conditional distribution that encodes the null hypothesis. Then, theoretically, we show that the distribution of the generated samples is very close to the true conditional distribution in terms of total variation distance. Furthermore, we take the classifier-based conditional mutual information estimator as our test statistic. The test statistic, an empirical information-theoretic quantity, is able to capture the conditional-dependence feature well. We show that our proposed test is computationally very fast, while controlling type I and II errors quite well. Finally, we demonstrate the efficiency of our proposed test in both synthetic and real data analyses.

Book ChapterDOI
01 Jan 2023
TL;DR: The setup of inferential statistics, as discussed by the authors, is as follows: a population is the set of all individuals or objects that we are interested in but that is too large to study in its entirety.
Abstract: The setup of inferential statistics is as follows. There is a population, which is the set of all individuals or objects that we are interested in but which is too large to study in its entirety. Instead, we obtain a random sample, a subset of the population, and use the information available in the subset to generalize to the population. Usually we specify a model for the probability distribution of the population. This is a probability density function (pdf) for continuous distributions or a probability mass function (pmf) for discrete distributions. Although the form of the distribution can be specified depending on the background information on the population, certain numerical characteristics may be unknown. These unknown but fixed numerical characteristics are associated with the model and are called parameters. A sample statistic is a numerical measure of a sample that can be calculated from the observations. Sample statistics are used to draw inferences about population parameters.

Posted ContentDOI
12 May 2023
TL;DR: Zhang et al. as discussed by the authors proposed a score-matching-based attack (SMBA) method to perform adversarial sample attacks by manipulating the probability distribution of the samples; the method shows good transferability in the face of different datasets and models and gives reasonable explanations from the perspective of mathematical theory and feature space.
Abstract: In recent years, with the rapid development of technology, artificial intelligence (AI) security issues represented by adversarial sample attack have aroused widespread concern in society. Adversarial samples are often generated by surrogate models and then transfer to attack the target model, and most AI models in real-world scenarios belong to black boxes; thus, transferability becomes a key factor to measure the quality of adversarial samples. The traditional method relies on the decision boundary of the classifier and takes the boundary crossing as the only judgment metric without considering the probability distribution of the sample itself, which results in an irregular way of adding perturbations to the adversarial sample, an unclear path of generation, and a lack of transferability and interpretability. In the probabilistic generative model, after learning the probability distribution of the samples, a random term can be added to the sampling to gradually transform the noise into a new independent and identically distributed sample. Inspired by this idea, we believe that by removing the random term, the adversarial sample generation process can be regarded as static sampling of the probabilistic generative model, which guides the adversarial samples out of the original probability distribution and into the target probability distribution, and helps to improve transferability and interpretability. Therefore, we propose a Score Matching-Based Attack (SMBA) method to perform adversarial sample attacks by manipulating the probability distribution of the samples, which shows good transferability in the face of different datasets and models and gives reasonable explanations from the perspective of mathematical theory and feature space. In conclusion, our research establishes a bridge between probabilistic generative models and adversarial samples, provides a new entry angle for the study of adversarial samples, and brings new thinking to AI security.

Journal ArticleDOI
TL;DR: In this article, the authors compare the performance of calibrated and matched estimators with respect to a quasirandomization distribution that is assumed to describe how units in the nonprobability sample are observed, a superpopulation model for analysis variables collected in the nonprobability sample, and the randomization distribution for the probability sample.
Abstract: Abstract Matching a nonprobability sample to a probability sample is one strategy both for selecting the nonprobability units and for weighting them. This approach has been employed in the past to select subsamples of persons from a large panel of volunteers. One method of weighting, introduced here, is to assign a unit in the nonprobability sample the weight from its matched case in the probability sample. The properties of resulting estimators depend on whether the probability sample weights are inverses of selection probabilities or are calibrated. In addition, imperfect matching can cause estimates from the matched sample to be biased so that its weights need to be adjusted, especially when the size of the volunteer panel is small. Calibration weighting combined with matching is one approach to correct bias and reduce variances. We explore the theoretical properties of the matched and matched, calibrated estimators with respect to a quasirandomization distribution that is assumed to describe how units in the nonprobability sample are observed, a superpopulation model for analysis variables collected in the nonprobability sample, and the randomization distribution for the probability sample. Numerical studies using simulated and real data from the 2015 US Behavioral Risk Factor Surveillance Survey are conducted to examine the performance of the alternative estimators.
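The weighting rule introduced in the abstract, giving each nonprobability unit the weight of its matched probability-sample case, can be sketched with a one-covariate nearest-neighbour match. Real applications match on many covariates and add calibration adjustments; the data and weights below are simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)

# Probability sample: covariate values and design (or calibrated) weights.
z_prob = rng.normal(size=300)
w_prob = rng.uniform(50, 150, size=300)

# Nonprobability (volunteer) sample: covariates only, no weights.
z_nonprob = rng.normal(loc=0.3, size=120)

# Match each nonprobability unit to its closest probability-sample unit on z
# and assign it that unit's weight.
nearest = np.abs(z_nonprob[:, None] - z_prob[None, :]).argmin(axis=1)
w_matched = w_prob[nearest]

# Weighted estimate of a mean from the matched nonprobability sample.
y_nonprob = 2.0 + 0.5 * z_nonprob + rng.normal(scale=0.2, size=120)
estimate = np.sum(w_matched * y_nonprob) / np.sum(w_matched)
print("weighted mean from matched nonprobability sample:", estimate)
```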

Journal ArticleDOI
01 Mar 2023
TL;DR: In this paper, the authors proposed a split-based sequential sampling approach based on optimisation that generates more diverse operation scenarios for training ML models than state-of-the-art approaches.
Abstract: Machine learning (ML) for real-time security assessment requires a diverse training database to be accurate for scenarios beyond historical records. Generating diverse operating conditions is highly relevant for the uncertain future of emerging power systems that are completely different to historical power systems. In response, for the first time, this work proposes a novel split-based sequential sampling approach based on optimisation that generates more diverse operation scenarios for training ML models than state-of-the-art approaches. This work also proposes a volume-based coverage metric, the convex hull volume (CHV), to quantify the quality of samplers based on the coverage of generated data. This metric accounts for the distribution of samples across multidimensional space to measure coverage within the physical network limits. Studies on IEEE test cases with 6, 68 and 118 buses demonstrate the efficiency of the approach. Samples generated using the proposed split-based sampling cover 37.5% more volume than random sampling in the IEEE 68-bus system. The proposed CHV metric can assess the quality of generated samples (standard deviation of 0.74) better than a distance-based coverage metric which outputs the same value (standard deviation of <0.001) for very different data distributions in the IEEE 68-bus system. As we demonstrate, the proposed split-based sampling is relevant as a pre-step for training ML models for critical tasks such as security assessment.
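The proposed convex hull volume (CHV) coverage metric can be computed directly from a sample set. The sketch below compares two illustrative samplers in a 3-D feature space; the paper additionally relates coverage to the physical network limits, which is not modelled here.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(10)

# Two candidate sets of operating-condition samples in a 3-D feature space.
clustered_samples = rng.uniform(0.2, 0.8, size=(200, 3))   # bunched in the interior
diverse_samples = rng.uniform(0.0, 1.0, size=(200, 3))     # spread over the full box

def chv(samples):
    """Convex hull volume of a sample set, used as a coverage metric."""
    return ConvexHull(samples).volume

print("CHV, clustered sampler:", chv(clustered_samples))
print("CHV, diverse sampler:  ", chv(diverse_samples))     # larger means better coverage
```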

Posted ContentDOI
28 Apr 2023
TL;DR: The authors employ conditional normalizing flows to learn the full conditional probability distribution from which they sample new events for conditional values drawn from the target distribution to produce the desired, altered distribution, which leads to a statistical precision up to three times greater than using reweighting techniques with identical sample sizes for the source and target distributions.
Abstract: We present an alternative to reweighting techniques for modifying distributions to account for a desired change in an underlying conditional distribution, as is often needed to correct for mis-modelling in a simulated sample. We employ conditional normalizing flows to learn the full conditional probability distribution from which we sample new events for conditional values drawn from the target distribution to produce the desired, altered distribution. In contrast to common reweighting techniques, this procedure is independent of binning choice and does not rely on an estimate of the density ratio between two distributions. In several toy examples we show that normalizing flows outperform reweighting approaches to match the distribution of the target. We demonstrate that the corrected distribution closes well with the ground truth, and a statistical uncertainty on the training dataset can be ascertained with bootstrapping. In our examples, this leads to a statistical precision up to three times greater than using reweighting techniques with identical sample sizes for the source and target distributions. We also explore an application in the context of high energy particle physics.

Book ChapterDOI
23 Mar 2023
TL;DR: In this paper, statistical inference, which infers the properties of the underlying probability distribution from observed data, is introduced, and the t-test and some nonparametric alternatives are covered.
Abstract: From observed data, statistical inference infers the properties of the underlying probability distribution. For hypothesis testing, the t-test and some non-parametric alternatives are covered. Ways to infer confidence intervals and estimate goodness of fit are followed by the F-test (for test of variances) and the Mann-Kendall trend test. Bootstrap sampling and field significance are also covered.
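The tests mentioned in the chapter are readily available in standard libraries. A brief sketch with simulated data: a Welch t-test, a nonparametric alternative (Mann-Whitney U, standing in for the chapter's nonparametric tests), and a bootstrap interval for the mean difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
a = rng.normal(loc=0.0, scale=1.0, size=30)
b = rng.normal(loc=0.5, scale=1.0, size=30)

# Two-sample t-test and a non-parametric alternative.
t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)
u_stat, u_p = stats.mannwhitneyu(a, b)
print(f"Welch t-test:   t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.3f}")

# Bootstrap 95% confidence interval for the difference in means.
diffs = np.array([
    rng.choice(b, size=len(b)).mean() - rng.choice(a, size=len(a)).mean()
    for _ in range(5000)
])
print("bootstrap 95% CI for mean difference:", np.quantile(diffs, [0.025, 0.975]))
```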