Topic

Probability distribution

About: Probability distribution is a research topic. Over its lifetime, 40,928 publications have been published within this topic, receiving 1,105,809 citations. The topic is also known as: distribution.


Papers
01 Jan 1967
TL;DR: This paper describes the k-means process, which partitions an N-dimensional population into k sets on the basis of a sample. The partitions appear to be reasonably efficient in the sense of within-class variance, and the k-means concept is a generalization of the ordinary sample mean.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, \(S = \{S_1, S_2, \ldots, S_k\}\) is a partition of \(E_N\), and \(u_i\), \(i = 1, 2, \ldots, k\), is the conditional mean of p over the set \(S_i\), then \(W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)\) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4.
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations
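
The within-class variance named in the abstract above can be illustrated with a small sketch. This is not the procedure the paper defines in its section 2.1: it swaps in a simple batch (Lloyd-style) iteration, and it takes p to be the empirical distribution of the sample, so the integral becomes an average of squared distances. The function names, the sample points, the iteration count, and the seed are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def assign(points, centers):
    """Group points by nearest center (ties go to the lowest index)."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def within_class_variance(clusters):
    """Empirical W^2(S): mean squared distance of each point to its cluster mean."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        if c:
            u = mean_point(c)
            total += sum(sq_dist(p, u) for p in c)
    return total / n

def k_means(points, k, iters=20, seed=0):
    """Batch k-means sketch (assumed variant, not the paper's own procedure)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct sample points
    for _ in range(iters):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean_point(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, assign(points, centers)

# Hypothetical two-group sample in the plane.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(sample, k=2)
w2_two_sets = within_class_variance(clusters)
w2_one_set = within_class_variance([sample])
```

Comparing `w2_two_sets` with `w2_one_set` shows how much the partition lowers the within-class variance relative to treating the sample as a single set.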

Journal Article
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
Abstract: We discuss the following problem: given a random sample \(X = (X_1, X_2, \ldots, X_n)\) from an unknown probability distribution F, estimate the sampling distribution of some prespecified random variable R(X, F), on the basis of the observed data x. (Standard jackknife theory gives an approximate mean and variance in the case R(X, F) = \(\theta \left( {\hat F} \right) - \theta \left( F \right)\), θ some parameter of interest.) A general method, called the “bootstrap”, is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.

14,483 citations
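
One of the examples listed in the abstract above, the variance of the sample median, lends itself to a short sketch of the resampling idea. It assumes the usual nonparametric form of the bootstrap: draw samples of size n with replacement from the observed data and recompute the statistic each time. The data values, the number of resamples, the seed, and the function name are hypothetical choices, not taken from the paper.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical observed sample x.
data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9, 3.1, 4.0]

theta_hat = statistics.median(data)              # statistic on the observed data
reps = bootstrap_replicates(data, statistics.median)

# Bootstrap analogue of R(X, F): replicate minus the observed estimate.
errors = [r - theta_hat for r in reps]

# Bootstrap estimate of the variance of the sample median.
var_hat = statistics.pvariance(reps)
```

The list `errors` approximates the sampling distribution of the estimation error, and `var_hat` summarizes its spread. A larger `n_resamples` reduces the Monte Carlo noise in both.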

Book
01 Jun 1969
TL;DR: This book covers uncertainties in measurements, probability distributions, error analysis, Monte Carlo techniques, least-squares fitting to polynomials and arbitrary functions, fitting composite peaks, and direct application of the maximum likelihood.
Abstract: Uncertainties in measurements; probability distributions; error analysis; estimates of means and errors; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood. Appendices: numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal.

12,737 citations

Journal Article
TL;DR: In this paper, the problem of the estimation of a probability density function and of determining the mode of the probability function is discussed. Only estimates which are consistent and asymptotically normal are constructed.
Abstract: Given a sequence of independent identically distributed random variables with a common probability density function, the problems of estimating the probability density function and of determining its mode are discussed. Only estimates which are consistent and asymptotically normal are constructed.

10,114 citations
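
The abstract above does not state the form of the estimator. As an illustration only, the sketch below assumes one standard construction, a kernel density estimate with a Gaussian kernel, and locates the mode by a simple grid search. The bandwidth `h`, the sample values, the grid bounds, and the function names are hypothetical.

```python
import math

def kernel_density(sample, h):
    """Return a Gaussian-kernel density estimate built from `sample` with bandwidth `h`."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps of width h centered at each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density

def grid_mode(density, lo, hi, steps=1000):
    """Approximate the mode as the grid point in [lo, hi] with the largest density."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=density)

# Hypothetical i.i.d. observations.
sample = [1.2, 1.9, 2.1, 2.4, 2.6, 3.0, 3.3, 4.8]
f_hat = kernel_density(sample, h=0.5)
mode_hat = grid_mode(f_hat, lo=0.0, hi=6.0)
```

The bandwidth controls the trade-off between smoothness and fidelity to the sample, so in practice it would be chosen with the sample size in mind rather than fixed as it is here.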

Journal Article
TL;DR: This article discusses the applicability of statistics to a wide field of problems and gives examples of simple and complex distributions.
Abstract: This paper discusses the applicability of statistics to a wide field of problems. Examples of simple and complex distributions are given.

9,091 citations


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Nonlinear system: 208.1K papers, 4M citations, 86% related
Monte Carlo method: 95.9K papers, 2.1M citations, 85% related
Matrix (mathematics): 105.5K papers, 1.9M citations, 85% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    241
2022    517
2021    1,770
2020    2,067
2019    1,987
2018    1,871