Author

Emanuel Parzen

Bio: Emanuel Parzen is an academic researcher from Texas A&M University. The author has contributed to research in the topics Series (mathematics) & Time series. The author has an h-index of 31 and has co-authored 83 publications receiving 16,020 citations. Previous affiliations of Emanuel Parzen include Columbia University & Stanford University.


Papers
Journal ArticleDOI
TL;DR: In this paper, the problems of estimating a probability density function and of determining its mode are discussed. Only estimates which are consistent and asymptotically normal are constructed.
Abstract: Given a sequence of independent identically distributed random variables with a common probability density function, the problems of estimating the probability density function and of determining its mode are discussed. Only estimates which are consistent and asymptotically normal are constructed. (Author)

10,114 citations
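
The estimator in the abstract above is the kernel ("Parzen window") density estimate, fn(x) = (1/(n·h)) Σi K((x − Xi)/h). A minimal sketch under common assumptions: a Gaussian kernel and a rule-of-thumb bandwidth, neither of which is prescribed by the paper.

```python
import numpy as np

def parzen_density(x, samples, bandwidth):
    """Kernel (Parzen window) density estimate with a Gaussian kernel:
    f_hat(x) = (1 / (n * h)) * sum_i K((x - X_i) / h)."""
    samples = np.asarray(samples, dtype=float)
    u = (x - samples) / bandwidth                          # scaled distance to each sample
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel values
    return kernel.sum() / (samples.size * bandwidth)

# Usage: estimate the density on a grid and take its maximizer as a mode estimate
# (illustrative data and bandwidth choice, not from the paper).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)
h = 1.06 * data.std() * data.size ** (-1 / 5)              # rule-of-thumb bandwidth
grid = np.linspace(data.min(), data.max(), 400)
density = np.array([parzen_density(x, data, h) for x in grid])
mode_estimate = grid[np.argmax(density)]
```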

Book
01 Jan 1962

1,432 citations

Book
01 Jan 1960
TL;DR: This book presents probability theory as the study of mathematical models of random phenomena, covering independence and dependence, random variables and their expectations, and sums and sequences of independent random variables.
Abstract: Probability Theory as the Study of Mathematical Models of Random Phenomena. Basic Probability Theory. Independence and Dependence. Numerical-Valued Random Phenomena. Mean and Variance of a Probability Law. Normal, Poisson, and Related Probability Laws. Random Variables. Expectation of a Random Variable. Sums of Independent Random Variables. Sequences of Random Variables. Tables. Answers to Odd-Numbered Exercises. Index.

766 citations

Journal ArticleDOI
TL;DR: An approach to statistical data analysis which is simultaneously parametric and nonparametric is described, and density-quantile functions, autoregressive density estimation, estimation of location and scale parameters by regression analysis of the sample quantile function, and quantile-box plots are introduced.
Abstract: This article attempts to describe an approach to statistical data analysis which is simultaneously parametric and nonparametric. Given a random sample X1, …, Xn of a random variable X, one would like (1) to test the parametric goodness-of-fit hypothesis H0 that the true distribution function F is of the form F(x) = F0[(x − μ)/σ], where F0 is specified, and (2) when H0 is not accepted, to estimate nonparametrically the true density-quantile function fQ(u) and score function J(u) = −(fQ)′(u). The article also introduces density-quantile functions, autoregressive density estimation, estimation of location and scale parameters by regression analysis of the sample quantile function, and quantile-box plots.

719 citations
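
Under H0 the quantile function satisfies Q(u) = μ + σ·Q0(u), so location and scale can be estimated by regressing the sample quantile function on Q0. A rough sketch: the ordinary least-squares fit and the Gaussian default for F0 below are simplifying assumptions for illustration, not necessarily the regression used in the article.

```python
import numpy as np
from scipy import stats

def location_scale_by_quantile_regression(sample, f0_quantile=stats.norm.ppf):
    """Estimate (mu, sigma) under F(x) = F0((x - mu) / sigma) by least-squares
    regression of the sample quantile function on Q0(u) = F0^{-1}(u)."""
    x = np.sort(np.asarray(sample, dtype=float))   # sample quantile function at u_i
    n = x.size
    u = (np.arange(1, n + 1) - 0.5) / n            # plotting positions in (0, 1)
    q0 = f0_quantile(u)                            # reference quantiles Q0(u_i)
    sigma, mu = np.polyfit(q0, x, deg=1)           # fit x ~ mu + sigma * Q0(u)
    return mu, sigma

# Usage (illustrative): recover location and scale of a normal sample.
rng = np.random.default_rng(1)
mu_hat, sigma_hat = location_scale_by_quantile_regression(rng.normal(5.0, 2.0, 1000))
```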

Journal ArticleDOI
TL;DR: In this paper, the spectral analysis of wide sense stationary time series which possess a spectral density function and whose fourth moment functions satisfy an integrability condition (which includes Gaussian processes) is studied.
Abstract: This paper is concerned with the spectral analysis of wide sense stationary time series which possess a spectral density function and whose fourth moment functions satisfy an integrability condition (which includes Gaussian processes). Consistent estimates are obtained for the spectral density function as well as for the spectral distribution function and a general class of spectral averages. Optimum consistent estimates are chosen on the basis of criteria involving the notions of order of consistency and asymptotic variance. The problem of interpolating the estimated spectral density, so that only a finite number of quantities need be computed to determine the entire graph, is also discussed. Both continuous and discrete time series are treated.

358 citations
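
A consistent estimate of the kind discussed can be formed by weighting sample autocovariances with a lag window. The sketch below uses a Bartlett window and an AR(1) test series purely as illustrative choices; the paper treats a general class of such estimates and the optimal choice among them.

```python
import numpy as np

def lag_window_spectral_density(x, max_lag, freqs):
    """Lag-window spectral density estimate for a stationary series:
    f_hat(w) = (1 / 2*pi) * sum_{|k| <= M} w(k / M) * c(k) * cos(w * k),
    with sample autocovariances c(k) and a Bartlett window w(u) = 1 - |u|."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
    weights = 1.0 - np.arange(max_lag + 1) / max_lag       # Bartlett lag window (illustrative)
    lags = np.arange(1, max_lag + 1)
    est = []
    for w in freqs:
        s = acov[0] + 2.0 * np.sum(weights[1:] * acov[1:] * np.cos(w * lags))
        est.append(s / (2.0 * np.pi))
    return np.array(est)

# Usage: estimate the spectrum of an AR(1) series on [0, pi].
rng = np.random.default_rng(2)
n = 2000
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + e[t]
freqs = np.linspace(0.0, np.pi, 200)
f_hat = lag_window_spectral_density(y, max_lag=40, freqs=freqs)
```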


Cited by
01 Jan 1967
TL;DR: The k-means procedure, described in this paper, partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept generalizes the ordinary sample mean, and the resulting partitions are shown to be reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S1, S2, …, Sk} is a partition of EN, and ui, i = 1, 2, …, k, is the conditional mean of p over the set Si, then W2(S) = Σi=1,…,k ∫Si |z − ui|2 dp(z) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations
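
A compact sketch of the partitioning idea, including the within-class variance W2(S) from the abstract. Note this is the batch (Lloyd-style) iteration rather than MacQueen's sequential one-pass update of the means, and the initialization and iteration cap are arbitrary illustrative choices.

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Batch k-means: assign each point to its nearest mean, then recompute
    each mean as the centroid of its assigned points, until assignments settle."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    means = points[rng.choice(points.shape[0], size=k, replace=False)]   # initial means
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                                    # nearest-mean assignment
        new_means = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
            for i in range(k)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    # Within-class variance W2(S): total squared distance of points to their set mean.
    w2 = sum(np.sum((points[labels == i] - means[i]) ** 2) for i in range(k))
    return means, labels, w2

# Usage (illustrative): partition a 2-D sample into 3 sets.
rng = np.random.default_rng(3)
data = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in (0.0, 3.0, 6.0)])
centers, labels, within_class_variance = k_means(data, k=3)
```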

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations
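
As a small illustration of the feed-forward networks the book covers, here is a one-hidden-layer perceptron trained by gradient descent on a toy problem; the architecture, error function, and learning rate are illustrative choices, not taken from the book.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary problem (XOR) and a network with one tanh hidden layer.
rng = np.random.default_rng(4)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)
lr = 0.5

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)            # hidden-layer activations
    p = sigmoid(h @ W2 + b2)            # output probabilities
    # Gradients of the cross-entropy error with respect to each parameter.
    d_out = (p - y) / X.shape[0]
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
```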

Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

10,696 citations
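
A minimal example of SV function estimation (support vector regression) using scikit-learn's SVR; the RBF kernel and the C, epsilon, and gamma values are illustrative choices rather than recommendations from the tutorial.

```python
import numpy as np
from sklearn.svm import SVR

# Fit an epsilon-insensitive SV regression to a noisy 1-D function (illustrative data).
rng = np.random.default_rng(5)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=X.shape[0])

model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=1.0)   # RBF kernel, illustrative parameters
model.fit(X, y)
y_hat = model.predict(X)
n_support = model.support_vectors_.shape[0]                 # points that define the solution
```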

Book ChapterDOI
TL;DR: This paper provides a concise overview of time series analysis in the time and frequency domains, with numerous references for further reading.
Abstract: Any series of observations ordered along a single dimension, such as time, may be thought of as a time series. The emphasis in time series analysis is on studying the dependence among observations at different points in time. What distinguishes time series analysis from general multivariate analysis is precisely the temporal order imposed on the observations. Many economic variables, such as GNP and its components, price indices, sales, and stock returns are observed over time. In addition to being interested in the contemporaneous relationships among such variables, we are often concerned with relationships between their current and past values, that is, relationships over time.

9,919 citations
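
The dependence between current and past values described here is commonly summarized by the sample autocorrelation function; a brief sketch follows (the AR(1) series is an illustrative example, not from the chapter).

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Sample autocorrelations r(k) = c(k) / c(0), summarizing the dependence
    between observations k time steps apart."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    c0 = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * c0) for k in range(max_lag + 1)])

# Usage (illustrative): for an AR(1) series the autocorrelations decay roughly like 0.7**k.
rng = np.random.default_rng(6)
e = rng.normal(size=1000)
z = np.zeros(1000)
for t in range(1, 1000):
    z[t] = 0.7 * z[t - 1] + e[t]
r = sample_autocorrelation(z, max_lag=10)
```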