Author

James B. MacQueen

Bio: James B. MacQueen is an academic researcher from the University of California, Los Angeles. The author has contributed to research in topics: Monotone polygon & Population. The author has an h-index of 9 and has co-authored 17 publications receiving 24,224 citations. Previous affiliations of James B. MacQueen include Saint Petersburg State University.

Papers
01 Jan 1967
TL;DR: The k-means algorithm as described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept generalizes the ordinary sample mean, and the procedure is shown to give partitions which are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S_1, S_2, ..., S_k} is a partition of E^N, and u_i, i = 1, 2, ..., k, is the conditional mean of p over the set S_i, then w²(S) = Σ_{i=1}^{k} ∫_{S_i} |z − u_i|² dp(z) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4.
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special
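The partitioning process described in the abstract can be sketched in a few lines of Python. This is an illustrative batch (Lloyd-style) variant rather than MacQueen's original sequential rule, which updates a mean immediately after each sample point is assigned; the function and parameter names are hypothetical.

```python
import numpy as np

def k_means(points, k, n_iters=20, seed=0):
    """Partition an (n, d) sample into k sets by alternating two steps:
    assign each point to its nearest mean, then recompute each mean as
    the average of the points assigned to it. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # initialize the k means with k distinct sample points
    means = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # squared distance of every point to every current mean
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):
                means[i] = points[labels == i].mean(axis=0)
    return means, labels
```

Each iteration can only lower the empirical within-class variance (the sample analogue of w²(S) above), which is why the resulting partitions tend to be efficient in that sense.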

24,320 citations

Journal ArticleDOI
TL;DR: In this article, it was shown that a random probability measure p* on X has a Ferguson distribution with parameter p if for every finite partition (B_1, ..., B_k) of X, the vector (p*(B_1), ..., p*(B_k)) has a Dirichlet distribution with parameters (p(B_1), ..., p(B_k)) (when p(B_j) = 0, this means p*(B_j) = 0 with probability 1); the result is derived via generalized Pólya urn schemes.
Abstract: Let p be any finite positive measure on (the Borel sets of) a complete separable metric space X. We shall say that a random probability measure p* on X has a Ferguson distribution with parameter p if for every finite partition (B_1, ..., B_k) of X the vector (p*(B_1), ..., p*(B_k)) has a Dirichlet distribution with parameters (p(B_1), ..., p(B_k)) (when p(B_j) = 0, this means p*(B_j) = 0 with probability 1). Ferguson (3) has shown that, for any p, Ferguson p* exist and, when used as prior distributions, yield Bayesian counterparts to well-known classical nonparametric tests. He also shows that p* is a.s. discrete. His approach involves a rather deep study of the gamma process. One of us (1) has given a different and perhaps simpler proof that Ferguson priors concentrate on discrete distributions. In this note we give still a third approach to Ferguson distributions, exploiting their connection with generalized Pólya urn schemes. We shall say that a sequence {X_n, n ≥ 1} of random variables with values in X is a Pólya sequence with parameter p if for every B ⊂ X, (1) P{X_1 ∈ B} = p(B)/p(X) and (2) P{X_{n+1} ∈ B | X_1, ..., X_n} = p_n(B)/p_n(X), where p_n = p + Σ_{i=1}^{n} δ(X_i) and δ(x) denotes the unit measure concentrating at x. Note that, for finite X, the sequence {X_n} represents the results of successive draws from an urn where initially the urn has p(x) balls of color x and, after each draw, the ball drawn is replaced and another ball of its same color is added to the urn. Note also that, without the restriction to finite X, for any (Borel measurable) function φ on X, the sequence {φ(X_n)} is a Pólya sequence with parameter φp, where φp(A) = p{φ ∈ A}. We now describe the connections between Pólya sequences and Ferguson distributions.
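For finite X, the urn scheme in the abstract is easy to simulate. The sketch below (with hypothetical color labels and function names) replaces each drawn ball together with one more of the same color, which is the reinforcement mechanism behind the a.s. discreteness of Ferguson priors:

```python
import random
from collections import Counter

def polya_sequence(initial_counts, n, seed=0):
    """Draw n terms of a Pólya sequence from an urn over a finite X.

    `initial_counts` maps each color x to p(x), the initial number of
    balls of that color; P{X_1 = x} = p(x)/p(X). After each draw the
    ball is replaced and one more ball of its color is added, so the
    conditional law of the next draw is p_n(.)/p_n(X)."""
    rng = random.Random(seed)
    urn = Counter(initial_counts)
    draws = []
    for _ in range(n):
        colors = list(urn)
        x = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[x] += 1  # reinforce the color just drawn
        draws.append(x)
    return draws
```

Colors drawn early tend to dominate later draws, so the empirical distribution of a long run concentrates on a few colors, mirroring the discreteness of the limiting random measure.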

1,469 citations

Journal ArticleDOI
TL;DR: In this article, a modified dynamic programming method is described for the problem of choosing, at the beginning of each period, the action that maximizes expected future total discounted income; convergence appears to be quite rapid.
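The classical successive-approximation scheme that such methods modify can be sketched for a finite discounted Markov decision problem; the problem data and names below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def value_iteration(P, r, beta=0.9, tol=1e-8):
    """Standard value iteration for a finite discounted decision problem.

    P has shape (A, n, n): P[a] is the transition matrix under action a.
    r has shape (A, n): r[a, s] is the per-period income for action a in
    state s. beta is the discount factor. Returns the value vector and
    the maximizing action for each state."""
    n = P.shape[1]
    v = np.zeros(n)
    while True:
        # q[a, s]: income now plus discounted expected future value
        q = r + beta * (P @ v)
        v_new = q.max(axis=0)
        if np.abs(v_new - v).max() < tol:
            return v_new, q.argmax(axis=0)
        v = v_new
```

The sup-norm error contracts by a factor of beta each pass, which is consistent with the rapid convergence the paper reports for its modified scheme.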

147 citations

Journal ArticleDOI
TL;DR: In this article, the problem of maximizing expected net value of an action, i.e., the expected value of the action finally taken minus the expected total cost of searching and testing, is studied.
Abstract: A person searches through a population of possible actions, looking for one with high value. He discovers these actions one after another, paying a certain cost each time. On his first encounter with a possible action, the person obtains some preliminary information about its actual value, and at this point he can take the action, he can continue looking, or, at a certain cost, he can perform a test and obtain some more information about its actual value. If he decides to test, then having obtained the additional information he can again either take the action or continue looking. The problem is to conduct this process in such a manner as to maximize expected net value, that is, the expected value of the action finally taken minus the expected total cost of searching and testing. This problem is analyzed and optimal policies are given in the case where the possible actions are regarded as independent selections from a large population. The joint distribution in this population of the actual value, the pre...
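The search-cost trade-off in the abstract can be illustrated with a deliberately simplified policy that ignores the preliminary signal and the optional test, keeping only the stopping decision; all names here are illustrative, not the paper's optimal policy.

```python
def search_net_value(values, search_cost, accept_at):
    """Follow the simple rule 'keep searching until a value >= accept_at
    appears' and return net value: the value of the action finally taken
    minus the total cost of searching.

    `values` is the stream of action values encountered, one per period;
    each encounter costs `search_cost`."""
    total_cost = 0.0
    for v in values:
        total_cost += search_cost  # pay to discover this action
        if v >= accept_at:
            return v - total_cost  # take the action and stop
    return -total_cost  # stream exhausted without accepting anything
```

Raising `accept_at` increases the value of the action eventually taken but also the expected number of paid searches; the paper's optimal policies balance exactly this kind of trade-off, with the extra option of paying for a test before deciding.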

41 citations


Cited by
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

23,600 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Book
01 Jan 1983
TL;DR: The monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
Abstract: The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

14,825 citations

Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques: each cluster is represented by a so-called silhouette, based on a comparison of its tightness and separation, and the display provides an evaluation of clustering validity.
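The per-point quantity behind the proposed display can be sketched in plain NumPy. This computes s(i) = (b − a) / max(a, b), where a is the mean distance from point i to its own cluster and b the smallest mean distance to another cluster; it is a sketch of the silhouette value only, not the graphical display itself.

```python
import numpy as np

def silhouette_values(points, labels):
    """Silhouette of each point: near 1 for tight, well-separated
    clusters, near 0 on a cluster boundary, negative if likely
    misassigned. Plain-NumPy illustrative sketch."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # full pairwise Euclidean distance matrix
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue  # silhouette of a singleton cluster is taken as 0
        a = d[i, same].sum() / (same.sum() - 1)  # exclude d[i, i] = 0
        b = min(d[i, labels == c].mean()
                for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s
```

Averaging these values over all points gives a single validity score for a partition, which is one common way the silhouette is used to compare clusterings.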

14,144 citations

Journal ArticleDOI
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

14,054 citations