Book

Sequential Methods in Pattern Recognition and Machine Learning

17 Jan 2012
About: The book was published on 2012-01-17 and is currently open access. It has received 248 citations to date and focuses on the topics: Feature (machine learning) & Unsupervised learning.
Citations
Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.
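The boundary-to-Hough-space mapping can be pictured as an R-table of displacement vectors indexed by edge orientation, followed by accumulator voting. A minimal Python sketch of that idea, assuming integer pixel coordinates and a fixed scale and rotation; all function and variable names are illustrative, not from the paper:

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, gradient_angles, reference_pt, n_bins=64):
    """Index displacements (reference point minus boundary point) by edge orientation."""
    r_table = defaultdict(list)
    for (x, y), theta in zip(boundary_pts, gradient_angles):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        r_table[b].append((reference_pt[0] - x, reference_pt[1] - y))
    return r_table

def ght_accumulate(edge_pts, edge_angles, r_table, image_shape, n_bins=64):
    """Each edge point votes for candidate reference-point locations; peaks in the
    accumulator mark likely instances of the template shape."""
    acc = np.zeros(image_shape, dtype=np.int32)
    for (x, y), theta in zip(edge_pts, edge_angles):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        for dx, dy in r_table.get(b, []):
            xc, yc = x + dx, y + dy
            if 0 <= xc < image_shape[1] and 0 <= yc < image_shape[0]:
                acc[yc, xc] += 1
    return acc
```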

4,310 citations

Journal ArticleDOI
01 Jun 1991
TL;DR: The subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed, and the relation between decision trees and neural networks (NN) is also discussed.
Abstract: A survey is presented of current methods for decision tree classifier (DTC) designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, the subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed. The relation between decision trees and neural networks (NN) is also discussed.
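One of the surveyed design issues, feature selection at each internal node, can be illustrated with a generic greedy information-gain split; this is a sketch of the common approach, not a specific method from the survey, and all names are illustrative:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Pick the (feature, threshold) pair with the largest information gain; a full
    decision tree classifier recurses on the two subsets until a stopping rule fires."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```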

3,176 citations

Journal ArticleDOI
TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
Abstract: A new recursive algorithm of stochastic approximation type with the averaging of trajectories is investigated. Convergence with probability one is proved for a variety of classical optimization and identification problems. It is also demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
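The core idea, slowly decaying step sizes combined with averaging of the trajectory, can be sketched on a toy problem as follows; the objective, step-size schedule, and constants are illustrative, not those analyzed in the paper:

```python
import numpy as np

def sgd_with_averaging(grad, x0, n_steps=10_000, c=1.0, alpha=0.6, rng=None):
    """Iterate x_{k+1} = x_k - a_k * grad(x_k); return the running average of the trajectory."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    x_bar = np.zeros_like(x)
    for k in range(1, n_steps + 1):
        a_k = c / k**alpha           # step size decaying slower than 1/k
        x = x - a_k * grad(x, rng)
        x_bar += (x - x_bar) / k     # average of x_1, ..., x_k
    return x_bar

# Toy usage: noisy gradient of f(x) = 0.5 * ||x||^2, whose minimizer is the origin.
noisy_grad = lambda x, rng: x + rng.normal(scale=0.5, size=x.shape)
print(sgd_with_averaging(noisy_grad, x0=np.ones(3)))   # close to [0, 0, 0]
```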

1,970 citations

Journal ArticleDOI
TL;DR: A class of weight-setting methods for lazy learning algorithms which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings.
Abstract: Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
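The feature-weighted distance that these variants parameterize can be sketched in a few lines; the weights below are fixed by hand purely for illustration, whereas the surveyed methods set them automatically (e.g., from performance feedback):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, weights, k=3):
    """Classify x_query by majority vote over the k nearest training points
    under a feature-weighted Euclidean distance."""
    diffs = X_train - x_query
    dists = np.sqrt(((weights * diffs) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: the second feature is noise, so it is given a small weight.
X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]])
y = np.array([0, 0, 1, 1])
print(weighted_knn_predict(X, y, np.array([0.9, 0.0]), weights=np.array([1.0, 0.1])))
```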

762 citations

Journal ArticleDOI
TL;DR: It may be argued, rather persuasively, that most of the concepts encountered in various domains of human knowledge are, in reality, much too complex to admit of simple or precise definition.
Abstract: It may be argued, rather persuasively, that most of the concepts encountered in various domains of human knowledge are, in reality, much too complex to admit of simple or precise definition. This is true, for example, of the concepts of recession and utility in economics; schizophrenia and arthritis in medicine; stability and adaptivity in system theory; sparseness and stiffness in numerical analysis; grammaticality and meaning in linguistics; performance measurement and correctness in computer science; truth and causality in philosophy; intelligence and creativity in psychology; and obscenity and insanity in law.

655 citations

References
Journal ArticleDOI
TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
Abstract: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error $R$ of such a rule must be at least as great as the Bayes probability of error $R^{\ast}$, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the $M$-category case that $R^{\ast} \leq R \leq R^{\ast}(2 - MR^{\ast}/(M-1))$, where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
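For the common two-class case the bound specializes neatly, which is worth writing out (the numerical example is mine, not from the paper):

```latex
R^{\ast} \;\leq\; R \;\leq\; R^{\ast}\!\left(2 - \frac{M R^{\ast}}{M-1}\right)
\;\;\xrightarrow{\;M = 2\;}\;\;
R^{\ast} \;\leq\; R \;\leq\; 2R^{\ast}(1 - R^{\ast}),
\qquad \text{e.g. } R^{\ast} = 0.1 \;\Rightarrow\; R \leq 0.18 .
```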

12,243 citations

01 Apr 1965
TL;DR: ISODATA, a novel method of data analysis and pattern classification, is described in verbal and pictorial terms, in terms of a two-dimensional example, and by giving the mathematical calculations that the method uses.
Abstract: ISODATA, a novel method of data analysis and pattern classification, is described in verbal and pictorial terms, in terms of a two-dimensional example, and by giving the mathematical calculations that the method uses. The technique clusters many-variable data around points in the data's original high-dimensional space and by doing so provides a useful description of the data. A brief summary of results from analyzing alphanumeric, gaussian, sociological and meteorological data is given. In the appendix, generalizations of the existing technique to clustering around lines and planes are discussed and a tentative algorithm for clustering around lines is given.
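A heavily simplified sketch of the clustering idea, nearest-center assignment plus split/merge heuristics, is given below; the thresholds, function names, and stopping rule are illustrative and omit most of ISODATA's actual parameters:

```python
import numpy as np

def isodata_sketch(X, k_init=2, n_iter=10, split_std=1.5, merge_dist=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    centers = list(X[rng.choice(len(X), k_init, replace=False)])
    for _ in range(n_iter):
        # 1. Assign each point to its nearest cluster center.
        C = np.array(centers)
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        # 2. Recompute centers, splitting clusters that are too spread out along some axis.
        new_centers = []
        for c in range(len(centers)):
            pts = X[labels == c]
            if len(pts) == 0:
                continue
            mu, sigma = pts.mean(0), pts.std(0)
            if sigma.max() > split_std:
                offset = np.zeros_like(mu)
                offset[sigma.argmax()] = sigma.max() / 2
                new_centers += [mu + offset, mu - offset]
            else:
                new_centers.append(mu)
        # 3. Merge pairs of centers that are closer than merge_dist.
        merged, used = [], set()
        for i in range(len(new_centers)):
            if i in used:
                continue
            for j in range(i + 1, len(new_centers)):
                if j not in used and np.linalg.norm(new_centers[i] - new_centers[j]) < merge_dist:
                    new_centers[i] = (new_centers[i] + new_centers[j]) / 2
                    used.add(j)
            merged.append(new_centers[i])
        centers = merged
    return np.array(centers)
```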

1,080 citations

Journal ArticleDOI
TL;DR: In this formidable book, Rosenblatt offers a somewhat different program involving the design and testing of brain models described as perceptrons, concerned not with devices for artificial intelligence but rather with "the physical structures and neurodynamics principles which underlie 'natural intelligence.'"
Abstract: In recent years, there have been a number of engineering projects concerned with the design of brain models for pattern recognition and artificial intelligence. The basic assumption underlying these projects is that the brain operates by built-in algorithmic methods similar to those employed in modern digital computers. Hence, nervous activity can be simulated by these computers. The value of such a program has been challenged by Lashley and others on the grounds that computer-simulated behavior is artificial, that the model is an invention operating on extrabiological principles. In this formidable book, Rosenblatt has offered a somewhat different program involving the design and testing of brain models described as perceptrons. His program is concerned not with devices for artificial intelligence, but, rather, with "the physical structures and neurodynamics principles which underlie 'natural intelligence.'" A perceptron consists of a set of signal-generating units ("neuro-mimes") connected together to form a network.
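The simplest perceptron of this kind reduces to a single threshold unit trained with the classic error-correction rule; a minimal sketch on toy, linearly separable data (the data and learning rate are illustrative, not from the book or the review):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, n_epochs=20):
    """Learn weights w and bias b so that sign(w . x + b) matches labels y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:   # misclassified: nudge the decision boundary
                w += lr * y_i * x_i
                b += lr * y_i
    return w, b

# Toy usage: two separable clusters in the plane.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # reproduces y
```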

538 citations

Journal ArticleDOI
TL;DR: Building on the earlier result that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable, a theorem is proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions, with separate results on finite mixtures of binomial distributions.
Abstract: In general, the class of mixtures of the family of normal distributions or of Gamma (Type III) distributions or binomial distributions is not identifiable (see [3], [4] or Section 2 below for the meaning of this statement). In [4] it was shown that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable. Here, attention will be confined to finite mixtures and a theorem will be proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions. Thus, estimation of the mixing distribution on the basis of observations from the mixture is feasible in these cases. Some separate results on identifiability of finite mixtures of binomial distributions also appear.
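For reference, the identifiability statement alluded to above is the usual one for finite mixtures: equal mixtures force equal mixing distributions, i.e.

```latex
\sum_{i=1}^{n} c_i\, F(x;\theta_i) \;=\; \sum_{j=1}^{m} c_j'\, F(x;\theta_j') \quad \text{for all } x
\;\Longrightarrow\;
n = m \ \text{and}\ \{(c_i,\theta_i)\}_{i=1}^{n} = \{(c_j',\theta_j')\}_{j=1}^{m}.
```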

502 citations

Journal ArticleDOI
TL;DR: In this article, procedures of the Robbins-Monro and Kiefer-Wolfowitz type are considered for which the magnitude of the $n$th step depends on the number of changes in sign in $X_i - X_{i - 1}$ for $i = 2, \cdots, n$.
Abstract: Using a stochastic approximation procedure $\{X_n\}, n = 1, 2, \cdots$, for a value $\theta$, it seems likely that frequent fluctuations in the sign of $(X_n - \theta) - (X_{n - 1} - \theta) = X_n - X_{n - 1}$ indicate that $|X_n - \theta|$ is small, whereas few fluctuations in the sign of $X_n - X_{n - 1}$ indicate that $X_n$ is still far away from $\theta$. In view of this, certain approximation procedures are considered, for which the magnitude of the $n$th step (i.e., $X_{n + 1} - X_n$) depends on the number of changes in sign in $(X_i - X_{i - 1})$ for $i = 2, \cdots, n$. In Theorems 2 and 3, $X_{n + 1} - X_n$ is of the form $b_nZ_n$, where $Z_n$ is a random variable whose conditional expectation, given $X_1, \cdots, X_n$, has the opposite sign of $X_n - \theta$, and $b_n$ is a positive real number. $b_n$ depends in our processes on the changes in sign of $X_i - X_{i - 1}$ $(i \leqq n)$ in such a way that more changes in sign give a smaller $b_n$. Thus the smaller the number of changes in sign before the $n$th step, the larger we make the correction on $X_n$ at the $n$th step. These procedures may accelerate the convergence of $X_n$ to $\theta$, when compared to the usual procedures ([3] and [5]). The result that the considered procedures converge with probability one may be useful for finding optimal procedures. Application to the Robbins-Monro procedure (Theorem 2) seems more interesting than application to the Kiefer-Wolfowitz procedure (Theorem 3).
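The sign-change rule can be sketched on a toy Robbins-Monro-style root search: the gain shrinks only when successive corrections change sign, which is read as evidence that the iterate is oscillating around $\theta$. The constants, names, and the noisy regression function below are illustrative:

```python
import numpy as np

def kesten_style_search(noisy_f, x0=0.0, target=0.0, n_steps=200, a=1.0, rng=None):
    """Seek x with E[noisy_f(x)] = target; the gain b_n shrinks only on sign changes."""
    rng = rng or np.random.default_rng(0)
    x, prev_step, sign_changes = x0, None, 0
    for _ in range(n_steps):
        step = -(noisy_f(x, rng) - target)      # correction whose mean opposes x - theta
        if prev_step is not None and step * prev_step < 0:
            sign_changes += 1                   # only oscillation reduces the gain
        b_n = a / (1 + sign_changes)
        x += b_n * step
        prev_step = step
    return x

# Toy usage: regression function 2(x - 3) plus noise, so the root is theta = 3.
f = lambda x, rng: 2.0 * (x - 3.0) + rng.normal(scale=0.5)
print(kesten_style_search(f))   # close to 3
```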

403 citations