Book

Sequential Methods in Pattern Recognition and Machine Learning

17 Jan 2012
About: The book was published on 2012-01-17 and is currently open access. It has received 248 citations to date and focuses on the topics: Feature (machine learning) & Unsupervised learning.
Citations
Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.
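The boundary-to-Hough-space mapping can be pictured as an R-table of displacement vectors indexed by edge orientation, followed by accumulator voting. A minimal Python sketch of that idea, assuming integer pixel coordinates and a fixed scale and rotation; all function and variable names are illustrative, not from the paper:

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, gradient_angles, reference_pt, n_bins=64):
    """Index displacements (reference point minus boundary point) by edge orientation."""
    r_table = defaultdict(list)
    for (x, y), theta in zip(boundary_pts, gradient_angles):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        r_table[b].append((reference_pt[0] - x, reference_pt[1] - y))
    return r_table

def ght_accumulate(edge_pts, edge_angles, r_table, image_shape, n_bins=64):
    """Each edge point votes for candidate reference-point locations; peaks in the
    accumulator mark likely instances of the template shape."""
    acc = np.zeros(image_shape, dtype=np.int32)
    for (x, y), theta in zip(edge_pts, edge_angles):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        for dx, dy in r_table.get(b, []):
            xc, yc = x + dx, y + dy
            if 0 <= xc < image_shape[1] and 0 <= yc < image_shape[0]:
                acc[yc, xc] += 1
    return acc
```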

4,310 citations

Journal ArticleDOI
01 Jun 1991
TL;DR: The subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed, and the relation between decision trees and neural networks (NN) is also discussed.
Abstract: A survey is presented of current methods for decision tree classifier (DTC) designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, the subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed. The relation between decision trees and neural networks (NN) is also discussed.
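One of the surveyed design issues, feature selection at each internal node, can be illustrated with a generic greedy information-gain split; this is a sketch of the common approach, not a specific method from the survey, and all names are illustrative:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Pick the (feature, threshold) pair with the largest information gain; a full
    decision tree classifier recurses on the two subsets until a stopping rule fires."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (j, t, gain)
    return best
```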

3,176 citations

Journal ArticleDOI
TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
Abstract: A new recursive algorithm of stochastic approximation type with the averaging of trajectories is investigated. Convergence with probability one is proved for a variety of classical optimization and identification problems. It is also demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
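The core idea, slowly decaying step sizes combined with averaging of the trajectory, can be sketched on a toy problem as follows; the objective, step-size schedule, and constants are illustrative, not those analyzed in the paper:

```python
import numpy as np

def sgd_with_averaging(grad, x0, n_steps=10_000, c=1.0, alpha=0.6, rng=None):
    """Iterate x_{k+1} = x_k - a_k * grad(x_k); return the running average of the trajectory."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    x_bar = np.zeros_like(x)
    for k in range(1, n_steps + 1):
        a_k = c / k**alpha           # step size decaying slower than 1/k
        x = x - a_k * grad(x, rng)
        x_bar += (x - x_bar) / k     # average of x_1, ..., x_k
    return x_bar

# Toy usage: noisy gradient of f(x) = 0.5 * ||x||^2, whose minimizer is the origin.
noisy_grad = lambda x, rng: x + rng.normal(scale=0.5, size=x.shape)
print(sgd_with_averaging(noisy_grad, x0=np.ones(3)))   # close to [0, 0, 0]
```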

1,970 citations

Journal ArticleDOI
TL;DR: A class of weight-setting methods for lazy learning algorithms which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings.
Abstract: Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
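The feature-weighted distance that these variants parameterize can be sketched in a few lines; the weights below are fixed by hand purely for illustration, whereas the surveyed methods set them automatically (e.g., from performance feedback):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, weights, k=3):
    """Classify x_query by majority vote over the k nearest training points
    under a feature-weighted Euclidean distance."""
    diffs = X_train - x_query
    dists = np.sqrt(((weights * diffs) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: the second feature is noise, so it is given a small weight.
X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]])
y = np.array([0, 0, 1, 1])
print(weighted_knn_predict(X, y, np.array([0.9, 0.0]), weights=np.array([1.0, 0.1])))
```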

762 citations

Journal ArticleDOI
TL;DR: It may be argued, rather persuasively, that most of the concepts encountered in various domains of human knowledge are, in reality, much too complex to admit of simple or precise definition.
Abstract: It may be argued, rather persuasively, that most of the concepts encountered in various domains of human knowledge are, in reality, much too complex to admit of simple or precise definition. This is true, for example, of the concepts of recession and utility in economics; schizophrenia and arthritis in medicine; stability and adaptivity in system theory; sparseness and stiffness in numerical analysis; grammaticality and meaning in linguistics; performance measurement and correctness in computer science; truth and causality in philosophy; intelligence and creativity in psychology; and obscenity and insanity in law.

655 citations

References
Journal ArticleDOI
TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
Abstract: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error $R$ of such a rule must be at least as great as the Bayes probability of error $R^{\ast}$, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the $M$-category case that $R^{\ast} \leq R \leq R^{\ast}(2 - MR^{\ast}/(M-1))$, where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
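For the common two-class case the bound specializes neatly, which is worth writing out (the numerical example is mine, not from the paper):

```latex
R^{\ast} \;\leq\; R \;\leq\; R^{\ast}\!\left(2 - \frac{M R^{\ast}}{M-1}\right)
\;\;\xrightarrow{\;M = 2\;}\;\;
R^{\ast} \;\leq\; R \;\leq\; 2R^{\ast}(1 - R^{\ast}),
\qquad \text{e.g. } R^{\ast} = 0.1 \;\Rightarrow\; R \leq 0.18 .
```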

12,243 citations

01 Apr 1965
TL;DR: ISODATA, a novel method of data analysis and pattern classification, is described in verbal and pictorial terms, in terms of a two-dimensional example, and by giving the mathematical calculations that the method uses.
Abstract: ISODATA, a novel method of data analysis and pattern classification, is described in verbal and pictorial terms, in terms of a two-dimensional example, and by giving the mathematical calculations that the method uses. The technique clusters many-variable data around points in the data's original high-dimensional space and by doing so provides a useful description of the data. A brief summary of results from analyzing alphanumeric, gaussian, sociological and meteorological data is given. In the appendix, generalizations of the existing technique to clustering around lines and planes are discussed and a tentative algorithm for clustering around lines is given.
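A heavily simplified sketch of the clustering idea, nearest-center assignment plus split/merge heuristics, is given below; the thresholds, function names, and stopping rule are illustrative and omit most of ISODATA's actual parameters:

```python
import numpy as np

def isodata_sketch(X, k_init=2, n_iter=10, split_std=1.5, merge_dist=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    centers = list(X[rng.choice(len(X), k_init, replace=False)])
    for _ in range(n_iter):
        # 1. Assign each point to its nearest cluster center.
        C = np.array(centers)
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        # 2. Recompute centers, splitting clusters that are too spread out along some axis.
        new_centers = []
        for c in range(len(centers)):
            pts = X[labels == c]
            if len(pts) == 0:
                continue
            mu, sigma = pts.mean(0), pts.std(0)
            if sigma.max() > split_std:
                offset = np.zeros_like(mu)
                offset[sigma.argmax()] = sigma.max() / 2
                new_centers += [mu + offset, mu - offset]
            else:
                new_centers.append(mu)
        # 3. Merge pairs of centers that are closer than merge_dist.
        merged, used = [], set()
        for i in range(len(new_centers)):
            if i in used:
                continue
            for j in range(i + 1, len(new_centers)):
                if j not in used and np.linalg.norm(new_centers[i] - new_centers[j]) < merge_dist:
                    new_centers[i] = (new_centers[i] + new_centers[j]) / 2
                    used.add(j)
            merged.append(new_centers[i])
        centers = merged
    return np.array(centers)
```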

1,080 citations

Journal ArticleDOI
TL;DR: In this formidable book, Rosenblatt offers a somewhat different program involving the design and testing of brain models described as perceptrons, concerned not with devices for artificial intelligence but rather with "the physical structures and neurodynamics principles which underlie 'natural intelligence.'"
Abstract: In recent years, there have been a number of engineering projects concerned with the design of brain models for pattern recognition and artificial intelligence. The basic assumption underlying these projects is that the brain operates by built-in algorithmic methods similar to those employed in modern digital computers. Hence, nervous activity can be simulated by these computers. The value of such a program has been challenged by Lashley and others on the grounds that computer-simulated behavior is artificial, that the model is an invention operating on extrabiological principles. In this formidable book, Rosenblatt has offered a somewhat different program involving the design and testing of brain models described as perceptrons. His program is concerned not with devices for artificial intelligence, but, rather, with "the physical structures and neurodynamics principles which underlie 'natural intelligence.'" A perceptron consists of a set of signal-generating units ("neuro-mimes") connected together to form a network.
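The simplest perceptron of this kind reduces to a single threshold unit trained with the classic error-correction rule; a minimal sketch on toy, linearly separable data (the data and learning rate are illustrative, not from the book or the review):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, n_epochs=20):
    """Learn weights w and bias b so that sign(w . x + b) matches labels y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:   # misclassified: nudge the decision boundary
                w += lr * y_i * x_i
                b += lr * y_i
    return w, b

# Toy usage: two separable clusters in the plane.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # reproduces y
```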

538 citations

Journal ArticleDOI
TL;DR: Building on the earlier result that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable, a theorem is proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions, with separate results on finite mixtures of binomial distributions.
Abstract: In general, the class of mixtures of the family of normal distributions or of Gamma (Type III) distributions or binomial distributions is not identifiable (see [3], [4] or Section 2 below for the meaning of this statement). In [4] it was shown that the class of all mixtures of a one-parameter additively-closed family of distributions is identifiable. Here, attention will be confined to finite mixtures and a theorem will be proved yielding the identifiability of all finite mixtures of Gamma (or of normal) distributions. Thus, estimation of the mixing distribution on the basis of observations from the mixture is feasible in these cases. Some separate results on identifiability of finite mixtures of binomial distributions also appear.
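For reference, the identifiability statement alluded to above is the usual one for finite mixtures: equal mixtures force equal mixing distributions, i.e.

```latex
\sum_{i=1}^{n} c_i\, F(x;\theta_i) \;=\; \sum_{j=1}^{m} c_j'\, F(x;\theta_j') \quad \text{for all } x
\;\Longrightarrow\;
n = m \ \text{and}\ \{(c_i,\theta_i)\}_{i=1}^{n} = \{(c_j',\theta_j')\}_{j=1}^{m}.
```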

502 citations

Journal ArticleDOI
TL;DR: In this article, procedures of the Robbins-Monro and Kiefer-Wolfowitz type are considered for which the magnitude of the $n$th step depends on the number of changes in sign in $X_i - X_{i - 1}$ for $i = 2, \cdots, n$.
Abstract: Using a stochastic approximation procedure $\{X_n\}, n = 1, 2, \cdots$, for a value $\theta$, it seems likely that frequent fluctuations in the sign of $(X_n - \theta) - (X_{n - 1} - \theta) = X_n - X_{n - 1}$ indicate that $|X_n - \theta|$ is small, whereas few fluctuations in the sign of $X_n - X_{n - 1}$ indicate that $X_n$ is still far away from $\theta$. In view of this, certain approximation procedures are considered, for which the magnitude of the $n$th step (i.e., $X_{n + 1} - X_n$) depends on the number of changes in sign in $(X_i - X_{i - 1})$ for $i = 2, \cdots, n$. In Theorems 2 and 3, $X_{n + 1} - X_n$ is of the form $b_nZ_n$, where $Z_n$ is a random variable whose conditional expectation, given $X_1, \cdots, X_n$, has the opposite sign of $X_n - \theta$, and $b_n$ is a positive real number. $b_n$ depends in our processes on the changes in sign of $X_i - X_{i - 1}$ $(i \leqq n)$ in such a way that more changes in sign give a smaller $b_n$. Thus the smaller the number of changes in sign before the $n$th step, the larger we make the correction on $X_n$ at the $n$th step. These procedures may accelerate the convergence of $X_n$ to $\theta$, when compared to the usual procedures ([3] and [5]). The result that the considered procedures converge with probability one may be useful for finding optimal procedures. Application to the Robbins-Monro procedure (Theorem 2) seems more interesting than application to the Kiefer-Wolfowitz procedure (Theorem 3).
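The sign-change rule can be sketched on a toy Robbins-Monro-style root search: the gain shrinks only when successive corrections change sign, which is read as evidence that the iterate is oscillating around $\theta$. The constants, names, and the noisy regression function below are illustrative:

```python
import numpy as np

def kesten_style_search(noisy_f, x0=0.0, target=0.0, n_steps=200, a=1.0, rng=None):
    """Seek x with E[noisy_f(x)] = target; the gain b_n shrinks only on sign changes."""
    rng = rng or np.random.default_rng(0)
    x, prev_step, sign_changes = x0, None, 0
    for _ in range(n_steps):
        step = -(noisy_f(x, rng) - target)      # correction whose mean opposes x - theta
        if prev_step is not None and step * prev_step < 0:
            sign_changes += 1                   # only oscillation reduces the gain
        b_n = a / (1 + sign_changes)
        x += b_n * step
        prev_step = step
    return x

# Toy usage: regression function 2(x - 3) plus noise, so the root is theta = 3.
f = lambda x, rng: 2.0 * (x - 3.0) + rng.normal(scale=0.5)
print(kesten_style_search(f))   # close to 3
```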

403 citations