
Showing papers on "k-nearest neighbors algorithm" published in 1995


Proceedings ArticleDOI
22 May 1995
TL;DR: This paper presents an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalizes it to finding the k nearest neighbors.
Abstract: A frequently encountered type of query in Geographic Information Systems is to find the k nearest neighbor objects to a given point in space. Processing such queries requires substantially different search algorithms than those for location or range queries. In this paper we present an efficient branch-and-bound R-tree traversal algorithm to find the nearest neighbor object to a point, and then generalize it to finding the k nearest neighbors. We also discuss metrics for an optimistic and a pessimistic search ordering strategy as well as for pruning. Finally, we present the results of several experiments obtained using the implementation of our algorithm and examine the behavior of the metrics and the scalability of the algorithm.
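
The paper's algorithm traverses the R-tree depth-first, ordering and pruning subtrees with MINDIST/MINMAXDIST metrics. The sketch below illustrates the same branch-and-bound idea in a best-first form; the node API (`is_leaf`, `entries` as (object, rectangle) pairs, `children`, `mbr`) is an assumption for illustration, not the paper's data structure.

```python
import heapq

def mindist(point, rect):
    """Squared distance from a query point to an axis-aligned rectangle
    given as (lows, highs); zero if the point lies inside it."""
    d = 0.0
    for p, lo, hi in zip(point, rect[0], rect[1]):
        if p < lo:
            d += (lo - p) ** 2
        elif p > hi:
            d += (p - hi) ** 2
    return d

def knn_search(root, query, k):
    """Branch-and-bound traversal: always expand the node whose bounding
    rectangle is currently closest to the query, and prune any branch
    that cannot beat the k-th best candidate found so far."""
    queue = [(0.0, id(root), root)]              # (mindist, tie-break, node)
    best = []                                    # max-heap of (-dist, tie, object)
    while queue:
        d, _, node = heapq.heappop(queue)
        if len(best) == k and d > -best[0][0]:
            break                                # nothing closer can remain
        if node.is_leaf:
            for obj, rect in node.entries:
                dist = mindist(query, rect)
                entry = (-dist, id(obj), obj)
                if len(best) < k:
                    heapq.heappush(best, entry)
                elif dist < -best[0][0]:
                    heapq.heapreplace(best, entry)
        else:
            for child in node.children:
                heapq.heappush(queue, (mindist(query, child.mbr), id(child), child))
    return sorted((-d, obj) for d, _, obj in best)
```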

1,600 citations


Book ChapterDOI
01 May 1995
TL;DR: In this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of Dempster-Shafer theory to provide a global treatment of such issues as ambiguity and distance rejection, and imperfect knowledge regarding the class membership of training patterns.
Abstract: In this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of Dempster-Shafer theory. Each neighbor of a sample to be classified is considered as an item of evidence that supports certain hypotheses regarding the class membership of that pattern. The degree of support is defined as a function of the distance between the two vectors. The evidence of the k nearest neighbors is then pooled by means of Dempster's rule of combination. This approach provides a global treatment of such issues as ambiguity and distance rejection, and imperfect knowledge regarding the class membership of training patterns. The effectiveness of this classification scheme as compared to the voting and distance-weighted k-NN procedures is demonstrated using several sets of simulated and real-world data.
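
As a rough illustration of the evidence-pooling idea, the sketch below builds a simple support function from each neighbor (some mass on the neighbor's class, decreasing with distance, the remainder on the whole frame) and combines them with Dempster's rule. The parameters `alpha` and `gamma` are illustrative, not the paper's tuned values, and numpy arrays are assumed.

```python
import numpy as np

def neighbor_bpa(label, dist, frame, alpha=0.95, gamma=1.0):
    """Simple support function: part of the unit mass goes to the
    neighbor's class, the rest stays on the whole frame (ignorance)."""
    s = alpha * np.exp(-gamma * dist ** 2)
    return {frozenset([label]): s, frozenset(frame): 1.0 - s}

def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize away the conflicting (empty-intersection) mass."""
    out, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    return {a: v / (1.0 - conflict) for a, v in out.items()}

def evidential_knn(X, y, query, k=5):
    """Pool the evidence of the k nearest neighbors and pick the class
    with the largest combined mass."""
    frame = sorted(set(y))
    dists = np.linalg.norm(np.asarray(X) - np.asarray(query), axis=1)
    m = {frozenset(frame): 1.0}                  # start with total ignorance
    for i in np.argsort(dists)[:k]:
        m = dempster_combine(m, neighbor_bpa(y[i], dists[i], frame))
    return max(frame, key=lambda c: m.get(frozenset([c]), 0.0))
```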

889 citations


Journal ArticleDOI
TL;DR: The notion of a well-separated pair decomposition of points in d-dimensional space is defined and the resulting decomposition is applied to the efficient computation of k-nearest neighbors and n-body potential fields.
Abstract: We define the notion of a well-separated pair decomposition of points in d-dimensional space. We then develop efficient sequential and parallel algorithms for computing such a decomposition. We apply the resulting decomposition to the efficient computation of k-nearest neighbors and n-body potential fields.
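
A minimal sketch of the well-separatedness test at the heart of the decomposition, assuming the usual definition: two point sets are s-well-separated if they can be enclosed in balls of (at most) a common radius whose gap is at least s times that radius. The crude bounding-box ball below is an illustrative stand-in for the paper's construction.

```python
import numpy as np

def enclosing_ball(points):
    """Crude enclosing ball: center of the bounding box, radius to the
    farthest point (sufficient for a separation check)."""
    pts = np.asarray(points, dtype=float)
    center = (pts.min(axis=0) + pts.max(axis=0)) / 2.0
    return center, np.max(np.linalg.norm(pts - center, axis=1))

def well_separated(A, B, s=2.0):
    """A and B are s-well-separated if the gap between their enclosing
    balls is at least s times the larger radius."""
    ca, ra = enclosing_ball(A)
    cb, rb = enclosing_ball(B)
    r = max(ra, rb)
    return np.linalg.norm(ca - cb) - ra - rb >= s * r
```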

587 citations


Journal ArticleDOI
TL;DR: This work improves a neural network and nearest-neighbor scoring system for predicting protein secondary structure by taking into consideration N- and C-terminal positions of alpha-helices and beta-strands, and also beta-turns as distinctive types of secondary structure.

310 citations


Journal ArticleDOI
TL;DR: The mean nearest-neighbor distance λ as a function of the packing fraction is computed for such many-body systems and compared to rigorous bounds on λ, found to be in excellent agreement with available computer-simulation data.
Abstract: The probability of finding a nearest neighbor at some radial distance from a given particle in a system of interacting particles is of fundamental importance in a host of fields in the physical as well as biological sciences. A procedure is developed to obtain analytical expressions for nearest-neighbor probability functions for random isotropic packings of hard D-dimensional spheres that are accurate for all densities, i.e., up to the random close-packing fraction. Using these results, the mean nearest-neighbor distance λ as a function of the packing fraction is computed for such many-body systems and compared to rigorous bounds on λ derived here. Our theoretical results are found to be in excellent agreement with available computer-simulation data.

271 citations


Journal ArticleDOI
TL;DR: In domains where the decision boundaries are axis-parallel, the NGE approach can produce excellent generalization with interpretable hypotheses, and in all domains tested, NGE algorithms require much less memory to store generalized exemplars than is required by NN algorithms.
Abstract: Algorithms based on Nested Generalized Exemplar (NGE) theory (Salzberg, 1991) classify new data points by computing their distance to the nearest “generalized exemplar” (i.e., either a point or an axis-parallel rectangle). They combine the distance-based character of nearest neighbor (NN) classifiers with the axis-parallel rectangle representation employed in many rule-learning systems. An implementation of NGE was compared to the k-nearest neighbor (kNN) algorithm in 11 domains and found to be significantly inferior to kNN in 9 of them. Several modifications of NGE were studied to understand the cause of its poor performance. These show that its performance can be substantially improved by preventing NGE from creating overlapping rectangles, while still allowing complete nesting of rectangles. Performance can be further improved by modifying the distance metric to allow weights on each of the features (Salzberg, 1991). Best results were obtained in this study when the weights were computed using mutual information between the features and the output class. The best version of NGE developed is a batch algorithm (BNGE FWMI) that has no user-tunable parameters. BNGE FWMI's performance is comparable to the first-nearest neighbor algorithm (also incorporating feature weights). However, the k-nearest neighbor algorithm is still significantly superior to BNGE FWMI in 7 of the 11 domains, and inferior to it in only 2. We conclude that, even with our improvements, the NGE approach is very sensitive to the shape of the decision boundaries in classification problems. In domains where the decision boundaries are axis-parallel, the NGE approach can produce excellent generalization with interpretable hypotheses. In all domains tested, NGE algorithms require much less memory to store generalized exemplars than is required by NN algorithms.
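
The core of an NGE-style classifier is the distance from a query point to an axis-parallel rectangle (a point exemplar is just a rectangle with equal lower and upper bounds). A hedged sketch with optional per-feature weights, in the spirit of the feature-weighted variants discussed above (the weighting scheme and squared-distance form are illustrative assumptions):

```python
def exemplar_distance(x, lower, upper, weights=None):
    """Weighted distance from a point to an axis-parallel rectangle:
    zero along any axis where the point falls inside the interval."""
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for xi, lo, hi, w in zip(x, lower, upper, weights):
        if xi < lo:
            diff = lo - xi
        elif xi > hi:
            diff = xi - hi
        else:
            diff = 0.0
        total += (w * diff) ** 2
    return total ** 0.5

# Classification assigns the class of the nearest generalized exemplar;
# a point exemplar is simply a rectangle with lower == upper.
```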

229 citations


Journal ArticleDOI
TL;DR: A frequently encountered type of query in Geographic Information Systems is to find the k nearest neighbor objects to a given point in space.
Abstract: A frequently encountered type of query in Geographic Information Systems is to find the k nearest neighbor objects to a given point in space. Processing such queries requires substantially differen...

211 citations


Journal ArticleDOI
TL;DR: A genetic algorithm is applied for selecting a reference set for the k-Nearest Neighbors rule, and the results are discussed together with those obtained with the standard k-NN, random selection, Wilson's technique, and the MULTIEDIT algorithm.
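
A rough sketch of how a genetic algorithm can evolve a reference set for k-NN: each chromosome is a 0/1 mask over the training points, and its fitness is the k-NN accuracy on held-out data when only the masked points are kept. The operators (truncation selection, one-point crossover, bit-flip mutation) and all parameter values are illustrative, not the paper's; numpy arrays and small non-negative integer labels (for `np.bincount`) are assumed.

```python
import numpy as np

def knn_accuracy(mask, X, y, X_val, y_val, k=1):
    """Fitness of a candidate reference set: accuracy of k-NN restricted
    to the training points whose bit is set in `mask`."""
    ref = np.flatnonzero(mask)
    if len(ref) < k:
        return 0.0
    correct = 0
    for xv, yv in zip(X_val, y_val):
        d = np.linalg.norm(X[ref] - xv, axis=1)
        votes = y[ref][np.argsort(d)[:k]]
        correct += (np.bincount(votes).argmax() == yv)
    return correct / len(y_val)

def ga_select(X, y, X_val, y_val, pop=30, gens=50, p_mut=0.01, rng=None):
    """Evolve a 0/1 prototype mask that maximizes held-out k-NN accuracy."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        fitness = np.array([knn_accuracy(m, X, y, X_val, y_val) for m in population])
        parents = population[np.argsort(-fitness)[: pop // 2]]   # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                              # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut                        # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    return population[np.argmax([knn_accuracy(m, X, y, X_val, y_val) for m in population])]
```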

173 citations


Proceedings Article
J. Bala, J. Huang, H. Vafaie, K. Dejong, Harry Wechsler
20 Aug 1995
TL;DR: A hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification is introduced.
Abstract: This paper introduces a hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification. A GA is used to search the space of all possible subsets of a large set of candidate discrimination features. For a given feature subset, ID3 is invoked to produce a decision tree. The classification performance of the decision tree on unseen data is used as a measure of fitness for the given feature set, which, in turn, is used by the GA to evolve better feature sets. This GA-ID3 process iterates until a feature subset is found with satisfactory classification performance. Experimental results are presented which illustrate the feasibility of our approach on difficult problems involving recognizing visual concepts in satellite and facial image data. The results also show improved classification performance and reduced description complexity when compared against standard methods for feature selection.
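
A minimal sketch of the wrapper fitness function described here, using scikit-learn's DecisionTreeClassifier with the entropy criterion as a stand-in for ID3 (an assumption; the paper uses ID3 itself). The surrounding GA loop could look much like the one sketched for the reference-set selection entry above, with the chromosome now masking feature columns instead of training points.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier   # stand-in for ID3

def feature_subset_fitness(mask, X_train, y_train, X_test, y_test):
    """Wrapper fitness: train a tree on the selected feature columns and
    score it on held-out data; the GA evolves the 0/1 feature mask."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    tree = DecisionTreeClassifier(criterion="entropy")  # information gain, as in ID3
    tree.fit(X_train[:, cols], y_train)
    return tree.score(X_test[:, cols], y_test)
```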

139 citations


Journal ArticleDOI
TL;DR: Rates of convergence for nearest neighbor estimation are established in a general framework in terms of metric covering numbers of the underlying space; a consistency result is also proved for k_n-nearest neighbor estimation under arbitrary sampling, with a convergence rate matching established rates for i.i.d. sampling.
Abstract: Rates of convergence for nearest neighbor estimation are established in a general framework in terms of metric covering numbers of the underlying space. The first result is to find explicit finite sample upper bounds for the classical independent and identically distributed (i.i.d.) random sampling problem in a separable metric space setting. The convergence rate is a function of the covering numbers of the support of the distribution. For example, for bounded subsets of R^r, the convergence rate is O(1/n^{2/r}). The main result is to extend the problem to allow samples drawn from a completely arbitrary random process in a separable metric space and to examine the performance in terms of the individual sample sequences. The authors show that for every sequence of samples the asymptotic time-average of nearest neighbor risks equals twice the time-average of the conditional Bayes risks of the sequence. Finite sample upper bounds under arbitrary sampling are again obtained in terms of the covering numbers of the underlying space. In particular, for bounded subsets of R^r the convergence rate of the time-averaged risk is O(1/n^{2/r}). The authors then establish a consistency result for k_n-nearest neighbor estimation under arbitrary sampling and prove a convergence rate matching established rates for i.i.d. sampling. Finally, they show how their arbitrary sampling results lead to some classical i.i.d. sampling results and in fact extend them to stationary sampling. The framework and results are quite general while the proof techniques are surprisingly elementary.

136 citations


Journal ArticleDOI
TL;DR: Using a technique called branch and cut, the exact ground states of two-dimensional Ising spin glass grids of sizes up to 100×100, with a Gaussian bond distribution and an exterior magnetic field, can be determined in a moderate amount of computation time.
Abstract: In this paper we study two-dimensional Ising spin glasses on a grid with nearest neighbor and periodic boundary interactions, based on a Gaussian bond distribution, and an exterior magnetic field. We show how, using a technique called branch and cut, the exact ground states of grids of sizes up to 100×100 can be determined in a moderate amount of computation time, and we report on extensive computational tests. With our method we produce results based on more than 20,000 experiments on the properties of spin glasses whose errors depend only on the assumptions on the model and not on the computational process. This feature is a clear advantage of the method over other, more popular ways to compute the ground state, like Monte Carlo simulation including simulated annealing, evolutionary, and genetic algorithms, that provide only approximate ground states with a degree of accuracy that cannot be determined a priori. Our ground-state energy estimation at zero field is −1.317.

Proceedings Article
27 Nov 1995
TL;DR: A locally adaptive form of nearest neighbor classification is proposed to try to finesse the curse of dimensionality, along with a method for global dimension reduction that combines local dimension information.
Abstract: Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor classification to try to finesse this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighborhoods. We determine the local decision boundaries from centroid information, and then shrink neighborhoods in directions orthogonal to these local decision boundaries, and elongate them parallel to the boundaries. Thereafter, any neighborhood-based classifier can be employed, using the modified neighborhoods. We also propose a method for global dimension reduction that combines local dimension information. We indicate how these techniques can be extended to the regression problem.
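
A rough sketch in the spirit of the procedure described (not a faithful reimplementation): estimate within-class and between-class scatter in a neighborhood of the query, and build a local metric that stretches distance across the estimated decision boundary and shrinks it along it. The neighborhood size, the softening parameter `eps`, and the final voting step are illustrative assumptions; numpy arrays are assumed.

```python
import numpy as np

def local_metric(X, y, x0, n_nbr=50, eps=1.0):
    """Local LDA-style metric around query x0: whiten by the within-class
    scatter W, add the between-class scatter B (plus eps*I so neighborhoods
    stay finite in directions with no class separation), and map back."""
    nbr = np.argsort(np.linalg.norm(X - x0, axis=1))[:n_nbr]
    Xn, yn = X[nbr], y[nbr]
    mean = Xn.mean(axis=0)
    d = X.shape[1]
    W, B = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        pc = len(Xc) / len(Xn)
        W += pc * np.cov(Xc, rowvar=False, bias=True)
        diff = (Xc.mean(axis=0) - mean)[:, None]
        B += pc * diff @ diff.T
    evals, evecs = np.linalg.eigh(W + 1e-8 * np.eye(d))
    W_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    B_star = W_inv_sqrt @ B @ W_inv_sqrt
    return W_inv_sqrt @ (B_star + eps * np.eye(d)) @ W_inv_sqrt

def adaptive_nn_predict(X, y, x0, k=5):
    """k-NN vote using the locally adapted quadratic distance."""
    Sigma = local_metric(X, y, x0)
    diff = X - x0
    dist = np.einsum('ij,jk,ik->i', diff, Sigma, diff)   # (x - x0)^T Sigma (x - x0)
    vals, counts = np.unique(y[np.argsort(dist)[:k]], return_counts=True)
    return vals[np.argmax(counts)]
```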

01 Sep 1995
TL;DR: Algorithms are introduced that combine a small number of component nearest neighbor classifiers, each of which stores a small number of prototypical instances; these yield composite classifiers that are more accurate than a nearest neighbor classifier that stores all training instances as prototypes.
Abstract: Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, fundamental questions about classifier combination remain open, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number of other classifiers?". The goal of this dissertation is to provide answers to these and closely related questions with respect to a particular model class, the class of nearest neighbor classifiers. We undertake the first study that investigates in depth the combination of nearest neighbor classifiers. Although previous research has questioned the utility of combining nearest neighbor classifiers, we introduce algorithms that combine a small number of component nearest neighbor classifiers, where each of the components stores a small number of prototypical instances. In a variety of domains, we show that these algorithms yield composite classifiers that are more accurate than a nearest neighbor classifier that stores all training instances as prototypes. The research presented in this dissertation also extends previous work on prototype selection for an independent nearest neighbor classifier. We show that in many domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances. We extend previous work by demonstrating that algorithms that rely primarily on random sampling can effectively choose a small number of prototypes.
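
A minimal sketch of the kind of composite classifier described: a handful of 1-NN components, each storing only a small random sample of training instances as prototypes, combined by majority vote. Component and prototype counts are illustrative, and the dissertation's algorithms choose prototypes more carefully than plain random sampling in some variants; numpy arrays are assumed.

```python
import numpy as np

def build_committee(X, y, n_components=5, n_protos=20, rng=None):
    """Committee of 1-NN classifiers, each holding a small random
    sample of the training data as its prototype set."""
    rng = rng or np.random.default_rng(0)
    committee = []
    for _ in range(n_components):
        idx = rng.choice(len(X), size=n_protos, replace=False)
        committee.append((X[idx], y[idx]))
    return committee

def committee_predict(committee, x):
    """Each component votes with the label of its nearest prototype;
    the composite prediction is the majority vote."""
    votes = [yp[np.argmin(np.linalg.norm(Xp - x, axis=1))] for Xp, yp in committee]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```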

Journal ArticleDOI
TL;DR: An investigation of how different features can be used for discrimination, alone or when integrated into an extended feature vector, demonstrated that no feature set alone was sufficient for recognition, whereas the extended feature vector could discriminate between subjects successfully.
Abstract: Many features can be used to describe a human face but few have been used in combination. Extending the feature vector using orthogonal sets of measurements can reduce the variance of a matching measure, to improve discrimination capability. This paper investigates how different features can be used for discrimination, alone or when integrated into an extended feature vector. This study concentrates on improving feature definition and extraction from a frontal view image, incorporating and extending established measurements. These form an extended feature vector based on four feature sets: geometric (distance) measurements, the eye region, the outline contour, and the profile. The profile, contour, and eye region are described by the Walsh power spectrum, normalized Fourier descriptors, and normalized moments, respectively. Although there is some correlation between the geometrical measures and the other sets, their bases (distance, shape description, sequency, and statistics) are orthogonal and hence appropriate for this research. A database of face images was analyzed using two matching measures which were developed to control differently the contributions of elements of the feature sets. The match was evaluated for both measures for the separate feature sets and for the extended feature vector. Results demonstrated that no feature set alone was sufficient for recognition whereas the extended feature vector could discriminate between subjects successfully.

Journal ArticleDOI
TL;DR: A memory-based approach to robot learning is explored, in which memory-based neural networks learn models of the task to be performed; the approach has been used to enable a robot to learn a difficult juggling task.

Journal ArticleDOI
TL;DR: Based on the divide-and-conquer method in the density functional theory, an efficient approach was developed to compute analytically the energy gradients with respect to the nuclear coordinates as discussed by the authors.
Abstract: Based on the divide‐and‐conquer method in the density‐functional theory, an efficient approach is developed to compute analytically the energy gradients with respect to the nuclear coordinates. Tests performed show that both energy gradients and optimized molecular geometry converge to the corresponding results of the Kohn–Sham method when the nearest neighbor contributions are increased.

01 Sep 1995
TL;DR: This dissertation develops a data structure that organizes the pairs of points into pairs of clusters, or well-separated pairs, in a way that preserves approximate distance, and leads to substantial improvements over algorithms that enumerate all distinct pairs of points.
Abstract: Many problems of great practical importance and theoretical interest can be posed as computations on finite point sets in fixed-dimensional Euclidean space. Often, these problems are expressed most naturally in terms of the set of all distinct pairs of points, along with the Euclidean distance between each such pair. Such problems include finding the nearest neighbor of each point and constructing a Euclidean minimum spanning tree. Though it is not purely geometric, computing the electrostatic interaction between point charges can also be posed in these terms, and has applications to particle simulation, a fundamental technique in computational physics. Because there are quadratically many distinct pairs, one would like to avoid enumerating these pairs explicitly. This dissertation develops a data structure, called the well-separated pair decomposition, that organizes the pairs of points into pairs of clusters, or well-separated pairs, in a way that preserves approximate distance. This structure has linear size and can be computed efficiently. Hence, it leads to substantial improvements over algorithms that enumerate all distinct pairs of points. The well-separated pair decomposition uses several ideas that have been exploited in previous algorithms. However, the development presented here has led to many simplifications and refinements to these approaches. Additionally, a more flexible treatment of tree construction has led to efficient parallel and dynamic algorithms that could not be developed using previous techniques. This dissertation presents sequential, parallel, and dynamic algorithms for computing the well-separated pair decomposition, along with applications to specific problems posed on point sets. Some empirical testing is also performed on a C implementation in order to estimate the geometric constants in the complexity.

Proceedings ArticleDOI
01 Sep 1995
TL;DR: An accurate analysis of the number of cells visited in nearest-neighbor searching by the bucketing and k-d tree algorithms is provided, and empirical evidence is presented showing that the analysis applies even in low dimensions.
Abstract: Given n data points in d-dimensional space, nearest neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest neighbor searching by the bucketing and k-d tree algorithms. We assume m^d points uniformly distributed in dimension d, where m is a fixed integer ≥ 2. Further, we assume that distances are measured in the L1 metric. Our analysis is tight in the limit as d approaches infinity. Empirical evidence is presented showing that the analysis applies even in low dimensions.
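
For intuition, here is a toy k-d tree nearest-neighbor search that counts the number of leaf cells actually visited, which is the quantity the analysis is about. The median splitting rule, bucket size, and L2 metric are illustrative choices of this sketch, not the assumptions of the paper (which uses uniformly distributed points and the L1 metric).

```python
import numpy as np

def build_kdtree(points, depth=0, leaf_size=4):
    """Median-split k-d tree; leaves hold small buckets of points."""
    if len(points) <= leaf_size:
        return {"leaf": True, "pts": points}
    axis = depth % points.shape[1]
    points = points[np.argsort(points[:, axis])]
    mid = len(points) // 2
    return {"leaf": False, "axis": axis, "split": points[mid, axis],
            "left": build_kdtree(points[:mid], depth + 1, leaf_size),
            "right": build_kdtree(points[mid:], depth + 1, leaf_size)}

def nn_search(node, q, best=None, cells=None):
    """Depth-first search with pruning; `cells` counts the leaf cells
    actually visited during the search."""
    if best is None:
        best, cells = [np.inf, None], [0]
    if node["leaf"]:
        cells[0] += 1
        for p in node["pts"]:
            d = np.linalg.norm(p - q)
            if d < best[0]:
                best[0], best[1] = d, p
    else:
        near, far = (("left", "right") if q[node["axis"]] <= node["split"]
                     else ("right", "left"))
        nn_search(node[near], q, best, cells)
        if abs(q[node["axis"]] - node["split"]) < best[0]:
            nn_search(node[far], q, best, cells)   # only cross the split if needed
    return best, cells[0]
```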

Journal ArticleDOI
TL;DR: A two-stage neural network structure that combines the characteristics of the self-organizing map (SOM) and the multilayer perceptron (MLP) for the problem of texture classification is presented, and it is found that this mechanism increases the interclass distance and decreases the intraclass distance in the feature space, thereby reducing the complexity of classification.

Journal ArticleDOI
TL;DR: A theoretical model is proposed in which the teacher knows the classification algorithm and chooses examples in the best way possible; applying this model to the nearest-neighbor learning algorithm, upper and lower bounds on sample complexity are developed for several different concept classes.
Abstract: Proposes a theoretical model for analysis of classification methods, in which the teacher knows the classification algorithm and chooses examples in the best way possible. The authors apply this model using the nearest-neighbor learning algorithm, and develop upper and lower bounds on sample complexity for several different concept classes. For some concept classes, the sample complexity turns out to be exponential even using this best-case model, which implies that the concept class is inherently difficult for the NN algorithm. The authors identify several geometric properties that make learning certain concepts relatively easy. Finally the authors discuss the relation of their work to helpful teacher models, its application to decision tree learning algorithms, and some of its implications for experimental work.

11 Nov 1995
TL;DR: Martin et al., as mentioned in this paper, explored the rationale behind various error-rate estimators and the causes of the sometimes conflicting claims regarding their bias and precision, and examined the biases and variances of each estimator empirically.
Abstract: Author(s): Martin, J. Kent; Hirschberg, D. S. | Abstract: Several methods (independent subsamples, leave-one-out, cross-validation, and bootstrapping) have been proposed for estimating the error rates of classifiers. The rationale behind the various estimators and the causes of the sometimes conflicting claims regarding their bias and precision are explored in this paper. The biases and variances of each of the estimators are examined empirically. Cross-validation, 10-fold or greater, seems to be the best approach; the other methods are biased, have poorer precision, or are inconsistent. Though unbiased for linear discriminant classifiers, the 632b bootstrap estimator is biased for nearest neighbors classifiers, more so for single nearest neighbor than for three nearest neighbors. The 632b estimator is also biased for CART-style decision trees. Weiss' LOO* estimator is unbiased and has better precision than cross-validation for discriminant and nearest neighbors classifiers, but its lack of bias and improved precision for those classifiers do not carry over to decision trees for nominal attributes.
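
A minimal sketch of the estimator the paper recommends, 10-fold cross-validation, applied here to a 1-nearest-neighbor classifier (the fold count and classifier are illustrative choices for this listing, not code from the paper; numpy arrays are assumed).

```python
import numpy as np

def cv_error(X, y, k_folds=10, rng=None):
    """K-fold cross-validation estimate of the error rate of a
    1-nearest-neighbor classifier."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))
    errors = 0
    for fold in np.array_split(idx, k_folds):
        train = np.setdiff1d(idx, fold)            # everything outside the held-out fold
        for i in fold:
            j = train[np.argmin(np.linalg.norm(X[train] - X[i], axis=1))]
            errors += (y[j] != y[i])
    return errors / len(X)
```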

Journal ArticleDOI
TL;DR: The role of non-transitivity of the tolerance relation is examined for similarity-based estimation of the mutagenicity of 95 aromatic amines and the boiling points of a large set of over 2,900 compounds.
Abstract: Molecular similarity methods were used in selecting K nearest neighbors and in estimating mutagenicity of 95 aromatic amines and boiling points of a large set of over 2,900 compounds. Similarity is analyzed in terms of the concept of tolerance space. Specifically, the role of non-transitivity of the tolerance relation in estimating properties using similarity methods is examined.

Journal ArticleDOI
TL;DR: Through simulations on synthetic sources, it is shown that ECPNN and ECVQ have indistinguishable mean-square-error versus rate performance and that the ECPNN and AECPNN algorithms obtain as close performance by the same measure as the ECVQ and AECVQ algorithms.
Abstract: A clustering algorithm for the design of efficient vector quantizers to be followed by entropy coding is proposed. The algorithm, called entropy-constrained pairwise nearest neighbor (ECPNN), designs codebooks by merging the pair of Voronoi regions which gives the least increase in distortion for a given decrease in entropy. The algorithm can be used as an alternative to the entropy-constrained vector quantizer design (ECVQ) proposed by Chou, Lookabaugh, and Gray (1989). By a natural extension of the ECPNN algorithm the authors develop another algorithm that designs alphabet and entropy-constrained vector quantizers and call it alphabet- and entropy-constrained pairwise nearest neighbor (AECPNN) design. Through simulations on synthetic sources, it is shown that ECPNN and ECVQ have indistinguishable mean-square-error versus rate performance and that the ECPNN and AECPNN algorithms obtain as close performance by the same measure as the ECVQ and AECVQ (Rao and Pearlman, 1993) algorithms. The advantages over ECVQ are that the ECPNN approach enables much faster codebook design and uses smaller codebooks. A single pass through the ECPNN (or AECPNN) design algorithm, which progresses from larger to successively smaller rates, allows the storage of any desired number of intermediate codebooks. In the context of multirate subband (or transform) coders, this feature is especially desirable. The performance of coding image pyramids using ECPNN and AECPNN codebooks at rates from 1/3 to 1.0 bit/pixel is discussed.
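
The key quantity in a pairwise-nearest-neighbor design with an entropy constraint is the cost of merging two codebook cells: the increase in distortion per unit decrease in index entropy. A hedged sketch of such a criterion follows; the centroid/count bookkeeping and the exact normalization are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np

def merge_cost(c_i, n_i, c_j, n_j, n_total):
    """ECPNN-style merge criterion: distortion increase per unit of
    entropy decrease when cells i and j are replaced by one cell."""
    # distortion added by replacing two centroids with their weighted mean
    d_dist = (n_i * n_j) / (n_i + n_j) * np.sum((c_i - c_j) ** 2) / n_total
    p_i, p_j = n_i / n_total, n_j / n_total
    # entropy decrease (bits): two codeword indices collapse into one
    d_ent = (-p_i * np.log2(p_i) - p_j * np.log2(p_j)
             + (p_i + p_j) * np.log2(p_i + p_j))
    return d_dist / d_ent      # smaller is a better merge

# A full design pass would repeatedly merge the argmin-cost pair, recording
# the intermediate codebooks at whatever rates are of interest.
```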

Proceedings ArticleDOI
25 Apr 1995
TL;DR: This paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension exchange and the diffusion methods and their variants, in terms of their performance in both one-port and all-port communication architectures; it turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model.
Abstract: With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on its local information and manages work load migrations within its neighborhood. This paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension exchange and the diffusion methods and their variants in terms of their performances in both one-port and all-port communication architectures. It turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model, and that the strength of the diffusion method is in asynchronous implementations in the all-port communication model. The underlying communication networks considered assume the most popular topologies, the mesh and the torus and their special cases: the hypercube and the k-ary n-cube.
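
For concreteness, toy versions of the two schemes being compared: a synchronous diffusion sweep (each processor exchanges a fixed fraction of its load difference with every neighbor, fitting the all-port model) and a dimension-exchange sweep on a hypercube (each processor pairs with one neighbor at a time, fitting the one-port model). The diffusion coefficient and the adjacency-list representation are illustrative assumptions.

```python
def diffusion_step(load, neighbors, alpha=0.25):
    """One synchronous diffusion sweep: processor i moves a fraction
    alpha of its load difference with each neighbor j."""
    new = list(load)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            new[i] += alpha * (load[j] - load[i])
    return new

def dimension_exchange_step(load, dim):
    """One dimension-exchange sweep on a hypercube: processor i averages
    its load with the single neighbor whose id differs in bit `dim`."""
    new = list(load)
    for i in range(len(load)):
        new[i] = (load[i] + load[i ^ (1 << dim)]) / 2.0
    return new

# e.g. on a 3-cube, one full dimension-exchange round is:
#   for dim in range(3): load = dimension_exchange_step(load, dim)
```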

Book ChapterDOI
Seishi Okamoto, Ken Satoh
23 Oct 1995
TL;DR: An average-case analysis of the k-nearest neighbor classifier (k-NNC) is performed for a subclass of Boolean threshold functions, based on the formal computation of the predictive accuracy of the classifier under the assumption of noise-free Boolean features and a uniform instance distribution.
Abstract: In this paper, we perform an average-case analysis of k-nearest neighbor classifier (k-NNC) for a subclass of Boolean threshold functions. Our average-case analysis is based on the formal computation for the predictive accuracy of the classifier under the assumption of noise-free Boolean features and a uniform instance distribution. The predictive accuracy is represented as a function of the number of features, the threshold, the number of training instances, and the number of nearest neighbors. We also present the predictive behavior of the classifier by systematically varying the values of the parameters of the accuracy function. We plot the behavior of the classifier by varying the value of k, and then we observe that the performance of the classifier improves as k increases, then reaches a maximum before starting to deteriorate. We further investigate the relationship between the number of training instances and the optimal value of k. We then observe that optimum k increases gradually as the number of training instances increases.


Proceedings ArticleDOI
22 Oct 1995
TL;DR: Letter tuples are tested as features in information retrieval systems, using neural network classifiers and nearest neighbor classifiers as the retrieval method, and are found to be effective.
Abstract: Previous work has shown that statistics of letter tuples extracted from text samples can be effective in determining authorship. These statistics have been used to create displays that visually separate the works of different authors, and have been used as input to neural network classifiers which can accurately discriminate between authors. Similar applications are described by Bennett (1976), Clausing (1993), and Damashek (1995). The present paper extends this work by testing the effectiveness of letter tuples in information retrieval systems using neural network classifiers and nearest neighbor classifiers as the retrieval method. Testing was performed using 855 full-text Wall Street Journal articles and 50 narrative queries. Performance of neural and nearest neighbor methods was similar, with the product of recall and precision exceeding 0.1 on the given data.
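
A toy version of the letter-tuple representation and a nearest-neighbor retrieval step: documents and queries are reduced to letter trigram frequency profiles and ranked by cosine similarity. The trigram length, lower-casing, and cosine similarity are illustrative choices for this sketch, not the paper's exact feature set or matching function.

```python
from collections import Counter
import math

def letter_tuples(text, n=3):
    """Frequency profile of letter n-grams (here trigrams) in a text."""
    text = "".join(ch.lower() for ch in text if ch.isalpha() or ch.isspace())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())) or 1.0)

def retrieve(query, documents, k=10):
    """Nearest-neighbor retrieval: rank documents by the similarity of
    their letter-tuple profiles to the query's profile."""
    q = letter_tuples(query)
    return sorted(documents, key=lambda d: -cosine(q, letter_tuples(d)))[:k]
```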

Journal ArticleDOI
TL;DR: The nearest neighbor classification rule is extended to reject outlier data and is implemented with an analog electronic circuit; a continuous membership function is derived from an optimization formulation of the classification rule.

Proceedings ArticleDOI
28 Sep 1995
TL;DR: This paper proposes hybrid techniques attempting to combine the advantages of decision trees with nearest neighbor methods, by coupling them while using genetic algorithms to further enhance their performances.
Abstract: Decision trees are a rather unique automatic learning approach to power system security assessment, in particular due to their interpretability, their capability to identify the main driving parameters, and their computational efficiency. Yet, other automatic learning methods may have complementary potentials. This paper proposes hybrid techniques attempting to combine the advantages of decision trees with nearest neighbor methods, by coupling them while using genetic algorithms to further enhance their performances. The derived approaches are then applied to a real world study case. It is shown that the hybrid approaches are indeed superior to the corresponding "pure" ones. In particular, the proposed genetic algorithm-based nearest neighbor-decision tree technique shows to be very accurate and efficient for real-time applications.

Journal ArticleDOI
01 Mar 1995
TL;DR: An algorithm is presented for selecting templates from a set of training points, and organizing them into a template tree which is guaranteed to correctly identify all of the training points.
Abstract: Template trees provide a means of accelerating nearest neighbor searches for problems in which KD-trees and similar data structures do not work well because of the high dimension and/or sophisticated distance function used. Suppose the points for which nearest neighbors are being sought are noisy 16×24 images of characters. Each point has 16×24=384 dimensions. Images which a good distance function would classify as similar may have very different values at a dozen or more randomly chosen pixels. Template trees work directly with the distance function rather than with the 384 components of the points. An algorithm is presented for selecting templates from a set of training points, and organizing them into a template tree which is guaranteed to correctly identify all of the training points. The tree construction algorithm is similar in many ways to the condensation algorithm for template selection, although it organizes templates into a tree as it selects them. A tree containing approximately 2000 images of capital letters was constructed using a training set of about 8000 points. Using the tree, an average of only about 140 point to point distance calculations were needed to identify an unknown image. Identification accuracy was comparable to that obtained using 2000 templates without a tree.
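
As a sketch of why a template tree needs so few distance computations: the search descends toward the child template closest to the unknown point under the supplied distance function, so only the templates along one root-to-leaf path (plus their siblings) are ever compared. The node attributes (`children`, `template`, `label`) are assumptions of this sketch; the paper's construction procedure and any backtracking during search are not shown.

```python
def search_template_tree(root, x, dist):
    """Greedy descent through a template tree using an arbitrary,
    possibly expensive, distance function; returns the label of the
    leaf reached and the number of distance computations used."""
    node, comparisons = root, 0
    while node.children:
        best_child, best_d = None, float("inf")
        for child in node.children:
            d = dist(x, child.template)
            comparisons += 1
            if d < best_d:
                best_child, best_d = child, d
        node = best_child
    return node.label, comparisons
```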