
Showing papers on "k-nearest neighbors algorithm" published in 1970


Journal ArticleDOI
TL;DR: Here the (k, k′) nearest neighbor rule with a reject option is examined, which looks at the k nearest neighbors and rejects if fewer than k′ of these are from the same class; if k′ or more are from one class, a decision is made in favor of that class.
Abstract: An observation comes from one of two possible classes. If all the statistics of the problem are known, Bayes' classification scheme yields the minimum probability of error. If, instead, the statistics are not known and one is given only a labeled training set, it is known that the nearest neighbor rule has an asymptotic error no greater than twice that of Bayes' rule. Here the (k, k′) nearest neighbor rule with a reject option is examined. This rule looks at the k nearest neighbors and rejects if fewer than k′ of these are from the same class; if k′ or more are from one class, a decision is made in favor of that class. The error rate of such a rule is bounded in terms of the Bayes' error rate.

206 citations
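
The (k, k′) rule in the abstract above can be illustrated with a minimal Python sketch. The function name `knn_with_reject`, the Euclidean distance, the toy training set, and the use of `None` as the reject signal are all assumptions made for illustration; none of them comes from the paper.

```python
import math
from collections import Counter

def knn_with_reject(train, query, k, k_prime):
    """Classify `query` by a (k, k') rule; return None to signal a reject.

    `train` is a list of (point, label) pairs. Euclidean distance is an
    assumption of this sketch, not something stated in the abstract.
    """
    # Keep the k training points closest to the query.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count how many of those k neighbors carry each label.
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    # Decide only if at least k' of the k nearest neighbors agree.
    return label if count >= k_prime else None

# Hypothetical two-class training set.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]

print(knn_with_reject(train, (0.1, 0.1), k=3, k_prime=3))    # unanimous neighbors
print(knn_with_reject(train, (0.55, 0.55), k=3, k_prime=3))  # mixed neighbors
```

With two classes, choosing k′ greater than k/2 guarantees that at most one class can reach the threshold, so the decision is unambiguous whenever the rule does not reject.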


Journal ArticleDOI
TL;DR: A family of supervised, nonparametric decision rules, based on tolerance regions, is described which includes the k-Nearest Neighbor decision rules when there are two classes.
Abstract: A family of supervised, nonparametric decision rules, based on tolerance regions, is described which includes the k-Nearest Neighbor decision rules when there are two classes. There are two practical reasons for doing so. First, a family of decision rules similar to the k-Nearest Neighbor rules can be specified which applies to a broader collection of pattern recognition problems. This is because, in the general class of rules, constraints are weakened between the number of training samples required in each training sample set and the respective a priori class probabilities, and a discrete loss function weighting the importance of the finite number of ways to make a decision error can be introduced. Second, within the family of decision rules based on tolerance regions, there are decision rules which have a property allowing for preprocessing of the training set data resulting in significant data reduction. Theoretical performance for a special case is presented.

168 citations



Journal ArticleDOI
TL;DR: An approach is developed which can frequently be used to find a nonorthogonal transformation to project the patterns into a feature space of considerably lower dimensionality.
Abstract: It is known that R linearly separable classes of multidimensional pattern vectors can always be represented in a feature space of at most R dimensions. An approach is developed which can frequently be used to find a nonorthogonal transformation to project the patterns into a feature space of considerably lower dimensionality. Examples involving classification of handwritten and printed digits are used to illustrate the technique.

28 citations


Journal ArticleDOI
TL;DR: Theoretical results include demonstrations of the facts that the proximity of the nearest neighbor to a new sample in a collection of n samples becomes (in probability) arbitrarily small as n is increased and that the convergence is often (but not always) with probability 1.
Abstract: This paper focuses on the problem of the relationship between the risk incurred using a nearest neighbor rule and the size of the data base. Theoretical results include demonstrations of the facts that the proximity of the nearest neighbor to a new sample in a collection of n samples becomes (in probability) arbitrarily small as n is increased; that the convergence is often (but not always) with probability 1; that as a result of these convergences, the risk associated with a decision may be closely controlled; and that these facts and their demonstrations aid one in determining the size of a sample of data to be used as a nearest neighbor decision-making base. An example serves to demonstrate that the size of the data base required to meet performance criteria other than the relatively lax expected risk criterion can be unreasonably large.

19 citations
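
The claim that the nearest neighbor moves arbitrarily close to a new sample as n grows can be checked empirically with a small simulation. This is a sketch under stated assumptions: samples are drawn uniformly on [0, 1], and the function name `mean_nn_distance`, the trial count, and the seed are hypothetical choices, not taken from the paper.

```python
import random

def mean_nn_distance(n, trials=200, seed=0):
    """Average distance from a fresh query to its nearest of n uniform samples."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        samples = [rng.random() for _ in range(n)]
        query = rng.random()
        # Distance from the new sample to its nearest neighbor in the data base.
        total += min(abs(s - query) for s in samples)
    return total / trials

for n in (10, 100, 1000):
    print(n, mean_nn_distance(n))
```

The printed averages shrink as n increases, which matches the in-probability convergence described above; the simulation says nothing about the stronger probability-1 statement.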


Journal ArticleDOI
B. U. Felderhof
TL;DR: In this article, an explicit expression for the pair correlation function g(r) in the class of one-dimensional many-body cluster interaction models was derived, and it was shown that the product property and the compressibility theorem are satisfied.

16 citations



Proceedings ArticleDOI
T. Wagner
01 Dec 1970
TL;DR: It is shown that when the samples lie in n-dimensional Euclidean space, the probability of error for the nearest neighbor rule conditioned on the n known samples converges to R with probability 1 for mild continuity and moment assumptions on the class densities.
Abstract: If the nearest neighbor rule is used to classify unknown samples, then Cover and Hart have shown that the average probability of error using n known samples (denoted by R_n) converges to a number R as n tends to infinity, where R* ≤ R ≤ 2R*(1 − R*) and R* is the Bayes probability of error. Here it is shown that when the samples lie in n-dimensional Euclidean space, the probability of error for the nearest neighbor rule conditioned on the n known samples (denoted by L_n, so that E L_n = R_n) converges to R with probability 1 for mild continuity and moment assumptions on the class densities. Two estimates of R from the n known samples are shown to be consistent. Rates of convergence of L_n to R are also given.

4 citations
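
The Cover and Hart bound quoted in the abstract above, R* ≤ R ≤ 2R*(1 − R*), is easy to evaluate numerically. The helper below is a minimal sketch; the name `cover_hart_bounds` is hypothetical, and the restriction of R* to [0, 0.5] assumes the two-class setting in which this form of the bound applies.

```python
def cover_hart_bounds(bayes_error):
    """Return (lower, upper) bounds on the asymptotic nearest neighbor error R.

    Assumes two classes, so the Bayes error R* lies in [0, 0.5].
    """
    if not 0.0 <= bayes_error <= 0.5:
        raise ValueError("two-class Bayes error must lie in [0, 0.5]")
    lower = bayes_error                              # R* <= R
    upper = 2.0 * bayes_error * (1.0 - bayes_error)  # R <= 2 R* (1 - R*)
    return lower, upper

for r_star in (0.0, 0.1, 0.25, 0.5):
    print(r_star, cover_hart_bounds(r_star))
```

The two bounds coincide at R* = 0 and R* = 0.5, so the interval is widest for intermediate Bayes error rates.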


Journal ArticleDOI
TL;DR: Dacey (1958) introduced into geography the technique of nearest neighbor analysis, which was developed originally by Clark and Evans (1954) for measuring spatial relationships among biological populations, and extended it to study spacing along lines, and generalized these concepts to describe patterns in multi-dimensional patterns.
Abstract: Geographers interested in describing and analyzing settlement patterns have been concerned with trying to distinguish among different observed patterns. Although they have been able to describe them as dispersed, random, or clustered, until recently they have had no way of deciding objectively which of these types of patterns prevailed in a given area. Dacey (1958) introduced into geography the technique of nearest neighbor analysis, which was developed originally by Clark and Evans (1954) for measuring spatial relationships among biological populations. By this technique the departure from randomness in the distribution of a two-dimensional point pattern can be ascertained. The distance from each point to its nearest neighbor is measured and the mean observed distance is compared with the distance which would be expected if the same number of points were distributed at random over the same area. The nearest neighbor statistic R is a measure of the degree of departure from randomness in either of two directions: towards clustering or towards uniformity. Dacey (1960b, 1962) later applied the technique of nearest neighbor analysis to central places in southwestern Wisconsin. He extended it to study spacing along lines, and generalized these concepts to describe patterns in multi-dimensional

4 citations
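
The nearest neighbor statistic R described in the abstract above compares the mean observed nearest neighbor distance with the value expected for a random pattern of the same density. The sketch below uses the standard Clark and Evans expectation of 1 / (2·sqrt(density)) for a two-dimensional random pattern; the function name `nearest_neighbor_statistic` and the example point sets are hypothetical, and no edge correction is applied.

```python
import math

def nearest_neighbor_statistic(points, area):
    """Ratio of observed to expected mean nearest neighbor distance."""
    n = len(points)
    # Mean observed distance from each point to its nearest neighbor.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance if n points were placed at random over `area`.
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected

# A regular 4 x 4 grid with unit spacing, treated as covering an area of 16.
grid = [(x, y) for x in range(4) for y in range(4)]
# The same 16 points squeezed into one corner of that area.
cluster = [(0.01 * x, 0.01 * y) for x in range(4) for y in range(4)]

print(nearest_neighbor_statistic(grid, area=16.0))     # above 1: toward uniformity
print(nearest_neighbor_statistic(cluster, area=16.0))  # below 1: toward clustering
```

Values near 1 indicate a pattern consistent with randomness, values below 1 indicate clustering, and values above 1 indicate spacing that is more uniform than random.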



Journal ArticleDOI
TL;DR: In this paper, an exact normal mode analysis is carried out for a free-ended monatomic linear lattice with both nearest-neighbor and next-nearest-neighbor harmonic interactions.

Journal ArticleDOI
TL;DR: In this paper, the Curie-point anomaly in the a-axis linear thermal expansion coefficient of CuK2Cl4 · 2H2O (Tc=0.88° K) has been observed using the three-terminal capacitance technique.
Abstract: The Curie-point anomaly in the a-axis linear thermal expansion coefficient of CuK2Cl4 · 2H2O (Tc=0.88° K) has been observed using the three-terminal capacitance technique. Length changes of the 3.8-mm single-crystal sample were determined to within approximately 0.1 Å. In the critical region our data suggest a logarithmic singularity as found previously for the specific heat. However, imperfections in the sample limit the divergence of the expansion coefficient at temperatures closer than 0.01 Tc to the transition. From a comparison of the linear expansion coefficient with the specific heat in the critical region, the stress dependence of the Curie temperature is calculated. We find that the temperature derivative of the spin-correlation function describing nearest neighbor magnetic ions is not proportional to the temperature derivative of the spin-correlation function describing next nearest neighbors. Furthermore, the exchange parameters characterizing nearest and next nearest neighbor interactions do not have equal stress dependences. Between 1.5 and 2.5° K the thermal expansion coefficient is proportional to the inverse square of the temperature. Comparison of the expansion coefficient with the specific heat in this temperature range indicates that the temperature derivative of both spin-correlation functions is proportional to T^−2. The stress dependence of the Curie temperature calculated from data in this region agrees within experimental error with the value found from different considerations using data in the critical region.