
k-nearest neighbors algorithm

About: The k-nearest neighbors algorithm is a research topic. Over its lifetime, 10,728 publications have been published within this topic, receiving 319,394 citations. The topic is also known as: k-nearest neighbor algorithm & k-NN.


Open access - Journal Article - DOI: 10.1109/TIT.1967.1053964
Thomas M. Cover, Peter E. Hart - Institutions (2)
Abstract: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R*, the minimum probability of error over all decision rules taking the underlying probability structure into account. However, in a large-sample analysis, we show in the M-category case that R* ≤ R ≤ R*(2 − MR*/(M−1)), where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.

10,453 Citations
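The nearest neighbor decision rule in this abstract is easy to sketch in plain Python. The toy data, the function name `knn_predict`, and the choice of Euclidean distance are our own illustration, not from the paper:

```python
import math
from collections import Counter

def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training
    points. `train` is a list of (point, label) pairs; distances are
    Euclidean. k=1 gives the nearest neighbor rule analyzed by the paper."""
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated classes
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(knn_predict(train, (0.2, 0.1), k=1))  # -> a
print(knn_predict(train, (0.8, 0.9), k=3))  # -> b
```

The paper's bound says that, asymptotically, this simple rule's error rate is at most twice that of the best possible classifier for the same distribution.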

Open access - Journal Article - DOI: 10.1109/TAC.2003.812781
Ali Jadbabaie, Jie Lin, A.S. Morse - Institutions (1)
Abstract: In a recent Physical Review Letters article, Vicsek et al. propose a simple but compelling discrete-time model of n autonomous agents (i.e., points or particles) all moving in the plane with the same speed but with different headings. Each agent's heading is updated using a local rule based on the average of its own heading plus the headings of its "neighbors." In their paper, Vicsek et al. provide simulation results which demonstrate that the nearest neighbor rule they are studying can cause all agents to eventually move in the same direction despite the absence of centralized coordination and despite the fact that each agent's set of nearest neighbors changes with time as the system evolves. This paper provides a theoretical explanation for this observed behavior. In addition, convergence results are derived for several other similarly inspired models. The Vicsek model proves to be a graphic example of a switched linear system which is stable, but for which there does not exist a common quadratic Lyapunov function.

7,860 Citations
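The heading-update rule the abstract describes can be sketched as a single synchronous step of the (noise-free) Vicsek model. Parameter values and the function name `vicsek_step` are illustrative assumptions, not taken from either paper:

```python
import math

def vicsek_step(pos, heading, speed=0.03, radius=0.5, box=5.0):
    """One synchronous update of a noise-free Vicsek-style model: each
    agent adopts the average heading of all agents within `radius`
    (itself included), then moves forward at constant `speed` inside a
    periodic box of side `box`."""
    n = len(pos)
    new_heading = []
    for i in range(n):
        # Average headings via unit vectors to handle angle wrap-around.
        sx = sy = 0.0
        for j in range(n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            if dx * dx + dy * dy <= radius * radius:
                sx += math.cos(heading[j])
                sy += math.sin(heading[j])
        new_heading.append(math.atan2(sy, sx))
    new_pos = [((x + speed * math.cos(h)) % box,
                (y + speed * math.sin(h)) % box)
               for (x, y), h in zip(pos, new_heading)]
    return new_pos, new_heading
```

For example, two nearby agents with headings 0 and π/2 both adopt heading π/4 after one step; the convergence result of Jadbabaie et al. says that, under suitable connectivity assumptions, repeated steps drive all headings to a common value.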

Open access - Journal Article - DOI: 10.1162/089976698300017467
01 Jul 1998 - Neural Computation
Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map (for instance, the space of all possible five-pixel products in 16 × 16 images). We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.

Topics: Kernel principal component analysis (69%), Kernel method (63%), Polynomial kernel (63%)

7,611 Citations
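The method's core computation (eigendecomposing a centered kernel matrix instead of a covariance matrix) can be sketched in a few lines of NumPy. This is a minimal illustration using an RBF kernel; the paper's experiments use polynomial kernels, and `kernel_pca` and its parameters are our own names:

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA sketch with an RBF kernel: project the training points
    onto the top principal components of the (implicit) feature space."""
    # Pairwise squared Euclidean distances -> RBF kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * d2)
    # Center the kernel matrix in feature space.
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecompose and keep the largest eigenpairs.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projection of training point i onto component k is sqrt(val_k) * vec_k[i].
    return vecs * np.sqrt(np.clip(vals, 1e-12, None))

X = np.random.default_rng(0).normal(size=(20, 3))
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # -> (20, 2)
```

The point of the kernel trick here is that the feature space is never materialized; only the n × n kernel matrix is ever formed.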

Open access - Proceedings Article
08 Jul 1997
Abstract: This paper is a comparative study of feature selection methods in statistical learning of text categorization, with a focus on aggressive dimensionality reduction. Five methods were evaluated: term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest-neighbor classifier on the Reuters corpus, aggressive removal of unique terms actually yielded an improved classification accuracy (measured by average precision). DF thresholding performed similarly. Indeed, we found strong correlations between the DF, IG, and CHI values of a term. This suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when the computation of these measures is too expensive. TS compares favorably with the other methods at moderate vocabulary reduction but is not competitive at higher vocabulary-reduction levels. In contrast, MI had relatively poor performance due to its bias toward favoring rare terms and its sensitivity to probability-estimation errors.

5,276 Citations
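Of the five methods compared, DF thresholding is the simplest to sketch: keep only terms that occur in at least a minimum number of documents. The toy corpus and the function name `df_threshold` are illustrative, not from the paper:

```python
from collections import Counter

def df_threshold(docs, min_df=2):
    """Document-frequency thresholding: keep only terms that occur in at
    least `min_df` documents. `docs` is a list of token lists; returns
    the filtered documents and the surviving vocabulary."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vocab = {t for t, c in df.items() if c >= min_df}
    return [[t for t in doc if t in vocab] for doc in docs], vocab

docs = [["cat", "sat", "mat"], ["cat", "ran"], ["dog", "ran", "fast"]]
filtered, vocab = df_threshold(docs, min_df=2)
print(sorted(vocab))  # -> ['cat', 'ran']
```

The paper's finding is that this cheap criterion correlates strongly with the more expensive IG and CHI scores, so it can often substitute for them.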

Open access - Journal Article - DOI: 10.1023/A:1022689900470
03 Jan 1991 - Machine Learning
Abstract: Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.

Topics: Instance-based learning (72%), Lazy learning (60%), Decision tree learning (56%)

4,492 Citations
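The storage-reduction idea the abstract describes can be sketched as: process instances one at a time and store an instance only when the current memory misclassifies it. This is a minimal sketch in the spirit of the paper's storage-reducing variant, not its exact algorithm; `ib2_fit` and the toy stream are our own:

```python
import math

def ib2_fit(stream):
    """Storage-reducing instance-based learner (illustrative sketch):
    keep an instance only if the 1-nearest-neighbor rule over the
    current memory would misclassify it."""
    memory = []
    for point, label in stream:
        if not memory:
            memory.append((point, label))
            continue
        nearest = min(memory, key=lambda pl: math.dist(pl[0], point))
        if nearest[1] != label:  # misclassified -> worth storing
            memory.append((point, label))
    return memory

stream = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"),
          ((1.0, 1.0), "b"), ((0.9, 0.9), "b")]
memory = ib2_fit(stream)
print(len(memory))  # -> 2 (redundant interior points are not stored)
```

As the abstract notes, rules like this concentrate storage near class boundaries, which is exactly where attribute noise hurts them; hence the paper's extension with a significance test.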

[Chart: number of papers in the topic in previous years]

Top Attributes

Topic's top 5 most impactful authors

Jeng-Shyang Pan

18 papers, 209 citations

Francisco Herrera

17 papers, 972 citations

Yunjun Gao

13 papers, 266 citations

Piotr Indyk

13 papers, 7.7K citations

Ippei Torii

11 papers, 32 citations

Network Information

Related Topics (5)

Entropy (information theory): 23.2K papers, 472.2K citations (87% related)
40.6K papers, 905.2K citations (85% related)
Cluster analysis: 146.5K papers, 2.9M citations (85% related)
Dimensionality reduction: 21.9K papers, 579.2K citations (85% related)
Probability distribution: 40.9K papers, 1.1M citations (84% related)