
Showing papers on "k-nearest neighbors algorithm" published in 2003


Journal ArticleDOI
TL;DR: A theoretical explanation for the observed behavior of the Vicsek model, which proves to be a graphic example of a switched linear system which is stable, but for which there does not exist a common quadratic Lyapunov function.
Abstract: In a recent Physical Review Letters article, Vicsek et al. propose a simple but compelling discrete-time model of n autonomous agents (i.e., points or particles) all moving in the plane with the same speed but with different headings. Each agent's heading is updated using a local rule based on the average of its own heading plus the headings of its "neighbors." In their paper, Vicsek et al. provide simulation results which demonstrate that the nearest neighbor rule they are studying can cause all agents to eventually move in the same direction despite the absence of centralized coordination and despite the fact that each agent's set of nearest neighbors changes with time as the system evolves. This paper provides a theoretical explanation for this observed behavior. In addition, convergence results are derived for several other similarly inspired models. The Vicsek model proves to be a graphic example of a switched linear system which is stable, but for which there does not exist a common quadratic Lyapunov function.

8,233 citations
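
A minimal simulation sketch of the nearest neighbor rule described above. The radius, speed, agent count, and periodic box are illustrative choices rather than values from the paper, and the circular mean used below is one common way to average headings (the model analyzed in the paper averages heading angles directly).

```python
import numpy as np

def vicsek_step(pos, theta, r=1.0, v=0.03, box=5.0):
    """One synchronous update: each agent adopts the average heading
    of all agents within radius r (its 'neighbors', itself included)."""
    new_theta = np.empty(len(pos))
    for i in range(len(pos)):
        neighbors = np.linalg.norm(pos - pos[i], axis=1) <= r
        # Average via the angle of the mean unit vector, which avoids
        # wrap-around problems at +/- pi.
        new_theta[i] = np.arctan2(np.sin(theta[neighbors]).mean(),
                                  np.cos(theta[neighbors]).mean())
    vel = v * np.column_stack([np.cos(new_theta), np.sin(new_theta)])
    return (pos + vel) % box, new_theta  # periodic boundary

rng = np.random.default_rng(0)
pos = rng.uniform(0, 5.0, size=(50, 2))
theta = rng.uniform(-np.pi, np.pi, size=50)
for _ in range(200):
    pos, theta = vicsek_step(pos, theta)
print("heading spread after 200 steps:", theta.std())  # small spread = alignment
```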


Proceedings ArticleDOI
13 Oct 2003
TL;DR: A new algorithm is introduced that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task, and can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.
Abstract: Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends locality-sensitive hashing, a recently developed method to find approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call parameter-sensitive hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.

929 citations
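
The paper's contribution is to learn hash functions matched to the estimation task; the sketch below shows only the unlearned locality-sensitive hashing machinery it extends (random-hyperplane hashes over a single table), to illustrate how bucketed indexing avoids scanning the whole example database. Class and parameter names are illustrative.

```python
import numpy as np
from collections import defaultdict

class RandomHyperplaneLSH:
    """Hash key = sign pattern of projections onto random hyperplanes."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))
        self.table = defaultdict(list)

    def _key(self, x):
        return tuple((self.planes @ x > 0).astype(int))

    def index(self, data):
        self.data = np.asarray(data)
        for i, x in enumerate(self.data):
            self.table[self._key(x)].append(i)

    def query(self, q):
        # Only the bucket sharing q's sign pattern is searched;
        # fall back to brute force if that bucket is empty.
        cand = np.asarray(list(self.table.get(self._key(q),
                                              range(len(self.data)))))
        d = np.linalg.norm(self.data[cand] - q, axis=1)
        return int(cand[d.argmin()])
```

In practice several tables with independent hyperplanes are used to boost recall; parameter-sensitive hashing replaces the random hyperplanes with learned, task-relevant ones.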


Journal ArticleDOI
TL;DR: An empirical study is conducted to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.

846 citations


Proceedings Article
01 Jan 2003
TL;DR: A novel clustering technique in which a shared nearest neighbor definition of similarity addresses problems with varying densities and high dimensionality, while the use of core points handles problems with shape and size; a number of optimizations that allow the algorithm to handle large data sets are also discussed.
Abstract: Finding clusters in data, especially high dimensional data, is challenging when the clusters are of widely differing shapes, sizes, and densities, and when the data contains noise and outliers. We present a novel clustering technique that addresses these issues. Our algorithm first finds the nearest neighbors of each data point and then redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share. Using this definition of similarity, our algorithm identifies core points and then builds clusters around the core points. The use of a shared nearest neighbor definition of similarity alleviates problems with varying densities and high dimensionality, while the use of core points handles problems with shape and size. While our algorithm can find the "dense" clusters that other clustering algorithms find, it also finds clusters that these approaches overlook, i.e., clusters of low or medium density which represent relatively uniform regions "surrounded" by non-uniform or higher density areas. We experimentally show that our algorithm performs better than traditional methods (e.g., K-means, DBSCAN, CURE) on a variety of data sets: KDD Cup '99 network intrusion data, NASA Earth science time series data, two-dimensional point sets, and documents. The run-time complexity of our technique is O(n²) if the similarity matrix has to be constructed. However, we discuss a number of optimizations that allow the algorithm to handle large data sets efficiently.

715 citations
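
A bare-bones sketch of the shared nearest neighbor similarity the algorithm is built on: two points are similar to the extent that their k-nearest-neighbor lists overlap. The choice of k is illustrative, and the quadratic loop mirrors the O(n²) cost the abstract mentions for constructing the similarity matrix.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_similarity(X, k=10):
    """similarity[i, j] = number of k-nearest neighbors shared by i and j."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self-neighbor
    neighbor_sets = [set(row) for row in idx]
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(neighbor_sets[i] & neighbor_sets[j])
    return sim
```

Core points are then those with many high-SNN-similarity neighbors (high SNN density), and clusters are grown around them, as the abstract describes.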


Journal ArticleDOI
TL;DR: There is additional evidence that there exists no correlation between the values of q² for the training set and accuracy of prediction (R²) for the test set, and it is argued that this observation is a general property of any QSAR model developed with LOO cross-validation.
Abstract: Quantitative Structure–Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276 (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q² for the training set and the accuracy of prediction (R²) for the test set, and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.

591 citations
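
Since the contrast between LOO q² and external R² is the crux of the argument, here is a worked sketch of both statistics, with a generic kNN regressor standing in for the paper's kNN variable-selection QSAR models; function names and k are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsRegressor

def loo_q2(X, y, k=3):
    """q^2 = 1 - PRESS / SS over leave-one-out predictions (X, y: NumPy arrays)."""
    preds = np.empty(len(y))
    for train, test in LeaveOneOut().split(X):
        model = KNeighborsRegressor(n_neighbors=k).fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def external_r2(model, X_test, y_test):
    """R^2 of a trained model on a held-out, rationally selected test set."""
    resid = y_test - model.predict(X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_test.mean()) ** 2)
```

The paper's point is that a high loo_q2 on the training set says little about external_r2 on the test set.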


Journal ArticleDOI
TL;DR: The results of handwritten digit recognition on well-known image databases using state-of-the-art feature extraction and classification techniques are competitive with the best previously reported on the same databases.

545 citations


Proceedings ArticleDOI
09 Jun 2003
TL;DR: This work proposes a novel approach to performing efficient similarity search and classification in high dimensional data and proves that with high probability, it produces a result that is a (1 + ε) factor approximation to the Euclidean nearest neighbor.
Abstract: We propose a novel approach to performing efficient similarity search and classification in high dimensional data. In this framework, the database elements are vectors in a Euclidean space. Given a query vector in the same space, the goal is to find elements of the database that are similar to the query. In our approach, a small number of independent "voters" rank the database elements based on similarity to the query. These rankings are then combined by a highly efficient aggregation algorithm. Our methodology leads both to techniques for computing approximate nearest neighbors and to a conceptually rich alternative to nearest neighbors. One instantiation of our methodology is as follows. Each voter projects all the vectors (database elements and the query) on a random line (different for each voter), and ranks the database elements based on the proximity of the projections to the projection of the query. The aggregation rule picks the database element that has the best median rank. This combination has several appealing features. On the theoretical side, we prove that with high probability, it produces a result that is a (1 + ε) factor approximation to the Euclidean nearest neighbor. On the practical side, it turns out to be extremely efficient, often exploring no more than 5% of the data to obtain very high-quality results. This method is also database-friendly, in that it accesses data primarily in a pre-defined order without random accesses, and, unlike other methods for approximate nearest neighbors, requires almost no extra storage. Also, we extend our approach to deal with the k nearest neighbors. We conduct two sets of experiments to evaluate the efficacy of our methods. Our experiments include two scenarios where nearest neighbors are typically employed---similarity search and classification problems. In both cases, we study the performance of our methods with respect to several evaluation criteria, and conclude that they are uniformly excellent, both in terms of quality of results and in terms of efficiency.

442 citations
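
A sketch of the instantiation described in the abstract, assuming its simplest form: every voter ranks the entire database by proximity along its own random projection line, and the element with the best median rank wins. The real algorithm consumes the rankings incrementally (which is what lets it touch only a few percent of the data) rather than materializing them as below.

```python
import numpy as np

def median_rank_nn(db, q, n_voters=20, seed=0):
    """db: (n, d) array of database vectors; q: (d,) query vector."""
    rng = np.random.default_rng(seed)
    n, d = db.shape
    ranks = np.empty((n_voters, n))
    for v in range(n_voters):
        line = rng.normal(size=d)                    # this voter's random line
        dist_on_line = np.abs(db @ line - q @ line)  # proximity of projections
        order = np.argsort(dist_on_line)
        ranks[v, order] = np.arange(n)               # rank of each element
    return int(np.argmin(np.median(ranks, axis=0)))  # best median rank
```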


Journal ArticleDOI
TL;DR: The experimental results indicate that the classification accuracy is increased significantly under parallel feature fusion and also demonstrate that the developed parallel fusion is more effective than the classical serial feature fusion.

418 citations


Journal ArticleDOI
TL;DR: An abstract framework for integrating multiple feature spaces in the k-means clustering algorithm is presented and the effectiveness of feature weighting in clustering on several different application domains is demonstrated.
Abstract: Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.

414 citations
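
A sketch of the convex combination at the heart of steps (ii)-(iv): each feature space contributes its own distortion, and the distortions are mixed with convex weights. Squared Euclidean distortion in every space and fixed weights are illustrative simplifications; the paper assigns per-space distortion measures and searches over the weighting.

```python
import numpy as np

def convex_kmeans_assign(spaces, centers, alpha):
    """spaces:  list of (n, d_m) arrays, one per feature space.
    centers: list of (k, d_m) arrays of cluster centers, one per space.
    alpha:   convex weights (non-negative, summing to 1), one per space."""
    n, k = spaces[0].shape[0], centers[0].shape[0]
    total = np.zeros((n, k))
    for X, C, a in zip(spaces, centers, alpha):
        # squared Euclidean distortion within this feature space
        total += a * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return total.argmin(axis=1)  # each object joins its cheapest cluster
```

The usual k-means alternation then follows: recompute per-space centers from the assignments, and (in the paper) pick the weighting whose clustering best trades within-cluster dispersion against between-cluster dispersion.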



Journal ArticleDOI
TL;DR: A feature extraction method is presented that uses an error estimation equation based on the Bhattacharyya distance: classification errors in the transformed feature space, estimated with this equation, serve as the criterion for feature extraction.

Journal ArticleDOI
TL;DR: The minimum classification error (MCE) training algorithm (originally proposed for optimizing classifiers) is investigated for feature extraction, and a generalized MCE (GMCE) training algorithm is proposed to remedy the shortcomings of the MCE training algorithm.

Journal ArticleDOI
TL;DR: In this article, the kth nearest neighbor distances between the n sample points, where k (< n − 1) is a fixed positive integer, are used to estimate the entropy of internal rotation in the methanol molecule and of diethyl ether.
Abstract: Motivated by problems in the molecular sciences, we introduce new nonparametric estimators of entropy which are based on the kth nearest neighbor distances between the n sample points, where k (< n – 1) is a fixed positive integer. These provide competing estimators to an estimator proposed by Kozachenko and Leonenko (1987), which is based on the first nearest neighbor distances of the sample points. These estimators are helpful in the evaluation of entropies of random vectors. We establish the asymptotic unbiasedness and consistency of the proposed estimators. For some standard distributions, we also investigate their performance for finite sample sizes using Monte Carlo simulations. The proposed estimators are applied to estimate the entropy of internal rotation in the methanol molecule, which can be characterized by a one-dimensional random vector, and of diethyl ether, which is described by a four-dimensional random vector.
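
A sketch of a kth nearest neighbor entropy estimator in the spirit of the paper. This is one common form of such an estimator; normalization conventions vary across the literature, so treat the constants as illustrative rather than as the paper's exact proposal.

```python
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def knn_entropy(X, k=3):
    """Entropy (nats) of a sample X of shape (n, d) from kth-NN distances."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    eps = dist[:, k]                                       # distance to kth neighbor
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log unit-ball volume
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps))

# Sanity check against a distribution with known entropy:
rng = np.random.default_rng(0)
print(knn_entropy(rng.normal(size=(5000, 1))))  # ~0.5*ln(2*pi*e) = 1.4189
```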

Proceedings ArticleDOI
19 Jun 2003
TL;DR: This paper compares two algorithms for Multiple Target Tracking, using the Global Nearest Neighbor (GNN) and Suboptimal Nearest Neighbor (SNN) approaches respectively, and results reveal that in some cases the GNN approach gives a better solution than the SNN approach.
Abstract: This paper compares two algorithms for Multiple Target Tracking (MTT), using the Global Nearest Neighbor (GNN) and Suboptimal Nearest Neighbor (SNN) approach respectively. For both algorithms the observations are divided into clusters to reduce computational effort. For each cluster the assignment problem is solved using the Munkres algorithm or according to SNN rules. Results reveal that in some cases the GNN approach gives a better solution than the SNN approach. The computational time needed to solve the assignment problem using the Munkres algorithm is studied, and the results show that it is suitable for real-time implementations.
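
A sketch of the GNN data association step, using SciPy's Hungarian/Munkres implementation to solve the per-cluster assignment problem. The Euclidean cost matrix is an illustrative choice; real trackers typically use gated statistical distances.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gnn_assign(tracks, observations):
    """tracks: (t, d) predicted track positions; observations: (o, d)
    measurements in the same cluster. Returns (track, observation) pairs."""
    cost = np.linalg.norm(tracks[:, None, :] - observations[None, :, :], axis=2)
    row, col = linear_sum_assignment(cost)  # Munkres: minimize total cost
    return list(zip(row, col))
```

The suboptimal SNN alternative would instead greedily pair each track with its nearest unassigned observation, which is cheaper but can miss the globally best assignment, matching the comparison in the abstract.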

Journal ArticleDOI
TL;DR: An embedding technique to transform a road network to a high dimensional space in order to utilize computationally simple Minkowski metrics for distance measurement is applied and the Chessboard distance metric (L∞) in the embedding space preserves the ordering of the distances between a point and its neighbors more precisely.
Abstract: A very important class of queries in GIS applications is the class of K-nearest neighbor queries. Most of the current studies on the K-nearest neighbor queries utilize spatial index structures and hence are based on the Euclidean distances between the points. In real-world road networks, however, the shortest distance between two points depends on the actual path connecting the points and cannot be computed accurately using one of the Minkowski metrics. Thus, the Euclidean distance may not properly approximate the real distance. In this paper, we apply an embedding technique to transform a road network to a high dimensional space in order to utilize computationally simple Minkowski metrics for distance measurement. Subsequently, we extend our approach to dynamically transform new points into the embedding space. Finally, we propose an efficient technique that can find the actual shortest path between two points in the original road network using only the embedding space. Our empirical experiments indicate that the Chessboard distance metric (L∞) in the embedding space preserves the ordering of the distances between a point and its neighbors more precisely as compared to the Euclidean distance in the original road network.
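
A sketch of the general idea, assuming a simple Lipschitz-style embedding: each node's coordinates are its network distances to a few reference nodes, and points are then compared with the Chessboard (L∞) metric in the embedding space. The paper's actual construction and its handling of newly inserted points are more elaborate; the reference-node choice here is illustrative.

```python
import networkx as nx
import numpy as np

def embed(G, nodes, refs):
    """Coordinates of each node = shortest-path distances to reference nodes."""
    coords = {v: [] for v in nodes}
    for r in refs:
        dist = nx.single_source_dijkstra_path_length(G, r, weight="weight")
        for v in nodes:
            coords[v].append(dist[v])
    return {v: np.array(c) for v, c in coords.items()}

def chessboard(u, v, coords):
    """L-infinity distance in the embedding space."""
    return float(np.max(np.abs(coords[u] - coords[v])))
```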

Posted Content
TL;DR: An improved kNN algorithm is proposed, which uses different numbers of nearest neighbors for different categories, rather than a fixed number across all categories, and is promising for cases where estimating the parameter k via cross-validation is not allowed.
Abstract: k is the most important parameter in a text categorization system based on the k-Nearest Neighbor algorithm (kNN). In the classification process, the k nearest documents to the test one in the training set are determined first. Then, the prediction can be made according to the category distribution among these k nearest neighbors. Generally speaking, the class distribution in the training set is uneven; some classes may have more samples than others. Therefore, the system performance is very sensitive to the choice of the parameter k, and it is very likely that a fixed k value will result in a bias toward large categories. To deal with these problems, we propose an improved kNN algorithm, which uses different numbers of nearest neighbors for different categories, rather than a fixed number across all categories. More samples (nearest neighbors) will be used for deciding whether a test document should be classified to a category which has more samples in the training set. Preliminary experiments on Chinese text categorization show that our method is less sensitive to the parameter k than the traditional one, and that it can properly classify documents belonging to smaller classes with a large k. The method is promising for cases where estimating the parameter k via cross-validation is not allowed.
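
A sketch of the core idea: give each category its own neighbor quota, larger for better-represented categories, instead of one fixed k. The proportional quota and the per-category score below are illustrative choices, not the paper's exact formulas.

```python
from collections import Counter

def adaptive_knn_predict(neighbor_labels, train_label_counts, k_max=50):
    """neighbor_labels: labels of the k_max nearest training documents,
    closest first. train_label_counts: {category: #training samples}."""
    n_train = sum(train_label_counts.values())
    scores = {}
    for c, n_c in train_label_counts.items():
        k_c = max(1, round(k_max * n_c / n_train))  # per-category quota
        # score = share of category c among its own k_c nearest neighbors
        scores[c] = Counter(neighbor_labels[:k_c])[c] / k_c
    return max(scores, key=scores.get)
```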

Journal ArticleDOI
TL;DR: It is observed that when the authors seek a linear representation adapted to improve NN performance, what they obtain is, not surprisingly, quite close to NDA.

Journal ArticleDOI
TL;DR: In this article, the Fierz-Pauli Lagrangian model with multiple interacting massive gravitons is considered, and it is shown that any model with only nearest neighbor interactions is doomed.
Abstract: It would be extremely useful to know whether a particular low energy effective theory might have come from a compactification of a higher dimensional space. Here, this problem is approached from the ground up by considering theories with multiple interacting massive gravitons. It is actually very difficult to construct discrete gravitational dimensions which have a local continuum limit. In fact, any model with only nearest neighbor interactions is doomed. If we could find a non-linear extension for the Fierz-Pauli Lagrangian for a graviton of mass $m_g$ which does not break down until the scale $\Lambda_2 = \sqrt{m_g M_{\mathrm{Pl}}}$, this could be used to construct a large class of models whose continuum limit is local in the extra dimension. But this is shown to be impossible: a theory with a single graviton must break down by $\Lambda_3 = (m_g^2 M_{\mathrm{Pl}})^{1/3}$. Next, we look at how the discretization prescribed by the truncation of the Kaluza-Klein tower of an honest extra dimension raises the scale of strong coupling. It dictates an intricate set of interactions among various fields which conspire to soften the strongest scattering amplitudes and allow for a local continuum limit, at least at the tree level. A number of candidate symmetries associated with locality in the discretized dimension are also discussed.

Book ChapterDOI
24 Jul 2003
TL;DR: The paper describes a fast system for appearance based image recognition that uses local invariant descriptors and efficient nearest neighbor search to overcome the drawbacks of most binary tree-like indexing techniques, namely the high complexity in high dimensional data sets and the boundary problem.
Abstract: The paper describes a fast system for appearance based image recognition. It uses local invariant descriptors and efficient nearest neighbor search. First, local affine invariant regions are found nested at multiscale intensity extrema. These regions are characterized by nine generalized color moment invariants. A novel method called HPAT (hyper-polyhedron with adaptive threshold) is introduced for efficient localization of the nearest neighbor in feature space. The invariants make the method robust against changing illumination and viewpoint. The locality helps to resolve occlusions. The proposed indexing method overcomes the drawbacks of most binary tree-like indexing techniques, namely the high complexity in high dimensional data sets and the boundary problem. The database representation is very compact and retrieval is close to real time on a standard PC. The performance of the proposed method is demonstrated on a public database containing 1005 images of urban scenes. Experiments with an image database containing objects are also presented.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed nearest-neighbor chain (NNC) based approach achieves improved accuracy in estimating document image skew angle and has the advantage of being language independent.

Patent
09 Jul 2003
TL;DR: In this paper, audio/visual data is classified into semantic classes such as News, Sports, Music video or the like by providing class models for each class and comparing input audio/visual data to the models.
Abstract: Audio/visual data is classified into semantic classes such as News, Sports, Music video or the like by providing class models for each class and comparing input audio/visual data to the models. The class models are generated by extracting feature vectors from training samples, and then subjecting the feature vectors to kernel discriminant analysis or principal component analysis to give discriminatory basis vectors. These vectors are then used to obtain further feature vectors of much lower dimension than the original feature vectors, which may then be used directly as a class model, or used to train a Gaussian Mixture Model or the like. During classification of unknown input data, the same feature extraction and analysis steps are performed to obtain the low-dimensional feature vectors, which are then fed into the previously created class models to identify the data genre.
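
A sketch of the described pipeline, assuming PCA as the dimensionality reduction step and one Gaussian mixture model per genre; extraction of the raw audio/visual feature vectors is taken as already done, and all parameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def train_genre_models(features_by_class, n_components=10, n_mix=4):
    """features_by_class: {genre: (n_i, d) array of training feature vectors}."""
    pca = PCA(n_components=n_components).fit(
        np.vstack(list(features_by_class.values())))
    models = {c: GaussianMixture(n_components=n_mix).fit(pca.transform(f))
              for c, f in features_by_class.items()}
    return pca, models

def classify(pca, models, feats):
    z = pca.transform(feats)  # same extraction/analysis steps as in training
    # pick the genre whose model assigns the clip the highest likelihood
    return max(models, key=lambda c: models[c].score(z))
```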

Proceedings ArticleDOI
20 May 2003
TL;DR: This paper adopts two techniques, a matrix conversion method for similarity measurement and an instance selection method, and presents an improved collaborative filtering algorithm based on them that shows satisfactory accuracy and performance.
Abstract: Collaborative filtering has been very successful in both research and applications such as information filtering and E-commerce. The k-Nearest Neighbor (KNN) method is a popular way of realizing it. Its key technique is to find the k nearest neighbors of a given user to predict his interests. However, this method suffers from two fundamental problems: sparsity and scalability. In this paper, we present our solutions to these two problems. We adopt two techniques: a matrix conversion method for similarity measurement and an instance selection method. We then present an improved collaborative filtering algorithm based on these two methods. In contrast with existing collaborative filtering algorithms, our method shows satisfactory accuracy and performance.
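
For reference, a sketch of the plain user-based KNN prediction such systems build on: predict a rating as a similarity-weighted average over the k most similar users who rated the item. The paper's matrix conversion and instance selection refinements are not reproduced here.

```python
import numpy as np

def predict_rating(ratings, user, item, k=5):
    """ratings: (n_users, n_items) array, 0 meaning 'unrated'."""
    rated = ratings[:, item] > 0
    rated[user] = False
    candidates = np.where(rated)[0]
    # cosine similarity between the target user and each candidate
    sims = np.array([
        ratings[user] @ ratings[v]
        / (np.linalg.norm(ratings[user]) * np.linalg.norm(ratings[v]) + 1e-12)
        for v in candidates
    ])
    order = np.argsort(sims)[-k:]  # the k most similar raters
    top, w = candidates[order], sims[order]
    return float(w @ ratings[top, item] / (w.sum() + 1e-12))
```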

Journal ArticleDOI
TL;DR: A new generalization of the rank nearest neighbor (RNN) rule for multivariate data for diagnosis of breast cancer is proposed; the computational complexity of the proposed k-RNN is much less than that of the conventional k-NN rule.

Proceedings ArticleDOI
27 Dec 2003
TL;DR: This paper proposes a simple yet highly accurate system for the recognition of unconstrained handwritten numerals and illustrates how the basic CL implementation can be extended and used in conjunction with a multilayer perceptron neural network classifier to increase the recognition rate to 98%.
Abstract: This paper proposes a simple yet highly accurate system for the recognition of unconstrained handwritten numerals. It starts with an examination of the basic characteristic loci (CL) features used along with a nearest neighbor classifier, achieving a recognition rate of 90.5%. We then illustrate how the basic CL implementation can be extended and used in conjunction with a multilayer perceptron neural network classifier to increase the recognition rate to 98%. This proposed recognition system was tested on a totally unconstrained handwritten numeral database while training it with only 600 samples exclusive from the test set. An accuracy exceeding 98% is also expected if a larger training set is used. Lastly, to demonstrate the effectiveness of the system, its performance is also compared to that of some other common recognition schemes. These systems use moment invariants as features along with nearest neighbor classification schemes.

Journal ArticleDOI
TL;DR: This work uses local support vector machine learning to estimate an effective metric for producing neighborhoods that are elongated along less discriminant feature dimensions and constricted along most discriminant ones, whereby better classification performance can be achieved.
Abstract: Nearest neighbor (NN) classification relies on the assumption that class conditional probabilities are locally constant. This assumption becomes false in high dimensions with finite samples due to the curse of dimensionality. The NN rule introduces severe bias under these conditions. We propose a locally adaptive neighborhood morphing classification method to try to minimize bias. We use local support vector machine learning to estimate an effective metric for producing neighborhoods that are elongated along less discriminant feature dimensions and constricted along most discriminant ones. As a result, the class conditional probabilities can be expected to be approximately constant in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other competing techniques using a number of datasets.
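
A sketch of the neighborhood-morphing idea under simplifying assumptions: fit a linear SVM in a generous region around the query, treat its weight magnitudes as local feature relevance, and run the final kNN vote under that weighted metric. The paper's actual metric construction differs, and this sketch assumes both classes appear among the local points.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import LinearSVC

def adaptive_knn_classify(X, y, query, k_local=50, k=5):
    # 1. collect a generous local region around the query
    idx = NearestNeighbors(n_neighbors=k_local).fit(X).kneighbors(
        query[None, :], return_distance=False)[0]
    # 2. local feature relevance from the linear SVM's weight magnitudes:
    #    large |w_j| marks a discriminant direction, so distances grow along it
    svm = LinearSVC(dual=False).fit(X[idx], y[idx])
    w = np.abs(svm.coef_).mean(axis=0) + 1e-6
    # 3. kNN vote under the locally weighted (morphed) metric
    d = np.sqrt((((X[idx] - query) * w) ** 2).sum(axis=1))
    nearest = idx[np.argsort(d)[:k]]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[counts.argmax()]
```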

Proceedings ArticleDOI
07 Nov 2003
TL;DR: This paper addresses the problem of finding the in-route nearest neighbor (IRNN) for a query object tuple which consists of a given route with a destination and a current location on it and addresses four alternative solution methods.
Abstract: Nearest neighbor query is one of the most important operations in spatial databases and their application domains, e.g., location-based services, advanced traveler information systems, etc. This paper addresses the problem of finding the in-route nearest neighbor (IRNN) for a query object tuple which consists of a given route with a destination and a current location on it. The IRNN is the facility instance via which the detour from the original route on the way to the destination is smallest. This paper presents four alternative solution methods. Comparisons among them are presented using an experimental framework. Several experiments using real road map datasets are conducted to examine the behavior of the solutions in terms of three parameters affecting the performance. Our experiments show that the computation costs for all methods except the precomputed zone-based method increase with increases in the road map size and the query route length but decrease with increases in the facility density. The precomputed zone-based method is the most efficient when there are no updates on the road map.
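
A heavily simplified sketch of the IRNN definition itself, not of the paper's four solution methods: a facility's detour is the extra travel incurred by leaving the route at some node, visiting the facility, and continuing to the destination. It assumes consecutive route nodes are adjacent edges in the graph; the repeated shortest-path queries are exactly what the paper's precomputation strategies aim to avoid.

```python
import networkx as nx

def in_route_nn(G, route, dest, facilities):
    """G: weighted road graph; route: node list ending at dest;
    facilities: candidate facility nodes. Returns the IRNN facility."""
    # remaining on-route distance from each route node to the destination
    remaining = {route[-1]: 0.0}
    acc = 0.0
    for a, b in zip(reversed(route[:-1]), reversed(route[1:])):
        acc += G[a][b]["weight"]
        remaining[a] = acc
    best, best_detour = None, float("inf")
    for f in facilities:
        to_dest = nx.shortest_path_length(G, f, dest, weight="weight")
        for v in route:  # candidate exit point from the route
            detour = (nx.shortest_path_length(G, v, f, weight="weight")
                      + to_dest - remaining[v])
            if detour < best_detour:
                best, best_detour = f, detour
    return best
```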

Proceedings ArticleDOI
18 Jun 2003
TL;DR: The linear programming technique used in this paper, which is called feature selection via linear programming (FSLP), can determine the number of features and which features to use in the resulting classification function based on recent results in optimization.
Abstract: A linear programming technique is introduced that jointly performs feature selection and classifier training so that a subset of features is optimally selected together with the classifier. Traditional classification methods in computer vision have used a two-step approach, feature selection followed by classifier training, so feature selection has often been ad hoc, relying on heuristics or a time-consuming forward and backward search process. Moreover, it is difficult to determine which features to use and how many features to use when these two steps are separated. The linear programming technique used in this paper, which we call feature selection via linear programming (FSLP), can determine the number of features and which features to use in the resulting classification function based on recent results in optimization. We analyze why FSLP can avoid the curse of dimensionality problem based on margin analysis. As one demonstration of the performance of this FSLP technique for computer vision tasks, we apply it to the problem of face expression recognition. Recognition accuracy is compared with results using support vector machines, the AdaBoost algorithm, and a Bayes classifier.
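
A sketch of a generic 1-norm support vector LP in the spirit of FSLP, not the paper's exact program: minimizing the l1 norm of the weight vector plus hinge slacks drives many weights to exactly zero, so which features to use (and how many) falls out of the same optimization that trains the classifier.

```python
import numpy as np
from scipy.optimize import linprog

def fslp_fit(X, y, C=1.0):
    """X: (n, p) features; y: labels in {-1, +1}. Returns (w, b);
    the selected features are those with w != 0."""
    n, p = X.shape
    # variables z = [w_pos (p), w_neg (p), b_pos, b_neg, xi (n)], all >= 0
    c = np.concatenate([np.ones(2 * p), [0.0, 0.0], C * np.ones(n)])
    # margin constraints y_i (w.x_i + b) + xi_i >= 1, rewritten as A z <= -1
    A = np.hstack([-(y[:, None] * X), y[:, None] * X,
                   -y[:, None], y[:, None], -np.eye(n)])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=(0, None), method="highs")
    w = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p] - res.x[2 * p + 1]
    return w, b
```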

Dissertation
01 Jan 2003
TL;DR: The embeddings are shown to be practical, with a series of large scale experiments which demonstrate that given only a small space, approximate solutions to several similarity and clustering problems can be found that are as good as or better than those found with prior methods.
Abstract: Sequences represent a large class of fundamental objects in Computer Science: sets, strings, vectors and permutations are all considered to be sequences. Distances between sequences measure their similarity, and computations based on distances are ubiquitous: either to compute the distance, or to use distance computation as part of a more complex problem. This thesis takes a very specific approach to solving questions of sequence distance: sequences are embedded into other distance measures, so that distance in the new space approximates the original distance. This allows the solution of a variety of problems including: fast computation of short sketches in a variety of computing models, which allow sequences to be compared in constant time and space irrespective of the size of the original sequences; approximate nearest neighbor and clustering problems, significantly faster than the naive exact solutions; algorithms to find approximate occurrences of pattern sequences in long text sequences in near linear time; and efficient communication schemes to approximate the distance between, and exchange, sequences in close to the optimal amount of communication. Solutions are given for these problems for a variety of distances, including fundamental distances on sets and vectors; distances inspired by biological problems for permutations; and certain text editing distances for strings. Many of these embeddings are computable in a streaming model where the data is too large to store in memory, and instead has to be processed as and when it arrives, piece by piece. The embeddings are also shown to be practical, with a series of large scale experiments which demonstrate that given only a small space, approximate solutions to several similarity and clustering problems can be found that are as good as or better than those found with prior methods.
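
A sketch of the flavor of embedding the thesis studies, in its simplest vector case: a small random-projection sketch (Johnson-Lindenstrauss style) lets two long vectors be compared using only their constant-size summaries, with Euclidean distance approximately preserved. The sketch size is an illustrative accuracy/space trade-off.

```python
import numpy as np

def make_sketcher(dim, sketch_size=64, seed=0):
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(sketch_size, dim)) / np.sqrt(sketch_size)
    return lambda x: R @ x  # constant-size summary of a length-dim vector

sketch = make_sketcher(dim=100_000)
rng = np.random.default_rng(1)
a, b = rng.normal(size=100_000), rng.normal(size=100_000)
est = np.linalg.norm(sketch(a) - sketch(b))   # compares 64 numbers...
true = np.linalg.norm(a - b)                  # ...instead of 100,000
print(f"estimated distance {est:.1f} vs true {true:.1f}")
```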

Book ChapterDOI
04 Jun 2003
TL;DR: This paper investigates its extension, called supervised locally linear embedding (SLLE), using class labels of data points in their mapping into a low-dimensional space, and derives an efficient eigendecomposition scheme for SLLE.
Abstract: The dimensionality of the input data often far exceeds their intrinsic dimensionality. As a result, it may be difficult to recognize multidimensional data, especially if the number of samples in a dataset is not large. In addition, the more dimensions the data have, the longer the recognition time is. This leads to the necessity of performing dimensionality reduction before pattern recognition. Locally linear embedding (LLE) is one of the methods intended for this task. In this paper, we investigate its extension, called supervised locally linear embedding (SLLE), which uses class labels of data points in their mapping into a low-dimensional space. An efficient eigendecomposition scheme for SLLE is derived. Two variants of SLLE are analyzed, coupled with a k nearest neighbor classifier and tested on real-world images. Preliminary results demonstrate that both variants yield identical best accuracy, despite being conceptually different.
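
A sketch of the supervision step that distinguishes SLLE from plain LLE, under the usual formulation: inter-class distances are inflated before the neighborhood search, biasing each point's neighbors toward its own class, and the modified distances then drive standard LLE. Here alpha = 1 corresponds to a fully supervised variant and 0 < alpha < 1 to a partially supervised one; the value used is illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def slle_distances(X, labels, alpha=0.5):
    """X: (n, d) data; labels: (n,) class labels. Returns the distance
    matrix whose k smallest entries per row define SLLE neighborhoods."""
    D = squareform(pdist(X))                  # pairwise Euclidean distances
    different = labels[:, None] != labels[None, :]
    return D + alpha * D.max() * different    # push other-class points away
```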

Proceedings Article
09 Aug 2003
TL;DR: A parametric method for metric learning based on class label information is proposed, which performs parametric learning to find a regression mapping from the input space to a feature space, such that the dissimilarity between patterns in the input space is approximated by the Euclidean distance between points in the feature space.
Abstract: Distance-based methods in pattern recognition and machine learning have to rely on a similarity or dissimilarity measure between patterns in the input space. For many applications, Euclidean distance in the input space is not a good choice and hence more complicated distance metrics have to be used. In this paper, we propose a parametric method for metric learning based on class label information. We first define a dissimilarity measure that can be proved to be metric. It has the favorable property that between-class dissimilarity is always larger than within-class dissimilarity. We then perform parametric learning to find a regression mapping from the input space to a feature space, such that the dissimilarity between patterns in the input space is approximated by the Euclidean distance between points in the feature space. Parametric learning is performed using the iterative majorization algorithm. Experimental results on real-world benchmark data sets show that this approach is promising.
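
A sketch of the paper's goal rather than its solver: learn a map (here linear, trained by plain gradient descent instead of iterative majorization) so that Euclidean distances after the map approximate a target dissimilarity matrix built from class labels, e.g. a small constant for within-class pairs and a larger one for between-class pairs. All names and parameters are illustrative.

```python
import numpy as np

def learn_metric_map(X, delta, out_dim=2, lr=0.01, steps=500, seed=0):
    """X: (n, d) inputs; delta: (n, n) target dissimilarities (zero diagonal).
    Returns L such that ||(x_i - x_j) @ L|| ~ delta[i, j]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = rng.normal(scale=0.1, size=(d, out_dim))
    for _ in range(steps):
        Z = X @ L
        diff = Z[:, None, :] - Z[None, :, :]        # pairwise differences
        dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-9)
        err = dist - delta                          # stress residuals
        # gradient of 0.5 * sum_ij (dist_ij - delta_ij)^2 with respect to L
        G = (err / dist)[:, :, None] * diff
        L -= lr * np.einsum('id,ijo->do', X, G) / n
    return L
```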