
Showing papers on "k-nearest neighbors algorithm published in 2009"


Journal ArticleDOI
TL;DR: This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples and finds that metrics trained in this way lead to significant improvements in kNN classification; local metrics can further be learned and combined in a globally integrated manner.
Abstract: The accuracy of k-nearest neighbor (kNN) classification depends significantly on the metric used to compute distances between different examples. In this paper, we show how to learn a Mahalanobis distance metric for kNN classification from labeled examples. The Mahalanobis metric can equivalently be viewed as a global linear transformation of the input space that precedes kNN classification using Euclidean distances. In our approach, the metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. As in support vector machines (SVMs), the margin criterion leads to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our approach requires no modification or extension for problems in multiway (as opposed to binary) classification. In our framework, the Mahalanobis distance metric is obtained as the solution to a semidefinite program. On several data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification. Sometimes these results can be further improved by clustering the training examples and learning an individual metric within each cluster. We show how to learn and combine these local metrics in a globally integrated manner.
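Since the Mahalanobis metric is equivalent to Euclidean distance after a global linear map L, the resulting classifier is easy to sketch. A minimal Python illustration, with L assumed to be given (in the paper it is obtained from a semidefinite program, so any concrete L here is purely illustrative):

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, X_test, L, k=3):
    """kNN under d(x, z) = ||L(x - z)||_2, i.e., plain Euclidean kNN
    after transforming all inputs by the global linear map L."""
    Xt, Xq = X_train @ L.T, X_test @ L.T         # map both sets into L-space
    preds = []
    for q in Xq:
        d = np.linalg.norm(Xt - q, axis=1)       # Euclidean distances in L-space
        nn = np.argsort(d)[:k]                   # indices of the k nearest neighbors
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote
    return np.array(preds)
```

Setting L to the identity recovers ordinary Euclidean kNN; the paper's contribution is learning an L that pulls each point's target neighbors together while pushing differently labeled points a margin away.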

4,157 citations


Journal ArticleDOI

1,155 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: A new time series primitive, the time series shapelet, is introduced; algorithms based on it can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.
Abstract: Classification of time series has been attracting great interest over the past decade. Recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm is very difficult to beat for most time series problems. While this may be considered good news, given the simplicity of implementing the nearest neighbor algorithm, there are some negative consequences of this. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in a time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. As we shall show with extensive empirical evaluations in diverse domains, algorithms based on the time series shapelet primitives can be interpretable, more accurate and significantly faster than state-of-the-art classifiers.
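The core computation is the distance from a series to a candidate shapelet: the minimum Euclidean distance over all sliding windows of the shapelet's length. A hedged sketch of brute-force shapelet search over binary-labeled series, with a simple mean-gap separation score standing in for the paper's information-gain criterion (and none of its pruning speedups):

```python
import numpy as np

def subsequence_dist(series, shapelet):
    """Distance from a series to a shapelet: the minimum Euclidean
    distance over all sliding windows of the shapelet's length."""
    m = len(shapelet)
    return min(np.linalg.norm(series[i:i + m] - shapelet)
               for i in range(len(series) - m + 1))

def best_shapelet(series_list, labels, length):
    """Brute-force search over all subsequences of a given length.
    labels is a binary (0/1) array; the gap between the two classes'
    mean shapelet distances is an illustrative stand-in for the
    paper's information-gain quality measure."""
    labels = np.asarray(labels)
    best, best_score = None, -np.inf
    for s in series_list:
        for i in range(len(s) - length + 1):
            cand = s[i:i + length]
            d = np.array([subsequence_dist(t, cand) for t in series_list])
            gap = abs(d[labels == 0].mean() - d[labels == 1].mean())
            if gap > best_score:
                best, best_score = cand, gap
    return best
```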

930 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work proposes TagProp, a discriminatively trained nearest neighbor model that allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set, and introduces a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words.
Abstract: Image auto-annotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearest-neighbor model to exploit labeled training images. Neighbor weights are based on neighbor rank or distance. TagProp allows the integration of metric learning by directly maximizing the log-likelihood of the tag predictions in the training set. In this manner, we can optimally combine a collection of image similarity metrics that cover different aspects of image content, such as local shape descriptors, or global color histograms. We also introduce a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words. We investigate the performance of different variants of our model and compare to existing work. We present experimental results for three challenging data sets. On all three, TagProp makes a marked improvement as compared to the current state-of-the-art.
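The prediction step of a rank-weighted nearest-neighbor tag model is compact. A sketch with illustrative geometric rank weights; TagProp itself learns the weights (and the combined metric) by maximizing the training log-likelihood, and the word-specific sigmoidal modulation is omitted here:

```python
import numpy as np

def tagprop_predict(dist_to_train, train_tags, K=20, decay=0.9):
    """Rank-based weighted nearest-neighbor tag prediction.
    dist_to_train: (n_test, n_train) distances under some image metric.
    train_tags:    (n_train, n_tags) binary tag indicator matrix.
    The geometric rank weights are illustrative only."""
    ranks = np.argsort(dist_to_train, axis=1)[:, :K]  # K nearest per test image
    w = decay ** np.arange(K)
    w /= w.sum()                                      # normalized rank weights
    # predicted tag scores: weighted vote over the neighbors' tags
    return np.einsum('k,ikt->it', w, train_tags[ranks])  # (n_test, n_tags)
```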

739 citations


Journal ArticleDOI
TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi-label learning algorithms.

433 citations


Journal ArticleDOI
TL;DR: It is shown that the speed of convergence of the k-NN method can be further improved by an adaptive choice of k, and the new universal estimator of divergence is proved to be asymptotically unbiased and mean-square consistent.
Abstract: A new universal estimator of divergence is presented for multidimensional continuous densities based on k-nearest-neighbor (k-NN) distances. Assuming independent and identically distributed (i.i.d.) samples, the new estimator is proved to be asymptotically unbiased and mean-square consistent. In experiments with high-dimensional data, the k-NN approach generally exhibits faster convergence than previous algorithms. It is also shown that the speed of convergence of the k-NN method can be further improved by an adaptive choice of k.
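A sketch of a k-NN divergence estimator of this flavor, in the commonly cited form D_hat(P||Q) = (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)), where rho_k is the k-th nearest-neighbor distance within the sample from P and nu_k is the k-th nearest-neighbor distance across to the sample from Q; the paper's exact construction and its adaptive choice of k are not reproduced:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_divergence(X, Y, k=1):
    """k-NN estimate of D(P || Q) from X ~ P (n x d) and Y ~ Q (m x d)."""
    n, d = X.shape
    m = len(Y)
    rho = cKDTree(X).query(X, k + 1)[0][:, -1]  # k-th NN in X, skipping the self-match
    nu = cKDTree(Y).query(X, k)[0]
    if k > 1:
        nu = nu[:, -1]                          # query returns a 1-D array when k == 1
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```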

372 citations


Journal ArticleDOI
TL;DR: There is a single approach to nearest neighbor searching, which both improves upon existing results and spans the spectrum of space-time tradeoffs, and new algorithms for constructing AVDs and tools for analyzing their total space requirements are provided.
Abstract: Nearest neighbor searching is the problem of preprocessing a set of n points in d-dimensional space so that, given any query point q, it is possible to report the closest point to q rapidly. In approximate nearest neighbor searching, a parameter e > 0 is given, and a multiplicative error of (1 + e) is allowed. We assume that the dimension d is a constant and treat n and e as asymptotic quantities. Numerous solutions have been proposed, ranging from low-space solutions having space O(n) and query time O(log n + 1/e^(d−1)) to high-space solutions having space roughly O((n log n)/e^d) and query time O(log(n/e)). We show that there is a single approach to this fundamental problem, which both improves upon existing results and spans the spectrum of space-time tradeoffs. Given a tradeoff parameter γ, where 2 ≤ γ ≤ 1/e, we show that there exists a data structure of space O(nγ^(d−1) log(1/e)) that can answer queries in time O(log(nγ) + 1/(eγ)^((d−1)/2)). When γ = 2, this yields a data structure of space O(n log(1/e)) that can answer queries in time O(log n + 1/e^((d−1)/2)). When γ = 1/e, it provides a data structure of space O((n/e^(d−1)) log(1/e)) that can answer queries in time O(log(n/e)). Our results are based on a data structure called a (t,e)-AVD, which is a hierarchical quadtree-based subdivision of space into cells. Each cell stores up to t representative points of the set, such that for any query point q in the cell at least one of these points is an approximate nearest neighbor of q. We provide new algorithms for constructing AVDs and tools for analyzing their total space requirements. We also establish lower bounds on the space complexity of AVDs, and show that, up to a factor of O(log(1/e)), our space bounds are asymptotically tight in the two extremes, γ = 2 and γ = 1/e.

266 citations


Journal Article
TL;DR: Two divide and conquer methods are proposed for computing an approximate kNN graph in Θ(dn^t) time for high dimensional data (large d), with an additional refinement step performed after each conquer step to improve the accuracy of the graph.
Abstract: Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
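For scale, the Θ(dn^2) brute-force construction that the divide and conquer methods improve on fits in a few lines; everything the paper adds (Lanczos-based spectral bisection, the overlap parameter α, refinement, and hashing) is about avoiding this full distance matrix:

```python
import numpy as np

def knn_graph_brute_force(X, k):
    """Exact kNN graph in Theta(d n^2) time: compute all pairwise
    squared Euclidean distances and keep each point's k nearest
    neighbors as its adjacency list."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared distance matrix
    np.fill_diagonal(d2, np.inf)                  # exclude self-edges
    return np.argsort(d2, axis=1)[:, :k]          # (n, k) neighbor indices
```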

253 citations


Proceedings ArticleDOI
29 Jun 2009
TL;DR: This work proposes a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality and reduces its space and query cost dramatically, and outperforms adhoc-LSH even though the latter has no quality guarantee.
Abstract: Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow sub-linearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements, except locality sensitive hashing (LSH). The existing LSH implementations are either rigorous or adhoc. Rigorous-LSH ensures good quality of query results, but requires expensive space and query cost. Although adhoc-LSH is more efficient, it abandons quality control, i.e., the neighbor it outputs can be arbitrarily bad. As a result, currently no method is able to ensure both quality and efficiency simultaneously in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality. The combination of several LSB-trees leads to a structure called the LSB-forest that ensures the same result quality as rigorous-LSH, but reduces its space and query cost dramatically. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself also serves as an effective index that consumes linear space, and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art of exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, and at the same time, returns neighbors with much better quality.
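The LSB-tree builds on the standard locality sensitive hash family for Euclidean space, h(x) = floor((a·x + b)/w) with Gaussian a and uniform b; several such hashes are composed into a single Z-order key that an ordinary B-tree can index. A sketch of the hash family alone (the Z-order encoding and the tree machinery are omitted, and the parameter values are illustrative):

```python
import numpy as np

def e2lsh_keys(X, n_hashes=8, w=4.0, seed=0):
    """Locality sensitive hashes for Euclidean space:
    h(x) = floor((a . x + b) / w), a ~ N(0, I), b ~ U[0, w).
    Nearby points agree on many coordinates with high probability."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_hashes, X.shape[1]))
    b = rng.uniform(0, w, size=n_hashes)
    return np.floor((X @ a.T + b) / w).astype(int)  # (n, n_hashes) integer keys
```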

236 citations


Journal ArticleDOI
TL;DR: In this paper, a two-stage feature selection and weighting technique (TFSWT) via Euclidean distance evaluation technique (EDET) is presented and adopted to select sensitive features and remove fault-unrelated features.

224 citations


Journal ArticleDOI
TL;DR: An algorithm for land mine detection using sensor data generated by a ground-penetrating radar (GPR) system that uses edge histogram descriptors for feature extraction and a possibilistic K-nearest neighbors (K-NNs) rule for confidence assignment is described, demonstrating the best performance among several high-performance algorithms.
Abstract: This paper describes an algorithm for land mine detection using sensor data generated by a ground-penetrating radar (GPR) system that uses edge histogram descriptors for feature extraction and a possibilistic K-nearest neighbors (K-NNs) rule for confidence assignment. The algorithm demonstrated the best performance among several high-performance algorithms in extensive testing on a large real-world dataset associated with the difficult problem of land mine detection. The superior performance of the algorithm is attributed to the use of the possibilistic K-NN algorithm, thereby providing important evidence supporting the use of possibilistic methods in real-world applications. The GPR produces a 3-D array of intensity values, representing a volume below the surface of the ground. First, a computationally inexpensive prescreening algorithm for anomaly detection is used to focus attention and identify candidate signatures that resemble mines. The identified regions of interest are processed further by a feature extraction algorithm to capture their salient features. We use translation-invariant features that are based on the local edge distribution of the 3-D GPR signatures. Specifically, each 3-D signature is divided into subsignatures, and the local edge distribution for each subsignature is represented by a histogram. Next, the training signatures are clustered to identify prototypes. The main idea is to identify a few prototypes that can capture the variations of the signatures within each class. These variations could be due to different mine types, different soil conditions, different weather conditions, etc. Fuzzy memberships are assigned to these representatives to capture their degree of sharing among the mine and false alarm classes. Finally, a possibilistic K-NN-based rule is used to assign a confidence value to distinguish true detections from false alarms. The proposed algorithm is implemented and integrated within a complete land mine prototype system. It is trained, field-tested, evaluated, and compared using a large-scale cross-validation experiment that uses a diverse dataset acquired from four outdoor test sites at different geographic locations. This collection covers over 41 807 m² of ground and includes 1593 mine encounters.
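A sketch of distance-weighted confidence assignment in the spirit of fuzzy/possibilistic K-NN, using Keller-style inverse-distance weights over prototype memberships; the paper's possibilistic rule differs in detail (notably, it does not force confidences to sum to one across classes):

```python
import numpy as np

def weighted_knn_confidence(d, u, K=5, m=2.0):
    """Distance-weighted class confidences from the K nearest labeled
    prototypes, in the spirit of fuzzy/possibilistic K-NN.
    d: (n,) distances from a test signature to the prototypes.
    u: (n, n_classes) fuzzy memberships of the prototypes.
    Returns one confidence per class (e.g., mine vs. false alarm)."""
    nn = np.argsort(d)[:K]                                   # K nearest prototypes
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))  # inverse-distance weights
    return (w[:, None] * u[nn]).sum(axis=0) / w.sum()
```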

Journal ArticleDOI
TL;DR: An accurate multi-class taxonomic classifier, TACOA, was developed for environmental genomic fragments; it can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.
Abstract: Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomics data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning.

Journal ArticleDOI
TL;DR: The proposed feature selection method, KFFS, produced very promising results compared to F-score feature selection, removing irrelevant or redundant features from the high dimensional input feature space.
Abstract: In this paper, we have proposed a new feature selection method called kernel F-score feature selection (KFFS), used as a pre-processing step in the classification of medical datasets. KFFS consists of two phases. In the first phase, the input spaces (features) of medical datasets are transformed to kernel space by means of Linear (Lin) or Radial Basis Function (RBF) kernel functions, lifting the medical datasets to a high dimensional feature space. In the second phase, the F-score values of the features in this high dimensional space are calculated using the F-score formula, and the mean of these F-scores is computed. If the F-score of a feature is bigger than this mean value, that feature is selected; otherwise, it is removed from the feature space. Thanks to the KFFS method, irrelevant or redundant features are removed from the high dimensional input feature space. The purpose of using kernel functions is to transform a non-linearly separable medical dataset into a linearly separable feature space. In this study, we have used the heart disease dataset, the SPECT (Single Photon Emission Computed Tomography) images dataset, and the Escherichia coli Promoter Gene Sequence dataset taken from the UCI (University of California, Irvine) machine learning database to test the performance of the KFFS method. As classification algorithms, Least Square Support Vector Machine (LS-SVM) and Levenberg-Marquardt Artificial Neural Network have been used. As shown in the obtained results, the proposed feature selection method, KFFS, produced very promising results compared to F-score feature selection.
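The second phase of KFFS reduces to computing a per-feature F-score and keeping the features that score above the mean. A sketch for a binary problem (labels in {0, 1}), applied to a feature matrix assumed to be already kernel-transformed by the first phase:

```python
import numpy as np

def f_scores(X, y):
    """F-score of each feature: between-class separation of the class
    means over the pooled within-class variance (binary y in {0, 1})."""
    Xp, Xn = X[y == 1], X[y == 0]
    num = (Xp.mean(0) - X.mean(0)) ** 2 + (Xn.mean(0) - X.mean(0)) ** 2
    den = Xp.var(0, ddof=1) + Xn.var(0, ddof=1)
    return num / np.maximum(den, 1e-12)

def kffs_select(X_kernel, y):
    """Keep the features whose F-score exceeds the mean F-score."""
    f = f_scores(X_kernel, y)
    return np.where(f > f.mean())[0]   # indices of selected features
```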

Journal ArticleDOI
TL;DR: For a range of phenylene- and thiophene-based conjugated polymers of practical relevance for optoelectronic applications, exciton couplings in one-dimensional stacks deviate significantly from the nearest neighbor approximation, so long-range interactions with non-nearest neighbors have to be included.
Abstract: We demonstrate that for a range of phenylene- and thiophene-based conjugated polymers of practical relevance for optoelectronic applications, exciton couplings in one-dimensional stacks deviate significantly from the nearest neighbor approximation. Instead, long-range interactions with non-nearest neighbors have to be included, which become increasingly important with growing oligomer size. While the exciton coupling vanishes for infinitely long ideal polymer chains and provides a sensitive measure of the actual conjugation length, the electronic coupling mediating charge transport shows rapid convergence with molecular size. Similar results have been obtained for very different molecular backbones, thus highlighting the general character of these findings.

Journal ArticleDOI
01 Aug 2009
TL;DR: This paper studies a related problem called MaxBRNN: find an optimal region that maximizes the size of BRNNs and comes up with an efficient algorithm called MaxOverlap, which is many times faster than the best-known technique.
Abstract: Bichromatic reverse nearest neighbor (BRNN) has been extensively studied in the spatial database literature. In this paper, we study a related problem called MaxBRNN: find an optimal region that maximizes the size of BRNNs. Such a problem has many real life applications, including the problem of finding a new server point that attracts as many customers as possible by proximity. A straightforward approach is to determine the BRNNs for all possible points, which is not feasible since there are a large (or infinite) number of possible points. To the best of our knowledge, the fastest known method has exponential time complexity in the data size. Based on some interesting properties of the problem, we come up with an efficient algorithm called MaxOverlap. Extensive experiments are conducted to show that our algorithm is many times faster than the best-known technique.

Journal ArticleDOI
TL;DR: A metaheuristic algorithm is proposed for classifying the cells of pap-smear samples; its classification accuracy generally outperforms that of other previously applied intelligent approaches.

Journal ArticleDOI
TL;DR: Experiments show that the proposed approach effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN, and is a simple and a fast condensing algorithm.
Abstract: The K-nearest neighbor (KNN) rule is one of the most widely used pattern classification algorithms. For large data sets, the computational demands for classifying patterns using KNN can be prohibitive. A way to alleviate this problem is through the condensing approach. This means we remove patterns that are more of a computational burden but do not contribute to better classification accuracy. In this brief, we propose a new condensing algorithm. The proposed idea is based on defining the so-called chain. This is a sequence of nearest neighbors from alternating classes. We make the point that patterns further down the chain are close to the classification boundary and based on that we set a cutoff for the patterns we keep in the training set. Experiments show that the proposed approach effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN. Moreover, it is a simple and a fast condensing algorithm.
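A sketch of the chain construction at the heart of the method: starting from a pattern, repeatedly jump to the nearest neighbor in the opposite class, so the chain alternates classes and drifts toward the decision boundary. The paper's cutoff rule for deciding which chain members to keep is not reproduced here:

```python
import numpy as np

def nn_chain(X, y, start=0):
    """Chain of nearest neighbors from alternating classes: from the
    current pattern, jump to its nearest opposite-class neighbor, and
    stop when the chain revisits a pattern. Patterns further down the
    chain lie closer to the classification boundary."""
    chain, i = [start], start
    while True:
        other = np.where(y != y[i])[0]   # indices of the opposite class
        j = other[np.argmin(np.linalg.norm(X[other] - X[i], axis=1))]
        if j in chain:
            return chain
        chain.append(int(j))
        i = j
```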

Book ChapterDOI
01 Jan 2009
TL;DR: The k-nearest neighbor (k-NN) method uses the well-known principle of Cicero pares cum paribus facillime congregantur (birds of a feather flock together or literally equals with equals easily associate) to classify an unknown sample based on the known classification of its neighbors.
Abstract: The k-nearest neighbor (k-NN) method is one of the data mining techniques considered to be among the top 10 techniques for data mining [237]. The k-NN method uses the well-known principle of Cicero pares cum paribus facillime congregantur (birds of a feather flock together or literally equals with equals easily associate). It tries to classify an unknown sample based on the known classification of its neighbors. Let us suppose that a set of samples with known classification is available, the so-called training set. Intuitively, each sample should be classified similarly to its surrounding samples. Therefore, if the classification of a sample is unknown, then it could be predicted by considering the classification of its nearest neighbor samples. Given an unknown sample and a training set, all the distances between the unknown sample and all the samples in the training set can be computed. The distance with the smallest value corresponds to the sample in the training set closest to the unknown sample. Therefore, the unknown sample may be classified based on the classification of this nearest neighbor.
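The rule described here, in its simplest 1-NN form, takes only a few lines:

```python
import numpy as np

def nearest_neighbor_classify(X_train, y_train, x):
    """1-NN rule: the unknown sample inherits the label of the closest
    sample in the training set."""
    d = np.linalg.norm(X_train - x, axis=1)  # distances to all training samples
    return y_train[np.argmin(d)]             # label of the nearest neighbor
```

The k-NN generalization replaces np.argmin with the k smallest distances and a majority vote over their labels.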

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A text-based image feature is introduced and it is demonstrated that it consistently improves performance on hard object classification problems, and is particularly effective when the training dataset is small.
Abstract: We introduce a text-based image feature and demonstrate that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. We do not inspect or correct the tags and expect that they are noisy. We obtain the text feature of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. Our text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. We test the performance of this feature using PASCAL VOC 2006 and 2007 datasets. Our feature performs well, consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.
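A sketch of the feature construction itself: pool the (noisy) tags of an image's k nearest neighbors in the auxiliary collection into a normalized tag histogram. The visual distances and the tag vocabulary are assumed to be given:

```python
import numpy as np

def text_feature(dist_to_aux, aux_tags, k=10):
    """Text feature of each unannotated image: the normalized sum of
    the tag vectors of its k nearest neighbors in the auxiliary
    (Internet-tagged) collection.
    dist_to_aux: (n_images, n_aux) visual distances.
    aux_tags:    (n_aux, n_tags) binary tag matrix."""
    nn = np.argsort(dist_to_aux, axis=1)[:, :k]
    feat = aux_tags[nn].sum(axis=1).astype(float)   # tag counts over neighbors
    feat /= np.maximum(feat.sum(axis=1, keepdims=True), 1)
    return feat                                     # (n_images, n_tags)
```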

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work proposes a novel framework involving the feature-tree to index large scale motion features using the Sphere/Rectangle-tree (SR-tree) and provides an effective way for practical incremental action recognition.
Abstract: Action recognition methods suffer from many drawbacks in practice, which include (1) the inability to cope with incremental recognition problems; (2) the requirement of an intensive training stage to obtain good performance; (3) the inability to recognize simultaneous multiple actions; and (4) difficulty in performing recognition frame by frame. In order to overcome all these drawbacks using a single method, we propose a novel framework involving the feature-tree to index large scale motion features using the Sphere/Rectangle-tree (SR-tree). The recognition consists of the following two steps: 1) recognizing the local features by non-parametric nearest neighbor (NN), and 2) using a simple voting strategy to label the action. The proposed method can provide the localization of the action. Since our method does not require feature quantization, the feature-tree can be efficiently grown by adding features from new training examples of actions or categories. Our method provides an effective way for practical incremental action recognition. Furthermore, it can handle large scale datasets due to the fact that the SR-tree is a disk-based data structure. We have tested our approach on two publicly available datasets, the KTH and the IXMAS multi-view datasets, and obtained promising results.

Journal ArticleDOI
TL;DR: This paper presents a fast minimum spanning tree-inspired clustering algorithm, which, by using an efficient implementation of the cut and the cycle property of the minimum spanning trees, can have much better performance than O(N^2).
Abstract: Due to their ability to detect clusters with irregular boundaries, minimum spanning tree-based clustering algorithms have been widely used in practice. However, in such clustering algorithms, the search for the nearest neighbor in the construction of minimum spanning trees is the main source of computation and the standard solutions take O(N^2) time. In this paper, we present a fast minimum spanning tree-inspired clustering algorithm which, by using an efficient implementation of the cut and the cycle property of the minimum spanning trees, can have much better performance than O(N^2).
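For contrast, the standard O(N^2) MST clustering pipeline that the paper accelerates can be written directly with SciPy: build the MST of the complete distance graph, cut the longest edges, and read off the connected components:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(X, n_clusters):
    """Classic MST clustering: cutting the n_clusters - 1 longest MST
    edges leaves n_clusters connected components."""
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()
    keep = np.argsort(mst.data)[::-1][n_clusters - 1:]   # drop the longest edges
    pruned = coo_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=mst.shape)
    return connected_components(pruned, directed=False)[1]  # cluster labels
```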

Posted Content
TL;DR: In this article, a non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data is proposed.
Abstract: We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below α, which is supposed to be the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful for the specified false alarm level, α, for the case when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes and it can adapt to local structure such as local change in dimensionality. We demonstrate the algorithm on both artificial and real data sets in high dimensional feature spaces.
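A sketch in the spirit of the score function: take each sample's mean distance to its k nearest nominal neighbors as a statistic, and convert it to an empirical p-value against the nominal sample; a test point is flagged as anomalous when its score falls below α. The paper's actual score function and its optimality analysis are more refined:

```python
import numpy as np
from scipy.spatial import cKDTree

def anomaly_scores(nominal, test, k=5):
    """Empirical p-value score: the fraction of nominal points whose
    own k-NN statistic is at least as large as the test sample's."""
    tree = cKDTree(nominal)
    g_nom = tree.query(nominal, k + 1)[0][:, 1:].mean(axis=1)  # skip self-match
    dt = tree.query(test, k)[0]
    g_test = dt.mean(axis=1) if k > 1 else dt
    return (g_nom[None, :] >= g_test[:, None]).mean(axis=1)
```

Declaring `anomaly_scores(nominal, test) < alpha` as anomalous then targets a false alarm rate of roughly α on nominal data.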

Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper studies a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set, and shows that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-Occurrences).
Abstract: High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
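The k-occurrence count N_k(x) studied here is direct to compute, and comparing its skew for low- versus high-dimensional data makes the hubs visible:

```python
import numpy as np

def k_occurrences(X, k=5):
    """N_k(x): how many times each point appears among the k nearest
    neighbors of the other points."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared distance matrix
    np.fill_diagonal(d2, np.inf)                  # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]            # k nearest neighbors of each point
    return np.bincount(nn.ravel(), minlength=len(X))
```

Running this on standard normal data with d = 3 and then d = 100 (same n and k) shows the high-dimensional counts becoming markedly more skewed, with a few hub points appearing in a disproportionate number of neighbor lists.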

Proceedings Article
11 Jul 2009
TL;DR: ECTS (Early Classification on Time Series) is an effective 1-nearest neighbor classification method that makes early predictions and at the same time retains accuracy comparable to that of a 1NN classifier using the full-length time series.
Abstract: In this paper, we formulate the problem of early classification of time series data, which is important in some time-sensitive applications such as health-informatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classification on Time Series), an effective 1-nearest neighbor classification method. ECTS makes early predictions and at the same time retains the accuracy comparable to that of a 1NN classifier using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1NN classification is effective.

Journal ArticleDOI
TL;DR: In this paper, a computationally efficient mode space simulation method for atomistic simulation of a graphene nanoribbon field-effect transistor in the ballistic limits is developed, which is based on the atomistic Hamiltonian in a decoupled mode space.
Abstract: A computationally efficient mode space simulation method for atomistic simulation of a graphene nanoribbon field-effect transistor in the ballistic limit is developed. The proposed simulation scheme, which solves the nonequilibrium Green's function coupled with a three dimensional Poisson equation, is based on the atomistic Hamiltonian in a decoupled mode space. The mode space approach, which only treats a few modes (subbands), significantly reduces the simulation time. Additionally, the edge bond relaxation and the third nearest neighbor effects are also included in the quantum transport solver. Simulation examples show that the mode space approach can significantly decrease the simulation cost by about an order of magnitude, yet the results are still accurate. This article also demonstrates that the effects of the edge bond relaxation and the third nearest neighbor significantly influence the transistor's performance and need to be included in the modeling.

Journal ArticleDOI
TL;DR: All of the models developed in this work are fast and precise enough to be applicable for virtual screening of CYP1A2 inhibitors or noninhibitors or can be used as simple filters in the drug discovery process.
Abstract: The cytochrome P450 (P450) superfamily plays an important role in the metabolism of drug compounds, and it is therefore highly desirable to have models that can predict whether a compound interacts with a specific isoform of the P450s. In this work, we provide in silico models for classification of CYP1A2 inhibitors and noninhibitors. Training and test sets consisted of approximately 400 and 7000 compounds, respectively. Various machine learning techniques, such as binary quantitative structure activity relationship, support vector machine (SVM), random forest, k-nearest neighbor (kNN), and decision tree methods were used to develop in silico models, based on Volsurf and Molecular Operating Environment descriptors. The best models were obtained using the SVM, random forest, and kNN methods in combination with the BestFirst variable selection method, resulting in models with 73 to 76% accuracy on the test set prediction (Matthews correlation coefficients of 0.51 and 0.52). Finally, a decision tree model based on Lipinski's Rule-of-Five descriptors was also developed. This model predicts 67% of the compounds correctly and gives a simple and interesting insight into the issue of classification. All of the models developed in this work are fast and precise enough to be applicable for virtual screening of CYP1A2 inhibitors or noninhibitors or can be used as simple filters in the drug discovery process.


Journal ArticleDOI
TL;DR: This paper proposes a learning algorithm that attempts to maximize the leave-one-out (LV1) classification rate of the NN rule by adjusting the weights of the training instances, and shows that this scheme has comparable or better performance than some recent methods proposed in the literature for the task of learning the distance function and/or prototype reduction.


Journal ArticleDOI
TL;DR: A new pseudo nearest neighbor classification rule that utilizes the distance weighted local learning in each class to get a new nearest neighbor of the unlabeled pattern, the pseudo nearest neighbor (PNN), and then assigns the label associated with the PNN to the unlabeled pattern using the NNR.
Abstract: In this paper, we propose a new pseudo nearest neighbor classification rule (PNNR). Unlike the previous nearest neighbor rule (NNR), this new rule utilizes the distance weighted local learning in each class to get a new nearest neighbor of the unlabeled pattern, the pseudo nearest neighbor (PNN), and then assigns the label associated with the PNN to the unlabeled pattern using the NNR. The proposed PNNR is compared with the k-NNR, the distance weighted k-NNR, and the local mean-based nonparametric classification [Mitani, Y., & Hamamoto, Y. (2006). A local mean-based nonparametric classifier. Pattern Recognition Letters, 27, 1151-1159] in terms of classification accuracy on unknown patterns. Experimental results confirm the validity of this new classification rule even in practical situations.
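A sketch of the pseudo nearest neighbor rule as described: within each class, sort the distances to the query, weight the k smallest by 1, 1/2, ..., 1/k to form that class's pseudo nearest neighbor distance, and assign the class that minimizes it. The 1/i weights follow the common statement of the rule; the paper's exact weighting may differ:

```python
import numpy as np

def pnn_classify(X_train, y_train, x, k=3):
    """Pseudo nearest neighbor rule: each class's k smallest distances
    to the query are combined with weights 1/1, 1/2, ..., 1/k into a
    pseudo distance, and the class with the smallest pseudo distance
    wins."""
    w = 1.0 / np.arange(1, k + 1)
    best_cls, best_d = None, np.inf
    for c in np.unique(y_train):
        d = np.sort(np.linalg.norm(X_train[y_train == c] - x, axis=1))[:k]
        pseudo = (w[:len(d)] * d).sum()   # handles classes with fewer than k samples
        if pseudo < best_d:
            best_cls, best_d = c, pseudo
    return best_cls
```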