
Showing papers on "k-nearest neighbors algorithm" published in 2023


Journal ArticleDOI
TL;DR: In this article, the authors propose an ensemble centroid displacement-based k-NN (EC-k-NN) algorithm, which leverages the homogeneity of the nearest neighbors of test instances.

5 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop a nearest-neighbor (NN) validation strategy based on the triplet metric for two-stage representation learning with a triplet network, targeting multi-view classification with limited sample size in medicine.
Abstract: Multi-view classification with limited sample size and data augmentation is a very common machine learning (ML) problem in medicine. With limited data, a triplet network approach for two-stage representation learning has been proposed. However, effective training and verifying the features from the representation network for their suitability in subsequent classifiers are still unsolved problems. Although typical distance-based metrics for the training capture the overall class separability of the features, the performance according to these metrics does not always lead to an optimal classification. Consequently, an exhaustive tuning with all feature-classifier combinations is required to search for the best end result. To overcome this challenge, we developed a novel nearest-neighbor (NN) validation strategy based on the triplet metric. This strategy is supported by a theoretical foundation to provide the best selection of the features with a lower bound of the highest end performance. The proposed strategy is a transparent approach to identify whether to improve the features or the classifier. This avoids the need for repeated tuning. Our evaluations on real-world medical imaging tasks (i.e., radiation therapy delivery error prediction and sarcoma survival prediction) show that our strategy is superior to other common deep representation learning baselines [i.e., autoencoder (AE) and softmax]. The strategy addresses the issue of feature's interpretability which enables more holistic feature creation such that the medical experts can focus on specifying relevant data as opposed to tedious feature engineering.

3 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a QSAR model using the KNN algorithm to predict the corrosion inhibition performance of inhibitor compounds, where virtual samples are generated and added to the training set using a Virtual Sample Generation (VSG) method.
Abstract: In this work, we developed a QSAR model using the K-Nearest Neighbor (KNN) algorithm to predict the corrosion inhibition performance of the inhibitor compound. To overcome the small dataset problems, virtual samples are generated and added to the training set using a Virtual Sample Generation (VSG) method. The generalizability of the proposed KNN + VSG model is verified by using six small datasets from the literature and comparing their prediction performances. The research shows that for the six datasets, the proposed model is able to make predictions with the best accuracy. Adding virtual samples to the training data helps the algorithm recognize feature-target relationship patterns, and therefore increases the number of chemical quantum parameters correlated with corrosion inhibition efficiency. This proposed method strengthens the prospect of ML for developing material designs, especially in the case of small datasets.

3 citations



Journal ArticleDOI
TL;DR: In this article, the authors adapt two existing strategies used in a multivariate version of the well-known Dynamic Time Warping (DTW), namely Independent and Dependent DTW, to seven commonly used elastic similarity and distance measures.
Abstract: This paper contributes multivariate versions of seven commonly used elastic similarity and distance measures for time series data analytics. Elastic similarity and distance measures can compensate for misalignments in the time axis of time series data. We adapt two existing strategies used in a multivariate version of the well-known Dynamic Time Warping (DTW), namely, Independent and Dependent DTW, to these seven measures. While these measures can be applied to various time series analysis tasks, we demonstrate their utility on multivariate time series classification using the nearest neighbor classifier. On 23 well-known datasets, we demonstrate that each of the measures but one achieves the highest accuracy relative to others on at least one dataset, supporting the value of developing a suite of multivariate similarity and distance measures. We also demonstrate that there are datasets for which either the dependent versions of all measures are more accurate than their independent counterparts or vice versa. In addition, we also construct a nearest neighbor-based ensemble of the measures and show that it is competitive with other state-of-the-art single-strategy multivariate time series classifiers.

2 citations
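The difference between the Independent and Dependent strategies mentioned in the abstract above can be sketched for plain DTW as follows. This is a minimal illustration, not the paper's implementation: the function names, the squared-difference point cost, and the representation of a series as a list of per-time-step tuples are all assumptions made here.

```python
import math

def dtw(cost, n, m):
    """Classic DTW recurrence over a pairwise cost function cost(i, j)."""
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(i - 1, j - 1) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def dtw_independent(a, b):
    """Independent strategy: one warping path per dimension, distances summed."""
    total = 0.0
    for k in range(len(a[0])):
        total += dtw(lambda i, j, k=k: (a[i][k] - b[j][k]) ** 2, len(a), len(b))
    return total

def dtw_dependent(a, b):
    """Dependent strategy: a single warping path shared by all dimensions."""
    dims = range(len(a[0]))
    return dtw(lambda i, j: sum((a[i][k] - b[j][k]) ** 2 for k in dims), len(a), len(b))
```

With this cost, the independent distance can never exceed the dependent one, because each dimension is free to pick its own best alignment; which variant classifies better is an empirical question, as the abstract notes.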


Journal ArticleDOI
TL;DR: In this article, a reconfigurable elastic metamaterial with beyond-nearest-neighbor (BNN) coupling was designed and analyzed. The structure is composed from the popular British model-construction system Meccano and supports backward waves with opposite directions of phase and group velocities.
Abstract: We design, simulate, and experimentally characterize a reconfigurable elastic metamaterial with beyond-nearest-neighbor (BNN) coupling. The structure is composed from the popular British model-construction system Meccano and supports backward waves with opposite directions of phase and group velocities. We experimentally verify three distinct configurations and acoustically infer their spatial vibration spectra.

2 citations


Journal ArticleDOI
TL;DR: In this paper, a two-layer decomposition scheme is proposed to fuse the detail layers while retaining brightness features; the fused result can provide a more objective and comprehensive description of lesions and has significant potential as a clinical aid.

2 citations


Journal ArticleDOI
TL;DR: This paper proposes GGNN, a novel GPU-friendly search structure based on nearest neighbor graphs and information propagation on graphs, designed to take advantage of GPU architectures to accelerate both the hierarchical construction of the index structure and the query.
Abstract: Approximate nearest neighbor (ANN) search in high dimensions is an integral part of several computer vision systems and gains importance in deep learning with explicit memory representations. Since PQT, FAISS, and SONG started to leverage the massive parallelism offered by GPUs, GPU-based implementations are a crucial resource for today's state-of-the-art ANN methods. While most of these methods allow for faster queries, less emphasis is devoted to accelerating the construction of the underlying index structures. In this paper, we propose a novel GPU-friendly search structure based on nearest neighbor graphs and information propagation on graphs. Our method is designed to take advantage of GPU architectures to accelerate the hierarchical construction of the index structure and for performing the query. Empirical evaluation shows that GGNN significantly surpasses the state-of-the-art CPU- and GPU-based systems in terms of build-time, accuracy and search speed.

2 citations


Journal ArticleDOI
01 Jan 2023-Sensors
TL;DR: In this article, a comprehensive survey of exact kNN queries over high-dimensional data space is presented, covering 20 kNN Search methods and 9 kNN Join methods, categorised by indexing strategy, data and space partitioning strategy, clustering technique, and computing paradigm.
Abstract: k nearest neighbours (kNN) queries are fundamental in many applications, ranging from data mining, recommendation system and Internet of Things, to Industry 4.0 framework applications. In mining, specifically, it can be used for the classification of human activities, iterative closest point registration and pattern recognition and has also been helpful for intrusion detection systems and fault detection. Due to the importance of kNN queries, many algorithms have been proposed in the literature, for both static and dynamic data. In this paper, we focus on exact kNN queries and present a comprehensive survey of exact kNN queries. In particular, we study two fundamental types of exact kNN queries: the kNN Search queries and the kNN Join queries. Our survey focuses on exact approaches over high-dimensional data space, which covers 20 kNN Search methods and 9 kNN Join methods. To the best of our knowledge, this is the first work of a comprehensive survey of exact kNN queries over high-dimensional datasets. We specifically categorise the algorithms based on indexing strategies, data and space partitioning strategies, clustering techniques and the computing paradigm. We provide useful insights for the evolution of approaches based on the various categorisation factors, as well as the possibility of further expansion. Lastly, we discuss some open challenges and future research directions.

2 citations


Journal ArticleDOI
TL;DR: In this article, the authors focus on detecting faults in transmission lines with the k-nearest neighbor algorithm, using voltage and current characteristics to identify faults and their location (the entire system, phase B, or phase A).
Abstract: The critical factors to consider when implementing a maintenance plan for energy transmission lines are accuracy, speed, and time, because rapid development has increased global demand for electric power and led to overuse of transmission lines (both underground cables and overhead lines), which in turn reduces the efficiency of the lines. Efficiency may also be reduced by other activities, such as excavation, that may have tampered with the cables. Thus, it becomes important to investigate the faults to which the lines are exposed. To this end, this article focuses on the detection of faults in transmission lines through the use of the k-nearest neighbor algorithm. Using this algorithm, the voltage and current characteristics were obtained, and these characteristics enable the identification of faults in the transmission lines and of their specific location (the entire system, phase B, or phase A). The benefits that can be derived from the use of this algorithm include time, accuracy, and speed, which are the requirements for the maintenance of transmission lines. Euclidean distance was used for the weights in the k-nearest neighbor technique, with K = 3 as the number of neighbors. The dataset was split into two parts: a 70% training set and a 30% testing set.

2 citations
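The configuration stated in the abstract above (Euclidean distance, K = 3, 70/30 split) can be sketched in a few lines. This is an illustrative sketch only: the toy (voltage, current) readings, the class names, and every function name below are hypothetical and are not taken from the paper's dataset or code.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def split_70_30(X, y, seed=0):
    """Shuffle indices and split into 70% training and 30% testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def make_toy_readings(n_per_class=20, seed=1):
    """Hypothetical per-unit (voltage, current) pairs; NOT the paper's data."""
    rng = random.Random(seed)
    centers = {"no_fault": (1.0, 1.0), "fault": (0.4, 3.0)}
    X, y = [], []
    for label, (v, c) in centers.items():
        for _ in range(n_per_class):
            X.append((v + rng.uniform(-0.1, 0.1), c + rng.uniform(-0.1, 0.1)))
            y.append(label)
    return X, y

X, y = make_toy_readings()
train_X, train_y, test_X, test_y = split_70_30(X, y)
hits = sum(knn_predict(train_X, train_y, q) == t for q, t in zip(test_X, test_y))
accuracy = hits / len(test_y)
```

The toy classes are deliberately far apart, so the accuracy here says nothing about real fault data; the sketch only shows how the three stated settings fit together.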


Journal ArticleDOI
TL;DR: Wang et al. proposed a method to predict the spatio-temporal characteristics of short-term traffic flow by combining the k-nearest neighbor algorithm and a bidirectional LSTM network model.
Abstract: In previous research on traffic flow prediction, most models mainly studied the time series of traffic flow, and the spatial correlation of traffic flow was not fully considered. To solve this problem, this paper proposes a method to predict the spatio-temporal characteristics of short-term traffic flow by combining the k-nearest neighbor algorithm and a bidirectional long short-term memory (BiLSTM) network model. Using real-time traffic flow data observed on high-speed roads in the United Kingdom, the k-nearest neighbor algorithm spatially screens the station data to determine the points with high correlation, which are then input to the BiLSTM model for prediction. The experimental results show that compared with SVR, LSTM, GRU, KNN-LSTM, and CNN-LSTM models, the model proposed in this paper has better prediction accuracy, and its performance has been improved by 77%, 19%, 18%, 22%, and 13%, respectively. The proposed k-nearest neighbor-bidirectional long short-term memory model thus shows better prediction performance.

Proceedings ArticleDOI
12 Feb 2023
TL;DR: FNNG is presented as the first FPGA-based accelerator to support k-nearest neighbor graph construction; it is equipped with a block-based scheduling technique to exploit the inherent data locality between vertices.
Abstract: The k-nearest neighbor graph has emerged as the key data structure for many critical applications. However, it can be notoriously challenging to construct k-nearest neighbor graphs over large graph datasets, especially with a high-dimensional vector feature. Many solutions have been recently proposed to support the construction of k-nearest neighbor graphs. However, these solutions involve substantial memory access and computational overheads, and an architecture-level solution is still absent. To address these issues, we architect FNNG, the first FPGA-based accelerator to support k-nearest neighbor graph construction. Specifically, FNNG is equipped with the block-based scheduling technique to exploit the inherent data locality between vertices. It divides the vertices that are close in space into blocks and processes the vertices at block granularity during the construction process. FNNG also adopts the useless computation aborting technique to identify superfluous computations. It keeps the existing maximum similarity values of all vertices inside the computing unit. In addition, we propose an improved architecture in order to fully utilize both techniques. We implement FNNG on the Xilinx Alveo U280 FPGA card. The results show that FNNG achieves 190x and 2.1x speedups over the state-of-the-art CPU and GPU solutions, running on Intel Xeon Gold 5117 CPU and NVIDIA GeForce RTX 3090 GPU, respectively.
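The idea of aborting a computation once it can no longer affect the result has a familiar software analogue, sketched below for a distance-based k-nearest neighbor search. This is only an analogy under stated assumptions: it is not the FNNG hardware design, it uses squared Euclidean distance rather than the similarity values the abstract mentions, and both function names are hypothetical.

```python
import heapq
import math

def bounded_sq_dist(a, b, bound):
    """Squared Euclidean distance, abandoned (None) once it reaches bound."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= bound:
            return None  # cannot beat the current k-th best, stop early
    return total

def knn_early_abort(points, query, k):
    """Return the k nearest (squared_distance, index) pairs, closest first."""
    heap = []  # max-heap emulated with negated distances
    for i, p in enumerate(points):
        # The current k-th best distance is the abort threshold.
        bound = -heap[0][0] if len(heap) == k else math.inf
        d = bounded_sq_dist(p, query, bound)
        if d is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        else:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)
```

The saving grows with dimensionality, since a candidate that is already too far after a few coordinates skips the remaining ones.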

Journal ArticleDOI
Shuang Jin, Yu Su, Chuanxin Guo, Ya Fan, Zhi-Yong Tao 
TL;DR: In this paper, a combined classification method based on improved empirical mode decomposition is proposed to process the underwater signals of offshore ships, where the signals are separated into a series of samples with a signal duration of 100 ms and a set of intrinsic mode functions (IMFs) is generated for each sample.

Journal ArticleDOI
TL;DR: In this paper, a top-down approach is proposed for describing the structure and dynamics of liquid and glass, in which global collective forces drive the liquid to form density waves.
Abstract: The structure beyond the nearest neighbor atoms in liquid and glass is characterized by the medium-range order (MRO). In the conventional approach, the MRO is considered to result directly from the short-range order (SRO) in the nearest neighbors. To this bottom–up approach starting with the SRO, we propose to add a top–down approach in which global collective forces drive liquid to form density waves. The two approaches are in conflict with each other, and the compromise produces the structure with the MRO. The driving force to produce density waves provides the stability and stiffness to the MRO, and controls various mechanical properties. This dual framework provides a novel perspective for description of the structure and dynamics of liquid and glass.

Posted ContentDOI
21 May 2023-bioRxiv
TL;DR: Spatial Pattern Analysis using Closest Events (SPACE) is an image analysis tool that leverages nearest neighbor-based point pattern analysis to characterize the spatial relationship of fluorescence microscopy signals from image data.
Abstract: The quantitative description of biological structures is a valuable yet difficult task in the life sciences. This is commonly accomplished by imaging samples using fluorescence microscopy and analyzing resulting images using Pearson’s correlation or Manders’ co-occurrence intensity-based colocalization paradigms. Though conceptually and computationally simple, these approaches are critically flawed due to their reliance on signal overlap, sensitivity to cursory signal qualities, and inability to differentiate true and incidental colocalization. Point pattern analysis provides a framework for quantitative characterization of spatial relationships between spatial patterns using the distances between observations rather than their overlap, thus overcoming these issues. Here we introduce an image analysis tool called Spatial Pattern Analysis using Closest Events (SPACE) that leverages nearest neighbor-based point pattern analysis to characterize the spatial relationship of fluorescence microscopy signals from image data. The utility of SPACE is demonstrated by assessing the spatial association between mRNA and cell nuclei from confocal images of cardiac myocytes. Additionally, we use synthetic and empirical images to characterize the sensitivity of SPACE to image segmentation parameters and cursory image qualities such as signal abundance and image resolution. Ultimately, SPACE delivers performance superior to traditional colocalization methods and offers a valuable addition to the microscopist’s toolbox.

Journal ArticleDOI
TL;DR: In this article, each label is predicted with a different value of k; finding the best k for each label is formulated as an optimization problem, and three different algorithms are proposed for this task, depending on which multi-label metric is the target of the optimization process.
Abstract: Multi-label classification as a data mining task has recently attracted increasing interest from researchers. Many current data mining applications address problems with instances that belong to more than one category. These problems require the development of new, efficient methods. Multi-label k-nearest neighbors rule, ML-kNN, is among the best-performing methods for multi-label problems. Current methods use a unique k value for all labels, as in the single-label method. However, the distributions of the labels are frequently very different. In such scenarios, a unique k value for the labels might be suboptimal. In this paper, we propose a novel approach in which each label is predicted with a different value of k. Obtaining the best k for each label is stated as an optimization problem. Three different algorithms are proposed for this task, depending on which multi-label metric is the target of our optimization process. In a large set of 40 real-world multi-label problems, our approach improves the results of two different tested ML-kNN implementations.
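The idea of choosing a separate k for each label can be sketched with a deliberately simplified stand-in. The assumptions are significant: the sketch uses a plain majority vote per label rather than ML-kNN's estimation procedure, and it picks k by leave-one-out per-label accuracy rather than by the multi-label metrics and optimization algorithms of the paper. All names are hypothetical.

```python
import math

def neighbors(X, query, skip=None):
    """Indices of X sorted by Euclidean distance to query, excluding skip."""
    idx = [i for i in range(len(X)) if i != skip]
    return sorted(idx, key=lambda i: math.dist(X[i], query))

def predict_label(X, Y, label, query, k, skip=None):
    """Predict one binary label by majority vote among the k nearest points."""
    nn = neighbors(X, query, skip)[:k]
    return int(2 * sum(Y[i][label] for i in nn) > len(nn))

def best_k_per_label(X, Y, candidates):
    """For each label, keep the candidate k with the best leave-one-out accuracy."""
    chosen = []
    for label in range(len(Y[0])):
        def loo_hits(k):
            return sum(predict_label(X, Y, label, X[i], k, skip=i) == Y[i][label]
                       for i in range(len(X)))
        chosen.append(max(candidates, key=loo_hits))  # ties keep the earlier k
    return chosen

# Toy data: label 0 is cleanly separated, label 1 is noisy within clusters.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
Y = [(1, 1), (1, 0), (1, 1), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0)]
ks = best_k_per_label(X, Y, [1, 3])
```

On this toy set the clean label is served equally well by k = 1 and k = 3, while the noisy label benefits from the larger neighborhood, which is the kind of per-label difference the abstract describes.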

Journal ArticleDOI
TL;DR: In this paper, a method named multidimensional spectral information laser-induced breakdown spectroscopy (MSI-LIBS) is proposed, which fully mines the effective information in the spectra by integrating the absolute intensity, the first derivative spectra, and the ratio spectra.
Abstract: To improve the qualitative accuracy of foreign protein adulteration in milk powder, a novel method named multidimensional spectral information laser-induced breakdown spectroscopy (MSI-LIBS) was proposed, which fully mined the effective information in the spectra by integrating the absolute intensity, the first derivative spectra, and the ratio spectra. Compared with traditional LIBS, the performance of the models based on MSI-LIBS was significantly improved. The accuracy of the cross-validation set of support vector machine, k-nearest neighbor, and random subspace method-linear discriminant analysis models increased from 80.98%, 75.61%, and 79.25% to 85.17%, 79.32%, and 81.18%, respectively. The accuracy of the prediction set increased from 81.50%, 76.03%, and 79.07% to 85.82%, 79.74%, and 81.28%, respectively. Furthermore, the visualization results of t-distributed stochastic neighbor embedding also showed that there was a more obvious boundary between the spectra of different samples based on MSI-LIBS. Therefore, these results fully prove the effectiveness of MSI-LIBS in improving the performance of LIBS classification.

Journal ArticleDOI
01 Jan 2023-Entropy
TL;DR: In this paper, the authors propose a new similarity measure, Polar distance, to replace the Euclidean distance; it considers both angular and module length information and introduces a weight parameter adjusted to the specific application data.
Abstract: The K-nearest neighbor (KNN) algorithm is one of the most extensively used classification algorithms, while its high time complexity limits its performance in the era of big data. The quantum K-nearest neighbor (QKNN) algorithm can handle the above problem with satisfactory efficiency; however, its accuracy is sacrificed when directly applying the traditional similarity measure based on Euclidean distance. Inspired by the Polar coordinate system and the quantum property, this work proposes a new similarity measure to replace the Euclidean distance, which is defined as Polar distance. Polar distance considers both angular and module length information, introducing a weight parameter adjusted to the specific application data. To validate the efficiency of Polar distance, we conducted various experiments using several typical datasets. For the conventional KNN algorithm, the accuracy performance is comparable when using Polar distance for similarity measurement, while for the QKNN algorithm, it significantly outperforms the Euclidean distance in terms of classification accuracy. Furthermore, the Polar distance shows scalability and robustness superior to the Euclidean distance, providing an opportunity for the large-scale application of QKNN in practice.

Proceedings ArticleDOI
04 Jan 2023
TL;DR: In this article, the authors use a two-level localization approach to improve both the accuracy and the response time of an indoor localization system, and report that it is more suitable than single-level localization in terms of localization accuracy and speed.
Abstract: Indoor localization is defined as the process of locating a user or device in an indoor environment. It plays a crucial role for first responders in disaster and emergency situations. In situations where the environment does not change much after an incident happens, fingerprint-based indoor localization can be used for effective localization and positioning. Due to the growth in smartphone users in the last few years, indoor localization using Wi-Fi fingerprints has been studied by researchers. The measured Wi-Fi signal strength can be used as an indication of the distribution of users in various indoor locations. In disaster and emergency situations, localization services should be highly accurate and fast. We can model localization as a classification problem and address it using machine learning (ML) approaches. However, these two requirements are conflicting, since an accurate fingerprint-based indoor localization system needs to process a large amount of data, and this leads to slow response. This problem becomes even worse when both the number of floors and the number of reference points increase. To address this challenge, we use a two-level localization in order to improve both the accuracy and the response time. First, the fingerprint database is used to train ML models. The localization phase has two steps: (i) floor prediction, and (ii) reference point prediction on the predicted floor. For floor prediction, we use the K-Nearest Neighbors (KNN) classification algorithm. Then we use various ML models such as Random Forest, Decision Tree, and Support Vector Machine. We use a dataset having two files with different floor numbers. Experimental results showed that Random Forest gives the best accuracy among the ML models. Thus, the two-level localization method is more suitable than single-level localization in terms of localization accuracy and speed, and can be utilized in many applications.
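The two-step localization phase described above (floor first, then reference point on that floor) can be sketched as follows. This is a hedged illustration: the fingerprint layout, the RSSI values, and the function names are hypothetical, and the second stage reuses a nearest-neighbor lookup purely as a stand-in for the Random Forest, Decision Tree, and SVM models the abstract mentions.

```python
import math
from collections import Counter

def knn_label(samples, query, k):
    """samples: list of (rssi_vector, label); majority label of the k nearest."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def two_level_locate(fingerprints, query, k=3):
    """fingerprints: list of (rssi_vector, floor, reference_point)."""
    # Level 1: predict the floor from all fingerprints.
    floor = knn_label([(v, f) for v, f, _ in fingerprints], query, k)
    # Level 2: search reference points only on the predicted floor.
    on_floor = [(v, rp) for v, f, rp in fingerprints if f == floor]
    return floor, knn_label(on_floor, query, 1)

# Hypothetical RSSI readings (dBm) from two access points.
fingerprints = [
    ((-40, -80), 0, "0-A"), ((-42, -78), 0, "0-A"),
    ((-50, -75), 0, "0-B"), ((-52, -74), 0, "0-B"),
    ((-80, -40), 1, "1-A"), ((-78, -42), 1, "1-A"),
    ((-75, -50), 1, "1-B"), ((-74, -52), 1, "1-B"),
]
result = two_level_locate(fingerprints, (-51, -75))
```

Restricting the second stage to one floor is what shrinks the search space, which is the source of the response-time benefit the abstract claims.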

Journal ArticleDOI
TL;DR: In this article, the authors investigate the joint effect of the number of neighbours (k) and the distance measure, in conjunction with dataset characteristics (DC), on kNN performance, and report that the joint effect of k with the other parameters yielded a statistically insignificant change in mean accuracy (p>0.5).
Abstract: The number of neighbours (k) and distance measure (DM) are widely modified for improved kNN performance. This work investigates the joint effect of these parameters in conjunction with dataset characteristics (DC) on kNN performance. Euclidean, Chebyshev, Manhattan, Minkowski, and Filtered distances, eleven k values, and four DC were systematically selected for the parameter tuning experiments. Each experiment used 20 iterations, 10-fold cross-validation, and thirty-three randomly selected datasets from the UCI repository. From the results, the average root mean squared error (RMSE) of kNN is significantly affected by the type of task (p<0.05, 14.53% variability effect). DC collectively caused a 74.54% change in mean RMSE values, while k and DM accumulated the least effect of 25.4%. The interaction effect of tuning k, DC, and DM resulted in DM = 'Minkowski', 3≤k≤20, 7≤target dimension≤9, and sample size (SS) >9000 as the optimal performance pattern for classification tasks. For regression problems, the experimental configuration should be 7000≤SS≤9000, 4≤number of attributes≤6, and DM = 'Filtered'. The type of task performed is the most influential kNN performance determinant, followed by DM. The variation in kNN accuracy resulting from changes in k values only occurs by chance, as it does not depict any consistent pattern, while the joint effect of the k value with other parameters yielded a statistically insignificant change in mean accuracy (p>0.5). As further work, the discovered patterns would serve as the standard reference for comparative analytics of kNN performance with other classification and regression algorithms.
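Four of the distance measures named in the abstract above have standard textbook definitions, sketched here to show why the choice can change which neighbor is nearest. The function names are chosen for this sketch, and the 'Filtered' distance is omitted because the abstract does not define it.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """General Minkowski distance of order p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def nearest(points, query, dist):
    """Index of the point closest to query under the given distance function."""
    return min(range(len(points)), key=lambda i: dist(points[i], query))

points = [(3.0, 3.0), (0.0, 4.0)]
query = (0.0, 0.0)
by_manhattan = nearest(points, query, manhattan)  # compares 6 against 4
by_chebyshev = nearest(points, query, chebyshev)  # compares 3 against 4
```

Here the same query has a different nearest neighbor under Manhattan and Chebyshev distance, so a kNN prediction can flip purely because of the measure.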

Journal ArticleDOI
TL;DR: In this paper, the authors developed a hybrid model to forecast the demolition-waste-generation rate in redevelopment areas in South Korea by combining principal component analysis (PCA) with decision tree, k-nearest neighbors, and linear regression algorithms.
Abstract: Construction and demolition waste accounts for a sizable proportion of global waste and is harmful to the environment. Its management is therefore a key challenge in the construction industry. Many researchers have utilized waste generation data for waste management, and more accurate and efficient waste management plans have recently been prepared using artificial intelligence models. Here, we developed a hybrid model to forecast the demolition-waste-generation rate in redevelopment areas in South Korea by combining principal component analysis (PCA) with decision tree, k-nearest neighbors, and linear regression algorithms. Without PCA, the decision tree model exhibited the highest predictive performance (R2 = 0.872) and the k-nearest neighbors (Chebyshev distance) model exhibited the lowest (R2 = 0.627). The hybrid PCA–k-nearest neighbors (Euclidean uniform) model exhibited significantly better predictive performance (R2 = 0.897) than the non-hybrid k-nearest neighbors (Euclidean uniform) model (R2 = 0.664) and the decision tree model. The mean of the observed values, k-nearest neighbors (Euclidean uniform) and PCA–k-nearest neighbors (Euclidean uniform) models were 987.06 (kg·m−2), 993.54 (kg·m−2) and 991.80 (kg·m−2), respectively. Based on these findings, we propose the k-nearest neighbors (Euclidean uniform) model using PCA as a machine-learning model for demolition-waste-generation rate predictions.

Journal ArticleDOI
TL;DR: In this paper, a deep nearest neighbor neural network based on an attention mechanism (DN4AM) is proposed for few-shot scene classification of remote sensing images; it reduces interference from scene-semantic irrelevant objects to improve classification accuracy.
Abstract: Remote sensing image scene classification has become more and more popular in recent years. As we all know, it is very difficult and time-consuming to obtain a large number of manually labeled remote sensing images. Therefore, few-shot scene classification of remote sensing images has become an urgent and important research task. Fortunately, the recently proposed deep nearest neighbor neural network (DN4) has made a breakthrough in few-shot classification. However, due to the complex background in remote sensing images, DN4 is easily affected by irrelevant local features, so DN4 cannot be directly applied in remote sensing images. For this reason, a deep nearest neighbor neural network based on attention mechanism (DN4AM) is proposed to solve the few-shot scene classification task of remote sensing images in this paper. Scene class-related attention maps are used in our method to reduce interference from scene-semantic irrelevant objects to improve the classification accuracy. Three remote sensing image datasets are used to verify the performance of our method. Compared with several state-of-the-art methods, including MatchingNet, RelationNet, MAML, Meta-SGD and DN4, our method achieves promising results in the few-shot scene classification of remote sensing images.

Journal ArticleDOI
TL;DR: Zhang et al. proposed a density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets (DPC-FWSN), which better characterizes the distribution of the samples by balancing the contribution of sample density in dense and sparse areas.


Journal ArticleDOI
TL;DR: In this article, the effects of geometric confinement on the point statistics in a quasi-low-dimensional system are studied, with a focus on nearest-neighbor statistics; the nearest-neighbor distance distributions (NNDDs) are found to conform to an extreme value Weibull distribution with the shape parameter depending on the confinement ratio.
Abstract: In this work, we study the effects of geometric confinement on the point statistics in a quasi-low-dimensional system. Specifically, we focus on the nearest-neighbor statistics. Accordingly, we have performed comprehensive numerical simulations of binomial point process on quasi-one-dimensional rectangle strips for different values of the confinement ratio defined as the ratio of the strip width to the mean nearest-neighbor distance. We found that the nearest-neighbor distance distributions (NNDDs) conform to an extreme value Weibull distribution with the shape parameter depending on the confinement ratio, while the process intensity remains constant. This finding reveals the reduction of effective spatial degrees of freedom in a quasi-low-dimensional system under the geometric confinement. The scale dependence of the number of effective spatial degrees of freedom is found to obey the crossover ansatz. We stress that the functional form of the crossover ansatz is determined by the nature of the studied point process. Accordingly, different physical processes in the quasi-low-dimensional system obey different crossover ansatzes. The relevance of these results for quasi-low-dimensional systems is briefly highlighted.

Journal ArticleDOI
TL;DR: In this article, an experiment is carried out with a simple k-nearest neighbor (kNN) classifier to investigate the capabilities of three extracted facial features for the recognition of facial emotions.
Abstract: In this paper, an experiment has been carried out based on a simple k-nearest neighbor (kNN) classifier to investigate the capabilities of three extracted facial features for the better recognition of facial emotions. The feature extraction techniques used are histogram of oriented gradient (HOG), Gabor, and local binary pattern (LBP). A comparison has been made using performance indices such as average recognition accuracy, overall recognition accuracy, precision, recall, kappa coefficient, and computation time. Two databases, i.e., Cohn-Kanade (CK+) and Japanese female facial expression (JAFFE), have been used here. Different training-to-testing data division ratios are explored to find the best one from the performance point of view of the three extracted features. Gabor produced 94.8%, which is the best among all in terms of average accuracy, though the computational time required is the highest. LBP showed 88.2% average accuracy with a computational time less than that of Gabor, while HOG showed the minimum average accuracy of 55.2% with the lowest computation time.


Journal ArticleDOI
TL;DR: NanBDOS as discussed by the authors is an adaptive and completely parameter-free synthetic oversampling method, which determines the intrinsic neighbor relationship between instances through a natural neighbor search procedure, and then those informative borderline instances are identified and used as base instances, i.e., sampling seeds.
Abstract: Learning from class-imbalanced data has become a challenging task in machine learning. Oversampling is an effective way to rebalance classes by generating new minority instances. However, most existing oversampling methods exhibit significant ill-posedness due to the involvement of k-nearest neighbors. Recently, several novel techniques have been developed using natural neighbors instead of k-nearest neighbors. However, these emerging techniques fail to take into account the natural distribution of data, resulting in degraded adaptability of natural neighbors to data characteristics. In this paper, we propose an adaptive and completely parameter-free synthetic oversampling method called NanBDOS (short for borderline oversampling via natural neighbor search). NanBDOS first determines the intrinsic neighbor relationship between instances through a natural neighbor search procedure. Then, the informative borderline instances are identified and used as base instances, i.e., sampling seeds. NanBDOS assigns dynamic sampling weights to the base instances so that the data complexity can be well represented. Finally, new synthetic instances are generated by interpolating between the base instances and their natural neighbors. NanBDOS is experimentally compared with seven baseline methods on twenty-four real-world datasets. The results confirm the effectiveness of the proposed method. Moreover, the statistical analysis indicates that it achieves a higher Friedman ranking.
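
The final interpolation step can be sketched generically. This is not the NanBDOS implementation: the natural neighbor search, the borderline detection, and the dynamic weighting are not reproduced, and the neighbor lists and weights are simply taken as inputs. The names `interpolate` and `oversample` are hypothetical.

```python
import random


def interpolate(base, neighbor, rng):
    """Return a random point on the segment between base and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [b + gap * (nb - b) for b, nb in zip(base, neighbor)]


def oversample(bases, neighbors_of, weights, n_new, seed=0):
    """Draw n_new synthetic points; bases are picked in proportion to weights.

    neighbors_of[i] is the (precomputed) list of neighbors of bases[i].
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(bases)), weights=weights, k=1)[0]
        partner = rng.choice(neighbors_of[i])
        synthetic.append(interpolate(bases[i], partner, rng))
    return synthetic
```

A base instance with a larger weight seeds more synthetic points, which is how a weighting scheme can concentrate new samples where the class boundary is hardest.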

Journal ArticleDOI
TL;DR: NeiEA-Net as discussed by the authors optimizes the local neighbors in 3D Euclidean space by taking full advantage of the high-dimensional feature space, and introduces a neighbor feature aggregation module to adaptively aggregate features with different scales in the local neighbors to further reduce redundant information.
Abstract: 3D point cloud semantic segmentation is crucial for 3D environment perception and scene understanding, where learning the local context in point clouds is a key challenge. Existing approaches typically explore local context based on the predefined neighbors of point clouds. However, the widely used K-nearest neighbor algorithm (KNN) is far from optimal in defining local neighbors. In this study, we propose NeiEA-Net, a conceptually simple and effective network for point cloud semantic segmentation. The key to our approach is to optimize the local neighbors in 3D Euclidean space by taking full advantage of the high-dimensional feature space. In addition, we introduce a neighbor feature aggregation module to adaptively aggregate features with different scales in the local neighbors to further reduce redundant information, thereby effectively learning the local details of point clouds. Experiments conducted on three large-scale benchmarks, S3DIS, Toronto3D and SensatUrban, demonstrate the superiority of our network.

Journal ArticleDOI
TL;DR: In this article, a comparison of the Naive Bayes classifier and the K-Nearest Neighbor classifier with TF-IDF weighting was used to detect positive or negative sentiments in public information.
Abstract: Social media is a very popular means of communication in Indonesia, and Twitter is one of the most popular platforms. Twitter is a social media site where people can share information publicly. This information can be processed for sentiment analysis. This research attempts to create a system that can detect positive or negative sentiment in public information. The sentiment classification compares a Naive Bayes classifier and a K-Nearest Neighbor classifier, both using TF-IDF weighting. The input to the system is tweet data about Transjakarta, and the output is a visualization of positive and negative sentiment data built with Streamlit, a Python library. In testing on Twitter data related to the use of Transjakarta transportation, the Naive Bayes approach reached an accuracy of 61.1% and the K-Nearest Neighbor method reached 75.7%. Of the two methods, K-Nearest Neighbor therefore produces the better accuracy.
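
The K-Nearest Neighbor half of this comparison can be sketched as follows; the Naive Bayes half and the Streamlit visualization are not shown. The TF-IDF variant (term frequency times `log(N/df) + 1`), the cosine similarity, and all function names are assumptions, since the abstract does not specify them. Documents are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_sentiment(train_docs, train_labels, query, k=3):
    """Majority label among the k training documents most similar to the query."""
    vectors, idf = tfidf_vectors(train_docs)
    tf = Counter(t for t in query if t in idf)  # ignore unseen terms
    q = {t: (c / len(query)) * idf[t] for t, c in tf.items()}
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy tokens, not data from the study.
    docs = [["bus", "late", "bad"], ["bus", "clean", "good"],
            ["driver", "rude", "bad"], ["service", "fast", "good"]]
    labels = ["neg", "pos", "neg", "pos"]
    print(knn_sentiment(docs, labels, ["bus", "bad"], k=3))
```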