
Showing papers on "Euclidean distance" published in 2019


Journal ArticleDOI
01 Dec 2019
TL;DR: Several similarity measures are defined by combining well-known distances for numerical and binary data, and are used to investigate the performance of k-NN on heterogeneous datasets, where data can be described as a mixture of numerical and categorical features.
Abstract: Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour classification (k-NN) is one of the most popular distance-based algorithms. This classification is based on measuring the distances between the test sample and the training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data can be described as a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data, which is binary data. In this paper, several similarity measures are defined by combining well-known distances for numerical and binary data, and k-NN performance is investigated when classifying such heterogeneous datasets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better for heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data need personalised similarity measures adapted to the data characteristics.
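The abstract does not spell out a single combination rule, so the following is a minimal sketch of one plausible variant: a weighted sum of the Euclidean distance over the numerical features and a normalized Hamming distance over the binary ones. The name mixed_distance and the weight alpha are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def mixed_distance(x, y, num_idx, bin_idx, alpha=0.5):
    """Illustrative combined dissimilarity for heterogeneous samples:
    Euclidean on the numerical features, normalized Hamming on the
    binary ones, mixed by a hypothetical weight alpha."""
    d_num = np.linalg.norm(x[num_idx] - y[num_idx])   # numerical part
    d_bin = np.mean(x[bin_idx] != y[bin_idx])         # binary part, in [0, 1]
    return alpha * d_num + (1 - alpha) * d_bin

# Toy sample: first two features numerical, last three binary.
a = np.array([1.2, 0.5, 1, 0, 1])
b = np.array([0.9, 0.7, 1, 1, 0])
print(mixed_distance(a, b, num_idx=[0, 1], bin_idx=[2, 3, 4]))
```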

124 citations


Proceedings ArticleDOI
01 Nov 2019
TL;DR: This paper proposes a mapping system called FIESTA that builds a global ESDF map incrementally by introducing two independent updating queues for inserting and deleting obstacles separately and by using Indexing Data Structures and Doubly Linked Lists for map maintenance; it has high computational performance and produces near-optimal results.
Abstract: Euclidean Signed Distance Field (ESDF) is useful for online motion planning of aerial robots since it can easily query the distance and gradient information against obstacles. Building the ESDF map incrementally and quickly is the bottleneck for conducting real-time motion planning. In this paper, we investigate this problem and propose a mapping system called FIESTA to build a global ESDF map incrementally. By introducing two independent updating queues for inserting and deleting obstacles separately, and by using Indexing Data Structures and Doubly Linked Lists for map maintenance, our algorithm updates as few nodes as possible using a BFS framework. Our ESDF map has high computational performance and produces near-optimal results. We show our method outperforms other up-to-date methods in terms of performance and accuracy by both theory and experiments. We integrate FIESTA into a complete quadrotor system and validate it by both simulation and onboard experiments. We release our method as open-source software for the community: https://github.com/hlx1996/FIESTA
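FIESTA's incremental machinery (insert/delete queues, doubly linked lists) is beyond a short sketch, but the BFS core it builds on is easy to illustrate. The snippet below computes a batch unsigned distance field on a 2-D grid by multi-source BFS from all obstacle cells; note it yields grid-metric (4-connected) distances, a coarse stand-in for the Euclidean field the paper maintains.

```python
import numpy as np
from collections import deque

def bfs_distance_field(occupancy):
    """Multi-source BFS over a 4-connected grid: every obstacle cell is
    a source, and each free cell receives the grid-metric distance to
    its nearest obstacle. A batch simplification of the incremental
    BFS updates described in the paper."""
    dist = np.full(occupancy.shape, np.inf)
    q = deque()
    for idx in zip(*np.nonzero(occupancy)):   # seed with all obstacle cells
        dist[idx] = 0.0
        q.append(idx)
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]:
                if dist[nr, nc] > dist[r, c] + 1:
                    dist[nr, nc] = dist[r, c] + 1
                    q.append((nr, nc))
    return dist

grid = np.zeros((5, 5), dtype=bool)
grid[2, 2] = True                      # one obstacle in the middle
print(bfs_distance_field(grid))
```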

123 citations


Journal ArticleDOI
TL;DR: A state-of-the-art kernel-based clustering algorithm (SIMLR) is modified to use Pearson's correlation as the similarity measure, and a significant performance improvement over Euclidean distance is found on scRNA-seq data clustering.
Abstract: Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
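A minimal illustration of the headline finding: swap the metric, keep the clustering pipeline. The sketch below plugs a Pearson-based dissimilarity (one minus correlation) into generic hierarchical clustering; the SIMLR modification itself is not reproduced, and the Poisson matrix is toy data, not real scRNA-seq.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pearson_dissimilarity(X):
    """1 - Pearson correlation between rows (cells) of an expression
    matrix, as a drop-in replacement for Euclidean distance."""
    return 1.0 - np.corrcoef(X)

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(20, 100)).astype(float)   # toy cells x genes
D = pearson_dissimilarity(X)
np.fill_diagonal(D, 0.0)                              # exact zeros on the diagonal
Z = linkage(squareform(D, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))         # two-cluster assignment
```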

101 citations


Proceedings Article
24 May 2019
TL;DR: A key feature of the results is that, when the number of tasks grows and their variance is relatively small, the learning-to-learn approach has a significant advantage over learning each task in isolation by Stochastic Gradient Descent without a bias term.
Abstract: We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent on the true risk regularized by the squared Euclidean distance to a bias vector. We present an average excess risk bound for such a learning algorithm. This result quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then address the problem of estimating the bias from a sequence of tasks. We propose a meta-algorithm which incrementally updates the bias, as new tasks are observed. The low space and time complexity of this approach makes it appealing in practice. We provide guarantees on the learning ability of the meta-algorithm. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by Stochastic Gradient Descent without a bias term. We report on numerical experiments which demonstrate the effectiveness of our approach.
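The inner algorithm is concrete enough to sketch: SGD on a task loss plus a squared-Euclidean pull toward the bias vector h. The meta-algorithm that updates h across tasks is not reproduced; the quadratic task and all constants below are toy choices.

```python
import numpy as np

def biased_sgd_step(w, grad_loss, h, lam, lr):
    """One SGD step on the regularized objective
        loss(w) + (lam / 2) * ||w - h||^2,
    where h is the bias vector shared across tasks."""
    return w - lr * (grad_loss + lam * (w - h))

# Toy quadratic task: loss(w) = 0.5 * ||w - w_star||^2, so grad = w - w_star.
w_star = np.array([1.0, -2.0])
h = np.zeros(2)                      # bias vector (updated by the meta-algorithm)
w = np.zeros(2)
for _ in range(100):
    w = biased_sgd_step(w, w - w_star, h, lam=0.1, lr=0.1)
print(w)   # pulled toward w_star, shrunk slightly toward h
```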

93 citations


Journal ArticleDOI
TL;DR: Experimental results show that square chord distance is the most robust and accurate metric and significantly outperforms the commonly used Euclidean distance metric.
Abstract: This paper reports the development of a practical visible light positioning (VLP) system using received signal strength. The indoor localization system is accurate and easy to train and calibrate despite using a fingerprinting technique. The VLP system consists of a cheap photodiode-based receiver and consumer-grade LED luminaires. The impact of the distance metrics used to compute the weights of the weighted $K$-nearest neighbor (WKNN) algorithm on the localization accuracy of the VLP is investigated. Experimental results show that square chord distance is the most robust and accurate metric and significantly outperforms the commonly used Euclidean distance metric. A room-scale implementation shows that a mean error of 2.2 cm and a 90-percentile error of 4.9 cm within a 3.3 m × 2.1 m 2-D floor space are achievable. However, the high localization accuracy comes at the cost of requiring 187 offline measurements to construct the fingerprint database. A method for estimating an optical propagation model using only a handful of measurements is developed to address this problem. This leads to the creation of a dense and accurate fingerprint database through fabricated data. The performance of the VLP system does not degrade noticeably when the localization is performed with the fabricated data. A mean error of 2.7 cm and a 90-percentile error of 5.7 cm are achievable with only 12 offline measurements.
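For reference, the square chord distance and the WKNN weighting it feeds are both one-liners. The sketch below assumes non-negative RSS values (square roots are taken) and inverse-distance weights, which is one common WKNN choice rather than the paper's exact configuration.

```python
import numpy as np

def square_chord(x, y):
    """Square chord distance between two non-negative RSS vectors:
    sum_i (sqrt(x_i) - sqrt(y_i))^2."""
    return np.sum((np.sqrt(x) - np.sqrt(y)) ** 2)

def wknn_estimate(query, fingerprints, positions, k=3, eps=1e-9):
    """WKNN: average the positions of the k nearest fingerprints,
    weighted by inverse distance."""
    d = np.array([square_chord(query, f) for f in fingerprints])
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + eps)
    return (w[:, None] * positions[nn]).sum(axis=0) / w.sum()

rng = np.random.default_rng(1)
fp = np.abs(rng.normal(5, 1, size=(10, 4)))    # toy non-negative RSS database
pos = rng.uniform(0, 3, size=(10, 2))          # reference-point coordinates
print(wknn_estimate(fp[0] + 0.1, fp, pos))     # should land near pos[0]
```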

81 citations


Journal ArticleDOI
TL;DR: The proposed moving object detection method via ResNet-18 with encoder–decoder structure outperforms state-of-the-art algorithms significantly, and its mean F-measure increased by 1.99%–29.17%.
Abstract: In complex scenes, dynamic background, illumination variation, and shadow are important factors, which make conventional moving object detection algorithms suffer from poor performance. To solve this problem, a moving object detection method via ResNet-18 with encoder-decoder structure is proposed to segment moving objects from complex scenes. ResNet-18 with encoder-decoder structure possesses pixel-level classification capability to divide pixels into foreground and background, and it performs well in feature extraction because its layers are shallow enough that many more low-scale features are retained. First, the object frames and their corresponding artificial labels are input to the network. Then, feature vectors are generated by the encoder, and they are converted into segmentation maps by the decoder through deconvolution processing. Third, a rough matching of the moving object regions is obtained, and finally, the Euclidean distance is used to match the moving object regions accurately. The proposed method is suitable for scenes where dynamic background, illumination variation, and shadow exist, and experimental results on the public standard CDnet2014 and I2R datasets, from both qualitative and quantitative comparison aspects, demonstrate that the proposed method outperforms state-of-the-art algorithms significantly, and its mean F-measure increased by 1.99%–29.17%.

77 citations


Journal ArticleDOI
TL;DR: Empirical results show that the proposed algorithm is very competitive against other MaOEAs for solving MaOPs, and two modified compared algorithms are generally more effective than their predecessors.
Abstract: The existing multiobjective evolutionary algorithms (EAs) based on nondominated sorting may encounter serious difficulties in tackling many-objective optimization problems (MaOPs), because the number of nondominated solutions increases exponentially with the number of objectives, leading to a severe loss of selection pressure. To address this problem, some existing many-objective EAs (MaOEAs) adopt Euclidean or Manhattan distance to estimate the convergence of each solution during the environmental selection process. Nevertheless, either Euclidean or Manhattan distance is a special case of Minkowski distance with the order ${P=2}$ or ${P=1}$, respectively. Thus, it is natural to adopt Minkowski distance for convergence estimation, in order to cover various types of Pareto fronts (PFs) with different concavity–convexity degrees. In this paper, a Minkowski distance-based EA is proposed to solve MaOPs. In the proposed algorithm, first, the concavity–convexity degree of the approximate PF, denoted by the value of ${P}$, is dynamically estimated. Subsequently, the Minkowski distance of order ${P}$ is used to estimate the convergence of each solution. Finally, the optimal solutions are selected by a comprehensive method, based on both convergence and diversity. In the experiments, the proposed algorithm is compared with five state-of-the-art MaOEAs on some widely used benchmark problems. Moreover, the modified versions of two compared algorithms, integrated with the proposed ${P}$-estimation method and the Minkowski distance, are also designed and analyzed. Empirical results show that the proposed algorithm is very competitive against other MaOEAs for solving MaOPs, and the two modified compared algorithms are generally more effective than their predecessors.
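The Minkowski distance that generalizes both special cases is easy to state in code. The sketch below shows how the order p interpolates between Manhattan (p=1) and Euclidean (p=2) and, as p grows, approaches the max-coordinate (Chebyshev) distance; the paper's dynamic estimation of P from the front's shape is not reproduced.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
for p in (1, 2, 4):
    print(p, minkowski(x, y, p))   # 7.0, 5.0, then approaching max(3, 4)
```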

76 citations


Journal ArticleDOI
TL;DR: Experimental results on different datasets show that the proposed clustering algorithm outperforms other compared methods in various evaluation metrics; this approach enhances the prediction accuracy and effectively deals with the sparsity problem.
Abstract: Data sparsity is a widespread problem of collaborative filtering (CF) recommendation algorithms. However, some common CF methods cannot adequately utilize all user rating information; they are only able to use a small part of the rating data, depending on the co-rated items, which leads to low prediction accuracy. To alleviate this problem, a novel K-medoids clustering recommendation algorithm based on probability distribution for CF is proposed. The proposed scheme makes full use of all rating information based on Kullback–Leibler (KL) divergence from the perspective of item rating probability distribution, and distinguishes different items efficiently when selecting the cluster centers. Meanwhile, the distance model breaks the symmetric mode of classic geometric distance methods (such as Euclidean distance) and considers the effects of different rating numbers between items to emphasize their asymmetric relationship. Experimental results on different datasets show that the proposed clustering algorithm outperforms other compared methods in various evaluation metrics; this approach enhances the prediction accuracy and effectively deals with the sparsity problem.
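The key ingredient, an asymmetric divergence between item rating distributions, can be sketched directly. Below, KL divergence is computed over toy five-star rating distributions; the eps smoothing and the example numbers are illustrative, and the full K-medoids scheme with rating-count weighting is not reproduced.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two rating distributions; note it is asymmetric,
    which is the property the proposed distance model exploits."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy items: probability of receiving ratings 1..5.
item_a = [0.05, 0.10, 0.20, 0.40, 0.25]
item_b = [0.30, 0.30, 0.20, 0.10, 0.10]
print(kl_divergence(item_a, item_b), kl_divergence(item_b, item_a))  # unequal
```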

69 citations


Proceedings Article
25 Jun 2019
TL;DR: In this article, the authors consider the problem of approximating a given function in ReLU networks with an unbounded number of units, but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded.
Abstract: We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function. For functions $f : \mathbb R \rightarrow \mathbb R$ and a single hidden layer, we show that the minimal network norm for representing $f$ is $\max(\int |f''(x)| dx, |f'(-\infty) + f'(+\infty)|)$, and hence the minimal norm fit for a sample is given by a linear spline interpolation.
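For a sample fit by the linear spline interpolant, the paper's formula becomes computable by hand: the integral of |f''| collapses to the total variation of the spline's slopes, and the tail term is the sum of the first and last slopes. The sketch below follows that reading and is an assumption-laden illustration, not the paper's code.

```python
import numpy as np

def minimal_network_norm(x, y):
    """Minimal Euclidean network norm needed to fit (x, y) exactly with
    a one-hidden-layer ReLU net, per the paper's formula, evaluated on
    the linear spline interpolant of the sample."""
    order = np.argsort(x)
    xs = np.asarray(x, float)[order]
    ys = np.asarray(y, float)[order]
    slopes = np.diff(ys) / np.diff(xs)
    tv = np.sum(np.abs(np.diff(slopes)))   # integral of |f''(x)| dx for the spline
    tails = abs(slopes[0] + slopes[-1])    # |f'(-inf) + f'(+inf)|
    return max(tv, tails)

print(minimal_network_norm([0, 1, 2, 3], [0, 1, 0, 1]))  # slopes 1, -1, 1 -> 4.0
```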

68 citations


Journal ArticleDOI
TL;DR: A novel dissimilarity measure, designed as a function of a product-based Gaussian membership function and obtained by extending the similarity function proposed in earlier research (G-Spamine), is proposed, and the correctness and completeness of the proposed approach are also proved analytically.
Abstract: Time profiled association mining is one of the important and challenging research problems that remains relatively under-addressed. Time profiled association mining has two main challenges that must be addressed. These include i) a dissimilarity measure that also holds the monotonicity property and can efficiently prune itemset associations and ii) approaches for estimating the prevalence values of itemset associations over time. The pioneering research that addressed time profiled association mining is by J.S. Yoo, using Euclidean distance. It is a widely known fact that this distance measure suffers from high dimensionality. Given a time-stamped transaction database, time profiled association mining refers to the discovery of underlying and hidden time profiled itemset associations whose true prevalence variations are similar to the user query sequence under subset constraints that include i) an allowable dissimilarity value, ii) a reference query time sequence, and iii) a dissimilarity function that can find the degree of similarity between a temporal itemset and the reference. In this paper, we propose a novel dissimilarity measure designed as a function of a product-based Gaussian membership function, extending the similarity function proposed in our earlier research (G-Spamine). Our approach, MASTER (Mining of Similar Temporal Associations), which is primarily inspired by SPAMINE, uses the dissimilarity measure proposed in this paper and the support bound estimation approach proposed in our earlier research. Expressions for computing the distance bounds of temporal patterns are designed considering the proposed measure and support estimation approach. Experiments are performed considering naive, sequential, Spamine and G-Spamine approaches under various test case considerations that study the scalability and computational performance of the proposed approach. Experimental results prove the scalability and efficiency of the proposed approach. The correctness and completeness of the proposed approach are also proved analytically.

64 citations


Journal ArticleDOI
01 Mar 2019
TL;DR: This study presents a predictor that obtains a high-quality particle swarm by calculating non-linear variations of the ranging between particles and flags and by modifying the reference distribution function, which can effectively improve the positioning accuracy and reduce the positioning error of target nodes.
Abstract: The particle degradation problem of the particle filter (PF) algorithm, caused by the reduction of particle weights, significantly influences the positioning accuracy of target nodes in wireless sensor networks. This study presents a predictor that obtains a high-quality particle swarm by calculating non-linear variations of the ranging between particles and flags and by modifying the reference distribution function. To this end, probability variations of the distances between particles and star flags are calculated, and the maximum inclusive distance is obtained using the maximum probability of the high-quality particle swarm. The quality of particles is valued by the Euclidean distance between the predicted and real observations, and particles of high quality are then contained in a spherical coordinate system using this distance as the diameter. The simulation results show that the proposed algorithm is robust and the computational complexity is low. The method can effectively improve the positioning accuracy and reduce the positioning error of target nodes.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed density peak (DP) clustering-based noisy label detection method indeed helps in improving the classification performance.
Abstract: Mislabeled training samples may have a negative effect on the performance of hyperspectral image classification. In order to solve this problem, a new density peak (DP) clustering-based noisy label detection method is proposed, which consists of the following steps. First, the distances among the training samples of each class are calculated using four representative distance metrics, i.e., the Euclidean distance (ED), orthogonal projection divergence (OPD), spectral information divergence (SID), and correlation coefficient (CC). Then, the local density of each training sample can be obtained using the DP clustering algorithm. Finally, a local density-based decision function is used to detect the noisy labels. The effectiveness of the proposed method is evaluated using the support vector machines on several real hyperspectral data sets. Experimental results demonstrate that the proposed noisy label detection method indeed helps in improving the classification performance.
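The local density at the heart of DP clustering is compact enough to sketch. Below is the common Gaussian-kernel form, rho_i = sum_j exp(-(d_ij/d_c)^2), computed with Euclidean distances (the paper also tests OPD, SID, and CC); the cutoff dc and the thresholding step are simplified assumptions, not the paper's full decision function.

```python
import numpy as np
from scipy.spatial.distance import cdist

def local_density(X, dc):
    """Gaussian-kernel local density from density peak (DP) clustering.
    Samples of a class with unusually low density are candidate
    mislabeled points."""
    d = cdist(X, X)                                   # pairwise Euclidean distances
    return np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # exclude self-contribution

X = np.vstack([np.random.default_rng(3).normal(0, 0.1, size=(20, 5)),
               np.random.default_rng(4).normal(3, 0.1, size=(1, 5))])  # one outlier
rho = local_density(X, dc=0.5)
print(np.argmin(rho))   # 20: the outlier has the lowest density
```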

Proceedings ArticleDOI
22 Apr 2019
TL;DR: It is shown that this method outperforms both low-dimension representation techniques based on principal component analysis (PCA) and sparse reconstruction using Gaussian-windowed Fourier dictionary, and can achieve very high classification rates.
Abstract: We introduce a simple but effective technique in automatic hand gesture recognition using radar. The proposed technique classifies hand gestures based on the envelopes of their micro-Doppler (MD) signatures. These envelopes capture the distinctions among different hand movements and their corresponding positive and negative Doppler frequencies that are generated during each gesture act. We detect the positive and negative frequency envelopes of MD separately, and form a feature vector of their augmentation. We use the k-nearest neighbor (kNN) classifier and Manhattan distance (L1) measure, in lieu of Euclidean distance (L2), so as not to diminish small but critical envelope values. It is shown that this method outperforms both low-dimension representation techniques based on principal component analysis (PCA) and sparse reconstruction using Gaussian-windowed Fourier dictionary, and can achieve very high classification rates.
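Since the classifier is off-the-shelf kNN with an L1 metric, the setup reduces to a few lines, shown below with random stand-ins for the envelope feature vectors (the real features come from the positive/negative MD signature envelopes).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# kNN on envelope feature vectors with the L1 (Manhattan) metric,
# so small but critical envelope values are not squared away.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))        # stand-in for augmented envelope features
y = rng.integers(0, 3, size=60)      # stand-in gesture labels
clf = KNeighborsClassifier(n_neighbors=5, metric="manhattan").fit(X, y)
print(clf.predict(X[:3]))
```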

Journal ArticleDOI
TL;DR: In this paper, a free-form curve registration method is applied to an efficient RGB-D visual odometry system called Canny-VO, as it efficiently tracks all Canny edge features extracted from the images.
Abstract: This paper reviews the classical problem of free-form curve registration and applies it to an efficient RGB-D visual odometry system called Canny-VO, as it efficiently tracks all Canny edge features extracted from the images. Two replacements for the distance transformation commonly used in edge registration are proposed: approximate nearest neighbor fields and oriented nearest neighbor fields. 3-D–2-D edge alignment benefits from these alternative formulations in terms of both efficiency and accuracy. It removes the need for the more computationally demanding paradigms of data-to-model registration, bilinear interpolation, and subgradient computation. To ensure robustness of the system in the presence of outliers and sensor noise, the registration is formulated as a maximum a posteriori problem and the resulting weighted least-squares objective is solved by the iteratively reweighted least-squares method. A variety of robust weight functions are investigated and the optimal choice is made based on the statistics of the residual errors. Efficiency is furthermore boosted by an adaptively sampled definition of the nearest neighbor fields. Extensive evaluations on public SLAM benchmark sequences demonstrate state-of-the-art performance and an advantage over classical Euclidean distance fields.

Journal ArticleDOI
18 May 2019-Sensors
TL;DR: Compared with the traditional methods, the proposed position label-assisted (PL-assisted) clustering result can reflect the position distribution of RPs, and the proposed SWED-based WKNN (SWED-WKNN) algorithm can significantly improve the positioning accuracy.
Abstract: WiFi fingerprint positioning has been widely used in the indoor positioning field. The weighted K-nearest neighbor (WKNN) algorithm is one of the most widely used deterministic algorithms. The traditional WKNN algorithm uses the Euclidean distance or Manhattan distance between the received signal strengths (RSS) as the distance measure to judge the physical distance between points. However, the relationship between the RSS and the physical distance is nonlinear, so using the traditional Euclidean distance or Manhattan distance to measure the physical distance will lead to errors in positioning. In addition, the traditional RSS-based clustering algorithm only takes the signal distance between the RSS as the clustering criterion, without considering the position distribution of reference points (RPs). Therefore, to improve the positioning accuracy, we propose an improved WiFi positioning method based on fingerprint clustering and signal weighted Euclidean distance (SWED). The proposed algorithm is tested by experiments conducted in two experimental fields. The results indicate that, compared with the traditional methods, the proposed position label-assisted (PL-assisted) clustering result can reflect the position distribution of RPs and the proposed SWED-based WKNN (SWED-WKNN) algorithm can significantly improve the positioning accuracy.
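The paper's exact SWED weighting is not given in the abstract, so the sketch below is only one plausible reading: squared RSS differences weighted per access point, with stronger (presumably more reliable) signals up-weighted. The weight formula is a labeled guess, not the published one.

```python
import numpy as np

def swed(rss_query, rss_rp):
    """Hypothetical signal-weighted Euclidean distance: per-AP weights
    derived from the query RSS (stronger signal -> larger weight).
    The actual weighting in the paper may differ."""
    w = np.exp(rss_query / 10.0)      # assumed reliability weighting (dBm input)
    w = w / w.sum()
    return np.sqrt(np.sum(w * (rss_query - rss_rp) ** 2))

q = np.array([-45.0, -60.0, -75.0])   # query RSS from three APs (dBm)
rp = np.array([-48.0, -58.0, -80.0])  # one reference-point fingerprint
print(swed(q, rp))
```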

Journal ArticleDOI
TL;DR: A novel robust scale ICP algorithm is proposed by introducing maximum correntropy criterion (MCC) as the similarity measure, which greatly outperforms state-of-the-art methods in terms of matching accuracy and run-time, especially when the data contain severe outliers.

Journal ArticleDOI
TL;DR: The first polynomial-time approximation schemes (PTASs) for the following problems are given: (1) uniform facility location in edge-weighted planar graphs; (2) $k$-median and $ k$-means in Edge-weight...
Abstract: We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs; (2) $k$-median and $k$-means in edge-weight...

Journal ArticleDOI
TL;DR: FMST is a good and practicable neuron reconstruction algorithm that can be implemented in the Vaa3D platform as a neuron tracing plugin; it is one of the two methods with the best performance among all 27 state-of-the-art reconstruction methods.
Abstract: Neuron reconstruction is an important technique in computational neuroscience. Although there are many reconstruction algorithms, few can generate robust results. In this paper, we propose a reconstruction algorithm called fast marching spanning tree (FMST). FMST is based on a minimum spanning tree method (MST) and improves its performance in two aspects: faster implementation and no loss of small branches. The contributions of the proposed method are as follows. Firstly, the Euclidean distance weight of edges in the MST is improved to a more reasonable value, which is related to the probability of the existence of an edge. Secondly, a strategy for pruning nodes is presented, which is based on the radius of a node's inscribed ball. Thirdly, separate branches of broken neuron reconstructions can be merged into a single tree. FMST and many other state-of-the-art reconstruction methods were implemented on two datasets: 120 Drosophila neurons and 163 neurons with gold standard reconstructions. Qualitative and quantitative analysis of the experimental results demonstrates that the performance of FMST is good compared with many existing methods. In particular, on the 91 fruit fly neurons with gold standard reconstructions, evaluated by five metrics, FMST is one of the two methods with the best performance among all 27 state-of-the-art reconstruction methods. FMST is a good and practicable neuron reconstruction algorithm and can be implemented in the Vaa3D platform as a neuron tracing plugin.

Journal ArticleDOI
TL;DR: This study provides a comprehensive overview of the advantages and limitations of the two widely-used CNN frameworks in the PReID community, and presents a hybrid model that combines the advantages of both identification and triplet models.

Journal ArticleDOI
TL;DR: A method for heterogeneous synthetic aperture radar (SAR) image and optical image change detection is proposed, based on a pixel-level mapping method and a capsule network with a deep structure, and it obtains satisfactory performance.
Abstract: Homogeneous image change detection research has been well developed, and many methods have been proposed. However, change detection between heterogeneous images is challenging since heterogeneous images are in different domains. Therefore, heterogeneous images are difficult to compare directly in the usual way. In this paper, a method for heterogeneous synthetic aperture radar (SAR) image and optical image change detection is proposed, which is based on a pixel-level mapping method and a capsule network with a deep structure. The proposed mapping method transforms an image from one feature space to another feature space. Then, the images can be compared directly in the transformed space. In the mapping process, some image blocks in unchanged areas are selected; these blocks are only a small part of the image. Then, the weighting parameters are acquired by calculating the Euclidean distances between the pixel to be transformed and the pixels in these blocks. The Euclidean distance calculated according to the weighted coordinates is taken as the pixel gray value in the other feature space. The other image is transformed in a similar manner. In the transformed feature space, these images are compared, and the fusion of the two different images is achieved. The two experimental images are input to a capsule network, which has a deep structure. The image fusion result is taken as the training labels. The training samples are selected according to the ratio of the center pixel label and its neighboring pixels' labels. The capsule network can improve the detection result and suppress noise. Experiments on remote sensing datasets show the final detection results, and the proposed method obtains a satisfactory performance.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper proposes a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning and proposes Deep SNR-based Metric Learning (DSML) to generate discriminative feature embeddings.
Abstract: Deep metric learning, which learns discriminative features to process image clustering and retrieval tasks, has attracted extensive attention in recent years. A number of deep metric learning methods, which ensure that similar examples are mapped close to each other and dissimilar examples are mapped farther apart, have been proposed to construct effective structures for loss functions and have shown promising results. In this paper, different from the approaches on learning the loss structures, we propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. By exploring the properties of our SNR distance metric from the perspectives of geometry and statistical theory, we analyze the properties of our metric and show that it can preserve the semantic similarity between image pairs, which well justifies its suitability for deep metric learning. Compared with the Euclidean distance metric, our SNR distance metric can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features. Leveraging our SNR distance metric, we propose Deep SNR-based Metric Learning (DSML) to generate discriminative feature embeddings. Through extensive experiments on three widely adopted benchmarks, including CARS196, CUB200-2011 and CIFAR10, our DSML has shown its superiority over other state-of-the-art methods. Additionally, we extend our SNR distance metric to deep hashing learning, and conduct experiments on two benchmarks, including CIFAR10 and NUS-WIDE, to demonstrate the effectiveness and generality of our SNR distance metric.
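The metric itself is one line once embeddings are treated as random vectors: the anchor's variance is the signal and the variance of the pair difference is the noise. A numpy sketch of this formulation, using toy vectors rather than learned features:

```python
import numpy as np

def snr_distance(anchor, other):
    """SNR-style distance: treat (other - anchor) as noise and the
    anchor as signal, d = Var(other - anchor) / Var(anchor)."""
    return float(np.var(other - anchor) / np.var(anchor))

a = np.random.default_rng(0).normal(size=128)    # toy anchor embedding
b = a + 0.1 * np.random.default_rng(1).normal(size=128)  # similar pair
c = np.random.default_rng(2).normal(size=128)             # dissimilar pair
print(snr_distance(a, b), snr_distance(a, c))    # similar pair scores much lower
```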

Journal ArticleDOI
TL;DR: In this article, three hedonic pricing models, including an OLS model, a Euclidean distance-based (ED-based) geographically weighted regression (GWR), and a travel t...
Abstract: In this research, three hedonic pricing models, including an ordinary least squares (OLS) model, a Euclidean distance–based (ED-based) geographically weighted regression (GWR) model, and a travel t...

Journal ArticleDOI
TL;DR: A novel deep dictionary representation-based classification scheme, where a convolutional neural network is employed as the feature extractor and followed by a dictionary to linearly code the extracted deep features, which is robust to large contiguous occlusion.
Abstract: Deep learning has achieved exciting results in face recognition; however, the accuracy is still unsatisfying for occluded faces. To improve the robustness for occluded faces, this paper proposes a novel deep dictionary representation-based classification scheme, where a convolutional neural network is employed as the feature extractor and is followed by a dictionary to linearly code the extracted deep features. The dictionary is composed of a gallery part consisting of the deep features of the training samples and an auxiliary part consisting of the mapping vectors acquired from the subjects either inside or outside the training set and associated with the occlusion patterns of the testing face samples. A squared Euclidean norm is used to regularize the coding coefficients. The proposed scheme is computationally efficient and is robust to large contiguous occlusion. In addition, the proposed scheme is generic for both occluded and non-occluded face images and works with a single training sample per subject. The extensive experimental evaluations demonstrate the superior performance of the proposed approach over other state-of-the-art algorithms.
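The coding step with a squared Euclidean regularizer is ridge regression and has a closed form, sketched below; the dictionary here is random rather than deep gallery features plus occlusion atoms, and the class-wise residual classification step is omitted.

```python
import numpy as np

def ridge_code(D, y, lam):
    """Coding with an L2 (squared Euclidean) regularizer on the
    coefficients: argmin_a ||y - D a||^2 + lam ||a||^2, with closed
    form a = (D^T D + lam I)^{-1} D^T y."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ y)

rng = np.random.default_rng(0)
D = rng.normal(size=(512, 40))               # stand-in dictionary atoms
y = D @ rng.normal(size=40) + 0.01 * rng.normal(size=512)  # noisy test feature
a = ridge_code(D, y, lam=0.1)
print(a[:5])                                 # recovered coding coefficients
```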

Journal ArticleDOI
TL;DR: This paper designs teacher and supervise dual stacked auto-encoder (TSSAE) for quality-relevant fault detection in industrial process which separates the feature extraction and model construction.

Journal ArticleDOI
TL;DR: In this article, the authors propose a statistical-based recognition method to deal with driver behaviour uncertainty in driving style recognition, extracting discriminative features using the conditional kernel density function to characterise path-following behaviour.
Abstract: Driving style recognition plays a crucial role in eco-driving, road safety, and intelligent vehicle control. This study proposes a statistical-based recognition method to deal with driver behaviour uncertainty in driving style recognition. First, the authors extract discriminative features using the conditional kernel density function to characterise path-following behaviour. Meanwhile, the posterior probability of each selected feature is computed based on the full Bayesian theory. Second, they develop an efficient Euclidean distance-based method to recognise the path-following style for new input datasets at a low computational cost. By comparing the Euclidean distance of each pair of elements in the feature vector, they classify driving styles into seven levels from normal to aggressive. Finally, they employ a cross-validation method to evaluate the utility of their proposed approach by comparing it with a fuzzy logic (FL) method. The experiment results show that the proposed statistical-based recognition method integrated with the kernel density is more efficient and robust than the FL method.

Journal ArticleDOI
TL;DR: A novel supervised dimensionality reduction method, named feature space to feature space distance metric learning (FSDML), is presented; it displays superior robustness through the local projection of the S2S distance metric.

Journal ArticleDOI
TL;DR: Results show that the accessibility to parks at all hierarchical levels is high, particularly at the natural level, but the disparity between supply and demand is significant, which may provide implications on access to urban green spaces for urban planners and authorities to develop effective planning strategies.
Abstract: Urban green spaces play a critical role in public health and human wellbeing for urban residents. Due to the uneven spatial distribution of urban green spaces in most cities, the issue of the disparity between supply and demand has aroused public concern. In a case study of Shenzhen, a modified Gaussian-based two-step floating catchment area (2SFCA) method is adopted to evaluate the disparity between park provision and the demanders in terms of accessibility at hierarchical levels under four types of distance (i.e., Euclidean distance, walking distance, bicycling distance, and driving distance), which is well aligned with hierarchical systems of urban green spaces in urban planning practice. By contrast and correlation analysis, among the four types of distance, the statistical correlations between Euclidean distance and the other three are relatively high. Nonetheless, the pattern of spatial accessibility varies markedly with the type of travel distance. Accessibility calculated by Euclidean distance is overestimated relative to that of the other three, while the patterns for walking distance and bicycling distance are similar to each other. The choice of distance type therefore warrants caution when evaluating spatial accessibility by the 2SFCA method. Results show that the accessibility to parks at all hierarchical levels is high, particularly at the natural level. However, the disparity between the supply and demand is significant. The percentage of communities that have high population density but low park accessibility is over 40% (equivalent to approximately 55% of the population). The findings may provide implications on access to urban green spaces for urban planners and authorities to develop effective planning strategies.
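The 2SFCA computation is mechanical once the decay kernel is fixed. The sketch below uses a plain Gaussian decay truncated at a catchment radius d0 as a simplified stand-in for the paper's modified kernel; the supply, demand, and distance numbers are toy values.

```python
import numpy as np

def gaussian_2sfca(dist, supply, demand, d0):
    """Two-step floating catchment area with a truncated Gaussian decay.
    dist[i, j] is the distance from demand location i to park j."""
    w = np.where(dist <= d0, np.exp(-0.5 * (dist / d0) ** 2), 0.0)
    # Step 1: supply-to-demand ratio at each park.
    R = supply / np.maximum((w * demand[:, None]).sum(axis=0), 1e-12)
    # Step 2: accessibility at each demand location.
    return (w * R[None, :]).sum(axis=1)

dist = np.array([[0.5, 2.0], [1.5, 0.8], [3.0, 1.0]])  # 3 communities, 2 parks (km)
acc = gaussian_2sfca(dist, supply=np.array([10.0, 6.0]),
                     demand=np.array([100.0, 80.0, 120.0]), d0=2.0)
print(acc)   # per-community accessibility scores
```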

Journal ArticleDOI
TL;DR: The authors propose a minimum spanning tree (MST)-based anomaly detection method, which is compared with 13 popular anomaly detection methods on 20 benchmark data sets, and demonstrates a considerable improvement in its ability of identifying anomalies.
Abstract: Anomalies are data points or a cluster of data points that lie away from the neighboring points or clusters and are inconsistent with the overall pattern of the data. Anomaly detection techniques help distinguish the anomalous observations from the regular ones, and thus provide the basis for developing a standard performance guideline for process control. The process of identifying anomalies becomes complicated in the absence of labeled training data as in supervised learning. Moreover, the Euclidean distance between two points may fail to reflect the intrinsic structural distance imposed by the underlying manifold structure. In this paper, the authors propose a minimum spanning tree (MST)-based anomaly detection method. The merit of the method is that an MST provides a new distance measure, capable of capturing the relative connectedness of data points/clusters in a complicated manifold, and could be a better (dis)similarity metric than the simple Euclidean distance to identify anomalies in unsupervised learning settings. The proposed method is compared with 13 popular anomaly detection methods on 20 benchmark data sets, demonstrating a considerable improvement in its ability of identifying anomalies. Furthermore, the MST-based anomaly detection is applied to the data set from a hydropower turbine and demonstrates remarkable detection competence. Note to Practitioners —This paper is motivated by the problem of unsupervised anomaly detection in a hydropower generation plant, which operates with turbine systems that are instrumented with dozens of sensors. Each turbine has subcomponents or functional areas such as several bearing systems, a generator, and so on. Sensors collect various types of data in real time such as the temperature of the oil inside the bearing systems, temperature of the bearings, ambient temperature, vibrations in each functional area, a variety of harmonics in functional areas, temperature of the coil in the generator, and many more. In total, each turbine collects more than 200 attributes from its sensors. The sensor data are then stored in a control system and kept as time-stamped historical data points. When a service/maintenance engineer suspects that there is a malfunction in a turbine, she/he extracts a data set from the control system that contains the collected sensor data for that turbine for the selected period of time (a few weeks to a few months), and then stores these data in a relational database or simply in a comma-separated values (CSV) file for further analysis. The objective is to efficiently identify and isolate anomalies in the turbines. Toward this goal, we propose a new solution to this challenging problem: an unsupervised method based on the concept of the MST. The proposed method can be used as a competitive tool to aid the practitioners in their search for anomalies for making their systems better.
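One simple MST-derived anomaly score is sketched below: each point is scored by the length of its shortest incident MST edge, i.e. its distance to its nearest neighbor in the tree, so a point attached to the rest of the data only through a long edge scores high. This is a proxy for, not a reproduction of, the paper's measure.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def mst_anomaly_scores(X):
    """Score each point by its shortest incident MST edge length."""
    mst = minimum_spanning_tree(cdist(X, X)).toarray()
    sym = np.maximum(mst, mst.T)          # make the edge matrix symmetric
    sym[sym == 0] = np.inf                # zero entries mean "no edge"
    return sym.min(axis=1)

X = np.vstack([np.random.default_rng(0).normal(0, 0.3, size=(30, 2)),
               [[4.0, 4.0]]])            # one far-away point
print(np.argmax(mst_anomaly_scores(X)))  # 30: the injected anomaly
```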

Journal ArticleDOI
17 Jul 2019
TL;DR: A novel metric loss named angular triplet-center loss is proposed, which directly optimizes the cosine distances between the features and adopts an angle margin to provide more explicit discriminative constraints on the embedding space, achieving state-of-the-art results on various 3D shape datasets.
Abstract: How to obtain the desirable representation of a 3D shape, which is discriminative across categories and polymerized within classes, is a significant challenge in 3D shape retrieval. Most existing 3D shape retrieval methods focus on capturing strong discriminative shape representation with softmax loss for the classification task, while the shape feature learning with metric loss is neglected for 3D shape retrieval. In this paper, we address this problem based on the intuition that the cosine distance of shape embeddings should be close enough within the same class and far away across categories. Since most 3D shape retrieval tasks use the cosine distance of shape features for measuring shape similarity, we propose a novel metric loss named angular triplet-center loss, which directly optimizes the cosine distances between the features. It inherits the triplet-center loss property to achieve larger inter-class distance and smaller intra-class distance simultaneously. Unlike previous metric losses utilized in 3D shape retrieval methods, where Euclidean distance is adopted and the margin design is difficult, the proposed method is more convenient for training feature embeddings and more suitable for 3D shape retrieval. Moreover, the angle margin is adopted to replace the cosine margin in order to provide more explicit discriminative constraints on an embedding space. Extensive experimental results on two popular 3D object retrieval benchmarks, ModelNet40 and ShapeNet Core55, demonstrate the effectiveness of our proposed loss, and our method has achieved state-of-the-art results on various 3D shape datasets.
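A single-sample sketch of the angular triplet-center idea: push the angle to the true class center below the smallest angle to any other center by a margin. Real training operates on batches with learned centers; the centers, margin, and embedding below are toy values, not the paper's configuration.

```python
import numpy as np

def angular_triplet_center_loss(f, centers, label, margin=0.1):
    """Hinge on angles: require the angle to the ground-truth center to
    be at least `margin` radians smaller than the angle to the nearest
    non-matching center."""
    cos = centers @ f / (np.linalg.norm(centers, axis=1) * np.linalg.norm(f))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))   # angles to all class centers
    neg = np.min(np.delete(theta, label))        # nearest non-matching center
    return max(0.0, theta[label] + margin - neg)

centers = np.eye(3)                     # toy class centers in R^3
f = np.array([0.9, 0.1, 0.0])           # toy shape embedding
print(angular_triplet_center_loss(f, centers, label=0))  # 0.0: margin satisfied
```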