
Showing papers on "Mahalanobis distance published in 2010"


Journal ArticleDOI
TL;DR: This article proposed a set of multidimensional measures, including economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness, as well as geographic distance.
Abstract: Cross-national distance is a key concept in the field of management. Previous research has conceptualized and measured cross-national differences mostly in terms of dyadic cultural distance, and has used the Euclidean approach to measuring it. In contrast, our goal is to disaggregate the construct of distance by proposing a set of multidimensional measures, including economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness as well as geographic distance. We ground our analysis and choice of empirical dimensions on institutional theories of national business, governance, and innovation systems. In order to overcome the methodological limitations of the Euclidean approach, we calculate dyadic distances using the Mahalanobis method, which is scale-invariant and takes into consideration the variance–covariance matrix. We empirically analyze four different foreign expansion choices of US companies to illustrate the importance of disaggregating the distance construct and the usefulness of our distance calculations, which we make freely available to managers and scholars.
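The scale-invariance the authors invoke for the Mahalanobis method can be illustrated with a small NumPy sketch. The "country profiles" below are made-up data with deliberately mismatched scales across dimensions; this is an illustration of the metric's property, not the paper's dataset or code:

```python
import numpy as np

def mahalanobis(x, y, VI):
    """Mahalanobis distance between vectors x and y, given inverse covariance VI."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(d @ VI @ d))

# Toy "country profiles": rows = countries, columns = distance dimensions
# (e.g. economic, cultural, geographic) on very different scales.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(50, 3)) * np.array([100.0, 1.0, 10.0])

VI = np.linalg.inv(np.cov(profiles, rowvar=False))
d_maha = mahalanobis(profiles[0], profiles[1], VI)

# Rescaling a dimension (e.g. changing its units) leaves the Mahalanobis
# distance unchanged, unlike the Euclidean distance, because the
# variance-covariance matrix absorbs the rescaling.
scaled = profiles * np.array([0.01, 1.0, 1.0])
VI_s = np.linalg.inv(np.cov(scaled, rowvar=False))
d_scaled = mahalanobis(scaled[0], scaled[1], VI_s)
```

The same rescaling would change a Euclidean dyadic distance dramatically, which is the methodological limitation the paper sets out to overcome.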

981 citations


Journal ArticleDOI
01 Jul 2010
TL;DR: A modified version of the support vector regression (SVR) is presented to solve the load forecasting problem and exhibits superior performance compared to that of LWR, local SVR, and other published models.
Abstract: The forecasting of electricity demand has become one of the major research fields in electrical engineering. Accurately estimated forecasts are an essential part of efficient power system planning and operation. In this paper, a modified version of the support vector regression (SVR) is presented to solve the load forecasting problem. The proposed model is derived by modifying the risk function of the SVR algorithm with the use of locally weighted regression (LWR) while keeping the regularization term in its original form. In addition, a weighted distance algorithm based on the Mahalanobis distance for optimizing the weighting function's bandwidth is proposed to improve the accuracy of the algorithm. The performance of the new model is evaluated with two real-world datasets and compared with the local SVR and some published models using the same datasets. The results show that the proposed model exhibits superior performance compared to that of LWR, local SVR, and other published models.
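The core idea of weighting training points by a Mahalanobis-based kernel, as used in the modified risk function, might be sketched as follows. The Gaussian form of the weighting function and the fixed bandwidth are illustrative assumptions; the paper optimizes the bandwidth:

```python
import numpy as np

def mahalanobis_weights(X, x_query, bandwidth):
    """Locally-weighted-regression style weights for training inputs X around
    a query point, with distances measured by the Mahalanobis metric of X.
    `bandwidth` is held constant here; the paper tunes it for accuracy."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    diffs = X - x_query
    # Squared Mahalanobis distance of every row to the query point.
    d2 = np.einsum('ij,jk,ik->i', diffs, VI, diffs)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))
```

Points close to the query (in the correlation-aware sense) get weights near 1, so the local model is dominated by genuinely similar load conditions rather than by whichever raw feature happens to have the largest scale.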

260 citations


Journal ArticleDOI
01 Oct 2010-Forestry
TL;DR: In a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of total volume and biomass, using three distance measures (Euclidean, Mahalanobis and Most Similar Neighbour) as well as a regression tree-based classifier (Random Forest).
Abstract: In a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of the total volume and biomass, using three distance measures (Euclidean, Mahalanobis and Most Similar Neighbour) as well as a regression tree-based classifier (Random Forest). The performances of nearest neighbour (NN) approaches were examined by means of relative bias and root mean squared error. The original high-dimensional dataset was pruned using an evolutionary genetic algorithm search with a NN classification scenario, as well as by a stepwise selection. The genetic algorithm (GA)-selected variables showed improved performance when applying Euclidean and Mahalanobis distances for predictions, whereas Most Similar Neighbour and Random Forests worked more precisely with the full dataset. The GA search proved to be unstable in multiple runs because of intercorrelations among the high-dimensional predictors. The selected datasets are dominated by LiDAR height metrics. Furthermore, the LiDAR-based metrics showed major relevance in predicting both response variables examined here. Random Forest proved superior to the other examined NN methods and was eventually used for a wall-to-wall mapping of predictions on a grid of 20 × 20 m spatial resolution.

202 citations


Journal ArticleDOI
TL;DR: The ability of the D2 and other squared Euclidean‐based statistics to approximate a genetic relationship matrix and Sewall Wright's fixation index using phenotypic data, and the inability of the MMD to do so, are addressed.
Abstract: The mean measure of divergence (MMD) distance statistic has been used by researchers for nearly 50 years to assess inter-sample phenetic affinity. Its widespread and often successful use is well documented, especially in the study of cranial and dental nonmetric traits. However, the statistic has accumulated some undesired mathematical baggage through the years from various workers in their attempts to improve or alter its performance. Others may not fully understand how to apply the MMD or interpret its output, whereas some described a number of perceived shortcomings. As a result, the statistic and its sometimes flawed application(s) have taken several well-aimed hits; a few researchers even argued that it should no longer be utilized or, at least, that its use be reevaluated. The objective of this report is to support the MMD, and in the process: (1) provide a brief history of the statistic, (2) review its attributes and applicability relative to the often-used Mahalanobis D2 statistic for nonmetric traits, (3) compare results from MMD and D2 model-free analyses of previously-recorded sub-Saharan African dental samples, and (4) investigate its utility for model-bound analyses. In the latter instance, the ability of the D2 and other squared Euclidean-based statistics to approximate a genetic relationship matrix and Sewall Wright's fixation index using phenotypic data, and the inability of the MMD to do so, is addressed. Three methods for obtaining such results with nonlinear MMD distances, as well as an assessment of the fit of the isolation-by-distance model, are presented. Am. J. Hum. Biol., 2010. © 2009 Wiley-Liss, Inc.

124 citations


Journal ArticleDOI
TL;DR: In this paper, a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D was studied, and a linear time (1+ϵ)-approximation algorithm was given for the problem in an arbitrary metric space with bounded doubling dimension.
Abstract: We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors D(P,C) = ∑_{p∈P} min_{c∈C} D(p,c) is minimized. The main result in this article can be stated as follows: There exists a (1+ϵ)-approximation algorithm for the k-median problem with respect to D, if the 1-median problem can be approximated within a factor of (1+ϵ) by taking a random sample of constant size and solving the 1-median problem on the sample exactly. This algorithm requires time n · 2^{O(mk log(mk/ϵ))}, where m is a constant that depends only on ϵ and D. Using this characterization, we obtain the first linear time (1+ϵ)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
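The sampling characterization can be demonstrated with a toy check for the plain Euclidean 1-median, restricted to medoids (centers drawn from the point set): the best center from a small random sample is already nearly as good as the best medoid of the whole set. This is an illustration of the idea, not the paper's algorithm:

```python
import numpy as np

def one_median_cost(P, c):
    """Sum of Euclidean distances from all points in P to a single center c."""
    return float(np.linalg.norm(P - c, axis=1).sum())

rng = np.random.default_rng(0)
P = rng.normal(size=(2000, 2))

# Best center among a small random sample vs. among all points.
sample = P[rng.choice(len(P), size=20, replace=False)]
c_sample = min(sample, key=lambda c: one_median_cost(P, c))
c_full = min(P, key=lambda c: one_median_cost(P, c))

ratio = one_median_cost(P, c_sample) / one_median_cost(P, c_full)
# ratio >= 1 by construction, and stays close to 1 despite the tiny sample.
```

The paper's contribution is showing that whenever such a constant-size-sample approximation of the 1-median problem exists for a dissimilarity D (including Mahalanobis distances), it lifts to a (1+ϵ)-approximation for the full k-median problem.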

107 citations


Journal ArticleDOI
TL;DR: A hybrid method for effective bankruptcy prediction is proposed, based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance with variable weight, which outperforms some currently-in-use techniques.
Abstract: This paper proposes a hybrid method for effective bankruptcy prediction, based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance with variable weights. Unlike existing case-based reasoning methods that use the Euclidean distance, we introduce the Mahalanobis distance for locating the nearest neighbors, which takes the covariance structure of the variables into account when measuring closeness. Since hundreds of financial ratio variables are available in analyzing credit management problems, model performance is also affected by the input variable selection strategy. Variables selected by decision tree induction tend to capture interactions, compared with those produced by regression approaches. The Mahalanobis distance is a truer measure of proximity than the Euclidean distance when variables are correlated with each other. The experimental results indicate that the proposed approach outperforms some currently-in-use techniques.

102 citations


Journal ArticleDOI
TL;DR: A novel method for automatically classifying consumer video clips based on their soundtracks using a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections.
Abstract: This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.

102 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed to supplement the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method by integrating the Mahalanobis distance into its usual algorithm.
Abstract: Decision making in construction management has always been complicated, especially when more than one criterion is under consideration. Multiple criteria decision making (MCDM) has often been applied to complex decisions in construction involving many criteria. Traditional MCDM methods, however, operate with independent and conflicting criteria, while in everyday problems a decision maker often faces interactive and interrelated criteria. Accordingly, the need to improve and supplement the methodology of compromise decisions arose. It was proposed to supplement the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method by integrating the Mahalanobis distance into its usual algorithm. The Mahalanobis distance measure offers an option to take the correlations between the criteria into consideration while making the decision. A case study of building redevelopment in Lithuanian rural areas is presented that demonstrates the application of the proposed method.
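A minimal sketch of TOPSIS with the Mahalanobis distance substituted for the usual Euclidean one is given below. The decision matrix, the criteria types, and the identity inverse-covariance used in the example are illustrative assumptions, not the case study's data:

```python
import numpy as np

def topsis_mahalanobis(M, VI, benefit):
    """TOPSIS closeness scores using the Mahalanobis distance.
    M: alternatives x criteria matrix; VI: inverse covariance of the criteria;
    benefit[j]: True for benefit criteria, False for cost criteria."""
    ideal = np.where(benefit, M.max(axis=0), M.min(axis=0))
    anti = np.where(benefit, M.min(axis=0), M.max(axis=0))

    def dist(row, ref):
        d = row - ref
        return np.sqrt(d @ VI @ d)

    d_pos = np.array([dist(r, ideal) for r in M])  # distance to ideal solution
    d_neg = np.array([dist(r, anti) for r in M])   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)  # closer to 1 = closer to the ideal

# Toy example with two benefit criteria and an uncorrelated (identity) VI.
M = np.array([[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]])
scores = topsis_mahalanobis(M, np.eye(2), np.array([True, True]))
```

With a non-identity VI estimated from the criteria values, correlated criteria stop being double-counted, which is exactly the option the Mahalanobis variant adds over classical TOPSIS.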

101 citations


Journal ArticleDOI
TL;DR: This research study proposes a dimensionality reduction method by addressing the problem as a feature selection exercise of MTS, using data from an Indian foundry shop to test the mathematical model and the swarm heuristic.
Abstract: The Mahalanobis-Taguchi System (MTS) is a pattern recognition method applied to classify data into categories such as "healthy" and "unhealthy" or "acceptable" and "unacceptable". MTS has found applications in a wide range of problem domains. Dimensionality reduction of the input set of attributes forms an important step in MTS. The current practice is to apply Taguchi's design of experiments (DOE) and orthogonal array (OA) method to achieve this end. Maximization of the Signal-to-Noise (S/N) ratio forms the basis for selecting the optimal combination of variables. However, the DOE-OA method has been found inadequate for this purpose. In this research study, we propose a dimensionality reduction method that addresses the problem as a feature selection exercise. The optimal combination of attributes minimizes a weighted sum of the total fractional misclassification and the percentage of the total number of variables employed to obtain it. Mahalanobis distances (MDs) of "healthy" and "unhealthy" conditions are used to compute the misclassification. A mathematical model formulates the feature selection approach, and it is solved by binary particle swarm optimization (PSO). Data from an Indian foundry shop are used to test the mathematical model and the swarm heuristic. Results are compared with those of the DOE-OA method of MTS.

93 citations


Journal ArticleDOI
TL;DR: A Mahalanobis distance (MD) based diagnostic approach that employs a probabilistic approach to establish thresholds to classify a product as being healthy or unhealthy, and a technique for detecting trends and bias in system health, is presented.
Abstract: This paper presents a Mahalanobis distance (MD) based diagnostic approach that employs a probabilistic approach to establish thresholds to classify a product as being healthy or unhealthy. A technique for detecting trends and bias in system health is presented by constructing a control chart for the MD value. The performance parameters' residuals, which are the differences between the estimated values (from an empirical model) and the observed values (from health monitoring), are used to isolate parameters that exhibit faults. To aid in the qualification of a product against a specific known fault, we suggest that a fault-specific threshold MD value be defined by minimizing an error function. A case study on notebook computers is presented to demonstrate the applicability of this proposed diagnostic approach.

89 citations


Journal ArticleDOI
TL;DR: A new concept of an adaptive individual niche radius is applied to niching with the covariance matrix adaptation evolution strategy (CMA-ES), and two approaches are considered that are shown to be robust and to achieve satisfying results.
Abstract: While the motivation and usefulness of niching methods is beyond doubt, the relaxation of assumptions and limitations concerning the hypothetical search landscape is much needed if niching is to be valid in a broader range of applications. Upon the introduction of radii-based niching methods with derandomized evolution strategies (ES), the purpose of this study is to address the so-called niche radius problem. A new concept of an adaptive individual niche radius is applied to niching with the covariance matrix adaptation evolution strategy (CMA-ES). Two approaches are considered. The first approach couples the radius to the step size mechanism, while the second approach employs the Mahalanobis distance metric with the covariance matrix mechanism for the distance calculation, for obtaining niches with more complex geometrical shapes. The proposed approaches are described in detail, and then tested on high-dimensional artificial landscapes at several levels of difficulty. They are shown to be robust and to achieve satisfying results.

Journal ArticleDOI
TL;DR: This paper demonstrates that the combination of a classifier trained by AdaBoost.M2 and features based on the estimated parameter of a log-compressed K-distribution, as well as those of the pattern spectrum, are useful for the discrimination of tumors.
Abstract: This paper proposes a novel algorithm to estimate a log-compressed K distribution parameter and presents an algorithm to discriminate breast tumors in ultrasonic images. We computed a total of 208 features for discrimination, including those based on a parameter of a log-compressed K-distribution, which quantifies the homogeneity of the echo pattern in the tumor, but is influenced by compression parameters in the ultrasonic device. The proposed algorithm estimates the parameter of the log-compressed K-distribution in a manner free from this influence. To quantify irregularities in tumor shape, pattern-spectrum-based features were newly developed in this paper. The discrimination process uses an ensemble classifier trained by a multiclass AdaBoost learning algorithm (AdaBoost.M2), combined with a sequential feature-selection process. A 10-fold cross-validation test validated the performance, and the results were compared with those of a Mahalanobis distance-based classifier and a multiclass support vector machine. A total of 200 carcinomas, 50 fibroadenomas, and 50 cysts were used in the experiments. This paper demonstrates that the combination of a classifier trained by AdaBoost.M2 and features based on the estimated parameter of a log-compressed K-distribution, as well as those of the pattern spectrum, are useful for the discrimination of tumors.

Book ChapterDOI
05 Sep 2010
TL;DR: A large margin framework to improve the discrimination of the I2C distance, especially for small numbers of local features, by learning Per-Class Mahalanobis metrics is proposed, and it can significantly outperform the original NBNN on several prevalent image datasets.
Abstract: The Image-To-Class (I2C) distance was first used in the Naive-Bayes Nearest-Neighbor (NBNN) classifier for image classification and has successfully handled datasets with large intra-class variances. However, the performance of this distance relies heavily on a large number of local features in the training set and test image, which incurs a heavy computation cost for nearest-neighbor (NN) search in the testing phase. If a small number of local features is used to accelerate the NN search, performance is poor. In this paper, we propose a large margin framework to improve the discrimination of the I2C distance, especially for small numbers of local features, by learning Per-Class Mahalanobis metrics. Our I2C distance adapts to each class by combining with the metric learned for that class. These multiple Per-Class metrics are learned simultaneously by forming a convex optimization problem with the constraint that the I2C distance from each training image to its own class should be less than the distance to other classes by a large margin. A gradient descent method is applied to solve this optimization problem efficiently. To further improve efficiency and performance, we also adopt the ideas of spatial pyramid restriction and learning an I2C distance function. We show in experiments that the proposed method can significantly outperform the original NBNN on several prevalent image datasets, and our best results achieve state-of-the-art performance on most datasets.

Journal ArticleDOI
TL;DR: In this paper, the minimum covariance determinant estimator is used to estimate location parameters and multivariate scales, which can be used to robustify Mahalanobis distances and to identify outliers.
Abstract: Before implementing any multivariate statistical analysis based on empirical covariance matrices, it is important to check whether outliers are present, because their existence could induce significant biases. In this article, we present the minimum covariance determinant estimator, which is commonly used in robust statistics to estimate location parameters and multivariate scales. These estimators can be used to robustify Mahalanobis distances and to identify outliers. Verardi and Croux (2009, Stata Journal 9: 439-453; 2010, Stata Journal 10: 313) programmed this estimator in Stata and made it available with the mcd command. The implemented algorithm is relatively fast and, as we show in the simulation example section, outperforms the methods already available in Stata, such as the Hadi method. © 2010 StataCorp LP.
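The robustification idea (estimate location and scatter on a "clean" subset so that outliers cannot inflate the covariance and mask themselves) can be sketched with a single trimming step. This is a crude illustration of the principle only, not the FAST-MCD algorithm the Stata implementation uses:

```python
import numpy as np

def trimmed_mahalanobis(X, trim=0.25):
    """Robust squared Mahalanobis distances via one trimming step: estimate
    mean/covariance, drop the most extreme `trim` fraction of points, then
    re-estimate and recompute. A one-step sketch of the MCD idea."""
    def sq_dists(X, mu, VI):
        d = X - mu
        return np.einsum('ij,jk,ik->i', d, VI, d)

    mu = X.mean(axis=0)
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = sq_dists(X, mu, VI)

    # Keep the (1 - trim) fraction of points with the smallest distances.
    keep = np.argsort(d2)[: int(len(X) * (1.0 - trim))]
    mu_r = X[keep].mean(axis=0)
    VI_r = np.linalg.inv(np.cov(X[keep], rowvar=False))
    return sq_dists(X, mu_r, VI_r)

# Demo: 95 inliers around the origin plus 5 gross outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(95, 2)), 10.0 + rng.normal(size=(5, 2))])
d2_robust = trimmed_mahalanobis(X)
```

With classical (non-robust) estimates the outliers would drag the mean toward themselves and inflate the covariance, shrinking their own distances; the trimmed re-estimate keeps them clearly separated from the inliers.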

Journal ArticleDOI
Jianbo Yu1
TL;DR: A hidden Markov model (HMM)-based process monitoring approach is proposed to explicitly address the nonlinear and multimodal characteristics of processes. The HMM-based monitoring models can be used for online process monitoring without much human intervention, and the experimental results demonstrate that the proposed approaches effectively capture the nonlinear and multimodal relationships among process variables and show superior monitoring performance compared to conventional process monitoring approaches.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set, and study masking robustness, that is, robustness against misidentification of outliers as nonoutliers.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a fast and scalable algorithm to learn a Mahalanobis distance metric, which can be viewed as the Euclidean distance metric on the input data that have been linearly transformed.
Abstract: For many machine learning algorithms such as k-nearest neighbor (k-NN) classifiers and k-means clustering, success often heavily depends on the metric used to calculate distances between different data points. An effective solution for defining such a metric is to learn it from a set of labeled training samples. In this work, we propose a fast and scalable algorithm to learn a Mahalanobis distance metric. The Mahalanobis metric can be viewed as the Euclidean distance metric on input data that have been linearly transformed. By employing the principle of margin maximization to achieve better generalization performance, this algorithm formulates metric learning as a convex optimization problem in which a positive semidefinite (p.s.d.) matrix is the unknown variable. Based on an important theorem that a p.s.d. trace-one matrix can always be represented as a convex combination of multiple rank-one matrices, our algorithm accommodates any differentiable loss function and solves the resulting optimization problem using a specialized gradient descent procedure. During the course of optimization, the proposed algorithm maintains the positive semidefiniteness of the matrix variable, which is essential for a Mahalanobis metric. Compared with conventional methods like standard interior-point algorithms or the special solver used in large margin nearest neighbor, our algorithm is much more efficient and scales better. Experiments on benchmark data sets suggest that, compared with state-of-the-art metric learning algorithms, our algorithm can achieve comparable classification accuracy with reduced computational complexity.
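The p.s.d. requirement on the matrix variable, which the paper's solver maintains throughout optimization, is commonly enforced in metric learning by eigenvalue clipping after each gradient step. The sketch below shows that generic projection step, not the paper's specialized rank-one-decomposition solver:

```python
import numpy as np

def project_psd(M):
    """Project a (near-)symmetric matrix onto the positive semidefinite cone
    by symmetrizing and clipping negative eigenvalues to zero. A standard
    step in projected-gradient Mahalanobis metric learning."""
    M = (M + M.T) / 2.0
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)
    return V @ np.diag(w) @ V.T
```

After each unconstrained gradient update of the metric matrix, applying `project_psd` restores validity of the learned Mahalanobis distance (non-negative squared distances for every vector).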

Book ChapterDOI
01 Jan 2010
TL;DR: This paper investigates a number of different metrics proposed by different communities, including the Mahalanobis, Euclidean, Kullback-Leibler and Hamming distances, and concludes that the best-performing method is the Mahalanobis distance metric.
Abstract: Many learning algorithms rely on distance metrics on their input data. Research has shown that these metrics can improve the performance of these algorithms. Over the years, the Euclidean function has been a popular choice. In this paper, we investigate a number of different metrics proposed by different communities, including the Mahalanobis, Euclidean, Kullback-Leibler and Hamming distances. Overall, the best-performing method is the Mahalanobis distance metric.

Posted Content
TL;DR: This work proposes a fast and scalable algorithm to learn a Mahalanobis distance metric and suggests that, compared with state-of-the-art metric learning algorithms, this algorithm can achieve a comparable classification accuracy with reduced computational complexity.
Abstract: For many machine learning algorithms such as $k$-Nearest Neighbor ($k$-NN) classifiers and $k$-means clustering, success often heavily depends on the metric used to calculate distances between different data points. An effective solution for defining such a metric is to learn it from a set of labeled training samples. In this work, we propose a fast and scalable algorithm to learn a Mahalanobis distance metric. By employing the principle of margin maximization to achieve better generalization performance, this algorithm formulates metric learning as a convex optimization problem in which a positive semidefinite (psd) matrix is the unknown variable. A specialized gradient descent method is proposed. Our algorithm is much more efficient and scales better than existing methods. Experiments on benchmark data sets suggest that, compared with state-of-the-art metric learning algorithms, our algorithm can achieve comparable classification accuracy with reduced computational complexity.

Book ChapterDOI
14 Apr 2010
TL;DR: This chapter contains sections titled: Introduction; Cluster Analysis by GK Algorithm; Evolving Clustering Based on GK Similarity Distance; Methodological Considerations; Recursive Computational Aspects; Evolving GK-Like (eGKL) Clustering Algorithm; Conclusion.
Abstract: This chapter contains sections titled: Introduction; Cluster Analysis by GK Algorithm; Evolving Clustering Based on GK Similarity Distance; Methodological Considerations; Recursive Computational Aspects; Evolving GK-Like (eGKL) Clustering Algorithm; Conclusion; Acknowledgments; References.

Journal ArticleDOI
TL;DR: In this article, a new analysis technique is developed that fully takes into account the uncertainty ellipsoid for each measurement point, which solves for the positions and orientations of the measurement systems while determining the optimal fit to each physical point.
Abstract: Precision three-dimensional metrology frequently involves the measurement of common points by several three-dimensional measurement systems. These can be laser tracking interferometers, electronic theodolites, etc. A new analysis technique has been developed that fully takes into account the uncertainty ellipsoid for each measurement point. This technique solves for the positions and orientations of the measurement systems while determining the optimal fit to each physical point. No a priori knowledge of the location and orientation of the measurement systems is required. An initial estimate for the location and orientation of the measurement systems is derived from the measurement data by assuming equal uncertainties for each data point. Then a merit function is minimized to determine the optimized location and orientation of the measurement systems and the weighted mean for the position estimates of the physical points. This merit function is based on the Mahalanobis distance from multivariate statistics and takes into account the particular shape and orientation of the three-dimensional uncertainty ellipsoid of each data point from each measurement system. This technique can utilize data from differing types of three-dimensional measurement systems including distance only and angle only measurements, evaluate the “strength” of a measurement configuration and accommodate missing data points from some of the measurement systems.

Proceedings ArticleDOI
16 Nov 2010
TL;DR: In this work, the performance of two feature extraction techniques, the Modified Direction Feature (MDF) and the gradient feature, are compared on the basis of similar experimental settings, and results indicated that an average error rate as low as 15.03% could be obtained using the gradient feature and SVMs.
Abstract: Feature extraction is an important process in off-line signature verification. In this work, the performance of two feature extraction techniques, the Modified Direction Feature (MDF) and the gradient feature, is compared on the basis of similar experimental settings. In addition, the performance of Support Vector Machines (SVMs) and the squared Mahalanobis distance classifier employing the gradient feature is also compared and reported. Without using forgeries for training, experimental results indicated that an average error rate as low as 15.03% could be obtained using the gradient feature and SVMs.

Posted Content
TL;DR: This paper uses Support Vector Machines (SVM) to fuse multiple classifiers for an offline signature system and the results are found to be promising.
Abstract: This paper uses Support Vector Machines (SVM) to fuse multiple classifiers for an offline signature system. From the signature images, global and local features are extracted, and the signatures are verified with the help of Gaussian empirical rule, Euclidean and Mahalanobis distance based classifiers. SVM is used to fuse the matching scores of these matchers. Finally, query signatures are recognized by comparing them with all signatures in the database. The proposed system is tested on a signature database containing 5400 offline signatures of 600 individuals, and the results are found to be promising.

Journal ArticleDOI
TL;DR: It is shown that when the noise level is low, the asymptotic error rates of both procedures are only slightly affected, that LR is less deteriorated by CCC-Noise than NDA, and that the Mahalanobis distance plays a vital role in determining the relative performance of these two procedures.

Journal ArticleDOI
01 Jun 2010
TL;DR: This paper proposes a joint learning of labels and distance metric (JLLDM) approach, which is able to simultaneously address the two difficulties of training data insufficiency and inappropriate distance metric usage.
Abstract: Machine learning algorithms frequently suffer from the insufficiency of training data and the usage of inappropriate distance metric. In this paper, we propose a joint learning of labels and distance metric (JLLDM) approach, which is able to simultaneously address the two difficulties. In comparison with the existing semi-supervised learning and distance metric learning methods that focus only on label prediction or distance metric construction, the JLLDM algorithm optimizes the labels of unlabeled samples and a Mahalanobis distance metric in a unified scheme. The advantage of JLLDM is multifold: 1) the problem of training data insufficiency can be tackled; 2) a good distance metric can be constructed with only very few training samples; and 3) no radius parameter is needed since the algorithm automatically determines the scale of the metric. Extensive experiments are conducted to compare the JLLDM approach with different semi-supervised learning and distance metric learning methods, and empirical results demonstrate its effectiveness.

Journal ArticleDOI
TL;DR: In this article, the residual distance to the disjoint PC model and the Mahalanobis distance to the centre of the QDA model in the projected PC space are used to categorise whether a process is in-control or out-of-control, with the normal operating conditions (NOC) region defined by joint criteria.
Abstract: In process monitoring, a representative out-of-control class of samples cannot be generated. Here, it is assumed that it is possible to obtain a representative subset of samples from a single 'in-control' class, and one-class classifiers, namely the Q and D statistics (respectively the residual distance to the disjoint PC model and the Mahalanobis distance to the centre of the QDA model in the projected PC space), as well as support vector domain description (SVDD), are applied to disjoint PC models of the normal operating conditions (NOC) region to categorise whether the process is in-control or out-of-control. To define the NOC region, the cumulative relative standard deviation (CRSD) and a test of multivariate normality are described and used as joint criteria. These calculations were based on the application of window principal components analysis (WPCA), which can be used to define a NOC region. The D and Q statistics and SVDD models were calculated for the NOC region, and percentage predictive ability (%PA), percentage model stability (%MS) and percentage correctly classified (%CC) were obtained to determine the quality of models from 100 training/test set splits. Q, D and SVDD control charts were obtained, and 90% confidence limits were set up based on multivariate normality (D and Q) or the SVDD D value (which does not require assumptions of normality). We introduce a method for finding an optimal radial basis function for the SVDD model, and two new indices, the percentage classification index (%CI) and the percentage predictive index (%PI) for non-NOC samples, are also defined. The methods in this paper are exemplified by a continuous process studied over 105.11 h using online HPLC. Copyright © 2010 John Wiley & Sons, Ltd.
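The two one-class statistics might be computed from an ordinary PCA model as sketched below. This uses a plain SVD-based PCA on a single training block; the paper works with windowed/disjoint PC models and sets its confidence limits differently:

```python
import numpy as np

def q_and_d_statistics(X_train, X_new, n_pc):
    """Q (squared residual distance to the PC model) and D (squared Mahalanobis
    distance to the centre in the projected PC space) for new samples,
    given a PCA model fitted on X_train with n_pc components."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                               # loadings (columns)
    lam = (s[:n_pc] ** 2) / (len(X_train) - 1)    # variances of the PC scores

    T = (X_new - mu) @ P                          # scores of the new samples
    resid = (X_new - mu) - T @ P.T                # part not explained by the model
    Q = np.sum(resid ** 2, axis=1)                # residual (out-of-model) distance
    D = np.sum((T ** 2) / lam, axis=1)            # Mahalanobis distance in PC space
    return Q, D
```

A sample can be out-of-control in two distinct ways: a large Q means it breaks the correlation structure of the model, while a large D means it lies far from the centre within the model plane; control limits are then placed on each statistic separately.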

Journal ArticleDOI
TL;DR: The objective of this study was to use a small portion of uncorrelated singular values as robust features for the classification of sliced pork ham images, using a supervised artificial neural network classifier.

Proceedings ArticleDOI
26 Jul 2010
TL;DR: This work develops an analytic center approach to distributed estimation fusion when the cross-correlation of errors between local estimates is unknown, and proves that the analytic center is a convex combination of the local estimates.
Abstract: We develop an analytic center approach to distributed estimation fusion when the cross-correlation of errors between local estimates is unknown. Based on a set-theoretic formulation of the problem, we seek an estimate that maximizes the complementary squared Mahalanobis “distance” between the local and the desired estimates in a logarithmic average form, and the optimal value turns out to be the analytic center. For our problem, we then prove that the analytic center is a convex combination of the local estimates. As such, our proposed analytic center covariance intersection (AC-CI) algorithm could be regarded as the covariance intersection (CI) algorithm with respect to a set-theoretic optimization criterion.
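For context, the classical covariance intersection rule that AC-CI generalises fuses two estimates with unknown cross-correlation in information form; the fused estimate is a convex combination of the locals, weighted by a parameter ω. This sketch shows plain CI, not the analytic-center variant of the paper.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, omega):
    """Classical CI fusion of two estimates with unknown cross-correlation:
    P^{-1} = omega * P1^{-1} + (1 - omega) * P2^{-1}, 0 <= omega <= 1."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(omega * I1 + (1.0 - omega) * I2)
    x = P @ (omega * I1 @ x1 + (1.0 - omega) * I2 @ x2)
    return x, P

x1, P1 = np.array([1.0, 0.0]), np.eye(2)
x2, P2 = np.array([0.0, 1.0]), 2.0 * np.eye(2)
x, P = covariance_intersection(x1, P1, x2, P2, omega=0.5)
```

In practice ω is chosen by optimising a scalar criterion such as the trace or determinant of the fused covariance; the paper instead characterises the fusion through the analytic center of a set-theoretic formulation.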

Proceedings ArticleDOI
TL;DR: A filter-type feature selection algorithm that selects reduced feature sets using the Mahalanobis distance measure and develops classifiers from the sets; results demonstrate that as few as 10-60 features at various levels of embedding can be used to create a classifier that gives comparable results to the full suite of 274 features.
Abstract: Steganalysis is used to detect hidden content in innocuous images. Many successful steganalysis algorithms use a large number of features relative to the size of the training set, and thus suffer from a "curse of dimensionality". High dimensionality of the feature space can reduce classification accuracy, obscure important features for classification, and increase computational complexity. This paper presents a filter-type feature selection algorithm that selects reduced feature sets using the Mahalanobis distance measure, and develops classifiers from the sets. The experiment is applied to a well-known JPEG steganalyzer, and shows that using our approach, reduced-feature steganalyzers can be obtained that perform as well as the original steganalyzer. The steganalyzer is that of Pevný et al. (SPIE, 2007), which combines DCT-based feature values and calibrated Markov features. Five embedding algorithms are used. Our results demonstrate that as few as 10-60 features at various levels of embedding can be used to create a classifier that gives comparable results to the full suite of 274 features.
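The idea of filter-type selection with a Mahalanobis-style criterion can be sketched very simply: score each feature by the separation it alone provides between the cover and stego classes, then keep the top-k indices. This is a simplified univariate illustration on synthetic data, not the paper's multivariate procedure or its 274-feature set.

```python
import numpy as np

def rank_features(X_cover, X_stego, k):
    """Filter-style selection sketch: score each feature by the univariate
    Mahalanobis-type separation (mu0 - mu1)^2 / pooled variance between
    the cover and stego classes, and keep the top-k feature indices."""
    mu0, mu1 = X_cover.mean(axis=0), X_stego.mean(axis=0)
    pooled = 0.5 * (X_cover.var(axis=0) + X_stego.var(axis=0)) + 1e-12
    scores = (mu0 - mu1) ** 2 / pooled
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(1)
cover = rng.normal(0.0, 1.0, size=(100, 6))
stego = rng.normal(0.0, 1.0, size=(100, 6))
stego[:, 2] += 3.0               # only feature 2 separates the classes
top = rank_features(cover, stego, k=1)
```

A classifier trained on the selected indices then stands in for the full-feature steganalyzer; the paper's experiments measure how small k can be while keeping comparable accuracy.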

Posted Content
TL;DR: An overview of gesture recognition in real time using the concepts of correlation and Mahalanobis distance is presented, considering the six universal emotional categories, namely joy, anger, fear, disgust, sadness and surprise.
Abstract: Augmenting human computer interaction with automated analysis and synthesis of facial expressions is a goal towards which much research effort has been devoted recently. Facial gesture recognition is one of the important components of natural human-machine interfaces; it may also be used in behavioural science, security systems and in clinical practice. Although humans recognise facial expressions virtually without effort or delay, reliable expression recognition by machine is still a challenge. The face expression recognition problem is challenging because different individuals display the same expression differently. This paper presents an overview of gesture recognition in real time using the concepts of correlation and Mahalanobis distance. We consider the six universal emotional categories, namely joy, anger, fear, disgust, sadness and surprise.
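A common way to use Mahalanobis distance for such a six-class problem is nearest-centroid classification: assign a face feature vector to the emotion whose class centroid is closest under the metric. The sketch below uses hypothetical random centroids and covariance purely for illustration; real templates would be mean feature vectors estimated from labelled training images.

```python
import numpy as np

EMOTIONS = ["joy", "anger", "fear", "disgust", "sadness", "surprise"]
rng = np.random.default_rng(2)
# Hypothetical per-class templates and a shared inverse covariance.
centroids = {e: rng.normal(size=4) for e in EMOTIONS}
cov_inv = np.linalg.inv(np.cov(rng.normal(size=(50, 4)), rowvar=False))

def classify(x):
    """Assign x to the emotion whose centroid is nearest in Mahalanobis distance."""
    def d2(c):
        diff = x - c
        return float(diff @ cov_inv @ diff)
    return min(EMOTIONS, key=lambda e: d2(centroids[e]))

label = classify(centroids["joy"])
```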