
Showing papers on "Linear discriminant analysis published in 2008"


Book ChapterDOI
15 Sep 2008
TL;DR: Cluster analysis as mentioned in this paper is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics, which is one of the most fundamental modes of understanding and learning.
Abstract: The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes cluster analysis (unsupervised learning) from discriminant analysis (supervised learning). The objective of cluster analysis is simply to find a convenient and valid organization of the data, not to establish rules for separating future data into categories.
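To make the supervised/unsupervised distinction concrete, here is a minimal sketch, assuming scikit-learn and synthetic data (neither comes from the chapter itself): a clustering algorithm is fit with no labels, and a linear discriminant analysis is fit with labels, on the same matrix.

```python
# Minimal sketch, assuming scikit-learn and synthetic data: clustering uses
# no class labels, discriminant analysis does.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# Cluster analysis (unsupervised): groups are discovered from X alone.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Discriminant analysis (supervised): the class labels y drive the projection.
lda = LinearDiscriminantAnalysis().fit(X, y)

print("cluster sizes:", np.bincount(clusters))
print("LDA training accuracy:", lda.score(X, y))
```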

4,255 citations


Book
25 Aug 2008
TL;DR: This book covers applied multivariate statistical analysis, moving from a short excursion into matrix algebra and multivariate distributions, through the theory of the multinormal, estimation, and hypothesis testing, to the main multivariate techniques.
Abstract: I Descriptive Techniques: Comparison of Batches.- II Multivariate Random Variables: A Short Excursion into Matrix Algebra Moving to Higher Dimensions Multivariate Distributions Theory of the Multinormal Theory of Estimation Hypothesis Testing.- III Multivariate Techniques: Decomposition of Data Matrices by Factors Principal Components Analysis Factor Analysis Cluster Analysis Discriminant Analysis.- Correspondence Analysis.- Canonical Correlation Analysis.- Multidimensional Scaling.- Conjoint Measurement Analysis.- Application in Finance.- Computationally Intensive Techniques.- A: Symbols and Notations.- B: Data.- Bibliography.- Index.

1,081 citations


Book
01 Jan 2008
TL;DR: This volume collects contributions on multiple classifier systems, spanning boosting, combination methods, ensemble design, performance analysis, and applications.
Abstract: Future Directions -- Semi-supervised Multiple Classifier Systems: Background and Research Directions -- Boosting -- Boosting GMM and Its Two Applications -- Boosting Soft-Margin SVM with Feature Selection for Pedestrian Detection -- Observations on Boosting Feature Selection -- Boosting Multiple Classifiers Constructed by Hybrid Discriminant Analysis -- Combination Methods -- Decoding Rules for Error Correcting Output Code Ensembles -- A Probability Model for Combining Ranks -- EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks -- Mixture of Gaussian Processes for Combining Multiple Modalities -- Dynamic Classifier Integration Method -- Recursive ECOC for Microarray Data Classification -- Using Dempster-Shafer Theory in MCF Systems to Reject Samples -- Multiple Classifier Fusion Performance in Networked Stochastic Vector Quantisers -- On Deriving the Second-Stage Training Set for Trainable Combiners -- Using Independence Assumption to Improve Multimodal Biometric Fusion -- Design Methods -- Half-Against-Half Multi-class Support Vector Machines -- Combining Feature Subsets in Feature Selection -- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments -- Using Decision Tree Models and Diversity Measures in the Selection of Ensemble Classification Models -- Ensembles of Classifiers from Spatially Disjoint Data -- Optimising Two-Stage Recognition Systems -- Design of Multiple Classifier Systems for Time Series Data -- Ensemble Learning with Biased Classifiers: The Triskel Algorithm -- Cluster-Based Cumulative Ensembles -- Ensemble of SVMs for Incremental Learning -- Performance Analysis -- Design of a New Classifier Simulator -- Evaluation of Diversity Measures for Binary Classifier Ensembles -- Which Is the Best Multiclass SVM Method? An Empirical Study -- Over-Fitting in Ensembles of Neural Network Classifiers Within ECOC Frameworks -- Between Two Extremes: Examining Decompositions of the Ensemble Objective Function -- Data Partitioning Evaluation Measures for Classifier Ensembles -- Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation -- Ensemble Confidence Estimates Posterior Probability -- Applications -- Using Domain Knowledge in the Random Subspace Method: Application to the Classification of Biomedical Spectra -- An Abnormal ECG Beat Detection Approach for Long-Term Monitoring of Heart Patients Based on Hybrid Kernel Machine Ensemble -- Speaker Verification Using Adapted User-Dependent Multilevel Fusion -- Multi-modal Person Recognition for Vehicular Applications -- Using an Ensemble of Classifiers to Audit a Production Classifier -- Analysis and Modelling of Diversity Contribution to Ensemble-Based Texture Recognition Performance -- Combining Audio-Based and Video-Based Shot Classification Systems for News Videos Segmentation -- Designing Multiple Classifier Systems for Face Recognition -- Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data.

1,073 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates.
Abstract: This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. 1. Introduction. Estimation of covariance matrices is important in a number of areas of statistical analysis, including dimension reduction by principal component analysis (PCA), classification by linear or quadratic discriminant analysis (LDA and QDA), establishing independence and conditional independence relations in the context of graphical models, and setting confidence intervals on linear functions of the means of the components. In recent years, many application areas where these tools are used have been dealing with very high-dimensional datasets, and sample sizes can be very small relative to dimension. Examples include genetic data, brain imaging, spectroscopic imaging, climate data and many others. It is well known by now that the empirical covariance matrix for samples of size n from a p-variate Gaussian distribution, Np(μ, Σp), is not a good estimator of the population covariance if p is large. Many results in random matrix theory illustrate this, from the classical Marčenko–Pastur law [29] to the more recent work of Johnstone and his students on the theory of the largest eigenvalues [12, 23, 30] and associated eigenvectors [24]. However, with the exception of a method for estimating the covariance spectrum [11], these probabilistic results do not offer alternatives to the sample covariance matrix. Alternative estimators for large covariance matrices have therefore attracted a lot of attention recently. Two broad classes of covariance estimators have emerged: those that rely on a natural ordering among variables, and assume that variables
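A hedged sketch of the hard-thresholding estimator described above; the threshold constant and the choice to leave the diagonal untouched are illustrative, not the paper's tuned procedure.

```python
# Hedged sketch of hard-thresholding a sample covariance matrix; the
# constant c and keeping the diagonal intact are illustrative choices.
import numpy as np

def threshold_covariance(X, c=1.0):
    """Hard-threshold the empirical covariance of an (n, p) data matrix."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)          # empirical p x p covariance
    t = c * np.sqrt(np.log(p) / n)       # threshold of order sqrt(log p / n)
    S_thr = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(S_thr, np.diag(S))  # leave the variances untouched
    return S_thr

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))       # n = 200 observations, p = 50 variables
S_thr = threshold_covariance(X, c=0.5)
print("nonzero entries:", np.count_nonzero(S_thr), "of", S_thr.size)
```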

1,052 citations


Journal ArticleDOI
TL;DR: A novel scheme of emotion-specific multilevel dichotomous classification (EMDC) is developed and compared with direct multiclass classification using the pLDA, with improved recognition accuracy of 95 percent and 70 percent for subject-dependent and subject-independent classification, respectively.
Abstract: Little attention has been paid so far to physiological signals for emotion recognition compared to audiovisual emotion channels such as facial expression or speech. This paper investigates the potential of physiological signals as reliable channels for emotion recognition. All essential stages of an automatic recognition system are discussed, from the recording of a physiological data set to a feature-based multiclass classification. In order to collect a physiological data set from multiple subjects over many weeks, we used a musical induction method that spontaneously leads subjects to real emotional states, without any deliberate laboratory setting. Four-channel biosensors were used to measure electromyogram, electrocardiogram, skin conductivity, and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to find the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by classification results. Classification of four musical emotions (positive/high arousal, negative/high arousal, negative/low arousal, and positive/low arousal) is performed by using an extended linear discriminant analysis (pLDA). Furthermore, by exploiting a dichotomic property of the 2D emotion model, we develop a novel scheme of emotion-specific multilevel dichotomous classification (EMDC) and compare its performance with direct multiclass classification using the pLDA. An improved recognition accuracy of 95 percent and 70 percent for subject-dependent and subject-independent classification, respectively, is achieved by using the EMDC scheme.
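The multilevel dichotomous idea can be sketched as a two-stage classifier: decide arousal first, then valence within each arousal branch, with an LDA model at every node. The sketch below uses plain scikit-learn LDA on placeholder arrays, not the paper's pLDA or its physiological features.

```python
# Minimal sketch of an emotion-specific multilevel dichotomous scheme:
# stage 1 separates high vs. low arousal, stage 2 separates positive vs.
# negative valence within each arousal branch. Placeholder data, plain LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_emdc(X, arousal, valence):
    """arousal, valence: binary labels (1 = high arousal / positive valence)."""
    stage1 = LinearDiscriminantAnalysis().fit(X, arousal)
    stage2 = {a: LinearDiscriminantAnalysis().fit(X[arousal == a], valence[arousal == a])
              for a in (0, 1)}
    return stage1, stage2

def predict_emdc(model, X):
    stage1, stage2 = model
    a_hat = stage1.predict(X)
    v_hat = np.array([stage2[a].predict(x[None, :])[0] for a, x in zip(a_hat, X)])
    return a_hat, v_hat   # the (arousal, valence) pair identifies one of 4 emotions

rng = np.random.default_rng(1)
X = rng.standard_normal((160, 20))        # placeholder feature matrix
arousal = rng.integers(0, 2, 160)
valence = rng.integers(0, 2, 160)
a_hat, v_hat = predict_emdc(fit_emdc(X, arousal, valence), X)
print("predicted emotion codes:", np.unique(2 * a_hat + v_hat))
```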

953 citations


Journal ArticleDOI
TL;DR: In this article, a linear combination of simple rules derived from the data is used for general regression and classification models, where each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables.
Abstract: General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
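A toy approximation of the rule-ensemble idea: treat the leaves of shallow trees as 0/1 rule features and fit a sparse linear model over them. The forest size, depth, Lasso penalty, and use of scikit-learn are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of a rule ensemble: shallow-tree leaves as 0/1 rule features,
# combined by a sparse (Lasso) linear model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import OneHotEncoder

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)

# Each leaf of a shallow tree corresponds to a conjunction of simple tests.
forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0).fit(X, y)
leaves = forest.apply(X)                       # (n_samples, n_trees) leaf indices
rules = OneHotEncoder().fit_transform(leaves)  # one sparse 0/1 column per rule

# Sparse linear combination of rules: the nonzero coefficients are the kept rules.
model = Lasso(alpha=0.5).fit(rules, y)
print("rules kept:", np.count_nonzero(model.coef_), "of", rules.shape[1])
```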

874 citations


Book
01 Jan 2008
TL;DR: This book presents quantitative methods for paleontological data analysis, covering basic statistics, multivariate techniques, morphometrics, phylogenetic analysis, paleobiogeography and paleoecology, time series analysis, and quantitative biostratigraphy.
Abstract: Preface. Acknowledgments. 1 Introduction. 1.1 The nature of paleontological data. 1.2 Advantages and pitfalls of paleontological data analysis. 1.3 Software. 2 Basic statistical methods. 2.1 Introduction. 2.2 Statistical distributions. 2.3 Shapiro-Wilk test for normal distribution. 2.4 F test for equality of variances. 2.5 Student's t test and Welch test for equality of means. 2.6 Mann-Whitney U test for equality of medians. 2.7 Kolmogorov-Smirnov test for equality of distributions. 2.8 Permutation and resampling. 2.9 One-way ANOVA. 2.10 Kruskal-Wallis test. 2.11 Linear correlation. 2.12 Non-parametric tests for correlation. 2.13 Linear regression. 2.14 Reduced major axis regression. 2.15 Nonlinear curve fitting. 2.16 Chi-square test. 3 Introduction to multivariate data analysis. 3.1 Approaches to multivariate data analysis. 3.2 Multivariate distributions. 3.3 Parametric multivariate tests. 3.4 Non-parametric multivariate tests. 3.5 Hierarchical cluster analysis. 3.5 K-means cluster analysis. 4 Morphometrics. 4.1 Introduction. 4.2 The allometric equation. 4.3 Principal components analysis (PCA). 4.4 Multivariate allometry. 4.5 Discriminant analysis for two groups. 4.6 Canonical variate analysis (CVA). 4.7 MANOVA. 4.8 Fourier shape analysis. 4.9 Elliptic Fourier analysis. 4.10 Eigenshape analysis. 4.11 Landmarks and size measures. 4.12 Procrustean fitting. 4.13 PCA of landmark data. 4.14 Thin-plate spline deformations. 4.15 Principal and partial warps. 4.16 Relative warps. 4.17 Regression of partial warp scores. 4.18 Disparity measures. 4.19 Point distribution statistics. 4.20 Directional statistics. Case study: The ontogeny of a Silurian trilobite. 5 Phylogenetic analysis. 5.1 Introduction. 5.2 Characters. 5.3 Parsimony analysis. 5.4 Character state reconstruction. 5.5 Evaluation of characters and tree topologies. 5.6 Consensus trees. 5.7 Consistency index. 5.8 Retention index. 5.9 Bootstrapping. 5.10 Bremer support. 5.11 Stratigraphical congruency indices. 5.12 Phylogenetic analysis with Maximum Likelihood. Case study: The systematics of heterosporous ferns. 6 Paleobiogeography and paleoecology. 6.1 Introduction. 6.2 Diversity indices. 6.3 Taxonomic distinctness. 6.4 Comparison of diversity indices. 6.5 Abundance models. 6.6 Rarefaction. 6.7 Diversity curves. 6.8 Size-frequency and survivorship curves. 6.9 Association similarity indices for presence/absence data. 6.10 Association similarity indices for abundance data. 6.11 ANOSIM and NPMANOVA. 6.12 Correspondence analysis. 6.13 Principal Coordinates analysis (PCO). 6.14 Non-metric Multidimensional Scaling (NMDS). 6.15 Seriation. Case study: Ashgill brachiopod paleocommunities from East China. 7 Time series analysis. 7.1 Introduction. 7.2 Spectral analysis. 7.3 Autocorrelation. 7.4 Cross-correlation. 7.5 Wavelet analysis. 7.6 Smoothing and filtering. 7.7 Runs test. Case study: Sepkoski's generic diversity curve for the Phanerozoic. 8 Quantitative biostratigraphy. 8.1 Introduction. 8.2 Parametric confidence intervals on stratigraphic ranges. 8.3 Non-parametric confidence intervals on stratigraphic ranges. 8.4 Graphic correlation. 8.5 Constrained optimisation. 8.6 Ranking and scaling. 8.7 Unitary Associations. 8.8 Biostratigraphy by ordination. 8.9 What is the best method for quantitative biostratigraphy?. Appendix A: Plotting techniques. Appendix B: Mathematical concepts and notation. References. Index

867 citations


Journal ArticleDOI
TL;DR: The RF methodology is attractive for use in classification problems when the goals of the study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables.

854 citations


Journal ArticleDOI
TL;DR: This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance.
Abstract: This paper proposes and evaluates the application of support vector machine (SVM) to classify upper limb motions using myoelectric signals. It explores the optimum configuration of SVM-based myoelectric control, by suggesting an advantageous data segmentation technique, feature set, model selection approach for SVM, and postprocessing methods. This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance. An SVM, as the core of classification in myoelectric control, is compared with two commonly used classifiers: linear discriminant analysis (LDA) and multilayer perceptron (MLP) neural networks. It demonstrates exceptional accuracy, robust performance, and low computational load. The entropy of the output of the classifier is also examined as an online index to evaluate the correctness of classification; this can be used by online training for long-term myoelectric control operations.
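Two of the ingredients mentioned above, overlapped segmentation and majority voting, can be sketched independently of any particular classifier; the window length, step, and voting span below are illustrative values, not the paper's settings.

```python
# Sketch of overlapped segmentation and majority-vote post-processing.
import numpy as np
from collections import Counter

def overlapped_windows(signal, win=256, step=64):
    """Yield overlapping segments of a 1-D signal (step < win gives overlap)."""
    for start in range(0, len(signal) - win + 1, step):
        yield signal[start:start + win]

def majority_vote(decisions, span=5):
    """Replace each decision by the majority over the last `span` decisions."""
    smoothed = []
    for i in range(len(decisions)):
        recent = decisions[max(0, i - span + 1):i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

emg = np.random.default_rng(0).standard_normal(2048)   # placeholder signal
segments = list(overlapped_windows(emg))                # 29 overlapping windows
raw_decisions = [0, 0, 1, 0, 0, 2, 0, 0]                # per-window class labels
print(len(segments), majority_vote(raw_decisions))      # spurious labels smoothed out
```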

730 citations


Book
28 Aug 2008
TL;DR: Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, and classification and regression trees.
Abstract: Remarkable advances in computation and data storage and the ready availability of huge data sets have been the keys to the growth of the new disciplines of data mining and machine learning, while the enormous success of the Human Genome Project has opened up the field of bioinformatics. These exciting developments, which led to the introduction of many innovative statistical tools for high-dimensional data analysis, are described here in detail. The author takes a broad perspective; for the first time in a book on multivariate analysis, nonlinear methods are discussed in detail as well as linear methods. Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, multivariate reduced-rank regression, nonlinear manifold learning, bagging, boosting, random forests, independent component analysis, support vector machines, and classification and regression trees. Another unique feature of this book is the discussion of database management systems. This book is appropriate for advanced undergraduate students, graduate students, and researchers in statistics, computer science, artificial intelligence, psychology, cognitive sciences, business, medicine, bioinformatics, and engineering. Familiarity with multivariable calculus, linear algebra, and probability and statistics is required. The book presents a carefully-integrated mixture of theory and applications, and of classical and modern multivariate statistical techniques, including Bayesian methods. There are over 60 interesting data sets used as examples in the book, over 200 exercises, and many color illustrations and photographs.

698 citations


Journal ArticleDOI
TL;DR: A nonasymptotic oracle inequality is proved for the empirical risk minimizer with Lasso penalty for high-dimensional generalized linear models with Lipschitz loss functions, and the penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm.
Abstract: We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with Lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least squares regression is also discussed.
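For the logistic-regression example of the framework above, an l1-penalized empirical risk minimizer can be sketched as follows; the solver, penalty level, and synthetic high-dimensional data are illustrative assumptions.

```python
# Sketch of l1 (Lasso) penalized logistic regression in a p > n problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                     # sparse true coefficients
y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

# l1 penalty on the coefficients of the linear predictor.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(clf.coef_), "of", p)
```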

Proceedings ArticleDOI
05 Jul 2008
TL;DR: This paper proposes a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors, treating each subspace as a point in the Grassmann space and performing feature extraction and classification in the same space.
Abstract: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. By treating subspaces as basic elements, we can make learning algorithms adapt naturally to the problems with linear invariant structures. We propose a unifying view on the subspace-based learning method by formulating the problems on the Grassmann manifold, which is the set of fixed-dimensional linear subspaces of a Euclidean space. Previous methods on the problem typically adopt an inconsistent strategy: feature extraction is performed in the Euclidean space while non-Euclidean distances are used. In our approach, we treat each subspace as a point in the Grassmann space, and perform feature extraction and classification in the same space. We show the feasibility of the approach by using the Grassmann kernel functions such as the Projection kernel and the Binet-Cauchy kernel. Experiments with real image databases show that the proposed method performs well compared with state-of-the-art algorithms.
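The Projection kernel mentioned above has a simple closed form, k(Y1, Y2) = ||Y1^T Y2||_F^2 for orthonormal bases Y1 and Y2 of the two subspaces; a small sketch follows, with the basis construction and dimensions chosen only for illustration.

```python
# Sketch of the Projection kernel between two linear subspaces, each
# represented by a matrix with orthonormal columns (a point on the Grassmann
# manifold); a kernel machine can then be run on the resulting Gram matrix.
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis of the column span of A (d x k)."""
    Q, _ = np.linalg.qr(A)
    return Q

def projection_kernel(Y1, Y2):
    """k(Y1, Y2) = ||Y1^T Y2||_F^2 for orthonormal Y1, Y2."""
    return np.linalg.norm(Y1.T @ Y2, "fro") ** 2

rng = np.random.default_rng(0)
Y1 = orthonormal_basis(rng.standard_normal((50, 3)))
Y2 = orthonormal_basis(rng.standard_normal((50, 3)))
print(projection_kernel(Y1, Y1), projection_kernel(Y1, Y2))  # self-kernel equals k (= 3)
```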

Journal ArticleDOI
TL;DR: In this article, the authors proposed Features Annealed Independence Rules (FAIR) to select a subset of important features for high-dimensional classification, and the conditions under which all the important features can be selected by the two-sample t-statistic are established.
Abstract: Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
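A hedged sketch of the FAIR recipe: rank features by two-sample t-statistics, keep the top m, and classify with a diagonal (independence-rule) discriminant. The number of kept features and the synthetic data are illustrative; the paper derives the optimal number from an upper bound on the classification error.

```python
# Sketch of t-statistic feature selection followed by an independence rule.
import numpy as np

def fair_fit(X, y, m=20):
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    se = np.sqrt(X0.var(axis=0, ddof=1) / n0 + X1.var(axis=0, ddof=1) / n1)
    t = (X1.mean(axis=0) - X0.mean(axis=0)) / se     # two-sample t-statistics
    keep = np.argsort(-np.abs(t))[:m]                # m most discriminative features
    mu0, mu1 = X0[:, keep].mean(axis=0), X1[:, keep].mean(axis=0)
    var = X[:, keep].var(axis=0, ddof=1)             # per-feature variances (no covariances)
    return keep, mu0, mu1, var

def fair_predict(model, X):
    keep, mu0, mu1, var = model
    Z = X[:, keep]
    # independence rule: compare diagonal Mahalanobis distances to the centroids
    d0 = (((Z - mu0) ** 2) / var).sum(axis=1)
    d1 = (((Z - mu1) ** 2) / var).sum(axis=1)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
y = rng.integers(0, 2, 100)
X[y == 1, :10] += 1.0                                # only 10 features carry signal
print("training accuracy:", (fair_predict(fair_fit(X, y), X) == y).mean())
```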

01 Jan 2008
TL;DR: This book explains statistical data analysis of geochemical data using R and the DAS+R graphical user interface, covering exploratory graphics, geochemical mapping, definition of background and threshold, outlier detection, multivariate methods, and quality control.
Abstract: Preface. Acknowledgements. About the Authors. 1. Introduction. 1.1 The Kola Ecogeochemistry Project. 2. Preparing the Data for Use in R and DAS+R. 2.1 Required data format for import into R and DAS+R. 2.2 The detection limit problem. 2.3 Missing Values. 2.4 Some "typical" problems encountered when editing a laboratory data report file to a DAS+R file. 2.5 Appending and linking data files. 2.6 Requirements for a geochemical database. 2.7 Summary. 3. Graphics to Display the Data Distribution. 3.1 The one-dimensional scatterplot. 3.2 The histogram. 3.3 The density trace. 3.4 Plots of the distribution function. 3.5 Boxplots. 3.6 Combination of histogram, density trace, one-dimensional scatterplot, boxplot, and ECDF-plot. 3.7 Combination of histogram, boxplot or box-and-whisker plot, ECDF-plot, and CP-plot. 3.8 Summary. 4. Statistical Distribution Measures. 4.1 Central value. 4.2 Measures of spread. 4.3 Quartiles, quantiles and percentiles. 4.4 Skewness. 4.5 Kurtosis. 4.6 Summary table of statistical distribution measures. 4.7 Summary. 5. Mapping Spatial Data. 5.1 Map coordinate systems (map projection). 5.2 Map scale. 5.3 Choice of the base map for geochemical mapping 5.4 Mapping geochemical data with proportional dots. 5.5 Mapping geochemical data using classes. 5.6 Surface maps constructed with smoothing techniques. 5.7 Surface maps constructed with kriging. 5.8 Colour maps. 5.9 Some common mistakes in geochemical mapping. 5.10 Summary. 6. Further Graphics for Exploratory Data Analysis. 6.1 Scatterplots (xy-plots). 6.2 Linear regression lines. 6.3 Time trends. 6.4 Spatial trends. 6.5 Spatial distance plot. 6.6 Spiderplots (normalized multi-element diagrams). 6.7 Scatterplot matrix. 6.8 Ternary plots. 6.9 Summary. 7. Defining Background and Threshold, Identification of Data Outliers and Element Sources. 7.1 Statistical methods to identify extreme values and data outliers. 7.2 Detecting outliers and extreme values in the ECDF- or CP-plot. 7.3 Including the spatial distribution in the definition of background. 7.4 Methods to distinguish geogenic from anthropogenic element sources. 7.5 Summary. 8. Comparing Data in Tables and Graphics. 8.1 Comparing data in tables. 8.2 Graphical comparison of the data distributions of several data sets. 8.3 Comparing the spatial data structure. 8.4 Subset creation - a mighty tool in graphical data analysis. 8.5 Data subsets in scatterplots. 8.6 Data subsets in time and spatial trend diagrams. 8.7 Data subsets in ternary plots. 8.8 Data subsets in the scatterplot matrix. 8.9 Data subsets in maps. 8.10 Summary. 9. Comparing Data Using Statistical Tests. 9.1 Tests for distribution (Kolmogorov-Smirnov and Shapiro-Wilk tests). 9.2 The one-sample t-test (test for the central value). 9.3 Wilcoxon signed-rank test. 9.4 Comparing two central values of the distributions of independent data groups. 9.5 Comparing two central values of matched pairs of data. 9.6 Comparing the variance of two test. 9.7 Comparing several central values. 9.8 Comparing the variance of several data groups. 9.9 Comparing several central values of dependent groups. 9.10 Summary. 10. Improving Data Behaviour for Statistical Analysis: Ranking and Transformations. 10.1 Ranking/sorting. 10.2 Non-linear transformations. 10.3 Linear transformations. 10.4 Preparing a data set for multivariate data analysis. 10.5 Transformations for closed number systems. 10.6 Summary. 11. Correlation. 11.1 Pearson correlation. 11.2 Spearman rank correlation. 11.3 Kendall-tau correlation. 
11.4 Robust correlation coefficients. 11.5 When is a correlation coefficient significant? 11.6 Working with many variables. 11.7 Correlation analysis and inhomogeneous data. 11.8 Correlation results following additive logratio or central logratio transformations. 11.9 Summary. 12. Multivariate Graphics. 12.1 Profiles. 12.2 Stars. 12.3 Segments. 12.4 Boxes. 12.5 Castles and trees. 12.6 Parallel coordinates plot. 12.7 Summary. 13. Multivariate Outlier Detection. 13.1 Univariate versus multivariate outlier detection. 13.2 Robust versus non-robust outlier detection. 13.3 The chi-square plot. 13.4 Automated multivariate outlier detection and visualization. 13.5 Other graphical approaches for identifying outliers and groups. 13.6 Summary. 14. Principal Component Analysis (PCA) and Factor Analysis (FA). 14.1 Conditioning the data for PCA and FA. 14.2 Principal component analysis (PCA). 14.3 Factor Analysis. 14.4 Summary. 15. Cluster Analysis. 15.1 Possible data problems in the context of cluster analysis. 15.2 Distance measures. 15.3 Clustering samples. 15.4 Clustering variables. 15.5 Evaluation of cluster validity. 15.6 Selection of variables for cluster analysis. 15.7 Summary. 16. Regression Analysis (RA). 16.1 Data requirements for regression analysis. 16.2 Multiple regression. 16.3 Classical least squares (LS) regression. 16.4 Robust regression. 16.5 Model selection in regression analysis. 16.6 Other regression methods. 16.7 Summary. 17. Discriminant Analysis (DA) and Other Knowledge-Based Classification Methods. 17.1 Methods for discriminant analysis. 17.2 Data requirements for discriminant analysis. 17.3 Visualisation of the discriminant function. 17.4 Prediction with discriminant analysis. 17.5 Exploring for similar data structures. 17.6 Other knowledge-based classification methods. 17.7 Summary. 18. Quality Control (QC). 18.1 Randomised samples. 18.2 Trueness. 18.3 Accuracy. 18.4 Precision. 18.5 Analysis of variance (ANOVA). 18.6 Using Maps to assess data quality. 18.7 Variables analysed by two different analytical techniques. 18.8 Working with censored data - a practical example. 18.9 Summary. 19. Introduction to R and Structure of the DAS+R Graphical User Interface. 19.1 R. 19.2 R-scripts. 19.3 A brief overview of relevant R commands. 19.4 DAS+R. 19.5 Summary. References. Index.

Journal ArticleDOI
TL;DR: By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques; there is no eigenvector computation involved, which saves both time and memory.
Abstract: Linear Discriminant Analysis (LDA) has been a popular method for extracting features that preserve class separability. The projection functions of LDA are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. It has been widely used in many fields of information processing, such as machine learning, data mining, information retrieval, and pattern recognition. However, the computation of LDA involves dense matrix eigendecomposition, which can be computationally expensive in both time and memory. Specifically, LDA has O(mnt + t^3) time complexity and requires O(mn + mt + nt) memory, where m is the number of samples, n is the number of features, and t = min(m,n). When both m and n are large, it is infeasible to apply LDA. In this paper, we propose a novel algorithm for discriminant analysis, called Spectral Regression Discriminant Analysis (SRDA). By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of regularized least squares problems, and there is no eigenvector computation involved, which is a huge saving of both time and memory. Our theoretical analysis shows that SRDA can be computed with O(mn) time and O(ms) memory, where s (≤ n) is the average number of nonzero features in each sample. Extensive experimental results on four real-world data sets demonstrate the effectiveness and efficiency of our algorithm.
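The core computational trick can be sketched as follows: build class-indicator response vectors and obtain each projection direction by regularized least squares (ridge), with no eigendecomposition. This is an illustrative reduction of SRDA (the paper orthogonalizes the responses via the graph-embedding formulation), with scikit-learn's ridge solver as an assumed stand-in.

```python
# Sketch of the spectral-regression idea: regression targets instead of a
# dense generalized eigenproblem; an illustrative reduction, not full SRDA.
import numpy as np
from sklearn.linear_model import Ridge

def srda_like(X, y, alpha=1.0):
    classes = np.unique(y)
    # class-indicator responses, centered; one redundant direction dropped
    Y = np.stack([(y == c).astype(float) for c in classes], axis=1)
    Y -= Y.mean(axis=0)
    Y = Y[:, :-1]
    # each projection direction solves a regularized least-squares problem
    W = np.stack([Ridge(alpha=alpha).fit(X, Y[:, j]).coef_
                  for j in range(Y.shape[1])], axis=1)
    return W                                    # (n_features, n_classes - 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))
y = rng.integers(0, 3, 300)
print(srda_like(X, y).shape)                    # (50, 2) projection matrix
```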

Journal ArticleDOI
TL;DR: Sexual dimorphism of these modern people contrasts markedly with that of the ancient Native Americans, and discriminant functions like those presented in this paper should be used with caution on populations other than those for which they were developed.
Abstract: The accuracy of sex determinations based on visual assessments of the mental eminence, orbital margin, glabellar area, nuchal area, and mastoid process was tested on a series of 304 skulls of known age and sex from people of European American, African American, and English ancestry as well as on an ancient Native American sample of 156 individuals whose sex could be reliably determined based on pelvic morphology. Ordinal scores of these sexually dimorphic traits were used to compute sex determination discriminant functions. Linear, kth-nearest-neighbor, logistic, and quadratic discriminant analysis models were evaluated based on their capacity to minimize both misclassifications and sex biases in classification errors. Logistic regression discriminant analysis produced the best results: a logistic model containing all five cranial trait scores correctly classified 88% of the modern skulls with a negligible sex bias of 0.1%. Adding age at death, birth year, and population affinity to the model did not appreciably improve its performance. For the ancient Native American sample, the best logistic regression model assigned the correct pelvic sex to 78% of the individuals with a sex bias of only 0.2%. Similar cranial trait frequency distributions were found in same-sex comparisons of the modern African American, European American, and English samples. The sexual dimorphism of these modern people contrasts markedly with that of the ancient Native Americans. Because of such population differences, discriminant functions like those presented in this paper should be used with caution on populations other than those for which they were developed.

Journal ArticleDOI
TL;DR: In this paper, the authors focus on high-breakdown methods, which can deal with a substantial fraction of outliers in the data, and give an overview of recent high breakdown robust methods for multivariate settings such as covariance estimation, multiple and multivariate regression, discriminant analysis, principal components and multiivariate calibration.
Abstract: When applying a statistical method in practice, it often occurs that some observations deviate from the usual assumptions. However, many classical methods are sensitive to outliers. The goal of robust statistics is to develop methods that are robust against the possibility that one or several unannounced outliers may occur anywhere in the data. These methods then make it possible to detect outlying observations by their residuals from a robust fit. We focus on high-breakdown methods, which can deal with a substantial fraction of outliers in the data. We give an overview of recent high-breakdown robust methods for multivariate settings such as covariance estimation, multiple and multivariate regression, discriminant analysis, principal components and multivariate calibration.
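One concrete high-breakdown building block, a robust covariance estimate with outliers flagged by their robust Mahalanobis distances, can be sketched with the minimum covariance determinant estimator; scikit-learn's MinCovDet and the 97.5% chi-square cutoff are illustrative choices, not prescriptions from the survey.

```python
# Sketch of robust covariance estimation (MCD) and distance-based outlier flagging.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:10] += 6.0                                    # plant a few gross outliers

mcd = MinCovDet(random_state=0).fit(X)           # high-breakdown location/scatter
d2 = mcd.mahalanobis(X)                          # squared robust distances
outliers = d2 > chi2.ppf(0.975, df=X.shape[1])   # usual chi-square cutoff
print("flagged outliers:", np.where(outliers)[0])
```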

Journal ArticleDOI
TL;DR: This paper introduces a statistical technique, Support Vector Machines (SVM), which is considered by the Deutsche Bundesbank as an alternative for company rating and confirms that the SVM outperforms both DA and Logit on bootstrapped samples.
Abstract: This paper introduces a statistical technique, Support Vector Machines (SVM), which is considered by the Deutsche Bundesbank as an alternative for company rating. Special attention is paid to the features of the SVM which provide a higher accuracy of company classification into solvent and insolvent. The advantages and disadvantages of the method are discussed. The comparison of the SVM with more traditional approaches such as logistic regression (Logit) and discriminant analysis (DA) is made on the Deutsche Bundesbank data of annual income statements and balance sheets of German companies. The out-of-sample accuracy tests confirm that the SVM outperforms both DA and Logit on bootstrapped samples.

Journal ArticleDOI
TL;DR: Experiments comparing the proposed approach with some other popular subspace methods on the FERET, ORL, AR, and GT databases show that the method consistently outperforms others.
Abstract: This work proposes a subspace approach that regularizes and extracts eigenfeatures from the face image. Eigenspace of the within-class scatter matrix is decomposed into three subspaces: a reliable subspace spanned mainly by the facial variation, an unstable subspace due to noise and finite number of training samples, and a null subspace. Eigenfeatures are regularized differently in these three subspaces based on an eigenspectrum model to alleviate problems of instability, overfitting, or poor generalization. This also enables the discriminant evaluation performed in the whole space. Feature extraction or dimensionality reduction occurs only at the final stage after the discriminant assessment. These efforts facilitate a discriminative and a stable low-dimensional feature representation of the face image. Experiments comparing the proposed approach with some other popular subspace methods on the FERET, ORL, AR, and GT databases show that our method consistently outperforms others.

Journal ArticleDOI
01 Apr 2008
TL;DR: This work proposes a new manifold learning technique called discriminant locally linear embedding (DLLE), in which the local geometric properties within each class are preserved according to the locally linear embedding (LLE) criterion, and the separability between different classes is enforced by maximizing margins between point pairs on different classes.
Abstract: Graph-embedding along with its linearization and kernelization provides a general framework that unifies most traditional dimensionality reduction algorithms. From this framework, we propose a new manifold learning technique called discriminant locally linear embedding (DLLE), in which the local geometric properties within each class are preserved according to the locally linear embedding (LLE) criterion, and the separability between different classes is enforced by maximizing margins between point pairs on different classes. To deal with the out-of-sample problem in visual recognition with vector input, the linear version of DLLE, i.e., linearization of DLLE (DLLE/L), is directly proposed through the graph-embedding framework. Moreover, we propose its multilinear version, i.e., tensorization of DLLE, for the out-of-sample problem with high-order tensor input. Based on DLLE, a procedure for gait recognition is described. We conduct comprehensive experiments on both gait and face recognition, and observe that: 1) DLLE along its linearization and tensorization outperforms the related versions of linear discriminant analysis, and DLLE/L demonstrates greater effectiveness than the linearization of LLE; 2) algorithms based on tensor representations are generally superior to linear algorithms when dealing with intrinsically high-order data; and 3) for human gait recognition, DLLE/L generally obtains higher accuracy than state-of-the-art gait recognition algorithms on the standard University of South Florida gait database.

Journal ArticleDOI
TL;DR: The theoretical analysis of the effects of PCA on the discrimination power of the projected subspace is presented from a general pattern classification perspective for two possible scenarios: when PCA is used as a simple dimensionality reduction tool and when it is used to recondition an ill-posed LDA formulation.
Abstract: Dimensionality reduction is a necessity in most hyperspectral imaging applications. Tradeoffs exist between unsupervised statistical methods, which are typically based on principal components analysis (PCA), and supervised ones, which are often based on Fisher's linear discriminant analysis (LDA), and proponents for each approach exist in the remote sensing community. Recently, a combined approach known as subspace LDA has been proposed, where PCA is employed to recondition ill-posed LDA formulations. The key idea behind this approach is to use a PCA transformation as a preprocessor to discard the null space of rank-deficient scatter matrices, so that LDA can be applied on this reconditioned space. Thus, in theory, the subspace LDA technique benefits from the advantages of both methods. In this letter, we present a theoretical analysis of the effects (often ill effects) of PCA on the discrimination power of the projected subspace. The theoretical analysis is presented from a general pattern classification perspective for two possible scenarios: (1) when PCA is used as a simple dimensionality reduction tool and (2) when it is used to recondition an ill-posed LDA formulation. We also provide experimental evidence of the ineffectiveness of both scenarios for hyperspectral target recognition applications.
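The subspace-LDA recipe itself is a two-stage pipeline, PCA followed by LDA; a minimal sketch on synthetic "few samples, many bands" data follows, with the number of retained components as an illustrative choice.

```python
# Minimal sketch of subspace LDA: PCA discards the null space of the
# rank-deficient scatter matrices, then LDA runs in the reconditioned space.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 200))               # few samples, many spectral bands
y = rng.integers(0, 3, 60)

subspace_lda = make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis())
subspace_lda.fit(X, y)
print("training accuracy:", subspace_lda.score(X, y))
```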

Journal ArticleDOI
TL;DR: Experimental results on Yale and CMU PIE face databases convince us that the proposed method provides a better representation of the class information and obtains much higher recognition accuracies.

Book ChapterDOI
10 Aug 2008
TL;DR: It is shown how classical statistical tools such as Principal Component Analysis and Fisher Linear Discriminant Analysis can be used for efficiently preprocessing the leakage traces and evaluates the effectiveness of two data dimensionality reduction techniques for constructing subspace-based template attacks.
Abstract: The power consumption and electromagnetic radiation are among the most extensively used side-channels for analyzing physically observable cryptographic devices. This paper tackles three important questions in this respect. First, we compare the effectiveness of these two side-channels. We investigate the common belief that electromagnetic leakages lead to more powerful attacks than their power consumption counterpart. Second, we study the best combination of the power and electromagnetic leakages. A quantified analysis based on sound information theoretic and security metrics is provided for these purposes. Third, we evaluate the effectiveness of two data dimensionality reduction techniques for constructing subspace-based template attacks. Selecting automatically the meaningful time samples in side-channel leakage traces is an important problem in the application of template attacks and it usually relies on heuristics. We show how classical statistical tools such as Principal Component Analysis and Fisher Linear Discriminant Analysis can be used for efficiently preprocessing the leakage traces.

Journal ArticleDOI
TL;DR: A novel algorithm for image feature extraction, namely, the two-dimensional locality preserving projections (2DLPP), which directly extracts the proper features from image matrices based on locality preserving criterion is proposed.

Journal ArticleDOI
TL;DR: The results documented in this study may provide a reference for the optimum quantitative EEG features to use in developing and enhancing neonatal seizure detection algorithms.

Journal ArticleDOI
01 Feb 2008
TL;DR: Experimental results show that the proposed GSVD-ILDA algorithm gives the same performance as the LDA/GSVD with much smaller computational complexity, and also gives better classification performance than the other recently proposed ILDA algorithms.
Abstract: Dimensionality reduction methods have been successfully employed for face recognition. Among the various dimensionality reduction algorithms, linear (Fisher) discriminant analysis (LDA) is one of the popular supervised dimensionality reduction methods, and many LDA-based face recognition algorithms/systems have been reported in the last decade. However, the LDA-based face recognition systems suffer from the scalability problem. To overcome this limitation, an incremental approach is a natural solution. The main difficulty in developing the incremental LDA (ILDA) is to handle the inverse of the within-class scatter matrix. In this paper, based on the generalized singular value decomposition LDA (LDA/GSVD), we develop a new ILDA algorithm called GSVD-ILDA. Different from the existing techniques in which the new projection matrix is found in a restricted subspace, the proposed GSVD-ILDA determines the projection matrix in full space. Extensive experiments are performed to compare the proposed GSVD-ILDA with the LDA/GSVD as well as the existing ILDA methods using the Face Recognition Technology (FERET) face database and the Carnegie Mellon University Pose, Illumination, and Expression face database. Experimental results show that the proposed GSVD-ILDA algorithm gives the same performance as the LDA/GSVD with much smaller computational complexity. The experimental results also show that the proposed GSVD-ILDA gives better classification performance than the other recently proposed ILDA algorithms.

Journal ArticleDOI
TL;DR: This work proposes a novel semisupervised method for dimensionality reduction called Maximum Margin Projection (MMP), which aims at maximizing the margin between positive and negative examples at each local neighborhood.
Abstract: One of the fundamental problems in Content-Based Image Retrieval (CBIR) has been the gap between low-level visual features and high-level semantic concepts. To narrow down this gap, relevance feedback is introduced into image retrieval. With the user-provided information, a classifier can be learned to distinguish between positive and negative examples. However, in real-world applications, the number of user feedbacks is usually too small compared to the dimensionality of the image space. In order to cope with the high dimensionality, we propose a novel semisupervised method for dimensionality reduction called Maximum Margin Projection (MMP). MMP aims at maximizing the margin between positive and negative examples at each local neighborhood. Different from traditional dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which effectively see only the global Euclidean structure, MMP is designed for discovering the local manifold structure. Therefore, MMP is likely to be more suitable for image retrieval, where nearest neighbor search is usually involved. After projecting the images into a lower dimensional subspace, the relevant images get closer to the query image; thus, the retrieval performance can be enhanced. The experimental results on the Corel image database demonstrate the effectiveness of our proposed algorithm.

Journal ArticleDOI
TL;DR: This work defines a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion and shows that the aggregate satisfies sharp oracle inequalities under some general assumptions.
Abstract: Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.

Book
05 Jun 2008
TL;DR: This book covers statistical methods for consumer research, from data collection, preparation, and checking through sampling and inference, relationships among variables, classification and segmentation techniques, and further methods in multivariate analysis.
Abstract: PART ONE: COLLECTING, PREPARING AND CHECKING THE DATA Measurement, Errors and Data for Consumer Research Secondary Consumer Data Primary Data Collection Data Preparation and Descriptive Statistics PART TWO: SAMPLING, PROBABILITY AND INFERENCE Sampling Hypothesis Testing Analysis of Variance PART THREE: RELATIONSHIPS AMONG VARIABLES Correlation and Regression Association, Log-linear Analysis and Canonical Correlation Analysis Factor Analysis and Principal Component Analysis PART FOUR: CLASSIFICATION AND SEGMENTATION TECHNIQUES Discriminant Analysis Cluster Analysis Multidimensional Scaling Correspondence Analysis PART FIVE: FURTHER METHODS IN MULTIVARIATE ANALYSIS Structural Equation Models Discrete Choice Models The End (and Beyond)

Journal ArticleDOI
12 Aug 2008
TL;DR: The design and performance of a brain-computer interface (BCI) system for real-time single-trial binary classification of viewed images based on participant-specific dynamic brain response signatures in high-density electroencephalographic (EEG) data acquired during a rapid serial visual presentation (RSVP) task is reported.
Abstract: We report the design and performance of a brain-computer interface (BCI) system for real-time single-trial binary classification of viewed images based on participant-specific dynamic brain response signatures in high-density (128-channel) electroencephalographic (EEG) data acquired during a rapid serial visual presentation (RSVP) task. Image clips were selected from a broad area image and presented in rapid succession (12/s) in 4.1-s bursts. Participants indicated by subsequent button press whether or not each burst of images included a target airplane feature. Image clip creation and search path selection were designed to maximize user comfort and maintain user awareness of spatial context. Independent component analysis (ICA) was used to extract a set of independent source time-courses and their minimally-redundant low-dimensional informative features in the time and time-frequency amplitude domains from 128-channel EEG data recorded during clip burst presentations in a training session. The naive Bayes fusion of two Fisher discriminant classifiers, computed from the 100 most discriminative time and time-frequency features, respectively, was used to estimate the likelihood that each clip contained a target feature. This estimator was applied online in a subsequent test session. Across eight training/test session pairs from seven participants, median area under the receiver operator characteristic curve, by tenfold cross validation, was 0.97 for within-session and 0.87 for between-session estimates, and was nearly as high (0.83) for targets presented in bursts that participants mistakenly reported to include no target features.
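The fusion step, naive Bayes combination of two Fisher discriminant classifiers trained on different feature sets, can be sketched as summing their per-class log scores under an independence assumption; the placeholder arrays below stand in for the time and time-frequency EEG features.

```python
# Sketch of naive-Bayes fusion of two linear discriminant classifiers fit
# on separate feature sets; placeholder data, not EEG.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 300
X_time = rng.standard_normal((n, 40))      # time-domain features (placeholder)
X_tf = rng.standard_normal((n, 60))        # time-frequency features (placeholder)
y = rng.integers(0, 2, n)                  # target present / absent

clf_time = LinearDiscriminantAnalysis().fit(X_time, y)
clf_tf = LinearDiscriminantAnalysis().fit(X_tf, y)

# Independence assumption: add the two log-posteriors (equivalently multiply
# the likelihoods, up to a shared prior term) and take the larger class score.
log_post = clf_time.predict_log_proba(X_time) + clf_tf.predict_log_proba(X_tf)
fused = log_post.argmax(axis=1)
print("fused training accuracy:", (fused == y).mean())
```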