
Showing papers on "Linear discriminant analysis published in 1993"


Journal ArticleDOI
TL;DR: In this article, a linear discriminant function was used to find a linear combination of markers to maximize the sensitivity over the entire specificity range uniformly under the multivariate normal distribution model with proportional covariance matrices.
Abstract: The receiver operating characteristic (ROC) curve is a simple and meaningful measure to assess the usefulness of diagnostic markers. To use the information carried by multiple markers, we note that Fisher's linear discriminant function provides a linear combination of markers that maximizes the sensitivity uniformly over the entire specificity range under the multivariate normal distribution model with proportional covariance matrices. With no restriction on the covariance matrices, we also provide the best linear combination of markers in the sense that the area under the ROC curve of this combination is maximized among all possible linear combinations. We illustrate both situations with data from a cancer clinical trial.
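The first result above can be sketched numerically. The following is a minimal numpy illustration (all data synthetic; the means, covariance, and sample sizes are invented), computing Fisher's linear combination under equal covariances and checking its empirical AUC against the best single marker:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-marker data: diseased and healthy groups share a covariance
# matrix, matching the proportional-covariance setting of the paper.
n = 2000
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
X_d = rng.multivariate_normal([1.0, 1.0], cov, n)   # diseased
X_h = rng.multivariate_normal([0.0, 0.0], cov, n)   # healthy

# Fisher's direction: pooled covariance inverse times the mean difference.
pooled = 0.5 * (np.cov(X_d.T) + np.cov(X_h.T))
w = np.linalg.solve(pooled, X_d.mean(0) - X_h.mean(0))

def auc(pos, neg):
    """Empirical area under the ROC curve (Mann-Whitney form)."""
    return (pos[:, None] > neg[None, :]).mean()

auc_combo = auc(X_d @ w, X_h @ w)
auc_best_single = max(auc(X_d[:, j], X_h[:, j]) for j in range(2))
```

Under proportional covariances this single direction dominates the whole ROC curve; with unrestricted covariances the paper's second result requires a separate maximization over linear combinations.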

264 citations


Book
01 Jan 1993
TL;DR: In this paper, the authors present a multidimensional scaling analysis of individual differences in the context of social representation fields, including common knowledge and individual positions, and the study of anchoring.
Abstract: Part 1: Common Knowledge: Automatic Cluster Analysis. Automatic Cluster Analysis and Multidimensional Scaling. Correspondence Analysis - Interpretation of the Dimensions of the Social Representational Field. Statistical Structure and Objectification. Part 2: Shared Knowledge and Individual Positions: Three Basic Notions in the Multivariate Approach to Individual Differences. Factor Analysis. Multidimensional Scaling Analysis of Individual Differences. From Consensus to Individual Positioning. Part 3: Group Effects in Individual Positioning: Correspondence Analysis and the Study of Anchoring. Factor Scores. Automatic Interaction Detection. Discriminant Analysis. Correspondence Analysis of Textual Data. Status of Field Variations.

239 citations


Journal ArticleDOI
TL;DR: In this article, the authors compare the ANN method with the Discriminant Analysis (DA) method in order to understand the merits of ANN that are responsible for the higher level of performance.
Abstract: Artificial Neural Network (ANN) techniques have recently been applied to many different fields and have demonstrated their capabilities in solving complex problems. In a business environment, the techniques have been applied to predict bond ratings and stock price performance. In these applications, ANN techniques outperformed widely-used multivariate statistical techniques. The purpose of this paper is to compare the ANN method with the Discriminant Analysis (DA) method in order to understand the merits of ANN that are responsible for the higher level of performance. The paper provides an overview of the basic concepts of ANN techniques in order to enhance the understanding of this emerging technique. The similarities and differences between ANN and DA techniques in representing their models are described. This study also proposes a method to overcome the limitations of the ANN approach. Finally, a case study using a data set in a business environment demonstrates the superiority of ANN over DA as a method of classification of observations.

197 citations


Journal ArticleDOI
TL;DR: This work investigates two important issues in building neural network models: network architecture and size of training samples. It compares neural networks with classical models and with nonparametric methods such as k-nearest-neighbor and linear programming.
Abstract: Artificial neural networks are new methods for classification. We investigate two important issues in building neural network models: network architecture and size of training samples. Experiments were designed and carried out on two-group classification problems to find answers to these model building questions. The first experiment deals with selection of architecture and sample size for different classification problems. Results show that choice of architecture and choice of sample size depend on the objective: to maximize the classification rate of training samples, or to maximize the generalizability of neural networks. The second experiment compares neural network models with classical models such as linear discriminant analysis and quadratic discriminant analysis, and nonparametric methods such as k-nearest-neighbor and linear programming. Results show that neural networks are comparable to, if not better than, these other methods in terms of classification rates in the training samples but not in the test samples.
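The kind of two-group comparison described above can be sketched in a few lines of numpy (synthetic data; the group means, spread, and sample sizes are invented), pitting a pooled-covariance linear discriminant against a 5-nearest-neighbor rule:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Two-group problem with equal spherical covariances.
    a = rng.normal([0.0, 0.0], 1.0, (n, 2))
    b = rng.normal([2.0, 2.0], 1.0, (n, 2))
    return np.vstack([a, b]), np.repeat([0, 1], n)

Xtr, ytr = make_data(200)
Xte, yte = make_data(200)

# Linear discriminant: project onto pooled-covariance-weighted mean difference.
m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
Sp = 0.5 * (np.cov(Xtr[ytr == 0].T) + np.cov(Xtr[ytr == 1].T))
w = np.linalg.solve(Sp, m1 - m0)
thresh = w @ (m0 + m1) / 2
lda_acc = ((Xte @ w > thresh) == yte).mean()

# 5-nearest-neighbor majority vote on squared Euclidean distance.
d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
knn_pred = (ytr[np.argsort(d, axis=1)[:, :5]].mean(1) > 0.5).astype(int)
knn_acc = (knn_pred == yte).mean()
```

With equal covariances the problem favors the linear discriminant by construction; the paper's point is that on held-out samples the nonparametric and neural alternatives are often no worse.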

185 citations


Journal ArticleDOI
TL;DR: An important conclusion about the present method is that the Foley-Sammon optimal set of discriminant vectors is a special case of the set of optimal discriminant projection vectors.

183 citations


Journal ArticleDOI
TL;DR: A new minimum recognition error formulation and a generalized probabilistic descent (GPD) algorithm are analyzed and used to accomplish discriminative training of a conventional dynamic-programming-based speech recognizer.
Abstract: A new minimum recognition error formulation and a generalized probabilistic descent (GPD) algorithm are analyzed and used to accomplish discriminative training of a conventional dynamic-programming-based speech recognizer. The objective of discriminative training here is to directly minimize the recognition error rate. To achieve this, a formulation that allows controlled approximation of the exact error rate and renders optimization possible is used. The GPD method is implemented in a dynamic-time-warping (DTW)-based system. A linear discriminant function on the DTW distortion sequence is used to replace the conventional average DTW path distance. A series of speaker-independent recognition experiments using the highly confusible English E-set as the vocabulary showed a recognition rate of 84.4% compared to approximately 60% for traditional template training via clustering. The experimental results verified that the algorithm converges to a solution that achieves minimum error rate.
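The minimum-error formulation can be illustrated in miniature. The sketch below (synthetic data; the linear discriminant functions, smoothing constant, and step size are invented stand-ins, not the paper's DTW-based system) performs sample-by-sample probabilistic descent on a sigmoid-smoothed error count:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy K-class problem with linear discriminant functions g_k(x) = w_k . x.
K, dim, n = 3, 4, 300
means = rng.normal(0, 2.0, (K, dim))
y = rng.integers(0, K, n)
X = means[y] + rng.normal(0, 0.5, (n, dim))
X = np.hstack([X, np.ones((n, 1))])          # append a bias term
W = np.zeros((K, dim + 1))

def error_rate(W):
    return (np.argmax(X @ W.T, axis=1) != y).mean()

err_before = error_rate(W)
xi, eps = 1.0, 0.1
for _ in range(5):                            # epochs of sample-wise descent
    for x, c in zip(X, y):
        g = W @ x
        j = np.argmax(np.where(np.arange(K) == c, -np.inf, g))  # best competitor
        d = np.clip(g[j] - g[c], -50.0, 50.0) # misclassification measure
        l = 1.0 / (1.0 + np.exp(-xi * d))     # smoothed 0-1 loss
        dl = xi * l * (1.0 - l)
        W[c] += eps * dl * x                  # push the correct score up
        W[j] -= eps * dl * x                  # push the competitor down
err_after = error_rate(W)
```

The sigmoid makes the (non-differentiable) error count optimizable; as xi grows, the smoothed loss approaches the exact error rate, which is the "controlled approximation" the abstract refers to.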

165 citations


Proceedings ArticleDOI
A.R. Rao, G.L. Lohse
25 Oct 1993
TL;DR: An experiment was designed to help identify the relevant higher-order features of texture perceived by humans; three orthogonal dimensions for texture were identified: repetitive vs. non-repetitive; high-contrast and non-directional vs. low-contrast and directional; and granular, coarse and low-complexity vs. non-granular, fine and high-complexity.
Abstract: Recently, researchers have started using texture for data visualization. The rationale behind this is to exploit the sensitivity of the human visual system to texture in order to overcome the limitations inherent in the display of multidimensional data. A fundamental issue that must be addressed is what textural features are important in texture perception, and how they are used. We designed an experiment to help identify the relevant higher order features of texture perceived by humans. We used twenty subjects, who were asked to rate 56 pictures from Brodatz's album on 12 nine-point Likert scales. We applied the techniques of hierarchical cluster analysis, non-parametric multidimensional scaling (MDS), classification and regression tree analysis (CART), discriminant analysis, and principal component analysis to data gathered from the subjects. Based on these techniques, we identified three orthogonal dimensions for texture to be repetitive vs. non-repetitive; high-contrast and non-directional vs. low-contrast and directional; granular, coarse and low-complexity vs. non-granular, fine and high-complexity.

149 citations


Journal ArticleDOI
TL;DR: In this article, the authors presented a logistic discriminant function analysis as a means of differential item functioning (DIF) identification of items that are polytomously scored, using items from a 27-item mathematics test.
Abstract: The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification of items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test which includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.
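The core idea, regressing group membership on the total test score plus the studied item, can be sketched as follows (simulated data; all effect sizes are invented). A sizeable item coefficient, given the total score, flags DIF:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated polytomous item (scores 0-4) with uniform DIF against group 1.
n = 1000
group = rng.integers(0, 2, n)                       # reference = 0, focal = 1
ability = rng.normal(0, 1, n)
total = np.clip(np.round(4 * ability + 13), 0, 27)  # 27-item total-score proxy
item = np.clip(np.round(ability + 2 - 0.8 * group + rng.normal(0, 0.5, n)), 0, 4)

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Plain gradient ascent on the logistic log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += lr * X.T @ (y - p) / len(y)
    return b

Z = np.column_stack([total, item])
Z = (Z - Z.mean(0)) / Z.std(0)                      # standardize for stable steps
b = fit_logistic(Z, group)
item_coef = b[2]                                    # negative: item disfavors the focal group
```

In practice the test is whether adding the item (and, for nonuniform DIF, its interaction with the matching score) significantly improves group prediction.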

143 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: An adaptation of hidden Markov models (HMM) to automatic recognition of unrestricted handwritten words is described, along with many details of a 50,000-word vocabulary recognition system for US city names.
Abstract: The paper describes an adaptation of hidden Markov models (HMM) to automatic recognition of unrestricted handwritten words. Many interesting details of a 50,000 vocabulary recognition system for US city names are described. This system includes feature extraction, classification, estimation of model parameters, and word recognition. The feature extraction module transforms a binary image to a sequence of feature vectors. The classification module consists of a transformation based on linear discriminant analysis and Gaussian soft-decision vector quantizers which transform feature vectors into sets of symbols and associated likelihoods. Symbols and likelihoods form the input to both HMM training and recognition. HMM training performed in several successive steps requires only a small amount of gestalt labeled data on the level of characters for initialization. HMM recognition based on the Viterbi algorithm runs on subsets of the whole vocabulary.

107 citations


Journal ArticleDOI
TL;DR: Artificial neural networks are new methods for classification that can solve some difficult classification problems where classical models cannot, and are more robust in that they are less sensitive to changes in sample size, number of groups, number of variables, proportions of group memberships, and degrees of overlap among groups.

77 citations


Journal ArticleDOI
TL;DR: A review of G. J. McLachlan's book Discriminant Analysis and Statistical Pattern Recognition (Wiley, 1992).
Abstract: 7. Discriminant Analysis and Statistical Pattern Recognition. By G. J. McLachlan. ISBN 0 4716 1531 5. Wiley, New York, 1992. xvi + 526 pp. £52.00.

Journal Article
TL;DR: The results confirm the potential and good performance of the connectionist approach compared with classical methodologies; the basic architecture used is a feed-forward neural network trained with the backpropagation algorithm.

Proceedings ArticleDOI
27 Apr 1993
TL;DR: Four methods were used to reduce the error rate of a continuous-density hidden Markov-model-based speech recognizer on the TI/NIST connected-digits recognition task.
Abstract: Four methods were used to reduce the error rate of a continuous-density hidden Markov-model-based speech recognizer on the TI/NIST connected-digits recognition task. Energy thresholding sets a lower limit on the energy in each frequency channel to suppress spurious distortion accumulation caused by random noise. This led to an improvement in error rate by 15%. Spectrum normalization was used to compensate for across-speaker variations, resulting in an additional improvement by 20%. The acoustic resolution was increased up to 32 component densities per mixture. Each doubling of the number of component densities yielded a reduction in error rate by roughly 20%. Linear discriminant analysis was used for improved feature selection. A single class-independent transformation matrix was applied to a large input vector consisting of several adjacent frames, resulting in an improvement by 20% for high acoustic resolution. The final string error rate was 0.84%.
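The class-independent LDA transformation mentioned above can be sketched as follows (all sizes and data are invented; a real system would estimate the scatter matrices from state-labeled stacked frames). The transform keeps the leading eigenvectors of the within-class-whitened between-class scatter:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for 36-dim super-vectors (e.g. 3 stacked 12-dim frames),
# reduced to the 8 most discriminative directions.
n_cls, per, dim, keep = 10, 100, 36, 8
means = rng.normal(0, 1.0, (n_cls, dim))
y = np.repeat(np.arange(n_cls), per)
X = means[y] + rng.normal(0, 1.0, (len(y), dim))

mu = X.mean(0)
class_means = np.stack([X[y == c].mean(0) for c in range(n_cls)])
Sw = sum(np.cov(X[y == c].T) for c in range(n_cls)) / n_cls   # within-class scatter
Sb = np.cov(class_means.T)                                    # between-class scatter

# Eigenvectors of Sw^{-1} Sb, sorted by discriminative power.
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(vals.real)[::-1][:keep]
A = vecs[:, order].real                                       # 36 x 8 transform
Z = (X - mu) @ A                                              # reduced features
```

Because Sb has rank at most (number of classes - 1), at most that many useful directions exist; the rest of the super-vector dimensions are discarded.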

Journal ArticleDOI
TL;DR: There are indications that classification tree methods may be usefully employed in constructing parsimonious diagnostic decision trees.
Abstract: Classification trees and discriminant function analysis were employed in order to ascertain whether a small number of diagnostic decision rules could be extracted from a large inventory of items. Several models, involving up to 17 symptoms, that led to a broad psychiatric diagnosis were then tested on a small validation sample of 53 patients. All methods, with the exception of CART used without any pruning, generated identical trees involving four items. Almost 90% of the validation sample was able to be correctly classified by all methods although poor classification performance was noted in the case of one particular diagnosis, Schizoaffective Psychosis. In contrast, stepwise linear discriminant analysis originally selected 17 items, although three out of the first four items selected were identical to those chosen by the tree-building methods. Although more research is required, there are indications that the latter methods may be usefully employed in constructing parsimonious decision trees.

Journal ArticleDOI
TL;DR: A special case of conditional Gaussian models, known since 1961 as the location model, has for the past 30 years provided a basis for the multivariate analysis of mixed categorical and continuous variables; extensive development of this model took place throughout the 1970s and 1980s in the context of discrimination and classification.
Abstract: Recent research into graphical association models has focussed interest on the conditional Gaussian distribution for analyzing mixtures of categorical and continuous variables. A special case of such models, utilizing the homogeneous conditional Gaussian distribution, has in fact been known since 1961 as the location model, and for the past 30 years has provided a basis for the multivariate analysis of mixed categorical and continuous variables. Extensive development of this model took place throughout the 1970’s and 1980’s in the context of discrimination and classification, and comprehensive methodology is now available for such analysis of mixed variables. This paper surveys these developments and summarizes current capabilities in the area. Topics include distances between groups, discriminant analysis, error rates and their estimation, model and feature selection, and the handling of missing data.
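A minimal sketch of the location-model idea (synthetic data; one binary variable defines the "cells", and all effect sizes are invented): within each cell the continuous variable is Gaussian with a group-dependent mean and common variance, so discrimination reduces to a cell-specific linear rule:

```python
import numpy as np

rng = np.random.default_rng(7)

# Mixed data: binary 'cell' variable plus one continuous variable whose
# mean depends on both the cell and the group (homogeneous variance).
n = 400
cell = rng.integers(0, 2, n)
grp = rng.integers(0, 2, n)
x = 1.0 * grp + 0.5 * cell + rng.normal(0, 0.5, n)

# Allocate by the cell-specific midpoint between the two group means.
pred = np.empty(n, dtype=int)
for c in (0, 1):
    m0 = x[(cell == c) & (grp == 0)].mean()
    m1 = x[(cell == c) & (grp == 1)].mean()
    pred[cell == c] = (x[cell == c] > (m0 + m1) / 2).astype(int)
acc = (pred == grp).mean()
```

With many categorical variables the number of cells grows multiplicatively, which is why the survey's topics (smoothing, feature selection, missing data) matter in practice.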

Proceedings ArticleDOI
28 Mar 1993
TL;DR: An artificial neural network and a supervised self-organizing learning algorithm for multivariate linear discriminant analysis are proposed and the precision of the neural computation is shown to be high enough for feature selection and projection purposes.
Abstract: An artificial neural network and a supervised self-organizing learning algorithm for multivariate linear discriminant analysis are proposed. The precision of the neural computation is shown to be high enough for feature selection and projection purposes. A nonlinear discriminant analysis network (supervised nonlinear projection method) based on the multilayer feedforward network is also suggested. A comparative study of the principal component analysis network, linear discriminant analysis network, and nonlinear discriminant analysis network based on three criteria on various data sets is provided. A significant advantage of these neural networks over conventional approaches is their plasticity, which allows the networks to adapt themselves to new input data.

Journal ArticleDOI
TL;DR: In this article, a modified model selection procedure based on a new appreciation function was proposed, which was shown to perform better than the original one in terms of model selection performance in chemical data sets.
Abstract: Regularized discriminant analysis has proven to be a most effective classifier for problems where traditional classifiers fail because of a lack of sufficient training samples, as is often the case in high-dimensional settings. However, it has been shown that the model selection procedure of regularized discriminant analysis, determining the degree of regularization, has some deficiencies associated with it. We propose a modified model selection procedure based on a new appreciation function. By means of an extensive simulation it was shown that the new model selection procedure performs better than the original one. We also propose that one of the control parameters of regularized discriminant analysis be allowed to take on negative values. This extension leads to an improved performance in certain situations. The results are confirmed using two chemical data sets.
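The regularization being tuned can be written down compactly. The sketch below (illustrative values; the pooled estimate is replaced by the identity for brevity) shows Friedman-style blending of a class covariance toward the pooled estimate and toward a multiple of the identity, which keeps the estimate invertible when the class sample is small relative to the dimension; the extension mentioned in the abstract would let one of these control parameters go below zero:

```python
import numpy as np

rng = np.random.default_rng(5)

p, n_small = 20, 10                      # dimension exceeds class sample size
Xk = rng.normal(0, 1, (n_small, p))
S_k = np.cov(Xk.T)                       # singular: rank < p
S_pool = np.eye(p)                       # stand-in for the pooled estimate

def rda_cov(S_k, S_pool, lam, gamma):
    # lam blends toward the pooled covariance; gamma shrinks toward a
    # multiple of the identity (average eigenvalue preserved).
    S = (1 - lam) * S_k + lam * S_pool
    return (1 - gamma) * S + gamma * (np.trace(S) / len(S)) * np.eye(len(S))

raw_rank = np.linalg.matrix_rank(S_k)    # the raw estimate cannot be inverted
Sig = rda_cov(S_k, S_pool, lam=0.3, gamma=0.2)
```

Model selection then amounts to searching over (lam, gamma), which is the procedure whose scoring function the paper modifies.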

Proceedings ArticleDOI
27 Apr 1993
TL;DR: The authors use speaker identification (SI) for a performance evaluation as it is very sensitive to feature changes, and propose a target for robustness in terms of matched noise conditions, which is found to give the best resilience under cross conditions for a single feature.
Abstract: A variety of features and their sensitivity to noise mismatch between the model and test noise conditions are assessed. The authors use speaker identification (SI) for a performance evaluation as it is very sensitive to feature changes, and propose a target for robustness in terms of matched noise conditions. Two primary features, mel frequency cepstral coefficients (MFCCs) and PLP, are considered along with their RASTA and first-order regression extensions. PLP-RASTA is found to give the best resilience under cross conditions for a single feature, and the linear discriminant analysis (LDA) combination of MFCC and PLP-RASTA gives the best performance overall. Only in combined training are satisfactory results for any feature found.

Proceedings ArticleDOI
R.A. Sukkar, Jay G. Wilpon
27 Apr 1993
TL;DR: Experimental results show that, on two separate databases, the two-pass classifier significantly outperforms a single-passclassifier based solely on the HMM likelihood scores.
Abstract: A classifier for utterance rejection in a hidden Markov model (HMM) based speech recognizer is presented. This classifier, termed the two-pass classifier, is a postprocessor to the HMM recognizer, and consists of a two-stage discriminant analysis. The first stage employs the generalized probabilistic descent (GPD) discriminative training framework, while the second stage performs linear discrimination combining the output of the first stage with HMM likelihood scores. In this fashion the classification power of the HMM is combined with that of the GPD stage which is specifically designed for keyword/nonkeyword classification. Experimental results show that, on two separate databases, the two-pass classifier significantly outperforms a single-pass classifier based solely on the HMM likelihood scores.

Journal ArticleDOI
TL;DR: In this paper, three heuristic procedures for the two-group discriminant problem are developed that minimize the total expected cost of misclassification in the training sample; a systematic experiment shows that they generate near-optimal solutions.

Proceedings ArticleDOI
27 Apr 1993
TL;DR: Using triphone models based on LDA and continuous mixture densities, significant improvements have been observed and the following word error rates have been achieved and are among the best published so far on the RM database.
Abstract: Linear discriminant analysis (LDA) experiments reported previously (ICASSP-92 vol.1, p.13-16), are extended to context-dependent models and speaker-independent large vocabulary continuous speech recognition. Two variants of using mixture densities are compared: state-specific modeling and the monophone-tying approach where densities are shared across the states relevant to the same phoneme. Results are presented on the DARPA Resource Management (RM) task for both speaker-dependent (SD) and speaker-independent (SI) parts. Using triphone models based on LDA and continuous mixture densities, significant improvements have been observed and the following word error rates have been achieved: for the SD part, 7.8% without grammar and 1.5% with word pair; and for the SI part, 17.2% and 4.6%, respectively. These scores are averaged over 1200 SD or SI evaluation sentences and are among the best published so far on the RM database.

Journal ArticleDOI
TL;DR: The neural network methodology was superior to discriminant function analysis both in its ability to classify groups and to generalize to new cases that were not part of the training sample; implications for understanding possible core deficits in autism are discussed.
Abstract: A nonlinear pattern recognition system, neural network technology, was explored for its utility in assisting in the classification of autism. It was compared with a more traditional approach, simultaneous and stepwise linear discriminant analyses, in terms of the ability of each methodology to both classify and predict persons as having autism or mental retardation based on information obtained from a new structured parent interview: the Autistic Behavior Interview. The neural network methodology was superior to discriminant function analysis both in its ability to classify groups (92 vs. 85%) and to generalize to new cases that were not part of the training sample (92 vs. 82%). Interrater and test-retest reliabilities and measures of internal consistency were satisfactory for most of the subscales in the Autistic Behavior Interview. The implications of neural network technology for diagnosis, in general, and for understanding of possible core deficits in autism are discussed.

Journal ArticleDOI
TL;DR: The process is reversed so that training observations from each group are positioned in a Euclidean space, usually two-dimensional, using non-metric multidimensional scaling and then a mapping from the original sample space to the MDS space is found and used to discriminate future observations.

Journal ArticleDOI
TL;DR: An advanced document image analysis method is described that involves a multi-layer description of a document and leads to a semantic analysis of its content for an adaptive coding orientation, in order to optimize archiving.

Journal ArticleDOI
TL;DR: It is concluded that neural networks are an attractive alternative to traditional statistical techniques when dealing with medical detection and classification tasks and the use of generated data for training the networks and the discriminant classifier has been shown to be justified and profitable.

Journal ArticleDOI
TL;DR: It is concluded that propofol, thiopental, and midazolam produce different effects on the EEG and that both neural network and discriminant analysis are useful in identifying these differences, and that EEG spectra should be analyzed without using classical EEG bands.
Abstract: Differences in electroencephalographic (EEG) power spectra obtained under similar, but not identical, conditions may be difficult to discern using standard techniques. Statistical analysis may not be useful because of the large number of comparisons necessary. Visual recognition of differences also may be difficult. A new technique, neural network analysis, has been used successfully in other problems of pattern recognition and classification. We examined a number of methods of classifying similar EEG data: standard statistical analysis (analysis of variance), visual recognition, discriminant analysis, and neural network analysis. Twenty-nine volunteers received either thiopental (n = 9), midazolam (n = 10), or propofol (n = 10) in sedative doses in 3 different studies. These drugs produced very similar changes in the EEG power spectra. Except for beta2 power during thiopental infusion, differences between drugs could not be detected using analysis of variance. Visual categorization was correct in 72% of the baseline EEGs, 70% of thiopental EEGs, 27% of propofol EEGs, and 46% of midazolam EEGs. A classification neural network (Learning Vector Quantization network) containing a Kohonen hidden layer was able to successfully classify 57 of 58 EEG samples (of 4 minutes’ duration). Discriminant analysis had a similar rate of success. This level of performance was achieved by dividing the EEG power spectrum from 1 to 30 Hz into 15 2-Hz bandwidths. When the EEG power spectrum was divided into the “classical” frequency bandwidths (alpha, beta1, beta2, theta, delta), both neural network and discriminant analysis performance deteriorated. By training the network using only certain inputs we were able to identify drug-specific bandwidths that seemed to be important in correct classification.
We conclude that propofol, thiopental, and midazolam produce different effects on the EEG and that both neural network and discriminant analysis are useful in identifying these differences. We also conclude that EEG spectra should be analyzed without using classical EEG bands (alpha, beta, etc.). Additionally, neural networks can be used to identify frequency bands that are “important” in specific drug effects on the EEG. Once a classification algorithm is obtained using either a neural network or discriminant analysis, it could be used as an on-line monitor to recognize drug-specific EEG patterns.
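The band-division point is easy to make concrete. The following sketch (invented signal and sampling rate, not the study's data) splits a 1-30 Hz power spectrum into the fifteen 2-Hz bands described above:

```python
import numpy as np

rng = np.random.default_rng(6)

# 4 s of a synthetic "EEG": a 10 Hz rhythm plus noise, sampled at 128 Hz.
fs, secs = 128, 4
t = np.arange(fs * secs) / fs
eeg = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=fs * secs)

spec = np.abs(np.fft.rfft(eeg)) ** 2
freqs = np.fft.rfftfreq(len(eeg), 1 / fs)
edges = np.arange(1, 32, 2)                      # 1, 3, ..., 31 -> 15 bands
bands = [spec[(freqs >= lo) & (freqs < hi)].sum()
         for lo, hi in zip(edges[:-1], edges[1:])]
peak_band = int(np.argmax(bands))                # the 10 Hz rhythm falls in band 4 (9-11 Hz)
```

The fifteen band powers would then form the input vector to the network or discriminant function, in place of the five classical bands.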

Journal ArticleDOI
H. Saranadasa
TL;DR: In this article, the A-criterion is shown to perform better than the D-criterion in a high-dimensional setting where the dimension of the measurement space (p) is nearly equal to the total sample size (n).

Journal ArticleDOI
TL;DR: In this paper, an application of qualitative analysis in the flour milling industry based on near infrared spectroscopy and a factorial discriminant procedure was reported, with a correct classification rate of 97%.
Abstract: This paper reports an application of qualitative analysis in the flour milling industry based on near infrared spectroscopy and a factorial discriminant procedure. Samples of different commercial flour types were collected from a number of mills and a discriminant model developed; evaluation of this model on a different set of 99 samples produced a correct classification rate of 97%.

Journal ArticleDOI
TL;DR: In this paper, the problem of determining the minimum dimension necessary for quadratic discrimination in normal populations with heterogeneous covariance matrices was considered and some asymptotic chi-squared tests were obtained.

Journal ArticleDOI
TL;DR: Based on self-reported levels of alcohol consumption, 473 college students were placed into at-risk or not-at-risk groups and discriminant analysis procedures were conducted to determine if a function could be found which would discriminate between the groups.
Abstract: Based on self-reported levels of alcohol consumption, 473 college students (295 female and 178 male) were placed into at-risk or not-at-risk groups. Using reasons given for drinking as the independent variables, discriminant analysis procedures were conducted separately on the males and females to determine if a function could be found which would discriminate between the groups. For the female group, 11 of 22 reasons defined a discriminant function which accounted for 36% of the variance between the groups (p < .001). This function was also able to correctly classify 71% of the holdout sample. For the males, five of the 22 reasons defined a discriminant function which accounted for 36% of the variance between the groups (p < .001). This function was able to correctly classify 69% of the holdout sample.