
Showing papers on "Linear discriminant analysis" published in 1984


Book
22 Aug 1984
TL;DR: This book covers several aspects of multivariate analysis, e.g., principal components analysis, factor analysis, multiple discriminant analysis, Linear Structural Relations (LISREL), and latent structure analysis.
Abstract: Selected Aspects of Multivariate Analysis. Principal Components Analysis. Factor Analysis. Multidimensional Scaling. Cluster Analysis. Multiple Regression. Some Practical Considerations: Data Analysis Problems. Cross-Classified Frequency Data. Canonical Correlation Analysis. Discriminant Analysis: The Two-Group Problem. Multiple Discriminant Analysis and Related Topics. Linear Structural Relations (LISREL). Latent Structure Analysis. Appendixes. References. Index.

1,914 citations



Journal ArticleDOI
TL;DR: In this paper, the authors present an explanation of Cohen's kappa statistic which is useful in interpreting the classification results of discriminant analysis when group sample sizes are equal or unequal.
Abstract: INTRODUCTION Ecologists use discriminant analysis, in part, to examine the correct classification of species or individuals by functional or taxonomic group based on some predictor set of variables (Williams, 1981, 1983). The effectiveness of variable sets of data in discriminating between groups can thus be assessed. A problem, typically encountered in applying discriminant analysis, is unequal group sample sizes. In extreme situations unequal group sizes may lead to a very high percent correct classification but the improvement over random correct classification may be slight. As an example, if one wished to classify individuals of species A and B with sample sizes of 25 and 75, respectively, the probability of correct classification for each group is not 50%. Any individual has an a priori .25 probability of belonging to species A and a .75 probability of belonging to species B. The posterior chance of correct classification will be unclear to a researcher who does not apply a chance-corrected procedure. While a chance-corrected measure of correct prediction is more important as sample sizes become more disparate, such a procedure is useful even with equal group sample sizes. We present an explanation of Cohen's kappa statistic which is useful in interpreting the classification results of discriminant analysis when group sample sizes are equal or unequal. A numerical example is employed in Table 1 taken from Cody (1978). This statistic was developed by Cohen (1960) as a method for objectively computing the chance-corrected percentage of agreement between actual and predicted group memberships. Cohen (1968) later presented a generalized form which was subsequently applied to discriminant analysis in the educational literature by Wiedemann and Fenster (1978).

335 citations
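
The chance correction described in the abstract above can be illustrated with a minimal Python sketch of Cohen's kappa computed from a classification table. The function name `cohens_kappa` and the counts are hypothetical; they are not the Cody (1978) data from the paper's Table 1, and only the 25/75 group sizes echo the abstract's example.

```python
def cohens_kappa(confusion):
    """Chance-corrected agreement for a square classification table.

    confusion[i][j] = number of cases from actual group i predicted as group j.
    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed proportion of correct classifications (the diagonal).
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: species A (25 individuals), species B (75 individuals).
# A rule that assigns everyone to B is 75% correct but no better than chance.
all_b = [[0, 25], [0, 75]]
# A rule that is 85% correct overall.
mixed = [[15, 10], [5, 70]]

print(cohens_kappa(all_b))            # 0.0
print(round(cohens_kappa(mixed), 3))  # 0.571
```

In this sketch, the first table shows why a high percent correct can be misleading with unequal group sizes: 75% of cases are classified correctly, yet kappa is zero.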


Journal ArticleDOI
TL;DR: In this article, the author generalizes principal components to so-called common principal components and derives the normal-theory maximum likelihood estimates of the common component Σ_i matrices and the log-likelihood-ratio statistics for testing the hypothesis that the covariance matrices of k populations are simultaneously diagonalizable.
Abstract: This article generalizes the method of principal components to so-called “common principal components” as follows: Consider the hypothesis that the covariance matrices Σ_i for k populations are simultaneously diagonalizable. That is, there is an orthogonal matrix β such that β'Σ_iβ is diagonal for i = 1, …, k. I derive the normal-theory maximum likelihood estimates of the common component Σ_i matrices and the log-likelihood-ratio statistics for testing this hypothesis. The solution has some favorable properties that do not depend on normality assumptions. Numerical examples illustrate the method. Applications to data reduction, multiple regression, and nonlinear discriminant analysis are sketched.

328 citations


Journal ArticleDOI
TL;DR: In this paper, a general approach to estimating linear statistical relationships is presented, which includes three lectures on linear functional and structural relationships, factor analysis, and simultaneous equations models, focusing on the similarity of maximum likelihood estimators under normality in the different models.
Abstract: This paper on estimating linear statistical relationships includes three lectures on linear functional and structural relationships, factor analysis, and simultaneous equations models. The emphasis is on relating the several models by a general approach and on the similarity of maximum likelihood estimators (under normality) in the different models. In the first two lectures the observable vector is decomposed into a "systematic part" and a random error; the systematic part satisfies the linear relationships. Estimators are derived for several cases and some of their properties given. Estimation of the coefficients of a single equation in a simultaneous equations model is shown to be equivalent to estimation of linear functional relationships.

272 citations


Journal ArticleDOI
TL;DR: Previous bankruptcy prediction studies using multiple discriminant analysis (MDA) of financial ratios are found to lack consistency both in the values of the coefficients reported and in the relative importance of the various financial ratios.
Abstract: Previous bankruptcy prediction studies using multiple discriminant analysis (MDA) of financial ratios exhibit a lack of consistency both in the values of the coefficients reported and the relative importance of the various financial ratios when used in different studies. Part of this inconsistency can be attributed to the use of different sets of ratios, since no two studies used exactly the same set. Since multiple discriminant analysis coefficients are unique only up to a factor of proportionality (Eisenbeis [1977] and Ladd [1966]), the use of a different functional form of the MDA model of different subsets of a given set of financial ratios can result in different coefficient values. Apart from the selection of different ratios in the final prediction models, other methodological issues are raised by some common practices followed in bankruptcy prediction studies. For example, researchers typically pool data across different years without considering the underlying economic events in those years. Furthermore, in the development of the final prediction models, some authors consciously attempt to control for multicollinearity, whereas others ignore the issue, relying on Eisenbeis' assumption that multicollinearity is not a problem in discriminant analysis if classification accuracy is the objective (Eisenbeis [1977]). The

271 citations


Journal ArticleDOI
TL;DR: Three multivariate procedures are described: Hotelling's T², discriminant analysis, and logistic regression; the underlying assumptions and relative merits and disadvantages are reviewed and which method to use in various circumstances is recommended.
Abstract: Researchers frequently encounter studies that compare two groups on many variables. We discourage the use of multiple tests of hypotheses on individual variables, an approach that ignores the correlation among the variables and increases the chance of a type I error. Instead of examining each variable separately, we recommend using multivariate procedures that integrate all measures on a person into a unified analysis of the differences between the two groups. We describe three multivariate procedures: Hotelling's T², discriminant analysis, and logistic regression. We also discuss the use of Bonferroni's adjustment to preserve the overall chance of a type I error in conducting individual tests on each variable after doing the multivariate procedures. We review the underlying assumptions and relative merits and disadvantages of the three multivariate methods and recommend which method to use in various circumstances.

193 citations
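
The inflation of the type I error under multiple tests, and the Bonferroni adjustment mentioned in the abstract above, can be sketched in a few lines of Python. The function names and the choice of 10 variables are hypothetical. The family-wise calculation assumes independent tests purely for illustration; the abstract stresses that the variables are usually correlated, and the Bonferroni bound itself does not require independence.

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level that keeps the overall type I error at most alpha."""
    return alpha / m


def familywise_error(alpha_per_test, m):
    """Chance of at least one false positive across m tests.

    Assumes the m tests are independent (an illustrative simplification).
    """
    return 1 - (1 - alpha_per_test) ** m


m = 10  # hypothetical number of variables compared between the two groups
unadjusted = familywise_error(0.05, m)
adjusted = familywise_error(bonferroni_alpha(0.05, m), m)

print(round(unadjusted, 3))  # 0.401
print(round(adjusted, 3))    # 0.049
```

Under these assumptions, testing each of 10 variables at the 0.05 level gives roughly a 40% chance of at least one false positive, while testing each at 0.005 keeps the overall chance just under 5%.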


Journal ArticleDOI
TL;DR: A method for utilising spatial information in performing discriminant analysis on multivariate data at each point on a regular lattice, as for example with LANDSAT, seems to be encouraging.

96 citations


Journal Article
TL;DR: In this paper, the Bayesian maximum likelihood parametric classifier has been tested against the data-based formulation designated "linear discrimination analysis", using the "GLIKE" decision and "CLASSIFY" classification algorithms in the Landsat Mapping System.
Abstract: The Bayesian maximum likelihood parametric classifier has been tested against the data-based formulation designated 'linear discrimination analysis', using the 'GLIKE' decision and 'CLASSIFY' classification algorithms in the Landsat Mapping System. Identical supervised training sets, USGS land use/land cover classes, and various combinations of Landsat image and ancillary geodata variables, were used to compare the algorithms' thematic mapping accuracy on a single-date summer subscene, with a cellularized USGS land use map of the same time frame furnishing the ground truth reference. CLASSIFY, which accepts a priori class probabilities, is found to be more accurate than GLIKE, which assumes equal class occurrences, for all three mapping variable sets and both levels of detail. These results may be generalized to direct accuracy, time, cost, and flexibility advantages of linear discriminant analysis over Bayesian methods.

70 citations


ReportDOI
01 Oct 1984
TL;DR: Software implementing the SMART (Smooth Multiple Additive Regression Technique) algorithm, which generalizes the projection pursuit method to classification and multiple response regression, is described.
Abstract: This note describes software implementing the SMART (Smooth Multiple Additive Regression Technique) algorithm. SMART generalizes the projection pursuit method to classification and multiple response regression. SMART also provides a more efficient algorithm for single response projection pursuit regression. Originator-supplied keywords include: Multiple response regression, Non parametric regression, Classification, and Discriminant analysis.

61 citations


Journal ArticleDOI
TL;DR: In this paper, a general approach to modeling misallocation is formulated, the mean vectors and covariance matrices of the mixture distributions are derived, the asymptotic distribution of the discriminant boundary is obtained, and the asymptotic first two moments of the error rates are given.
Abstract: Linear discriminant analysis for a two-class case is studied in the presence of misallocation in training samples. A general approach to modeling of misallocation is formulated, and the mean vectors and covariance matrices of the mixture distributions are derived. The asymptotic distribution of the discriminant boundary is obtained, and the asymptotic first two moments of the error rates are given. Certain numerical results for the error rates are presented by considering the random and two nonrandom misallocation models.

Journal ArticleDOI
TL;DR: In this paper, the criterion of unconditional mean squared error is used to compare four commonly used estimators of error rates in discriminant analysis, and the leave-one-out estimator, which has relatively small bias, is found to perform well relative to the other estimators when a large number of explanatory variables are used in the discriminant function.
Abstract: In this article the criterion of unconditional mean squared error is used to compare four commonly used estimators of error rates in discriminant analysis. The leave-one-out estimator, which has relatively small bias, is found to perform well relative to the other estimators when a large number of explanatory variables are used in the discriminant function. With a small number of explanatory variables, the large variance of this estimator results in poor performance. We also find the estimators that assume normally distributed explanatory variables to be nonrobust when the parent distributions are skewed or have large tails.
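
The leave-one-out estimator discussed in the abstract above can be sketched in Python for a deliberately simple case: one explanatory variable, two groups, equal priors, and a common variance, where the linear discriminant rule reduces to assigning a case to the group with the nearer sample mean. The function names and data are hypothetical. Resubstitution (the apparent error rate) is shown only as a familiar point of comparison; the abstract does not name the four estimators the article compares.

```python
def nearest_mean_rule(group_a, group_b):
    """Build a univariate two-group rule: assign x to the group with the nearer mean."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)

    def classify(x):
        return "A" if abs(x - mean_a) <= abs(x - mean_b) else "B"

    return classify


def resubstitution_error(group_a, group_b):
    """Error rate when the rule is tested on the same cases used to build it."""
    classify = nearest_mean_rule(group_a, group_b)
    wrong = sum(classify(x) != "A" for x in group_a)
    wrong += sum(classify(x) != "B" for x in group_b)
    return wrong / (len(group_a) + len(group_b))


def leave_one_out_error(group_a, group_b):
    """Error rate when each case is classified by a rule built without it."""
    wrong = 0
    for i, x in enumerate(group_a):
        classify = nearest_mean_rule(group_a[:i] + group_a[i + 1:], group_b)
        wrong += classify(x) != "A"
    for i, x in enumerate(group_b):
        classify = nearest_mean_rule(group_a, group_b[:i] + group_b[i + 1:])
        wrong += classify(x) != "B"
    return wrong / (len(group_a) + len(group_b))


# Hypothetical measurements for two small groups.
group_a = [1.0, 2.0, 4.8]
group_b = [5.4, 8.0, 9.0]

print(resubstitution_error(group_a, group_b))           # 0.0
print(round(leave_one_out_error(group_a, group_b), 3))  # 0.333
```

With these made-up values, the two cases nearest the boundary are classified correctly only when they help define the rule, which is the optimism that leaving each case out is meant to remove.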

Journal ArticleDOI
TL;DR: This article investigated two propositions of uncertainty reduction theory and examined their effects on language use, including linguistic diversity and verbal immediacy, in two conversational segments taken from different periods of the entry phase of 72 interviews.
Abstract: The present study investigates two propositions of uncertainty reduction theory and examines their effects on language use. Linguistic diversity and verbal immediacy were measured in two conversational segments taken from different periods of the entry phase of 72 interviews. A discriminant analysis function accounted for 19.36% of the variance in the measures. A sign test of the discriminant function coefficients showed a significantly consistent shift, across individuals, from the earlier conversational segment to the later. The results are consistent with the propositions of uncertainty reduction. Implications of this interpretation are discussed.


Book ChapterDOI
01 Jan 1984
TL;DR: When comparing populations or classifying unknown specimens on the basis of their morphology, one can rely either on visual comparison or on the mathematical techniques of discriminant analysis.
Abstract: When comparing populations or classifying unknown specimens on the basis of their morphology, one can either rely on visual comparison or follow an approach which involves the use of the mathematical techniques of discriminant analysis.

Journal ArticleDOI
TL;DR: A detailed description of an efficient algorithm of minimization of this criterion function is given, both in the case of linear separability and nonseparability of sets.

Journal ArticleDOI
TL;DR: In this article, a random vector is assumed to have one of three known multivariate normal distributions with equal covariance matrices, and it is desired to separate the three distributions by means of a single linear discriminant function.

Journal ArticleDOI
TL;DR: A comment on assessing the fit of logistic regressions using the implied discriminant analysis.
Abstract: (1984). Comment: Assessing the Fit of Logistic Regressions Using the Implied Discriminant Analysis. Journal of the American Statistical Association: Vol. 79, No. 385, pp. 79-80.

Journal ArticleDOI
TL;DR: It is concluded that for a similarly-manufactured battery set, relative lifetime prediction could be based on initial measurements of the same type examined here.

01 Jan 1984
TL;DR: The project implemented an on-line handprinted character recogniser using a Bayesian decision rule in conjunction with selected feature transformations to provide a basis for constructing a practical device for data entry.
Abstract: The project implemented an on-line handprinted character recogniser. A Bayesian decision rule was used in conjunction with selected feature transformations. Fourier transformations and transformations derived from discriminant analysis were compared. Discriminant features generally proved superior to those derived from Fourier analysis. A recognition rate of 98% was achieved with the use of the first 4 discriminant functions. The character set included numerals and arithmetic operators. The results from the current system provide a basis for constructing a practical device for data entry.

Journal ArticleDOI
TL;DR: In this paper, principal component analysis is applied to the interpretation of ¹³C-n.m.r. spectra and to the resolution of mass spectral data, and a procedure is given for determining the relative amounts of pure components, with and without the use of pure mass lines, in mass spectra of mixtures.

Journal ArticleDOI
TL;DR: In this paper, the authors derive expressions for misclassification probabilities under a contaminated multivariate normal model for the linear-programming approaches to the two-group discriminant problem.
Abstract: Expressions for misclassification probabilities are derived under a contaminated multivariate normal model for the linear-programming approaches to the two-group discriminant problem.

Journal ArticleDOI
TL;DR: Of the five factor VIII variates, two‐stage factor VIII coagulant activity and factor VIII related antigen by electroimmunoassay correctly identified the most subjects, and ristocetin co‐factor did not improve the diagnostic ability.
Abstract: Summary. Multivariate analysis is potentially superior to the linear discriminant analysis which is commonly used to identify carriers of haemophilia. Our aim was to compare these two statistical methods, and to find which factor VIII variates most effectively partitioned carrier and normal subjects. In this study we assayed one- and two-stage factor VIII coagulant activity, factor VIII related antigen by electroimmunoassay and by fluoroimmunoassay, and ristocetin co-factor in 50 normal females and 50 carriers of haemophilia. From the results we calculated multivariate ellipses which circumscribed the normal and the carrier populations, and we displayed these on the monitor of a microcomputer. These ellipses separated the two populations better than linear discriminants calculated on the same data. Multivariate analysis correctly identified 94% of the carriers whereas discriminant analysis correctly identified only 84%. Discriminant analysis gave poorer results because the statistical assumption of equal variance was breached, whereas the assumption of multivariate normality was upheld. Of the five factor VIII variates, two-stage factor VIII coagulant activity and factor VIII related antigen by electroimmunoassay correctly identified the most subjects. Ristocetin co-factor did not improve the diagnostic ability, either when in lieu of or when added to factor VIII related antigen.


01 Jan 1984
TL;DR: In this paper, the authors compared two statistical procedures that combine the semifasting plasma phenylalanine and tyrosine concentrations with the individuals' prior probability of being a heterozygous carrier in order to discriminate carriers from noncarriers.
Abstract: SUMMARY Absence of a convenient, direct enzyme assay for detecting phenylketonuria (PKU) heterozygotes has resulted in continued effort to develop an accurate and reliable procedure to discriminate the heterozygous individual from the homozygous normal. Our study compares two statistical procedures that combine the semifasting plasma phenylalanine and tyrosine concentrations with the individuals' prior probability of being a heterozygous carrier in order to discriminate carriers from noncarriers. The results of this comparison indicate that the quadratic discriminant function is superior to the linear discriminant function as a method of carrier testing both in theory and in practice. An interactive computer system is described that facilitates the clinical utilization of the quadratic discriminant function.

Journal Article
TL;DR: This study compares two statistical procedures that combine the semifasting plasma phenylalanine and tyrosine concentrations with the individuals' prior probability of being a heterozygous carrier in order to discriminate carriers from noncarriers and indicates that the quadratic discriminant function is superior to the linear discriminant function as a method of carrier testing.
Abstract: Absence of a convenient, direct enzyme assay for detecting phenylketonuria (PKU) heterozygotes has resulted in continued effort to develop an accurate and reliable procedure to discriminate the heterozygous individual from the homozygous normal. Our study compares two statistical procedures that combine the semifasting plasma phenylalanine and tyrosine concentrations with the individuals' prior probability of being a heterozygous carrier in order to discriminate carriers from noncarriers. The results of this comparison indicate that the quadratic discriminant function is superior to the linear discriminant function as a method of carrier testing both in theory and in practice. An interactive computer system is described that facilitates the clinical utilization of the quadratic discriminant function.
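
The difference between a linear and a quadratic discriminant function that incorporates prior probabilities can be sketched in Python for a single normally distributed measurement. This is an illustration under stated assumptions, not the study's procedure: the study uses two plasma concentrations, whereas this sketch is univariate, and every name and number below (means, variances, priors, pooled variance) is hypothetical.

```python
import math


def quadratic_score(x, mean, var, prior):
    """Log of (prior x normal density), with the constant shared by all groups dropped."""
    return math.log(prior) - 0.5 * math.log(var) - (x - mean) ** 2 / (2 * var)


def linear_score(x, mean, pooled_var, prior):
    """The same score when all groups share one variance; it is linear in x."""
    return math.log(prior) + x * mean / pooled_var - mean ** 2 / (2 * pooled_var)


def classify_quadratic(x, groups):
    """Assign x to the group with the largest quadratic score."""
    return max(groups, key=lambda g: quadratic_score(
        x, groups[g]["mean"], groups[g]["var"], groups[g]["prior"]))


def classify_linear(x, groups, pooled_var):
    """Assign x to the group with the largest linear score."""
    return max(groups, key=lambda g: linear_score(
        x, groups[g]["mean"], pooled_var, groups[g]["prior"]))


# Hypothetical group parameters with clearly unequal variances.
groups = {
    "noncarrier": {"mean": 0.0, "var": 1.0, "prior": 0.5},
    "carrier": {"mean": 2.0, "var": 9.0, "prior": 0.5},
}
pooled_var = 5.0  # average of the two variances, assuming equal sample sizes

print(classify_linear(-4.0, groups, pooled_var))  # noncarrier
print(classify_quadratic(-4.0, groups))           # carrier
```

In this sketch the linear rule has a single cut point, so a value far below both means goes to the group with the lower mean. The quadratic rule takes the larger carrier variance into account and assigns such an extreme value to the more dispersed group.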

Journal ArticleDOI
TL;DR: In this paper, the authors modeled giving behavior toward a charitable organization by two different procedures: discriminant analysis and a log-linear model, which may more accurately represent the conceptual basis of consumer decisions.
Abstract: Giving behavior toward a charitable organization is modeled by two different procedures: discriminant analysis and a log-linear model. Although the discriminant procedure is better known in marketing, the log-linear approach has less restrictive model assumptions and may more accurately represent the conceptual basis of consumer decisions. Two situations are considered: (1) a simple binary classification of the giving decision, and (2) a three-group case of no gift, small, and large gift. Utilizing a large data base and a holdout sample for comparative purposes, the log-linear procedure is found to be an attractive alternative to discriminant analysis in terms of correct classification of individuals.

Journal ArticleDOI
TL;DR: The results of a systematic investigation of the properties of several data transformations commonly used in multivariate statistical studies of primate functional morphology are considered using the same sample of 11 extant primate taxa and the same set of linear measurements of the elbow complex.

Journal ArticleDOI
TL;DR: The authors recommend that in the future researchers use CCA instead of PCA in EP studies for data reduction carried out for discrimination, because the method referred to as canonical component analysis is optimal for the purpose of discrimination.
Abstract: The authors tested a new procedure for the discrimination of EPs obtained in different stimulus situations. In contrast with principal component analysis (PCA) used so far for the purpose of data compression, the method referred to as canonical component analysis (CCA) is optimal for the purpose of discrimination. To illustrate this, the authors performed both PCA and CCA for the same material, then after carrying out discriminant analysis (SDWA) for the data transformed in this way, compared the performance of the two procedures in discrimination. In view of both the theoretical and practical considerations, the authors recommend that in the future researchers use CCA instead of PCA in EP studies for data reduction carried out for discrimination.

Journal ArticleDOI
TL;DR: In this paper, two multivariate alternatives to regression analysis are presented, canonical analysis and multiple discriminant analysis, both of which define new coordinate systems for evaluation of dimensions underlying salary decisions.
Abstract: The analysis of salary equity-parity in institutions of higher education typically involves the use of multiple regression analysis to determine predicted salary and the residual differences between predicted and actual salary. Multiple regression analysis forces the variable weights throughout the salary structure to be uniform, permits only one criterion or dependent variable to be examined at a time, restricts the coordinates to those provided by the variables as measured, and as customarily used, treats qualitative or discrete variables as if they were continuous, assuming ordinality of the numbers used. Two multivariate alternatives to regression analysis are presented, canonical analysis and multiple discriminant analysis, both of which define new coordinate systems for evaluation of dimensions underlying salary decisions.