
Showing papers on "Linear discriminant analysis" published in 1983


Book
01 Jan 1983
TL;DR: In this Section: 1. Multivariate Statistics: Why? and 2. A Guide to Statistical Techniques: Using the Book Research Questions and Associated Techniques.
Abstract: Brief table of contents:
Chapter 1 Introduction
Chapter 2 A Guide to Statistical Techniques: Using the Book
Chapter 3 Review of Univariate and Bivariate Statistics
Chapter 4 Cleaning Up Your Act: Screening Data Prior to Analysis
Chapter 5 Multiple Regression
Chapter 6 Analysis of Covariance
Chapter 7 Multivariate Analysis of Variance and Covariance
Chapter 8 Profile Analysis: The Multivariate Approach to Repeated Measures
Chapter 9 Discriminant Analysis
Chapter 10 Logistic Regression
Chapter 11 Survival/Failure Analysis
Chapter 12 Canonical Correlation
Chapter 13 Principal Components and Factor Analysis
Chapter 14 Structural Equation Modeling
Chapter 15 Multilevel Linear Modeling
Chapter 16 Multiway Frequency Analysis
Full table of contents:
Chapter 1: Introduction. Multivariate Statistics: Why?; Some Useful Definitions; Linear Combinations of Variables; Number and Nature of Variables to Include; Statistical Power; Data Appropriate for Multivariate Statistics; Organization of the Book
Chapter 2: A Guide to Statistical Techniques: Using the Book. Research Questions and Associated Techniques; Some Further Comparisons; A Decision Tree; Technique Chapters; Preliminary Check of the Data
Chapter 3: Review of Univariate and Bivariate Statistics. Hypothesis Testing; Analysis of Variance; Parameter Estimation; Effect Size; Bivariate Statistics: Correlation and Regression; Chi-Square Analysis
Chapter 4: Cleaning Up Your Act: Screening Data Prior to Analysis. Important Issues in Data Screening; Complete Examples of Data Screening
Chapter 5: Multiple Regression. General Purpose and Description; Kinds of Research Questions; Limitations to Regression Analyses; Fundamental Equations for Multiple Regression; Major Types of Multiple Regression; Some Important Issues; Complete Examples of Regression Analysis; Comparison of Programs
Chapter 6: Analysis of Covariance. General Purpose and Description; Kinds of Research Questions; Limitations to Analysis of Covariance; Fundamental Equations for Analysis of Covariance; Some Important Issues; Complete Example of Analysis of Covariance; Comparison of Programs
Chapter 7: Multivariate Analysis of Variance and Covariance. General Purpose and Description; Kinds of Research Questions; Limitations to Multivariate Analysis of Variance and Covariance; Fundamental Equations for Multivariate Analysis of Variance and Covariance; Some Important Issues; Complete Examples of Multivariate Analysis of Variance and Covariance; Comparison of Programs
Chapter 8: Profile Analysis: The Multivariate Approach to Repeated Measures. General Purpose and Description; Kinds of Research Questions; Limitations to Profile Analysis; Fundamental Equations for Profile Analysis; Some Important Issues; Complete Examples of Profile Analysis; Comparison of Programs
Chapter 9: Discriminant Analysis. General Purpose and Description; Kinds of Research Questions; Limitations to Discriminant Analysis; Fundamental Equations for Discriminant Analysis; Types of Discriminant Analysis; Some Important Issues; Comparison of Programs
Chapter 10: Logistic Regression. General Purpose and Description; Kinds of Research Questions; Limitations to Logistic Regression Analysis; Fundamental Equations for Logistic Regression; Types of Logistic Regression; Some Important Issues; Complete Examples of Logistic Regression; Comparison of Programs
Chapter 11: Survival/Failure Analysis. General Purpose and Description; Kinds of Research Questions; Limitations to Survival Analysis; Fundamental Equations for Survival Analysis; Types of Survival Analysis; Some Important Issues; Complete Example of Survival Analysis; Comparison of Programs
Chapter 12: Canonical Correlation. General Purpose and Description; Kinds of Research Questions; Limitations; Fundamental Equations for Canonical Correlation; Some Important Issues; Complete Example of Canonical Correlation; Comparison of Programs
Chapter 13: Principal Components and Factor Analysis. General Purpose and Description; Kinds of Research Questions; Limitations; Fundamental Equations for Factor Analysis; Major Types of Factor Analysis; Some Important Issues; Complete Example of FA; Comparison of Programs
Chapter 14: Structural Equation Modeling. General Purpose and Description; Kinds of Research Questions; Limitations to Structural Equation Modeling; Fundamental Equations for Structural Equations Modeling; Some Important Issues; Complete Examples of Structural Equation Modeling Analysis; Comparison of Programs
Chapter 15: Multilevel Linear Modeling. General Purpose and Description; Kinds of Research Questions; Limitations to Multilevel Linear Modeling; Fundamental Equations; Types of MLM; Some Important Issues; Complete Example of MLM; Comparison of Programs
Chapter 16: Multiway Frequency Analysis. General Purpose and Description; Kinds of Research Questions; Limitations to Multiway Frequency Analysis; Fundamental Equations for Multiway Frequency Analysis; Some Important Issues; Complete Example of Multiway Frequency Analysis; Comparison of Programs

53,113 citations


Journal ArticleDOI
01 Oct 1983-Ecology
TL;DR: It is suggested that the common practice of imputing ecological "meaning" to the signs and magnitudes of coefficients be replaced by an assessment of "structure coefficients."
Abstract: The application of discriminant analysis in ecological investigations is discussed. The appropriate statistical assumptions for discriminant analysis are illustrated, and both classification and group separation approaches are outlined. Three assumptions that are crucial in ecological studies are discussed at length, and the consequences of their violation are developed. These assumptions are: (1) equality of dispersions, (2) identifiability of prior probabilities, and (3) precise and accurate estimation of means and dispersions. The use of discriminant functions for purposes of interpreting ecological relationships is also discussed. It is suggested that the common practice of imputing ecological "meaning" to the signs and magnitudes of coefficients be replaced by an assessment of "structure coefficients." Finally, the potential and limitations of representation of data in canonical space are considered, and some cautionary points are made concerning ecological interpretation of patterns in canonical space.
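The contrast between raw coefficients and structure coefficients can be sketched in a few lines. The example below is illustrative only: it assumes two groups and two variables, and it takes "structure coefficient" to mean the total-sample correlation between each original variable and the discriminant scores (pooled within-group correlations are another common definition). The function names and the simulated data are hypothetical, not taken from the paper; the data are built so that the second variable tends to receive a negative weight while remaining positively correlated with the scores.

```python
import math
import random

def fisher_weights(X0, X1):
    """Weights w = S^{-1}(m1 - m0) for two groups of 2-D observations,
    where S is the pooled within-group scatter matrix."""
    m0 = [sum(col) / len(X0) for col in zip(*X0)]
    m1 = [sum(col) / len(X1) for col in zip(*X1)]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Explicit 2x2 inverse applied to the mean difference
    return ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def structure_coefficients(X0, X1):
    """Return (weights, correlations of each variable with the scores)."""
    w = fisher_weights(X0, X1)
    X = X0 + X1
    scores = [w[0] * x[0] + w[1] * x[1] for x in X]
    r = [pearson([x[k] for x in X], scores) for k in range(2)]
    return w, r

def make_group(rng, g, n):
    """Hypothetical data: both variables share the error term e."""
    out = []
    for _ in range(n):
        e, noise = rng.gauss(0, 1), rng.gauss(0, 0.5)
        out.append((2 * g + e, g + e + noise))
    return out

rng = random.Random(0)
X0, X1 = make_group(rng, 0, 200), make_group(rng, 1, 200)
weights, structure = structure_coefficients(X0, X1)
print("weights:", weights)
print("structure coefficients:", structure)
```

Comparing the two printed vectors shows why reading ecological meaning from the sign of a weight alone can mislead when the variables are correlated.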

278 citations


Journal ArticleDOI
TL;DR: In this article, a nonparametric method of discriminant analysis is proposed, based on nonparametric extensions of commonly used scatter matrices, that works well even for non-Gaussian data sets; using the same framework, a procedure is proposed to test the structural similarity of two distributions.
Abstract: A nonparametric method of discriminant analysis is proposed. It is based on nonparametric extensions of commonly used scatter matrices. Two advantages result from the use of the proposed nonparametric scatter matrices. First, they are generally of full rank. This provides the ability to specify the number of extracted features desired. This is in contrast to parametric discriminant analysis, which for an L-class problem typically can determine at most L − 1 features. Second, the nonparametric nature of the scatter matrices allows the procedure to work well even for non-Gaussian data sets. Using the same basic framework, a procedure is proposed to test the structural similarity of two distributions. The procedure works in high-dimensional space. It specifies a linear decomposition of the original data space in which a relative indication of dissimilarity along each new basis vector is provided. The nonparametric scatter matrices are also used to derive a clustering procedure, which is recognized as a k-nearest neighbor version of the nonparametric valley seeking algorithm. The form which results provides a unified view of the parametric nearest mean reclassification algorithm and the nonparametric valley seeking algorithm.
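The limit on the number of features in the parametric case follows from the rank of the usual between-class scatter matrix. A short sketch in standard notation, where the symbols $P_i$ (class priors summing to one), $m_i$ (class means), $m_0$ (overall mean), $S_b$ and $S_w$ are assumptions introduced here rather than taken from the abstract:

```latex
\begin{align}
S_b &= \sum_{i=1}^{L} P_i \,(m_i - m_0)(m_i - m_0)^{\top},
\qquad m_0 = \sum_{i=1}^{L} P_i\, m_i, \\
\sum_{i=1}^{L} P_i \,(m_i - m_0) &= 0
\;\Longrightarrow\;
\operatorname{rank}(S_b) \le L - 1 .
\end{align}
```

Because the $L$ weighted deviations $m_i - m_0$ are linearly dependent, $S_w^{-1} S_b$ has at most $L - 1$ nonzero eigenvalues, so at most that many discriminant directions carry class information. A between-class matrix that is generally of full rank is not subject to this cap.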

232 citations


01 Jan 1983
TL;DR: Using the same basic framework, a procedure is proposed to test the structural similarity of two distributions, and the form which results provides a unified view of the parametric nearest mean reclassification algorithm and the nonparametric valley seeking algorithm.

223 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of failure prediction models using four alternative variable sets on firms which failed from 1966-1975, using linear discriminant analysis or logit analysis.

146 citations



Journal ArticleDOI
TL;DR: It is suggested that measurement of nuclear parameters in atypical endometrial hyperplasia may provide a more objective means of predicting the behavior of the various forms of hyperplasia.
Abstract: Nuclear parameters from 24 cases of atypical endometrial hyperplasia were determined by means of graphic tablet and microcomputer. Eight of the 24 hyperplasias progressed to carcinoma, but the remaining 16 did not progress during a mean follow-up period of 11.8 years. A linear discriminant function selected the mean and standard deviation of maximal nuclear diameter as useful predictors of clinical outcome. The linear discriminant function predicted the correct outcome in 83% of the cases. This study suggests that measurement of nuclear parameters in atypical endometrial hyperplasia may provide a more objective means of predicting the behavior of the various forms of hyperplasia.

46 citations


Journal ArticleDOI
TL;DR: In this paper, a linear discriminant model applied to the prediction of failure is found to be sensitive to departures of the input data distributions from multivariate normality, showing the effect of ignoring the statistical properties of accounting measures.
Abstract: Prediction has been a central theme in much of the accounting research and theory construction and verification over the past decade. Largely ignored in such studies has been consideration of the statistical properties of accounting measures, particularly as related to the effects of those properties on the signals from prediction models that use accounting measures as inputs. This study was designed to provide preliminary insight into the magnitude of the effects of this omission, and a bankruptcy prediction model was selected to facilitate the analysis. Results indicate that the linear discriminant model (as applied to prediction of failure) is sensitive to departures of input data distributions from multivariate normal.

44 citations


Journal ArticleDOI
TL;DR: LDA, LR, and RP performed similarly within a given population, although each used the information contained in the prognostic variables differently; applying prediction schemes based on LDA and LR across different populations was shown to be feasible, but prior validation is essential.
Abstract: We predicted 30-day mortality and survival following acute myocardial infarction in two different hospital populations utilizing several multivariate statistical methodologies [linear discriminant analysis (LDA), logistic regression (LR), recursive partitioning (RP), and nearest neighbor]. Variables used were identified as predictive univariately from the base hospital and were obtained during the first 24 h after admission. LDA, LR, and RP all performed similarly within a given population, although each used the information contained in the prognostic variables differently. Application between different populations of prediction schemes based on LDA and LR was shown to be feasible, but prior validation is essential.

43 citations


Journal ArticleDOI
TL;DR: The apparent error rates of the kernel method are found to be consistently less than those of the classical method, and when the true error rates are estimated either by applying the classifiers to independent test sets, or by the leaving-one-out method from the design sets, no significant difference is discernible between the two types of classifier.
Abstract: The results of applying classical linear discriminant analysis and kernel discriminant analysis to several real sets of multivariate binary data are presented. Classical discriminant analysis is intrinsically parametric and is usually presented as being well-suited to continuous variables; it is also well-known to be optimal when the (two) classes have normal distributions with identical covariance matrices. The kernel method, on the other hand, is nonparametric and, in the form used here, is ideally suited to binary data. The apparent error rates of the kernel method are found to be consistently less than those of the classical method. However, when the true error rates are estimated either by applying the classifiers to independent test sets, or by the leaving-one-out method from the design sets, no significant difference is discernible between the two types of classifier.
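The distinction between the apparent error rate and the leaving-one-out estimate can be sketched as follows. This is a minimal illustration for the classical linear rule only, assuming two classes, two continuous variables and equal priors; the function names and the simulated data are hypothetical and do not reproduce the binary data sets or the kernel classifier used in the paper.

```python
import random

def fit_lda(X0, X1):
    """Two-class Fisher rule for 2-D points, equal priors.
    Returns (w, c); a point x is assigned to class 1 when w.x > c."""
    m0 = [sum(col) / len(X0) for col in zip(*X0)]
    m1 = [sum(col) / len(X1) for col in zip(*X1)]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S^{-1}(m1 - m0); rescaling S by a positive constant leaves the rule unchanged
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    c = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, c

def predict(rule, x):
    w, c = rule
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0

def apparent_error(X0, X1):
    """Error rate on the same observations used to fit the rule."""
    rule = fit_lda(X0, X1)
    wrong = sum(predict(rule, x) != 0 for x in X0)
    wrong += sum(predict(rule, x) != 1 for x in X1)
    return wrong / (len(X0) + len(X1))

def loo_error(X0, X1):
    """Leaving-one-out: refit without each observation, then classify it."""
    wrong = 0
    for i, x in enumerate(X0):
        wrong += predict(fit_lda(X0[:i] + X0[i + 1:], X1), x) != 0
    for i, x in enumerate(X1):
        wrong += predict(fit_lda(X0, X1[:i] + X1[i + 1:]), x) != 1
    return wrong / (len(X0) + len(X1))

rng = random.Random(0)
A0 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(15)]
A1 = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(15)]
print("apparent:", apparent_error(A0, A1), "leave-one-out:", loo_error(A0, A1))
```

The apparent rate reuses the design set and so tends to be optimistic, which is why the abstract's comparison turns on independent test sets and the leaving-one-out method.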

35 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a procedure for estimating the binary response curve based on a model which approximates the response curve by a finely segmented piecewise constant function, which is applicable to data consisting of observations of a binary response variable and a single explanatory variable.
Abstract: The purpose of the present paper is to propose a practical procedure for the estimation of the binary response curve. The procedure is based on a model which approximates the response curve by a finely segmented piecewise constant function. To obtain a stable estimate we assume a prior distribution of the parameters of the model. The prior distribution has several parameters (hyper-parameters) which are chosen to minimize an information criterion ABIC. The procedure is applicable to data consisting of observations of a binary response variable and a single explanatory variable. The practical utility of the procedure is demonstrated by examples of applications to the dose response curve estimation, to the intensity function estimation of a point process and to the analysis of social survey data. The application of the procedure to the discriminant analysis is also briefly discussed.

Journal ArticleDOI
TL;DR: In this article, the performance of the Fisher and logistic linear and quadratic discriminant functions is compared with the optimal maximum likelihood procedure for the different data types and the theoretical misclassification probabilities of the sample discriminant function are calculated directly and used for the comparison of different procedures both in terms of bias and variation.

Journal ArticleDOI
TL;DR: The application of bootstrap sampling to the above problem has the advantage of furnishing not only estimates of the misclassification probabilities but also an estimate of the standard error of those estimates.
Abstract: Several methods have been proposed to estimate the misclassification probabilities when a linear discriminant function is used to classify an observation into one of several populations. We describe the application of bootstrap sampling to the above problem. The proposed method has the advantage of furnishing not only estimates of the misclassification probabilities but also an estimate of the standard error of those estimates. The method is illustrated by a small simulation experiment. It is then applied to three published, readily accessible data sets, which are typical of large, medium and small data sets encountered in practice.
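One simple way to bootstrap a misclassification estimate is sketched below. It is an assumption-laden illustration, not necessarily the estimator the paper uses: each replicate resamples within each class, refits a two-class linear rule, and scores it on the original observations; the mean over replicates serves as the estimate and the spread over replicates as a rough indication of its standard error. All names and the simulated data are hypothetical.

```python
import random
import statistics

def fit_lda(X0, X1):
    """Two-class Fisher rule for 2-D points, equal priors: returns (w, c)."""
    m0 = [sum(col) / len(X0) for col in zip(*X0)]
    m1 = [sum(col) / len(X1) for col in zip(*X1)]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter
    for X, m in ((X0, m0), (X1, m1)):
        for x in X:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    c = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, c

def error_rate(rule, X0, X1):
    """Fraction of observations the rule assigns to the wrong class."""
    w, c = rule
    wrong = sum(w[0] * x[0] + w[1] * x[1] > c for x in X0)
    wrong += sum(w[0] * x[0] + w[1] * x[1] <= c for x in X1)
    return wrong / (len(X0) + len(X1))

def bootstrap_error(X0, X1, n_boot=200, seed=0):
    """Return (mean error, standard deviation) over bootstrap replicates."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        B0 = [rng.choice(X0) for _ in X0]  # resample within class 0
        B1 = [rng.choice(X1) for _ in X1]  # resample within class 1
        try:
            rule = fit_lda(B0, B1)
        except ZeroDivisionError:
            continue  # degenerate resample with singular scatter; skip it
        rates.append(error_rate(rule, X0, X1))
    return statistics.mean(rates), statistics.stdev(rates)

rng = random.Random(1)
D0 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
D1 = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]
estimate, spread = bootstrap_error(D0, D1)
print("bootstrap error:", estimate, "spread:", spread)
```

Resampling within each class keeps the group sizes fixed, a design choice made here for simplicity; resampling the pooled sample is an equally plausible variant.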

Book ChapterDOI
01 Jan 1983
TL;DR: In many research fields it is possible to obtain good scientific results only after large amounts of data have been collected and analyzed; the analysis allows the researcher to detect regularities, similarities and discriminant features which may be useful to characterize different classes of objects.
Abstract: In many research fields it is possible to obtain good scientific results only after large amounts of data have been collected and analyzed; the analysis allows the researcher to detect regularities, similarities and discriminant features which may be useful to characterize different classes of objects. On the other hand, the manual examination of a large set of data is slow and error prone, so that many techniques have been proposed and are actually used to perform that analysis automatically (e.g. discriminant analysis); unfortunately, most of those techniques are based on mathematical methodologies which impose strong constraints on the kinds of data that can be analyzed.

Journal ArticleDOI
TL;DR: A class of matrix arithmetic networks is proposed for implementing the Foley-Sammon feature extraction algorithm and for generating linear discriminant vectors in pattern classification.
Abstract: In statistical methods for image processing and pattern classification, large-scale matrix computations are often performed over huge image data bases. A class of matrix arithmetic networks is proposed for implementing the Foley-Sammon feature extraction algorithm and for generating linear discriminant vectors in pattern classification. Such VLSI feature extractors and pattern classifiers are in high demand in real-time artificial intelligence applications. Performances of the proposed VLSI image analyzers are compared with conventional software approaches using a uniprocessor computer.

Book ChapterDOI
01 Jan 1983
TL;DR: In this chapter, the logit model and normal discriminant analysis are compared when the independent variables are binary; if the independent variables are normally distributed, the discriminant analysis estimator is the true maximum likelihood estimator and is asymptotically more efficient than the logit maximum likelihood estimator.
Abstract: This chapter presents a comparison of the logit model and normal discriminant analysis when the independent variables are binary. In the logit model for a dichotomous dependent variable, the parameters can be estimated either by the logit maximum likelihood estimator or by the method of normal discriminant analysis. If the independent variables are normally distributed, the discriminant analysis estimator is the true maximum likelihood estimator and, therefore, is asymptotically more efficient than the logit maximum likelihood estimator. Predictive robustness of the discriminant analysis estimator holds more for discrete explanatory variables than for continuously distributed, non-normal independent variables. For continuously distributed, non-normal independent variables, the magnitudes of the estimated coefficients, and not merely the signs of certain linear combinations of them, are required for a complete description of the classification rule. Misapplication of normal discriminant analysis to binary data should be of more concern if the object is the estimation of structural parameters rather than prediction.


Journal ArticleDOI
TL;DR: In this article, a backward elimination method of discrete variable selection is outlined, which can be used to identify a suitable, reduced location model for discriminant applications when the number of discrete variables is too large for direct use.
Abstract: One practical drawback to the use of discrimination methods based on the location model for mixtures of discrete and continuous variables is that the smoothing techniques employed, and the subsequent estimation of error rates, limit fairly severely the allowable number of discrete variables. A backward elimination method of discrete variable selection is outlined in this paper. This can be used to identify a suitable, reduced location model for discriminant applications when the number of discrete variables is too large for direct use. It can also be used more traditionally as a variable selection procedure in discriminant analysis. Some examples are given.

Journal ArticleDOI
TL;DR: In a discriminant analysis setting, support is found for all three hypotheses, and a substantial portion of the variance in the criterion variable—advertising recall—is taken into account.


Journal ArticleDOI
TL;DR: The IMIR data are used to compare the diagnostic performance of logistic discrimination with some other discriminant analysis techniques, and the authors discuss which characterizations of mixed data sets may give indications for an appropriate choice from among the alternative methods for discrimination.
Abstract: The Imminent Myocardial Infarction study Rotterdam (IMIR) concerns patients who visit their general practitioners and have complaints suspected to be of cardiac origin. The study aims to develop a protocol for diagnosing myocardial infarction without laboratory assistance. We use the IMIR data, consisting of continuous and binary variables, to compare the diagnostic performance of logistic discrimination with some other discriminant analysis techniques. We discuss which characterizations of mixed data sets may give indications for an appropriate choice from among the alternative methods for discrimination.

Journal ArticleDOI
TL;DR: In this paper, a model for mixed continuous and discrete variables is used to explore the bias in the discriminant function (DF) approach to estimation of the coefficients in the multiple logistic regression model.
Abstract: A model for mixed continuous and discrete variables suggested by Chang and Afifi (1974) and Krzanowski (1975) is used to explore the bias in the discriminant function (DF) approach to estimation of the coefficients in the multiple logistic regression model. When the data come from this mixed variable model the DF estimators of the coefficients of the continuous variables are asymptotically unbiased. The DF estimators of the intercept and of the coefficients for the discrete variables may be severely biased. The magnitude of the bias is shown to depend in a systematic way on the true value of the coefficients and the underlying probabilities of the outcome of discrete variables. The implications for analysis are discussed.
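For orientation, the DF approach can be written out in standard notation. The symbols below ($\mu_0$, $\mu_1$, $\Sigma$, $\pi_0$, $\pi_1$, $\beta_0$, $\beta$) are assumptions introduced here and are not taken from the abstract. If the predictors are multivariate normal within each group with a common covariance matrix, the posterior probability is exactly logistic:

```latex
\begin{align}
P(Y = 1 \mid x) &= \frac{1}{1 + \exp\{-(\beta_0 + \beta^{\top} x)\}}, \\
\beta &= \Sigma^{-1}(\mu_1 - \mu_0), \\
\beta_0 &= \log\frac{\pi_1}{\pi_0}
  - \tfrac{1}{2}\,(\mu_1 + \mu_0)^{\top}\,\Sigma^{-1}(\mu_1 - \mu_0).
\end{align}
```

The DF estimator substitutes the sample group means, the pooled sample covariance matrix and the sample group proportions into these expressions. When some predictors are discrete, the normality assumption behind the identities fails, which is the setting in which the abstract reports bias in the intercept and in the discrete-variable coefficients.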

Journal ArticleDOI
TL;DR: This paper describes a procedure to partition ordered variables into discrete states for the discrimination of an ecological classification, and concludes that the benthos classification is independent of oxygen concentrations.
Abstract: This paper describes a procedure to partition ordered variables into discrete states for the discrimination of an ecological classification. At each step, the best partition is that which maximizes...

Journal ArticleDOI
TL;DR: A method is described for obtaining a linear discriminant function to identify monogenic segregation in multivariate pedigree data; it finds the linear function of the variables that maximizes the likelihood of a set of pedigree data under the hypothesis of single gene segregation.
Abstract: We describe a method for obtaining a linear discriminant function to identify monogenic segregation in multivariate pedigree data. It differs from Fisher's linear discriminant function in that it does not assume that the genotype of each individual in the pedigree is already known. The method consists of finding that linear function of the variables that maximizes the likelihood of a set of pedigree data, under the hypothesis of single gene segregation, subject to the constraint that the total sample variance of the function remains constant. To simplify the computation the variables are first transformed to their standardized principal components. Reanalysis of a set of pedigree data suggests that age and powers of age should be considered as extra variables from which the principal components are obtained, and virtually all of the variance should be accounted for by the principal components used to obtain the discriminant function.

Journal ArticleDOI
TL;DR: This paper investigates the effect of serial correlation under more general conditions and finds that the asymptotic expansion of the change in the expected error rate differs from that given by Tubbs.

Journal ArticleDOI
TL;DR: A descriptive solution for the case where one can determine the response curves by linear interpolation between successive observations is proposed, which has the potential advantage of being applicable dynamically, as one observes the multivariate response curve.
Abstract: We examine the problem of discriminating between two groups in the context of multivariate response curves observed over a specified time interval. We propose a descriptive solution for the case where one can determine the response curves by linear interpolation between successive observations. Unlike most previously reported methods that use only the current multivariate observation, our approach accounts for the history of the process. Moreover the method has the potential advantage of being applicable dynamically, as one observes the multivariate response curve. Finally, the method demonstrates simplicity and flexibility, two important features for successful, routine, clinical application.

Journal ArticleDOI
TL;DR: Serious problems arise in the use of morphological characteristics for taxonomic discrimination of Fucus species, due to the extreme morphological variability of the genus, and statistical analyses of morphometric measurements have been used as an alternative method.
Abstract: Many problems arise in the use of morphological characteristics for taxonomic discrimination of Fucus species, due to the extreme morphological variability of the genus. New taxonomic techniques are required. Statistical analyses of morphometric measurements have been used as an alternative method of separating Fucus spp. These can be applied in situations where intergradation or hybridisation is believed to occur. Twelve continuous variables, representative of plant morphology and common to both F. serratus and F. vesiculosus, were selected. Discriminant analysis of these variables can separate closely related populations of F. serratus and of F. vesiculosus. Variation exists within each population. Different populations of the two species cannot be totally discriminated.

Journal ArticleDOI
TL;DR: In this article, the performance of four classification rules (Fisher's linear discrimination, logistic discrimination, quadratic discrimination and a kernel model) is investigated with respect to discriminatory ability for data consisting of a mixture of continuous and discrete variables.
Abstract: The present study investigates the performance of four classification rules with respect to discriminatory ability for data consisting of a mixture of continuous and discrete variables. The four discriminant analysis methods are Fisher's linear discrimination, logistic discrimination, quadratic discrimination and a kernel model. Four measures of performance for evaluation of the classification rules are used: the error rate, the quadratic scoring rule, the modified logarithmic scoring rule and a doubt-based scoring rule. The mixed data are obtained by generating from the four-dimensional normal distribution. Three of these variables were discretized. The results show that Fisher's linear discrimination and logistic discrimination have almost the same performance. In most of the situations model seems to be appropriate as far as discriminatory ability is concerned.

Journal ArticleDOI
TL;DR: In this paper, multi-element analyses of more than 600 panned heavy-mineral concentrate samples from the Jameson Land area of central East Greenland were investigated by discriminant analysis which, combined with an a priori knowledge of the geology, was used to assist interpretation and classification of the data.

Journal ArticleDOI
TL;DR: Information on the inherent structure of multidimensional data derived from a factor analysis procedure is equivalent to information obtained by Fisher discriminant analysis techniques, provided certain conditions, usually required in the factor analysis model, are satisfied.
Abstract: We show that information on the inherent structure of multidimensional data derived from a factor analysis procedure is equivalent to information obtained by Fisher discriminant analysis techniques, provided certain conditions, usually required in the factor analysis model, are satisfied. The results advocate the use of a factor analysis approach when Fisher discriminant analysis is not applicable, such as, for instance, in clustering problems.