Journal Article

The Mahalanobis distance

TL;DR: The Mahalanobis distance (MD), in the original and principal component (PC) space, is examined and interpreted in relation to the Euclidean distance (ED).
About: This article is published in Chemometrics and Intelligent Laboratory Systems. It was published on 2000-01-04 and has received 1802 citations to date. It focuses on the topics Mahalanobis distance and Bhattacharyya distance.
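As a minimal sketch of the quantity being compared (NumPy assumed; the mean, covariance, and points below are hypothetical), the MD of a point x from a mean mu under covariance S is sqrt((x - mu)^T S^-1 (x - mu)). When S is the identity matrix it reduces to the ED. The two points below lie at the same Euclidean distance from the mean but at different Mahalanobis distances, because the first variable has a larger variance than the second.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from mean under covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ y = diff rather than forming the explicit inverse of cov.
    y = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ y))

def euclidean(x, mean):
    """Plain Euclidean distance of x from mean."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ diff))

# Hypothetical example: variance 4 along the first variable, 1 along the second.
mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
a = np.array([2.0, 0.0])
b = np.array([0.0, 2.0])

print(euclidean(a, mean), euclidean(b, mean))              # 2.0 2.0
print(mahalanobis(a, mean, cov), mahalanobis(b, mean, cov))  # 1.0 2.0
```

Point a lies one standard deviation out along the high-variance variable and b two standard deviations out along the low-variance one, so the MD ranks b as farther even though the ED cannot tell them apart.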
Citations
Journal Article
TL;DR: Genomic selection (GS) uses all marker data as predictors of performance and consequently delivers more accurate predictions, potentially leading to more rapid and lower-cost gains from breeding for complex traits affected by many genes, each with small effect.
Abstract: We intuitively believe that the dramatic drop in the cost of DNA marker information we have experienced should have immediate benefits in accelerating the delivery of crop varieties with improved yield, quality and biotic and abiotic stress tolerance. But these traits are complex and affected by many genes, each with small effect. Traditional marker-assisted selection has been ineffective for such traits. The introduction of genomic selection (GS), however, has shifted that paradigm. Rather than seeking to identify individual loci significantly associated with a trait, GS uses all marker data as predictors of performance and consequently delivers more accurate predictions. Selection can be based on GS predictions, potentially leading to more rapid and lower cost gains from breeding. The objectives of this article are to review essential aspects of GS and summarize the important take-home messages from recent theoretical, simulation and empirical studies. We then look forward and consider research needs surrounding methodological questions and the implications of GS for long-term selection.

986 citations

Journal Article
TL;DR: This article proposes a set of multidimensional measures of cross-national distance: economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness, as well as geographic distance.
Abstract: Cross-national distance is a key concept in the field of management. Previous research has conceptualized and measured cross-national differences mostly in terms of dyadic cultural distance, and has used the Euclidean approach to measuring it. In contrast, our goal is to disaggregate the construct of distance by proposing a set of multidimensional measures, including economic, financial, political, administrative, cultural, demographic, knowledge, and global connectedness as well as geographic distance. We ground our analysis and choice of empirical dimensions on institutional theories of national business, governance, and innovation systems. In order to overcome the methodological limitations of the Euclidean approach, we calculate dyadic distances using the Mahalanobis method, which is scale-invariant and takes into consideration the variance–covariance matrix. We empirically analyze four different foreign expansion choices of US companies to illustrate the importance of disaggregating the distance construct and the usefulness of our distance calculations, which we make freely available to managers and scholars.

981 citations
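The scale-invariance claim in this abstract can be checked numerically. The sketch below (NumPy assumed; the data matrix is randomly generated and purely hypothetical, not the authors' country data) rescales one indicator by a factor of 1000, as a change of units would. The ED between two rows changes, while the MD computed with the sample variance-covariance matrix does not, because rescaling by a diagonal matrix D maps the difference d to Dd and the covariance S to DSD, and the factors of D cancel in (Dd)^T (DSD)^-1 (Dd) = d^T S^-1 d.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 units x 3 correlated indicators.
X = rng.normal(size=(50, 3)) @ np.array([[1.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 1.0]])

def pairwise_mahalanobis(X, i, j):
    """MD between rows i and j, using the sample covariance of all rows."""
    S = np.cov(X, rowvar=False)  # variance-covariance matrix of the columns
    d = X[i] - X[j]
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

def pairwise_euclidean(X, i, j):
    """ED between rows i and j."""
    return float(np.linalg.norm(X[i] - X[j]))

# Change the units of the first indicator by a factor of 1000.
D = np.diag([1000.0, 1.0, 1.0])
X_scaled = X @ D

print(pairwise_euclidean(X, 0, 1), pairwise_euclidean(X_scaled, 0, 1))      # differ
print(pairwise_mahalanobis(X, 0, 1), pairwise_mahalanobis(X_scaled, 0, 1))  # equal up to rounding
```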


Cites methods from "The Mahalanobis distance"

  • ...Scholars familiar with principal-component analysis will realize that the Mahalanobis distance is equivalent to the Euclidean distance calculated with the standardized values of the principal components (De Maesschalck et al., 2000)....
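The equivalence quoted above can be illustrated with a short sketch (NumPy assumed; the simulated data and covariance are hypothetical). Eigen-decomposing the sample covariance S = V diag(lambda) V^T, projecting the centred data onto the eigenvectors, and dividing each principal component score by sqrt(lambda) gives standardized scores whose Euclidean norm reproduces the MD from the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: 200 samples, 3 variables.
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[3.0, 1.0, 0.5],
                                 [1.0, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]],
                            size=200)

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)
Xc = X - mu

# MD of every row from the mean, computed in the original variable space.
md = np.sqrt(np.einsum("ij,ij->i", Xc, np.linalg.solve(S, Xc.T).T))

# Same quantity in PC space: project onto eigenvectors, scale by 1/sqrt(eigenvalue).
eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric, so eigh applies
scores = Xc @ eigvecs                  # principal component scores
z = scores / np.sqrt(eigvals)          # standardized scores (unit variance per PC)
ed_standardized = np.linalg.norm(z, axis=1)

print(np.allclose(md, ed_standardized))  # True
```

The equivalence holds when all PCs are retained; keeping only the first few components gives a distance in the reduced PC space instead.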


Journal Article
TL;DR: Several single-cell -omics approaches are used to define the cellular processes and pathways in the human RA joint and attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes, potentially key mediators of RA pathogenesis.
Abstract: To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+ autoimmune-associated B cells and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.

649 citations

Journal Article
TL;DR: This paper discusses the uses of the correlation coefficient r, either to infer correlation or to test linearity, and recommends using the Fisher z transformation instead of raw r values, because r is not normally distributed whereas z is (at least approximately).
Abstract: Correlation and regression are different, but not mutually exclusive, techniques. Roughly, regression is used for prediction (which does not extrapolate beyond the data used in the analysis) whereas correlation is used to determine the degree of association. There are situations in which the x variable is not fixed or readily chosen by the experimenter, but instead is a random covariate to the y variable. This paper shows the relationships between the coefficient of determination, the multiple correlation coefficient, the covariance, the correlation coefficient and the coefficient of alienation, for the case of two related variables x and y. It discusses the uses of the correlation coefficient r, either as a way to infer correlation, or to test linearity. A number of graphical examples are provided as well as examples of actual chemical applications. The paper recommends the use of z Fisher transformation instead of r values because r is not normally distributed but z is (at least in approximation). For eithe...

649 citations
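The Fisher z recommendation can be sketched with the standard library alone. Under the usual large-sample approximation, z = atanh(r) = 0.5 ln((1 + r)/(1 - r)) is roughly normal with standard error 1/sqrt(n - 3), so a confidence interval is built on the z scale and back-transformed with tanh. The helper name `fisher_z_ci` and the 95% critical value are illustrative assumptions, not taken from the paper.

```python
import math

def fisher_z_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation r from n pairs.

    Hypothetical helper: builds the interval on the Fisher z scale, where the
    sampling distribution is approximately normal, then maps it back to r.
    """
    z = math.atanh(r)             # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)   # approximate standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

lo, hi = fisher_z_ci(0.8, 30)
print(lo, hi)
```

The back-transformed interval is asymmetric around r (wider towards zero for positive r), which reflects the skewed sampling distribution of r itself.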

Journal Article
01 Nov 2003-Ecology
TL;DR: The co-inertia criterion for measuring the adequacy between two data sets is presented; the method can easily be extended to distance matrices or to more than two tables.
Abstract: Ecological studies often require studying the common structure of a pair of data tables. Co-inertia analysis is a multivariate method for coupling two tables. It is often neglected by ecologists who prefer the widely used methods of redundancy analysis and canonical correspondence analysis. We present the co-inertia criterion for measuring the adequacy between two data sets. Co-inertia analysis is based on this criterion as are canonical correspondence analysis or canonical correlation analysis, but the latter two have additional constraints. Co-inertia analysis is very flexible and allows many possibilities for coupling. Co-inertia analysis is suitable for quantitative and/or qualitative or fuzzy environmental variables. Moreover, various weighting of sites and various transformations and/or centering of species data are available for this method. Hence, more ecological considerations can be taken into account in the statistical procedures. Moreover, the principle of this method is very general and can be easily extended to the case of distance matrices or to the case of more than two tables. Simulated ecological data are used to compare the co-inertia approach with other available methods.

592 citations

References
Book
01 Jan 1987
TL;DR: The book covers regression and outlier detection: simple and multiple regression, the special case of one-dimensional location, algorithms, outlier diagnostics, and related statistical techniques.
Abstract: 1. Introduction. 2. Simple Regression. 3. Multiple Regression. 4. The Special Case of One-Dimensional Location. 5. Algorithms. 6. Outlier Diagnostics. 7. Related Statistical Techniques. References. Table of Data Sets. Index.

6,955 citations

Book
09 Jul 1993
TL;DR: The book covers errors in classical and instrumental analysis, significance tests, quality control and sampling, regression and correlation, non-parametric and robust methods, experimental design, and optimization and pattern recognition, with appendices summarizing statistical tests.
Abstract: Introduction. Errors in Classical Analysis - Statistics of Repeated Measurements. Significance Tests. Quality Control and Sampling. Errors in Instrumental Analysis. Regression and Correlation. Non-parametric and Robust Methods. Experimental Design. Optimization and Pattern Recognition. Solutions to Exercises. Appendix 1: Summary of Statistical Tests. Appendix 2: Statistical Tests.

3,834 citations

Book
13 Mar 1991
TL;DR: The book covers principal component analysis (PCA): scaling of data, inferential procedures, vector interpretation and rotation, singular value decomposition and multidimensional scaling, PCA in linear models (regression on predictor variables and analysis of variance of response variables), and the relationship to factor analysis.
Abstract: Preface. Introduction. 1. Getting Started. 2. PCA with More Than Two Variables. 3. Scaling of Data. 4. Inferential Procedures. 5. Putting It All Together - Hearing Loss I. 6. Operations with Group Data. 7. Vector Interpretation I: Simplifications and Inferential Techniques. 8. Vector Interpretation II: Rotation. 9. A Case History - Hearing Loss II. 10. Singular Value Decomposition: Multidimensional Scaling I. 11. Distance Models: Multidimensional Scaling II. 12. Linear Models I: Regression PCA of Predictor Variables. 13. Linear Models II: Analysis of Variance PCA of Response Variables. 14. Other Applications of PCA. 15. Flatland: Special Procedures for Two Dimensions. 16. Odds and Ends. 17. What is Factor Analysis Anyhow? 18. Other Competitors. Conclusion. Appendix A. Matrix Properties. Appendix B. Matrix Algebra Associated with Principal Component Analysis. Appendix C. Computational Methods. Appendix D. A Directory of Symbols and Definitions for PCA. Appendix E. Some Classic Examples. Appendix F. Data Sets Used in This Book. Appendix G. Tables. Bibliography. Author Index. Subject Index.

3,534 citations