Journal Article

The Use of Faces to Represent Points in k-Dimensional Space Graphically

01 Jun 1973, Journal of the American Statistical Association (Taylor & Francis Group), Vol. 68, Iss. 342, pp. 361-368
TL;DR: Every multivariate observation is visualized as a computer-drawn face that makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data.
Abstract: A novel method of representing multivariate data is presented. Each point in k-dimensional space, k≤18, is represented by a cartoon of a face whose features, such as length of nose and curvature of mouth, correspond to components of the point. Thus every multivariate observation is visualized as a computer-drawn face. This presentation makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data. Other graphical representations are described briefly.
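The mapping the abstract describes can be sketched as follows. This is not Chernoff's program. Each variable is rescaled to [0, 1], and component j of every point is assigned to face feature j. A separate drawing routine, not shown, would turn those settings into a cartoon. The feature names, the min-max scaling, and the function names `scale_columns` and `faces_from_points` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical feature names, for illustration only; the paper defines its own
# set of up to 18 facial parameters, and this list is not claimed to match it.
FACE_FEATURES = [
    "face_width", "ear_level", "face_half_height", "face_eccentricity",
    "ear_eccentricity", "nose_length", "mouth_position", "mouth_curvature",
    "mouth_width", "eye_position", "eye_separation", "eye_slant",
    "eye_eccentricity", "eye_size", "pupil_position", "eyebrow_position",
    "eyebrow_slant", "eyebrow_size",
]


def scale_columns(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each variable (column) to [0, 1]; constant columns map to 0.5."""
    k = len(data[0])
    lows = [min(row[j] for row in data) for j in range(k)]
    highs = [max(row[j] for row in data) for j in range(k)]
    return [
        [0.5 if highs[j] == lows[j] else (row[j] - lows[j]) / (highs[j] - lows[j])
         for j in range(k)]
        for row in data
    ]


def faces_from_points(data: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Assign component j of every k-dimensional point to face feature j (k <= 18)."""
    if not data:
        return []
    k = len(data[0])
    if any(len(row) != k for row in data):
        raise ValueError("all points must have the same dimension k")
    if k > len(FACE_FEATURES):
        raise ValueError(f"k must be <= {len(FACE_FEATURES)}, got {k}")
    return [dict(zip(FACE_FEATURES, row)) for row in scale_columns(data)]


faces = faces_from_points([[1.0, 10.0, 3.0], [2.0, 30.0, 3.0], [3.0, 20.0, 3.0]])
# faces[1] == {"face_width": 0.5, "ear_level": 1.0, "face_half_height": 0.5}
```

Which variable drives which feature is left entirely to the analyst in this sketch. That choice is the part of the method most often criticized.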


Citations
Journal Article
TL;DR: t-SNE, a new technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. A variation of Stochastic Neighbor Embedding, it is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
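The abstract attributes t-SNE's better maps to reduced crowding. In the published method, this comes from measuring map-space similarity with a heavy-tailed Student t kernel (one degree of freedom) rather than a Gaussian. The sketch below computes only that joint similarity matrix Q for a given low-dimensional map Y. It omits the high-dimensional affinities P, the perplexity calibration, and the gradient-descent optimization. The function name `student_t_affinities` is our own.

```python
from typing import List, Sequence

Point = Sequence[float]


def student_t_affinities(Y: Sequence[Point]) -> List[List[float]]:
    """Joint map-space similarities q_ij as used by t-SNE (sketch).

    q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k != l} (1 + ||y_k - y_l||^2)^-1,
    with q_ii = 0. The heavy tail of the Student t kernel lets moderately
    dissimilar points sit far apart in the map, which counters crowding.
    """
    n = len(Y)
    if n < 2:
        raise ValueError("need at least two map points")
    kernel = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-similarity is defined as zero
            sq_dist = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
            w = 1.0 / (1.0 + sq_dist)  # Student t kernel, one degree of freedom
            kernel[i][j] = w
            total += w
    # Normalize over all ordered pairs so the whole matrix sums to 1.
    return [[w / total for w in row] for row in kernel]


Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Q = student_t_affinities(Y)
```

In this toy map, the closest pair (points 0 and 1) receives the largest q. Because 1/(1 + d²) decays polynomially rather than exponentially, distant pairs keep a non-negligible share of similarity.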

30,124 citations


Cites methods from "The Use of Faces to Represent Point..."

  • ...Important techniques include iconographic displays such as Chernoff faces (Chernoff, 1973), pixel-based techniques (Keim, 2000), and techniques that represent the dimensions in the data as vertices in a graph (Battista et al., 1994)....

Book
21 Mar 2002
TL;DR: An essential textbook for any student or researcher in biology who needs to design experiments or sampling programs, or to analyse the resulting data. It covers classical and Bayesian estimation and hypothesis testing, then linear and generalized linear models. These include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot, repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced.
Abstract: An essential textbook for any student or researcher in biology needing to design experiments or sampling programs, or to analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers, and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

9,509 citations


Cites methods from "The Use of Faces to Represent Point..."

  • ...The best known method is using Chernoff faces, where different features of the face represent different variables (Chernoff 1973; see also Everitt & Dunn 1991, Flury & Riedwyl 1988). These plots have been criticized, primarily because of the difficulty of rationally assigning variables to face features (Cox 1978), but they also have their supporters (Everitt & Dunn 1991, Flury & Riedwyl 1988). We illustrate these face plots with the Wisconsin forb data from Reich et al. (1999) in Figure 15....

Journal Article
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Journal Article
TL;DR: A review of the book Applied Multivariate Statistical Analysis, published in Technometrics (2005).
Abstract: (2005). Applied Multivariate Statistical Analysis. Technometrics: Vol. 47, No. 4, p. 517.

3,932 citations


Cites methods from "The Use of Faces to Represent Point..."

  • ...The Chernoff-Flury faces, for example, provide such a condensation of high-dimensional information into a simple “face”. In fact faces are a simple way to graphically display high-dimensional data. The size of the face elements like pupils, eyes, upper and lower hair line, etc., are assigned to certain variables. The idea of using faces goes back to Chernoff (1973) and has been further developed by Bernhard Flury. We follow the design described in Flury and Riedwyl (1988) which uses the following characteristics....

Posted Content
01 Jan 2001
TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.
Abstract: The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

3,765 citations

References
Journal Article
TL;DR: In this article, a method of plotting data of more than two dimensions is proposed. Each data point, $x = (x_1, \ldots, x_k)$, is mapped into a function of the form $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$, and the function is plotted on the range $-\pi < t < \pi$.
Abstract: A method of plotting data of more than two dimensions is proposed. Each data point, $x = (x_1, \ldots, x_k)$, is mapped into a function of the form $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$, and the function is plotted on the range $-\pi < t < \pi$. Some statistical properties of the method are explored. The application of the method is illustrated with an example from anthropology.
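The function $f_x(t)$ above can be evaluated directly. The sketch below is a minimal illustration. Component $x_1$ is divided by $\sqrt{2}$. Each later pair $(x_{2m}, x_{2m+1})$ multiplies $\sin mt$ and $\cos mt$. The curve is sampled on an evenly spaced grid over $[-\pi, \pi]$. The function names and the default of 201 sample points are our own assumptions. The grid includes the endpoints purely for plotting convenience.

```python
import math
from typing import List, Sequence, Tuple


def andrews_curve(x: Sequence[float], t: float) -> float:
    """f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    if not x:
        raise ValueError("x must have at least one component")
    value = x[0] / math.sqrt(2.0)
    for idx in range(1, len(x)):      # x[idx] is component x_{idx+1}
        harmonic = (idx + 1) // 2     # x2, x3 -> 1; x4, x5 -> 2; ...
        if idx % 2 == 1:
            value += x[idx] * math.sin(harmonic * t)  # x2, x4, ... pair with sine
        else:
            value += x[idx] * math.cos(harmonic * t)  # x3, x5, ... pair with cosine
    return value


def sample_curve(x: Sequence[float], num: int = 201) -> List[Tuple[float, float]]:
    """Evaluate f_x at num evenly spaced t values from -pi to pi (endpoints included)."""
    if num < 2:
        raise ValueError("num must be at least 2")
    step = 2.0 * math.pi / (num - 1)
    return [(-math.pi + i * step, andrews_curve(x, -math.pi + i * step)) for i in range(num)]
```

For a data set, one would compute `sample_curve(row)` for each observation and draw every result as a line over $t$. Observations that are close in $k$-dimensional space then produce curves that stay close together.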

708 citations

Journal Article
TL;DR: The Editor felt that the following article by Dr. Edgar E. Anderson, which appeared in the Proceedings of the National Academy of Sciences, would be of interest to the readers of Technometrics.
Abstract: Recognizing associations between large numbers of variables is a problem encountered in all the sciences. For this reason the Editor felt that the following article by Dr. Edgar E. Anderson, which appeared in the Proceedings of the National Academy of Sciences, Vol. 43, pp. 923–927, 1957, would be of interest to the readers of Technometrics. The article is republished with the kind permission of Dr. Anderson and of Dr. Wendell M. Stanley, the Editor of the Proceedings of the National Academy of Sciences.

116 citations

Journal Article
TL;DR: In this paper, eighty-eight specimens of Eocene nummulitids from the Yellow Limestone Formation of northwestern Jamaica are classified according to quantitative measurements of morphologic parameters that are generally considered to be taxonomically useful.
Abstract: Eighty-eight specimens of Eocene nummulitids from the Yellow Limestone Formation of northwestern Jamaica are classified according to quantitative measurements of morphologic parameters that are generally considered to be taxonomically useful. The specimens are grouped into homogeneous classes by the computer screening of differently oriented data projections. By this method, the use of similarity coefficients and the question of a priori weighting of characters, for which numerical taxonomy has been heavily criticized, are both avoided. The stability of the classes thus obtained is validated by discriminant analysis. These techniques provide an objective view of phenetic differences among specimens and show how the measured characters produce those differences. Tightness of coiling and total number of whorls prove to be the most useful features in discriminating between groups but seem to have taxonomic value only at the specific and not at the generic level. This suggests that the genera Operculinoides and Nummulites are synonymous.

25 citations