The Use of Faces to Represent Points in k-Dimensional Space Graphically

doi:10.1080/01621459.1973.10482434

Journal Article•DOI•

Herman Chernoff¹•Institutions (1)

Stanford University¹

01 Jun 1973-Journal of the American Statistical Association (Taylor & Francis Group)-Vol. 68, Iss: 342, pp 361-368

TL;DR: Every multivariate observation is visualized as a computer-drawn face that makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data.

Abstract: A novel method of representing multivariate data is presented. Each point in k-dimensional space, k≤18, is represented by a cartoon of a face whose features, such as length of nose and curvature of mouth, correspond to components of the point. Thus every multivariate observation is visualized as a computer-drawn face. This presentation makes it easy for the human mind to grasp many of the essential regularities and irregularities present in the data. Other graphical representations are described briefly.
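As a rough illustration of the idea in the abstract, the sketch below maps each component of an observation to one facial feature and renders the result as an SVG string. The feature names, their numeric ranges, the min-max scaling, and the drawing geometry are all assumptions made for this example; they are not the parameters or the drawing procedure defined in the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical features as (name, min, max); illustrative only,
# not the feature set used in the paper.
FEATURES = [
    ("face_width", 30.0, 45.0),
    ("face_height", 40.0, 55.0),
    ("eye_size", 2.0, 7.0),
    ("nose_length", 5.0, 20.0),
    ("mouth_curvature", -12.0, 12.0),
]


def rescale(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each column to [0, 1]; constant columns map to 0.5."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [0.5 if hi == lo else (v - lo) / (hi - lo)
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def face_params(unit_row: Sequence[float]) -> Dict[str, float]:
    """Assign the i-th scaled component to the i-th feature.

    Features without a matching component stay at the midpoint of their range.
    """
    if len(unit_row) > len(FEATURES):
        raise ValueError("more components than available features")
    params = {name: (lo + hi) / 2 for name, lo, hi in FEATURES}
    for value, (name, lo, hi) in zip(unit_row, FEATURES):
        params[name] = lo + value * (hi - lo)
    return params


def face_svg(p: Dict[str, float]) -> str:
    """Draw a very simple face (outline, two eyes, nose, mouth) as SVG."""
    cx, cy = 50.0, 60.0
    eye_y = cy - 0.3 * p["face_height"]
    mouth_y = cy + 0.5 * p["face_height"]
    # SVG y grows downward, so a positive curvature bends the mouth into a smile.
    ctrl_y = mouth_y + p["mouth_curvature"]
    half_nose = p["nose_length"] / 2
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 120">',
        f'<ellipse cx="{cx}" cy="{cy}" rx="{p["face_width"]}" '
        f'ry="{p["face_height"]}" fill="none" stroke="black"/>',
        f'<circle cx="{cx - 12}" cy="{eye_y}" r="{p["eye_size"]}"/>',
        f'<circle cx="{cx + 12}" cy="{eye_y}" r="{p["eye_size"]}"/>',
        f'<line x1="{cx}" y1="{cy - half_nose}" x2="{cx}" '
        f'y2="{cy + half_nose}" stroke="black"/>',
        f'<path d="M {cx - 10} {mouth_y} Q {cx} {ctrl_y} {cx + 10} {mouth_y}" '
        'fill="none" stroke="black"/>',
        '</svg>',
    ]
    return "".join(parts)


if __name__ == "__main__":
    observations = [[1, 10, 100], [2, 20, 300], [3, 30, 200]]
    for row in rescale(observations):
        print(face_svg(face_params(row)))
```

Scaling each variable to a common range first keeps any single variable from dominating the drawing. Which variable is assigned to which feature remains a free choice in this sketch.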

Citations

Journal Article•

Visualizing Data using t-SNE

Laurens van der Maaten, Geoffrey E. Hinton

01 Jan 2008-Journal of Machine Learning Research

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

30,124 citations

Cites methods from "The Use of Faces to Represent Point..."

...Important techniques include iconographic displays such as Chernoff faces (Chernoff, 1973), pixel-based techniques (Keim, 2000), and techniques that represent the dimensions in the data as vertices in a graph (Battista et al., 1994)....

Book•

Experimental Design and Data Analysis for Biologists

Gerry P. Quinn¹, Michael J. Keough²•Institutions (2)

Monash University¹, University of Melbourne²

21 Mar 2002

TL;DR: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. It covers estimation and hypothesis testing under both classical and Bayesian philosophies, linear and generalized linear models, and multivariate techniques including classification and ordination.

Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

9,509 citations

Cites methods from "The Use of Faces to Represent Point..."

...The best known method is using Chernoff faces, where different features of the face represent different variables (Chernoff 1973; see also Everitt & Dunn 1991, Flury & Riedwyl 1988). These plots have been criticized, primarily because of the difficulty of rationally assigning variables to face features (Cox 1978), but they also have their supporters (Everitt & Dunn 1991, Flury & Riedwyl 1988). We illustrate these face plots with the Wisconsin forb data from Reich et al. (1999) in Figure 15....

Journal Article•DOI•

Statistical pattern recognition: a review

Anil K. Jain¹, Robert P. W. Duin², Jianchang Mao³•Institutions (3)

Michigan State University¹, Delft University of Technology², IBM³

01 Jan 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Journal Article•DOI•

Applied Multivariate Statistical Analysis

Charles E. Heckler

01 Nov 2005-Technometrics

TL;DR: This chapter discusses the development of the Spatial Point Pattern Analysis Code in S–PLUS, which was developed in 1993 by P. J. Diggle and D. C. Griffith.

Abstract: (2005). Applied Multivariate Statistical Analysis. Technometrics: Vol. 47, No. 4, pp. 517-517.

3,932 citations

Cites methods from "The Use of Faces to Represent Point..."

...The Chernoff-Flury faces, for example, provide such a condensation of high-dimensional information into a simple “face”. In fact faces are a simple way to graphically display high-dimensional data. The size of the face elements like pupils, eyes, upper and lower hair line, etc., are assigned to certain variables. The idea of using faces goes back to Chernoff (1973) and has been further developed by Bernhard Flury. We follow the design described in Flury and Riedwyl (1988) which uses the following characteristics....

Posted Content•

Principles of data mining

David J. Hand, Heikki Mannila, Padhraic Smyth

01 Jan 2001

TL;DR: An interdisciplinary text on data mining that blends information science, computer science, and statistics, covering the foundations, the construction of data mining algorithms, and their application to real-world problems.

Abstract: The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

3,765 citations

References

Journal Article•DOI•

Plots of high-dimensional data

David F. Andrews

01 Mar 1972-Biometrics

TL;DR: In this article, a method of plotting data of more than two dimensions is proposed, where each data point, $x = (x_1, \ldots, x_k)$, is mapped into a function of the form $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$, and the function is plotted on the range $-\pi < t < \pi$.

Abstract: SUMMARY A method of plotting data of more than two dimensions is proposed. Each data point, $x = (x_1, \ldots, x_k)$, is mapped into a function of the form $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$, and the function is plotted on the range $-\pi < t < \pi$. Some statistical properties of the method are explored. The application of the method is illustrated with an example from anthropology.
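The mapping described in this abstract can be sketched in a few lines. The function below evaluates the trigonometric series for one data point: the first component is divided by the square root of 2, and the remaining components alternate between sine and cosine terms of increasing frequency. The function names and the sampling grid are choices made for this example, and the grid includes the endpoints of the interval for convenience.

```python
import math
from typing import List, Sequence, Tuple


def andrews_curve(x: Sequence[float], t: float) -> float:
    """Evaluate x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2                      # 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if i % 2 == 1 else math.cos  # sin, cos, sin, cos, ...
        total += xi * trig(harmonic * t)
    return total


def sample_curve(x: Sequence[float], n: int = 101) -> Tuple[List[float], List[float]]:
    """Sample the curve of one data point on an even grid from -pi to pi."""
    ts = [-math.pi + 2 * math.pi * j / (n - 1) for j in range(n)]
    return ts, [andrews_curve(x, t) for t in ts]


if __name__ == "__main__":
    ts, ys = sample_curve([1.0, 0.5, -0.3, 0.2, 0.1])
    for t, y in zip(ts[::25], ys[::25]):
        print(f"t = {t:+.3f}  f(t) = {y:+.4f}")
```

Each observation becomes one curve, so a set of observations can be compared by overlaying their curves on a single plot. Because the map from a point to its curve is linear, the curve of a sum of two points is the sum of their curves.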

708 citations

Journal Article•DOI•

A Semigraphical Method for the Analysis of Complex Problems

Edgar Anderson

01 Aug 1960-Technometrics

TL;DR: The Editor felt that the following article by Dr. Edgar E. Anderson, which appeared in the Proceedings of the National Academy of Sciences, would be of interest to the readers of Technometrics.

Abstract: Recognizing associations between large numbers of variables is a problem encountered in all the sciences. For this reason the Editor felt that the following article by Dr. Edgar E. Anderson, which appeared in the Proceedings of the National Academy of Sciences, Vol. 13, pp. 923–27, 1957, would be of interest to the readers of Technometrics. The article is republished with the kind permission of Dr. Anderson and of Dr. Wendell M. Stanley, the Editor of the Proceedings of the National Academy of Sciences.

116 citations

Journal Article•DOI•

Pattern and process in the evolution of human septic shock.

John H. Siegel¹,², Roger M. Goldwyn¹,², Herman P. Friedman¹,²•Institutions (2)

Systems Research Institute¹, Albert Einstein College of Medicine²

01 Aug 1971-Surgery

79 citations

Journal Article•DOI•

Numerical classification applied to certain Jamaican eocene nummulitids

Raymond M. Wright¹, Paul Switzer¹•Institutions (1)

Stanford University¹

01 Sep 1971-Mathematical Geosciences

TL;DR: In this paper, eighty-eight specimens of Eocene nummulitids from the Yellow Limestone Formation of northwestern Jamaica are classified according to quantitative measurements of morphologic parameters that are generally considered to be taxonomically useful.

Abstract: Eighty-eight specimens of Eocene nummulitids from the Yellow Limestone Formation of northwestern Jamaica are classified according to quantitative measurements of morphologic parameters that are generally considered to be taxonomically useful. The specimens are grouped into homogeneous classes by the computer screening of differently oriented data projections. By this method, the use of similarity coefficients and the question of a priori weighting of characters, for which numerical taxonomy has been heavily criticized, are both avoided. The stability of the classes thus obtained is validated by discriminant analysis. These techniques provide an objective view of phenetic differences among specimens and show how the measured characters produce those differences. Tightness of coiling and total number of whorls prove to be the most useful features in discriminating between groups but seem to have taxonomic value only at the specific and not at the generic level. This suggests that the genera Operculinoides and Nummulites are synonymous.

25 citations