Journal Article

Adaptive Hierarchical Clustering Schemes

F. James Rohlf
Systematic Zoology, Vol. 19, Iss. 1, pp. 58-82 (01 Mar 1970)
Abstract
Rohlf, F. J. (Biological Sciences, State Univ., Stony Brook, N. Y. 11790) 1970. Adaptive hierarchical clustering schemes. Syst. Zool., 19:58-82. Various methods of summarizing phenetic relationships are briefly reviewed (including a comparison of principal components analysis and non-metric scaling). Sequential agglomerative hierarchical clustering schemes are considered in particular detail, and several new methods are proposed. The new algorithms are characterized by their ability to "adapt" to the possible trends of variation found within clusters as they are being formed. A nonlinear version allows the isolation and description of clusters which are parabolic, ring-shaped, etc., by the introduction of appropriate dummy variables. Procedures for computing the best-fitting trend line through the cluster are also presented, and problems in measuring the amount of information lost by clustering are discussed. [Phenetics; cluster analysis; numerical taxonomy.]

This paper is concerned with a brief review of some of the techniques of summarizing phenetic similarities that have been proposed for use in numerical taxonomy. One class of methods (sequential agglomerative) is considered in detail, and new procedures which allow for elongated and curvilinear clusters are proposed.

The "taxonomy problem" in biology can be described as follows: Given a set of specimens ("operational taxonomic units" or OTU's, Sokal and Sneath, 1963, which may represent taxa of any rank) known only by a list of their properties or characters, we wish to find the "best" way of describing their often complex patterns of mutual similarities (phenetic relationships). Such relationships do not necessarily imply evolutionary (cladistic) relationships (for a discussion of these approaches, see Sokal and Camin, 1965).
The methods that have been developed appear to have a more general application than just in biological taxonomy, but there are certain facts and assumptions that can be made in biology which influence our choice of methods. As a result, the techniques may or may not be completely valid in other fields. Some of the considerations which influence the development of cluster analyses in biological taxonomy are the following: (1) "All things being equal" we would hope that a system of nested clusters would be found. This is due to the fact that evolution is believed usually to be a divergent process and the distribution of OTU's in a phenetic space should to some extent reflect this. There are, of course, exceptions to this overall rule which are very important, such as those provided by hybridization and clinal variation. (2) Another consideration is the nature of the character set representing each OTU. We would like to use a "random sampling of characters" or at least a "representative" sampling of characters. But since different sets of characters seem to yield slightly different systems of relationships (Rohlf, 1963; Ehrlich and Ehrlich, 1967; Michener and Sokal, 1966), biologists may have to get used to the idea of using different classifications, based upon different sets of characters, each best for its own special purpose, with overall similarities based on the total character set available at any one time. (3) The selection of OTU's is also not random. Since we cannot study all organisms, we must select those which are of immediate interest. But even with a specified group of organisms, we usually cannot sample at random. This is so because the distributions of recent (and even fossil) organisms are clumped in a phenetic hyperspace. One needs to pass up many very similar, common specimens to obtain a more interesting sampling of different kinds of organisms. 
Thus, a preliminary screening of individuals according to their apparent similarities must be made before one can make detailed measurements to analyze their phenetic relationships quantitatively.
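The abstract's "best-fitting trend line" and "dummy variables" can be illustrated on a cluster scored for two characters. This is a minimal sketch under stated assumptions, not Rohlf's own procedure: we assume the trend line is the major (first principal) axis of the cluster's covariance matrix, so that the smaller eigenvalue is the mean squared perpendicular distance of the OTUs from the line, and we assume a hypothetical dummy variable u = x² for a parabolic cluster. The function name `major_axis` is ours.

```python
import math


def major_axis(points):
    """Fit the major (first principal) axis through a 2-D cluster.

    Assumption: the "trend line" is the principal axis of the population
    (divide-by-n) covariance matrix. Returns the centroid, a unit direction
    vector, and the residual variance, i.e. the smaller eigenvalue, which
    equals the mean squared perpendicular distance to the line.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    residual = half_trace - spread  # smaller eigenvalue of the 2x2 matrix
    return (mx, my), (math.cos(theta), math.sin(theta)), residual


# A parabolic "cluster": OTUs lying exactly along y = x^2.
xs = [i / 10 for i in range(-10, 11)]
parabola = [(x, x * x) for x in xs]

_, _, r_raw = major_axis(parabola)
# Hypothetical dummy variable u = x^2 takes the place of x.
_, _, r_dummy = major_axis([(x * x, y) for x, y in parabola])

print(r_raw > 0.01, abs(r_dummy) < 1e-9)  # prints: True True
```

A straight line leaves a clear residual about the curved cluster, while in terms of u the same OTUs fall on a line. Scaling of the characters and the choice of dummy variables would matter for real data; the sketch does not address either.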


Citations
Journal Article

Use of two-block partial least-squares to study covariation in shape.

TL;DR: The relatively new two-block partial least-squares method for analyzing the covariance between two sets of variables is described and contrasted with the well-known method of canonical correlation analysis.
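Two-block partial least-squares is usually computed from the singular value decomposition of the cross-covariance matrix between the two blocks: the first pair of singular vectors gives the unit-length linear combinations, one per block, whose scores have maximal covariance. The sketch below finds only that first pair, by power iteration in plain Python to stay dependency-free; the function names are hypothetical, and in practice one would call a library SVD.

```python
import math


def cross_covariance(X, Y):
    """p x q matrix of sample covariances between the columns of X and Y.

    X is an n x p block and Y an n x q block, both given as lists of rows
    measured on the same n specimens.
    """
    n = len(X)
    mx = [sum(col) / n for col in zip(*X)]
    my = [sum(col) / n for col in zip(*Y)]
    return [
        [sum((X[k][i] - mx[i]) * (Y[k][j] - my[j]) for k in range(n)) / (n - 1)
         for j in range(len(my))]
        for i in range(len(mx))
    ]


def first_pls_pair(R, iters=200):
    """Leading singular triple (u, s, v) of R via power iteration on R R^T.

    Assumes the leading singular value is distinct and that the all-ones
    start vector is not orthogonal to the leading left singular vector.
    """
    p, q = len(R), len(R[0])
    u = [1.0] * p
    for _ in range(iters):
        v = [sum(R[i][j] * u[i] for i in range(p)) for j in range(q)]  # R^T u
        u = [sum(R[i][j] * v[j] for j in range(q)) for i in range(p)]  # R v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    v = [sum(R[i][j] * u[i] for i in range(p)) for j in range(q)]
    s = math.sqrt(sum(x * x for x in v))  # covariance of the paired scores
    return u, s, [x / s for x in v]


# Block X: two characters (the second is constant); block Y: one character.
X = [[1, 5], [2, 5], [3, 5]]
Y = [[2], [4], [6]]
print(first_pls_pair(cross_covariance(X, Y)))  # prints: ([1.0, 0.0], 2.0, [1.0])
```

Later pairs would come from deflating R or from a full SVD; the sketch stops at the first.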
Book

Mathematical Classification and Clustering

Boris Mirkin
References
Book

An Introduction to Multivariate Statistical Analysis

TL;DR: Covers the distribution of the mean vector and the covariance matrix, the generalized T² statistic, and tests of independence of sets of variates.
Journal Article

On the shortest spanning subtree of a graph and the traveling salesman problem

TL;DR: Shows how to construct the shortest spanning subtree of a graph by repeatedly adding the shortest remaining edge that does not form a cycle, and relates this problem to the traveling salesman problem.
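Kruskal's construction (repeatedly accept the shortest remaining edge that does not close a cycle) is short to sketch with a union-find structure. The function name and the `(weight, u, v)` edge format are our own choices. Its relevance to clustering is that the minimum spanning tree of a set of OTUs carries the same information as single-linkage clustering: its edge lengths are the single-linkage fusion levels.

```python
def kruskal_mst(n, edges):
    """Return the edges of a minimum spanning tree (or forest).

    n: number of vertices, labelled 0..n-1.
    edges: iterable of (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different fragments, so no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


# Hypothetical distances among four OTUs.
d = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]
edges = [(d[i][j], i, j) for i in range(4) for j in range(i + 1, 4)]
print(kruskal_mst(4, edges))  # prints: [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```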
Journal Article

Hierarchical clustering schemes

TL;DR: A useful correspondence is developed between any hierarchical system of such clusters, and a particular type of distance measure, that gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data.
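Johnson's two methods are what are now usually called single-linkage and complete-linkage clustering, and the distance measure in his correspondence is an ultrametric. The sketch below is a naive sequential agglomerative procedure, not Rohlf's adaptive variant: it recomputes cluster distances from scratch at every step (roughly O(n³) overall), records the cophenetic matrix (the level at which two OTUs first join), and checks that this matrix is ultrametric. Function names are ours.

```python
from itertools import combinations


def sahn(d, linkage=min):
    """Sequential agglomerative hierarchical clustering of an n x n matrix.

    linkage=min gives single linkage and linkage=max complete linkage.
    Returns (merges, coph): merges lists (level, members_a, members_b) and
    coph is the cophenetic matrix (level at which i and j first join).
    """
    n = len(d)
    clusters = [frozenset([i]) for i in range(n)]
    coph = [[0] * n for _ in range(n)]
    merges = []

    def between(a, b):
        # Cluster-to-cluster distance: min or max over all cross pairs.
        return linkage(d[i][j] for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        level = between(a, b)
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = level
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((level, sorted(a), sorted(b)))
    return merges, coph


def is_ultrametric(m):
    """Check m[i][j] <= max(m[i][k], m[k][j]) for all i, j, k."""
    n = len(m)
    return all(m[i][j] <= max(m[i][k], m[k][j])
               for i in range(n) for j in range(n) for k in range(n))


# Hypothetical distances among four OTUs.
d = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]

for method in (min, max):
    merges, coph = sahn(d, method)
    print(method.__name__, [m[0] for m in merges], is_ultrametric(coph))
# prints:
# min [1, 2, 3] True
# max [1, 3, 6] True

# Squaring is monotone on non-negative distances: same tree, squared levels.
sq = [[x * x for x in row] for row in d]
print([m[0] for m in sahn(sq)[0]])  # prints: [1, 4, 9]
```

Because both linkages use only the rank order of the distances, any monotone transformation of the data leaves the sequence of fusions unchanged, which is the invariance the summary mentions.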
Journal Article

Shortest connection networks and some generalizations

TL;DR: The basic problem of interconnecting a given set of terminals with a shortest possible network of direct links is considered, and simple, practical procedures are given for solving it both graphically and computationally.
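For the dense distance matrices of numerical taxonomy, the shortest connection network can be grown as a single tree in an O(n²) loop over the matrix, always adding the nearest OTU not yet connected. A minimal sketch of that procedure, assuming a complete symmetric matrix; the function name is ours.

```python
import math


def prim_mst(d):
    """Grow a shortest connection network on a dense symmetric matrix d.

    Starts from vertex 0 and, at each step, links the closest vertex not
    yet in the tree. Returns (weight, tree_vertex, new_vertex) links.
    """
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from the tree to each vertex
    link = [-1] * n        # tree vertex at the other end of that link
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the closest vertex not yet in the tree.
        v = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[v] = True
        if link[v] >= 0:
            edges.append((d[link[v]][v], link[v], v))
        # Update the shortest links through the newly added vertex.
        for u in range(n):
            if not in_tree[u] and d[v][u] < best[u]:
                best[u] = d[v][u]
                link[u] = v
    return edges


# Hypothetical distances among four OTUs.
d = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]
print(prim_mst(d))  # prints: [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```

The O(n²) bound comes from scanning the `best` array once per added vertex, which needs no sorting of the n(n - 1)/2 distances.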