Showing papers on "Cluster analysis" published in 1974

Book•

Brian Everitt, Sabine Landau, Morven Leese

01 Jan 1974

TL;DR: This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering.

Abstract: Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organising multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques are applicable in a wide range of areas such as medicine, psychology and market research. This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering. Real life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.

9,857 citations

Journal Article•DOI•

A Projection Pursuit Algorithm for Exploratory Data Analysis

Jerome H. Friedman¹, John W. Tukey•Institutions (1)

Stanford University¹

01 Sep 1974-IEEE Transactions on Computers

TL;DR: An algorithm for the analysis of multivariate data is presented and discussed in terms of specific examples; it seeks one- and two-dimensional linear projections of the data that are relatively highly revealing.

Abstract: An algorithm for the analysis of multivariate data is presented and is discussed in terms of specific examples. The algorithm seeks to find one- and two-dimensional linear projections of multivariate data that are relatively highly revealing.

1,635 citations

Journal Article•DOI•

Pattern Classification and Scene Analysis

Richard O. Duda¹, Peter E. Hart•Institutions (1)

Queen Mary University of London¹

01 May 1974

TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

1,222 citations

Journal Article•DOI•

Some Applications of Graph Theory to Clustering.

Lawrence Hubert¹•Institutions (1)

University of Wisconsin-Madison¹

01 Sep 1974-Psychometrika

TL;DR: Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning.

Abstract: This paper attempts to review and expand upon the relationship between graph theory and the clustering of a set of objects. Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning; these same ideas are then extended to include the more general problem of constructing subsets of objects with overlap. Finally, a number of related topics are surveyed within the general context of reinterpreting and justifying methods of clustering either through standard concepts in graph theory or their simple extensions.

151 citations

Journal Article•DOI•

Some Recent Investigations of a New Fuzzy Partitioning Algorithm and its Application to Pattern Classification Problems

J. C. Dunn

01 Jan 1974

TL;DR: A fuzzy partitioning algorithm has potential value as a heuristic tool for identifying clusters within large finite data sets, and more specifically, for estimating the parameters in a mixture of unimodal probability densities, given a finite sample drawn from the mixture.

Abstract: Recent results pertaining to a newly developed fuzzy partitioning algorithm are surveyed. The algorithm has potential value as a heuristic tool for identifying clusters within large finite data sets, and more specifically, for estimating the parameters in a mixture of unimodal probability densities, given a finite sample drawn from the mixture. The topics treated are: fuzzy partitions, conventional clustering algorithms, fuzzy clustering algorithms, asymptotic behavior of optimal fuzzy partitions with increasing cluster separation, scalar measures of partition fuzziness, and unsupervised learning and parameter estimation.
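
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a fuzzy partition and a scalar fuzziness measure can look like. It assumes a fuzzy c-means style alternating update on one-dimensional data with fuzzifier `m = 2`, and it uses the partition coefficient (mean of squared memberships) as one possible scalar measure. The function names, the fuzzifier, the data, and the starting centers are all illustrative assumptions, not taken from the paper.

```python
def memberships_for(points, centers, m):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for x in points:
        dist = [abs(x - c) for c in centers]
        if min(dist) == 0.0:
            # The point sits exactly on one or more centers: share it among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dist]
            rows.append([h / sum(hits) for h in hits])
        else:
            # Closer centers receive larger membership (assumed c-means rule).
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        # Each center becomes the mean of the points weighted by membership**m.
        centers = [
            sum(row[k] ** m * x for row, x in zip(u, points))
            / sum(row[k] ** m for row in u)
            for k in range(len(centers))
        ]
    return memberships_for(points, centers, m), centers


def partition_coefficient(u):
    """Mean squared membership: 1 for a crisp partition, 1/c when fully fuzzy."""
    return sum(v * v for row in u for v in row) / len(u)


# Hypothetical data: two well-separated groups on a line.
points = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]
u, centers = fuzzy_c_means(points, [1.0, 8.0])
print(centers, partition_coefficient(u))
```

With well-separated groups the memberships come out nearly crisp, which is the regime the abstract describes as increasing cluster separation.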

126 citations

Journal Article•DOI•

Approximate Evaluation Techniques for the Single-Link and Complete-Link Hierarchical Clustering Procedures

Lawrence Hubert

01 Sep 1974-Journal of the American Statistical Association

TL;DR: A technique is presented for testing the hypothesis that a hierarchical sequence of partitions constructed by the single-link or complete-link clustering method could have been obtained because of “noise” by referring the Goodman-Kruskal rank correlation γ statistic to an approximate permutation distribution.

Abstract: A technique is presented for testing the hypothesis that a hierarchical sequence of partitions constructed by the single-link or complete-link clustering method could have been obtained because of “noise.” Two rank orderings of the object pairs are compared. One of the orderings is obtained from the initial proximity values; the second is derived from the levels at which an object pair first appears within a single subset within the hierarchy. The hypothesis that the given set of proximity values have been assigned randomly is tested by referring the Goodman-Kruskal rank correlation γ statistic to an approximate permutation distribution.
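
The comparison of the two rank orderings can be illustrated with a small sketch. It computes the Goodman-Kruskal γ as (concordant − discordant) / (concordant + discordant) over all pairs of object pairs, ignoring ties. The Monte Carlo shuffle below is only a stand-in for the paper's approximate permutation distribution, whose exact form the abstract does not give. The function names and the four-object example are hypothetical.

```python
import random
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Gamma between two score lists over the same items; tied pairs are ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


def permutation_p_value(x, y, trials=2000, seed=0):
    """Assumed Monte Carlo check: how often shuffled proximities match gamma."""
    rng = random.Random(seed)
    observed = goodman_kruskal_gamma(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if goodman_kruskal_gamma(shuffled, y) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical example with objects A, B, C, D; pairs in the order
# AB, AC, AD, BC, BD, CD. `levels` holds the single-link merge level of each pair.
proximities = [1, 5, 6, 4, 7, 2]
levels = [1, 4, 4, 4, 4, 2]
gamma = goodman_kruskal_gamma(proximities, levels)
print(gamma, permutation_p_value(proximities, levels))
```

Pairs that merge at the same level are tied in the second ordering, so they drop out of the count. That is why γ, which ignores ties, suits this comparison.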

122 citations

Journal Article•DOI•

Technical Note-Clustering a Data Array and the Traveling-Salesman Problem

J. K. Lenstra

01 Apr 1974-Operations Research

TL;DR: A clustering of a nonnegative M×N array is obtained by permuting its rows and columns; the resulting clustering problem can be stated as two traveling-salesman problems.

Abstract: A clustering of a nonnegative M×N array is obtained by permuting its rows and columns. W. T. McCormick et al. [Opns. Res. 20, 993-1009 (1972)] measure the effectiveness of a clustering by the sum of all products of nearest-neighbor elements in the permuted array. This note points out that this clustering problem can be stated as two traveling-salesman problems.
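
A short sketch can show why the problem separates. The measure is taken here to be the sum of products of horizontally and vertically adjacent entries, which is one reading of "nearest-neighbor elements". Under that reading the vertical part depends only on the row order and the horizontal part only on the column order, so each can be optimized as its own maximum-weight path problem. The function names and the brute-force search are illustrative assumptions; they stand in for a real traveling-salesman solver.

```python
from itertools import permutations


def measure(a, rows, cols):
    """Sum of products of horizontally and vertically adjacent entries."""
    total = 0
    for r in range(len(rows)):
        for c in range(len(cols)):
            x = a[rows[r]][cols[c]]
            if c + 1 < len(cols):
                total += x * a[rows[r]][cols[c + 1]]
            if r + 1 < len(rows):
                total += x * a[rows[r + 1]][cols[c]]
    return total


def row_weights(a):
    """Weight between two rows = their inner product (unchanged by column order)."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)]
            for i in range(n)]


def col_weights(a):
    """Same as row_weights, applied to the transposed array."""
    return row_weights([list(col) for col in zip(*a)])


def path_weight(order, w):
    """Total weight of consecutive neighbors along an ordering."""
    return sum(w[order[k]][order[k + 1]] for k in range(len(order) - 1))


def best_path(w):
    """Brute-force maximum-weight ordering; only feasible for tiny arrays."""
    return max(permutations(range(len(w))), key=lambda p: path_weight(p, w))


# Hypothetical 3x4 array whose natural order scatters the nonzero entries.
a = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
rows = best_path(row_weights(a))
cols = best_path(col_weights(a))
best = measure(a, rows, cols)
print(rows, cols, best)
```

Each ordering problem is an open path rather than a closed tour. The usual way to turn it into a traveling-salesman instance is to add a dummy row or column at zero weight from every other one, which is assumed here rather than quoted from the note.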

88 citations

Journal Article•DOI•

More on measures of category clustering in free recall-although probably not the last word.

Robert Frender¹, Peter Doubilet•Institutions (1)

Harvard University¹

01 Jan 1974-Psychological Bulletin

48 citations

Journal Article•DOI•

Error intervals and cluster density in channel modeling (Corresp.)

J.-P. Adoul

01 Jan 1974-IEEE Transactions on Information Theory

TL;DR: It is demonstrated that cluster density can be analytically described from the distributions of intervals between errors, and that these relations and the derived clustering properties hold for any stationary process.

Abstract: This correspondence is concerned with binary processes and presents results with immediate applications in the modeling of digital channels for the purpose of evaluating code performance. It is demonstrated that cluster density can be analytically described from the distributions of intervals between errors. These relations and derived clustering properties hold for any stationary process. Analyses of real error data exemplify the use of these results in regard to channels having dependent inter-error intervals.

44 citations

Journal Article•DOI•

Further Experiments with Hierarchic Clustering in Document Retrieval.

C.J. van Rijsbergen¹•Institutions (1)

Monash University, Clayton campus¹

01 Jan 1974-Information Storage and Retrieval

TL;DR: A framework for the evaluation of cluster-based retrieval strategies is constructed and these strategies are shown to be dependent on the method of cluster representation (cluster profile) adopted.

38 citations

Proceedings Article•

The Dynamic Clusters Method in Pattern Recognition.

Edwin Diday, A. Schroeder, Y. Ok

01 Jan 1974

Journal Article•DOI•

The Atom Probe and Markov Chain Statistics of Clustering

Charles A. Johnson, Jerome H. Klotz

01 Nov 1974-Technometrics

TL;DR: In this article, a Markov chain generalization of the binomial model is proposed to investigate clustering of like atoms within an alloy of two types of metallic elements, and a combinatorial analysis is developed which provides the exact distribution of the sufficient statistics and permits small sample comparisons of expectations and mean square errors of estimates with their large sample approximations.

Abstract: The field ion microscope atom probe, used in the exploration of crystal structure, is discussed. Data is generated from a probe of an alloy of two types of metallic elements. A statistical model is formulated to investigate clustering of like atoms within the alloy. Physical considerations motivate a Markov chain generalization of the binomial model. A parameter in the model which measures the degree of clustering is estimated by maximum likelihood and large sample distribution theory is given using results from Billingsley. A combinatorial analysis is developed which provides the exact distribution of the sufficient statistics and permits small sample comparisons of expectations and mean square errors of estimates with their large sample approximations. The model is applied to some preliminary data and is used to point up and quantify some difficulties in the experiment.

Journal Article•DOI•

Spanning trees and aspects of clustering

Lawrence Hubert¹•Institutions (1)

University of Wisconsin-Madison¹

01 May 1974-British Journal of Mathematical and Statistical Psychology

TL;DR: Most of the paper is devoted to stating relationships between spanning trees, single-link and complete-link hierarchical clustering, network flow and two divisive clustering procedures.

Abstract: The concept of a spanning tree for a weighted graph is used to characterize several methods of clustering a set of objects. In particular, most of the paper is devoted to stating relationships between spanning trees, single-link and complete-link hierarchical clustering, network flow and two divisive clustering procedures. Several related topics using the notion of a spanning tree are also mentioned.

Journal Article•DOI•

On the Validity of the Grouping Method —Comments on "Analysis of the Clustering Process of Supersaturated Lattice Vacancies"—

Masahiro Koiwa¹•Institutions (1)

Tohoku University¹

15 Dec 1974-Journal of the Physical Society of Japan

TL;DR: In this article, it was shown that, at least for the present model problem, the grouping method is not valid, and that it seems difficult to justify such a procedure mathematically.

Abstract: The validity of the “grouping method” has been examined critically; the method was contrived by Kiritani in treating an enormously large number of simultaneous differential equations which represent the clustering process of quenched-in vacancies. The method is applied to a model problem which can be solved rigorously. Size distributions of clusters at the completion of a reaction are derived by the exact and the grouping methods; when calculated with the latter method, the distribution becomes broader and the position of the maximum shifts to lower sizes. Thus, it is concluded that, at least for the present model problem, the grouping method is not valid. By the grouping method one attempts to obtain an average of many differential equations. It seems difficult to justify such a procedure mathematically.

Journal Article•DOI•

A Clustering Algorithm Based on User Queries.

C. T. Yu¹•Institutions (1)

University of Alberta¹

01 Jul 1974-Journal of the Association for Information Science and Technology

TL;DR: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented and experimental results indicate that the proposed method is superior to the other methods.

Abstract: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented. It is compared to Bonner's Method, Rocchio's Method, Dattola's Method and the Single Link Method in three different aspects, namely system effectiveness, system efficiency and the time required for clustering. Experimental results using the Cranfield 424 collection indicate that the proposed method is superior to the other methods.

Journal Article•DOI•

Cluster Analysis Based on Dimensional Information with Applications to Feature Selection and Classification

Daryl J. Eigen¹, Frederick R. Fromm¹, Richard A. Northouse¹•Institutions (1)

University of Wisconsin–Milwaukee¹

01 May 1974

TL;DR: A new clustering algorithm based on dimensional information is presented; it includes an inherent feature selection criterion and is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

Abstract: A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

Journal Article•DOI•

A Model for Continuous Clustering in the Large-Scale Distribution of Matter

P. J. E. Peebles¹,²•Institutions (2)

Princeton University¹, University of California, Berkeley²

01 Dec 1974-Astrophysics and Space Science

TL;DR: In this article, an attempt to find an acceptable model for clustering consistent with the picture of a continuous hierarchy is discussed, prompted by the question of whether superclusters are entities distinguishable in some natural and fundamental way from clusters, or from groups, or even from individual galaxies.

Abstract: A new view of the nature of the large-scale distribution of matter is suggested by the fact that the covariance function for the distribution varies smoothly, like a power law, over a wide range of separations. This leads one to ask whether superclusters are entities distinguishable in some natural and fundamental way from clusters, or from groups, or even from individual galaxies. I discuss here an attempt to find an acceptable model for clustering consistent with the picture of a continuous hierarchy.

Journal Article•DOI•

Remarks on the root-clustering of a polynomial in a certain region in the complex plane

E. I. Jury, S. M. Ahn

01 Jan 1974-Quarterly of Applied Mathematics

Journal Article•DOI•

Optimization in non-hierarchical clustering

Edwin Diday¹•Institutions (1)

University of Paris-Sud¹

01 Jun 1974-Pattern Recognition

TL;DR: The main aim of this paper is a synthetic study of optimality properties in spaces formed by partitions of a finite set; it formalizes, and takes as a model for that study, a family of particularly efficient techniques of the “cluster centers” type.

Journal Article•DOI•

Multiplicity Distribution under the Clustering Assumption in High Energy Hadron Collisions

Naomichi Suzuki¹•Institutions (1)

Waseda University¹

01 May 1974-Progress of Theoretical Physics

Journal Article•DOI•

Clustering patterns in high-energy hadron collisions

T. Ludlam¹, R. Slansky¹, J. Slaughter¹, F.T. Dao², J. Lach², E. Malamud², J. Schivell², R. Poster³, R. Engelmann⁴, T. Kafka⁴, M. Pratap⁴•Institutions (4)

United States Atomic Energy Commission¹, Universities Research Association², National Science Foundation³, State University of New York System⁴

04 Mar 1974-Physics Letters B

TL;DR: In this article, the authors analyzed the event-to-event fluctuations of the rapidity distributions for semi-inclusive data obtained from 205 GeV/c and 303 GeV/c proton-proton collisions observed at NAL.

Journal Article•DOI•

Clustering in multiple production

Alexander Wu Chao¹, Chris Quigg¹•Institutions (1)

State University of New York System¹

01 Apr 1974-Physical Review D

TL;DR: In this paper, the authors argue that the existence of clusters is strongly suggested by a number of correlations which have recently been observed in high-energy experiments, and call attention to strategies for studying the properties of produced clusters.

Abstract: The notion of clustering in many-particle production is defined and applied to experimental situations. We argue that the existence of clusters is strongly suggested by a number of correlations which have recently been observed in high-energy experiments, and call attention to strategies for studying the properties of produced clusters.

Journal Article•DOI•

Self-Organizing Probability State Variable Parameter Search Algorithms for Systems that Must Avoid High-Penalty Operating Regions

Anthony N. Mucciardi

01 Jul 1974

TL;DR: Experimental results are presented to demonstrate the utility of the self-organizing PSV search algorithms and clustering analysis, which has great promise as a means of assessing the complexity of an optimization problem.

Abstract: Self-organizing probability state variable (PSV) parameter search algorithms possessing long-term memory have been formulated to cope with systems that must avoid high performance-penalty operating regions. The information gained from all previous experiments is efficiently encoded in multivariate probability distribution functions (pdf's). This long-term memory capability enables the PSV algorithms to avoid effectively future experiments in high penalty regions. The systems considered are resource-limited, and catastrophic failure may occur if parameter values lying in high penalty regions are implemented. Those cases in which the high penalty regions are not known in advance were investigated. The PSV algorithms have the capability of adaptively learning the location and hypervolume of these regions as the search proceeds. The algorithms are explicitly guided in their internal strategies as a function of the remaining system resources and the updated probability distribution functions. Clustering analysis is used both in the discovery of new operating regions and for updating the pdf's. As a by-product of this research, clustering was also investigated as a presearch scheme. It is shown that this procedure has great promise as a means of assessing the complexity of an optimization problem. Experimental results are presented to demonstrate the utility of the self-organizing PSV search algorithms.

Journal Article•DOI•

A cluster analysis based on graph theory

H. Van Groenewoud, P. Ihm¹•Institutions (1)

University of Marburg¹

17 May 1974-Plant Ecology

TL;DR: In this paper, a clustering method is presented that groups sample plots (stands or other units) together, based on their proximity in a multidimensional test space in which the axes represent the attributes (species) of the individuals (sample plots, etc.).

Abstract: A clustering method is presented that groups sample plots (stands or other units) together, based on their proximity in a multidimensional test space in which the axes represent the attributes (species) of the individuals (sample plots, etc.). The resulting dendrogram is used to make subjective judgements on the type and distinctiveness of the groupings.

Journal Article•DOI•

The best-match problem in document retrieval

C.J. Van Rijsbergen¹•Institutions (1)

Monash University¹

01 Nov 1974-Communications of The ACM

TL;DR: The aim of this paper is to draw attention to work going on in document retrieval which parallels and in some cases is in advance of the work in data retrieval, aimed at reducing the number of comparisons needed to achieve the desired result.

Abstract: In a recent paper Burkhard and Keller [1] discuss the best-match problem, the problem "of searching the set of keys in a file to find a key which is closest to a given query." Taking my cue from their paper, I present some work which I have done on the same problem in a related field. The aim of this paper is to draw attention to work going on in document retrieval which parallels and in some cases is in advance of the work in data retrieval. In both cases retrieval is based on a file structure imposed on the information, whether keys or documents, aimed at reducing the number of comparisons needed to achieve the desired result. In the case of keys, Burkhard and Keller recommend for their more sophisticated file structure (they recommend several simpler ones) a minimal cover of cliques C such that (1) every key is in at least one element of C, and (2) for no other smaller set C' does (1) hold. Unfortunately finding C requires the generation of (almost) all cliques on the set of keys. It is well known that the computation time to generate all cliques can be excessive. The only known bound on this time is so high, order O(k^n) for n keys, that it amounts to no bound at all. The number of cliques in a graph can increase dramatically with the number of nodes in the graph. This in itself has been found to be a hurdle in applications to document retrieval (see e.g. Minker, et al. [6]). So, for applications in document retrieval, where the number of documents to be clustered may be of the order of hundreds of thousands, clique generation is just too slow. Nevertheless, related clustering approaches have been attempted by Salton [8], Litofsky [5], Crouch [2], and Van Rijsbergen [10], who have called in the techniques of cluster analysis to classify the documents so that the search time may be reduced. In both data and document retrieval search time is reduced by selection of a good clique or cluster representative. Burkhard and Keller proposed a method for selecting clique representatives. In document retrieval, one of the cluster representatives selected for use on heuristic grounds in [4] has proved to have an interesting theoretical basis and I describe it here.
Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Department of Information Science, Monash University, Clayton, Victoria, 3168, Australia.

Journal Article•DOI•

Effects of clustering on the magnetic properties of transition metal alloys

D. Sherrington¹, K. Mihill¹•Institutions (1)

Imperial College London¹

01 May 1974-Le Journal De Physique Colloques

TL;DR: In this article, a simple theoretical framework is presented for examining the effects of clustering on local moment formation and magnetic ordering in metallic alloys in which the solute is a transition metal ferromagnetic in the bulk, but for which an isolated atom in the solvent does not necessarily carry a magnetic moment (in a Hartree-Fock sense).

Abstract: A simple theoretical framework is presented for examining the effects of clustering on local moment formation and magnetic ordering in metallic alloys in which the solute is a transition metal ferromagnetic in the bulk, but for which an isolated atom in the solvent does not necessarily carry a magnetic moment (in a Hartree-Fock sense). Consequences of the model are discussed and compared qualitatively with experiment. Some transition metals exhibit spontaneous long range magnetic order, e.g. Ni, Co, Fe. These we shall call type A. Those which have no such spontaneous order, e.g. Pd, Rh, Pt, we shall call type B. Simple metals, such as Zn, Cu, Al, we shall refer to as type C. We are interested in investigating theoretically the magnetic properties of BA and CA alloys at finite concentrations. We are particularly concerned with the effects of statistical (or metallurgical) clustering in producing “local moments” in alloys for which an isolated A atom (in B or C) is non-magnetic in a Friedel-Anderson or Hartree sense. We restrict our discussion to simple models and simple but non-trivial approximations. As a model for a BA alloy, we take the d-electron Hamiltonian where in general the parameters t, V, U depend upon whether the atoms at i and j are A or B type. The problem may be simplified whilst retaining its essential features by assuming $t_{ij}$ independent of the type of atoms, by taking the constituents to have the same number of d-electrons per atom, by treating the local electron number in Hartree approximation, and by assuming $(V_i + U_i \ldots / 2) = \text{constant}$, due to sp charge transfer. We then obtain the simple model d-electron Hamiltonian $\mathcal{H} = \sum_{ij\sigma} t_{ij}\, a^{+}_{i\sigma} a_{j\sigma} - \sum_{i} \ldots\, \mathbf{S}_i \cdot \mathbf{S}_i$

Journal Article•DOI•

Graphs Implied by the Jardine-Sibson Overlapping Clustering Methods, Bk

F. James Rohlf

01 Sep 1974-Journal of the American Statistical Association

TL;DR: The properties of the graphs whose edges correspond to the dissimilarities left invariant by the Jardine-Sibson Bk clustering method are examined and algorithms are given for the determination of Bk clusters.

Abstract: A cluster analysis can be interpreted as a function which maps the input dissimilarity matrix into an output dissimilarity matrix whose elements indicate the dissimilarity between pairs of objects. Some cluster analyses leave invariant the dissimilarities between certain pairs of objects. The set of elements left invariant by the single-linkage clustering method corresponds to the edges in the minimum spanning tree. The properties of the graphs whose edges correspond to the dissimilarities left invariant by the Jardine-Sibson Bk clustering method are examined and algorithms are given for the determination of Bk clusters.
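
The single-linkage statement can be checked on a small example. The sketch below builds a minimum spanning tree with Prim's method and computes the single-linkage output dissimilarity as the smallest possible "largest step" along any path between two objects. It then lists the pairs whose dissimilarity is unchanged. The function names and the 4×4 matrix are hypothetical, and the Bk methods themselves are not implemented here.

```python
def minimum_spanning_tree(d):
    """Prim's method on a full dissimilarity matrix; returns a list of edges."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        _, i, j = min((d[i][j], i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        edges.append((i, j))
        in_tree.add(j)
    return edges


def single_link_ultrametric(d):
    """Output dissimilarities of single linkage (minimax path distances)."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the larger of the two legs.
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Hypothetical dissimilarities among four objects, all values distinct.
d = [[0, 1, 5, 6],
     [1, 0, 4, 7],
     [5, 4, 0, 2],
     [6, 7, 2, 0]]
tree = {tuple(sorted(e)) for e in minimum_spanning_tree(d)}
u = single_link_ultrametric(d)
invariant = {(i, j) for i in range(4) for j in range(i + 1, 4)
             if u[i][j] == d[i][j]}
print(tree, invariant)
```

With distinct dissimilarities the two sets coincide. With ties the minimum spanning tree is not unique and further pairs can also be left unchanged, so the equality check is a property of this example rather than a general test.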

Journal Article•DOI•

Neural tube defects in a country town: Confirmation of clustering within a particularly small area

M. J. Aylett, C. J. Roberts, S. Lloyd

01 Aug 1974-Journal of Epidemiology and Community Health

TL;DR: Clustering was confirmed, a critical distance between cases of up to 100 metres giving a highly significant result (P = 0·001), and with one exception the observed number of pairs significantly exceeds the expected number even up to 1,000 metres.

Abstract: Eighteen infants with neural tube defects occurring in 979 births over five years in a small Wiltshire town were investigated for evidence of spatial epidemicity. Applying a method not used previously in the study of these defects, clustering was confirmed, a critical distance between cases of up to 100 metres giving a highly significant result (P = 0·001), and with one exception the observed number of pairs significantly exceeds the expected number (P < 0·01) even up to 1,000 metres.

An automated and repeatable data analysis procedure for remote sensing applications

B. J. Davis, P. H. Swain¹•Institutions (1)

Purdue University¹

01 Jan 1974

TL;DR: A new multispectral data analysis procedure, based on LARSYS, has been developed which substantially reduces the influence of the analyst.

Abstract: A new multispectral data analysis procedure, based on LARSYS, has been developed which substantially reduces the influence of the analyst. The analysis is automated, including the interpretation of clustering results. The classification results obtained are repeatable and not biased by analyst subjectivity during the analysis.

Journal Article•DOI•

An application of cluster analysis to the construction of a diagnostic classification

B.S. Duran¹, T.O. Lewis¹•Institutions (1)

Texas Tech University¹

01 Dec 1974-Computers in Biology and Medicine

TL;DR: A similarity coefficient involving two measurements is defined, and various clustering procedures are studied using it; emphasis is placed on applying the cluster analysis procedures to the construction of a diagnostic classification.
