Showing papers on "Cluster analysis" published in 1977

Open Access

Book•

Methods for Statistical Data Analysis of Multivariate Observations

[...]

01 Jan 1977

TL;DR: This book covers reduction of dimensionality, the development and study of multivariate dependencies, multidimensional classification and clustering, and the assessment of specific aspects of multivariate statistical models.

Abstract: Reduction of Dimensionality. Development and Study of Multivariate Dependencies. Multidimensional Classification and Clustering. Assessment of Specific Aspects of Multivariate Statistical Models. Summarization and Exposure. References. Appendix. Indexes.

1,059 citations

Journal Article•DOI•

Additive similarity trees

[...]

Shmuel Sattath¹, Amos Tversky¹•Institutions (1)

Hebrew University of Jerusalem¹

01 Sep 1977-Psychometrika

TL;DR: A computer program, ADDTREE, for the construction of additive trees is described and applied to several sets of data, and some empirical and theoretical advantages of tree representations over spatial representations of proximity data are illustrated.

Abstract: Similarity data can be represented by additive trees. In this model, objects are represented by the external nodes of a tree, and the dissimilarity between objects is the length of the path joining them. The additive tree is less restrictive than the ultrametric tree, commonly known as the hierarchical clustering scheme. The two representations are characterized and compared. A computer program, ADDTREE, for the construction of additive trees is described and applied to several sets of data. A comparison of these results to the results of multidimensional scaling illustrates some empirical and theoretical advantages of tree representations over spatial representations of proximity data.
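The contrast between additive and ultrametric trees drawn in this abstract can be illustrated with a minimal sketch. It is not the ADDTREE program: it only computes path-length dissimilarities on a small hypothetical tree and checks the standard four-point condition for additivity and the three-point condition for ultrametricity. The function names, the example tree, and the tolerance are all assumptions made for illustration.

```python
from itertools import combinations


def path_lengths(edges, leaves):
    """Path length between every pair of leaves in a weighted tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for src in leaves:
        seen = {src: 0.0}
        stack = [src]
        while stack:  # depth-first walk; a tree has exactly one path per pair
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for dst in leaves:
            dist[src, dst] = seen[dst]
    return dist


def is_additive(dist, leaves, tol=1e-9):
    """Four-point condition: the two largest pairwise sums are equal."""
    for a, b, c, d in combinations(leaves, 4):
        sums = sorted([dist[a, b] + dist[c, d],
                       dist[a, c] + dist[b, d],
                       dist[a, d] + dist[b, c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


def is_ultrametric(dist, leaves, tol=1e-9):
    """Three-point condition: the two largest distances in a triple are equal."""
    for a, b, c in combinations(leaves, 3):
        d = sorted([dist[a, b], dist[a, c], dist[b, c]])
        if abs(d[2] - d[1]) > tol:
            return False
    return True


# Hypothetical tree: leaves A-D are the objects, x and y are internal nodes.
EDGES = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 3.0),
         ("C", "y", 1.0), ("D", "y", 4.0)]
LEAVES = ["A", "B", "C", "D"]
DIST = path_lengths(EDGES, LEAVES)
```

On this example the path-length dissimilarities satisfy the additive condition but not the ultrametric one, which is the sense in which the additive tree is the less restrictive model.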

594 citations

Journal Article•DOI•

Storage and access in relational data bases

[...]

M. W. Blasgen, K. P. Eswaran

01 Dec 1977-IBM Systems Journal

TL;DR: Four techniques for evaluating relational queries are compared by the cost of accessing secondary storage. Results indicate that physical clustering of logically adjacent items is a critical performance parameter; in the absence of such clustering, methods that depend on sorting the records themselves seem to be the algorithm of choice.

Abstract: A model of storage and access to a relational data base is presented. Using this model, four techniques for evaluating a general relational query that involves the operations of projection, restriction, and join are compared on the basis of cost of accessing secondary storage. The techniques are compared numerically and analytically for various values of important parameters. Results indicate that physical clustering of logically adjacent items is a critical performance parameter. In the absence of such clustering, methods that depend on sorting the records themselves seem to be the algorithm of choice.

259 citations

Book Chapter•DOI•

The Relationship between Multidimensional Scaling and Clustering

[...]

Joseph B. Kruskal¹•Institutions (1)

Bell Labs¹

01 Jan 1977

TL;DR: This chapter describes the relationship between clustering and multidimensional scaling, and describes some applications of clustering to astronomy that are not well known in the clustering field.

Abstract: Publisher Summary This chapter describes the relationship between the clustering and multidimensional scaling. The clustering and multidimensional scaling are both methods for analyzing data. To some extent, they are in competition with one another. However, the clustering and multidimensional scaling stand in a strongly complementary relationship. They can be used together in several ways, and these joint uses are often desirable. The chapter describes some applications of clustering to astronomy that are not famous in the field of clustering. Many of the basic concepts of clustering belong to the biological inheritance of humans and many other animals. The concept of similarity is built into the human nervous system. There are three main types of data used in clustering: (1) multivariate data, (2) proximity data, and (3) clustering data. The multivariate data gives the values of several variables for several individuals. The proximity data consist of proximities among objects of the same kind, either proximities among individuals, or proximities among variables, or proximities among stimuli, or proximities among objects of any single cohesive type.

125 citations

Book Chapter•DOI•

Distribution Problems in Clustering

[...]

J. A. Hartigan¹•Institutions (1)

Yale University¹

01 Jan 1977

TL;DR: This chapter discusses the statistical problem of deciding which of the many clusters presented by algorithms are real; a reasonable significance testing procedure requires the asymptotic theory to be validated by Monte Carlo experiments.

Abstract: Publisher Summary The very large growth in clustering techniques and applications is not yet supported by development of statistical theory by which the clustering results are evaluated. A number of branches of statistics are relevant to clustering, namely, discriminant analysis, eigenvector analysis, analysis of variance, multiple comparisons, density estimation, contingency tables, piecewise fitting, and regression. These are all areas where the techniques are used in evaluating clusters or where clustering operations occur. This chapter discusses a statistical problem that is encountered in deciding which of the many clusters presented by algorithms are real. There is no easy generally applicable definition of real. A data cluster is real if it corresponds to one of the population clusters. The mixture techniques, k-means, single linkage, complete linkage and other common algorithms are examined to give measures of the reality of their clusters. A reasonable significance testing procedure requires the asymptotic theory to be validated by Monte Carlo experiments.

104 citations

Journal Article•DOI•

Cluster Analysis Using Seed Points and Density-Determined Hyperspheres as an Aid to Global Optimization

[...]

Aimo A. Torn

01 Aug 1977

TL;DR: It is demonstrated that there exist classes of global optimization problems for which the probability of obtaining a solution is greater for the proposed model than for multiple local optimizations.

Abstract: A model for finding the local optima of a multimodal function defined in a region A ⊂ Rⁿ is proposed. The method uses a local optimizer which is started from a number of points sampled in A. In order to reduce the number of function evaluations needed to reach the local optima, the parallel local search processes are stopped repeatedly, the working points clustered, and a reduced number of processes from each cluster resumed. A direct nonhierarchical cluster analysis technique is presented. The dissimilarity measure used is the Euclidean distance between points. Clusters are grown from seed points. The number of required distance evaluations is less than or equal to c(n-1), where n is the number of points and c is the number of clusters arrived at. Thresholds are determined by the point density in a body which in turn is determined by the given points. The covariance matrix is diagonalized, and a decision on the dimensionality of the space containing the points can be made. The volume of the body is proportional to the square root of the product of the corresponding eigenvalues. The performance of the clustering analysis technique is illustrated. It is demonstrated that there exist classes of global optimization problems for which the probability of obtaining a solution is greater for the proposed model than for multiple local optimizations. Some experiences gained from using the model are reported.

98 citations

Journal Article•DOI•

The Psychological Structure of Leisure: Activities, Needs, People

[...]

Manuel London, Rick Crandall, Dale E. Fitzgibbons

01 Sep 1977-Journal of Leisure Research

TL;DR: This article presents a technique for clustering leisure activities that takes into consideration individual differences in the perceived needs that the activities satisfy.

Abstract: The present study demonstrates a technique for clustering leisure activities which takes into consideration individual differences in the perceived needs that the activities satisfy. This extends p...

92 citations

Book Chapter•DOI•

Graph Theoretic Techniques for Cluster Analysis Algorithms

[...]

David W. Matula¹•Institutions (1)

Southern Methodist University¹

01 Jan 1977

TL;DR: The most desirable cluster analysis models for substantive applications should have the input proximity data expressible in a manner faithfully representing only the reliable information content of the empirically measured data.

Abstract: Publisher Summary The output of a cluster analysis method is a collection of subsets of the object set termed clusters characterized in some manner by relative internal coherence and/or external isolation, along with a natural stratification of these identified clusters by levels of cohesive intensity. In formalizing a model of the cluster analysis methods, it is essential to consider the nature and inherent reliability of the proximity data that constitutes the input in substantive clustering applications. The proximity value scales are dichotomous. It is the practice of most authors of cluster methods to assume that the proximity values are available in the form of a real symmetric matrix, where any unjustified structure implicit in the real values is either to be ignored or axiomatically disallowed. The most desirable cluster analysis models for substantive applications should have the input proximity data expressible in a manner faithfully representing only the reliable information content of the empirically measured data.

85 citations

Journal Article•DOI•

A Clustering Procedure for Syntactic Patterns

[...]

King-Sun Fu, Shin-Yee Lu

01 Oct 1977

TL;DR: A distance between syntactic patterns is defined in terms of error transformations; a nearest neighbor recognition rule and a clustering procedure based on this distance are described, and a character recognition experiment is given as an illustration.

Abstract: A distance between two syntactic patterns is defined in terms of error transformations. This definition is extended to the case of distance measures between one syntactic pattern and a group of syntactic patterns. A nearest neighbor recognition rule for syntactic patterns using the proposed distance is then given. Using the proposed distance as a similarity measure, a clustering procedure for syntactic patterns is described. A character recognition experiment is given as an illustrative example.

82 citations

Journal Article•DOI•

Network Design: An Algorithm for the Access Facility Location Problem

[...]

P. McGregor, D. Shen

01 Jan 1977-IEEE Transactions on Communications

TL;DR: This paper considers a topological design aspect of the access problem, formulated as the locating of generic access facilities (GAF's) to obtain an economic connection of nodes (users) to a resource connection point (RESCOP).

Abstract: In any network where a large number of widely dispersed "users" share a limited number of "resources," the strategy for access will play a large part in determining the cost and performance of the network. In this paper we consider a topological design aspect of the access problem. In particular, we consider the problem of locating "access facilities," or concentration points, to obtain an economic connection of users to resources. The problem is formulated as the locating of generic access facilities (GAF's) to obtain an economic connection of nodes (users) to a resource connection point (RESCOP). The nodes may be connected through multipoint lines, but with a constraint on the number of nodes which may share a single line. The GAF's are constrained in capacity, expressed as the number of nodes they can support, and have a cost associated with them. The basic solution technique presented is a heuristic algorithm characterized by the following four steps. 1) Simplify the problem to a point-to-point problem by replacing clusters of nodes by single "center-of-mass" (COM) nodes. 2) Partition the reduced set of COM nodes by applying an Add algorithm, resulting in one of the COM nodes selected as a GAF site. 3) Select one of the original nodes as a real GAF site in each partition by examining the original nodes closest to the COM node selected in the Add algorithm, and selecting the best. 4) Apply a line-layout algorithm to each partition, with its selected GAF site serving as the central node.

79 citations

Journal Article•DOI•

Modification of Poisson statistics: modeling defects induced by diffusion

[...]

O. Paz, T.R. Lawson

01 Oct 1977-IEEE Journal of Solid-state Circuits

TL;DR: By visually mapping anodically decorated transistors, the authors found that in highly defective sites, emitter-collector shorts (pipes) tend to collect in clusters of totally defective areas.

Abstract: This paper examines a model of LSI device failure and the departure from Poisson statistics that it necessitates. By visually mapping anodically decorated transistors, the authors found that in highly defective sites, emitter-collector shorts (pipes) tend to collect in clusters of totally defective areas. Less defective sites have a nearly random distribution of defects, though some limited clustering may still exist. In general, a slightly curved relationship is obtained when the logarithm of actual yield is plotted versus area. However, for a small enough area, such as a single chip, one can make a linear approximation and use it to estimate the fraction of the area that is totally defective, and the defect density. The paper describes an analytical method of modeling device failures, and of projecting yields for areas larger than the data base from which the parameters of the yield equation were estimated.
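One way to read the linear approximation mentioned in this abstract is sketched below. It is an assumed form, not an equation quoted from the paper: Y is the yield for area A, f is the fraction of the area that is totally defective, and D is the defect density in the remaining area. All three symbols are introduced here for illustration.

```latex
Y(A) \approx (1 - f)\, e^{-D A}
\quad\Longrightarrow\quad
\ln Y(A) \approx \ln(1 - f) - D A
```

Under this assumption, a straight-line fit of ln Y against A gives f from the intercept and D from the slope, and setting f = 0 recovers the plain Poisson yield Y = e^{-DA}.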

Journal Article•DOI•

Clustering large files of documents using the single-link method

[...]

W. Bruce Croft¹•Institutions (1)

University of Cambridge¹

01 Nov 1977-Journal of the Association for Information Science and Technology

TL;DR: A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.

Abstract: A method for clustering large files of documents using a clustering algorithm which takes O(n²) operations (single-link) is proposed. This method is tested on a file of 11,613 documents derived from an operational system. One property of the generated cluster hierarchy (hierarchy connection percentage) is examined and it indicates that the hierarchy is similar to those from other test collections. A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.
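To make the quadratic operation count concrete, here is a minimal sketch of single-link clustering computed through a minimum spanning tree with Prim's algorithm, which needs on the order of n² distance evaluations. This is a well-known route to the single-link hierarchy and is not claimed to be the implementation used in the paper. The function names, the toy one-dimensional points, and the thresholds are hypothetical.

```python
def single_link_merges(points, dist):
    """Return the n-1 single-link merge edges (distance, i, j), sorted by
    distance, found with Prim's algorithm in O(n^2) distance evaluations."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known link from the tree to each point
    link = [-1] * n             # tree point that realises that link
    in_tree[0] = True
    for j in range(1, n):
        best[j] = dist(points[0], points[j])
        link[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the outside point that is closest to the current tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        edges.append((best[j], link[j], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = dist(points[j], points[k])
                if d < best[k]:
                    best[k] = d
                    link[k] = j
    return sorted(edges)


def clusters_at(n, edges, threshold):
    """Cut the hierarchy: join points linked by edges no longer than threshold."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, i, j in edges:
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical one-dimensional "documents" with absolute difference as distance.
POINTS = [0.0, 1.0, 1.5, 10.0, 11.0]
EDGES = single_link_merges(POINTS, lambda a, b: abs(a - b))
```

Calling `clusters_at(len(POINTS), EDGES, 2.0)` groups the first three points and the last two; raising the threshold to 9.0 merges everything into one cluster.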

Journal Article•DOI•

Journal clustering using a bibliographic coupling method

[...]

Henry Small, Michael E. D. Koenig

01 Jan 1977-Information Processing and Management

TL;DR: An algorithm is described that classifies journals using the single-link clustering technique and a novel application of bibliographic coupling: two-step coupling linkages are used rather than the usual one-step linkages.

Abstract: The classification of journal titles into fields or specialties is a problem of practical importance in library and information science. An algorithm is described which accomplishes such a classification using the single-link clustering technique and a novel application of the method of bibliographic coupling. The novelty consists in the use of two-step bibliographic coupling linkages, rather than the usual one-step linkages. This modification of the similarity measure leads to a marked improvement in the performance of single-link clustering in the formation of field or specialty clusters of journals. Results of an experiment using this algorithm are reported which grouped 890 journals into 168 clusters. This scope is an improvement of nearly an order of magnitude over previous journal clustering experiments. The results are evaluated by comparison with an independently derived manual classification of the same journal set. The generally good agreement indicates that this method of journal clustering will have significant practical utility for journal classification.

Journal Article•DOI•

Multi-dimensional clustering for data base organizations

[...]

J.H. Liou¹, S.B. Yao¹•Institutions (1)

Purdue University¹

01 Jan 1977-Information Systems

TL;DR: The costs of retrieval, update and storage space for this data base structure are mathematically formulated and an example illustrates that this new database structure can be superior to the classical combination of indexed sequential and file inversion techniques.

Journal Article•DOI•

A non-parametric clustering scheme for landsat

[...]

Patrenahalli M. Narendra¹, Morris Goldberg²•Institutions (2)

Honeywell¹, University of Ottawa²

01 Jan 1977-Pattern Recognition

TL;DR: A 4-dimensional histogram is computed to reduce the large LANDSAT pixel data to the much smaller number of distinct vectors and their frequency of occurrence in the scene, using the histogram count as a probability density estimate.
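The histogram reduction described in this summary can be sketched in a few lines. This is only an illustration of collapsing pixels into distinct vectors with counts and of using normalised counts as a density estimate. It is not the authors' clustering scheme, and the function name and the sample 4-band pixel values are hypothetical.

```python
from collections import Counter


def histogram_vectors(pixels):
    """Collapse pixels into distinct 4-band vectors with their frequencies.

    Returns (counts, density): counts maps each distinct vector to its number
    of occurrences, density maps it to that count divided by the total.
    """
    counts = Counter(tuple(p) for p in pixels)
    total = sum(counts.values())
    density = {vec: c / total for vec, c in counts.items()}
    return counts, density


# Hypothetical pixels, one intensity per spectral band.
PIXELS = [(10, 20, 30, 40), (10, 20, 30, 40), (11, 20, 30, 40), (10, 20, 30, 40)]
COUNTS, DENSITY = histogram_vectors(PIXELS)
```

Here four pixels reduce to two distinct vectors, so any later clustering step would work on the distinct vectors and their weights rather than on every pixel.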

Journal Article•DOI•

Clustering on coloured lattices

[...]

David J. Strauss

01 Mar 1977-Journal of Applied Probability

TL;DR: In this article, the authors extended the work of Strauss (1975) on clustering in the two-colour case and compared it with the more general methods of Besag (1974).

Abstract: This paper is concerned with nearest-neighbour systems on the coloured lattice (unordered state space). It extends the paper of Strauss (1975) on clustering in the two-colour case. Comparison is made with the more general methods of Besag (1974). Some tests are developed, and illustrated with an example. Keywords: NEAREST-NEIGHBOUR SYSTEM; MARKOV RANDOM FIELD; CLUSTERING; QUALITATIVE DATA

Journal Article•DOI•

The clustering of galaxies.

[...]

Edward J. Groth, P. James E. Peebles, Michael Seldner, Raymond M. Soneira

01 Nov 1977-Scientific American

Journal Article•DOI•

Complete-link clustering as a complement to factor analysis: A comparison to factor analysis used alone

[...]

Norman L. Berven¹, Lawrence Hubert¹•Institutions (1)

University of Wisconsin-Madison¹

01 Feb 1977-Journal of Vocational Behavior

TL;DR: Complete-link hierarchical clustering is discussed as a complement to factor analysis, with an application concerning the role and function of the rehabilitation counselor.

Journal Article•DOI•

Clustering representations of group overlap

[...]

Phipps Arabie

01 Jan 1977-Journal of Mathematical Sociology

TL;DR: A distinction is made between implicit and explicit group overlap in sociological data, and literature is briefly reviewed in terms of this distinction, and the conclusion is drawn that for implicit overlap, the method of data analysis should use continuous input, while yielding output in a discrete form of subsets.

Abstract: A distinction is made between implicit and explicit group overlap in sociological data, and literature is briefly reviewed in terms of this distinction. The conclusion is drawn that for implicit overlap, the method of data analysis should use continuous input, while yielding output in a discrete form of (possibly overlapping) subsets. Such a method of clustering (ADCLUS) is presented briefly and is applied to the communication structure of a biomedical area of specialization.

Cluster compression algorithm: A joint clustering/data compression concept

[...]

E. E. Hilbert¹•Institutions (1)

California Institute of Technology¹

01 Dec 1977

TL;DR: The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described and experimental results are presented to show trade-offs and characteristics of the various implementations.

Abstract: The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.

Simplified clustering of nonorthogonal grids generated by elliptic partial differential equations

[...]

R. L. Sorenson¹, J. L. Steger•Institutions (1)

Ames Research Center¹

01 Aug 1977

TL;DR: A simple clustering transformation is combined with the Thompson, Thames, and Mastin (TTM) method of generating computational grids to produce controlled mesh spacings to create a hybrid scheme for airfoil problems.

Abstract: A simple clustering transformation is combined with the Thompson, Thames, and Mastin (TTM) method of generating computational grids to produce controlled mesh spacings. For various practical grids, the resulting hybrid scheme is easier to apply than the inhomogeneous clustering terms included in the TTM method for this purpose. The technique is illustrated in application to airfoil problems, and listings of a FORTRAN computer code for this usage are included.

Journal Article•DOI•

Evaluation of Some Coefficients for Use in Numerical Taxonomy of Microorganisms

[...]

Brian Austin, Rita R. Colwell

01 Jul 1977-International Journal of Systematic and Evolutionary Microbiology

TL;DR: Taxonomic data, obtained for 141 Enterobacteriaceae strains for which 240 unit characters were recorded, were subjected to numerical taxonomy analysis employing 36 coefficients and it was found that 15 coefficients provided useful discriminating properties.

Abstract: Taxonomic data, obtained for 141 Enterobacteriaceae strains for which 240 unit characters were recorded, were subjected to numerical taxonomy analysis employing 36 coefficients. Clustering was by unweighted average linkage. From sorted similarity matrices, it was found that 15 coefficients, which included SSM, SH, STD, SJ, SNM, SO, SRT, SSHD, Sin−1(SSM), SP, So, SUN1, SUN4, SD, and SK2, provided useful discriminating properties. The coefficients SH and STD were found to provide results indistinguishable from SSM, and the coefficients SO and So yielded results very similar to those obtained with SSM coefficient.

Journal Article•DOI•

Digital pattern recognition

[...]

G.M. White¹•Institutions (1)

PARC¹

01 Aug 1977

Journal Article•DOI•

Analysis of clustering phenomena based on shock model

[...]

Y. Hayashiuchi¹, Yasuhiro Kitazoe¹, Tamotsu Sekiya¹, Y. Yamamura•Institutions (1)

Osaka University¹

01 Dec 1977-Journal of Nuclear Materials

Dissertation•

Régression typologique et reconnaissance des formes

[...]

Christian Charles¹•Institutions (1)

Paris Dauphine University¹

10 Jun 1977

Journal Article•DOI•

A Novel Clustering Method for Estimating Numbers of Bird Territories

[...]

Philip M. North¹•Institutions (1)

University of Kent¹

01 Jun 1977-Applied Statistics

TL;DR: A new method of cluster analysis is described that groups coplanar points associated with different instants of time and an ornithological application illustrates the method.

Abstract: A new method of cluster analysis is described that groups coplanar points associated with different instants of time. An ornithological application illustrates the method.

Journal Article•DOI•

Clustering of Helium Atoms at a ½ {110} Edge Dislocation in α—Iron

[...]

F. v.d. Berg, W. v. Heugten, L.M. Caspers, A. v. Veen, J.Th.M. De Hosson, +1 more

01 Oct 1977-Solid State Communications

TL;DR: In this paper, atomistic calculations on a ½ {110} edge dislocation show a restricted tendency of helium atoms to cluster along this dislocation; clusters with up to 4 helium atoms have been studied.

Journal Article•DOI•

A Decision-Directed Clustering Algorithm for Discrete Data

[...]

Andrew K. C. Wong¹, T. S. Liu¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 1977-IEEE Transactions on Computers

TL;DR: A decision-directed approach for classifying discrete data is presented, in which probable clusters are initiated by a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure.

Abstract: This article presents a decision-directed approach for classifying discrete data. In the clustering algorithm, probable clusters are initiated through the use of a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure. The subsequent iterative reclassification procedures are directed by the estimated distribution of each class. The distribution estimation adopted is modified from the dependence tree procedure. The algorithm performance is then evaluated through the use of simulated and clinical data. Finally, the algorithm is applied to disease categorization and to signs and symptoms extraction for each disease class.

Journal Article•DOI•

Bose-Einstein Clustering and Critical Hadron Temperatures

[...]

Istvan Montvay, Helmut Satz

01 Jun 1977-Nuovo Cimento Della Societa Italiana Di Fisica A-nuclei Particles and Fields

TL;DR: It is shown that a relativistic quantum gas of pions, due to Bose-Einstein statistics and the resulting possibility of condensation, exhibits a structure similar to that obtained in the statistical bootstrap model, with clusters of condensed pions taking the place of fireballs.

Abstract: It is shown that a relativistic quantum gas of pions, due to Bose-Einstein statistics and the resulting possibility of condensation, exhibits a structure similar to that obtained in the statistical bootstrap model, with clusters of condensed pions taking the place of fireballs. The critical temperature T* of the BE system is, however, associated with a first-order phase transition from a gas of pions and clusters at low energy density to a system of condensed pions at high energy density. The phase transition and the nature of the two phases are investigated.

Journal Article•DOI•

Inference models for categorical clustering.

[...]

Lawrence Hubert, Joel R. Levin

01 Jan 1977-Psychological Bulletin