Author

Robert L. Ling

Bio: Robert L. Ling is an academic researcher from the University of Chicago. The author has contributed to research on the topics Fuzzy clustering and Set (abstract data type). The author has an h-index of 1 and has co-authored 1 publication receiving 100 citations.

Papers
Journal ArticleDOI
TL;DR: A well-known set of data consisting of the correlations of 24 psychological tests is used to illustrate the comparison of groupings by four methods of factor analysis and two methods of cluster analysis.
Abstract: A computer generated graphic method, which can be used in conjunction with any hierarchical scheme of cluster analysis, is described and illustrated. The graphic principle used is the representation of the elements of a data matrix of similarities or dissimilarities by computer printed symbols (of character overstrikes) of various shades of darkness, where a dark symbol corresponds to a small dissimilarity. The plots, applied to a data matrix before clustering and to the rearranged matrix after clustering, show at a glance whether clustering brought forth any distinctive clusters. A well-known set of data consisting of the correlations of 24 psychological tests is used to illustrate the comparison of groupings by four methods of factor analysis and two methods of cluster analysis.
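The shading principle in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: the function name `shade_matrix`, the five-symbol palette, and the toy dissimilarity matrix are all made up here, and the reordering `[0, 2, 1, 3]` is assumed to have been produced by some hierarchical clustering run elsewhere.

```python
def shade_matrix(dissim, order, symbols="#+-. "):
    """Render a dissimilarity matrix as rows of characters.

    `symbols` runs from darkest to lightest, so a small dissimilarity
    prints as a dark symbol. `order` is the row/column permutation to
    apply before printing (hypothetical: supplied by a clustering).
    """
    values = [v for row in dissim for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero on constant input
    last = len(symbols) - 1
    lines = []
    for i in order:
        chars = []
        for j in order:
            # Map the value linearly onto a symbol index, rounding to nearest.
            level = int((dissim[i][j] - lo) / span * last + 0.5)
            chars.append(symbols[level])
        lines.append("".join(chars))
    return lines


# Toy dissimilarities (invented for illustration): objects 0 and 2 are
# close to each other, as are objects 1 and 3.
D = [
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
]

before = shade_matrix(D, [0, 1, 2, 3])   # original row/column order
after = shade_matrix(D, [0, 2, 1, 3])    # order assumed to come from a clustering

print("\n".join(before))
print()
print("\n".join(after))
```

In the reordered plot the dark symbols gather into two blocks along the diagonal, which is the "at a glance" effect the abstract describes.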

102 citations


Cited by
Journal ArticleDOI
Ali S. Hadi
TL;DR: This book make understandable the cluster analysis is based notion of starsmodern treatment, which efficiently finds accurate clusters in data and discusses various types of study the user set explicitly but also proposes another.
Abstract: The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase In both the increasingly important and distribution we show how these methods. Our experiments demonstrate that together can deal with most applications technometrics. In an appropriate visualization technique is to these new. The well written and efficiently finds accurate clusters in data including. Of applied value for several preprocessing tasks discontinuity preserving smoothing feature clusters! However the model based notion of domain knowledge from real data repositories in data. Discusses various types of study the user set explicitly but also propose another. This book make understandable the cluster analysis is based notion of starsmodern treatment.

7,423 citations

Book
01 Jan 1984
TL;DR: Cluster analysis is a multivariate procedure for detecting natural groupings in data that resembles discriminant analysis in one respect—the researcher seeks to classify a set of objects into subgroups although neither the number nor members of the subgroups are known.
Abstract: SYSTAT provides a variety of cluster analysis methods on rectangular or symmetric data matrices. Cluster analysis is a multivariate procedure for detecting natural groupings in data. It resembles discriminant analysis in one respect—the researcher seeks to classify a set of objects into subgroups although neither the number nor members of the subgroups are known. CLUSTER provides three procedures for clustering: Hierarchical Clustering, K-Clustering, and Additive Trees. The Hierarchical Clustering procedure comprises hierarchical linkage methods. The K-Clustering procedure splits a set of objects into a selected number of groups by maximizing between-cluster variation and minimizing within-cluster variation. The Additive Trees Clustering procedure produces a Sattath-Tversky additive tree clustering. Hierarchical Clustering clusters cases, variables, or both cases and variables simultaneously; K-Clustering clusters cases only; and Additive Trees clusters a similarity or dissimilarity matrix. Several distance metrics are available with Hierarchical Clustering and K-Clustering including metrics for binary, quantitative and frequency count data. Hierarchical Clustering has ten methods for linking clusters and displays the results as a tree (dendrogram) or a polar dendrogram. When the MATRIX option is used to cluster cases and variables, SYSTAT uses a gray-scale or color spectrum to represent the values. SYSTAT further provides five indices, viz., statistical criteria by which an appropriate number of clusters can be chosen from the Hierarchical Tree. Options for cutting (or pruning) and coloring the hierarchical tree are also provided. In the K-Clustering procedure SYSTAT offers two algorithms, KMEANS and KMEDIANS, for partitioning. Further, SYSTAT provides nine methods for selecting initial seeds for both KMEANS and KMEDIANS. Cluster analysis is a multivariate procedure for detecting groupings in data. 
The objects in these groups may be:
Cases (observations or rows of a rectangular data file). For example, if health indicators (numbers of doctors, nurses, hospital beds, life expectancy, etc.) are recorded for countries (cases), then developed nations may form a subgroup or cluster separate from developing countries.
Variables (characteristics or columns of the data). For example, suppose causes of death (cancer, cardiovascular, lung disease, diabetes, accidents, etc.) are recorded for each U.S. state (case); the results show that accidents are relatively independent of the illnesses.
Cases and variables (individual entries in the data matrix). For example, certain wines are associated with good years of production. Other wines have other years that are better.
Clusters may be of two sorts: overlapping or exclusive. Overlapping clusters allow the same object to appear in more than one …
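The K-Clustering idea described in this abstract, splitting objects into a chosen number of groups while minimizing within-cluster variation, can be sketched as a plain Lloyd-style k-means loop. This is a generic textbook sketch, not SYSTAT's KMEANS implementation: the function names, the toy points, and the choice to pass the initial seeds in explicitly are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, iterations=20):
    """Partition `points` around len(seeds) centers.

    `seeds` are the initial centers; how to pick them is left to the
    caller (an assumption of this sketch).
    """
    centers = [list(s) for s in seeds]
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-cluster sum of squares, the quantity being minimized."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Toy data (invented): two well-separated groups of three points each.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
labels, centers = kmeans(points, seeds=[points[0], points[1]])
print(labels)
print(within_ss(points, labels, centers))
```

The result depends on the starting centers: a poor choice can leave the loop in a worse partition, which is one reason a package might offer several seeding methods.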

2,533 citations

Journal ArticleDOI
TL;DR: The earliest sources of this cluster heat map are located in late 19th century publications, and a diverse 20th century statistical literature is traced that provided a foundation for this most widely used of all bioinformatics displays.
Abstract: The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling, with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. On the vertical and horizontal margins of the tiling are hierarchical cluster trees. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th century publications, and trace a diverse 20th century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.

718 citations

Journal ArticleDOI
TL;DR: Semantic Clustering is introduced, a technique based on Latent Semantic Indexing and clustering that groups source artifacts using similar vocabulary; the groups are interpreted as linguistic topics that reveal the intention of the code.
Abstract: Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing formal information, the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it to several case studies, two of which we present in this paper. Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.

505 citations

Journal ArticleDOI
TL;DR: The package seriation is presented which provides an infrastructure for seriation with R and comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation.
Abstract: Seriation, i.e., finding a suitable linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Caused by the problem’s combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation which provides an infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation. To illustrate how easily the package can be applied for a variety of applications, a comprehensive collection of examples is presented.
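To make the problem statement concrete, here is a minimal Python sketch of seriation by exhaustive search. It does not use or imitate the R package's interface: the function names, the toy dissimilarity matrix, and the choice of loss (total dissimilarity between neighbours in the order, i.e. the length of the path through the objects) are assumptions picked for illustration. A linear order is represented as a permutation of object indices, as in the abstract.

```python
from itertools import permutations


def path_length(dissim, order):
    """Loss of a linear order: sum of dissimilarities between neighbours."""
    return sum(dissim[a][b] for a, b in zip(order, order[1:]))


def best_order(dissim):
    """Exact solution by trying every permutation.

    Feasible only for very small sets, since n objects have n! orders.
    Ties are resolved in favour of the first permutation generated.
    """
    n = len(dissim)
    return min(permutations(range(n)), key=lambda o: path_length(dissim, o))


# Toy integer dissimilarities (invented): objects 0 and 2 are close,
# as are objects 1 and 3.
D = [
    [0, 9, 1, 8],
    [9, 0, 8, 2],
    [1, 8, 0, 9],
    [8, 2, 9, 0],
]

order = best_order(D)
print(order, path_length(D, order))           # (0, 2, 1, 3) 11
print(path_length(D, (0, 1, 2, 3)))           # 26
```

With four objects there are only 24 orders to check; the factorial growth of that count is the combinatorial difficulty the abstract refers to, and the reason heuristics are used for larger sets.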

409 citations