Showing papers on "CURE data clustering algorithm published in 1977"

Journal Article

A Clustering Algorithm for Hierarchical Structures.

01 Jan 1977-ACM Transactions on Database Systems

TL;DR: Describes a very fast algorithm that determines the optimal partition of a hierarchic tree; it has been used to find the best partition of an IMS-type tree into data set groups and to evaluate the cost of alternative partitions.

Abstract: The problem of determining how to store a hierarchic structure in order to minimize the expected access time to it is examined. A paging environment is assumed. The solution space considered is the set of partitions of the hierarchic structure, each partition being stored in hierarchical order. A very fast algorithm which determines the optimal partition of the tree is described. The algorithm has been used to determine the best partition of an IMS-type tree into data set groups as well as to evaluate the cost of different alternatives. Actual measurements against the restructured databases have shown the validity of the model used by this method. The measurements have also shown that choosing a suboptimal clustering instead of the optimal one may substantially increase the expected access time.

99 citations

Journal Article

Clustering large files of documents using the single-link method

W. Bruce Croft¹

University of Cambridge¹

01 Nov 1977-Journal of the Association for Information Science and Technology

TL;DR: A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.

Abstract: A method for clustering large files of documents using a clustering algorithm which takes O(n²) operations (single-link) is proposed. This method is tested on a file of 11,613 documents derived from an operational system. One property of the generated cluster hierarchy (hierarchy connection percentage) is examined, and it indicates that the hierarchy is similar to those from other test collections. A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.

72 citations
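
As a rough illustration of the O(n²) single-link method named in the abstract above, the sketch below builds a minimum spanning tree with Prim's algorithm and then drops every tree edge longer than a threshold; the remaining connected components are the single-link clusters at that level. This is a generic textbook formulation, not the procedure from Croft's paper. The function names (`single_link_mst`, `single_link_clusters`), the `threshold` parameter, and the toy one-dimensional data are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Edge = Tuple[float, int, int]  # (distance, index_a, index_b)


def single_link_mst(points: Sequence[T], dist: Callable[[T, T], float]) -> List[Edge]:
    """Prim's algorithm on the complete graph: O(n^2) distance evaluations."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each point
    parent = [-1] * n
    in_tree[0] = True
    for j in range(1, n):
        best[j] = dist(points[0], points[j])
        parent[j] = 0
    edges: List[Edge] = []
    for _ in range(n - 1):
        # Pick the point outside the tree that is closest to the tree.
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[j] = True
        edges.append((best[j], parent[j], j))
        # Relax the links of the remaining points against the new tree member.
        for i in range(n):
            if not in_tree[i]:
                d = dist(points[j], points[i])
                if d < best[i]:
                    best[i] = d
                    parent[i] = j
    return edges


def single_link_clusters(
    points: Sequence[T], dist: Callable[[T, T], float], threshold: float
) -> List[List[int]]:
    """Clusters are the components left after dropping MST edges above threshold."""
    n = len(points)
    label = list(range(n))

    def find(x: int) -> int:
        while label[x] != x:
            label[x] = label[label[x]]  # path halving
            x = label[x]
        return x

    for d, a, b in single_link_mst(points, dist):
        if d <= threshold:
            label[find(a)] = find(b)
    groups: dict = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    data = [0.0, 1.0, 1.5, 10.0, 10.4]  # toy data, not from the paper
    print(single_link_clusters(data, lambda a, b: abs(a - b), threshold=2.0))
```

Sorting the tree edges by length and merging them one at a time would yield the full cluster hierarchy rather than a single cut; the threshold form is used here only to keep the sketch short.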

Journal Article

A Decision-Directed Clustering Algorithm for Discrete Data

Andrew K. C. Wong¹, T. S. Liu¹

Carnegie Mellon University¹

01 Jan 1977-IEEE Transactions on Computers

TL;DR: A decision-directed approach for classifying discrete data, using a clustering algorithm that initiates probable clusters through a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure.

Abstract: This article presents a decision-directed approach for classifying discrete data. In the clustering algorithm, probable clusters are initiated through the use of a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure. The subsequent iterative reclassification procedures are directed by the estimated distribution of each class. The distribution estimation adopted is modified from the dependence tree procedure. The algorithm performance is then evaluated through the use of simulated and clinical data. Finally, the algorithm is applied to disease categorization and to signs and symptoms extraction for each disease class.

18 citations

Book Chapter

Data Dependent Clustering Techniques

Herbert Solomon¹

Stanford University¹

01 Jan 1977

TL;DR: This chapter discusses data-dependent clustering techniques, which make visual clustering feasible by reducing the p-dimensional space to two dimensions or changing the multidimensional vector to a human face or some other analogue representation.

Abstract: This chapter discusses data-dependent clustering techniques. In a large number of disciplines, large amounts of multivariate data are collected, and the structure underlying the data is unknown or at best has received some initial and tentative exploration. It is not that important for an exact number of clusters with just the right elements in each cluster to be determined. It is important that the n multidimensional points in a p-dimensional space are partitioned efficiently and in a valid, reliable, and parsimonious manner. This suggests that the consumer, the statistician, and computing specialists state some satisfaction with a procedure. There are some data representation and graphical techniques that are sometimes mistaken for clustering procedures. These techniques make visual clustering feasible by reducing the p-dimensional space to two dimensions or changing the multidimensional vector to a human face or some other analogue representation. Regular factor analysis can also be employed to reduce the p-dimensional space to two dimensions when measurement variables rather than elements are being clustered.

5 citations