
Showing papers on "CURE data clustering algorithm published in 1977"


Journal Article
TL;DR: A very fast algorithm that determines the optimal partition of the tree is described; it was used to determine the best partition of an IMS-type tree into data set groups and to evaluate the cost of different alternatives.
Abstract: The problem of determining how to store a hierarchic structure in order to minimize the expected access time to it is examined. A paging environment is assumed. The solution space considered is the set of partitions of the hierarchic structure, each partition being stored in hierarchic order. A very fast algorithm that determines the optimal partition of the tree is described. The algorithm has been used to determine the best partition of an IMS-type tree into data set groups as well as to evaluate the cost of different alternatives. Actual measurements against the restructured databases have shown the validity of the model used by this method. The measurements have also shown that choosing a suboptimal clustering instead of the optimal one may substantially increase the expected access time.

99 citations
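
The abstract does not give the cost model or the fast algorithm itself, so the following is only a minimal sketch of "evaluating the cost of different alternatives" under loudly hypothetical assumptions: segments are listed in hierarchic (preorder) order with known sizes and access probabilities, a group is a contiguous run of that order stored on fixed-size pages, reaching a segment scans its group's pages up to the one holding the segment, and every group other than the first (root) group costs a hypothetical `hop_cost` extra per access. The names `group_cost`, `best_partition`, `page_size`, and `hop_cost` are illustrative, not taken from the paper, and the O(n³) dynamic program is a generic technique, not the paper's method.

```python
from math import ceil


def group_cost(sizes, probs, start, end, page_size, hop_cost):
    """Expected pages read for accesses to segments[start:end] kept in one group.

    Hypothetical model: the group is stored in hierarchic order, reaching a segment
    scans pages up to the one holding that segment's last byte, and every group
    except the first (root) group adds `hop_cost` per access.
    """
    cost, offset = 0.0, 0
    for i in range(start, end):
        offset += sizes[i]
        pages = ceil(offset / page_size)
        cost += probs[i] * (pages + (hop_cost if start > 0 else 0))
    return cost


def best_partition(sizes, probs, page_size, hop_cost):
    """Cheapest split of the hierarchic sequence into contiguous groups (O(n^3) DP)."""
    n = len(sizes)
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest partition of segments 0..j-1
    cut = [0] * (n + 1)                # cut[j]: start index of the last group in best[j]
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + group_cost(sizes, probs, i, j, page_size, hop_cost)
            if c < best[j]:
                best[j], cut[j] = c, i
    groups, j = [], n
    while j > 0:  # walk the cut points back to recover the groups
        groups.append((cut[j], j))
        j = cut[j]
    return best[n], groups[::-1]
```

For example, `best_partition([1, 1, 1, 1], [0.7, 0.1, 0.1, 0.1], page_size=4, hop_cost=1)` keeps all four small segments in a single group, because they fit on one page and splitting would only add hop costs.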


Journal Article · DOI
TL;DR: A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to that of various heuristic algorithms which theoretically require fewer operations.
Abstract: A method is proposed for clustering large files of documents with single-link, a clustering algorithm that takes O(n²) operations. The method is tested on a file of 11,613 documents derived from an operational system. One property of the generated cluster hierarchy, the hierarchy connection percentage, is examined, and it indicates that the hierarchy is similar to those from other test collections. A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to that of various heuristic algorithms which theoretically require fewer operations.

72 citations
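
Single-link clustering can be computed in O(n²) distance evaluations because its hierarchy is determined by a minimum spanning tree: sorting the tree's edges by weight gives the order and heights of the merges. The sketch below uses Prim's algorithm with a linear scan for that purpose; it is a standard construction, not necessarily the method of the paper, and it assumes a hypothetical `dist(i, j)` callable in place of whatever document distance the paper used. `single_link_merges` and `clusters_at` are illustrative names.

```python
from typing import Callable, List, Tuple


def single_link_merges(n: int, dist: Callable[[int, int], float]) -> List[Tuple[float, int, int]]:
    """Single-link merge list via Prim's minimum spanning tree, O(n^2) distance calls."""
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # shortest known distance from each point to the tree
    parent = [-1] * n          # tree point that achieves best[v]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Closest point not yet in the tree; a linear scan keeps the total at O(n^2).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(u, v)
                if d < best[v]:
                    best[v], parent[v] = d, u
    return sorted(edges)  # merge heights in increasing order = single-link hierarchy


def clusters_at(n: int, merges, threshold: float) -> List[int]:
    """Cluster labels after applying every merge at height <= threshold (union-find)."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for h, a, b in merges:
        if h <= threshold:
            root[find(a)] = find(b)
    return [find(i) for i in range(n)]
```

For instance, with one-dimensional points `[0.0, 0.1, 0.25, 5.0, 5.2]` and absolute difference as the distance, cutting the hierarchy at 1.0 yields two clusters: the first three points and the last two.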


Journal Article · DOI
TL;DR: A decision-directed approach to classifying discrete data, in which a clustering algorithm initiates probable clusters with a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure.
Abstract: This article presents a decision-directed approach for classifying discrete data. In the clustering algorithm, probable clusters are initiated through a sorting scheme based on the estimated probability distribution of the data and an arbitrary distance measure. The subsequent iterative reclassification procedures are directed by the estimated distribution of each class. The distribution estimate used is a modified form of the dependence-tree procedure. The algorithm's performance is then evaluated on simulated and clinical data. Finally, the algorithm is applied to disease categorization and to the extraction of signs and symptoms for each disease class.

18 citations
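
The dependence-tree procedure is, presumably, the Chow-Liu construction: estimate the mutual information between every pair of discrete variables and keep a maximum-weight spanning tree over those weights. The abstract does not say how the procedure was modified, so this sketch shows only the standard, unmodified construction; the function names and the row-of-tuples data layout are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between discrete columns i and j."""
    n = len(rows)
    pi = Counter(r[i] for r in rows)
    pj = Counter(r[j] for r in rows)
    pij = Counter((r[i], r[j]) for r in rows)
    return sum(
        (c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )


def dependence_tree(rows):
    """Maximum spanning tree over pairwise mutual information (Kruskal + union-find)."""
    p = len(rows[0])
    weighted = sorted(
        ((mutual_information(rows, i, j), i, j) for i, j in combinations(range(p), 2)),
        reverse=True,  # strongest dependencies first
    )
    root = list(range(p))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two separate components
            root[ri] = rj
            tree.append((i, j))
    return tree
```

In a classifier of the kind the abstract describes, one such tree would be fitted per class and the resulting class-conditional distributions would drive the reclassification step; that wiring is not shown here.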


Book Chapter · DOI
01 Jan 1977
TL;DR: This chapter discusses data-dependent clustering techniques and distinguishes them from data representation and graphical techniques, which make visual clustering feasible by reducing the p-dimensional space to two dimensions or by mapping each multidimensional vector to a human face or another analogue representation.
Abstract: Publisher Summary: This chapter discusses data-dependent clustering techniques. In a large number of disciplines, large amounts of multivariate data are collected whose underlying structure is unknown or has at best received some initial and tentative exploration. Determining the exact number of clusters, with exactly the right elements in each, is not that important. What matters is that the n multidimensional points in a p-dimensional space are partitioned validly, reliably, parsimoniously, and efficiently. This suggests that the consumer, the statistician, and the computing specialist should each be at least somewhat satisfied with a procedure. Some data representation and graphical techniques are sometimes mistaken for clustering procedures. These techniques make visual clustering feasible by reducing the p-dimensional space to two dimensions or by mapping each multidimensional vector to a human face or another analogue representation. Ordinary factor analysis can also be used to reduce the p-dimensional space to two dimensions when measurement variables, rather than elements, are being clustered.

5 citations
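
As an illustration of reducing a p-dimensional space to two dimensions for visual clustering, the sketch below projects the points onto their first two principal components. Principal component analysis is swapped in here as a related technique; it is not the factor analysis the chapter mentions, and, consistent with the chapter's point, it only produces coordinates for a scatter plot rather than clustering anything itself. It assumes NumPy is available and that the data has at least two points and two variables.

```python
import numpy as np


def project_2d(X):
    """Project n points in p dimensions onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # n x 2 coordinates, ready for a scatter plot
```

With two well-separated groups of points in five dimensions, the two groups land on opposite sides of the first projected axis, so the grouping becomes visible in a two-dimensional plot.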



Journal Article · DOI
01 Jun 1977

1 citation