scispace - formally typeset

Showing papers on "Cluster analysis" published in 1969


Journal Article
TL;DR: An algorithm for the analysis of multivariate data is presented, along with some experimental results. It is based upon a point mapping of N L-dimensional vectors from the L-space to a lower-dimensional space such that the inherent data "structure" is approximately preserved.
Abstract: An algorithm for the analysis of multivariate data is presented along with some experimental results. The algorithm is based upon a point mapping of N L-dimensional vectors from the L-space to a lower-dimensional space such that the inherent data "structure" is approximately preserved.

3,460 citations
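Below is a minimal sketch of a structure-preserving mapping of this kind. It assumes Sammon's stress as the measure of how well the structure is preserved. Sammon's stress is the sum of squared differences between original and mapped interpoint distances, with each term divided by the original distance and the total normalised by the sum of original distances. The sketch uses plain gradient descent rather than any particular published update rule. The function names, step size and iteration count are illustrative assumptions.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(D_high, Y):
    """Sum over pairs of (d*_ij - d_ij)^2 / d*_ij, divided by sum of d*_ij."""
    iu = np.triu_indices_from(D_high, k=1)  # each pair i < j once
    d_star = D_high[iu]
    d = pairwise_distances(Y)[iu]
    return ((d_star - d) ** 2 / d_star).sum() / d_star.sum()


def sammon_map(X, dim=2, steps=2000, lr=0.3, seed=0):
    """Gradient descent on the stress; assumes all points are distinct."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = pairwise_distances(X)
    off = ~np.eye(n, dtype=bool)          # off-diagonal mask
    scale = D[off].mean()
    D = D / scale                         # stress is scale-invariant
    c = D[np.triu_indices(n, k=1)].sum()
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, dim))
    for _ in range(steps):
        d = pairwise_distances(Y)
        ratio = np.zeros_like(D)
        ratio[off] = (D[off] - d[off]) / (D[off] * d[off])
        # dE/dy_k = -(2/c) * sum_j ratio[k, j] * (y_k - y_j)
        diff = Y[:, None, :] - Y[None, :, :]
        grad = -(2.0 / c) * (ratio[:, :, None] * diff).sum(axis=1)
        Y = Y - lr * grad
    return Y * scale                      # back to the original units


# Usage: a planar 3 x 4 grid embedded in 3-D, mapped down to 2-D.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(3.0))
X = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(12)])
print(sammon_stress(pairwise_distances(X), sammon_map(X)))
```

The stress is unchanged when both distance sets are scaled by the same factor. The sketch therefore normalises the original distances to a mean of 1 before descending, which keeps a single step size workable across data sets.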


Journal Article
TL;DR: A new method of representation of the reduced data, based on the idea of “fuzzy sets,” is proposed to avoid some of the problems of current clustering procedures and to provide better insight into the structure of the original data.
Abstract: A general formulation of data reduction and clustering processes is proposed. These procedures are regarded as mappings or transformations of the original space onto a “representation” or “code” space subjected to some constraints. Current clustering methods, as well as three other data reduction techniques, are specified within the framework of this formulation. A new method of representation of the reduced data, based on the idea of “fuzzy sets,” is proposed to avoid some of the problems of current clustering procedures and to provide better insight into the structure of the original data.

1,452 citations
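The abstract does not give the rule for computing fuzzy memberships. As a stand-in, the sketch below computes graded memberships with the formula later used in fuzzy c-means. This is an assumption, not necessarily this paper's method. Each point receives a degree of membership in every cluster, and its degrees sum to 1. The cluster centers and the fuzziness exponent `m` are assumed inputs.

```python
import numpy as np


def fuzzy_memberships(X, centers, m=2.0):
    """Membership grades u[i, k] of point i in cluster k (rows sum to 1).

    Uses u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)); requires m > 1.
    """
    X, centers = np.asarray(X, float), np.asarray(centers, float)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    d = np.maximum(d, 1e-12)              # avoid division by zero at a center
    power = 2.0 / (m - 1.0)
    ratios = (d[:, :, None] / d[:, None, :]) ** power   # ratios[i, k, j]
    return 1.0 / ratios.sum(axis=2)


# Usage: a point halfway between two centers belongs half to each.
X = np.array([[0.0], [2.0], [3.0]])
centers = np.array([[0.0], [4.0]])
print(fuzzy_memberships(X, centers))
```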


Journal Article
TL;DR: An algorithm is given which can be programmed to compute a general classification process for six accepted methods, and results are derived for the method of optimizing an error sum of squares objective function.
Abstract: The recent interest in numerical classification and its application in the biological sciences to the evaluation of taxa has prompted the introduction of a large number of clustering processes, many of which are justified by empirical results. There is clearly a need for the formulation of a theoretical approach to the subject, and this is aided by the comparison and generalisation of existing processes by analytic methods. The work of Lance and Williams in this direction is supplemented here by results derived for the method of optimizing an error sum of squares objective function, and an algorithm is given which can be programmed to compute a general classification process for six accepted methods.

371 citations
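The error sum of squares method (Ward's method) can be run through the Lance and Williams recurrence mentioned in the abstract. Below is a minimal Python sketch that uses squared Euclidean distances as the starting dissimilarities; this is an assumed choice. With that choice, the standard Ward coefficients make each merge value equal to twice the increase in the error sum of squares. The naive minimum search is O(N^3), which is acceptable for illustration only. This is not the paper's own program.

```python
import numpy as np


def ward_linkage(X):
    """Ward (error sum of squares) clustering via the Lance-Williams update.

    Returns merges as (cluster_a, cluster_b, increase_in_ess). Ids 0..N-1
    are the original points; the t-th merge creates cluster id N + t.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    sizes = {i: 1 for i in range(n)}
    d = {(i, j): float(((X[i] - X[j]) ** 2).sum())
         for i in range(n) for j in range(i + 1, n)}

    def key(a, b):
        return (a, b) if a < b else (b, a)

    merges = []
    for new in range(n, 2 * n - 1):
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ni, nj = sizes.pop(i), sizes.pop(j)
        for k, nk in sizes.items():
            # Lance-Williams recurrence with Ward's coefficients
            d[key(k, new)] = ((ni + nk) * d[key(k, i)] + (nj + nk) * d[key(k, j)]
                              - nk * dij) / (ni + nj + nk)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        sizes[new] = ni + nj
        merges.append((i, j, dij / 2.0))  # merge value = 2 x ESS increase
    return merges


print(ward_linkage([[0.0], [1.0], [10.0]]))
```

Summing the recorded increases over all merges recovers the total error sum of squares of the data about its mean, which is a convenient sanity check.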


Journal Article
TL;DR: In this article, the authors review and evaluate research on clustering and subjective organization (SO) in free recall, and they present two intercorrelation matrices among clustering measures and the number of words recalled.
Abstract: Research on clustering and subjective organization (SO) in free recall is reviewed and evaluated. Various indexes developed to measure clustering and SO are evaluated, and two intercorrelation matrices among clustering measures and the number of words recalled are presented. The existence of a large negative bias in the correlation between the ratio of repetition (RR) measure and recall is demonstrated. Various theoretical issues which have developed from the study of organization in free recall are presented and discussed.

193 citations
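The ratio of repetition (RR) measure named in the abstract is commonly defined as the number of category repetitions (adjacent recalled items from the same category) divided by the number of items recalled minus one. Below is a minimal Python sketch under that assumed definition. The word list and category labels are hypothetical.

```python
def ratio_of_repetition(recall, category):
    """RR = (adjacent same-category pairs in recall order) / (items recalled - 1)."""
    if len(recall) < 2:
        raise ValueError("need at least two recalled items")
    repetitions = sum(
        category[a] == category[b] for a, b in zip(recall, recall[1:])
    )
    return repetitions / (len(recall) - 1)


# Hypothetical list: three animals and two colours.
category = {"dog": "animal", "cat": "animal", "cow": "animal",
            "red": "colour", "blue": "colour"}
print(ratio_of_repetition(["dog", "cat", "red", "blue", "cow"], category))
```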


Book Chapter
Joseph B. Kruskal
01 Jan 1969
TL;DR: A major problem in data analysis is to find any structure in a set of multivariate observations. If each observation is represented as a point in multidimensional space, this means finding the structure of a configuration of points in high-dimensional space.
Abstract: A major problem in data analysis is to find any structure in a set of multivariate observations. If each observation is represented as a point in multidimensional space, this means finding the structure of a configuration of points in high-dimensional space. To find linear relationships among the variables, linear regression, principal components, and factor analysis are often used. One very simple and important kind of structure is clustering. Whenever the points cluster together, the knowledge of this is almost sure to be useful to the man who is interested in the data. Some structure-seeking methods depend on the distances between the points. For example, many cluster-seeking techniques look for collections of points whose interpoint distances are small in some sense. Similarly, the method of parametric mapping starts by calculating the matrix of interpoint distances of the original configuration and subsequently works only with that.

143 citations


Journal Article
TL;DR: The key defect in almost all clustering procedures seems to be the absence of a statistical model, and the suggestion is made that the clustering problem be stated as a mixture problem.
Abstract: The need for methods of clustering individuals into homogeneous groups seems clear. One hopes, by applying them to his data, to discover clusterings which may prove to be important. This aim appears straightforward, but the methods which exist do not necessarily satisfy it. The procedures which employ the correlation measure of profile similarity, and those which employ the distance measure, are discussed. Technical and logical problems are shown to exist for both measures. The key defect in almost all clustering procedures seems to be the absence of a statistical model. The suggestion is made that the clustering problem be stated as a mixture problem. The need for further work by psychologists and statisticians is pointed out.

125 citations
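Stating clustering as a mixture problem means treating each cluster as a component distribution and estimating, for each observation, the probability that it came from each component. The abstract does not prescribe an estimation method. The sketch below uses the expectation-maximization (EM) iteration for a one-dimensional Gaussian mixture. EM is a standard choice for this, but it is a swapped-in technique here, not the paper's procedure. The number of components and the iteration count are assumed.

```python
import numpy as np


def gaussian_mixture_em(x, k=2, iters=200, seed=0):
    """Fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i, j] is the estimated P(component j | x_i).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)   # initial means from the data
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: posterior membership probabilities
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return w, mu, var, r


# Usage: two well-separated synthetic groups.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
w, mu, var, r = gaussian_mixture_em(x)
print(w, mu, var)
```

Unlike a hard clustering, each observation keeps a probability of membership in every group. This is where the statistical model that the abstract calls for enters.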


Journal Article
TL;DR: A dynamic programming approach is presented that reduces the amount of redundant transitional calculations implicit in a total enumeration approach to partitioning N entities into M disjoint and nonempty subsets (clusters).
Abstract: This paper considers the problem of partitioning N entities into M disjoint and nonempty subsets (clusters). Except when both N and N-M are very small, a search for the optimal solution by total enumeration of all clustering alternatives is quite impractical. The paper presents a dynamic programming approach that reduces the amount of redundant transitional calculations implicit in a total enumeration approach. A comparison of the number of calculations required under each approach is presented in Appendix A. Unlike most clustering approaches used in practice, the dynamic programming algorithm will always converge on the best clustering solution. The efficiency of the dynamic programming approach depends upon the rapid-access computer memory available. A numerical example is given in Appendix B.

122 citations
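The abstract does not spell out the recursion or the objective. Below is a minimal sketch that assumes within-cluster sum of squares as the criterion and uses one-dimensional data. The best split of a set S into k clusters is the cheapest combination of two parts: a cluster containing S's first element, plus the best split of the remainder into k - 1 clusters. Memoizing that subproblem avoids the repeated calculations of full enumeration. The memo table still grows exponentially with N, so memory remains the limiting resource.

```python
from functools import lru_cache


def optimal_partition(points, m):
    """Exact min within-cluster SSE partition of 1-D points into m clusters.

    Dynamic programming over subsets (bitmasks); only feasible for small N.
    """
    n = len(points)

    @lru_cache(maxsize=None)
    def cost(mask):
        vals = [points[i] for i in range(n) if mask >> i & 1]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    @lru_cache(maxsize=None)
    def best(mask, k):
        """Min cost (and cluster masks) for splitting `mask` into k clusters."""
        if k == 1:
            return cost(mask), (mask,)
        low = mask & -mask               # lowest-index point in this set
        rest = mask ^ low
        result = (float("inf"), ())
        sub = rest
        while True:                      # enumerate all subsets of `rest`
            first = sub | low            # cluster that contains the low point
            remaining = mask ^ first
            if bin(remaining).count("1") >= k - 1:
                c_rest, parts = best(remaining, k - 1)
                if cost(first) + c_rest < result[0]:
                    result = (cost(first) + c_rest, (first,) + parts)
            if sub == 0:
                break
            sub = (sub - 1) & rest
        return result

    total, masks = best((1 << n) - 1, m)
    return total, [[points[i] for i in range(n) if mk >> i & 1] for mk in masks]


print(optimal_partition([1, 2, 10, 11, 20], 3))
```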


Book
01 Feb 1969

109 citations


Journal Article
01 Apr 1969
TL;DR: Remote sensor imaging technology makes it possible to obtain multiple images of extensive land areas simultaneously from the radar, infrared, and visible portions of the electromagnetic spectrum, and it would be useful to obtain automatically from such data land-use maps indicating areas of similar types of land, that is, areas that are similar as seen through the sensor's eyes.
Abstract: Remote sensor imaging technology makes it possible to obtain multiple images of extensive land areas simultaneously from the radar, infrared, and visible portions of the electromagnetic spectrum. It would be useful to automatically obtain from such data land-use maps indicating those areas of similar types of land, that is, similar as seen through the sensor's eyes. This classification problem is approached from the perspective of the structure inherent in the data. The classification categories or clusters so constructed are the natural homogeneous groupings within the data. There is high similarity within each cluster and high dissimilarity between clusters. Two clustering procedures are presented: the first partitions the image sequence and the second partitions the measurement space. In both, the partition is constructed by finding appropriate center sets and then chaining to them all similar enough points. The resulting clusters are simply connected and not necessarily convex. An example of the measurement space clustering procedure is presented for a set of three multispectral images taken over Phoenix, Ariz.

79 citations
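The abstract describes growing each cluster from a center set by chaining to it every point that is similar enough. Below is a minimal Python sketch. It assumes single points as centers, Euclidean distance in measurement space, and a fixed hypothetical `threshold` for "similar enough". How the centers are found is not shown. Because membership spreads from point to point, a cluster can bend around corners, which is why such clusters need not be convex.

```python
from collections import deque

import numpy as np


def chain_from_centers(X, centers, threshold):
    """Grow one cluster per center index by repeatedly adding any unassigned
    point within `threshold` of a point already in the cluster.
    Returns labels, with -1 for points not reached from any center."""
    X = np.asarray(X, dtype=float)
    labels = np.full(len(X), -1)
    for label, c in enumerate(centers):
        if labels[c] != -1:
            continue                      # center already absorbed by a chain
        labels[c] = label
        queue = deque([c])
        while queue:
            p = queue.popleft()
            near = np.linalg.norm(X - X[p], axis=1) <= threshold
            for q in np.flatnonzero(near & (labels == -1)):
                labels[q] = label
                queue.append(q)
    return labels


# An L-shaped (non-convex) group, a second small group, and an outlier.
X = [[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [10, 10], [11, 10], [30, 30]]
print(chain_from_centers(X, centers=[0, 5], threshold=1.5))
```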


Journal Article
TL;DR: This article shows some of the interrelationships among various measures that have been suggested for summarizing pairwise proximities and demonstrates that clustering results are not invariant over these alternative measures.
Abstract: Clustering techniques and related approaches to numerical classification are beginning to receive a fair amount of attention by marketing researchers. Three articles on the subject [2, 9, 11] have already appeared in JMR, and a variety of marketing studies using clustering procedures have been reported in working papers. One of the principal problems in applying cluster analysis is the choice of what proximity measure to use in summarizing the similarity (or dissimilarity) of profile pairs. Morrison [10] discussed some problems associated with using a Euclidean distance measure in the space of original variables, a point also made by Overall [12] in the psychological literature. This article shows some of the interrelationships among various measures that have been suggested for summarizing pairwise proximities and demonstrates that clustering results are not invariant over these alternative measures. Despite the arguments for using one measure in preference to another, we believe that no "dominant" proximity measure currently exists, given such high variation in the researcher's objectives [5]. The ten proximity measures used in this comparative study follow:

64 citations
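The point that clustering results are not invariant over proximity measures can be seen with just two profiles. The sketch assumes two common measures: Euclidean distance and one minus the Pearson correlation. The article's list of ten measures is not reproduced here. The two measures disagree about which hypothetical profile is closest to the target.

```python
import numpy as np


def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def correlation_distance(a, b):
    """1 - Pearson correlation between two profiles."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def nearest(target, profiles, dist):
    """Name of the profile closest to `target` under `dist`."""
    return min(profiles, key=lambda name: dist(target, profiles[name]))


target = [1, 2, 3, 4]
profiles = {"same level": [2, 2, 3, 3], "same shape": [11, 12, 13, 14]}
print(nearest(target, profiles, euclidean))             # same level
print(nearest(target, profiles, correlation_distance))  # same shape
```

Euclidean distance rewards profiles at the same overall level. Correlation ignores level and rewards profiles with the same shape, so a clustering built on one measure can group objects differently from a clustering built on the other.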


Journal Article
George Nagy
TL;DR: A modified version of the Isodata or K-means clustering algorithm is applied to a set of patterns originally proposed by Block, Nilsson, and Duda, and to another artificial alphabet.
Abstract: The objects and methods of automatic feature extraction on binary patterns are briefly reviewed. An intuitive interpretation for geometric features is suggested whereby such a feature is conceived of as a cluster of component vectors in pattern space. A modified version of the Isodata or K-means clustering algorithm is applied to a set of patterns originally proposed by Block, Nilsson, and Duda, and to another artificial alphabet. Results are given in terms of a figure-of-merit which measures the deviation between the original patterns and the patterns reconstructed from the automatically derived feature set.
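For reference, below is a minimal sketch of plain K-means (Lloyd's iteration) in Python. The paper applies a modified version of ISODATA or K-means, and those modifications are not described here, so this shows only the baseline algorithm. Drawing the initial centers at random from the data is an assumed choice.

```python
import numpy as np


def k_means(X, k, iters=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and mean update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers

    def assign(centers):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        return d.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers


labels, centers = k_means([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)
print(labels, centers)
```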

Journal Article
TL;DR: In this paper, the formal description of penetrant clustering in polymers given by Zimm and Lundberg is applied to systems obeying Flory-Huggins thermodynamics.
Abstract: The formal description of penetrant clustering in polymers given by Zimm and Lundberg is applied to systems obeying Flory-Huggins thermodynamics. Several equivalent procedures for evaluation of the cluster function are suggested including circumstances in which the interaction parameter χ1 varies with composition. An analysis of the cluster function for penetrant and the companion expression for clustering of the polymeric solute is presented for some familiar, special cases of Flory-Huggins' behavior. The utility of the analyses is illustrated by application of these analytical techniques to data taken from the literature.

Journal Article
TL;DR: The analyses revealed no significant or suggestive indication of time-space clustering, whether applied to deaths at age 0–14, 0–5, or 2–9 years, and the absence of clustering is interpreted as suggesting that, if leukemia is due to an infectious agent, then it is one to which humans are highly resistant.
Abstract: Dates and residence data for 298 Los Angeles childhood leukemia deaths during the period 1960–64 were analyzed for time-space clustering by 2 approaches. By the Knox approach, 2 cases are considered to be close neighbors if they are within both a specified spatial and a specified temporal distance of each other. A wide spectrum of critical distances, both for time and space, was used in identifying close neighbors. By the Mantel approach, a correlation- or regression-type analysis is made to see if the reciprocal of the spatial separation between any 2 cases is related to the reciprocal of the absolute temporal separation. The latter approach requires increasing each separation by a suitable additive constant prior to taking reciprocals, and for this purpose a wide spectrum of additive constants was employed relative to each type of separation. Significance tests for both approaches were made by the permutational procedure described by Mantel. This procedure permits one to obtain the expectation and variance of the statistics yielded by either the Knox or Mantel approach under all possible random pairings of the reported dates of death and places of residence. The analyses revealed no significant or suggestive indication of time-space clustering, whether applied to deaths at age 0–14, 0–5, or 2–9 years. The absence of clustering is interpreted as suggesting that, if leukemia is due to an infectious agent, then it is one to which humans are highly resistant. Some interesting insights were obtained from the behavior of the statistical procedures employed.
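The Knox approach reduces to counting pairs of cases that are close in both space and time. Below is a minimal Python sketch with hypothetical coordinates and critical distances. The study used Mantel's procedure to obtain the exact expectation and variance over all random pairings of dates and places. This sketch instead approximates the same null hypothesis by sampling random pairings (a Monte Carlo permutation test), which is a swapped-in technique.

```python
import numpy as np


def knox_statistic(xy, t, space_limit, time_limit):
    """Number of case pairs that are close in both space and time."""
    xy, t = np.asarray(xy, float), np.asarray(t, float)
    ds = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    dt = np.abs(t[:, None] - t[None, :])
    close = (ds <= space_limit) & (dt <= time_limit)
    return int(np.triu(close, k=1).sum())   # count each pair once


def knox_permutation_test(xy, t, space_limit, time_limit, n_perm=999, seed=0):
    """Monte Carlo p-value from randomly re-pairing dates with residences."""
    rng = np.random.default_rng(seed)
    observed = knox_statistic(xy, t, space_limit, time_limit)
    t = np.asarray(t, float)
    hits = sum(
        knox_statistic(xy, rng.permutation(t), space_limit, time_limit) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical cases: locations (arbitrary units) and dates (days).
xy = [[0, 0], [0, 0.5], [10, 10], [20, 20]]
t = [0, 1, 50, 100]
print(knox_permutation_test(xy, t, space_limit=1.0, time_limit=2.0, n_perm=199))
```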

Journal Article
TL;DR: The authors studied rhyme as a determinant of clustering in free recall and found significant clustering as well as high variance attributable to both Ss and word pairs.
Abstract: This study dealt with rhyme as a determinant of clustering in free recall. The stimuli comprised 12 rhyming pairs of words. The results from 30 Ss showed significant clustering as well as high variance attributable to both Ss and word pairs.

Journal Article
TL;DR: While concentrated Cu-Ni alloys show local correlations characteristic of clustering, it is re-emphasized, in light of the recent claim of Kidron, that normal solution heat treatment of them (which includes nearly any cooling procedure) yields a small deviation from randomness of the atomic arrangements and no Guinier-zone formation or phase separation.
Abstract: While concentrated Cu-Ni alloys show local correlations characteristic of clustering, it is re-emphasized, in light of the recent claim of Kidron, that normal solution heat treatment of them (which includes nearly any cooling procedure) yields a small deviation from randomness of the atomic arrangements and no Guinier-zone formation or phase separation.


Journal Article
TL;DR: It is shown to be effective to first select a small number of “representative” objects, apply the clustering program to them, and then place the remaining objects in the generated classes by a pattern recognition technique.

Journal Article
TL;DR: The statistical analysis on some 540 cases of acute leukaemia occurring in Maine, Massachusetts, New Hampshire, and Vermont finds evidence for clustering of cases in place and time.
Abstract: The problem of discovering evidence for clustering of cases in place and time has only recently been investigated by epidemiologists and statisticians. Acute leukaemia has sometimes been described as a disease which does occur in clusters (Heath and Hasterlik, 1963; Kellett, 1937; Knox, 1964; Mainwaring, 1966; Meighan and Knox, 1965). Conversely, it has also been reported that the disease does not occur in clusters (Ager, Schuman, Wallace, Rosenfield, and Gullen, 1965; Barton, David, and Merrington, 1965; Clemmesen, Busk and Nielsen, 1952; Ederer, Myers and Mantel, 1964; Ederer, Myers, Eisenberg, and Campbell, 1965; Lock and Merrington, 1967; Lundin, Fraumeni, Lloyd, and Smith, 1966; Stark and Mantel, 1967). We report here the statistical analysis on some 540 cases of acute leukaemia occurring in Maine, Massachusetts, New Hampshire, and Vermont. The records of acute leukaemia in Connecticut were not available to us but they have been described elsewhere (Ederer et al., 1965).


Journal Article
TL;DR: A sequential algorithm for designing piecewise linear classification functions without a priori knowledge of pattern class distributions is described that combines adaptive error correcting linear classifier design procedures and clustering techniques under control of a performance criterion.
Abstract: A sequential algorithm for designing piecewise linear classification functions without a priori knowledge of pattern class distributions is described. The algorithm combines adaptive error correcting linear classifier design procedures and clustering techniques under control of a performance criterion. The classification function structure is constrained to minimize design calculations and increase recognition throughput for many classification problems. Examples from the literature are used to evaluate this approach relative to other classification algorithms.

Journal Article
TL;DR: A method, Collaborative Tagging (CT) with Incremental Clustering and Trust, is proposed that enhances recommendation quality: incremental clustering addresses the scalability problem, and trust resolves the sparsity and cold-start user or item problems.
Abstract: Because web sites hold huge amounts of data, recommending every product to every user is impossible. Recommender Systems (RS) were introduced to address this problem. RS are categorized into Content-Based (CB), Collaborative Filtering (CF), and Hybrid RS, and recommendations are made to users based on these techniques. Among them, CF is a recent technique used in RS, and a tagging feature can also be provided within it. Three main issues occur in RS: the scalability problem, which occurs when there is a huge amount of data; the sparsity problem, which occurs when rating data are missing; and the cold-start user or item problem, which occurs when a new user or new item enters the system. To avoid these issues, we propose incremental clustering and trust in Collaborative Tagging. The proposed method, Collaborative Tagging (CT) with Incremental Clustering and Trust, enhances recommendation quality: incremental clustering addresses scalability, and trust resolves the sparsity and cold-start user or item problems. We compare Collaborative Tagging with Incremental Clustering and Trust (CFT-EDIC-TS) with the baseline approaches CT with Cosine Similarity (CFT-CS), CT with Euclidean Distance and Incremental Clustering (CFT-EDIC), and CT with Trust (CFT-TS), using MAE, prediction percentage, precision, and recall as metrics. On these metrics, CFT-EDIC-TS showed the best results for every split compared with the baseline approaches.

Journal Article
TL;DR: In this paper, a list of 16 words was exposed for one study trial in a modified free-recall experiment, where critical words in the list were paired randomly and presented three times each.
Abstract: A list of 16 words was exposed for one study trial in a modified free-recall experiment. Critical words in the list were paired randomly and presented three times each. For one group (E), the two members of common pairs always appeared in successive positions during the study trial. For the other group (C), members of the predetermined random pairs were never presented successively. Clustering scores in recall based on the predetermined random pairings were significantly higher in Group E than in Group C. It was concluded that adjacency relations during the study trial provided a sufficient basis for clustering during recall.

Journal Article
TL;DR: The stub-based clustering approach reduces computation time compared with traditional clustering and also increases its efficiency.
Abstract: Many issues in the clustering process arise from the large datasets involved. Clustering computation becomes expensive when large datasets are involved, and it works efficiently only when there is a limited number of clusters and a relatively small dataset. This paper presents a new clustering technique for large datasets that works equally efficiently with large and small datasets. The main idea behind this method is to divide the whole process into two steps. The first step uses a cheap approximate distance measure to divide the data into overlapping subsets, which we call stubs. In the second step, clustering is performed by measuring exact distances only between points that occur in a common stub. The stub-based clustering approach reduces computation time compared with traditional clustering and also increases its efficiency.
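Below is a minimal Python sketch of the two-step idea, loosely in the style of canopy clustering. The abstract does not specify the details, so several parts are assumptions for illustration: the stub-forming rule (a loose and a tight threshold on the cheap distance), the cheap distance itself, and the final linking radius. Only pairs that share a stub ever reach the exact distance computation.

```python
import numpy as np


def make_stubs(X, cheap_dist, loose, tight):
    """Assumed canopy-style rule: take a remaining point, put every point
    within `loose` cheap distance into its stub, then drop points within
    `tight` (0 <= tight <= loose) from future stub centers. Stubs may overlap."""
    remaining = list(range(len(X)))
    stubs = []
    while remaining:
        c = remaining[0]
        stubs.append([i for i in range(len(X)) if cheap_dist(X[c], X[i]) <= loose])
        remaining = [i for i in remaining if cheap_dist(X[c], X[i]) > tight]
    return stubs


def candidate_pairs(stubs):
    """Pairs sharing at least one stub: the only ones measured exactly."""
    pairs = set()
    for stub in stubs:
        for a in range(len(stub)):
            for b in range(a + 1, len(stub)):
                pairs.add((min(stub[a], stub[b]), max(stub[a], stub[b])))
    return pairs


def exact_links(X, pairs, radius):
    """Expensive exact distance, computed only for the candidate pairs."""
    return sorted((i, j) for i, j in pairs
                  if np.linalg.norm(np.subtract(X[i], X[j])) <= radius)


# Hypothetical data; the cheap distance looks at the first coordinate only.
X = [[0, 0], [0.5, 3], [5, 0], [5.4, 1], [20, 0]]
cheap = lambda a, b: abs(a[0] - b[0])
stubs = make_stubs(X, cheap, loose=1.0, tight=0.5)
print(stubs, exact_links(X, candidate_pairs(stubs), radius=2.0))
```

Here only 2 of the 10 possible pairs need an exact distance. The resulting links could then feed any ordinary clustering step.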

Journal Article
TL;DR: The authors observed reliable category clustering in free recall of words which rhymed but which did not elicit one another as free associates, and found that intrusions were phonemically similar to list items.
Abstract: Reliable category clustering was observed in the free recall of words which rhymed but which did not elicit one another as free associates. Intrusions were phonemically similar to list items.



Journal Article
TL;DR: The paper establishes a mathematical model of fractal clustering, uses the fractal dimension to describe and depict a fractal company, and conducts customer segmentation using fractal theory and cluster analysis technology in order to identify the most valuable customers and potential customers.
Abstract: This paper puts forward the concept of a fractal company in the process of online shopping. Based on the characteristics of online shopping, the paper establishes a mathematical model of fractal clustering and uses the fractal dimension to describe and depict a fractal company. Online shopping is actually the management of the entire supply chain. Based on the similar structure of the fractal supply chain, the article builds a mathematical model of the fractal supply chain using dissipative structure theory and entropy theory. Finally, the paper conducts customer segmentation using fractal theory and cluster analysis technology in order to identify the most valuable customers and potential customers.


01 Mar 1969
TL;DR: This report is concerned with the problem of classifying objects into clusters in such a way that objects within the same cluster are alike and objects in different clusters are relatively dissimilar.
Abstract: This report is concerned with the problem of classifying objects into clusters in such a way that objects within the same cluster are alike and objects in different clusters are relatively dissimilar. A distance or measure of similarity is required in order to measure the degree of likeness or similarity existing between any pair of objects. A clustering criterion or measure of the goodness of any given allocation is developed from basic postulates which attempt to quantify the notions of within-group homogeneity and between-group heterogeneity. Basic mathematical and experimental properties of the clustering criterion are demonstrated and illustrated. The problem is then embedded into a mathematical programming formulation which permits the theoretical development of a computational algorithm which converges to an optimal solution for problems of a limited size. Building substantially on this algorithm, a heuristic method is developed to facilitate the use of the technique for larger problems, with a great increase in speed and a very small reduction in accuracy. Several examples are presented to illustrate the properties of the two computational methods developed. Finally, the work presented is compared with other major contributions in this field and suggestions for further research are given. The appendices include three of the major computer programs developed, together with an outline of the problem of grouping objects to minimize an interaction cost, which could be considered a special case of the clustering problem. (Author)

Journal Article
TL;DR: In this paper, a multidimensional scaling analysis, parametric mapping, was applied to the semantic differential data, and the results showed that the ratings on the semantic differential represent the perception of the voice quality and are not de...
Abstract: Additional data have been obtained on the perceptual classification of voices by means of the voice classification by hierarchical clustering method [J. Acoust. Soc. Amer. 40, 1282 (A) (1966)]. The voice sample has been extended to 100 male and 100 female voices. The method appears to yield stable results since the gross structure of the clustering remains invariant. The semantic differential data were also submitted to a multidimensional scaling analysis, parametric mapping, which provided an excellent three‐dimensional fit. To investigate any dependence of the classification method on the speaker's language, 20 males whose native language was other than English were recorded in their native language and in English. The clustering analysis showed closely comparable results for the voices speaking either English or the foreign language, with 13 languages represented. This provides additional evidence that the ratings on the semantic differential represent the perception of the voice quality and are not de...