
Showing papers on "Cluster analysis" published in 1996


Proceedings Article
02 Aug 1996
TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

17,056 citations
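The density-based notion above maps directly onto modern tooling. A minimal sketch using scikit-learn's DBSCAN (a later reimplementation; eps and min_samples are that library's names for the neighborhood radius and density threshold, not the paper's notation):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: a shape that partitional methods handle poorly.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: density threshold for a core point.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Label -1 marks noise; clusters of arbitrary shape get nonnegative ids.
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", int(np.sum(labels == -1)))
```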


Proceedings Article
01 Jan 1996
TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters and designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

14,297 citations


Proceedings ArticleDOI
01 Jun 1996
TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) as discussed by the authors is a data clustering method that is especially suitable for very large databases.
Abstract: Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.

4,090 citations
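For readers who want to try the single-scan behavior, here is a short sketch using scikit-learn's Birch, a modern reimplementation (the threshold and branching_factor values below are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Three Gaussian blobs standing in for a large metric dataset.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(1000, 2)) for c in (0, 5, 10)])

# The CF-tree is built incrementally; data could equally arrive in chunks
# via partial_fit, mimicking a single scan over an out-of-core database.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```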


Journal ArticleDOI
TL;DR: Experimental results on a database of 400 trademark images show that an integrated color- and shape-based feature representation results in 99% of the images being retrieved within the top two positions.

1,017 citations


Journal ArticleDOI
01 Oct 1996
TL;DR: The self-organizing map method, which converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display, can be utilized for many tasks: reduction of the amount of training data, speeding up learning, nonlinear interpolation and extrapolation, generalization, and effective compression of information for its transmission.
Abstract: The self-organizing map (SOM) method is a new, powerful software tool for the visualization of high-dimensional data. It converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display. As it thereby compresses information while preserving the most important topological and metric relationships of the primary data elements on the display, it may also be thought to produce some kind of abstractions. The term self-organizing map signifies a class of mappings defined by error-theoretic considerations. In practice they result in certain unsupervised, competitive learning processes, computed by simple-looking SOM algorithms. Many industries have found the SOM-based software tools useful. The most important property of the SOM, orderliness of the input-output mapping, can be utilized for many tasks: reduction of the amount of training data, speeding up learning, nonlinear interpolation and extrapolation, generalization, and effective compression of information for its transmission.

845 citations
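The core SOM update fits in a few lines of NumPy. The sketch below is a bare-bones implementation with an illustrative grid size and decay schedules (all assumptions, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))            # high-dimensional inputs (3-D here)
grid_w, grid_h = 10, 10                      # low-dimensional display lattice
weights = rng.normal(size=(grid_w, grid_h, 3))
# Precompute lattice coordinates for neighborhood distances.
coords = np.stack(np.meshgrid(np.arange(grid_w), np.arange(grid_h),
                              indexing="ij"), axis=-1)

n_steps = 5000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Best-matching unit: the node whose weight vector is closest to x.
    d = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    # Learning rate and neighborhood radius shrink over time.
    lr = 0.5 * (1 - t / n_steps)
    sigma = 3.0 * (1 - t / n_steps) + 0.5
    # A Gaussian neighborhood on the lattice pulls nearby nodes toward x,
    # which is what produces the topology-preserving, ordered mapping.
    lat_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-lat_d2 / (2 * sigma ** 2))[..., None]
    weights += lr * h * (x - weights)
```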


Book
31 Aug 1996
TL;DR: This book covers classes and clusters, the geometry of data, a review of clustering algorithms, single-cluster clustering, partitioning of square and rectangular data tables, and hierarchy as a clustering structure.
Abstract: 1. Classes and Clusters. 2. Geometry of Data. 3. Clustering Algorithms: A Review. 4. Single Cluster Clustering. 5. Partition: Square Data Table. 6. Partition: Rectangular Table. 7. Hierarchy as a Clustering Structure.

739 citations


21 May 1996
TL;DR: This work presents an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers, which concurrently performs clustering and dimensionality reduction and can be thought of as a reduced dimension mixture of Gaussians.
Abstract: Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.

705 citations
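The paper's exact EM updates are not reproduced here; as a rough illustration of the "clustering plus local dimensionality reduction" idea only, the following crude stand-in hard-assigns points with k-means and then fits a separate factor model to each cluster (not the authors' algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Two groups, each generated near its own 2-D subspace of a 10-D space.
X = np.vstack([rng.normal(size=(300, 2)) @ rng.normal(size=(2, 10)) + m
               for m in (0.0, 5.0)])

# Hard clustering first (the paper instead does this jointly and softly via EM).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for k in range(2):
    # One local factor model per region of the input space.
    fa = FactorAnalysis(n_components=2).fit(X[labels == k])
    print(f"cluster {k}: {np.sum(labels == k)} points, "
          f"mean noise variance {fa.noise_variance_.mean():.3f}")
```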


01 Jan 1996
TL;DR: A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.
Abstract: Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparison of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.

685 citations


Journal ArticleDOI
TL;DR: The results suggest that 2D descriptors and hierarchical clustering methods are best at separating biologically active molecules from inactives, a prerequisite for a good compound selection method.
Abstract: An evaluation of a variety of structure-based clustering methods for use in compound selection is presented. The use of MACCS, Unity and Daylight 2D descriptors; Unity 3D rigid and flexible descriptors; and two in-house 3D descriptors based on potential pharmacophore points is considered. The use of Ward's and group-average hierarchical agglomerative, Guenoche hierarchical divisive, and Jarvis-Patrick nonhierarchical clustering methods is compared. The results suggest that 2D descriptors and hierarchical clustering methods are best at separating biologically active molecules from inactives, a prerequisite for a good compound selection method. In particular, the combination of MACCS descriptors and Ward's clustering was optimal.

631 citations
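A sketch of the winning combination (binary 2D fingerprints plus Ward's hierarchical clustering), with random bit vectors standing in for MACCS keys, which would normally come from a cheminformatics toolkit such as RDKit:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# 166-bit MACCS-like fingerprints, one row per compound (random stand-ins).
fingerprints = rng.integers(0, 2, size=(200, 166)).astype(float)

# Ward's method is defined on Euclidean distances, so the bit vectors are
# used directly as coordinates here.
Z = linkage(fingerprints, method="ward")
labels = fcluster(Z, t=20, criterion="maxclust")   # cut the tree into 20 clusters
print("largest cluster size:", np.bincount(labels).max())
```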


Proceedings Article
01 Aug 1996
TL;DR: This paper proposes a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node, and proposes a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network.
Abstract: Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. We present a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme -- tree-structured CPTs -- for capturing CSI. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning.

614 citations
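A toy example of the tree-structured CPT idea: storing P(Y | A, B, C) as a decision tree makes contexts explicit, so fewer parameters are needed than in the full table (the variable names and probabilities below are invented for illustration):

```python
# A CPT for P(Y | A, B, C) stored as a decision tree: when A=1 the
# distribution no longer depends on B or C (context-specific independence).
cpt_tree = {
    "split": "A",
    1: {"dist": {0: 0.9, 1: 0.1}},                  # context A=1: B, C irrelevant
    0: {"split": "B",
        1: {"dist": {0: 0.3, 1: 0.7}},              # context A=0, B=1: C irrelevant
        0: {"split": "C",
            1: {"dist": {0: 0.5, 1: 0.5}},
            0: {"dist": {0: 0.8, 1: 0.2}}}},
}

def lookup(tree, assignment):
    """Walk the tree-CPT to the distribution for a full parent assignment."""
    while "dist" not in tree:
        tree = tree[assignment[tree["split"]]]
    return tree["dist"]

# 4 leaves instead of the 8 rows a full table over A, B, C would need.
print(lookup(cpt_tree, {"A": 1, "B": 0, "C": 1}))   # -> {0: 0.9, 1: 0.1}
```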


Book
29 Jan 1996
TL;DR: This book seeks to cover the areas of clustering and related methods of data analysis where major advances are being made, including hierarchical clustering, variable selection and weighting, additive trees and other network models, and the relevance of neural network models to clustering.
Abstract: At a moderately advanced level, this book seeks to cover the areas of clustering and related methods of data analysis where major advances are being made. Topics include: hierarchical clustering, variable selection and weighting, additive trees and other network models, relevance of neural network models to clustering, the role of computational complexity in cluster analysis, latent class approaches to cluster analysis, theory and method with applications of a hierarchical classes model in psychology and psychopathology, combinatorial data analysis, clusterwise aggregation of relations, review of the Japanese-language results on clustering, review of the Russian-language results on clustering and multidimensional scaling, practical advances, and significance tests.

Journal ArticleDOI
TL;DR: This work presents a new approach for clustering, based on the physical properties of an inhomogeneous ferromagnetic model, which outperforms other algorithms for toy problems as well as for real data.
Abstract: We present a new approach for clustering, based on the physical properties of an inhomogeneous ferromagnetic model. We do not assume any structure of the underlying distribution of the data. A Potts spin is assigned to each data point and short range interactions between neighboring points are introduced. Spin-spin correlations, measured (by a Monte Carlo procedure) in a superparamagnetic regime in which aligned domains appear, serve to partition the data points into clusters. Our method outperforms other algorithms for toy problems as well as for real data.


Journal ArticleDOI
TL;DR: A computer program is developed that automatically, systematically and rapidly clusters an ensemble of structures into a set of conformationally related subfamilies, and selects a representative structure from each cluster.
Abstract: Unlike structures determined by X-ray crystallography, which are deposited in the Brookhaven Protein Data Bank (Abola et al., 1987) as a single structure, each NMR-derived structure is often deposited as an ensemble containing many structures, each consistent with the restraint set used. However, there is often a need to select a single 'representative' structure, or a 'representative' subset of structures, from such an ensemble. This is useful, for example, in the case of homology modelling or when compiling a relational database of protein structures. It has been shown that cluster analysis, based on overall fold, followed by selection of the structure closest to the centroid of the largest cluster, is likely to identify a structure more representative of the ensemble than the commonly used minimized average structure (Sutcliffe, 1993). Two approaches to the problem of clustering ensembles of NMR-derived structures have been described. One of these (Adzhubei et al., 1995) performs the pairwise superposition of all structures using Cα atoms to generate a set of r.m.s. distances. After cluster analysis based on these distances, a user-defined cut-off is required to determine the final membership of clusters and therefore the representative structures. The other approach (Diamond, 1995) uses collective superpositions and rigid-body transformations. Again, the position at which to draw a cut-off based on the particular clustering pattern was not addressed. Whenever fixed values are used for the cut-off in clustering, there is a danger of missing 'true' clusters under the threshold imposed by the rigid cut-off value. Considering the highly diverse nature of NMR-derived ensembles of proteins, it would seem most appropriate to avoid the use of predefined values for determining clusters. In fact, of the 302 ensembles we have studied, the average pairwise r.m.s. distance across an ensemble varied from 0.29 to 11.3 Å (mean value 3.0, SD 1.9 Å). Here we present an automated method for cut-off determination that avoids the dangers of using fixed values for this purpose. We have developed a computer program that automatically, systematically and rapidly (i) clusters an ensemble of structures into a set of conformationally related subfamilies, and (ii) selects a representative structure from each cluster. The program uses the method of average linkage to define how clusters are built up, followed by the application of a penalty function that seeks to minimize simultaneously the number of clusters
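The overall recipe (average-linkage clustering of pairwise distances, with the cut chosen automatically rather than by a fixed threshold) can be sketched as follows; the silhouette score below is a stand-in penalty, since the paper's own function is not specified in this abstract:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Fake "structures": points whose Euclidean distances mimic pairwise RMSDs.
structures = np.vstack([rng.normal(loc=m, size=(10, 5)) for m in (0, 3, 6)])
D = squareform(np.linalg.norm(structures[:, None] - structures[None], axis=-1))

Z = linkage(D, method="average")          # average linkage, as in the paper
# Score each candidate cut instead of imposing one fixed cut-off value.
best = max(range(2, 10),
           key=lambda k: silhouette_score(structures,
                                          fcluster(Z, k, criterion="maxclust")))
print("chosen number of clusters:", best)
```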

Proceedings Article
03 Dec 1996
TL;DR: Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.
Abstract: This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters K, from the data, is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.
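A hedged sketch of the mixture-of-HMMs idea using the hmmlearn package (an assumption; not the authors' code): fit one HMM per cluster, reassign each sequence to the model that scores it highest, and repeat:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# 40 toy sequences drawn from two regimes with different means.
seqs = [rng.normal(loc=(0 if i < 20 else 3), size=(50, 1)) for i in range(40)]

K = 2
assign = rng.integers(0, K, size=len(seqs))          # random initialization
for _ in range(5):                                   # a few hard-EM rounds
    models = []
    for k in range(K):
        Xk = np.vstack([s for s, a in zip(seqs, assign) if a == k])
        lengths = [len(s) for s, a in zip(seqs, assign) if a == k]
        models.append(GaussianHMM(n_components=2, n_iter=20).fit(Xk, lengths))
    # Reassign each sequence to the HMM giving the highest log-likelihood.
    assign = np.array([np.argmax([m.score(s) for m in models]) for s in seqs])
print("cluster sizes:", np.bincount(assign))
```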

Journal ArticleDOI
TL;DR: The validity-guided VGC algorithm uses cluster-validity information to guide a fuzzy (re)clustering process toward better solutions, and VGC's performance approaches that of the (supervised) k-nearest-neighbors algorithm.
Abstract: When clustering algorithms are applied to image segmentation, the goal is to solve a classification problem. However, these algorithms do not directly optimize classification quality. As a result, they are susceptible to two problems: 1) the criterion they optimize may not be a good estimator of "true" classification quality, and 2) they often admit many (suboptimal) solutions. This paper introduces an algorithm that uses cluster validity to mitigate problems 1 and 2. The validity-guided (re)clustering (VGC) algorithm uses cluster-validity information to guide a fuzzy (re)clustering process toward better solutions. It starts with a partition generated by a soft or fuzzy clustering algorithm. Then it iteratively alters the partition by applying (novel) split-and-merge operations to the clusters. Partition modifications that result in improved partition validity are retained. VGC is tested on both synthetic and real-world data. For magnetic resonance image (MRI) segmentation, evaluations by radiologists show that VGC outperforms the (unsupervised) fuzzy c-means algorithm, and VGC's performance approaches that of the (supervised) k-nearest-neighbors algorithm.
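VGC itself is not reproduced here, but the following sketch shows the ingredients it builds on: a bare fuzzy c-means loop plus one classical validity index (Bezdek's partition coefficient) used to score competing partitions before any split-and-merge refinement:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1) + 1e-12
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik/d_ij)^(2/(m-1)).
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=mu, size=(100, 2)) for mu in (0, 4, 8)])
for c in (2, 3, 4):
    _, U = fuzzy_cmeans(X, c)
    # Partition coefficient: closer to 1 means a crisper, more valid partition.
    print(c, "partition coefficient:", np.mean(np.sum(U ** 2, axis=1)))
```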

Proceedings ArticleDOI
01 Jun 1996
TL;DR: PBSM (Partition Based Spatial-Merge), a new algorithm for performing spatial join operation that is especially effective when neither of the inputs to the join have an index on the joining attribute, is described.
Abstract: This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation. This algorithm is especially effective when neither of the inputs to the join has an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamically redistributed. The PBSM algorithm partitions the inputs into manageable chunks, and joins them using a computational geometry based plane-sweeping technique. This paper also presents a performance study comparing the traditional indexed nested loops join algorithm, a spatial join algorithm based on joining spatial indices, and the PBSM algorithm. These comparisons are based on complete implementations of these algorithms in Paradise, a database system for handling GIS applications. Using real data sets, the performance study examines the behavior of these spatial join algorithms in a variety of situations, including the cases when both, one, or none of the inputs to the join have a suitable index. The study also examines the effect of clustering the join inputs on the performance of these join algorithms. The performance comparisons demonstrate the feasibility and applicability of the PBSM join algorithm.
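The partitioning idea can be sketched briefly: hash each rectangle into the grid cells it overlaps, then test candidate pairs within each cell (a simple nested loop stands in for the paper's plane sweep, and duplicate pairs from multi-cell rectangles are filtered):

```python
from collections import defaultdict
from itertools import product

def overlaps(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def cells(rect, size=10):
    """Grid cells a rectangle (x1, y1, x2, y2) intersects."""
    x1, y1, x2, y2 = rect
    return product(range(int(x1 // size), int(x2 // size) + 1),
                   range(int(y1 // size), int(y2 // size) + 1))

def spatial_join(R, S, size=10):
    grid = defaultdict(list)
    for i, r in enumerate(R):
        for c in cells(r, size):
            grid[c].append(i)
    out = set()
    for j, s in enumerate(S):
        for c in cells(s, size):
            for i in grid[c]:
                if (i, j) not in out and overlaps(R[i], s):
                    out.add((i, j))
    return out

R = [(0, 0, 5, 5), (20, 20, 25, 25)]
S = [(4, 4, 9, 9), (40, 40, 41, 41)]
print(spatial_join(R, S))   # -> {(0, 0)}
```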

Patent
22 May 1996
TL;DR: In this article, a system for analyzing a data file containing a plurality of data records with each data record containing plurality of parameters is provided, which includes an input (40) for receiving the data file and a data processor (32) having at least one of several data processing functions.
Abstract: A system (10) for analyzing a data file containing a plurality of data records with each data record containing a plurality of parameters is provided. The system (10) includes an input (40) for receiving the data file and a data processor (32) having at least one of several data processing functions. These data processing functions include, for example, a segmentation function (34) for segmenting the data records into a plurality of segments based on the parameters. The data processing functions also include a clustering function (36) for clustering the data records into a plurality of clusters containing data records having similar parameters. A prediction function (38) for predicting expected future results from the parameters in the data records may also be provided with the data processor (32).

Journal ArticleDOI
TL;DR: A self-organizing map (SOM) neural network clustering methodology is used and demonstrated to be superior to hierarchical clustering methods.

01 Jan 1996
TL;DR: This paper describes the incorporation of seven stand-alone clustering programs into S-PLUS, where they can now be used in a much more flexible way.
Abstract: This paper describes the incorporation of seven stand-alone clustering programs into S-PLUS, where they can now be used in a much more flexible way. The original Fortran programs carried out new cluster analysis algorithms introduced in the book of Kaufman and Rousseeuw (1990). These clustering methods were designed to be robust and to accept dissimilarity data as well as objects-by-variables data. Moreover, they each provide a graphical display and a quality index reflecting the strength of the clustering. The powerful graphics of S-PLUS made it possible to improve these graphical representations considerably. The integration of the clustering algorithms was performed according to the object-oriented principle supported by S-PLUS. The new functions have a uniform interface, and are compatible with existing S-PLUS functions. We will describe the basic idea and the use of each clustering method, together with its graphical features. Each function is briefly illustrated with an example.

Proceedings ArticleDOI
01 Mar 1996
TL;DR: Experience with HyPursuit suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies, and is encouraged by preliminary results on clustering based on both document contents and hyperlink structures.
Abstract: HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. Experience with our system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. We are also encouraged by preliminary results on clustering based on both document contents and hyperlink structures.

Journal ArticleDOI
TL;DR: Genetic Algorithms have been used in an attempt to optimize a specified objective function related to a clustering problem, and it is shown that the proposed method may improve the final output of K-Means where an improvement is possible.

Journal ArticleDOI
TL;DR: The Kohonen network, an unsupervised learning algorithm in artificial neural networks, performs self-organizing mapping, reduces the dimensionality of a complex data set, and is shown to produce easily comprehensible low-dimensional maps of the overall configuration of community groups in a target ecosystem.

Book ChapterDOI
18 Sep 1996
TL;DR: This paper describes some two dimensional plane drawing algorithms for clustered graphs and shows how to extend these algorithms to three dimensional multilevel drawings, and considers two conventions: straight-line convex drawings and orthogonal rectangular drawings.
Abstract: Clustered graphs are graphs with recursive clustering structures over the vertices. This type of structure appears in many systems. Examples include CASE tools, management information systems, VLSI design tools, and reverse engineering systems. Existing layout algorithms represent the clustering structure as recursively nested regions in the plane. However, as the structure becomes more and more complex, two dimensional plane representations tend to be insufficient. In this paper, firstly, we describe some two dimensional plane drawing algorithms for clustered graphs; then we show how to extend two dimensional plane drawings to three dimensional multilevel drawings. We consider two conventions: straight-line convex drawings and orthogonal rectangular drawings; and we show some examples.

Journal ArticleDOI
01 Jan 1996
TL;DR: Experiments show that the HEC network leads to a significant improvement in the clustering results over the K-means algorithm with Euclidean distance, and indicates that hyperellipsoidal shaped clusters are often encountered in practice.
Abstract: We propose a self-organizing network for hyperellipsoidal clustering (HEC). It consists of two layers. The first employs a number of principal component analysis subnetworks to estimate the hyperellipsoidal shapes of currently formed clusters. The second performs competitive learning using the cluster shape information from the first. The network performs partitional clustering using the proposed regularized Mahalanobis distance, which was designed to deal with the problems in estimating the Mahalanobis distance when the number of patterns in a cluster is less than or not considerably larger than the dimensionality of the feature space during clustering. This distance also achieves a tradeoff between hyperspherical and hyperellipsoidal cluster shapes so as to prevent the HEC network from producing unusually large or small clusters. The significance level of the Kolmogorov-Smirnov test on the distribution of the Mahalanobis distances of patterns in a cluster to the cluster center under the Gaussian cluster assumption is used as a compactness measure. The HEC network has been tested on a number of artificial data sets and real data sets. We also apply the HEC network to texture segmentation problems. Experiments show that the HEC network leads to a significant improvement in the clustering results over the K-means algorithm with Euclidean distance. Our results on real data sets also indicate that hyperellipsoidal shaped clusters are often encountered in practice.
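Stripped of the neural-network layers, the core idea can be sketched as a k-means-style loop with a per-cluster regularized Mahalanobis distance; the simple shrinkage toward the identity below is an assumed stand-in for the paper's regularizer:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])   # elongated
B = rng.normal(loc=(8, 0), size=(200, 2))                            # spherical
X = np.vstack([A, B])

K, lam = 2, 0.1
centers = X[rng.choice(len(X), K, replace=False)]
covs = [np.eye(2) for _ in range(K)]
for _ in range(20):
    # Assignment step: regularized Mahalanobis distance to each center,
    # so elongated clusters are not forced to be spherical.
    d = np.stack([np.einsum("ni,ij,nj->n", X - c,
                            np.linalg.inv((1 - lam) * S + lam * np.eye(2)),
                            X - c)
                  for c, S in zip(centers, covs)], axis=1)
    labels = np.argmin(d, axis=1)
    # Update step: per-cluster mean and covariance.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    covs = [np.cov(X[labels == k].T) for k in range(K)]
print("cluster sizes:", np.bincount(labels))
```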

Journal ArticleDOI
TL;DR: This paper presents GALOIS, a system that automates and applies the theory of concept lattices, and describes a prototype user interface for browsing through the concept lattice of a document-term relation, possibly enriched with a thesaurus of terms.
Abstract: The theory of concept (or Galois) lattices provides a simple and formal approach to conceptual clustering. In this paper we present GALOIS, a system that automates and applies this theory. The algorithm utilized by GALOIS to build a concept lattice is incremental and efficient, each update being done in time at most quadratic in the number of objects in the lattice. Also, the algorithm may incorporate background information into the lattice, and through clustering, extend the scope of the theory. The application we present is concerned with information retrieval via browsing, for which we argue that concept lattices may represent major support structures. We describe a prototype user interface for browsing through the concept lattice of a document-term relation, possibly enriched with a thesaurus of terms. An experimental evaluation of the system performed on a medium-sized bibliographic database shows good retrieval performance and a significant improvement after the introduction of background knowledge.
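GALOIS builds the lattice incrementally and far more efficiently; for intuition only, here is a naive enumeration of the formal concepts of a tiny invented document-term context:

```python
from itertools import combinations

# Documents and the terms they contain (an invented toy context).
context = {
    "d1": {"clustering", "lattice"},
    "d2": {"clustering", "retrieval"},
    "d3": {"lattice", "retrieval"},
    "d4": {"clustering", "lattice", "retrieval"},
}
objects = list(context)
all_terms = set().union(*context.values())

def intent(objs):                 # terms shared by all objects in objs
    return set.intersection(*(context[o] for o in objs)) if objs else set(all_terms)

def extent(terms):                # objects containing all given terms
    return {o for o in objects if terms <= context[o]}

# A formal concept is a fixed point: extent(intent(A)) == A.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        A = extent(intent(set(objs)))
        concepts.add((frozenset(A), frozenset(intent(A))))
for ext, inte in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```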

Proceedings ArticleDOI
18 Jun 1996
TL;DR: A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database, and these features compare favorably with other existing texture representations.
Abstract: This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database. These features compare favorably with other existing texture representations. A simple hybrid neural network algorithm is used to learn the similarity by simple clustering in the texture feature space. With learning similarity the performance of similar pattern retrieval improves significantly. An important aspect of this work is its application to real image data. Texture feature extraction with similarity learning is used to search through large aerial photographs. Feature clustering enables efficient search of the database as our experimental results indicate.
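A sketch of a Gabor feature vector of the kind described: filter the image at a few scales and orientations and keep the mean and standard deviation of each response magnitude (the 4x6 filter bank below is an illustrative choice, not the paper's design):

```python
import numpy as np
from skimage.data import brick
from skimage.filters import gabor

image = brick().astype(float)        # any grayscale texture patch works
features = []
for frequency in (0.05, 0.1, 0.2, 0.4):
    for theta in np.linspace(0, np.pi, 6, endpoint=False):
        # Complex Gabor response at this scale/orientation.
        real, imag = gabor(image, frequency=frequency, theta=theta)
        mag = np.hypot(real, imag)
        features.extend([mag.mean(), mag.std()])
print("feature vector length:", len(features))    # 4 scales x 6 angles x 2 stats
```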

Journal ArticleDOI
TL;DR: A Fuzzy C-Means-based clustering method guided by an auxiliary (conditional) variable is introduced that reveals a structure within a family of patterns by considering their vicinity in a feature space along with the similarity of the values assumed by a certain conditional variable.

Proceedings ArticleDOI
Jerome R. Bellegarda, J. W. Butzberger, Yen-Lu Chow, Noah Coccaro, Devang Naik
07 May 1996
TL;DR: A new approach is proposed for the clustering of words in a given vocabulary based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis, which leads to a parsimonious vector representation of each word in a suitable vector space.
Abstract: A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
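A sketch of the pipeline on a toy corpus: SVD of a word-document count matrix gives each word a low-dimensional vector, which familiar clustering techniques can then handle (corpus and all sizes are illustrative stand-ins):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["stocks fell as markets closed lower",
        "the market rallied and stocks rose",
        "the team won the final game",
        "players scored late in the game"]
vec = CountVectorizer()
counts = vec.fit_transform(docs)              # documents x words
words = vec.get_feature_names_out()

# Transpose so rows are words; SVD projects each word to a dense 2-D vector.
word_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(counts.T)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(word_vecs)
for k in range(2):
    print(k, sorted(w for w, l in zip(words, labels) if l == k))
```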

Journal ArticleDOI
TL;DR: The examples show that the semi-supervised approach provides MRI segmentations that are superior to ordinary fuzzy c-means and to the crisp k-nearest neighbor rule and further, that the new method ameliorates (P1)-(P3).