
Showing papers on "Cluster analysis published in 1992"


Journal ArticleDOI
01 Jun 1992
TL;DR: A document browsing technique that employs document clustering as its primary operation is presented, along with fast (linear-time) clustering algorithms that provide a powerful new access paradigm.
Abstract: Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval. We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
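The linear-time behaviour the authors argue for can be illustrated with a single-pass "leader" clustering sketch: each document is compared only against the current cluster leaders, so one scan over the corpus suffices. This is not the paper's actual algorithm; the cosine similarity and the threshold value are assumptions for the example.

```python
def cosine(a, b):
    # a, b: sparse term-frequency dicts for two documents
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def leader_cluster(docs, threshold=0.3):
    # each doc joins the first cluster whose leader is similar enough,
    # otherwise it starts a new cluster (one pass, linear in len(docs))
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        for j, leader in enumerate(leaders):
            if cosine(doc, leader) >= threshold:
                clusters[j].append(i)
                break
        else:
            leaders.append(doc)
            clusters.append([i])
    return clusters

docs = [
    {"cluster": 2, "fast": 1},
    {"cluster": 1, "fast": 2},
    {"retrieval": 2, "query": 1},
]
print(leader_cluster(docs))  # → [[0, 1], [2]]
```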

1,596 citations


Journal ArticleDOI
TL;DR: Two stochastic algorithms are derived from this general Classification EM algorithm, incorporating random perturbations, to reduce the initial-position dependence of the classical optimization clustering algorithms.

810 citations


Proceedings ArticleDOI
01 Jun 1992
TL;DR: It is shown that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing.
Abstract: Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering methods are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.

667 citations


Proceedings ArticleDOI
23 Feb 1992
TL;DR: The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets and the optimal feature set size for word-based indexing was found to be surprisingly low despite the large training sets.
Abstract: The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization performance was achieved using a statistical classifier and a proportional assignment strategy. The optimal feature set size for word-based indexing was found to be surprisingly low (10 to 15 features) despite the large training sets. The extraction of new text features by syntactic analysis and feature clustering was investigated on the Reuters data set. Syntactic indexing phrases, clusters of these phrases, and clusters of words were all found to provide less effective representations than individual words.
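The feature-selection step described above can be sketched as scoring each word by a simple class-association statistic and keeping only the top-k terms. The scoring function here (absolute difference of per-class document frequencies) is an assumption for the example, not the paper's exact statistic.

```python
from collections import Counter

def select_features(docs, labels, k=10):
    # docs: lists of tokens; labels: 1 for in-category, 0 otherwise
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for doc, y in zip(docs, labels):
        (pos if y else neg).update(set(doc))  # document frequency, not raw counts
    def score(w):
        # how differently the word is distributed across the two classes
        return abs(pos[w] / max(n_pos, 1) - neg[w] / max(n_neg, 1))
    vocab = set(pos) | set(neg)
    return sorted(vocab, key=score, reverse=True)[:k]

docs = [["trade", "wheat"], ["trade", "oil"], ["sport", "ball"], ["sport", "game"]]
labels = [1, 1, 0, 0]
print(select_features(docs, labels, k=2))  # the two perfectly predictive words
```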

585 citations


Journal ArticleDOI
Thrasyvoulos N. Pappas
TL;DR: The algorithm that is presented is a generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image to preserve the most significant features of the originals, while removing unimportant details.
Abstract: The problem of segmenting images of objects with smooth surfaces is considered. The algorithm that is presented is a generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image. Spatial constraints are included by the use of a Gibbs random field model. Local intensity variations are accounted for in an iterative procedure involving averaging over a sliding window whose size decreases as the algorithm progresses. Results with an 8-neighbor Gibbs random field model applied to pictures of industrial objects, buildings, aerial photographs, optical characters, and faces show that the algorithm performs better than the K-means algorithm and its nonadaptive extensions that incorporate spatial constraints by the use of Gibbs random fields. A hierarchical implementation is also presented that results in better performance and faster speed of execution. The segmented images are caricatures of the originals which preserve the most significant features, while removing unimportant details. They can be used in image recognition and as crude representations of the image.
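The core idea can be sketched in a toy form: ordinary K-means on pixel intensities, plus a Gibbs-style penalty that discourages a pixel from taking a label that disagrees with its 4-neighbors. The penalty weight `beta` is an assumed parameter; the paper's sliding-window local means and hierarchical implementation are not reproduced here.

```python
def segment(image, k=2, beta=1.0, iters=10):
    # image: 2-D list of intensities; returns a 2-D list of cluster labels
    h, w = len(image), len(image[0])
    lo = min(min(r) for r in image)
    hi = max(max(r) for r in image)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [[0] * w for _ in range(h)]
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                def cost(c):
                    data = (image[y][x] - centers[c]) ** 2
                    # Gibbs-style term: count 4-neighbors with a different label
                    diff = sum(
                        labels[y + dy][x + dx] != c
                        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= y + dy < h and 0 <= x + dx < w
                    )
                    return data + beta * diff
                labels[y][x] = min(range(k), key=cost)
        # re-estimate centers as class means, as in plain K-means
        for c in range(k):
            pts = [image[y][x] for y in range(h) for x in range(w)
                   if labels[y][x] == c]
            if pts:
                centers[c] = sum(pts) / len(pts)
    return labels
```

With `beta=0` this degenerates to per-pixel K-means; increasing `beta` smooths the label field.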

575 citations


Journal ArticleDOI
TL;DR: In this paper, the authors summarize the current theoretical and experimental understanding of clustering phenomena on surfaces, with an emphasis on dynamical properties, including surface diffusion coefficients and adatom binding energies.

559 citations


Journal ArticleDOI
TL;DR: The suitability of a back-propagation neural network for classification of multispectral image data is explored and a methodology is developed for selection of both training parameters and data sets for the training phase.
Abstract: The suitability of a back-propagation neural network for classification of multispectral image data is explored. A methodology is developed for selection of both training parameters and data sets for the training phase. A new technique is also developed to accelerate the learning phase. To benchmark the network, the results are compared to those obtained using three other algorithms: a statistical contextual technique, a supervised piecewise linear classifier, and an unsupervised multispectral clustering algorithm. All three techniques were applied to simulated and real satellite imagery. Results from the classification of both Monte Carlo simulation and real imagery are summarized.

414 citations


Journal ArticleDOI
TL;DR: This paper identifies important characteristics of clustering algorithms and proposes a general framework for analyzing and evaluating such algorithms and presents an analytic performance comparison of Dominant Sequence Clustering (DSC), explaining why DSC is superior to other algorithms.

393 citations


Journal ArticleDOI
TL;DR: A novel approach is adopted which employs a hybrid clustering and least squares algorithm which significantly enhances the real-time or adaptive capability of radial basis function models.
Abstract: Recursive identification of non-linear systems is investigated using radial basis function networks. A novel approach is adopted which employs a hybrid clustering and least squares algorithm. The recursive clustering algorithm adjusts the centres of the radial basis function network while the recursive least squares algorithm estimates the connection weights of the network. Because these two recursive learning rules are both linear, rapid convergence is guaranteed and this hybrid algorithm significantly enhances the real-time or adaptive capability of radial basis function models. Applications to simulated and real data are included to demonstrate the effectiveness of this hybrid approach.
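The hybrid split described above can be illustrated offline: K-means places the radial basis centres, then linear least squares fits the output weights. The paper uses recursive (online) versions of both steps; the Gaussian width `sigma` and the bias term are assumptions of this sketch.

```python
import math

def kmeans_1d(xs, k, iters=25):
    # spread initial centres over the data range, then alternate assign/update
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def design_row(x, centers, sigma):
    # Gaussian basis responses plus a constant bias feature
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers] + [1.0]

def solve(A, b):
    # Gaussian elimination with partial pivoting for the normal equations
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_rbf(xs, ys, k=3, sigma=1.0):
    # step 1: clustering chooses the centres; step 2: least squares fits weights
    centers = kmeans_1d(xs, k)
    Phi = [design_row(x, centers, sigma) for x in xs]
    n = k + 1
    AtA = [[sum(row[i] * row[j] for row in Phi) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(n)]
    w = solve(AtA, Atb)
    return lambda x: sum(wi * fi for wi, fi in zip(w, design_row(x, centers, sigma)))
```

Because both sub-problems are linear in their unknowns, each step has a closed-form update, which is what makes the recursive version of this scheme converge quickly.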

359 citations


Journal Article

322 citations


Journal ArticleDOI
TL;DR: The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed, and they also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules.
Abstract: Fuzzy control systems and neural-network control systems for backing up a simulated truck, and truck-and-trailer, to a loading dock in a parking lot are presented. The supervised backpropagation learning algorithm trained the neural network systems. The robustness of the neural systems was tested by removing random subsets of training data in learning sequences. The neural systems performed well but required extensive computation for training. The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed. They also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules. Unsupervised differential competitive learning (DCL) and product-space clustering adaptively generated FAM rules from training data. The original fuzzy control systems and neural control systems generated trajectory data. The DCL system rapidly recovered the underlying FAM rules. Product-space clustering converted the neural truck systems into structured sets of FAM rules that approximated the neural system's behavior.

Proceedings ArticleDOI
01 Jul 1992
TL;DR: Efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem are presented.
Abstract: We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem. Without any constraint violation, the ε-approximation problem for many problems of this type is itself NP-hard. Our methods provide polynomial-time ε-approximations while attempting to minimize the packing constraint violation. Our methods lead to the first known approximation algorithms with provable performance guarantees for the s-median problem, the tree pruning problem, and the generalized assignment problem. These important problems have numerous applications to data compression, vector quantization, memory-based learning, computer graphics, image processing, clustering, regression, network location, scheduling, and communication. We provide evidence via reductions that our approximation algorithms are nearly optimal in terms of the packing constraint violation. We also discuss some recent applications of our techniques to scheduling problems.

Journal ArticleDOI
01 Mar 1992
TL;DR: A hierarchical, agglomerative, symbolic clustering methodology based on a similarity measure that takes into consideration the position, span, and content of symbolic objects is proposed and is capable of discerning clusters in data sets made up of numeric as well as symbolic objects consisting of different types and combinations of qualitative and quantitative feature values.
Abstract: A hierarchical, agglomerative, symbolic clustering methodology based on a similarity measure that takes into consideration the position, span, and content of symbolic objects is proposed. The similarity measure used is of a new type in the sense that it is not just another aspect of dissimilarity. The clustering methodology forms composite symbolic objects using a Cartesian join operator when two symbolic objects are merged. The maximum and minimum similarity values at various merging levels permit the determination of the number of clusters in the data set. The composite symbolic objects representing different clusters give a description of the resulting classes and lead to knowledge acquisition. The algorithm is capable of discerning clusters in data sets made up of numeric as well as symbolic objects consisting of different types and combinations of qualitative and quantitative feature values. In particular, the algorithm is applied to fat-oil and microcomputer data. >


Proceedings ArticleDOI
08 Mar 1992
TL;DR: A fuzzy Kohonen clustering network which integrates the fuzzy c-means (FCM) model into the learning rate and updating strategies of the Kohonen network is proposed, and it is proved that the proposed scheme is equivalent to the c-Means algorithms.
Abstract: The authors propose a fuzzy Kohonen clustering network which integrates the fuzzy c-means (FCM) model into the learning rate and updating strategies of the Kohonen network. This yields an optimization problem related to FCM, and the numerical results show improved convergence as well as reduced labeling errors. It is proved that the proposed scheme is equivalent to the c-means algorithms. The new method can be viewed as a Kohonen type of FCM, but it is self-organizing, since the size of the update neighborhood and the learning rate in the competitive layer are automatically adjusted during learning. Anderson's IRIS data were used to illustrate this method. The results are compared with the standard Kohonen approach.
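The FCM model that the network integrates can be sketched in its plain batch form: alternate between updating the fuzzy memberships and the cluster centres. This is ordinary fuzzy c-means on scalar data, not the Kohonen-style network itself; the fuzzifier `m=2` is the usual default.

```python
def fcm(xs, c=2, m=2.0, iters=40):
    # batch fuzzy c-means on a list of scalars; returns sorted centres
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iters):
        # membership of each point in each cluster (each row sums to 1)
        U = []
        for x in xs:
            d = [abs(x - v) or 1e-12 for v in centers]  # avoid divide-by-zero
            row = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
                   for i in range(c)]
            U.append(row)
        # centres become membership-weighted means
        centers = [
            sum(U[k][i] ** m * xs[k] for k in range(len(xs))) /
            sum(U[k][i] ** m for k in range(len(xs)))
            for i in range(c)
        ]
    return sorted(centers)
```

The network in the paper folds the membership values into the Kohonen learning rate, so that hard winner-take-all updates become soft, membership-weighted ones.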

Journal ArticleDOI
TL;DR: Several generalizations of the fuzzy c-shells (FCS) algorithm are presented for characterizing and detecting clusters that are hyperellipsoidal shells and show that the AFCS algorithm requires less memory than the HT-based methods, and it is at least an order of magnitude faster than theHT approach.
Abstract: Several generalizations of the fuzzy c-shells (FCS) algorithm are presented for characterizing and detecting clusters that are hyperellipsoidal shells. An earlier generalization, the adaptive fuzzy c-shells (AFCS) algorithm, is examined in detail and is found to have global convergence problems when the shapes to be detected are partial. New formulations are considered wherein the norm inducing matrix in the distance metric is unconstrained in contrast to the AFCS algorithm. The resulting algorithm, called the AFCS-U algorithm, performs better for partial shapes. Another formulation based on the second-order quadrics equation is considered. These algorithms can detect ellipses and circles in 2D data. They are compared with the Hough transform (HT)-based methods for ellipse detection. Existing HT-based methods for ellipse detection are evaluated, and a multistage method incorporating the good features of all the methods is used for comparison. Numerical examples of real image data show that the AFCS algorithm requires less memory than the HT-based methods, and it is at least an order of magnitude faster than the HT approach.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The clustering technique described provides a basis for automatic feature selection and dimensionality reduction and Adaptation of kernel shape provides a tradeoff of increased accuracy for increased complexity and training time.
Abstract: Probabilistic neural networks (PNNs) learn quickly from examples in one pass and asymptotically achieve the Bayes-optimal decision boundaries. The major disadvantage of a PNN stems from the fact that it requires one node or neuron for each training pattern. Various clustering techniques have been proposed to reduce this requirement to one node per cluster center. The correct choice of clustering technique will depend on the data distribution, data rate, and hardware implementation. Adaptation of kernel shape provides a tradeoff of increased accuracy for increased complexity and training time. The technique described also provides a basis for automatic feature selection and dimensionality reduction.
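A minimal PNN decision rule looks like this: one Gaussian kernel per training pattern, summed per class, with the largest class sum winning. The kernel width `sigma` is an assumed smoothing parameter; the clustering reduction discussed above would replace the per-pattern nodes with cluster centres while leaving this rule unchanged.

```python
import math

def pnn_classify(x, patterns, sigma=0.5):
    # patterns: list of (feature_vector, class_label) pairs;
    # one kernel node per training pattern, summed per class
    scores = {}
    for v, label in patterns:
        d2 = sum((a - b) ** 2 for a, b in zip(x, v))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(scores, key=scores.get)
```

Training is a single pass (just storing the patterns), which is the one-pass learning property the abstract refers to.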

Journal ArticleDOI
TL;DR: A new formulation based on the treatment of the time window constraints as soft constraints that can be violated at a cost and heuristically decompose the problem into an assignment/clustering component and a series of routing and scheduling components is presented.
Abstract: The Vehicle Routing and Scheduling Problem with Time Window constraints is formulated as a mixed integer program, and optimization-based heuristics which extend the cluster-first, route-second algorithm of Fisher and Jaikumar are developed for its solution. We present a new formulation based on the treatment of the time window constraints as soft constraints that can be violated at a cost and we heuristically decompose the problem into an assignment/clustering component and a series of routing and scheduling components. Numerical results based on randomly generated and benchmark problem sets indicate that the algorithm compares favorably to state-of-the-art local insertion and improvement heuristics.

Book ChapterDOI
TL;DR: In experiments with both artificial and real data it is demonstrated that the multilayer SOM forms clusters that match the desired classes better than do direct SOMs, classical k-means, or Isodata algorithms.
Abstract: A multilayer hierarchical self-organizing map (HSOM) is discussed as an unsupervised clustering method. The HSOM is shown to form arbitrarily complex clusters, in analogy with multilayer feedforward networks. In addition, the HSOM provides a natural measure for the distance of a point from a cluster that weighs all the points belonging to the cluster appropriately. In experiments with both artificial and real data it is demonstrated that the multilayer SOM forms clusters that match the desired classes better than do direct SOMs, classical k-means, or Isodata algorithms.


Journal ArticleDOI
TL;DR: A new approach to the fuzzy c spherical shells algorithm is presented, which uses a cluster validity measure to identify good clusters, merges all compatible clusters, and eliminates spurious clusters to achieve the final results.
Abstract: The fuzzy c spherical shells (FCSS) algorithm is specially designed to search for clusters that can be described by circular arcs or, generally, by shells of hyperspheres. A new approach to the FCSS algorithm is presented. This algorithm is computationally and implementationally simpler than other clustering algorithms that have been suggested for this purpose. An unsupervised variant handles the case in which the number of clusters is not known in advance: it uses a cluster validity measure to identify good clusters, merges all compatible clusters, and eliminates spurious clusters to achieve the final results. Experimental results on several data sets are presented.

Journal ArticleDOI
TL;DR: In this paper, a new QCD-motivated clustering algorithm was proposed to define jets in lepton-hadron and hadronhadron collisions, which combines the k ⊥ algorithm, proposed earlier for e + e − annihilation, with a pre-clustering procedure that ensures the universal factorization of initial state collinear singularities.

Journal ArticleDOI
TL;DR: In this paper, a neural network clustering method for the part-machine grouping problem in group technology is presented, which utilizes binary-valued inputs and it can be trained without supervision.
Abstract: This paper presents a neural network clustering method for the part-machine grouping problem in group technology. Among several neural networks, a Carpenter-Grossberg network is selected because it uses binary-valued inputs and can be trained without supervision. It is shown that this adaptive leader algorithm can handle large, industry-size data sets owing to its computational efficiency. The algorithm was tested on three data sets from prior literature, and the solutions obtained were found to result in block diagonal forms. Some solutions were also found to be identical to solutions presented by others. Experiments on larger data sets, involving 10000 parts by 100 machine types, revealed that the method results in the identification of clusters with fast execution times. If a block diagonal structure existed in the input data, it was identified to a good degree of perfection. It was also found to be efficient with some imperfections i...

Journal ArticleDOI
TL;DR: This paper reviews methods of cluster analysis in the context of classifying patients on the basis of clinical and/or laboratory type observations, with particular attention devoted to the mixture likelihood-based approach.
Abstract: In this paper we review methods of cluster analysis in the context of classifying patients on the basis of clinical and/or laboratory type observations. Both hierarchical and non-hierarchical methods of clustering are considered, although the emphasis is on the latter type, with particular attention devoted to the mixture likelihood-based approach. For the purposes of dividing a given data set into g clusters, this approach fits a mixture model of g components, using the method of maximum likelihood. It thus provides a sound statistical basis for clustering. The important but difficult question of how many clusters there are in the data can be addressed within the framework of standard statistical theory, although theoretical and computational difficulties still remain. Two case studies, involving the cluster analysis of some haemophilia and diabetes data respectively, are reported to demonstrate the mixture likelihood-based approach to clustering.
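The mixture likelihood-based approach can be sketched for g = 2: fit a two-component univariate Gaussian mixture by EM (maximum likelihood), then assign each observation to the component with the higher posterior probability. Initialisation from the data extremes and the fixed iteration count are assumptions of this example.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(xs, iters=60):
    # two-component 1-D Gaussian mixture fitted by EM
    mu = [min(xs), max(xs)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        R = []
        for x in xs:
            p = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(p)
            R.append([pj / s for pj in p])
        # M-step: maximum likelihood updates of weights, means, variances
        for j in range(2):
            nj = sum(r[j] for r in R)
            pi[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(R, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(R, xs)) / nj,
                         1e-6)
    # cluster assignment: component with the larger responsibility
    labels = [max(range(2), key=lambda j: r[j]) for r in R]
    return mu, labels
```

The "how many clusters" question mentioned in the abstract corresponds to choosing g, typically by comparing the maximized likelihoods of fits with different numbers of components.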

Book ChapterDOI
19 May 1992
TL;DR: This contribution addresses the problem of detection and tracking of moving vehicles in image sequences from traffic scenes recorded by a stationary camera by using a parameterized vehicle model and a recursive estimator based on a motion model for motion estimation.
Abstract: This contribution addresses the problem of detection and tracking of moving vehicles in image sequences from traffic scenes recorded by a stationary camera. In order to exploit the a priori knowledge about the shape and the physical motion of vehicles in traffic scenes, a parameterized vehicle model is used for an intraframe matching process and a recursive estimator based on a motion model is used for motion estimation. The initial guess about the position and orientation for the models are computed with the help of a clustering approach of moving image features. Shadow edges of the models are taken into account in the matching process. This enables tracking of vehicles under complex illumination conditions and within a small effective field of view. Results on real world traffic scenes are presented and open problems are outlined.

Proceedings ArticleDOI
08 Nov 1992
TL;DR: The DS quality measure, a general metric for evaluation of clustering algorithms, is established and motivates the RW-ST algorithm, a self-tuning clustering method based on random walks in the circuit netlist, which efficiently captures a globally good circuit clustering.
Abstract: The complexity of next-generation VLSI systems will exceed the capabilities of top-down layout synthesis algorithms, particularly in netlist partitioning and module placement. Bottom-up clustering is needed to “condense” the netlist so that the problem size becomes tractable to existing optimization methods. In this paper, we establish the DS quality measure, the first general metric for evaluation of clustering algorithms. The DS metric in turn motivates our RW-ST algorithm, a new self-tuning clustering method based on random walks in the circuit netlist. RW-ST efficiently captures a globally good circuit clustering. When incorporated within a two-phase iterative Fiduccia-Mattheyses partitioning strategy, the RW-ST clustering method improves bisection width by an average of 17% over previous matching-based methods.

Journal ArticleDOI
01 Oct 1992 - Proteins
TL;DR: The results of these clusterings indicate conservation of α- and β-structures even when sequence similarity is relatively low, and suggest that reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data.
Abstract: Reliable structural and statistical analyses of three dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together, and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three dimensional structures for proteins of same structural class are generally more conserved than their sequences, the proteins were clustered also according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from a subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.
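The representative-selection step can be illustrated in toy form: greedily group sequences whose pairwise similarity exceeds a threshold and keep one representative per group. Here `difflib`'s ratio stands in for the paper's pairwise sequence alignment, and the threshold value is an assumed parameter, not the significance threshold derived from random sequences.

```python
import difflib

def representative_set(seqs, threshold=0.6):
    # greedy grouping: each sequence joins the first group whose
    # representative it matches above the threshold, else starts a group
    reps, groups = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if difflib.SequenceMatcher(None, s, r).ratio() >= threshold:
                groups[i].append(s)   # redundant with an existing representative
                break
        else:
            reps.append(s)            # s becomes a new representative
            groups.append([s])
    return reps, groups
```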

Proceedings Article
12 Jul 1992
TL;DR: It is shown that major learning processes, namely generalization and clustering, can be solved in a homogeneous way by using a similarity measure.
Abstract: There are still very few systems performing a Similarity Based Learning and using a First Order Logic (FOL) representation. This limitation comes from the intrinsic complexity of the learning processes in FOL and from the difficulty to deal with numerical knowledge in this representation. In this paper, we show that major learning processes, namely generalization and clustering, can be solved in a homogeneous way by using a similarity measure. As this measure is defined, the similarity computation comes down to a problem of solving a set of equations in several unknowns. The representation language used to express our examples is a subset of FOL that can express both quantitative knowledge and a relevance scale on the predicates.

Journal ArticleDOI
TL;DR: The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns online in a stable and efficient manner and successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets.
Abstract: A modular, unsupervised neural network architecture that can be used for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns online in a stable and efficient manner. The system uses a control structure similar to that found in the adaptive resonance theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two-stage process: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid position from fuzzy C-means (FCM) system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The AFLC algorithm is applied to the Anderson iris data and laser-luminescent finger image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets.

Proceedings ArticleDOI
01 Jun 1992
TL;DR: This work investigates the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark and demonstrates that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system.
Abstract: We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, for each workload, the algorithm that approximated stochastic clustering was different. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve a high steady-state performance with a reasonably large cache, the appropriate metric is the number of pages to which the clustering algorithm maps the active portion of the database. For this metric, the PRP clustering algorithm, which only uses access probabilities, achieves nearly optimal performance.