
Showing papers on "Cluster analysis published in 1994"


Journal ArticleDOI
TL;DR: An efficient method for estimating cluster centers of numerical data that can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as fuzzy C-means is presented.
Abstract: We present an efficient method for estimating cluster centers of numerical data. This method can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as fuzzy C-means. Here we use the cluster estimation method as the basis of a fast and robust algorithm for identifying fuzzy models. A benchmark problem involving the prediction of a chaotic time series shows this model identification method compares favorably with other, more computationally intensive methods. We also illustrate an application of this method in modeling the relationship between automobile trips and demographic factors.
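The estimation step can be sketched compactly. Below is a minimal, illustrative Python version of a subtractive-clustering-style potential computation; the radii, stopping ratio, and parameter names are assumptions for illustration, not the paper's exact accept/reject logic (data are assumed scaled to comparable ranges):

```python
import numpy as np

def subtractive_centers(X, ra=0.5, rb_factor=1.5, stop_ratio=0.15):
    """Pick cluster centers by repeatedly taking the highest-potential point
    and subtracting its influence (illustrative parameters)."""
    alpha = 4.0 / ra**2
    beta = 4.0 / (rb_factor * ra)**2
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)         # density-like potential
    first = potential.max()
    centers = []
    while potential.max() >= stop_ratio * first:
        k = int(np.argmax(potential))
        centers.append(X[k])
        # reduce potential near the chosen center so the next peak lies elsewhere
        potential = potential - potential[k] * np.exp(-beta * d2[k])
    return np.array(centers)
```

The returned centers can seed an iterative optimizer such as fuzzy C-means, which is the use the paper describes.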

2,815 citations


Proceedings Article
12 Sep 1994
TL;DR: The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.
Abstract: Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role to play in spatial data mining. To this end, we develop a new clustering method called CLARANS which is based on randomized search. We also develop two spatial data mining algorithms that use CLARANS. Our analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms. Furthermore, experiments conducted to compare the performance of CLARANS with that of existing clustering methods show that CLARANS is the most efficient.
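The randomized-search scheme at the heart of CLARANS (examine random single-medoid swaps, restart after a run of failed attempts) can be sketched as a simplified k-medoids search; the parameter names numlocal and maxneighbor follow the paper's terminology, but this is not the authors' full implementation:

```python
import numpy as np

def clarans(X, k=3, numlocal=5, maxneighbor=50, seed=0):
    """k-medoids by randomized search in the spirit of CLARANS (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)

    def cost(medoids):
        d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
        return d.min(axis=1).sum()          # total distance to nearest medoid

    best, best_cost = None, np.inf
    for _ in range(numlocal):               # numlocal independent searches
        current = list(rng.choice(n, k, replace=False))
        current_cost = cost(current)
        fails = 0
        while fails < maxneighbor:
            # a random neighbour differs from the current node in one medoid
            i = int(rng.integers(k))
            cand = current.copy()
            cand[i] = int(rng.choice(np.setdiff1d(np.arange(n), current)))
            c = cost(cand)
            if c < current_cost:
                current, current_cost, fails = cand, c, 0
            else:
                fails += 1
        if current_cost < best_cost:        # keep the best local minimum found
            best, best_cost = current, current_cost
    return best, best_cost
```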

1,999 citations


Proceedings Article
01 Jan 1994
TL;DR: An incremental network model is introduced which is able to learn the important topological relations in a given set of input vectors by means of a simple Hebb-like learning rule.
Abstract: An incremental network model is introduced which is able to learn the important topological relations in a given set of input vectors by means of a simple Hebb-like learning rule. In contrast to previous approaches like the "neural gas" method of Martinetz and Schulten (1991, 1994), this model has no parameters which change over time and is able to continue learning, adding units and connections, until a performance criterion has been met. Applications of the model include vector quantization, clustering, and interpolation.
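The Hebb-like rule is easy to state in code: for each input, connect the two nearest units and adapt the winner and its topological neighbours. The sketch below is a fixed-size illustration that omits the model's defining unit-insertion (growth) mechanism; all parameter values are assumptions:

```python
import numpy as np

def hebbian_topology(X, n_units=20, steps=5000, eps_w=0.05, eps_n=0.006,
                     max_age=50, seed=0):
    """Competitive Hebbian learning sketch (no unit insertion)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float).copy()
    edges = {}                                   # (i, j), i < j  ->  age
    for _ in range(steps):
        x = X[rng.integers(len(X))]
        order = np.argsort(np.linalg.norm(W - x, axis=1))
        s1, s2 = int(order[0]), int(order[1])    # two nearest units
        for e in list(edges):                    # age edges at the winner
            if s1 in e:
                edges[e] += 1
                if edges[e] > max_age:
                    del edges[e]                 # drop stale connections
        edges[(min(s1, s2), max(s1, s2))] = 0    # Hebb-like rule: refresh edge
        W[s1] += eps_w * (x - W[s1])             # move winner toward the input
        for i, j in edges:                       # and its topological neighbours
            if i == s1:
                W[j] += eps_n * (x - W[j])
            elif j == s1:
                W[i] += eps_n * (x - W[i])
    return W, sorted(edges)
```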

1,806 citations


Journal ArticleDOI
TL;DR: A new self-organizing neural network model with two variants is presented; it performs unsupervised learning and can be used for data visualization, clustering, and vector quantization, and results on the two-spirals benchmark and a vowel classification problem are better than any previously published.

1,319 citations


Book ChapterDOI
01 Jan 1994
TL;DR: The aim of this paper is to compare three methods based on the hypervolume criterion with four other well-known methods for determining the number of clusters on artificial data sets.
Abstract: A problem common to all clustering techniques is the difficulty of deciding the number of clusters present in the data. The aim of this paper is to compare three methods based on the hypervolume criterion with four other well-known methods. This evaluation of procedures for determining the number of clusters is conducted on artificial data sets. To provide a variety of solutions the data sets are analysed by six clustering methods. We finally conclude by pointing out the performance of each method and by giving some guidance for making choices between them.

1,264 citations


Posted Content
TL;DR: Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data.
Abstract: We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models evaluated with respect to held-out test data.
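A minimal sketch of the deterministic-annealing mechanics: soft memberships sharpen as the inverse temperature β rises, and near-coincident centers split apart. For simplicity this uses squared Euclidean distortion on generic vectors rather than the paper's distributional (context) representation of words; the schedule and constants are illustrative:

```python
import numpy as np

def deterministic_annealing(X, n_clusters=8, beta0=0.1, beta_max=100.0, rate=1.2):
    """Soft clustering with a rising inverse temperature (illustrative)."""
    rng = np.random.default_rng(0)
    # start all centers near the data mean; they separate as beta grows
    C = X.mean(0) + 1e-3 * rng.standard_normal((n_clusters, X.shape[1]))
    beta = beta0
    while beta < beta_max:
        for _ in range(20):                     # inner fixed-point iterations
            d2 = ((X[:, None, :] - C[None, :, :])**2).sum(-1)
            logp = -beta * d2
            logp -= logp.max(axis=1, keepdims=True)
            P = np.exp(logp)
            P /= P.sum(axis=1, keepdims=True)   # soft memberships p(c|x)
            C = (P.T @ X) / P.sum(0)[:, None]   # distortion-minimizing centers
        beta *= rate                            # clusters subdivide as beta rises
    return C, P
```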

1,024 citations


Proceedings ArticleDOI
08 Mar 1994
TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Abstract: The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.

781 citations


Journal ArticleDOI
TL;DR: A low-complexity heuristic for scheduling parallel tasks on an unbounded number of completely connected processors, named the dominant sequence clustering algorithm (DSC), which guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAG's.
Abstract: We present a low-complexity heuristic, named the dominant sequence clustering algorithm (DSC), for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is, on average, comparable to, or even better than, other higher-complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAG's) is NP-complete. DSC finds optimal schedules for special classes of DAG's, such as fork, join, coarse-grain trees, and some fine-grain trees. It guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAG's. We compare DSC with three higher-complexity general scheduling algorithms: the ETF by J.J. Hwang, Y.C. Chow, F.D. Anger, and C.Y. Lee (1989); V. Sarkar's (1989) clustering algorithm; and the MD by M.Y. Wu and D. Gajski (1990). We also give a sample of important practical applications where DSC has been found useful.

694 citations


Journal ArticleDOI
TL;DR: This work develops, based upon the mountain clustering method, a procedure for learning fuzzy systems models from data, and uses a back propagation algorithm to tune the model.
Abstract: We develop, based upon the mountain clustering method, a procedure for learning fuzzy systems models from data. First we discuss the mountain clustering method. We then show how it could be used to obtain the structure of fuzzy systems models. The initial estimates of this model are obtained from the cluster centers. We then use a back propagation algorithm to tune the model.

670 citations


Proceedings Article
01 Jul 1994

622 citations


Journal ArticleDOI
TL;DR: A simple and effective approach for approximate estimation of the cluster centers on the basis of the concept of a mountain function, based upon a gridding of the space, the construction of a mountain function from the data and then a destruction of the mountains to obtain the cluster centers.
Abstract: We develop a simple and effective approach for approximate estimation of the cluster centers on the basis of the concept of a mountain function. We call the procedure the mountain method. It can be useful for obtaining the initial values of the clusters that are required by more complex cluster algorithms. It also can be used as a stand alone simple approximate clustering technique. The method is based upon a gridding of the space, the construction of a mountain function from the data and then a destruction of the mountains to obtain the cluster centers.
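A compact sketch of the method as described: grid the space, build the mountain function, then destroy peaks. For brevity this extracts a fixed number of centers instead of the paper's stopping criterion, and the constants are illustrative; the grid grows exponentially with dimension, so this suits low-dimensional data:

```python
import numpy as np
from itertools import product

def mountain_centers(X, gridlines=10, alpha=5.0, beta=5.0, n_centers=3):
    """Grid-based mountain method sketch: build mountains, destroy peaks."""
    lo, hi = X.min(0), X.max(0)
    axes = [np.linspace(lo[d], hi[d], gridlines) for d in range(X.shape[1])]
    V = np.array(list(product(*axes)))              # candidate grid vertices
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    M = np.exp(-alpha * d).sum(axis=1)              # mountain height per vertex
    centers = []
    for _ in range(n_centers):
        k = int(np.argmax(M))
        centers.append(V[k])
        # destroy the peak so the next-highest mountain can emerge
        M -= M[k] * np.exp(-beta * np.linalg.norm(V - V[k], axis=1))
    return np.array(centers)
```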

Book ChapterDOI
10 Jul 1994
TL;DR: On four datasets, it is shown that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater.
Abstract: With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term “prototypes” refers to the reference instances used in a nearest neighbor computation — the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. We briefly investigate how random mutation hill climbing may be applied to select features and prototypes simultaneously. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target classes.
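The second algorithm is simple enough to sketch: keep a fixed-size set of prototype indices, mutate one slot at a time, and keep mutations that do not reduce 1-NN accuracy. The acceptance rule and evaluation set used here are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def rmhc_prototypes(X, y, n_protos=4, iters=500, seed=0):
    """Random mutation hill climbing over prototype index sets (sketch)."""
    rng = np.random.default_rng(seed)

    def accuracy(idx):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        pred = y[idx][d.argmin(axis=1)]      # 1-NN labels w.r.t. the prototypes
        return (pred == y).mean()

    idx = rng.choice(len(X), n_protos, replace=False)
    score = accuracy(idx)
    for _ in range(iters):
        cand = idx.copy()
        cand[rng.integers(n_protos)] = rng.integers(len(X))  # mutate one slot
        s = accuracy(cand)
        if s >= score:                       # keep any non-worsening mutation
            idx, score = cand, s
    return idx, score
```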

Book
09 Jan 1994
TL;DR: Inductive Learning Algorithms for Complex Systems Modeling is a professional monograph that surveys new types of learning algorithms for modelling complex systems in science and engineering.
Abstract: Introduction: Systems and Cybernetics. Inductive Learning Algorithms: Self-Organization Method. Network Structures. Long Term Quantitative Predictions. Dialogue Language Generalization. Noise Immunity and Convergence: Analogy with Information Theory. Classification and Analysis of Criteria. Improvement of Noise Immunity. Asymptotic Properties of Criteria. Balance Criterion of Predictions. Convergence of Algorithms. Physical Fields and Modeling: Finite-Difference Pattern Schemes. Comparative Studies. Cyclic Processes. Clusterization and Recognition: Self-Organization Modeling and Clustering. Methods of Self-Organization Clustering. Objective Computer Clustering Algorithm. Levels of Discretization and Balance Criterion. Forecasting Methods of Analogues. Applications: Fields of Application. Weather Modeling. Ecological System Studies. Modeling of Economical Systems. Agricultural System Studies. Modeling of Solar Activity. Inductive and Deductive Networks: Self-Organization Mechanism in the Networks. Network Techniques. Generalization. Comparison and Simulation Results. Basic Algorithms and Program Listings: Computational Aspects of Multilayered Algorithm. Computational Aspects of Combinatorial Algorithm. Computational Aspects of Harmonical Algorithm.

Journal ArticleDOI
TL;DR: A spectral approach to multi-way ratio-cut partitioning that provides a generalization of the ratio- cut cost metric to L-way partitioning and a lower bound on this cost metric is developed.
Abstract: Recent research on partitioning has focused on the ratio-cut cost metric, which maintains a balance between the cost of the edges cut and the sizes of the partitions without fixing the size of the partitions a priori. Iterative approaches and spectral approaches to two-way ratio-cut partitioning have yielded higher quality partitioning results. In this paper, we develop a spectral approach to multi-way ratio-cut partitioning that provides a generalization of the ratio-cut cost metric to L-way partitioning and a lower bound on this cost metric. Our approach involves finding the k smallest eigenvalue/eigenvector pairs of the Laplacian of the graph. The eigenvectors provide an embedding of the graph's n vertices into a k-dimensional subspace. We devise a time and space efficient clustering heuristic to coerce the points in the embedding into k partitions. Advancement over the current work is evidenced by the results of experiments on the standard benchmarks.
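The embedding step can be sketched directly; here a plain k-means on the rows of the eigenvector matrix stands in for the paper's time- and space-efficient clustering heuristic, and W is assumed to be a dense symmetric adjacency matrix:

```python
import numpy as np

def spectral_partition(W, k, seed=0):
    """Embed vertices with the k smallest Laplacian eigenvectors, then
    cluster the embedded points (k-means stands in for the paper's heuristic)."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    U = vecs[:, :k]                             # n x k embedding of the vertices
    rng = np.random.default_rng(seed)
    C = U[rng.choice(len(U), k, replace=False)]
    for _ in range(100):                        # plain k-means on the rows of U
        lab = ((U[:, None, :] - C[None, :, :])**2).sum(-1).argmin(1)
        C = np.array([U[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return lab
```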

Journal ArticleDOI
TL;DR: A fuzzy Kohonen clustering network is proposed which integrates the Fuzzy c-Means (FCM) model into the learning rate and updating strategies of the Kohonen network, and the numerical results show improved convergence as well as reduced labeling errors.

Journal ArticleDOI
TL;DR: An efficient method is proposed to obtain a good initial codebook that can accelerate the convergence of the generalized Lloyd algorithm and achieve a better local minimum as well.
Abstract: The generalized Lloyd algorithm plays an important role in the design of vector quantizers (VQ) and in feature clustering for pattern recognition. In the VQ context, this algorithm provides a procedure to iteratively improve a codebook and results in a local minimum that minimizes the average distortion function. We propose an efficient method to obtain a good initial codebook that can accelerate the convergence of the generalized Lloyd algorithm and achieve a better local minimum as well.
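For concreteness, here is a generalized Lloyd iteration with one common initialization heuristic (furthest-point seeding) standing in for the paper's specific initial-codebook method, which is not reproduced here:

```python
import numpy as np

def furthest_point_init(X, k, seed=0):
    """A common initialization heuristic (not the paper's method)."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])     # next codeword: farthest from codebook
    return np.array(C)

def generalized_lloyd(X, k, iters=50):
    C = furthest_point_init(X, k)
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(1)
        for j in range(k):                 # move each codeword to its cell centroid
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab
```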

Proceedings ArticleDOI
10 Jun 1994
TL;DR: The optimum solution to the k-clustering problem is characterized by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights.
Abstract: In this paper we consider the k-clustering problem for a set S of n points $p_i = (x_i)$ in the d-dimensional space with variance-based errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the inter-cluster criterion to minimize, the sum of intra-cluster errors over every cluster is used, and as the intra-cluster criterion of a cluster $S_j$, $|S_j|^{\alpha-1} \sum_{p_i \in S_j} \lVert x_i - \bar{x}(S_j) \rVert^2$ is considered, where $\lVert\cdot\rVert$ is the $L_2$ norm and $\bar{x}(S_j)$ is the centroid of points in $S_j$, i.e., $(1/|S_j|)\sum_{p_i \in S_j} x_i$. The cases of $\alpha = 1, 2$ correspond to the sum of squared errors and the all-pairs sum of squared errors, respectively. The k-clustering problems under the criteria with $\alpha = 1, 2$ are treated in a unified manner by characterizing the optimum solution to the k-clustering problem by the ordinary Euclidean Voronoi diagram and the weighted Voronoi diagram with both multiplicative and additive weights. With this framework, the problem is related to the generalized primary shutter function for the Voronoi diagrams. The primary shutter function is shown to be $O(n^{O(kd)})$, which implies that, for fixed k, this clustering problem can be solved in polynomial time. For the problem with the most typical intra-cluster criterion of the sum of squared errors, we also present an efficient randomized algorithm which, roughly speaking, finds an $\varepsilon$-approximate 2-clustering in $O(n(1/\varepsilon)^d)$ time, which is quite practical and may be applied to real large-scale problems such as the color quantization problem.
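The stated correspondence for $\alpha = 2$ can be checked numerically: multiplying the sum of squared errors by $|S_j|$ gives the all-pairs sum of squared errors, with each unordered pair counted once. A small illustrative script:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((50, 3))         # one cluster of 50 points in R^3
c = S.mean(0)                            # centroid
sse = ((S - c)**2).sum()                 # alpha = 1: sum of squared errors
crit2 = len(S)**(2 - 1) * sse            # alpha = 2 criterion: |S| * SSE
allpairs = sum(((S[i] - S[j])**2).sum()
               for i in range(len(S)) for j in range(i + 1, len(S)))
print(np.isclose(crit2, allpairs))       # True: |S| * SSE = all-pairs SSE
```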

Proceedings ArticleDOI
23 May 1994
TL;DR: A new model of learning probability distributions from independent draws is introduced, inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples, in the sense that it emphasizes efficient and approximate learning, and it studies the learnability of restricted classes of target distributions.
Abstract: We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples [24], in the sense that we emphasize efficient and approximate learning, and we study the learnability of restricted classes of target distributions. The distribution classes we examine are often defined by some simple computational mechanism for transforming a truly random string of input bits (which is not visible to the learning algorithm) into the stochastic observation (output) seen by the learning algorithm. In this paper, we concentrate on discrete distributions over {0,1}^n. The problem of inferring an approximation to an unknown probability distribution on the basis of independent draws has a long and complex history in the pattern recognition and statistics literature. For instance, the problem of estimating the parameters of a Gaussian density in high-dimensional space is one of the most studied statistical problems. Distribution learning problems have often been investigated in the context of unsupervised learning, in which a linear mixture of two or more distributions is generating the observations, and the final goal is not to model the distributions themselves, but to predict from which distribution each observation was drawn. Data clustering methods are a common tool here. There is also a large literature on nonparametric density estimation, in which no assumptions are made on the unknown target density. Nearest-neighbor approaches to the unsupervised learning problem often arise in the nonparametric setting. While we obviously cannot do justice to these areas here, the books of Duda and Hart [9] and Vapnik [25] provide excellent overviews and introductions to the pattern recognition work, as well as many pointers for further reading. See also Izenman's recent survey article [16]. Roughly speaking, our work departs from the traditional statistical and pattern recognition approaches in two ways. First, we place explicit emphasis on the computational complexity of distribution learning. It seems fair to say that while previous research has provided an excellent understanding of the information-theoretic issues involved in distribution learning.

Journal ArticleDOI
TL;DR: This paper substantially improves RFCM by generalizing it to the case of arbitrary (symmetric) dissimilarity data; the resulting algorithm is applicable to any numerical relational data that are positive, reflexive (or anti-reflexive) and symmetric.

Journal ArticleDOI
TL;DR: A random-effects regression model is proposed for analysis of clustered data and a maximum marginal likelihood solution is described, and available statistical software for the model is discussed.
Abstract: A random-effects regression model is proposed for analysis of clustered data. Unlike ordinary regression analysis of clustered data, random-effects regression models do not assume that each observation is independent but do assume that data within clusters are dependent to some degree. The degree of this dependency is estimated along with estimates of the usual model parameters, thus adjusting these effects for the dependency resulting from the clustering of the data. A maximum marginal likelihood solution is described, and available statistical software for the model is discussed. An analysis of a dataset in which students are clustered within classrooms and schools is used to illustrate features of random-effects regression analysis, relative to both individual-level analysis that ignores the clustering of the data, and classroom-level analysis that aggregates the individual data.
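A random-intercept model of this kind can be fit with current software; a minimal sketch using statsmodels on synthetic students-within-classrooms data (all names and values hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical clustered data: students (rows) nested within classrooms (groups)
rng = np.random.default_rng(0)
n_class, n_per = 30, 20
cls = np.repeat(np.arange(n_class), n_per)
u = rng.normal(0, 1.0, n_class)                 # classroom random intercepts
x = rng.normal(size=n_class * n_per)
y = 2.0 + 0.5 * x + u[cls] + rng.normal(size=n_class * n_per)
df = pd.DataFrame({"y": y, "x": x, "classroom": cls})

# random-intercept regression: observations within a classroom share u_j,
# so within-cluster dependency is estimated rather than ignored
result = smf.mixedlm("y ~ x", df, groups=df["classroom"]).fit()
print(result.summary())                         # fixed effects + variance of u
```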

Proceedings Article
01 Jan 1994
TL;DR: This paper considers the k-clustering problem for a set S of n points $p_i = (x_i)$ in the d-dimensional space with variance-based errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display.
Abstract: In this paper we consider the k-clustering problem for a set S of n points $p_i = (x_i)$ in the d-dimensional space with variance-based errors as clustering criteria, motivated from the color quantization problem of computing a color lookup table for frame buffer display. As the inter-cluster criterion to minimize, the sum of intra-cluster errors over every cluster is used, and as the intra-cluster criterion of a cluster $S_j$, $|S_j|^{\alpha-1} \sum_{p_i \in S_j} \lVert x_i - \bar{x}(S_j) \rVert^2$ is considered.

ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.
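The likelihood-based treatment of missing features can be illustrated with the single-Gaussian special case: EM fills each incomplete row with its conditional mean and corrects the covariance with the conditional covariance. This is a sketch of the underlying principle, not the paper's mixture-based algorithms; NaN marks a missing value:

```python
import numpy as np

def em_gaussian_missing(X, iters=50):
    """EM for the mean/covariance of one Gaussian with missing entries."""
    X = X.copy()
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    cov = np.diag(np.nanvar(X, axis=0))
    for _ in range(iters):
        EX = np.where(miss, mu, X)          # rows completed with current params
        C = np.zeros((d, d))                # accumulated conditional covariances
        for i in range(n):                  # E-step, row by row
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            Soo_inv = np.linalg.inv(cov[np.ix_(o, o)])
            Smo = cov[np.ix_(m, o)]
            EX[i, m] = mu[m] + Smo @ Soo_inv @ (EX[i, o] - mu[o])
            C[np.ix_(m, m)] += cov[np.ix_(m, m)] - Smo @ Soo_inv @ Smo.T
        mu = EX.mean(axis=0)                # M-step
        diff = EX - mu
        cov = (diff.T @ diff + C) / n       # conditional-covariance correction
    return mu, cov
```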

Journal ArticleDOI
TL;DR: In this article, a simple constructive heuristic, a λ-interchange generation mechanism, and a hybrid simulated annealing (SA) and tabu search (TS) algorithm with computationally desirable features, based on a new non-monotonic cooling schedule, are developed.

Journal ArticleDOI
TL;DR: The system described here is an attempt to provide completely automatic segmentation and labeling of normal volunteer brains; the absolute accuracy of the segmentations has not yet been rigorously established.
Abstract: The authors' main contribution is to build upon their earlier efforts by expanding the tissue model concept to cover a brain volume. Furthermore, processing time is reduced and accuracy is enhanced by the use of knowledge propagation, where information derived from one slice is made available to succeeding slices as additional knowledge. The system is organized as follows. Each MR slice is initially segmented by an unsupervised fuzzy c-means clustering algorithm. Next, an expert system uses model-based recognition techniques to locate a landmark, called a focus-of-attention tissue. Qualitative models of slices of brain tissue are defined and matched with their instances from imaged slices. If a significant deformation is detected in a tissue, the slice is classified to be abnormal and volume processing halts. Otherwise, the expert system locates the next focus-of-attention tissue, based on a hierarchy of expected tissues. This process is repeated until either a slice is classified as abnormal or all tissues of the slice are labeled. If the slice is determined to be abnormal, the entire volume is also considered abnormal and processing halts. Otherwise, the system will proceed to the next slice and repeat the classification steps until all slices that comprise the volume are processed. A rule-based expert system tool, CLIPS, is used to organize the system. Low level modules for image processing and high level modules for image analysis, all written in the C language, are called as actions from the right hand sides of the rules. The system described here is an attempt to provide completely automatic segmentation and labeling of normal volunteer brains. The absolute accuracy of the segmentations has not yet been rigorously established. The relative accuracy appears acceptable. Efforts have been made to segment an entire volume (rather than merging a set of segmented slices) using supervised pattern recognition techniques or unsupervised fuzzy clustering. However, there is sometimes enough data nonuniformity between slices to prevent satisfactory segmentation.
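The initial slice segmentation relies on standard unsupervised fuzzy c-means; a minimal sketch of that algorithm alone (not the authors' expert system), operating on generic per-pixel feature vectors:

```python
import numpy as np

def fuzzy_c_means(X, c=4, m=2.0, iters=100, tol=1e-5, seed=0):
    """Fuzzy c-means sketch: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per point
    for _ in range(iters):
        Um = U**m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]               # cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        ratio = (d[:, :, None] / d[:, None, :])**(2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)      # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return V, U
```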

Journal ArticleDOI
TL;DR: A method for locating clusters of geometrically similar conformers in ensembles of chemical conformations is described, which first calculates the pairwise interconformational distance matrix in either torsional or Cartesian space and uses an agglomerative, single‐link clustering method to define a hierarchy of clusterings in the same space.
Abstract: We describe a method for locating clusters of geometrically similar conformers in ensembles of chemical conformations. We first calculate the pairwise interconformational distance matrix in either torsional or Cartesian space and then use an agglomerative, single-link clustering method to define a hierarchy of clusterings in the same space. Especially good clusterings are distinguished by high values of the separation ratio: the ratio of the shortest intercluster distance to the characteristic threshold distance defining the clustering. We also discuss other statistics. The method has been embodied in a program called XCluster, which can display the distance matrix, the hierarchy of clusterings, and the clustering statistics in a variety of formats. XCluster can also write out the clustered conformations for subsequent or simultaneous viewing with a molecular visualization program. We demonstrate the sorts of insight that this approach affords with examples obtained from conformational search and molecular dynamics procedures. © 1994 by John Wiley & Sons, Inc.
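The clustering core of this pipeline maps naturally onto SciPy's hierarchical tools. The sketch below uses hypothetical conformer feature vectors, and the separation-ratio computation is one straightforward reading of the definition above (next merge height over current threshold), not XCluster's exact statistic:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical conformer coordinates flattened to feature vectors
rng = np.random.default_rng(0)
conformers = rng.standard_normal((40, 12))

D = pdist(conformers)                    # condensed pairwise distance matrix
Z = linkage(D, method="single")          # agglomerative single-link hierarchy

# separation ratio at each merge level: for single-link, the shortest
# intercluster distance is the next merge height; a high ratio marks an
# especially good clustering
heights = Z[:, 2]
ratios = heights[1:] / heights[:-1]
best = int(np.argmax(ratios))
labels = fcluster(Z, t=heights[best], criterion="distance")
print(len(set(labels)), "clusters at threshold", heights[best])
```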

Proceedings ArticleDOI
24 Jul 1994
TL;DR: The clustering algorithm presented here operates by estimating energy transfer between collections of objects while maintaining reliable error bounds on each transfer, and has obtained speedups of two orders of magnitude for environments of moderate complexity while maintaining comparable accuracy.
Abstract: We present an approach for accelerating hierarchical radiosity by clustering objects. Previous approaches constructed effective hierarchies by subdividing surfaces, but could not exploit a hierarchical grouping on existing surfaces. This limitation resulted in an excessive number of initial links in complex environments. Initial linking is potentially the most expensive portion of hierarchical radiosity algorithms, and constrains the complexity of the environments that can be simulated. The clustering algorithm presented here operates by estimating energy transfer between collections of objects while maintaining reliable error bounds on each transfer. Two methods of bounding the transfers are employed with different tradeoffs between accuracy and time. In contrast with the O(s^2) time and space complexity of the initial linking in previous hierarchical radiosity algorithms, the new methods have complexities of O(s log s) and O(s) for both time and space. Using these methods we have obtained speedups of two orders of magnitude for environments of moderate complexity while maintaining comparable accuracy.

Journal ArticleDOI
TL;DR: Relevance of the clustering to ecological, immune, neural, and cellular networks is discussed, with the emphasis on partially ordered states with chaotic itinerancy, and an extension allowing for the growth of the number of elements is given in connection with cell differentiation.

Proceedings ArticleDOI
06 Oct 1994
TL;DR: This paper takes advantage of the ability of many active optical range sensors to record intensity or even color in addition to the range information to improve the registration procedure by constraining potential matches between pairs of points based on a similarity measure derived from the intensity information.
Abstract: The determination of relative pose between two range images, also called registration, is a ubiquitous problem in computer vision, for geometric model building as well as dimensional inspection. The method presented in this paper takes advantage of the ability of many active optical range sensors to record intensity or even color in addition to the range information. This information is used to improve the registration procedure by constraining potential matches between pairs of points based on a similarity measure derived from the intensity information. One difficulty in using the intensity information is its dependence on the measuring conditions such as distance and orientation. The intensity or color information must first be converted into a viewpoint-independent feature. This can be achieved by inverting an illumination model, by differential feature measurements or by simple clustering. Following that step, a robust iterative closest point method is then used to perform the pose determination. Using the intensity can help to speed up convergence or, in cases of remaining degrees of freedom (e.g. on images of a sphere), to additionally constrain the match. The paper will describe the algorithmic framework and provide examples using range-and-color images.
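One way to realize the constrained matching is to reject nearest-neighbour pairs whose intensity features disagree. The sketch below assumes a basic point-to-point ICP with Kabsch alignment and a per-point intensity already converted to be viewpoint-independent; the paper's robust iterative closest point procedure is more elaborate:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_with_intensity(P, Q, ip, iq, max_di=0.1, iters=30):
    """Point-to-point ICP where matches must also agree in intensity.
    P, Q: (n,3)/(m,3) point sets; ip, iq: per-point intensity features."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(Q)
    for _ in range(iters):
        Pt = P @ R.T + t
        _, idx = tree.query(Pt)                # closest points in Q
        keep = np.abs(ip - iq[idx]) < max_di   # drop intensity-inconsistent pairs
        A, B = Pt[keep], Q[idx[keep]]
        ca, cb = A.mean(0), B.mean(0)
        U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
        Rd = Vt.T @ U.T                        # incremental rotation (Kabsch)
        if np.linalg.det(Rd) < 0:              # guard against reflections
            Vt[-1] *= -1
            Rd = Vt.T @ U.T
        td = cb - Rd @ ca
        R, t = Rd @ R, Rd @ t + td             # compose with current pose
    return R, t
```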

BookDOI
01 Jan 1994
TL;DR: Clusters and factors: neural algorithms for a novel representation of huge and highly multidimensional data sets and a generalisation of the diameter criterion for clustering.
Abstract: Classification and Clustering: Problems for the Future.- From classifications to cognitive categorization: the example of the road lexicon.- A review of graphical methods in Japan-from histogram to dynamic display.- New Data and New Tools: A Hypermedia Environment for Navigating Statistical Knowledge in Data Science.- On the logical necessity and priority of a monothetic conception of class, and on the consequent inadequacy of polythetic accounts of category and categorization.- Research and Applications of Quantification Methods in East Asian Countries.- Algorithms for a geometrical P.C.A. with the L1-norm.- Comparison of hierarchical classifications.- On quadripolar Robinson dissimilarity matrices.- An Ordered Set Approach to Neutral Consensus Functions.- From Apresjan Hierarchies and Bandelt-Dress Weak hierarchies to Quasi-hierarchies.- Spanning trees and average linkage clustering.- Adjustments of tree metrics based on minimum spanning trees.- The complexity of the median procedure for binary trees.- A multivariate analysis of a series of variety trials with special reference to classification of varieties.- Quality control of mixture. Application: The grass.- Mixture Analysis with Noisy Data.- Locally optimal tests on spatial clustering.- Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis.- An examination of procedures for determining the number of clusters in a data set.- The gap test: an optimal method for determining the number of natural classes in cluster analysis.- Mode detection and valley seeking by binary morphological analysis of connectivity for pattern classification.- Interactive Class Classification Using Types.- K-means clustering in a low-dimensional Euclidean space.- Complexity relaxation of dynamic programming for cluster analysis.- Partitioning Problems in Cluster Analysis: A Review of Mathematical Programming Approaches.- Clusters and factors: neural algorithms for a novel representation of huge and highly multidimensional data sets.- Graphs and structural similarities.- A generalisation of the diameter criterion for clustering.- Percolation and multimodal data structuring.- Classification and Discrimination Techniques Applied to the Early Detection of Business Failure.- Recursive Partition and Symbolic Data Analysis.- Interpretation Tools For Generalized Discriminant Analysis.- Inference about rejected cases in discriminant analysis.- Structure Learning of Bayesian Networks by Genetic Algorithms.- On the representation of observational data used for classification and identification of natural objects.- Alternative strategies and CATANOVA testing in two-stage binary segmentation.- Alignment, Comparison and Consensus of Molecular Sequences.- An Empirical Evaluation of Consensus Rules for Molecular Sequences.- A Probabilistic Approach To Identifying Consensus In Molecular Sequences.- Applications of Distance Geometry to Molecular Conformation.- Classification of aligned biological sequences.- Use of Pyramids in Symbolic Data Analysis.- Proximity Coefficients between Boolean symbolic objects.- Conceptual Clustering in Structured Domains: A Theory Guided Approach.- Automatic Aid to Symbolic Cluster Interpretation.- Symbolic Clustering Algorithms using Similarity and Dissimilarity Measures.- Feature Selection for Symbolic Data Classification.- Towards extraction method of knowledge founded by symbolic objects.- One Method of Classification based on an Analysis of the Structural 
Relationship between Independent Variables.- The Integration of Neural Networks with Symbolic Knowledge Processing.- Ordering of Fuzzy k-Partitions.- On the Extension of Probability Theory and Statistics to the Handling of Fuzzy Data.- Fuzzy Regression.- Clustering and Aggregation of Fuzzy Preference Data: Agreement vs. Information.- Rough Classification with Valued Closeness Relation.- Representing proximities by network models.- An Eigenvector Algorithm to Fit lp-Distance Matrices.- A non linear approach to Non Symmetrical Data Analysis.- An Algorithmic Approach to Bilinear Models for Two-Way Contingency Tables.- New Approaches Based on Rankings in Sensory Evaluation.- Estimating failure times distributions from censored systems arranged in series.- Calibration Used as a Nonresponse Adjustment.- Least Squares Smoothers and Additive Decomposition.- High Dimensional Representations and Information Retrieval.- Experiments of Textual Data Analysis at Electricite de France.- Conception of a Data Supervisor in the Prospect of Piloting Management Quality of Service and Marketing.- Discriminant Analysis Using Textual Data.- Recent Developments in Case Based Reasoning: Improvements of Similarity Measures.- Contiguity in discriminant factorial analysis for image clustering.- Exploratory and Confirmatory Discrete Multivariate Analysis in a Probabilistic Approach for Studying the Regional Distribution of Aids in Angola.- Factor Analysis of Medical Image Sequences (FAMIS): Fundamental principles and applications.- Multifractal Segmentation of Medical Images.- The Human Organism-a Place to Thrive for the Immuno-Deficiency Virus.- Comparability and usefulness of newer and classical data analysis techniques. Application in medical domain classification.- The Classification of IRAS Point Sources.- Astronomical classification of the Hipparcos input catalogue.- Group identification and individual assignation of stars from kinematical and luminosity parameters.- Specific numerical and symbolic analysis of chronological series in view to classification of long period variable stars.- Author and Subject Index.

Patent
02 Sep 1994
TL;DR: In this paper, a system and method for processing stroke-based handwriting data for the purposes of automatically scoring and clustering the handwritten data to form letter prototypes is presented, where each character is represented by a plurality of mathematical feature vectors and each one of the plurality of feature vectors is labelled as corresponding to a particular character in the character strings.
Abstract: A system and method for processing stroke-based handwriting data for the purposes of automatically scoring and clustering the handwritten data to form letter prototypes. The present invention includes a method for processing digitized stroke-based handwriting data of known character strings, where each of the character strings is represented by a plurality of mathematical feature vectors. In this method, each one of the plurality of feature vectors is labelled as corresponding to a particular character in the character strings. A trajectory is then formed for each one of the plurality of feature vectors labelled as corresponding to a particular character. After the trajectories are formed, a distance value is calculated for each pair of trajectories corresponding to the particular character using a dynamic time warping method. The trajectories which are within a sufficiently small distance of each other are grouped to form a plurality of clusters. The clusters are used to define handwriting prototypes which identify subcategories of the character.
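The pairwise trajectory distance at the heart of the clustering step is dynamic time warping; a minimal, illustrative recurrence (not the patent's exact formulation):

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two feature-vector trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[n, m]

# trajectories within a small DTW distance of each other would then be grouped
# (e.g., by threshold-based agglomeration) to form the letter prototypes
```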