Author

Rodrigo Macías

Other affiliations: University of Granada
Bio: Rodrigo Macías is an academic researcher from Centro de Investigación en Matemáticas. The author has contributed to research on topics including k-means clustering and latent class models. The author has an h-index of 4 and has co-authored 6 publications receiving 39 citations. Previous affiliations of Rodrigo Macías include the University of Granada.

Papers
Journal ArticleDOI
TL;DR: A cluster-MDS model for two-way one-mode continuous rating dissimilarity data that aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space is proposed.
Abstract: In this paper, we propose a cluster-MDS model for two-way one-mode continuous rating dissimilarity data. The model aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space. Under the normal distribution assumption, a latent class model is developed in terms of the set of dissimilarities in a maximum likelihood framework. In each iteration, the probability that a dissimilarity belongs to each of the blocks forming a partition of the original dissimilarity matrix, together with the remaining parameters, is estimated by a simulated annealing-based algorithm. A model selection strategy is used to test the number of latent classes and the dimensionality of the problem. Both simulated and classical dissimilarity data are analyzed to illustrate the model.

17 citations
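The joint search over a partition of the objects and low-dimensional coordinates for the cluster centers can be illustrated with a minimal Python sketch. It replaces the paper's normal-distribution latent class likelihood with a plain least-squares loss, and the function name `fit_cluster_mds`, the move set and the cooling schedule are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

def fit_cluster_mds(diss, n_clusters=3, n_dims=2, n_iter=20000,
                    t0=1.0, cooling=0.9995, seed=0):
    """Illustrative simulated-annealing sketch of a cluster-MDS type model.

    Searches jointly for a partition of the objects and low-dimensional
    cluster-center coordinates so that the distance between the centers of
    the clusters containing objects i and j approximates diss[i, j].
    Uses a least-squares loss, not the normal latent class likelihood.
    """
    rng = np.random.default_rng(seed)
    n = diss.shape[0]
    labels = rng.integers(n_clusters, size=n)              # random initial partition
    centers = rng.normal(scale=diss.mean(), size=(n_clusters, n_dims))

    def loss(labels, centers):
        # distance between the centers assigned to every pair of objects
        fitted = np.linalg.norm(centers[labels][:, None, :] -
                                centers[labels][None, :, :], axis=-1)
        off_diag = ~np.eye(n, dtype=bool)
        return ((diss - fitted)[off_diag] ** 2).sum()

    current, temp = loss(labels, centers), t0
    for _ in range(n_iter):
        new_labels, new_centers = labels.copy(), centers.copy()
        if rng.random() < 0.5:                              # move: reassign one object
            new_labels[rng.integers(n)] = rng.integers(n_clusters)
        else:                                               # move: perturb one center
            new_centers[rng.integers(n_clusters)] += rng.normal(scale=0.1, size=n_dims)
        candidate = loss(new_labels, new_centers)
        # Metropolis acceptance rule with a geometric cooling schedule
        if candidate < current or rng.random() < np.exp((current - candidate) / temp):
            labels, centers, current = new_labels, new_centers, candidate
        temp *= cooling
    return labels, centers, current
```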

Journal ArticleDOI
TL;DR: A dual latent class model is proposed for a matrix of preference ratings data, which will partition the individuals and the objects into classes, and simultaneously represent the cluster centers in a low dimensional space, while individuals and objects retain their preference relationship.

14 citations

Journal ArticleDOI
TL;DR: This paper addresses the formulation of criteria to determine the number of clusters in the general situation in which the available information for clustering is a one-mode N × N dissimilarity matrix describing the objects.
Abstract: One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter for a two-mode data set of N points in p dimensions, optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters in the general situation in which the available information for clustering is a one-mode N × N dissimilarity matrix describing the objects. In this framework, p and the coordinates of the points are usually unknown, and the application of criteria originally formulated for two-mode data sets depends on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is generally obtained when the criteria are calculated from the related Euclidean distances rather than from the known two-mode data set, particularly for unequal-sized clusters and in low-dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when they are calculated from their original formulation, using dissimilarities instead of distances.

9 citations
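The block decomposition of a dissimilarity matrix can be sketched briefly. The sketch below uses the standard identity that, for squared Euclidean distances, the within-cluster point scatter equals the sum of within-cluster squared distances divided by twice the cluster size, and then forms a generic Calinski-Harabasz-style variance ratio; it is an illustration of the general approach, not necessarily the exact criteria derived in the paper.

```python
import numpy as np

def variance_ratio_from_dissimilarities(d2, labels):
    """Variance-ratio index computed directly from a one-mode matrix of
    squared dissimilarities d2 and a vector of cluster labels.

    For squared Euclidean distances, the within-cluster sum of squares of
    cluster C_k equals sum_{i,j in C_k} d2[i, j] / (2 * |C_k|); the total
    scatter uses the same identity over all N objects.
    """
    n = d2.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    total = d2.sum() / (2 * n)                         # total point scatter
    within = 0.0
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        within += d2[np.ix_(idx, idx)].sum() / (2 * idx.size)
    between = total - within
    return (between / (k - 1)) / (within / (n - k))

# Typical use: cluster the dissimilarity matrix for a range of K values and
# keep the K that maximizes the index.
```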

Journal ArticleDOI
TL;DR: An alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding.
Abstract: Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure suitable only for small or medium-sized data sets, and the assumption of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.

8 citations
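A toy Python sketch of such an alternation is given below, under the simplifying assumptions that the ratings are distance-like (small value = strong preference) and that the unfolding part is fitted with plain gradient steps on the block means; it is not the authors' algorithm or its enhanced Simulated Annealing variant.

```python
import numpy as np

def cluster_unfold_als(X, n_row=3, n_col=3, n_dims=2,
                       n_outer=30, n_grad=100, lr=0.05, seed=0):
    """Toy alternating least-squares sketch of a cluster-unfolding idea.

    X is an individuals-by-objects matrix of distance-like scores.  Rows and
    columns are partitioned into clusters, and row-cluster and column-cluster
    points are placed in n_dims dimensions so that their distances
    approximate the scores in the corresponding block of X.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    rows = rng.integers(n_row, size=n)
    cols = rng.integers(n_col, size=m)
    A = rng.normal(scale=X.std(), size=(n_row, n_dims))   # row-cluster points
    B = rng.normal(scale=X.std(), size=(n_col, n_dims))   # column-cluster points
    eps = 1e-9

    for _ in range(n_outer):
        D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)   # K x L distances
        # (1) reassign each individual to its best-fitting row cluster
        F = D[:, cols]                                               # K x m fitted profiles
        rows = np.argmin(((X[:, None, :] - F[None, :, :]) ** 2).sum(-1), axis=1)
        # (2) reassign each object to its best-fitting column cluster
        G = D[rows, :]                                               # n x L fitted profiles
        cols = np.argmin(((X[:, :, None] - G[:, None, :]) ** 2).sum(0), axis=1)
        # (3) block means and normalized block sizes of the current partition
        R = np.eye(n_row)[rows]                                      # n x K indicator
        C = np.eye(n_col)[cols]                                      # m x L indicator
        counts = R.sum(0)[:, None] * C.sum(0)[None, :]
        means = np.where(counts > 0, (R.T @ X @ C) / np.maximum(counts, 1), 0.0)
        weights = counts / counts.sum()
        # (4) a few gradient steps fitting center distances to the block means
        for _ in range(n_grad):
            D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1) + eps
            coef = weights * (means - D) / D                         # K x L
            diff = A[:, None, :] - B[None, :, :]                     # K x L x d
            A += 2 * lr * (coef[:, :, None] * diff).sum(axis=1)
            B -= 2 * lr * (coef[:, :, None] * diff).sum(axis=0)
    return rows, cols, A, B
```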

Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the usefulness of multidimensional scaling in relation to performing K-means clustering on a dissimilarity matrix, when the dimensionality of the objects is unknown.
Abstract: In this article, we analyse the usefulness of multidimensional scaling in relation to performing K-means clustering on a dissimilarity matrix when the dimensionality of the objects is unknown. In this situation, traditional algorithms cannot be used, and K-means clustering procedures are instead performed directly on the observed dissimilarity matrix. Furthermore, the application of criteria originally formulated for two-mode data sets to determine the number of clusters depends on their possible reformulation in a one-mode situation. The linear invariance property in K-means clustering for squared dissimilarities, together with the use of multidimensional scaling, is investigated to determine the cluster membership of the observations and to address the problem of selecting the number of clusters in K-means for a dissimilarity matrix. In particular, we analyse the performance of K-means clustering on the full-dimensional scaling configuration and on the equivalently partitioned configuration related to a suitable translation of the squared dissimilarities. A Monte Carlo experiment is conducted in which the methodology examined is compared with the results obtained by procedures directly applicable to a dissimilarity matrix.

4 citations
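The full-dimensional scaling step can be sketched as classical (Torgerson) scaling of the squared dissimilarities followed by K-means on the recovered coordinates. This is only the basic pipeline; the translation of the squared dissimilarities studied in the article is omitted, and the eigenvalue cut-off used below is an arbitrary tolerance.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_on_full_mds(diss, n_clusters, seed=0):
    """Cluster a one-mode dissimilarity matrix via classical (Torgerson)
    scaling followed by K-means on the full-dimensional configuration."""
    n = diss.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    B = -0.5 * J @ (diss ** 2) @ J                    # double-centered squared dissimilarities
    eigval, eigvec = np.linalg.eigh(B)
    keep = eigval > 1e-10                             # retain positive eigenvalues only
    coords = eigvec[:, keep] * np.sqrt(eigval[keep])  # full-dimensional configuration
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
```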


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented in this book, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: Big search data from a product- and price-comparison site are used to derive consumer consideration sets that reflect competition between products, and the proposed method is shown to outperform traditional models such as multidimensional scaling.
Abstract: In large markets comprising hundreds of products, comprehensive visualization of competitive market structures can be cumbersome and complex. Yet, as we show empirically, reduction of the analysis to smaller representative product sets can obscure important information. Herein we use big search data from a product- and price-comparison site to derive consideration sets of consumers that reflect competition between products. We integrate these data into a new modeling and two-dimensional mapping approach that enables the user to visualize asymmetric competition in large markets (>1,000 products) and to identify distinct submarkets. An empirical application to the LED-TV market, comprising 1,124 products and 56 brands, leads to valid and useful insights and shows that our method outperforms traditional models such as multidimensional scaling. Likewise, we demonstrate that big search data from product- and price-comparison sites provide higher external validity than search data from Google and Amazon.

64 citations

Journal ArticleDOI
TL;DR: The experimental results confirm the theoretical expectation that Simulated Annealing is, by itself, a suitable strategy for the optimization problems in Multidimensional Scaling, in particular for the City-Block, Euclidean and Infinity metrics.
Abstract: It is well known that considering a non-Euclidean Minkowski metric in Multidimensional Scaling, either for the distance model or for the loss function, considerably increases the computational problem of local minima. In this paper, we propose an algorithm in which both the loss function and the composition rule can be considered in any Minkowski metric, using a multivariate randomly alternating Simulated Annealing procedure with permutation and translation phases. The algorithm has been implemented in Fortran and tested on classical and simulated data matrices with sizes of up to 200 objects. A study has been carried out with some of the common loss functions to determine the most suitable values for the main parameters. The experimental results confirm the theoretical expectation that Simulated Annealing is, by itself, a suitable strategy for the optimization problems in Multidimensional Scaling, in particular for the City-Block, Euclidean and Infinity metrics.

28 citations
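A compact sketch of the core idea follows, restricted to finite Minkowski exponents and a raw-stress (sum of squared residuals) loss; the single-point translation move and the cooling parameters are illustrative assumptions and do not reproduce the Fortran implementation with permutation and translation phases described above.

```python
import numpy as np

def minkowski_mds_sa(diss, n_dims=2, p=1.0, n_iter=50000,
                     t0=1.0, cooling=0.9999, step=0.05, seed=0):
    """Simulated annealing sketch of MDS with a Minkowski-p distance model
    (finite p), minimizing raw stress between dissimilarities and distances."""
    rng = np.random.default_rng(seed)
    n = diss.shape[0]
    X = rng.normal(scale=diss.mean(), size=(n, n_dims))
    iu = np.triu_indices(n, 1)                          # each pair counted once

    def stress(X):
        d = (np.abs(X[:, None, :] - X[None, :, :]) ** p).sum(-1) ** (1.0 / p)
        return ((diss[iu] - d[iu]) ** 2).sum()

    current, temp = stress(X), t0
    for _ in range(n_iter):
        i = rng.integers(n)
        move = rng.normal(scale=step, size=n_dims)      # translate one point
        X[i] += move
        candidate = stress(X)
        if candidate < current or rng.random() < np.exp((current - candidate) / temp):
            current = candidate                         # accept the move
        else:
            X[i] -= move                                # reject: undo the move
        temp *= cooling
    return X, current
```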

Journal ArticleDOI
TL;DR: Correspondence analysis has come of age and is now generally accepted as part of the standard toolbox of multivariate techniques.

22 citations