Journal Article

Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis

Joseph B. Kruskal
01 Mar 1964 - Psychometrika (Springer-Verlag) - Vol. 29, Iss. 1, pp. 1-27
TL;DR: The fundamental hypothesis is that dissimilarities and distances are monotonically related, and a quantitative, intuitively satisfying measure of goodness of fit is defined to this hypothesis.
Abstract: Multidimensional scaling is the problem of representing n objects geometrically by n points, so that the interpoint distances correspond in some sense to experimental dissimilarities between objects. In just what sense distances and dissimilarities should correspond has been left rather vague in most approaches, thus leaving these approaches logically incomplete. Our fundamental hypothesis is that dissimilarities and distances are monotonically related. We define a quantitative, intuitively satisfying measure of goodness of fit to this hypothesis. Our technique of multidimensional scaling is to compute that configuration of points which optimizes the goodness of fit. A practical computer program for doing the calculations is described in a companion paper.
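
For reference, the goodness-of-fit measure the abstract alludes to is usually written as below. This is a sketch of the standard statement of Kruskal's stress, not a quotation from this listing; the symbols are assumed notation: $\delta_{ij}$ for the experimental dissimilarities, $d_{ij}$ for the interpoint distances of the configuration, and $\hat{d}_{ij}$ for fitted values constrained to be monotone in the dissimilarities.

```latex
S = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^2}{\sum_{i<j} d_{ij}^2}},
\qquad
\delta_{ij} < \delta_{kl} \;\Longrightarrow\; \hat{d}_{ij} \le \hat{d}_{kl}.
```

A configuration whose distances are already in the same order as the dissimilarities has $\hat{d}_{ij} = d_{ij}$ and therefore $S = 0$.
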
Citations
Book
21 Mar 2002
TL;DR: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. It covers both classical and Bayesian philosophies of estimation and hypothesis testing before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced.
Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.

9,509 citations

Posted Content
TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Abstract: Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.

7,926 citations


Cites methods from "Multidimensional scaling by optimiz..."

  • ...These methods also bear close relationships to more classic approaches to spectral clustering [23], multi-dimensional scaling [19], as well as the PageRank algorithm [25]....

    [...]

Posted Content
TL;DR: The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance.
Abstract: UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.

5,390 citations

Journal Article
TL;DR: A heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems is presented.
Abstract: We consider the problem of partitioning the nodes of a graph with costs on its edges into subsets of given sizes so as to minimize the sum of the costs on all edges cut. This problem arises in several physical situations — for example, in assigning the components of electronic circuits to circuit boards to minimize the number of connections between boards. This paper presents a heuristic method for partitioning arbitrary graphs which is both effective in finding optimal partitions, and fast enough to be practical in solving large problems.

5,082 citations

Journal Article
Joseph B. Kruskal
TL;DR: The numerical methods required in the approach to multi-dimensional scaling are described and the rationale of this approach has appeared previously.
Abstract: We describe the numerical methods required in our approach to multi-dimensional scaling. The rationale of this approach has appeared previously.

4,561 citations


Cites background or methods from "Multidimensional scaling by optimiz..."

  • ...If we adopt the primary approach to ties described in [7], that is, if the only constraints on the $\hat{d}_{ij}$ are those in Section 1, then this preprocessing simply consists of arranging the dissimilarities within each tie-block in such a way that the distances $d_{ij}$ within that block form an increasing sequence....

    [...]

  • ...In a companion paper [7] we describe the rationale for our approach to scaling, which is related to that of Shepard [9]....

    [...]

  • ...In [7] we suppose that there are n objects 1, ..., n, and that we have experimental values $\delta_{ij}$ of dissimilarity between them....

    [...]
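
The tie-handling step quoted in the first excerpt above can be sketched as a two-key sort. This is an illustrative reading of that excerpt, not the companion paper's code: the function name `order_pairs_primary` and the flat-list layout (one entry per object pair) are assumptions made here.

```python
def order_pairs_primary(dissimilarities, distances):
    """Order object pairs for the monotone fit (hypothetical helper).

    Pairs are sorted by increasing dissimilarity. Within a block of tied
    dissimilarities, pairs are arranged so that their distances increase,
    which is how the excerpt describes the primary approach to ties.
    Returns the list of pair indices in the resulting order.
    """
    if len(dissimilarities) != len(distances):
        raise ValueError("need one distance per dissimilarity")
    return sorted(
        range(len(dissimilarities)),
        key=lambda k: (dissimilarities[k], distances[k]),
    )


if __name__ == "__main__":
    # Two tie-blocks: pairs 1 and 3 share dissimilarity 1, pairs 0 and 2 share 2.
    print(order_pairs_primary([2, 1, 2, 1], [0.9, 0.5, 0.3, 0.4]))
```

With this ordering, distances inside a tie-block never violate monotonicity, so tied pairs place no constraint on one another in the subsequent fit.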

References
Journal Article
Joseph B. Kruskal
TL;DR: The numerical methods required in the approach to multi-dimensional scaling are described and the rationale of this approach has appeared previously.
Abstract: We describe the numerical methods required in our approach to multi-dimensional scaling. The rationale of this approach has appeared previously.

4,561 citations


"Multidimensional scaling by optimiz..." refers background or methods in this paper

  • ...= stress of the fixed configuration $x_1, \ldots, x_n$ = $\min$ over numbers $\hat{d}_{ij}$ satisfying (Mon). We point out that this minimization is accomplished not by varying a trial set of values for the $\hat{d}_{ij}$, but rather by a rapid, efficient algorithm which is described in detail in the companion paper [12]....

    [...]

  • ...Finally, at the practical level, we give in a companion paper [12] all the important details necessary to perform this iterative technique successfully....

    [...]
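
The minimization described in the first excerpt above can be illustrated with a monotone least-squares fit. The sketch below uses the pool-adjacent-violators procedure, a standard way to compute such a fit; whether it matches the companion paper's algorithm step for step is an assumption. The names `monotone_fit` and `stress` are hypothetical, the distances are assumed to be listed in order of increasing dissimilarity with no ties, and the normalization by the sum of squared distances follows the usual statement of stress.

```python
import math


def monotone_fit(values):
    """Least-squares nondecreasing fit to `values` (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Pool the last two blocks while their means are out of order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def stress(distances):
    """Stress of a fixed configuration.

    `distances` holds the interpoint distances d_ij, listed in order of
    increasing dissimilarity (an assumption of this sketch).
    """
    d_hat = monotone_fit(distances)
    numerator = sum((d - h) ** 2 for d, h in zip(distances, d_hat))
    denominator = sum(d * d for d in distances)
    if denominator == 0:
        raise ValueError("all distances are zero; stress is undefined")
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    print(monotone_fit([1.0, 3.0, 2.0, 4.0]))
    print(stress([1.0, 3.0, 2.0, 4.0]))
```

For the distances `[1.0, 3.0, 2.0, 4.0]`, the out-of-order middle pair is pooled to 2.5 each and the stress is about 0.129; a sequence that is already nondecreasing gives a stress of 0.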

Book
15 Jan 1958

3,060 citations

Journal Article
Roger N. Shepard
TL;DR: Reports the results of two kinds of test applications of a computer program for multidimensional scaling on the basis of essentially nonmetric data, including applications to measures of interstimulus similarity and confusability obtained from actual psychological experiments.
Abstract: A computer program is described that is designed to reconstruct the metric configuration of a set of points in Euclidean space on the basis of essentially nonmetric information about that configuration. A minimum set of Cartesian coordinates for the points is determined when the only available information specifies for each pair of those points—not the distance between them—but some unknown, fixed monotonic function of that distance. The program is proposed as a tool for reductively analyzing several types of psychological data, particularly measures of interstimulus similarity or confusability, by making explicit the multidimensional structure underlying such data.

2,461 citations

Book
01 Jun 1961

1,606 citations


"Multidimensional scaling by optimiz..." refers methods in this paper

  • ...[For proof of this fact, see for example Kolmogorov and Fomin ([11], pp. 19-22) or Hardy, Littlewood, and Polya ([8], pp. 30-33).] If r = 2, then $d_r$ is ordinary Euclidean distance....

    [...]
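
The excerpt above appears to concern the Minkowski r-metric, of which Euclidean distance is the r = 2 case. A minimal sketch under that assumption follows; the function name `minkowski_distance` is hypothetical and the formula is the textbook definition rather than a quotation from the paper.

```python
def minkowski_distance(x, y, r=2.0):
    """Minkowski r-metric between two points given as equal-length sequences.

    r = 2 gives ordinary Euclidean distance and r = 1 gives city-block
    distance. Values of r below 1 are rejected because the triangle
    inequality fails there.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    if r < 1:
        raise ValueError("r must be at least 1")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


if __name__ == "__main__":
    print(minkowski_distance([0.0, 0.0], [3.0, 4.0], r=2.0))  # Euclidean
    print(minkowski_distance([0.0, 0.0], [3.0, 4.0], r=1.0))  # city-block
```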