
Showing papers by "Laurens van der Maaten published in 2012"


Journal ArticleDOI
TL;DR: An extension of t-SNE, called multiple maps t-SNE, is presented that addresses the problems of traditional multidimensional scaling techniques when they are used to visualize non-metric similarities, by constructing a collection of maps that reveal complementary structure in the similarity data.
Abstract: Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.

261 citations
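
The core idea of multiple maps t-SNE is that each object appears in every map with an importance weight, and the similarity the model assigns to a pair of objects is a weighted sum of Student-t kernels over all maps. The numpy sketch below computes those model similarities under that formulation; it is illustrative only (the array names and shapes are ours), and it omits the optimization that fits the map coordinates and importance weights to the input similarities.

```python
import numpy as np

def multiple_maps_similarities(maps, weights):
    """Pairwise similarities q_ij induced by a set of maps, as in multiple maps t-SNE.

    maps:    array of shape (M, N, 2) - 2-D coordinates of N objects in M maps
    weights: array of shape (M, N)    - importance weights pi_i^(m); for each object,
                                         the weights over the M maps sum to 1
    """
    M, N, _ = maps.shape
    q = np.zeros((N, N))
    for m in range(M):
        d2 = np.sum((maps[m][:, None, :] - maps[m][None, :, :]) ** 2, axis=-1)
        kernel = 1.0 / (1.0 + d2)                        # Student-t kernel, one degree of freedom
        q += np.outer(weights[m], weights[m]) * kernel   # weigh by importance in map m
    np.fill_diagonal(q, 0.0)                             # exclude self-similarities
    return q / q.sum()                                   # normalise over all pairs i != j
```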


Proceedings ArticleDOI
12 Nov 2012
TL;DR: A new technique called t-Distributed Stochastic Triplet Embedding (t-STE) is introduced that collapses similar points and repels dissimilar points in the embedding - even when all triplet constraints are satisfied.
Abstract: This paper considers the problem of learning an embedding of data based on similarity triplets of the form “A is more similar to B than to C”. This learning setting is of relevance to scenarios in which we wish to model human judgements on the similarity of objects. We argue that in order to obtain a truthful embedding of the underlying data, it is insufficient for the embedding to merely satisfy the constraints encoded by the similarity triplets; the embedding should also reveal the underlying structure of the data. To this end, we introduce a new technique called t-Distributed Stochastic Triplet Embedding (t-STE) that collapses similar points and repels dissimilar points in the embedding — even when all triplet constraints are satisfied. Our experimental evaluation on three data sets shows that, as a result, t-STE is much better than existing techniques at revealing the underlying data structure.

229 citations
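
At the heart of t-STE is a heavy-tailed (Student-t) model of the probability that a triplet "i is closer to j than to k" is satisfied by the embedding; the embedding is found by maximizing the summed log-probability over all triplets. The numpy sketch below evaluates that objective under the usual formulation; the gradient-based optimization over the embedding is omitted, and the names are illustrative.

```python
import numpy as np

def t_ste_log_likelihood(X, triplets, alpha=1.0):
    """Log-likelihood of triplets "i is closer to j than to k" under a Student-t kernel.

    X:        embedding, array of shape (N, d)
    triplets: int array of shape (T, 3) with rows (i, j, k)
    alpha:    degrees of freedom of the Student-t kernel
    """
    def kernel(a, b):
        d2 = np.sum((X[a] - X[b]) ** 2, axis=-1)
        return (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)

    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    num = kernel(i, j)                  # similarity of the "close" pair
    p = num / (num + kernel(i, k))      # probability the triplet is satisfied
    return np.sum(np.log(p + 1e-12))    # summed log-likelihood, to be maximised over X
```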


Journal ArticleDOI
TL;DR: A system that automatically recognizes the action units defined in the facial action coding system (FACS), using a sophisticated deformable template known as the active appearance model to model the appearance of faces.
Abstract: In this paper, we investigate to what extent modern computer vision and machine learning techniques can assist social psychology research by automatically recognizing facial expressions. To this end, we develop a system that automatically recognizes the action units defined in the facial action coding system (FACS). The system uses a sophisticated deformable template, which is known as the active appearance model, to model the appearance of faces. The model is used to identify the location of facial feature points, as well as to extract features from the face that are indicative of the action unit states. The detection of the presence of action units is performed by a time series classification model, the linear-chain conditional random field. We evaluate the performance of our system in experiments on a large data set of videos with posed and natural facial expressions. In the experiments, we compare the action units detected by our approach with annotations made by human FACS annotators. Our results show that the agreement between the system and human FACS annotators is higher than 90% and underlines the potential of modern computer vision and machine learning techniques for social psychology research. We conclude with some suggestions on how systems like ours can play an important role in research on social signals.

40 citations
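
As a rough illustration of the pipeline described above (per-frame features fed to a linear-chain conditional random field that labels action-unit states over time), the sketch below uses the sklearn_crfsuite package as a stand-in sequence classifier on synthetic data. It is not the authors' implementation, and the feature and label names are invented for the example.

```python
import numpy as np
import sklearn_crfsuite  # linear-chain CRF; a stand-in, not the authors' code

def frame_features(aam_vector):
    # One feature dict per frame from AAM-derived values (names are illustrative)
    return {f"aam_{d}": float(v) for d, v in enumerate(aam_vector)}

# Tiny synthetic example: 2 videos of 5 frames, 4 AAM-derived features per frame
rng = np.random.default_rng(0)
X = [[frame_features(rng.normal(size=4)) for _ in range(5)] for _ in range(2)]
y = [["AU12_absent", "AU12_absent", "AU12_present", "AU12_present", "AU12_absent"]
     for _ in range(2)]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, y)               # learns per-frame emission and transition weights
print(crf.predict(X)[0])    # per-frame AU state sequence for the first video
```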


Journal Article
TL;DR: The results reveal that, although experts are reasonably consistent in their evaluation of embeddings, novices generally disagree on the quality of an embedding; the impact of this result on the way dimensionality reduction researchers should present their results, and on the applicability of dimensionality reduction outside of machine learning, is discussed.

37 citations


Proceedings ArticleDOI
22 Oct 2012
TL;DR: A small empirical study into emotion and affect recognition based on auditory and visual features, which was performed in the context of the Audio-Visual Emotion Challenge (AVEC) 2012, found that there are only very weak (linear) relations between the features and the continuous-valued ratings.
Abstract: The paper presents a small empirical study into emotion and affect recognition based on auditory and visual features, which was performed in the context of the Audio-Visual Emotion Challenge (AVEC) 2012. The goal of this competition is to predict continuous-valued affect ratings based on the provided auditory and visual features, e.g., local binary pattern (LBP) features extracted from aligned face images, and spectral audio features. Empirically, we found that there are only very weak (linear) relations between the features and the continuous-valued ratings: our best linear regressors employ the offset feature to exploit the fact that the ratings have a dominant direction (more increasing than decreasing). Much to our surprise, merely exploiting this bias already leads to results that improve over the baseline system presented in [10]. The best performance we obtained on the AVEC 2012 test set (averaged over the test set and over four affective dimensions) is a correlation between predicted and ground-truth ratings of 0.2255 when making continuous predictions, and 0.1920 when making word-level predictions.

19 citations
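
One way to picture the "offset" idea described above is to append a normalized frame index to the per-frame features, so that a plain linear regressor can capture the dominant upward drift of the ratings. The sketch below does this with scikit-learn's Ridge on synthetic data; the feature layout and the ramp-shaped rating track are assumptions made for illustration, not the AVEC 2012 data or the authors' exact regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge

def add_time_offset(features):
    """Append a normalised per-frame time offset to a (T x D) feature matrix.

    A crude illustration of exploiting the dominant (mostly increasing)
    direction of the affect ratings; the column layout is assumed, not AVEC's.
    """
    T = features.shape[0]
    offset = np.arange(T, dtype=float)[:, None] / T   # normalised frame index
    return np.hstack([features, offset])

# Synthetic stand-ins for LBP + spectral audio features and a rating track
rng = np.random.default_rng(0)
X = add_time_offset(rng.normal(size=(500, 32)))
y = 0.3 * np.linspace(0, 1, 500) + 0.05 * rng.normal(size=500)  # slowly rising rating

model = Ridge(alpha=1.0).fit(X, y)               # any linear regressor would do here
print(np.corrcoef(model.predict(X), y)[0, 1])    # correlation, the AVEC 2012 metric
```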


Proceedings ArticleDOI
17 Dec 2012
TL;DR: This paper presents a generic representation and suitable implementations for three commonly used cost aggregators on many-core processors, and performs typical optimizations on the kernels, which leads to significant performance improvement (up to two orders of magnitude).
Abstract: Real-time stereo matching, which is important in many applications like self-driving cars and 3-D scene reconstruction, requires large computation capability and high memory bandwidth. The most time-consuming part of stereo-matching algorithms is the aggregation of information (i.e. costs) over local image regions. In this paper, we present a generic representation and suitable implementations for three commonly used cost aggregators on many-core processors. We perform typical optimizations on the kernels, which leads to significant performance improvement (up to two orders of magnitude). Finally, we present a performance model for the three aggregators to predict the aggregation speed for a given pair of input images on a given architecture. Experimental results validate our model with an acceptable error margin (an average of 10.4%). We conclude that GPU-like many-cores are excellent platforms for accelerating stereo matching.

12 citations
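
For concreteness, one simple and widely used cost aggregator sums the raw matching costs over a square window around each pixel, for every candidate disparity. The numpy sketch below is a plain CPU reference for such a box aggregator over a cost volume; the paper's contribution is mapping kernels like this onto many-core GPUs, which this sketch does not attempt.

```python
import numpy as np

def box_aggregate(cost_volume, radius):
    """Aggregate matching costs over a (2r+1) x (2r+1) window for every disparity.

    cost_volume: array of shape (H, W, D) with raw per-pixel, per-disparity costs.
    A plain CPU reference for one common aggregator, not a GPU implementation.
    """
    H, W, D = cost_volume.shape
    # Summed-area table over the spatial axes gives each window sum in O(1) per pixel
    sat = np.pad(cost_volume, ((1, 0), (1, 0), (0, 0))).cumsum(0).cumsum(1)
    out = np.empty_like(cost_volume)
    for y in range(H):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        for x in range(W):
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            out[y, x] = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
    return out

# The disparity map then follows by taking, per pixel, the argmin over the D
# aggregated costs (winner-takes-all).
```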


Journal ArticleDOI
TL;DR: This study investigates a recently proposed ordination approach, multiple maps t-SNE, that constructs multiple, independent ordination spaces in order to reveal and visualize complementary structure in the data, and compares it to several conventional ordination approaches.

9 citations
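
To make the comparison concrete, a minimal scikit-learn sketch of two single-space ordinations on a synthetic sites-by-species abundance table is given below. PCA stands in for a conventional ordination and standard t-SNE for a single-map embedding; multiple maps t-SNE itself is not part of scikit-learn and would instead return several maps plus per-site importance weights.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for a sites x species abundance table
rng = np.random.default_rng(0)
abundance = rng.poisson(lam=3.0, size=(60, 25)).astype(float)

# A conventional ordination (PCA as a simple stand-in for PCoA/NMDS)
pca_coords = PCA(n_components=2).fit_transform(abundance)

# Single-map t-SNE baseline; the multiple maps variant discussed in the paper
# reveals complementary structure by spreading objects over several such maps.
tsne_coords = TSNE(n_components=2, perplexity=15, init="pca",
                   random_state=0).fit_transform(abundance)
print(pca_coords.shape, tsne_coords.shape)
```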