scispace - formally typeset
Search or ask a question

Showing papers by "Laurens van der Maaten published in 2013"


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The experimental evaluation of the structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking and shows that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.

201 citations


Proceedings Article
16 Jun 2013
TL;DR: This work proposes to corrupt training examples with noise from known distributions within the exponential family and presents a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution.
Abstract: The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples--which, however, is also computationally costly. We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution-- essentially learning with infinitely many (corrupted) training examples. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are more robust to feature deletion at test time.

177 citations


Posted Content
TL;DR: The paper presents an O(N log N)-implementation of t-SNE, an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O( N^2).
Abstract: The paper presents an O(N log N)-implementation of t-SNE -- an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2) The new implementation uses vantage-point trees to compute sparse pairwise similarities between the input data objects, and it uses a variant of the Barnes-Hut algorithm - an algorithm used by astronomers to perform N-body simulations - to approximate the forces between the corresponding points in the embedding Our experiments show that the new algorithm, called Barnes-Hut-SNE, leads to substantial computational advantages over standard t-SNE, and that it makes it possible to learn embeddings of data sets with millions of objects

64 citations


Proceedings Article
15 Jan 2013
TL;DR: In this article, the authors present an O(N log N)-implementation of t-SNE, an embedding technique commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(n 2 ).
Abstract: The paper presents an O(N log N)-implementation of t-SNE -- an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2). The new implementation uses vantage-point trees to compute sparse pairwise similarities between the input data objects, and it uses a variant of the Barnes-Hut algorithm - an algorithm used by astronomers to perform N-body simulations - to approximate the forces between the corresponding points in the embedding. Our experiments show that the new algorithm, called Barnes-Hut-SNE, leads to substantial computational advantages over standard t-SNE, and that it makes it possible to learn embeddings of data sets with millions of objects.

30 citations


Journal ArticleDOI
03 Jan 2013-PLOS ONE
TL;DR: Using the R programming language and techniques from the field of pattern recognition, methods are devised to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets and show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions.
Abstract: The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations were evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that tdistributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.

23 citations


Journal Article
TL;DR: Divvy is an application for applying unsupervised machine learning techniques (clustering and dimensionality reduction) to the data analysis process and provides a novel UI that allows researchers to tighten the action-perception loop of changing algorithm parameters and seeing a visualization of the result.
Abstract: Divvy is an application for applying unsupervised machine learning techniques (clustering and dimensionality reduction) to the data analysis process. Divvy provides a novel UI that allows researchers to tighten the action-perception loop of changing algorithm parameters and seeing a visualization of the result. Machine learning researchers can use Divvy to publish easy to use reference implementations of their algorithms, which helps themachine learning field have a greater impact on research practices elsewhere.

3 citations