Showing papers by "Laurens van der Maaten" published in 2013

Proceedings Article

Lu Zhang¹, Laurens van der Maaten¹•Institutions (1)

23 Jun 2013

TL;DR: The experimental evaluation of the structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking and shows that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.

Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.

201 citations

Proceedings Article

Learning with Marginalized Corrupted Features

Laurens van der Maaten¹, Minmin Chen², Stephen Tyree², Kilian Q. Weinberger²•Institutions (2)

Delft University of Technology¹, Washington University in St. Louis²

16 Jun 2013

TL;DR: This work proposes to corrupt training examples with noise from known distributions within the exponential family and presents a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution.

Abstract: The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples, which, however, is also computationally costly. We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution, essentially learning with infinitely many (corrupted) training examples. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are more robust to feature deletion at test time.
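
The idea of minimizing an expected loss under a corrupting distribution can be illustrated for one simple case. The sketch below assumes a quadratic loss and unbiased "blankout" noise (each feature is independently zeroed with probability `q` and otherwise rescaled by `1 / (1 - q)`); this pairing is chosen here for illustration and is not taken from the abstract. Under these assumptions the expectation has a closed form, which the code compares against a sampled estimate. The function names are hypothetical.

```python
import random


def expected_quadratic_loss(w, x, y, q):
    """Closed-form E[(w . x_tilde - y)^2] under unbiased blankout noise.

    Assumption: feature d is set to 0 with probability q and to x_d / (1 - q)
    otherwise, independently per feature, so E[x_tilde_d] = x_d and
    Var[x_tilde_d] = x_d^2 * q / (1 - q).
    """
    mean_prediction = sum(wd * xd for wd, xd in zip(w, x))
    prediction_variance = (q / (1.0 - q)) * sum((wd * xd) ** 2 for wd, xd in zip(w, x))
    # E[(p - y)^2] = (E[p] - y)^2 + Var[p] for the random prediction p.
    return (mean_prediction - y) ** 2 + prediction_variance


def sampled_quadratic_loss(w, x, y, q, n_samples, seed=0):
    """Monte Carlo estimate of the same expectation, using explicit corrupted copies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x_tilde = [0.0 if rng.random() < q else xd / (1.0 - q) for xd in x]
        prediction = sum(wd * xd for wd, xd in zip(w, x_tilde))
        total += (prediction - y) ** 2
    return total / n_samples
```

The closed form costs one pass over the features, whereas the sampled estimate needs many corrupted copies to reach comparable accuracy; that contrast is what "learning with infinitely many corrupted examples" refers to in this special case.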

177 citations

Posted Content

Barnes-Hut-SNE

Laurens van der Maaten¹•Institutions (1)

Delft University of Technology¹

15 Jan 2013-arXiv: Learning

TL;DR: The paper presents an O(N log N)-implementation of t-SNE, an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2).

Abstract: The paper presents an O(N log N)-implementation of t-SNE -- an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2). The new implementation uses vantage-point trees to compute sparse pairwise similarities between the input data objects, and it uses a variant of the Barnes-Hut algorithm - an algorithm used by astronomers to perform N-body simulations - to approximate the forces between the corresponding points in the embedding. Our experiments show that the new algorithm, called Barnes-Hut-SNE, leads to substantial computational advantages over standard t-SNE, and that it makes it possible to learn embeddings of data sets with millions of objects.

64 citations

Proceedings Article

Barnes-Hut-SNE

Laurens van der Maaten¹•Institutions (1)

Delft University of Technology¹

15 Jan 2013

TL;DR: In this article, the author presents an O(N log N)-implementation of t-SNE, an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2).

Abstract: The paper presents an O(N log N)-implementation of t-SNE -- an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots and that normally runs in O(N^2). The new implementation uses vantage-point trees to compute sparse pairwise similarities between the input data objects, and it uses a variant of the Barnes-Hut algorithm - an algorithm used by astronomers to perform N-body simulations - to approximate the forces between the corresponding points in the embedding. Our experiments show that the new algorithm, called Barnes-Hut-SNE, leads to substantial computational advantages over standard t-SNE, and that it makes it possible to learn embeddings of data sets with millions of objects.
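
The Barnes-Hut idea mentioned in the abstract, summarizing a distant group of points by its centre of mass, can be sketched on a 2-D quadtree. The code below is only an illustration under stated assumptions: it approximates the kernel sum of 1 / (1 + squared distance) over all embedding points for one query point, it uses the opening criterion "cell width < theta × distance to the cell's centre of mass" (one common form, not necessarily the paper's exact criterion), and it leaves out the vantage-point trees and the gradient computation. The names `Cell`, `build_tree`, `approx_kernel_sum`, and `exact_kernel_sum` are hypothetical.

```python
import random


class Cell:
    """Square quadtree cell that tracks the count and coordinate sums of its points."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0
        self.sx = self.sy = 0.0
        self.point = None      # the first point stored while the cell is a leaf
        self.children = None   # four sub-cells once the cell has been split

    def _child(self, x, y):
        # Children are ordered (left, right) x (bottom, top).
        return self.children[2 * (x >= self.cx) + (y >= self.cy)]

    def insert(self, x, y):
        self.count += 1
        self.sx += x
        self.sy += y
        if self.count == 1:
            self.point = (x, y)
            return
        if self.half < 1e-9:   # (near-)coincident points: stop splitting
            return
        if self.children is None:
            h = self.half / 2
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            px, py = self.point
            self._child(px, py).insert(px, py)
        self._child(x, y).insert(x, y)


def build_tree(points):
    """Build a quadtree over the bounding square of a list of (x, y) points."""
    xs, ys = zip(*points)
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Cell((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for x, y in points:
        root.insert(x, y)
    return root


def approx_kernel_sum(cell, x, y, theta=0.5):
    """Approximate the sum over points p in `cell` of 1 / (1 + ||(x, y) - p||^2)."""
    if cell.count == 0:
        return 0.0
    mx, my = cell.sx / cell.count, cell.sy / cell.count
    d2 = (x - mx) ** 2 + (y - my) ** 2
    width = 2 * cell.half
    # Summarize a leaf, or a cell that looks small from the query point.
    if cell.children is None or width * width < theta * theta * d2:
        return cell.count / (1.0 + d2)
    return sum(approx_kernel_sum(c, x, y, theta) for c in cell.children)


def exact_kernel_sum(points, x, y):
    """Reference O(N) sum for one query point, visiting every point."""
    return sum(1.0 / (1.0 + (x - px) ** 2 + (y - py) ** 2) for px, py in points)
```

With `theta=0.0` every cell is opened and the result matches the exact sum; larger values of `theta` trade accuracy for fewer visited cells. When the query is itself one of the points and `theta` is below about 0.7, its own term (equal to 1) is included in both sums and can be subtracted if only the other points are wanted.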

30 citations

Journal Article

Wrangling Phosphoproteomic Data to Elucidate Cancer Signaling Pathways

Mark L. Grimes¹, Wan-Jui Lee², Laurens van der Maaten², Paul Shannon•Institutions (2)

University of Montana¹, Delft University of Technology²

03 Jan 2013-PLOS ONE

TL;DR: Using the R programming language and techniques from the field of pattern recognition, the authors devise methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets, and show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions.

Abstract: The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations was evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.
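
One way to compute a correlation-based dissimilarity when many values are missing is to use only the samples observed for both proteins. The sketch below is an assumption made for illustration: the abstract does not say how missing values were handled, the original work used R rather than Python, and the function name, the `1 - r` definition, and the `min_overlap` threshold are all hypothetical. Missing measurements are represented as `None`.

```python
import math


def pearson_dissimilarity(a, b, min_overlap=3):
    """Return 1 - Pearson r over the samples observed in both profiles.

    Returns None when fewer than `min_overlap` samples are shared or when
    either profile is constant on the shared samples (r is undefined).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < min_overlap:
        return None
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return 1.0 - cov / math.sqrt(var_a * var_b)
```

Under this definition, perfectly correlated profiles have dissimilarity 0 and perfectly anti-correlated profiles have dissimilarity 2; pairs with too little overlap are left undefined instead of being assigned a guessed value.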

23 citations

Journal Article

Divvy: fast and intuitive exploratory data analysis

Joshua M. Lewis¹, Virginia R. de Sa¹, Laurens van der Maaten²•Institutions (2)

University of California, San Diego¹, Delft University of Technology²

01 Jan 2013-Journal of Machine Learning Research

TL;DR: Divvy is an application for applying unsupervised machine learning techniques (clustering and dimensionality reduction) to the data analysis process and provides a novel UI that allows researchers to tighten the action-perception loop of changing algorithm parameters and seeing a visualization of the result.

Abstract: Divvy is an application for applying unsupervised machine learning techniques (clustering and dimensionality reduction) to the data analysis process. Divvy provides a novel UI that allows researchers to tighten the action-perception loop of changing algorithm parameters and seeing a visualization of the result. Machine learning researchers can use Divvy to publish easy-to-use reference implementations of their algorithms, which helps the machine learning field have a greater impact on research practices elsewhere.

3 citations