Delft University of Technology
Approximated and User Steerable tSNE for Progressive Visual Analytics
Pezzotti, Nicola; Lelieveldt, Boudewijn P.F.; van der Maaten, Laurens; Höllt, Thomas; Eisemann, Elmar;
Vilanova, Anna
DOI
10.1109/TVCG.2016.2570755
Publication date
2016
Document Version
Accepted author manuscript
Published in
IEEE Transactions on Visualization and Computer Graphics
Citation (APA)
Pezzotti, N., Lelieveldt, B. P. F., van der Maaten, L., Höllt, T., Eisemann, E., & Vilanova, A. (2016). Approximated and User Steerable tSNE for Progressive Visual Analytics. IEEE Transactions on Visualization and Computer Graphics, 23(7), 1739-1752. https://doi.org/10.1109/TVCG.2016.2570755
Important note
To cite this publication, please use the final published version (if applicable).
Please check the document version above.
Copyright
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent
of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Takedown policy
Please contact us and provide details if you believe this document breaches copyrights.
We will remove access to the work immediately and investigate your claim.
This work is downloaded from Delft University of Technology.
For technical reasons the number of authors shown on this cover page is limited to a maximum of 10.

Approximated and User Steerable tSNE for Progressive Visual Analytics
Nicola Pezzotti, Boudewijn P.F. Lelieveldt, Laurens van der Maaten, Thomas Höllt, Elmar Eisemann, and Anna Vilanova
Abstract—Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as
well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D
embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited
technique for the visualization of high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow
initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE),
which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a
density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local
refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world
research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.
Index Terms—High Dimensional Data, Dimensionality Reduction, Progressive Visual Analytics, Approximate Computation
1 INTRODUCTION
Visual analysis of high-dimensional data is a challenging process. Direct visualizations such as parallel coordinates [1] or scatterplot matrices [2] work well for a few dimensions but do not scale to hundreds or thousands of dimensions. Typically, indirect visualization is used in these cases: first, the dimensionality of the data is reduced, usually to two or three dimensions, then the remaining dimensions are used to lay out the data for visual inspection, for example in a two-dimensional scatterplot.
Dimensionality-reduction techniques have been an active field of research in recent years, resulting in a number of viable techniques [3]. A variant of tSNE [4], the Barnes-Hut-SNE [5], has been accepted as the state of the art for non-linear dimensionality reduction applied to the visual analysis of high-dimensional spaces in several application areas, such as the life sciences [6], [7], [8], [9]. tSNE produces 2D and 3D embeddings that are meant to preserve local structure in the high-dimensional data. The analyst inspects the embeddings with the goal of identifying clusters or patterns that are used to generate new hypotheses on the data. However, the computational complexity of this technique does not allow direct employment in interactive systems. This limitation makes the analytic process a time-consuming task; it can take hours, or even days, to adjust the parameters and generate the right embedding to be analyzed.
N. Pezzotti, T. Höllt, E. Eisemann, and A. Vilanova are with the Computer Graphics and Visualization group, Delft University of Technology, Delft, the Netherlands.
B. P.F. Lelieveldt and L. van der Maaten are with the Pattern Recognition and Bioinformatics group, Delft University of Technology, Delft, the Netherlands.
B. P.F. Lelieveldt is with the Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands.
Manuscript received August 4, 2015; revised -, -.
Recently Stolper et al. [10], as well as Mühlbacher et al. [11], introduced Progressive Visual Analytics. The idea of Progressive Visual Analytics is to provide the user with meaningful intermediate results in case the computation of the final result is too costly. Based on these intermediate results the user can start with the analysis process. Mühlbacher et al. also provide a set of requirements which an algorithm needs to fulfill in order to be suitable for Progressive Visual Analytics. Based on these requirements they analyze a series of different algorithms commonly deployed in visual analytics systems and conclude that, for example, tSNE fulfills all requirements. The reason is that the minimization in tSNE builds on the iterative gradient-descent technique [4] and can therefore be used directly for a per-iteration visualization, as well as for interaction with the intermediate results. However, Mühlbacher et al. ignore the fact that the distances in the high-dimensional space need to be precomputed before the minimization process can start. In fact, this initialization process dominates the overall performance of tSNE. Even with a per-iteration visualization of the intermediate results [10], [11], [12], [13], the initialization time will force the user to wait minutes, or even hours, before the first intermediate result can be generated on a state-of-the-art desktop computer. Every modification of the data, for example the addition of data-points or a change in the high-dimensional space, will force the user to wait for the full reinitialization of the algorithm.
In this work, we present A-tSNE, a novel approach that adapts the complete tSNE pipeline, including the distance computation, to the Progressive Visual Analytics paradigm. Instead of precomputing precise distances, we propose to approximate the distances using Approximated K-Nearest Neighborhood queries. This allows us to start the computation of the iterative minimization nearly instantly after loading the data. Based on the intermediate results of the tSNE, the user can now start the interpretation process of the
data immediately. Further, we modified the gradient descent
of tSNE such that it allows for the incorporation of updated
data during the iterative process. This change allows us to
continuously refine the approximated neighborhoods in the
background, triggering updates of the embedding without
restarting the optimization. Eventually, this process arrives
at the precise solution. Furthermore, we allow the user to
steer the level of approximation by selecting points of inter-
est, such as clusters, which appear in the very early stages
of the optimization and enable an interactive exploration of
the high-dimensional data.
Our contributions are as follows:
1) We present A-tSNE, a twofold evolution of the tSNE algorithm, which
a) minimizes initialization time and as such enables immediate inspection of preliminary computation results.
b) allows for interactive modification, removal or addition of high-dimensional data, without disrupting the visual analysis process.
2) Using a set of standard benchmark datasets, we show large computational performance improvements of A-tSNE compared to the state of the art, while maintaining high precision.
3) We developed an interactive system for the visual analysis of high-dimensional data, allowing the user to inspect and steer the level of approximation. Finally, we illustrate the benefits of the exploratory possibilities in a real-world research scenario and for the real-time analysis of high-dimensional streams.
2 RELATED WORK
The tSNE [4] algorithm forms the foundation of this work. As described above, tSNE is used for the visualization of high-dimensional data in a wide field of applications, from the life sciences to the analysis of deep-learning algorithms [6], [7], [8], [9], [14], [15], [16]. tSNE is a non-linear dimensionality-reduction algorithm that aims at preserving local structures in the embedding, whilst showing global information, such as the presence of clusters at several scales. Most of the user tasks associated with the visualization of high-dimensional data embeddings are based on identifying relationships between data points. Typical tasks comprise the identification of visual clusters and their verification based on a detail visualization of the high-dimensional data, e.g., using parallel coordinate plots. For a complete description of such tasks we refer to Brehmer et al. [17].
tSNE's computational and memory complexity is O(N^2), where N is the number of data-points, which constrains the application of the technique. An evolution of the algorithm, called Barnes-Hut-SNE (BH-SNE) [5], reduces the computational complexity to O(N log(N)) and the memory complexity to O(N). This approach was also developed in parallel by Yang et al. [18]. However, despite the increased performance, it still cannot be used to interactively explore the data in a desktop environment.
Interactive performance is at the center of the latest developments in Visual Analytics. New analytical tools and algorithms, which are able to trade accuracy for speed and offer the possibility to interactively refine results [19], [20], are needed to deal with the scalability issues of existing analytics algorithms like tSNE. Mühlbacher et al. [11] defined different strategies to increase the user involvement in existing algorithms. They provide an in-depth analysis of how the interconnection between the visualization and the analytic modules can be achieved. Stolper et al. [10] defined the term Progressive Visual Analytics, describing techniques that allow the analyst to directly interact with the analytics process. Visualization of intermediate results is used to help the user, for example, to find optimal parameter settings or to filter the data [10]. For the design of our Progressive Visual Analytics approach, we used the guidelines presented by Stolper et al. [10], see Section 4. Many algorithms are not suited right away for Progressive Visual Analytics, since the production of intermediate results is computationally too intensive or they do not generate useful intermediate results at all. tSNE is an example of such an algorithm because of its initialization process.
To overcome this problem, we propose to compute an approximation of tSNE's initialization stage, followed by a user-steerable [21] refinement of the level of approximation.
To compute the conditional probabilities needed by BH-
SNE, a K-Nearest Neighborhood (KNN) search must be
evaluated for each point in the high-dimensional space.
Under these conditions, a traditional algorithm and data
structure, such as a KD-Tree [22], will not perform well. In
the BH-SNE [5] algorithm, a Vantage-Point Tree [23] is used
for the KNN search, but it is slow to query. In this work, we
propose to use an approximated computation of the KNN
in the initialization stage to start the analysis as soon as
possible. The level of approximation is then refined on the
fly during the analytics process.
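As a concrete, hypothetical illustration of such approximated queries (the paper itself relies on a Forest of Randomized KD-Trees [38]; the Annoy library is substituted here purely as an illustrative stand-in, not as the paper's implementation):

```python
# Illustrative sketch only: approximated KNN with the Annoy library.
import numpy as np
from annoy import AnnoyIndex

def approximated_knn(X, k, n_trees=4):
    """Return the k approximated nearest neighbors of every row of X.
    Fewer trees means faster construction but lower precision."""
    n, dim = X.shape
    index = AnnoyIndex(dim, "euclidean")
    for i, x in enumerate(X):
        index.add_item(i, x.tolist())
    index.build(n_trees)  # the approximation level is controlled here
    # Query k+1 points: the query point is returned as its own neighbor.
    return [index.get_nns_by_item(i, k + 1)[1:] for i in range(n)]

X = np.random.rand(1000, 784).astype("float32")  # e.g., MNIST-sized vectors
neighbors = approximated_knn(X, k=90)            # K = floor(3*mu), mu = 30
```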
Other dimensionality-reduction algorithms implement approximation and steerability to increase performance as well. For example, MDSteer [24] works on a subset of the data and allows the user to control the insertion of points by selecting areas in the reduced space. Yang et al. [25] present a dimensionality-reduction technique using a dissimilarity matrix as input. By means of a divide-and-conquer approach, the computational complexity of the algorithm can be reduced.
the algorithm can be reduced. Multiple other techniques
provide steerability by means of guiding the dimensionality
reduction via user input. Joia et al. [26] and Paulovich et al. [27] let the user place a small number of control points. In other work, Paulovich et al. [28] propose the use of a non-linear dimensionality-reduction algorithm on a small number of automatically-selected control points. For these techniques the position of the data points is finally obtained by linear-interpolation schemes that make use of the control points. However, they all limit the non-linear dimensionality reduction to a subset of the dataset, restricting the insights that can be obtained from the data. In this work, we provide a way to directly use the complete data, allowing the analyst to immediately start the analysis on all data points.
Ingram and Munzner's Q-SNE [29] is based on a similar idea to our approach, using Approximated KNN queries for the computation of the high-dimensional similarities. However, they use the APQ algorithm [29], which is designed to exploit the sparse structure of high-dimensional spaces obtained from document collections, limiting its application
to such a context. A-tSNE improves on Q-SNE by providing a fast but approximated algorithm for the analysis of traditional dense high-dimensional spaces. For this reason it can be used right away in contexts where BH-SNE is applied and Q-SNE would not be applicable. A further distinction is that A-tSNE incorporates the principles of Progressive Visual Analytics by providing a visualization of the level of approximation, the ability to refine the approximation based on user input, and the possibility to manipulate the high-dimensional data without waiting for the recomputation of the exact similarities.
Density-based visualization of the tSNE embedding has been used in several works [5], [6], [9]; however, these employ slow-to-compute offline techniques. In our work, we integrate real-time Kernel Density Estimation (KDE) as described by Lampe and Hauser [30]. Interaction with the embedding is important to allow the analyst to explore the high-dimensional data. Selection operations in the embedding and the visualization of the data in a coordinated multiple-view system are necessary to enable this exploration. The iVisClassifier system [31] is an example of such a solution. In our work, we take a similar approach, providing a coordinated multiple-view framework for the visualization of a selection in the embedding.
3 TSNE
In this section, we provide a short introduction to tSNE [4], which is necessary to explain our contribution. tSNE interprets the overall distances between data-points in the high-dimensional space as a symmetric joint-probability distribution $P$. Likewise, a joint-probability distribution $Q$ is computed that describes the similarity in the low-dimensional space. The goal is to achieve a representation, referred to as embedding, in the low-dimensional space in which $Q$ faithfully represents $P$. This is achieved by optimizing the positions in the low-dimensional space to minimize the cost function $C$ given by the Kullback-Leibler (KL) divergence between the joint-probability distributions $P$ and $Q$:

$$C(P, Q) = KL(P||Q) = \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} p_{ij} \ln \frac{p_{ij}}{q_{ij}} \quad (1)$$
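As a minimal, hypothetical illustration (not part of the paper), the cost of Eq. 1 can be written directly in NumPy; here `P` and `Q` are assumed to be dense N x N joint-probability matrices with zero diagonals.

```python
import numpy as np

def kl_cost(P, Q, eps=1e-12):
    """Kullback-Leibler divergence KL(P||Q) between two joint-probability
    matrices (Eq. 1); terms with p_ij = 0 contribute nothing to the sum."""
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
```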
Given two data points $x_i$ and $x_j$ in the dataset $X = \{x_1 \ldots x_N\}$, the probability $p_{ij}$ models the similarity of these points in the high-dimensional space. To this extent, for each point a Gaussian kernel, $P_i$, is chosen whose variance $\sigma_i$ is defined according to the local density in the high-dimensional space, and then $p_{ij}$ is described as follows:

$$p_{ij} = \frac{p_{i|j} + p_{j|i}}{2N}, \quad (2)$$

$$\text{where} \quad p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / (2\sigma_i^2))}{\sum_{k \neq i}^{N} \exp(-||x_i - x_k||^2 / (2\sigma_i^2))} \quad (3)$$
$p_{j|i}$ can be seen as a relative measure of similarity based on the local neighborhood of a data-point $x_i$. The perplexity value $\mu$ is a user-defined parameter that describes the effective number of neighbors considered for each data-point. The value of $\sigma_i$ is chosen such that, for fixed $\mu$ and each $i$:

$$\mu = 2^{-\sum_{j}^{N} p_{j|i} \log_2 p_{j|i}} \quad (4)$$
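The paper does not spell out how Eq. 4 is solved; a common choice in tSNE implementations is a binary search over $\sigma_i$ until the entropy of the conditional distribution matches $\log_2 \mu$. A minimal sketch under that assumption, where `d2` holds the squared Euclidean distances from point $i$ to its candidate neighbors:

```python
import numpy as np

def sigma_for_perplexity(d2, mu, n_iter=50, tol=1e-5):
    """Binary-search sigma_i so that the conditional distribution p_{j|i}
    built from the squared distances d2 has perplexity mu (Eq. 4)."""
    lo, hi = 1e-10, 1e10
    target = np.log2(mu)
    for _ in range(n_iter):
        sigma = 0.5 * (lo + hi)
        p = np.exp(-d2 / (2.0 * sigma**2))
        p /= max(p.sum(), 1e-12)
        entropy = -np.sum(p * np.log2(np.maximum(p, 1e-12)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            hi = sigma  # distribution too flat: shrink the kernel
        else:
            lo = sigma  # distribution too peaked: widen the kernel
    return sigma, p
```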
A Student's t-distribution with one degree of freedom is used to compute the joint-probability distribution in the low-dimensional space $Q$, where the positions of the data-points should be optimized. Given two low-dimensional points $y_i$ and $y_j$, the probability $q_{ij}$ that describes their similarity is given by:

$$q_{ij} = (1 + ||y_i - y_j||^2)^{-1} Z^{-1} \quad (5)$$

$$\text{with} \quad Z = \sum_{k=1}^{N} \sum_{l \neq k}^{N} (1 + ||y_k - y_l||^2)^{-1} \quad (6)$$
The gradient of the Kullback-Leibler divergence between $P$ and $Q$ is used to minimize $C$ (see Eq. 1). It indicates the change in position of the low-dimensional points for each step of the gradient descent and is given by:

$$\frac{\delta C}{\delta y_i} = 4 \left( F_i^{attr} - F_i^{rep} \right) \quad (7)$$

$$= 4 \left( \sum_{j \neq i}^{N} p_{ij} q_{ij} Z (y_i - y_j) - \sum_{j \neq i}^{N} q_{ij}^2 Z (y_i - y_j) \right) \quad (8)$$
The gradient descent can be seen as an N-body simulation [32], where each data-point exerts an attractive and a repulsive force on all the other points ($F_i^{attr}$ and $F_i^{rep}$).
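To make Eqs. 5 to 8 concrete, the following hypothetical NumPy sketch (not the paper's implementation, which relies on the Barnes-Hut approximation described next) performs one exact, O(N^2) gradient-descent step on an embedding `Y`:

```python
import numpy as np

def gradient_step(P, Y, learning_rate=100.0):
    """One exact gradient-descent step of tSNE (Eqs. 5-8).

    P: (N, N) high-dimensional joint probabilities; Y: (N, 2) embedding.
    """
    diff = Y[:, None, :] - Y[None, :, :]       # y_i - y_j for all pairs
    w = 1.0 / (1.0 + np.sum(diff**2, axis=2))  # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(w, 0.0)
    Z = w.sum()                                # Eq. 6
    Q = w / Z                                  # Eq. 5: q_ij = w_ij / Z
    # Eq. 8, using q_ij * Z == w_ij in both the attractive and repulsive term
    grad = 4.0 * np.sum(((P - Q) * w)[:, :, None] * diff, axis=1)
    return Y - learning_rate * grad
```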
3.1 Barnes-Hut-SNE
In the original tSNE, the force is computed using a brute-force approach, resulting in a computational and memory complexity of O(N^2). Barnes-Hut-SNE (BH-SNE) [5] is an evolution of the tSNE algorithm that introduces two different approximations to reduce the computational complexity to O(N log(N)) and the memory complexity to O(N).
The first approximation is based on the observation that the probability $p_{ij}$ is infinitesimal if $x_i$ and $x_j$ are dissimilar. Therefore, the similarities of a data-point $x_i$ can be computed taking into account only the points that belong to the set of nearest neighbors $N_i$. The cardinality of $N_i$ can be set to $K = \lfloor 3\mu \rfloor$, where $\mu$ is the user-selected perplexity and $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer. Without compromising the quality of the embedding [5], we can adopt a sparse approximation of the high-dimensional similarities. Eq. 3 can now be written as follows:

$$p_{j|i} = \begin{cases} \dfrac{\exp(-||x_i - x_j||^2 / (2\sigma_i^2))}{\sum_{k \in N_i} \exp(-||x_i - x_k||^2 / (2\sigma_i^2))} & \text{if } j \in N_i \\ 0 & \text{otherwise} \end{cases} \quad (9)$$
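A hypothetical sketch of this sparse construction (not the paper's code), reusing the illustrative `sigma_for_perplexity` helper from above; `neighbors[i]` is assumed to hold the indices of the $K = \lfloor 3\mu \rfloor$ nearest neighbors of point $i$, computed exactly (e.g., with a VP-Tree, as in BH-SNE) or approximately (as in A-tSNE):

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_conditional_P(X, neighbors, mu):
    """Sparse conditional similarities p_{j|i} (Eq. 9): row i only stores
    weights for the K nearest neighbors listed in neighbors[i]."""
    n = X.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        idx = list(neighbors[i])
        d2 = np.sum((X[idx] - X[i]) ** 2, axis=1)  # squared distances to N_i
        _, p = sigma_for_perplexity(d2, mu)        # Gaussian weights, sum to 1
        rows += [i] * len(idx)
        cols += idx
        vals += list(p)
    return csr_matrix((vals, (rows, cols)), shape=(n, n))
```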
The computation of the K-Nearest Neighbors is performed using a Vantage-Point Tree (VP-Tree) [23]. A VP-Tree is a data structure that computes KNN queries in a high-dimensional metric space in O(log(N)) time. It is a binary tree that stores, for each non-leaf node, a hyper-sphere centered on a data-point. The left child of each node contains the points that reside inside the hyper-sphere, whereas the right child contains the points outside of it.
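The construction of such a tree can be sketched as follows (a minimal, hypothetical Python version with the query and backtracking logic omitted; `dist` is any metric):

```python
import random

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # radius of the hyper-sphere centered on it
        self.inside = inside    # subtree with the points inside the sphere
        self.outside = outside  # subtree with the points outside the sphere

def build_vp_tree(points, dist):
    """Recursively pick a vantage point and split the remaining points by
    the median distance to it (inside vs. outside the hyper-sphere)."""
    if not points:
        return None
    vp = points[random.randrange(len(points))]
    rest = [p for p in points if p is not vp]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    radius = sorted(dist(vp, p) for p in rest)[len(rest) // 2]  # median
    inside = [p for p in rest if dist(vp, p) < radius]
    outside = [p for p in rest if dist(vp, p) >= radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist),
                  build_vp_tree(outside, dist))
```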

Fig. 1. Comparison between the traditional and our tSNE workflow. (a) Progressive Visual Analytics workflow for tSNE. (b) Progressive Visual Analytics workflow for A-tSNE. The eye icon marks modules which produce output for visualization, whereas the hand icon marks modules that allow manipulation by the user. The increased performance of the similarity computation allows the user to seamlessly manipulate the input data. The level of approximation can be visualized and the user can steer the refinement process to interesting regions.
The second approximation makes use of the formulation of the gradient presented in Eq. 7. As described above, tSNE can be seen as an N-body simulation, and thus the Barnes-Hut algorithm [33] can be used to reduce the computational complexity to O(N log(N)). For further details, we refer to van der Maaten [5].
4 A-TSNE IN PROGRESSIVE VISUAL ANALYTICS
In this work, we introduce Approximated-tSNE (A-tSNE), an evolution of the BH-SNE algorithm, which uses approximated computations of high-dimensional similarities to generate meaningful intermediate results. The level of approximation can be defined by the user to allow control over the trade-off between speed and quality. The level of approximation can be refined by the analyst in interesting regions of the embedding, making A-tSNE a computationally steerable algorithm [21]. tSNE is well suited for the application in Progressive Visual Analytics: after the initialization of the algorithm, the intermediate results generated during the iterative optimization process can be interpreted by the analyst while they change over time, as shown in previous work [11], [12]. Fig. 1a shows a typical Progressive Visual Analytics workflow for tSNE.
Algorithms that can be used in a Progressive Visual Analytics system often have a computational module, e.g., the initialization of the technique, that cannot be implemented in an iterative way, creating a speed bump [10] in the user's analysis. tSNE is a good example of such an algorithm. It consists of two computational modules that are serialized. In the first part of the algorithm, similarities between high-dimensional points are calculated. In the second module, a minimization of the cost function (Eq. 1) is computed by means of a gradient descent. The first module, depicted in light grey in Fig. 1a, is slow to compute and does not create any meaningful intermediate results.
We extend the Progressive Visual Analytics paradigm by introducing approximated computation, rather than aiming at exact computations, in the modules that are not suited for a per-iteration visualization. Fig. 1b shows the analytical workflow for A-tSNE. While the generation and the inspection of the intermediate results is unchanged, we introduce a refinement module, depicted in red in Fig. 1b, which can be used to refine the level of the approximation in the embedding in a concurrent way. Furthermore, the increased performance of the initialization module and the ability to update the high-dimensional similarities during the gradient-descent minimization allow the analyst to manipulate the high-dimensional data without waiting for the reinitialization of the algorithm. We follow the guidelines proposed by Stolper et al. [10], focusing on providing increasingly meaningful partial results during the minimization process (purple modules in Fig. 1). Furthermore, we impose the following requirements on the modules that compute the approximated similarities (grey and red modules in Fig. 1):
1) The performance gain due to the approximation must be high enough to enable interaction.
2) The amount of degradation caused by the approximation must be controllable. A small increase in approximation must not lead to a large degradation of the results.
3) The approximation quality can be measured and visualized to avoid misleading the user.
4) The approximation can be refined during the evolution. The refinement can be steered by the user.
In the following Sections 4.1 to 4.4, we describe the A-tSNE algorithm in detail, using the MNIST [34] dataset for illustration. The dataset consists of 60k labeled grayscale images of handwritten digits (compare Fig. 2a). Each image is represented as a 784-dimensional vector, corresponding to the gray values of the pixels in the image.
4.1 A-tSNE
A-tSNE improves the BH-SNE algorithm by using fast and Approximated KNN computations to build the approximated high-dimensional joint-probability distribution $P^A$, instead of the exact distribution $P$. The cost function $C(P^A, Q^A)$ is then minimized in order to obtain the approximated embedding described by $Q^A$.
The similarity between points can be computed using the set of approximated neighbors $N_i^A$, instead of the exact neighborhood $N_i$ (see Eq. 9). We define the precision of the KNN algorithm as $\rho$; it describes the average percentage of points in the approximated neighborhood $N_i^A$ that belong to the exact neighborhood $N_i$:

$$\rho = \sum_{i=1}^{N} \frac{\rho_i}{N}, \qquad \rho_k = \frac{|N_k^A \cap N_k|}{|N_k|}, \quad (10)$$

where $|\cdot|$ indicates the cardinality of the neighborhood.
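As an illustrative aside (not from the paper), Eq. 10 is straightforward to evaluate when both the approximated and the exact neighborhoods are available:

```python
def knn_precision(approx_neighbors, exact_neighbors):
    """Average precision rho (Eq. 10) of approximated KNN neighborhoods.

    Both arguments are lists of neighbor-index collections, one per point.
    """
    per_point = [
        len(set(a) & set(e)) / len(e)
        for a, e in zip(approx_neighbors, exact_neighbors)
    ]
    return sum(per_point) / len(per_point)
```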
The cardinality of $N_k$ is indirectly specified by the user through the choice of the perplexity $\mu$ (see Section 3.1).

References
van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.
Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Technical report, University of Toronto.
Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), 1129-1164.