Approximated and User Steerable tSNE for Progressive Visual Analytics
Citations
Guidelines for the use of flow cytometry and cell sorting in immunological studies (second edition)
The art of using t-SNE for single-cell transcriptomics
Towards better analysis of machine learning models: A visual analytics perspective
Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets
The Role of Uncertainty, Awareness, and Trust in Visual Analytics
References
Visualizing Data using t-SNE
Human-level control through deep reinforcement learning
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Learning Multiple Layers of Features from Tiny Images
Graph drawing by force-directed placement
Related Papers (5)
A global geometric framework for nonlinear dimensionality reduction
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Frequently Asked Questions (14)
Q2. What have the authors stated for future works in "Approximated and user steerable tSNE for progressive visual analytics"?
In the future, the authors want to explore the application of A-tSNE in other research scenarios.
Q3. Why does tSNE build on the iterative gradient descent technique?
The minimization in tSNE builds on the iterative gradient descent technique [4]; each iteration therefore yields a complete intermediate embedding that can be visualized directly, and the user can interact with these intermediate results.
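This property can be illustrated with a minimal sketch (not the authors' implementation): a toy tSNE gradient descent loop that yields the full 2-D embedding after every step, so a renderer can draw each intermediate result as it appears.

```python
# Minimal sketch of per-iteration tSNE gradient descent (illustrative only).
# P is a precomputed, symmetric high-dimensional affinity matrix summing to 1.
import numpy as np

def tsne_iterations(P, n_points, n_iters=100, lr=100.0, seed=0):
    """Yield the 2-D embedding after every gradient descent step."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n_points, 2))
    for _ in range(n_iters):
        # Student-t low-dimensional affinities Q.
        d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        num = 1.0 / (1.0 + d2)
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # Gradient of the KL divergence between P and Q.
        PQ = (P - Q) * num
        grad = 4.0 * (PQ.sum(1)[:, None] * Y - PQ @ Y)
        Y = Y - lr * grad
        yield Y.copy()  # a complete intermediate embedding, ready to draw
```

Each value produced by the generator is a valid snapshot of the evolving embedding, which is what makes a per-iteration visualization loop possible.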
Q4. How does the algorithm compute the approximated neighborhoods?
In this work, the authors use a space-partitioning technique, a Forest of Randomized KD-Trees [38], to compute the approximated neighborhoods.
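A toy sketch of the idea (pure NumPy, not the FLANN implementation the paper builds on [38]): each tree splits the data along a random direction, a query descends to one leaf per tree, and the union of the reached leaves is scanned exactly for the nearest neighbors.

```python
# Toy forest of randomized projection trees for approximated kNN.
import numpy as np

def build_tree(X, idx, rng, leaf_size=16):
    if len(idx) <= leaf_size:
        return ('leaf', idx)
    d = rng.normal(size=X.shape[1])          # random split direction
    proj = X[idx] @ d
    thr = np.median(proj)
    left, right = idx[proj <= thr], idx[proj > thr]
    if len(left) == 0 or len(right) == 0:    # degenerate split -> stop
        return ('leaf', idx)
    return ('node', d, thr, build_tree(X, left, rng, leaf_size),
            build_tree(X, right, rng, leaf_size))

def query_tree(tree, x):
    while tree[0] == 'node':
        _, d, thr, left, right = tree
        tree = left if x @ d <= thr else right
    return tree[1]

def approx_knn(X, x, forest, k):
    # Union of candidate leaves from every tree, then an exact scan.
    cand = np.unique(np.concatenate([query_tree(t, x) for t in forest]))
    dist = np.linalg.norm(X[cand] - x, axis=1)
    return cand[np.argsort(dist)[:k]]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
forest = [build_tree(X, np.arange(len(X)), rng) for _ in range(8)]
neigh = approx_knn(X, X[0], forest, k=5)
```

More trees raise the chance that the true neighbors land in at least one visited leaf, which is the precision/speed trade-off the paper exploits.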
Q5. How is the geometry shader used to generate a quad for each point?
A geometry shader generates a quad for each point, colored using the precomputed texture; the KDE is then obtained by drawing into a Frame Buffer Object with additive blending [30].
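A CPU analogue of this GPU technique may make it concrete (a sketch, not the paper's shader code): instead of a geometry shader emitting a textured quad per point into a Frame Buffer Object, a precomputed Gaussian kernel patch is "splatted" into a pixel buffer, with additive blending reduced to plain `+=` accumulation.

```python
# CPU sketch of KDE by additive splatting of a precomputed kernel texture.
import numpy as np

def kde_splat(points, size=128, radius=8, sigma=3.0):
    # Precomputed kernel patch (the quad's texture in the GPU version).
    ax = np.arange(-radius, radius + 1)
    gx, gy = np.meshgrid(ax, ax)
    kernel = np.exp(-(gx**2 + gy**2) / (2 * sigma**2))

    field = np.zeros((size, size))
    for x, y in points:                      # one "quad" per point
        cx, cy = int(round(x)), int(round(y))
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, size)
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, size)
        kx0, ky0 = x0 - (cx - radius), y0 - (cy - radius)
        field[y0:y1, x0:x1] += kernel[ky0:ky0 + (y1 - y0),
                                      kx0:kx0 + (x1 - x0)]  # additive blend
    return field
```

On the GPU the same accumulation happens in parallel for all points, which is why the density field can be recomputed at interactive rates.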
Q6. Why does the user have to wait for the first intermediate result to be generated?
Even with a per-iteration visualization of the intermediate results [10], [11], [12], [13], the initialization time forces the user to wait minutes, or even hours, before the first intermediate result can be generated on a state-of-the-art desktop computer.
Q7. How does the user make sure the clusters are not an artifact?
To make sure that the clusters are not an artifact introduced by the approximated similarities, the user refines the selected data points while the embedding evolves.
Q8. What are the requirements for the module that computes the approximated similarities?
The authors impose the following requirements on the modules that compute the approximated similarities (grey and red modules in Fig. 1): 1) the performance gain due to the approximation must be high enough to enable interaction.
Q9. What is the importance of allowing an interactive feedback loop?
In such a setting, it is crucial to allow an interactive feedback loop between modeling the data (i.e., finding the right number of dimensions for the PCA before embedding) and visualizing the data.
Q10. What are the three strategies used to select the data points to be refined?
The authors propose three strategies for selecting the data points to be refined: user selection, breadth-first search, and density-based refinement.
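The breadth-first-search strategy can be sketched as follows (a hedged illustration; `knn` and the function name are hypothetical, not the authors' code): starting from the points the user selected, walk the approximated kNN graph level by level and queue each newly reached point for refinement.

```python
# Sketch of BFS-based selection of points to refine over the kNN graph.
from collections import deque

def bfs_refinement_order(seed_points, knn, max_points=None):
    """knn: dict mapping a point index to its (approximated) neighbor indices."""
    order, seen = [], set(seed_points)
    queue = deque(seed_points)
    while queue and (max_points is None or len(order) < max_points):
        p = queue.popleft()
        order.append(p)                  # refine this point's neighborhood
        for q in knn[p]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return order
```

The `max_points` cap keeps each refinement batch small, so the gradient descent is never stalled for long.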
Q11. What is the strategy to refine the embedding?
A naive strategy to refine the embedding is to progressively update the neighborhoods of all the points in X while the gradient descent optimization is computed.
Q12. What is the significance of the algorithm when dealing with real-time data?
Liu et al. [47] demonstrate that, when dealing with real-time data, the response time of the algorithm is of great importance to the user.
Q13. How long does it take to compute the high-dimensional similarities?
With such a parameterization, A-tSNE computes the high-dimensional similarities in ≈ 51 seconds, while BH-SNE requires 3 hours and 50 minutes.
Q14. How can the BH-SNE algorithm be used to preserve the structure of the data?
Reasonable results can be achieved even with low precision because each data point is usually connected to a large number of springs; therefore, the overall structure can be preserved.
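A toy illustration of this spring argument (not from the paper): a point attached by identical springs to many anchors settles at the anchors' mean, and removing a small fraction of the springs (a low-precision neighborhood) moves that equilibrium only slightly compared to the spread of the anchors, so the overall structure survives.

```python
# Toy demonstration: dropping some springs barely shifts the equilibrium.
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.normal(size=(100, 2))          # many neighbors / springs
exact = anchors.mean(axis=0)                 # equilibrium with all springs

keep = rng.random(100) > 0.2                 # drop ~20% of the springs
approx = anchors[keep].mean(axis=0)          # equilibrium with the rest

shift = np.linalg.norm(exact - approx)       # displacement of the point
spread = anchors.std()                       # scale of the local structure
```

The displacement `shift` stays well below the anchor spread, mirroring why approximated similarities still yield a faithful embedding.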