
Showing papers in "IEEE Transactions on Visualization and Computer Graphics in 2019"


Journal ArticleDOI
TL;DR: A survey of the role of visual analytics in deep learning research is presented, which highlights its short yet impactful history and thoroughly summarizes the state-of-the-art using a human-centered interrogative framework, focusing on the Five W's and How.
Abstract: Deep learning has recently seen rapid development and received significant attention due to its state-of-the-art performance on problems previously thought to be hard. However, because of the internal complexity and nonlinear structure of deep neural networks, the underlying decision-making processes that allow these models to achieve such performance are challenging, and sometimes mystifying, to interpret. As deep learning spreads across domains, it is of paramount importance that we equip users of deep learning with tools for understanding when a model works correctly, when it fails, and ultimately how to improve its performance. Standardized toolkits for building neural networks have helped democratize deep learning; visual analytics systems have now been developed to support model explanation, interpretation, debugging, and improvement. We present a survey of the role of visual analytics in deep learning research, which highlights its short yet impactful history and thoroughly summarizes the state-of-the-art using a human-centered interrogative framework, focusing on the Five W's and How (Why, Who, What, How, When, and Where). We conclude by highlighting research directions and open research problems. This survey helps researchers and practitioners in both visual analytics and deep learning to quickly learn key aspects of this young and rapidly growing body of research, whose impact spans a diverse range of domains.

483 citations


Journal ArticleDOI
TL;DR: This work proposes modeling visualization design knowledge as a collection of constraints, in conjunction with a method to learn weights for soft constraints from experimental data, which can take theoretical design knowledge and express it in a concrete, extensible, and testable form.
Abstract: There exists a gap between visualization design guidelines and their application in visualization tools. While empirical studies can provide design guidance, we lack a formal framework for representing design knowledge, integrating results across studies, and applying this knowledge in automated design tools that promote effective encodings and facilitate visual exploration. We propose modeling visualization design knowledge as a collection of constraints, in conjunction with a method to learn weights for soft constraints from experimental data. Using constraints, we can take theoretical design knowledge and express it in a concrete, extensible, and testable form: the resulting models can recommend visualization designs and can easily be augmented with additional constraints or updated weights. We implement our approach in Draco, a constraint-based system based on Answer Set Programming (ASP). We demonstrate how to construct increasingly sophisticated automated visualization design systems, including systems based on weights learned directly from the results of graphical perception experiments.
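To make the constraint idea concrete, here is a minimal Python sketch of the scoring logic (not Draco's actual ASP encoding; the constraint names, weights, and spec fields are hypothetical): hard constraints reject invalid specifications, and weighted soft-constraint violations rank the remaining candidates, with the lowest total cost recommended.

```python
# Minimal sketch of constraint-based design recommendation (hypothetical
# constraints and weights, not Draco's ASP rules): hard constraints filter,
# weighted soft-constraint violations rank what remains.

def hard_violation(spec):
    # Example hard constraint: bar charts need a zero-based quantitative axis.
    return spec["mark"] == "bar" and not spec.get("zero", True)

SOFT_WEIGHTS = {              # in Draco these weights are learned from
    "continuous_x": 0.3,      # perception experiments; values here are toy
    "aggregate_preferred": 0.7,
}

def soft_cost(spec):
    cost = 0.0
    if spec["x_type"] == "quantitative":
        cost += SOFT_WEIGHTS["continuous_x"]
    if not spec.get("aggregate"):
        cost += SOFT_WEIGHTS["aggregate_preferred"]
    return cost

candidates = [
    {"mark": "bar", "x_type": "nominal", "aggregate": "mean", "zero": True},
    {"mark": "point", "x_type": "quantitative", "aggregate": None},
]
valid = [s for s in candidates if not hard_violation(s)]
best = min(valid, key=soft_cost)   # recommended design = lowest total cost
print(best)
```

Because the knowledge lives in declarative constraints plus weights, adding a new guideline or re-learning the weights updates the recommender without touching the search procedure.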

238 citations


Journal ArticleDOI
TL;DR: This paper presents a visual analysis tool that allows interaction and "what if"-style exploration of trained sequence-to-sequence models through each stage of the translation process, identifying which patterns have been learned, detecting model errors, and probing the model with counterfactual scenarios.
Abstract: Neural sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work with a five-stage black-box pipeline that begins with encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction and “what if”-style exploration of trained sequence-to-sequence models through each stage of the translation process. The aim is to identify which patterns have been learned, to detect model errors, and to probe the model with counterfactual scenarios. We demonstrate the utility of our tool through several real-world sequence-to-sequence use cases on large-scale models.
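For readers unfamiliar with the pipeline being visualized, a minimal encoder-decoder forward pass looks like the following (a PyTorch sketch with toy sizes and untrained weights; real translation models add attention, beam search, and large vocabularies). A "what if" probe amounts to editing the source tokens or a decoder choice and re-running the loop.

```python
# Minimal encoder-decoder sketch of the kind of model the tool instruments.
import torch
import torch.nn as nn

vocab, hidden = 100, 32
embed = nn.Embedding(vocab, hidden)
encoder = nn.GRU(hidden, hidden, batch_first=True)
decoder = nn.GRU(hidden, hidden, batch_first=True)
project = nn.Linear(hidden, vocab)

src = torch.randint(0, vocab, (1, 7))       # source token ids
_, state = encoder(embed(src))              # encode to a vector space

tok = torch.zeros(1, 1, dtype=torch.long)   # hypothetical start-token id 0
out = []
for _ in range(5):                          # greedy decoding, step by step
    dec_out, state = decoder(embed(tok), state)
    tok = project(dec_out).argmax(-1)       # pick the next target token
    out.append(tok.item())
print(out)
```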

189 citations


Journal ArticleDOI
TL;DR: This survey provides a detailed analysis and taxonomies of MDP techniques according to their main properties and traits, discussing the impact of such properties on visual perception and other human factors, and providing future research axes to fill discovered gaps in this domain.
Abstract: Visual analysis of multidimensional data requires expressive and effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots whose visual patterns reflect some notion of similarity in the original data. However, MDP come with distortions that make these visual patterns untrustworthy, hindering users from inferring actual data characteristics. Moreover, the patterns present in the scatter plots might not be enough to allow a clear understanding of multidimensional data, motivating the development of layout enrichment methodologies to operate together with MDP. This survey attempts to cover the main aspects of MDP as a visualization and visual analytics tool. It provides a detailed analysis and taxonomies of MDP techniques according to their main properties and traits, discussing the impact of such properties on visual perception and other human factors. The survey also covers the different types of distortions that can result from MDP mappings and overviews existing mechanisms to quantitatively evaluate such distortions. A qualitative analysis of the impact of distortions on the different analytic tasks performed by users when exploring multidimensional data through MDP is also presented. Guidelines for choosing the best MDP for an intended task are provided as a result of this analysis. Finally, layout enrichment schemes to counteract MDP distortions and/or reveal relevant information not directly inferable from the scatter plot are reviewed and discussed in the light of new taxonomies. We conclude the survey by providing future research axes to fill discovered gaps in this domain.
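One of the distortion measures this literature quantifies is easy to state precisely. As a reference point, here is a minimal numpy/scipy sketch of normalized stress, which compares pairwise distances before and after projection (the toy data and the "keep two coordinates" projection are illustrative only):

```python
# Normalized stress: a common projection-distortion measure.
import numpy as np
from scipy.spatial.distance import pdist

def normalized_stress(X_high, X_2d):
    """Lower is better; 0 means all pairwise distances are preserved."""
    D = pdist(X_high)   # condensed pairwise distances in data space
    d = pdist(X_2d)     # pairwise distances in the projection
    return np.sum((D - d) ** 2) / np.sum(D ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
P = X[:, :2]                      # naive projection: keep two coordinates
print(normalized_stress(X, P))
```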

185 citations


Journal ArticleDOI
TL;DR: This study designs a visual analytics solution to increase the interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers, and demonstrates how substantial changes were made to the state-of-the-art RNN model RETAIN to make use of temporal information and increase interactivity.
Abstract: We have recently seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and various other events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users to understand why the model makes a particular prediction. This black-box nature of RNNs can impede their wide adoption in clinical practice. Furthermore, we have no established methods to interactively leverage users' domain expertise and prior knowledge as inputs for steering the model. Therefore, our design study aims to provide a visual analytics solution to increase the interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following an iterative design process with these experts, we design, implement, and evaluate a visual analytics tool called RetainVis, which couples a newly improved, interpretable, and interactive RNN-based model called RetainEX with visualizations for users' exploration of EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insights into how individual medical codes contribute to making risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model called RETAIN in order to make use of temporal information and increase interactivity. This study provides a useful guideline for researchers who aim to design an interpretable and interactive visual analytics tool for RNNs.

185 citations


Journal ArticleDOI
TL;DR: This paper examines the broad scope of how dashboards are used in practice through an analysis of dashboard examples and documentation about their use, and proposes a framework and literature review to better support dashboard design, implementation, and use.
Abstract: Dashboards are one of the most common use cases for data visualization, and their design and contexts of use are considerably different from exploratory visualization tools. In this paper, we look at the broad scope of how dashboards are used in practice through an analysis of dashboard examples and documentation about their use. We systematically review the literature surrounding dashboard use, construct a design space for dashboards, and identify major dashboard types. We characterize dashboards by their design goals, levels of interaction, and the practices around them. Our framework and literature review suggest a number of fruitful research directions to better support dashboard design, implementation, and use.

182 citations


Journal ArticleDOI
Jiawei Zhang, Yang Wang, Piero Molino, Lezhi Li, David S. Ebert
TL;DR: Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner, is presented; it is designed as a generic, model-agnostic framework.
Abstract: Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (e.g., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models' outcomes and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied.
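A minimal sketch of the model-agnostic comparison idea, using scikit-learn stand-ins (the models, dataset, and 0.3 disagreement threshold are illustrative, not from the paper): both models are queried only through their outputs, and the per-instance probabilities they assign to the true class are paired up, which is the quantity behind a scatterplot-style summary.

```python
# Model-agnostic comparison in the spirit of Manifold: treat both models as
# black boxes and compare only their per-instance output probabilities.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

m1 = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
m2 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Probability each model assigns to the *true* class of each test instance;
# plotting p1 vs. p2 gives a scatterplot-style model-comparison summary.
p1 = m1.predict_proba(X_te)[np.arange(len(y_te)), y_te]
p2 = m2.predict_proba(X_te)[np.arange(len(y_te)), y_te]
disagree = np.abs(p1 - p2) > 0.3   # instances worth inspecting first
print(np.where(disagree)[0])
```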

171 citations


Journal ArticleDOI
TL;DR: RuleMatrix, a matrix-based visualization of rules that helps users navigate and verify the rules and the black-box model, is designed and evaluated via two use cases and a usability study.
Abstract: With the growing adoption of machine learning techniques, there is a surge of research interest towards making machine learning systems more transparent and interpretable. Various visualizations have been developed to help model developers understand, diagnose, and refine machine learning models. However, a large group of potential but neglected users are domain experts who have little knowledge of machine learning but are expected to work with machine learning systems. In this paper, we present an interactive visualization technique to help users with little expertise in machine learning to understand, explore, and validate predictive models. By viewing the model as a black box, we extract a standardized rule-based knowledge representation from its input-output behavior. Then, we design RuleMatrix, a matrix-based visualization of rules to help users navigate and verify the rules and the black-box model. We evaluate the effectiveness of RuleMatrix via two use cases and a usability study.
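A minimal sketch of the extract-rules-from-a-black-box step, with a shallow scikit-learn decision tree standing in for the paper's rule-extraction method (models and dataset are illustrative): the surrogate is trained on the black box's predictions rather than the ground truth, so the printed rules describe the model, and agreement with the black box measures rule fidelity.

```python
# Surrogate rule extraction from input-output behavior only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Query the black box and train the surrogate on its *predictions*, not y.
y_bb = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

feature_names = load_breast_cancer().feature_names.tolist()
print(export_text(surrogate, feature_names=feature_names))
print("fidelity:", surrogate.score(X, y_bb))  # how well the rules mimic the model
```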

146 citations


Journal ArticleDOI
TL;DR: Voder is presented, a system that lets users interact with automatically-generated data facts to explore both alternative visualizations to convey a data fact as well as a set of embellishments to highlight a fact within a visualization.
Abstract: Recently, an increasing number of visualization systems have begun to incorporate natural language generation (NLG) capabilities into their interfaces. NLG-based visualization systems typically leverage a suite of statistical functions to automatically extract key facts about the underlying data and surface them as natural language sentences alongside visualizations. With current systems, users are typically required to read the system-generated sentences and mentally map them back to the accompanying visualization. However, depending on the features of the visualization (e.g., visualization type, data density) and the complexity of the data fact, mentally mapping facts to visualizations can be a challenging task. Furthermore, more than one visualization could be used to illustrate a single data fact. Unfortunately, current tools provide little or no support for users to explore such alternatives. In this paper, we explore how system-generated data facts can be treated as interactive widgets to help users interpret visualizations and communicate their findings. We present Voder, a system that lets users interact with automatically-generated data facts to explore both alternative visualizations to convey a data fact as well as a set of embellishments to highlight a fact within a visualization. Leveraging data facts as interactive widgets, Voder also facilitates data fact-based visualization search. To assess Voder's design and features, we conducted a preliminary user study with 12 participants having varying levels of experience with visualization tools. Participant feedback suggested that interactive data facts aided them in interpreting visualizations. Participants also stated that the suggestions surfaced through the facts helped them explore alternative visualizations and embellishments to communicate individual data facts.
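The key idea is that a data fact is a structured object rather than just a sentence, so it can carry pointers to alternative visualizations and embellishments. A toy pandas sketch follows (the fact templates, suggested charts, and embellishment fields are hypothetical, not Voder's actual catalog):

```python
# Data facts as structured, interactive objects (hypothetical schema).
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "C"], "sales": [120, 95, 180]})

facts = []
top = df.loc[df["sales"].idxmax()]
facts.append({
    "type": "extremum",
    "text": f"{top['city']} has the highest sales ({top['sales']}).",
    "suggested_vis": ["bar chart (sorted)", "dot plot"],
    "embellishment": {"highlight": top["city"]},
})
facts.append({
    "type": "average",
    "text": f"Average sales across cities is {df['sales'].mean():.0f}.",
    "suggested_vis": ["bar chart with mean rule"],
    "embellishment": {"reference_line": df["sales"].mean()},
})
for f in facts:
    print(f["text"], "->", f["suggested_vis"])
```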

132 citations


Journal ArticleDOI
TL;DR: GAN Lab is presented, the first interactive visualization tool designed for non-experts to learn and experiment with Generative Adversarial Networks (GANs), a popular class of complex deep learning models.
Abstract: Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology. While visual and interactive approaches have been successfully developed to help people more easily learn deep learning, most existing tools focus on simpler models. In this work, we present GAN Lab, the first interactive visualization tool designed for non-experts to learn and experiment with Generative Adversarial Networks (GANs), a popular class of complex deep learning models. With GAN Lab, users can interactively train generative models and visualize the dynamic training process's intermediate results. GAN Lab tightly integrates a model overview graph that summarizes GAN's structure, and a layered distributions view that helps users interpret the interplay between submodels. GAN Lab introduces new interactive experimentation features for learning complex deep learning models, such as step-by-step training at multiple levels of abstraction for understanding intricate training dynamics. Implemented using TensorFlow.js, GAN Lab is accessible to anyone via modern web browsers, without the need for installation or specialized hardware, overcoming a major practical challenge in deploying interactive tools for deep learning.
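For reference, the training loop that GAN Lab animates is the standard adversarial one. A minimal PyTorch sketch on 1D data follows (GAN Lab itself is implemented in TensorFlow.js; the network sizes, learning rates, and target distribution here are arbitrary):

```python
# Minimal GAN training loop of the kind GAN Lab visualizes, on 1D data.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # target distribution N(2, 0.5)
    z = torch.randn(64, 1)                  # noise input to the generator
    fake = G(z)

    # Discriminator step: push real -> 1 and fake -> 0 (fake detached).
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool the discriminator into outputting 1.
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 1)).mean().item())   # should approach 2.0
```

The "layered distributions" view in the tool corresponds to watching how samples of G(z) migrate toward the real distribution as this loop runs.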

130 citations


Journal ArticleDOI
TL;DR: A visual analytics system for interpreting random forest models and predictions is proposed; it summarizes the decision paths in random forests, which reflects the working mechanism of the model and reduces users' mental burden of interpretation.
Abstract: As an ensemble model that consists of many independent decision trees, random forests generate predictions by feeding the input to the internal trees and summarizing their outputs. The ensemble nature of the model helps random forests outperform any individual decision tree. However, it also leads to poor model interpretability, which significantly hinders the model from being used in fields that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection. The interpretation challenges stem from the variety and complexity of the constituent decision trees. Each decision tree has its unique structure and properties, such as the features used in the tree and the feature threshold in each tree node. Thus, a data input may lead to a variety of decision paths. To understand how a final prediction is reached, it is necessary to understand and compare all decision paths in the context of all tree structures, which is a huge challenge for any user. In this paper, we propose a visual analytics system aimed at interpreting random forest models and predictions. In addition to providing users with all the tree information, we summarize the decision paths in random forests, which reflects the working mechanism of the model and reduces users' mental burden of interpretation. To demonstrate the effectiveness of our system, two usage scenarios and a qualitative user study are conducted.
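A minimal scikit-learn sketch of the path-summarization idea (aggregating by feature counts is a simplification of the paper's decision-path summaries): for one input, walk each tree's decision path and tally which features the ensemble actually consulted.

```python
# Summarize which features drive one prediction across a forest's paths.
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(data.data, data.target)

x = data.data[:1]               # one instance of interest
feature_counts = Counter()
for tree in rf.estimators_:
    node_indicator = tree.decision_path(x)   # sparse matrix of visited nodes
    for node in node_indicator.indices:      # nodes along this tree's path
        f = tree.tree_.feature[node]
        if f >= 0:                           # internal split node (leaves are -2)
            feature_counts[data.feature_names[f]] += 1

print(feature_counts.most_common())   # features the ensemble consulted most
```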

Journal ArticleDOI
TL;DR: DXR, a toolkit for building immersive data visualizations based on the Unity development platform, is presented; with DXR, developers can efficiently specify visualization designs using a concise declarative visualization grammar inspired by Vega-Lite.
Abstract: This paper presents DXR, a toolkit for building immersive data visualizations based on the Unity development platform. Over the past years, immersive data visualizations in augmented and virtual reality (AR, VR) have been emerging as a promising medium for data sense-making beyond the desktop. However, creating immersive visualizations remains challenging, and often requires complex low-level programming and tedious manual encoding of data attributes to geometric and visual properties. These hurdles can hinder the iterative idea-to-prototype process, especially for developers without experience in 3D graphics, AR, and VR programming. With DXR, developers can efficiently specify visualization designs using a concise declarative visualization grammar inspired by Vega-Lite. DXR further provides a GUI for easy and quick edits and previews of visualization designs in situ, i.e., while immersed in the virtual world. DXR also provides reusable templates and customizable graphical marks, enabling unique and engaging visualizations. We demonstrate the flexibility of DXR through several examples spanning a wide range of applications.
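To illustrate the declarative style (a hypothetical spec, not DXR's exact JSON syntax; the file path and field names are invented), a Vega-Lite-like mapping from data fields to 3D channels might look as follows, with a toy interpreter normalizing fields into unit scene coordinates the way a Unity-side runtime would before instantiating marks:

```python
# Hypothetical Vega-Lite-style spec for an immersive 3D visualization.
spec = {
    "data": {"url": "Assets/StreamingAssets/hurricanes.csv"},  # hypothetical
    "mark": "sphere",
    "encoding": {
        "x": {"field": "longitude", "type": "quantitative"},
        "y": {"field": "altitude",  "type": "quantitative"},
        "z": {"field": "latitude",  "type": "quantitative"},
        "size":  {"field": "windSpeed", "type": "quantitative"},
        "color": {"field": "pressure",  "type": "quantitative"},
    },
}

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo or 1) for v in values]

# Toy interpreter: map each data row to a mark position in unit scene space.
rows = [{"longitude": -80.1, "altitude": 1.2, "latitude": 25.8,
         "windSpeed": 120, "pressure": 950},
        {"longitude": -82.4, "altitude": 0.8, "latitude": 27.1,
         "windSpeed": 85, "pressure": 980}]
for channel in ("x", "y", "z"):
    field = spec["encoding"][channel]["field"]
    for row, v in zip(rows, normalize([r[field] for r in rows])):
        row[channel] = v
print(rows)
```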

Journal ArticleDOI
TL;DR: This paper studies ensemble visualization works from the recent decade and categorizes them from two perspectives: (1) their proposed visualization techniques and (2) their involved analytic tasks.
Abstract: Over the last decade, ensemble visualization has witnessed significant development due to the wide availability of ensemble data and the increasing visualization needs of a variety of disciplines. From the data analysis point of view, it can be observed that many ensemble visualization works focus on the same facets of ensemble data and use similar data aggregation or uncertainty modeling methods. However, the lack of reflection on those essential commonalities, and of a systematic overview of those works, prevents visualization researchers from effectively identifying new or unsolved problems and planning further developments. In this paper, we take a holistic perspective and provide a survey of ensemble visualization. Specifically, we study ensemble visualization works from the recent decade and categorize them from two perspectives: (1) their proposed visualization techniques; and (2) their involved analytic tasks. For the first perspective, we focus on elaborating how conventional visualization techniques (e.g., surface and volume visualization techniques) have been adapted to ensemble data; for the second perspective, we emphasize how analytic tasks (e.g., comparison, clustering) have been performed differently for ensemble data. From this study of the ensemble visualization literature, we have also identified several research trends, as well as some future research opportunities.

Journal ArticleDOI
TL;DR: This survey reviews information visualization approaches to digital cultural heritage collections, reflects on the state of the art in techniques and design choices, contextualizes the survey with humanist perspectives on the field, and points out opportunities for future research.
Abstract: After decades of digitization, large cultural heritage collections have emerged on the web, which contain massive stocks of content from galleries, libraries, archives, and museums. This increase in digital cultural heritage data promises new modes of analysis and increased levels of access for academic scholars and casual users alike. Going beyond the standard representations of search-centric and grid-based interfaces, a multitude of approaches has recently started to enable visual access to cultural collections, and to explore them as complex and comprehensive information spaces by the means of interactive visualizations. In contrast to conventional web interfaces, we witness a widening spectrum of innovative visualization types specially designed for rich collections from the cultural heritage sector. This new class of information visualizations gives rise to a notable diversity of interaction and representation techniques while lending currency and urgency to a discussion about principles such as serendipity, generosity, and criticality in connection with visualization design. With this survey, we review information visualization approaches to digital cultural heritage collections and reflect on the state of the art in techniques and design choices. We contextualize our survey with humanist perspectives on the field and point out opportunities for future research.

Journal ArticleDOI
TL;DR: A taxonomy for characterizing the decisions made in designing an evaluation of an uncertainty visualization is presented, concluding with a concrete set of recommendations for evaluators designed to reduce the mismatch between how uncertainty is conceptualized in visualization versus other fields.
Abstract: Understanding and accounting for uncertainty is critical to effectively reasoning about visualized data. However, evaluating the impact of an uncertainty visualization is complex due to the difficulties that people have interpreting uncertainty and the challenge of defining correct behavior with uncertainty information. Currently, evaluators of uncertainty visualization must rely on general purpose visualization evaluation frameworks which can be ill-equipped to provide guidance with the unique difficulties of assessing judgments under uncertainty. To help evaluators navigate these complexities, we present a taxonomy for characterizing decisions made in designing an evaluation of an uncertainty visualization. Our taxonomy differentiates six levels of decisions that comprise an uncertainty visualization evaluation: the behavioral targets of the study, expected effects from an uncertainty visualization, evaluation goals, measures, elicitation techniques, and analysis approaches. Applying our taxonomy to 86 user studies of uncertainty visualizations, we find that existing evaluation practice, particularly in visualization research, focuses on Performance and Satisfaction-based measures that assume more predictable and statistically-driven judgment behavior than is suggested by research on human judgment and decision making. We reflect on common themes in evaluation practice concerning the interpretation and semantics of uncertainty, the use of confidence reporting, and a bias toward evaluating performance as accuracy rather than decision quality. We conclude with a concrete set of recommendations for evaluators designed to reduce the mismatch between the conceptualization of uncertainty in visualization versus other fields.

Journal ArticleDOI
TL;DR: The accuracy of the mesh labeling method exceeds that of state-of-the-art geometry-based methods, and the method is also robust to foreign matter on the model surface, e.g., air bubbles, dental accessories, and more.
Abstract: In this paper, we present a novel approach for 3D dental model segmentation via deep Convolutional Neural Networks (CNNs). Traditional geometry-based methods tend to produce undesirable results due to the complex appearance of human teeth (e.g., missing/rotten teeth, feature-less regions, crowded teeth, extra medical attachments, etc.). Furthermore, labeling of individual teeth is rarely supported by traditional tooth segmentation methods. To address these issues, we propose to learn a generic and robust segmentation model by exploiting deep CNNs. The segmentation task is achieved by labeling each mesh face. We extract a set of geometry features as per-face feature representations. In the training step, the network is fed with those features and produces a probability vector, of which each element indicates the probability of a face belonging to the corresponding model part. To this end, we extensively experiment with various network structures and eventually arrive at a two-level hierarchical CNN structure for tooth segmentation: one level for teeth-gingiva labeling and the other for inter-teeth labeling. Further, we propose a novel boundary-aware tooth simplification method to significantly improve efficiency in the feature extraction stage. After CNN prediction, we perform graph-based label optimization and further refine the boundary with an improved version of fuzzy clustering. The accuracy of our mesh labeling method exceeds that of state-of-the-art geometry-based methods, reaching 99.06 percent measured by area, which makes it directly applicable in orthodontic CAD systems. It is also robust to foreign matter on the model surface, e.g., air bubbles, dental accessories, and more.
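A minimal sketch of the two-level hierarchical labeling described above, with scikit-learn MLPs standing in for the paper's CNNs and random vectors standing in for the extracted per-face geometry features (all data, dimensions, and label counts are toy): level one separates gingiva from tooth faces, and level two assigns tooth identities only to faces that pass level one.

```python
# Two-level hierarchical face labeling (sklearn MLP stand-ins, toy data).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
faces = rng.normal(size=(600, 20))      # 600 mesh faces, 20-dim features
is_tooth = rng.integers(0, 2, 600)      # level-1 labels: gingiva vs. tooth
tooth_id = rng.integers(1, 15, 600)     # level-2 labels: which tooth

# Level 1: teeth-gingiva labeling over all faces.
level1 = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                       random_state=0).fit(faces, is_tooth)

# Level 2: inter-teeth labeling, trained only on tooth faces.
mask = is_tooth == 1
level2 = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                       random_state=0).fit(faces[mask], tooth_id[mask])

def label_faces(F):
    out = np.zeros(len(F), dtype=int)   # 0 = gingiva
    tooth_faces = level1.predict(F) == 1
    if tooth_faces.any():
        out[tooth_faces] = level2.predict(F[tooth_faces])
    return out

print(label_faces(faces[:10]))
```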

Journal ArticleDOI
TL;DR: This work proposes DQNViz, a visual analytics system that exposes details of the otherwise blind training process at four levels and enables users to dive into the large experience space of the DQN agent for comprehensive analysis, and demonstrates that it can effectively help domain experts understand, diagnose, and potentially improve DQN models.
Abstract: Deep Q-Network (DQN), one type of deep reinforcement learning model, aims to train an intelligent agent that acquires optimal actions while interacting with an environment. The model is well known for its ability to surpass professional human players across many Atari 2600 games. Despite this superhuman performance, in-depth understanding of the model and interpretation of the sophisticated behaviors of the DQN agent remain challenging tasks, due to the long model training process and the large number of experiences dynamically generated by the agent. In this work, we propose DQNViz, a visual analytics system to expose details of the otherwise blind training process at four levels, and to enable users to dive into the large experience space of the agent for comprehensive analysis. As an initial attempt at visualizing DQN models, our work focuses on Atari games with a simple action space, most notably the Breakout game. From our visual analytics of the agent's experiences, we extract useful action/reward patterns that help to interpret the model and control the training. Through multiple case studies conducted together with deep learning experts, we demonstrate that DQNViz can effectively help domain experts to understand, diagnose, and potentially improve DQN models.
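The training process DQNViz exposes is driven by the standard DQN update. A minimal PyTorch sketch with toy state/action sizes follows (replay buffers, target-network sync schedules, and the Atari CNN are omitted):

```python
# Epsilon-greedy action selection plus one temporal-difference (TD) update.
import random
import torch
import torch.nn as nn

n_state, n_action = 4, 3
q_net = nn.Sequential(nn.Linear(n_state, 32), nn.ReLU(), nn.Linear(32, n_action))
target_net = nn.Sequential(nn.Linear(n_state, 32), nn.ReLU(), nn.Linear(32, n_action))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.99, 0.1

def act(state):
    if random.random() < epsilon:        # explore
        return random.randrange(n_action)
    with torch.no_grad():                # exploit: argmax over Q-values
        return q_net(state).argmax().item()

# One TD update on a toy transition (s, a, r, s').
s, s_next = torch.randn(1, n_state), torch.randn(1, n_state)
a, r = act(s[0]), 1.0
with torch.no_grad():
    target = r + gamma * target_net(s_next).max()
loss = (q_net(s)[0, a] - target) ** 2
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```

The action/reward patterns the paper mines come from logging exactly these (state, action, reward) experiences over millions of steps.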

Journal ArticleDOI
TL;DR: In this article, the authors report results from a crowdsourced experiment to evaluate the effectiveness of five small scale (5-34 data points) two-dimensional visualization types (Table, Line Chart, Bar Chart, Scatterplot, and Pie Chart) across ten common data analysis tasks using two datasets.
Abstract: Visualizations of tabular data are widely used; understanding their effectiveness in different task and data contexts is fundamental to scaling their impact. However, little is known about how basic tabular data visualizations perform across varying data analysis tasks. In this paper, we report results from a crowdsourced experiment to evaluate the effectiveness of five small-scale (5-34 data points) two-dimensional visualization types—Table, Line Chart, Bar Chart, Scatterplot, and Pie Chart—across ten common data analysis tasks using two datasets. We find that the effectiveness of these visualization types varies significantly across tasks, suggesting that visualization design would benefit from considering context-dependent effectiveness. Based on our findings, we derive recommendations on which visualizations to choose for different tasks. Finally, we train a decision tree on the data we collected to drive a recommender, showcasing how to effectively engineer experimental user data into practical visualization systems.
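The recommender mentioned at the end is straightforward to reproduce in spirit. A scikit-learn sketch follows, with toy (task, data size) → chart rows in place of the paper's actual experimental data:

```python
# Decision-tree recommender trained on (task, data size) -> chart type.
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

rows = [  # illustrative rows only, not the paper's measurements
    ("find extremum", 10, "bar chart"),
    ("find extremum", 30, "bar chart"),
    ("correlate", 30, "scatterplot"),
    ("correlate", 12, "scatterplot"),
    ("retrieve value", 8, "table"),
    ("retrieve value", 25, "table"),
]
enc = OrdinalEncoder().fit([[r[0]] for r in rows])   # encode the task name
X = [[enc.transform([[r[0]]])[0][0], r[1]] for r in rows]
y = [r[2] for r in rows]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
query = [[enc.transform([["correlate"]])[0][0], 20]]
print(tree.predict(query))   # -> ['scatterplot']
```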

Journal ArticleDOI
TL;DR: The first real-time system for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities is proposed, based on a novel lightweight setup that converts a standard baseball cap into a device for high-quality pose estimation with a single cap-mounted fisheye camera.
Abstract: We propose the first real-time system for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup, and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap into a device for high-quality pose estimation based on a single cap-mounted fisheye camera. From the captured egocentric live stream, our CNN-based 3D pose estimation approach runs at 60 Hz on a consumer-level GPU. In addition to the lightweight hardware setup, our other main contributions are: 1) a large ground truth training corpus of top-down fisheye images and 2) a disentangled 3D pose estimation approach that takes the unique properties of the egocentric viewpoint into account. As shown by our evaluation, we achieve lower 3D joint error as well as better 2D overlay than the existing baselines.

Journal ArticleDOI
TL;DR: Charticulator, an interactive authoring tool that enables the creation of bespoke and reusable chart layouts, is presented; it transforms a chart specification into mathematical layout constraints and automatically computes a set of layout attributes using a constraint-solving algorithm to realize the chart.
Abstract: We present Charticulator, an interactive authoring tool that enables the creation of bespoke and reusable chart layouts. Charticulator is our response to most existing chart construction interfaces that require authors to choose from predefined chart layouts, thereby precluding the construction of novel charts. In contrast, Charticulator transforms a chart specification into mathematical layout constraints and automatically computes a set of layout attributes using a constraint-solving algorithm to realize the chart. It allows for the articulation of compound marks or glyphs as well as links between these glyphs, all without requiring any coding or knowledge of constraint satisfaction. Furthermore, thanks to the constraint-based layout approach, Charticulator can export chart designs into reusable templates that can be imported into other visualization tools. In addition to describing Charticulator's conceptual framework and design, we present three forms of evaluation: a gallery to illustrate its expressiveness, a user study to verify its usability, and a click-count comparison between Charticulator and three existing tools. Finally, we discuss the limitations and potentials of Charticulator as well as directions for future research. Charticulator is available with its source code at https://charticulator.com.
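To give a flavor of constraint-based layout (a simplified least-squares sketch; Charticulator's actual solver and constraint vocabulary are richer), glyph positions can be expressed as linear constraints and solved in one shot:

```python
# Layout as a linear system: anchor constraints plus equal-spacing constraints.
import numpy as np

# Unknowns: x0..x3, the centers of four glyphs on a 400px-wide plot segment.
# Constraints: first glyph at 50, last at 350, equal spacing between neighbors.
A, b = [], []
A.append([1, 0, 0, 0]); b.append(50)    # x0 = 50
A.append([0, 0, 0, 1]); b.append(350)   # x3 = 350
for i in range(2):                      # x_{i+1}-x_i = x_{i+2}-x_{i+1}
    row = [0, 0, 0, 0]
    row[i], row[i + 1], row[i + 2] = 1, -2, 1
    A.append(row); b.append(0)

x, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
print(x)   # -> [ 50. 150. 250. 350.]
```

Because the design lives in the constraints rather than in pixel coordinates, the same system re-solves cleanly when the data or canvas size changes, which is what makes exported templates reusable.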

Journal ArticleDOI
TL;DR: This work identifies and motivates an appropriate task for investigating realistic judgments of uncertainty in the public domain through a qualitative analysis of uncertainty visualizations in the news, and contributes two crowdsourced experiments comparing the effectiveness of HOPs, error bars, and line ensembles for supporting perceptual decision-making from visualized uncertainty.
Abstract: Animated representations of outcomes drawn from distributions (hypothetical outcome plots, or HOPs) are used in the media and other public venues to communicate uncertainty. HOPs greatly improve multivariate probability estimation over conventional static uncertainty visualizations and leverage the ability of the visual system to quickly, accurately, and automatically process the summary statistical properties of ensembles. However, it is unclear how well HOPs support applied tasks resembling real world judgments posed in uncertainty communication. We identify and motivate an appropriate task to investigate realistic judgments of uncertainty in the public domain through a qualitative analysis of uncertainty visualizations in the news. We contribute two crowdsourced experiments comparing the effectiveness of HOPs, error bars, and line ensembles for supporting perceptual decision-making from visualized uncertainty. Participants infer which of two possible underlying trends is more likely to have produced a sample of time series data by referencing uncertainty visualizations which depict the two trends with variability due to sampling error. By modeling each participant's accuracy as a function of the level of evidence presented over many repeated judgments, we find that observers are able to correctly infer the underlying trend in samples conveying a lower level of evidence when using HOPs rather than static aggregate uncertainty visualizations as a decision aid. Modeling approaches like ours contribute theoretically grounded and richly descriptive accounts of user perceptions to visualization evaluation.
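Computationally, a HOP is just repeated draws from the outcome distribution rendered as animation frames instead of a static interval. A minimal matplotlib sketch follows (the study's actual stimuli, two trends with sampling error, are richer than this single-distribution toy):

```python
# Hypothetical outcome plot: animate draws instead of showing an interval.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
draws = rng.normal(loc=3.0, scale=1.0, size=50)   # hypothetical outcomes

fig, ax = plt.subplots()
ax.set_xlim(-1, 7); ax.set_yticks([])
marker, = ax.plot([], [], "o", markersize=12)

def update(i):
    marker.set_data([draws[i]], [0.5])   # show one sampled outcome per frame
    return marker,

anim = FuncAnimation(fig, update, frames=len(draws), interval=400)
plt.show()
```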

Journal ArticleDOI
TL;DR: This paper reinterprets the traditional VA pipeline to encompass model-development workflows, introduces the necessary definitions, rules, syntaxes, and visual notations for formulating VIS4ML, and makes use of semantic web technologies to implement it in the Web Ontology Language (OWL).
Abstract: While many VA workflows make use of machine-learned models to support analytical tasks, VA workflows have become increasingly important in understanding and improving Machine Learning (ML) processes. In this paper, we propose an ontology (VIS4ML) for a subarea of VA, namely “VA-assisted ML”. The purpose of VIS4ML is to describe and understand existing VA workflows used in ML as well as to detect gaps in ML processes and the potential of introducing advanced VA techniques to such processes. Ontologies have been widely used to map out the scope of a topic in biology, medicine, and many other disciplines. We adopt the scholarly methodologies for constructing VIS4ML, including the specification, conceptualization, formalization, implementation, and validation of ontologies. In particular, we reinterpret the traditional VA pipeline to encompass model-development workflows. We introduce necessary definitions, rules, syntaxes, and visual notations for formulating VIS4ML and make use of semantic web technologies for implementing it in the Web Ontology Language (OWL). VIS4ML captures the high-level knowledge about previous workflows where VA is used to assist in ML. It is consistent with the established VA concepts and will continue to evolve along with the future developments in VA and ML. While this ontology is an effort for building the theoretical foundation of VA, it can be used by practitioners in real-world applications to optimize model-development workflows by systematically examining the potential benefits that can be brought about by either machine or human capabilities. Meanwhile, VIS4ML is intended to be extensible and will continue to be updated to reflect future advancements in using VA for building high-quality data-analytical models or for building such models rapidly.

Journal ArticleDOI
TL;DR: This study focuses on participants' descriptions of exploratory activities and tool usage in these activities, providing guidelines for future tool development, as well as a better understanding of the meaning of the term “data exploration” based on the words of practitioners “in the wild.”
Abstract: We report the results of interviewing thirty professional data analysts working in a range of industrial, academic, and regulatory environments. This study focuses on participants' descriptions of exploratory activities and tool usage in these activities. Highlights of the findings include: distinctions between exploration as a precursor to more directed analysis versus truly open-ended exploration; confirmation that some analysts see “finding something interesting” as a valid goal of data exploration while others explicitly disavow this goal; conflicting views about the role of intelligent tools in data exploration; and pervasive use of visualization for exploration, but with only a subset using direct manipulation interfaces. These findings provide guidelines for future tool development, as well as a better understanding of the meaning of the term “data exploration” based on the words of practitioners “in the wild.”

Journal ArticleDOI
Junpeng Wang, Liang Gou, Wei Zhang, Hao Yang, Han-Wei Shen
TL;DR: DeepVID, a Deep learning approach to Visually Interpret and Diagnose DNN models, especially image classifiers, is proposed and validated through comprehensive evaluations, as well as case studies conducted together with deep learning experts.
Abstract: Deep Neural Networks (DNNs) have been extensively used in multiple disciplines due to their superior performance. However, in most cases, DNNs are considered as black-boxes and the interpretation of their internal working mechanism is usually challenging. Given that model trust is often built on the understanding of how a model works, the interpretation of DNNs becomes more important, especially in safety-critical applications (e.g., medical diagnosis, autonomous driving). In this paper, we propose DeepVID, a Deep learning approach to Visually Interpret and Diagnose DNN models, especially image classifiers. In detail, we train a small locally-faithful model to mimic the behavior of an original cumbersome DNN around a particular data instance of interest, and the local model is sufficiently simple such that it can be visually interpreted (e.g., a linear model). Knowledge distillation is used to transfer the knowledge from the cumbersome DNN to the small model, and a deep generative model (i.e., variational auto-encoder) is used to generate neighbors around the instance of interest. Those neighbors, which come with small feature variances and semantic meanings, can effectively probe the DNN's behaviors around the instance of interest and help the small model to learn those behaviors. Through comprehensive evaluations, as well as case studies conducted together with deep learning experts, we validate the effectiveness of DeepVID.
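A simplified sketch of the local-surrogate core (Gaussian perturbations stand in for the paper's VAE-generated neighbors, and ridge regression for its distilled linear model; the dataset, noise scale, and weighting kernel are illustrative):

```python
# Local surrogate: perturb, query the black box, fit a weighted linear model.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_digits(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x0 = X[0]
neighbors = x0 + rng.normal(scale=2.0, size=(500, x0.size))  # VAE stand-in
p = black_box.predict_proba(neighbors)[:, y[0]]  # black-box prob. of class y[0]

# Weight neighbors by proximity, then fit the interpretable local model.
dist = np.linalg.norm(neighbors - x0, axis=1)
weights = np.exp(-(dist ** 2) / (2 * dist.std() ** 2))
local = Ridge(alpha=1.0).fit(neighbors, p, sample_weight=weights)
top_pixels = np.argsort(np.abs(local.coef_))[-5:]  # most influential inputs
print(top_pixels)
```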

Journal ArticleDOI
TL;DR: This research comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992 and 2017 in two fields, visualization and text mining, derived around 300 concepts, and built a taxonomy for each type of concept.
Abstract: Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. To provide an overview of the relevant techniques and analysis tasks, as well as the relationships between them, we comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992 and 2017 in two fields: visualization and text mining. From the analysis, we derived around 300 concepts (visualization techniques, mining techniques, and analysis tasks) and built a taxonomy for each type of concept. The co-occurrence relationships between the concepts were also extracted. Our research can be used as a stepping-stone for other researchers to 1) understand a common set of concepts used in this research topic; 2) facilitate the exploration of the relationships between visualization techniques, mining techniques, and analysis tasks; 3) understand the current practice in developing visual text analytics tools; 4) seek potential research opportunities by narrowing the gulf between visualization and mining techniques based on the analysis tasks; and 5) analyze other interdisciplinary research areas in a similar way. We have also contributed a web-based visualization tool for analyzing and understanding research trends and opportunities in visual text analytics.
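The co-occurrence extraction step is simple to state precisely; in miniature (toy concept tags, not the derived taxonomy):

```python
# Count how often pairs of concepts are tagged on the same paper.
from collections import Counter
from itertools import combinations

paper_concepts = [
    {"word cloud", "topic modeling", "overview"},
    {"river/flow vis", "topic modeling", "trend analysis"},
    {"word cloud", "trend analysis"},
]
co_occurrence = Counter()
for concepts in paper_concepts:
    for a, b in combinations(sorted(concepts), 2):
        co_occurrence[(a, b)] += 1
print(co_occurrence.most_common(3))
```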

Journal ArticleDOI
TL;DR: This work models multidimensional ST data as tensors and proposes a novel piecewise rank-one tensor decomposition algorithm that supports automatically slicing the data into homogeneous partitions and extracting the latent patterns in each partition for comparison and visual summarization.
Abstract: Consider a multi-dimensional spatio-temporal (ST) dataset where each entry is a numerical measure defined by the corresponding temporal, spatial and other domain-specific dimensions. A typical approach to explore such data utilizes interactive visualizations with multiple coordinated views. Each view displays the aggregated measures along one or two dimensions. By brushing on the views, analysts can obtain detailed information. However, this approach often cannot provide sufficient guidance for analysts to identify patterns hidden within subsets of data. Without a priori hypotheses, analysts need to manually select and iterate through different slices to search for patterns, which can be a tedious and lengthy process. In this work, we model multidimensional ST data as tensors and propose a novel piecewise rank-one tensor decomposition algorithm which supports automatically slicing the data into homogeneous partitions and extracting the latent patterns in each partition for comparison and visual summarization. The algorithm optimizes a quantitative measure about how faithfully the extracted patterns visually represent the original data. Based on the algorithm we further propose a visual analytics framework that supports a top-down, progressive partitioning workflow for level-of-detail multidimensional ST data exploration. We demonstrate the general applicability and effectiveness of our technique on three datasets from different application domains: regional sales trend analysis, customer traffic analysis in department stores, and taxi trip analysis with origin-destination (OD) data. We further interview domain experts to verify the usability of the prototype.
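The building block is the best rank-one approximation of a data slice, which serves as that slice's latent "pattern." A simplified numpy sketch via SVD on one matricization follows (the paper's piecewise algorithm, which also searches for the partitioning itself, is not reproduced):

```python
# Rank-one pattern of one spatio-temporal slice, via truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
# Toy ST slice: time x location measures that are close to rank one.
t = np.linspace(0, 1, 24)[:, None]        # temporal profile
s = rng.random((1, 10))                   # spatial profile
slice_ = t @ s + 0.01 * rng.normal(size=(24, 10))

U, sing, Vt = np.linalg.svd(slice_, full_matrices=False)
pattern = sing[0] * np.outer(U[:, 0], Vt[0])  # rank-one pattern of the slice

# Fit quality: how faithfully the pattern represents this partition.
fit = 1 - np.linalg.norm(slice_ - pattern) / np.linalg.norm(slice_)
print(round(fit, 3))   # near 1.0 -> the partition is homogeneous
```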

Journal ArticleDOI
TL;DR: Clustrophile 2, a new interactive tool that guides users in clustering-based exploratory analysis, adapts user feedback to improve guidance, facilitates the interpretation of clusters, and helps users quickly reason about differences between clusterings, is introduced.
Abstract: Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The number of possible clusterings for a typical dataset is vast, and navigating this vast space is also challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfactory clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large clustering space so as to improve the effectiveness of exploratory clustering analysis. We introduce Clustrophile 2, a new interactive tool for guided clustering analysis. Clustrophile 2 guides users in clustering-based exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps quickly reason about differences between clusterings. To this end, Clustrophile 2 contributes a novel feature, the Clustering Tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate Clustrophile 2 through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson's disease patients. Results suggest that Clustrophile 2 improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts.
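One ingredient of such guidance is easy to sketch: sweep a clustering parameter and score the candidates so a tour has something to rank (scikit-learn sketch; the Clustering Tour additionally folds in user feedback and analysis goals):

```python
# Parameter sweep over k with silhouette scoring of each candidate clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

scores = {}
for k in range(2, 9):                   # candidate parameter settings
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))  # suggest this clustering first
```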

Journal ArticleDOI
TL;DR: FiberClay, a system to display and interact with 3D trajectories in immersive environments, is proposed, along with a new set of interactive techniques for composing complex queries in 3D space that leverage immersive-environment controllers and user position.
Abstract: Visualizing 3D trajectories to extract insights about their similarities and spatial configuration is a critical task in several domains. Air traffic controllers, for example, deal with large quantities of aircraft routes to optimize safety in airspace, and neuroscientists attempt to understand neuronal pathways in the human brain by visualizing bundles of fibers from DTI images. Extracting insights from masses of 3D trajectories is challenging as the multiple three-dimensional lines have complex geometries, and may overlap, cross, or even merge with each other, making it impossible to follow individual ones in dense areas. As trajectories are inherently spatial and three-dimensional, we propose FiberClay: a system to display and interact with 3D trajectories in immersive environments. FiberClay renders a large quantity of trajectories in real time using GP-GPU techniques. FiberClay also introduces a new set of interactive techniques for composing complex queries in 3D space, leveraging immersive-environment controllers and user position. These techniques enable an analyst to select and compare sets of trajectories with specific geometries and data properties. We conclude by discussing insights found using FiberClay with domain experts in air traffic control and neurology.
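The geometric core of such a selection query is compact; a numpy sketch follows (FiberClay evaluates queries like this on the GPU over far larger sets, driven by controller position rather than a hard-coded probe point):

```python
# Select trajectories that pass within a radius of a 3D probe point.
import numpy as np

rng = np.random.default_rng(0)
# 200 toy trajectories, each a 3D random walk of 50 points.
trajectories = np.cumsum(rng.normal(size=(200, 50, 3)), axis=1)

probe, radius = np.array([2.0, 0.0, 1.0]), 1.5
dists = np.linalg.norm(trajectories - probe, axis=2)   # (200, 50) distances
selected = np.where((dists < radius).any(axis=1))[0]   # passes near the probe
print(len(selected), "trajectories selected")
```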

Journal ArticleDOI
TL;DR: A novel spatio-temporal visual representation of changes in team formation is designed, allowing analysts to visually analyze the evolution of formations and track the spatial flow of players within formations over time; ForVizor, a visual analytics system, is further designed and developed to empower users to track spatio-temporal changes in formation and understand how and why such changes occur.
Abstract: Regarded as a high-level tactic in soccer, a team formation assigns players different tasks and indicates their active regions on the pitch, thereby significantly influencing team performance. Analysis of formations has become indispensable for soccer analysts. However, formations of a team are intrinsically time-varying and contain inherent spatial information. The spatio-temporal nature of formations and other characteristics of soccer data, such as multivariate features, make analysis of formations in soccer a challenging problem. In this study, we closely worked with domain experts to characterize the domain problems of formation analysis in soccer and formulated several design goals. We design a novel spatio-temporal visual representation of changes in team formation, allowing analysts to visually analyze the evolution of formations and track the spatial flow of players within formations over time. Based on the new design, we further design and develop ForVizor, a visual analytics system, which empowers users to track the spatio-temporal changes in formation and understand how and why such changes occur. With ForVizor, domain experts conducted formation analyses of two games; the resulting insights and feedback are summarized in two case studies.

Journal ArticleDOI
TL;DR: A characterization of OD flows is established based on an analogy between OD flows and natural language processing (NLP) terms, and an iterative multi-objective sampling scheme is designed to select OD flows in a vectorized representation space; a set of meaningful visual encodings is designed to enhance the readability of the sampled OD flows.
Abstract: A variety of human movement datasets are represented in an Origin-Destination (OD) form, such as taxi trips, mobile phone locations, etc. As a commonly used method to visualize OD data, the flow map often fails to reveal patterns of human mobility, due to the massive intersections and occlusions of lines on a 2D geographical map. A large number of techniques have been proposed to reduce the visual clutter of flow maps, such as filtering, clustering, and edge bundling, but the correlations of OD flows are often neglected, which leaves the simplified OD flow map carrying little semantic information. In this paper, a characterization of OD flows is established based on an analogy between OD flows and natural language processing (NLP) terms. Then, an iterative multi-objective sampling scheme is designed to select OD flows in a vectorized representation space. To enhance the readability of the sampled OD flows, a set of meaningful visual encodings is designed to present the interactions of OD flows. We design and implement a visual exploration system that supports visual inspection and quantitative evaluation from a variety of perspectives. Case studies based on real-world datasets and interviews with domain experts demonstrate the effectiveness of our system in reducing visual clutter and enhancing the correlations of OD flows.
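The OD-flows-as-terms analogy can be sketched in a few lines (toy "documents"; the paper's representation and sampling objectives are more elaborate): treat each time window as a document whose tokens are OD pairs, and read flow vectors off the TF-IDF matrix columns.

```python
# Vectorize OD flows by treating time windows as documents of OD "words".
from sklearn.feature_extraction.text import TfidfVectorizer

# Each string is one time window; tokens are origin_destination pairs.
windows = [
    "a_b a_b c_d",   # morning: heavy a->b commuting
    "b_a b_a c_d",   # evening: reverse commute
    "c_d e_f e_f",   # weekend: leisure trips
]
vec = TfidfVectorizer(token_pattern=r"\S+")
tfidf = vec.fit_transform(windows)       # windows x OD-flow matrix

# Columns are the vectorized representations of individual OD flows.
for flow, col in vec.vocabulary_.items():
    print(flow, tfidf[:, col].toarray().ravel().round(2))
```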