Journal Article

DataSite: Proactive visual data exploration with computation of insight-based recommendations

01 Apr 2019 - Information Visualization (SAGE Publications, Sage UK: London, England) - Vol. 18, Iss. 2, pp. 251-267
TL;DR: In this paper, the authors argue that effective data analysis ideally requires the analyst to have high expertise as well as high knowledge of the data. Even with such familiarity, manually pursuing all potential hypotheses and explor...
Abstract: Effective data analysis ideally requires the analyst to have high expertise as well as high knowledge of the data. Even with such familiarity, manually pursuing all potential hypotheses and explori...
Citations
Journal Article
TL;DR: Voder is presented, a system that lets users interact with automatically-generated data facts to explore both alternative visualizations to convey a data fact as well as a set of embellishments to highlight a fact within a visualization.
Abstract: Recently, an increasing number of visualization systems have begun to incorporate natural language generation (NLG) capabilities into their interfaces. NLG-based visualization systems typically leverage a suite of statistical functions to automatically extract key facts about the underlying data and surface them as natural language sentences alongside visualizations. With current systems, users are typically required to read the system-generated sentences and mentally map them back to the accompanying visualization. However, depending on the features of the visualization (e.g., visualization type, data density) and the complexity of the data fact, mentally mapping facts to visualizations can be a challenging task. Furthermore, more than one visualization could be used to illustrate a single data fact. Unfortunately, current tools provide little or no support for users to explore such alternatives. In this paper, we explore how system-generated data facts can be treated as interactive widgets to help users interpret visualizations and communicate their findings. We present Voder, a system that lets users interact with automatically-generated data facts to explore both alternative visualizations to convey a data fact as well as a set of embellishments to highlight a fact within a visualization. Leveraging data facts as interactive widgets, Voder also facilitates data fact-based visualization search. To assess Voder's design and features, we conducted a preliminary user study with 12 participants having varying levels of experience with visualization tools. Participant feedback suggested that interactive data facts aided them in interpreting visualizations. Participants also stated that the suggestions surfaced through the facts helped them explore alternative visualizations and embellishments to communicate individual data facts.

132 citations

Journal Article
TL;DR: This work presents DataShot, the first automated system that creates fact sheets automatically from tabular data, and proposes a fact sheet generation pipeline, consisting of fact extraction, fact composition, and presentation synthesis, for the auto-generation workflow.
Abstract: Fact sheets with vivid graphical design and intriguing statistical insights are prevalent for presenting raw data. They help audiences understand data-related facts effectively and make a deep impression. However, designing a fact sheet requires both data and design expertise and is a laborious and time-consuming process. One needs to not only understand the data in depth but also produce intricate graphical representations. To assist in the design process, we present DataShot which, to the best of our knowledge, is the first automated system that creates fact sheets automatically from tabular data. First, we conduct a qualitative analysis of 245 infographic examples to explore general infographic design space at both the sheet and element levels. We identify common infographic structures, sheet layouts, fact types, and visualization styles during the study. Based on these findings, we propose a fact sheet generation pipeline, consisting of fact extraction, fact composition, and presentation synthesis, for the auto-generation workflow. To validate our system, we present use cases with three real-world datasets. We conduct an in-lab user study to understand the usage of our system. Our evaluation results show that DataShot can efficiently generate satisfactory fact sheets to support further customization and data presentation.

110 citations

Journal Article
TL;DR: This paper presents a proof-of-concept system, built on a preliminary study of the infographic design space, that automatically converts statements about simple proportion-related statistics to a set of infographics with pre-designed styles.
Abstract: Combining data content with visual embellishments, infographics can effectively deliver messages in an engaging and memorable manner. Various authoring tools have been proposed to facilitate the creation of infographics. However, creating a professional infographic with these authoring tools is still not an easy task, requiring much time and design expertise. Therefore, these tools are generally not attractive to casual users, who are either unwilling to take time to learn the tools or lacking in proper design expertise to create a professional infographic. In this paper, we explore an alternative approach: to automatically generate infographics from natural language statements. We first conducted a preliminary study to explore the design space of infographics. Based on the preliminary study, we built a proof-of-concept system that automatically converts statements about simple proportion-related statistics to a set of infographics with pre-designed styles. Finally, we demonstrated the usability and usefulness of the system through sample results, exhibits, and expert reviews.

79 citations

Journal Article
Danqing Shi, Xinyue Xu, Fuling Sun, Yang Shi, Nan Cao
TL;DR: This paper introduces a novel visual data story generating system, Calliope, which creates visual data stories from an input spreadsheet through an automatic process and facilitates the easy revision of the generated story based on an online story editor.
Abstract: Visual data stories, shown in the form of narrative visualizations such as a poster or a data video, are frequently used in data-oriented storytelling to facilitate the understanding and memorization of the story content. Although useful, technical barriers, such as data analysis, visualization, and scripting, make the generation of a visual data story difficult. Existing authoring tools rely on users' skills and experiences, which is usually inefficient and still difficult. In this paper, we introduce a novel visual data story generating system, Calliope, which creates visual data stories from an input spreadsheet through an automatic process and facilitates the easy revision of the generated story based on an online story editor. Particularly, Calliope incorporates a new logic-oriented Monte Carlo tree search algorithm that explores the data space given by the input spreadsheet to progressively generate story pieces (i.e., data facts) and organize them in a logical order. The importance of data facts is measured based on information theory, and each data fact is visualized in a chart and captioned by an automatically generated description. We evaluate the proposed technique through three example stories, two controlled experiments, and a series of interviews with 10 domain experts. Our evaluation shows that Calliope is beneficial to efficient visual data story generation.

77 citations

Proceedings Article
Rui Ding, Shi Han, Yong Xu, Haidong Zhang, Dongmei Zhang
25 Jun 2019
TL;DR: This work proposes a unified formulation of interesting patterns, called insights, and designs a systematic mining framework to discover high-quality insights efficiently, and demonstrates the effectiveness and efficiency of QuickInsights through evaluation on 447 real datasets as well as user studies on both expert users and non-expert users.
Abstract: Discovering interesting data patterns is a common and important analytical need in data, with increasing user demand for automated discovery abilities. However, automatically discovering interesting patterns from multi-dimensional data remains challenging. Existing techniques focus on mining individual types of patterns. There is a lack of unified formulation for different pattern types, as well as general mining frameworks to derive them effectively and efficiently. We present a novel technique QuickInsights, which quickly and automatically discovers interesting patterns from multi-dimensional data. QuickInsights proposes a unified formulation of interesting patterns, called insights, and designs a systematic mining framework to discover high-quality insights efficiently. We demonstrate the effectiveness and efficiency of QuickInsights through our evaluation on 447 real datasets as well as user studies on both expert users and non-expert users. QuickInsights is released in Microsoft Power BI.

63 citations

References
Journal Article
TL;DR: It is argued that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades, and it is shown that LMEMs generalize best when they include the maximal random effects structure justified by the design.

6,878 citations

Journal Article
TL;DR: The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
Abstract: Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalency class were strongly correlated, while metrics from different equivalency classes were uncorrelated.

5,686 citations

Journal Article
TL;DR: This work shows how representational transparency improves expressiveness and better integrates with developer tools than prior approaches, while offering comparable notational efficiency and retaining powerful declarative components.
Abstract: Data-Driven Documents (D3) is a novel representation-transparent approach to visualization for the web. Rather than hide the underlying scenegraph within a toolkit-specific abstraction, D3 enables direct inspection and manipulation of a native representation: the standard document object model (DOM). With D3, designers selectively bind input data to arbitrary document elements, applying dynamic transforms to both generate and modify content. We show how representational transparency improves expressiveness and better integrates with developer tools than prior approaches, while offering comparable notational efficiency and retaining powerful declarative components. Immediate evaluation of operators further simplifies debugging and allows iterative development. Additionally, we demonstrate how D3 transforms naturally enable animation and interaction with dramatic performance improvements over intermediate representations.

2,550 citations

Journal Article
TL;DR: The approach is based on graphical perception—the visual decoding of information encoded on graphs—and it includes both theory and experimentation to test the theory, providing a guideline for graph construction.
Abstract: The subject of graphical methods for data analysis and for data presentation needs a scientific foundation. In this article we take a few steps in the direction of establishing such a foundation. Our approach is based on graphical perception—the visual decoding of information encoded on graphs—and it includes both theory and experimentation to test the theory. The theory deals with a small but important piece of the whole process of graphical perception. The first part is an identification of a set of elementary perceptual tasks that are carried out when people extract quantitative information from graphs. The second part is an ordering of the tasks on the basis of how accurately people perform them. Elements of the theory are tested by experimentation in which subjects record their judgments of the quantitative information on graphs. The experiments validate these elements but also suggest that the set of elementary tasks should be expanded. The theory provides a guideline for graph construction...

1,545 citations

Journal Article
TL;DR: APT (A Presentation Tool) is an application-independent presentation tool that automatically designs effective graphical presentations (such as bar charts, scatter plots, and connected graphs) of relational information, based on the view that graphical presentations are sentences of graphical languages.
Abstract: The goal of the research described in this paper is to develop an application-independent presentation tool that automatically designs effective graphical presentations (such as bar charts, scatter plots, and connected graphs) of relational information. Two problems are raised by this goal: The codification of graphic design criteria in a form that can be used by the presentation tool, and the generation of a wide variety of designs so that the presentation tool can accommodate a wide variety of information. The approach described in this paper is based on the view that graphical presentations are sentences of graphical languages. The graphic design issues are codified as expressiveness and effectiveness criteria for graphical languages. Expressiveness criteria determine whether a graphical language can express the desired information. Effectiveness criteria determine whether a graphical language exploits the capabilities of the output medium and the human visual system. A wide variety of designs can be systematically generated by using a composition algebra that composes a small set of primitive graphical languages. Artificial intelligence techniques are used to implement a prototype presentation tool called APT (A Presentation Tool), which is based on the composition algebra and the graphic design criteria.

1,483 citations


DataSite effectively turns data analysis into a conversation between analyst and computer, thereby reducing the cognitive load and domain knowledge requirements.