
Showing papers by "Eugene Wu" published in 2021


Journal ArticleDOI
TL;DR: In this article, the authors examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias and conclude that while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions.
Abstract: Progressive visualization is fast becoming a technique in the visualization community to help users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long-running computations, without waiting for the computation to complete. While this has been shown to benefit users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this paper, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization - while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and that continued research into mitigating the effects of cognitive biases is necessary.

9 citations


Journal ArticleDOI
08 Apr 2021
TL;DR: As discussed by the authors, obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data; inference query explanation seeks to explain unexpected aggregate query results on inference data.
Abstract: Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data ...

5 citations


Posted Content
TL;DR: Frog as discussed by the authors is an SQL-based training data debugging framework tailored for federated learning, which can automatically remove label errors from training data so that the unexpected behavior disappears in the retrained model.
Abstract: How can we debug a logistic regression model in a federated learning setting when we see the model behave unexpectedly (e.g., the model rejects all high-income customers' loan applications)? The SQL-based training data debugging framework has proven effective at fixing this kind of issue in a non-federated learning setting. Given an unexpected query result over model predictions, this framework automatically removes the label errors from training data such that the unexpected behavior disappears in the retrained model. In this paper, we enable this powerful framework for federated learning. The key challenge is how to develop a security protocol for federated debugging that is provably secure, efficient, and accurate. Achieving this goal requires us to investigate how to seamlessly integrate techniques from multiple fields (databases, machine learning, and cybersecurity). We first propose FedRain, which extends Rain, the state-of-the-art SQL-based training data debugging framework, to our federated learning setting. We address several technical challenges to make FedRain work, and analyze its security guarantee and time complexity. The analysis results show that FedRain falls short in terms of both efficiency and security. To overcome these limitations, we redesign our security protocol and propose Frog, a novel SQL-based training data debugging framework tailored for federated learning. Our theoretical analysis shows that Frog is more secure, more accurate, and more efficient than FedRain. We conduct extensive experiments using several real-world datasets and a case study. The experimental results are consistent with our theoretical analysis and validate the effectiveness of Frog in practice.
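The abstract describes the SQL-based debugging loop only at a high level. The toy sketch below illustrates just the core non-federated idea - given a complaint about model behavior, score each training label by how much removing it reduces the complaint - using a brute-force leave-one-out search. It is not Rain's, FedRain's, or Frog's actual algorithm or security protocol; the data, the complaint, and all names are invented for illustration.

```python
# Toy, non-federated sketch of "debug training data from a complaint".
# Brute-force leave-one-out stand-in, NOT Rain/FedRain/Frog's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
income = rng.uniform(20, 200, size=60)            # feature: income in $k
y_true = (income > 100).astype(int)               # ground truth: approve high income
y_train = y_true.copy()
y_train[income > 150] = 0                         # injected label errors

X = income.reshape(-1, 1)
high = income > 100                               # rows the complaint is about

def complaint(X_fit, y_fit):
    """Complaint severity: fraction of high-income rows the model rejects."""
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return 1.0 - model.predict(X)[high].mean()

base = complaint(X, y_train)
scores = [base - complaint(np.delete(X, i, axis=0), np.delete(y_train, i))
          for i in range(len(y_train))]            # drop in complaint if row i removed
print("most suspicious training rows:", np.argsort(scores)[-5:])
```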

Posted Content
TL;DR: PI2 as mentioned in this paper is a system that generates fully functional visual analysis interfaces from an example sequence of analysis queries using a novel Difftree structure that encodes systematic variations between query abstract syntax trees.
Abstract: Interactive visual analysis interfaces are critical in nearly every data task. Yet creating new interfaces is deeply challenging, as it requires the developer to understand the queries needed to express the desired analysis task, design the appropriate interface to express those queries for the task, and implement the interface using a combination of visualization, browser, server, and database technologies. Although prior work generates a set of interactive widgets that can express an input query log, this paper presents PI2, the first system to generate fully functional visual analysis interfaces from an example sequence of analysis queries. PI2 analyzes queries syntactically, and represents a set of queries using a novel Difftree structure that encodes systematic variations between query abstract syntax trees. PI2 then maps each Difftree to a visualization that renders its results, the variations in each Difftree to interactions, and generates a good layout for the interface. We show that PI2 can express data-oriented interactions in existing visualization interaction taxonomies, can reproduce or improve several real-world visual analysis interfaces, generates interfaces in 2-19s (median 6s), and scales linearly with the number of queries.
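As a rough illustration of what "encoding systematic variations between query abstract syntax trees" could look like, the sketch below merges two toy ASTs and collapses the positions where they differ into variable nodes whose value domains could drive interface widgets (e.g., a dropdown). The node shapes and names are invented and are not PI2's actual Difftree representation.

```python
# Hypothetical sketch: collapse variation between toy query ASTs into "variable" nodes.
from dataclasses import dataclass, field

@dataclass
class VarNode:
    """A position where the queries disagree; its value domain suggests a widget."""
    values: set = field(default_factory=set)

def merge(a, b):
    """Merge two ASTs (nested tuples / literals) into one tree with VarNodes."""
    if isinstance(a, VarNode):
        a.values.add(b)
        return a
    if a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        return tuple(merge(x, y) for x, y in zip(a, b))
    return VarNode(values={a, b})

# Two toy ASTs for: SELECT avg(price) FROM sales WHERE year = <literal>
q1 = ("select", ("avg", "price"), ("from", "sales"), ("where", ("=", "year", 2019)))
q2 = ("select", ("avg", "price"), ("from", "sales"), ("where", ("=", "year", 2020)))

print(merge(q1, q2))  # the 2019/2020 literal collapses into a VarNode -> e.g. a year dropdown
```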

Posted Content
TL;DR: In this paper, the authors conduct a series of Mechanical Turk experiments to study the potential of dynamic axis rescaling and the factors that affect its effectiveness, finding that the appropriate rescaling policy is both task- and data-dependent, with no single clear policy choice for all situations.
Abstract: Animated and interactive data visualizations dynamically change the data rendered in a visualization (e.g., bar chart). As the data changes, the y-axis may need to be rescaled as the domain of the data changes. Each axis rescaling potentially improves the readability of the current chart, but may also disorient the user. In contrast to static visualizations, where there is considerable literature to help choose the appropriate y-axis scale, there is a lack of guidance about how and when rescaling should be used in dynamic visualizations. Existing visualization systems and libraries adopt a fixed global y-axis, or rescale every time the data changes. Yet, professional visualizations, such as in data journalism, do not adopt either strategy. They instead carefully and manually choose when to rescale based on the analysis task and data. To this end, we conduct a series of Mechanical Turk experiments to study the potential of dynamic axis rescaling and the factors that affect its effectiveness. We find that the appropriate rescaling policy is both task- and data-dependent, and we do not find one clear policy choice for all situations.
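For concreteness, the snippet below sketches the two baseline policies the abstract mentions - a fixed global y-axis versus rescaling on every data change. It is purely illustrative and is not code or data from the study.

```python
# Illustrative only: the two baseline y-axis policies contrasted in the abstract.
def y_domain(frames, frame_idx, policy="global"):
    """Return (ymin, ymax) for the chart shown at animation frame frame_idx."""
    if policy == "global":               # one fixed scale for the whole animation
        lo = min(min(f) for f in frames)
        hi = max(max(f) for f in frames)
    elif policy == "per_frame":          # rescale every time the data changes
        lo, hi = min(frames[frame_idx]), max(frames[frame_idx])
    else:
        raise ValueError(policy)
    pad = 0.05 * (hi - lo or 1.0)
    return lo - pad, hi + pad

frames = [[3, 5, 4], [30, 50, 40], [2, 4, 3]]   # toy data per animation frame
print(y_domain(frames, 1, "global"))      # stable axis, but small frames look flat
print(y_domain(frames, 1, "per_frame"))   # readable now, but the axis jumps between frames
```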

Posted Content
TL;DR: In this paper, the authors propose Reptile, an explanation system for hierarchical data, which recommends the next drill-down attribute and ranks the drill-down groups based on the extent to which repairing a group's statistics to their expected values resolves the anomaly.
Abstract: Recent query explanation systems help users understand anomalies in aggregation results by proposing predicates that describe input records that, if deleted, would resolve the anomalies. However, it can be difficult for users to understand how a predicate was chosen, and these approaches are limited to errors that can be resolved through deletion. In contrast, data errors may be due to group-wise errors, such as missing records or systematic value errors. This paper presents Reptile, an explanation system for hierarchical data. Given an anomalous aggregate query result, Reptile recommends the next drill-down attribute, and ranks the drill-down groups based on the extent to which repairing a group's statistics to its expected values resolves the anomaly. Reptile efficiently trains a multi-level model that leverages the data's hierarchy to estimate the expected values, and uses a factorised representation of the feature matrix to remove redundancies due to the data's hierarchical structure. We further extend model training to support factorised data, and develop a suite of optimizations that leverage the data's hierarchical structure. Reptile reduces end-to-end runtimes by more than 6 times compared to a Matlab-based implementation, correctly identifies 21/30 data errors in Johns Hopkins' COVID-19 data, and correctly resolves 20/22 complaints in a user study using data and researchers from Columbia University's Financial Instruments Sector Team.
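The ranking idea can be illustrated with a toy example: score each drill-down group by how much repairing its statistic to an expected value would resolve the parent anomaly. The sketch below is a simplified stand-in; the group names, expected values, and scoring function are invented and are not Reptile's actual multi-level model.

```python
# Toy sketch of the drill-down ranking idea (all numbers and names invented).
def rank_groups(observed, expected, parent_expected):
    """observed/expected: per-group sums; parent_expected: expected parent total."""
    parent_observed = sum(observed.values())
    anomaly = abs(parent_observed - parent_expected)
    scores = {}
    for g in observed:
        repaired_total = parent_observed - observed[g] + expected[g]
        # how much of the parent anomaly disappears if group g is repaired
        scores[g] = anomaly - abs(repaired_total - parent_expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

observed = {"county_A": 120, "county_B": 15, "county_C": 90}   # e.g. daily case counts
expected = {"county_A": 118, "county_B": 85, "county_C": 92}   # from a (hypothetical) model
print(rank_groups(observed, expected, parent_expected=295))
# county_B tops the ranking: repairing it explains most of the parent anomaly
```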

Posted Content
TL;DR: BOExplain, as discussed in this article, is a framework for explaining inference queries using Bayesian optimization (BO), a technique for finding the global optimum of a black-box function, which is used to find the best predicate.
Abstract: Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data; such queries are challenging to explain because an explanation may need to be derived from the source, training, or inference data in an ML pipeline. In this paper, we model an objective function as a black-box function and propose BOExplain, a novel framework for explaining inference queries using Bayesian optimization (BO). An explanation is a predicate defining the input tuples that should be removed so that the query result of interest is significantly affected. BO, a technique for finding the global optimum of a black-box function, is used to find the best predicate. We develop two new techniques (individual contribution encoding and warm start) to handle categorical variables. We perform experiments showing that the predicates found by BOExplain have a higher degree of explanation compared to those found by the state-of-the-art query explanation engines. We also show that BOExplain is effective at deriving explanations for inference queries from source and training data on three real-world datasets.
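As a rough sketch of the approach, the snippet below uses scikit-optimize's gp_minimize as a stand-in for the paper's BO procedure to search for a single numeric-range predicate whose removal minimizes an aggregate of interest. The column, data, and objective are invented, and the sketch omits BOExplain's categorical-variable techniques (individual contribution encoding and warm start).

```python
# Minimal sketch of the BOExplain idea, assuming scikit-optimize is installed.
# Predicate: "lo <= age <= hi"; objective: mean error of the remaining tuples.
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=1000)
error = rng.normal(0, 1, size=1000) + (age > 60) * 2.0   # errors concentrated in age > 60

def objective(params):
    lo, hi = params
    if lo > hi:
        return 1e6                        # invalid predicate, heavily penalized
    keep = ~((age >= lo) & (age <= hi))   # remove tuples matching the predicate
    if keep.sum() == 0:
        return 1e6                        # removing everything is not an explanation
    return float(error[keep].mean())      # lower remaining error = better explanation

res = gp_minimize(objective, [Real(18, 80), Real(18, 80)], n_calls=30, random_state=0)
print("best predicate: %.1f <= age <= %.1f" % tuple(res.x), "objective:", res.fun)
```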