Open Access · Posted Content

Counterfactuals and Causability in Explainable Artificial Intelligence: Theory, Algorithms, and Applications

TLDR
The authors performed an LDA topic modeling analysis under a PRISMA framework to find the most relevant literature articles, which resulted in a novel taxonomy that considers the grounding theories of the surveyed algorithms, together with their underlying properties and applications in real-world data.
Abstract
There has been growing interest in model-agnostic methods that can make deep learning models more transparent and explainable to a user. Some researchers have recently argued that for a machine to achieve a certain degree of human-level explainability, it needs to provide explanations that humans can understand in causal terms, a property known as causability. Counterfactuals are a specific class of algorithms with the potential to provide causability. This paper presents an in-depth systematic review of the diverse existing body of literature on counterfactuals and causability for explainable artificial intelligence. We performed an LDA topic modeling analysis under a PRISMA framework to find the most relevant literature articles. This analysis resulted in a novel taxonomy that considers the grounding theories of the surveyed algorithms, together with their underlying properties and applications in real-world data. This research suggests that current model-agnostic counterfactual algorithms for explainable AI are not grounded in a causal theoretical formalism and, consequently, cannot promote causability to a human decision-maker. Our findings suggest that the explanations derived from major algorithms in the literature provide spurious correlations rather than cause-and-effect relationships, leading to sub-optimal, erroneous, or even biased explanations. This paper also advances the literature with new directions and challenges for promoting causability in model-agnostic approaches to explainable artificial intelligence.
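As a rough illustration of the screening step mentioned in the abstract, the sketch below runs an LDA topic model over a handful of placeholder abstracts with scikit-learn. It is a generic example of the technique under assumed hyperparameters, not the authors' actual PRISMA pipeline.

```python
# Minimal LDA topic-modelling sketch over paper abstracts; the abstracts,
# topic count, and vocabulary size are placeholders, not the survey's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "counterfactual explanations for black-box classifiers",
    "causal inference and structural causal models",
    "model-agnostic interpretability with local surrogates",
]

# Bag-of-words term counts (LDA expects raw counts, not TF-IDF).
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(abstracts)

# Fit a small number of topics; in a real survey this would be tuned.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)

# Top terms per topic, used to decide which clusters of papers are relevant.
terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top = [terms[i] for i in component.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```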


Citations
Proceedings Article

DiCE4EL: Interpreting Process Predictions using a Milestone-Aware Counterfactual Approach

TL;DR: In this paper, the authors explore the use of a popular model-agnostic counterfactual algorithm, DiCE, in the context of predictive process analytics, and propose an approach that supports deriving milestone-aware explanations at key intermediate stages along process execution to promote interpretability.
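For context, the sketch below shows plain DiCE usage (the dice-ml package) on a small synthetic tabular problem; the feature names and model are invented for illustration, and the milestone-aware extension described in the paper is not reproduced.

```python
# Minimal DiCE sketch on synthetic tabular data; features and labels are
# made up, and this is not the paper's milestone-aware approach.
import numpy as np
import pandas as pd
import dice_ml
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duration": rng.normal(10, 3, 500),
    "num_activities": rng.integers(1, 20, 500).astype(float),
})
df["late"] = (df["duration"] + 0.5 * df["num_activities"] > 14).astype(int)

clf = RandomForestClassifier(random_state=0).fit(
    df[["duration", "num_activities"]], df["late"])

# Wrap data and model, then ask DiCE for counterfactuals that flip the prediction.
data = dice_ml.Data(dataframe=df,
                    continuous_features=["duration", "num_activities"],
                    outcome_name="late")
model = dice_ml.Model(model=clf, backend="sklearn")
explainer = dice_ml.Dice(data, model, method="random")

query = df[["duration", "num_activities"]].iloc[[0]]
cfs = explainer.generate_counterfactuals(query, total_CFs=3,
                                         desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)
```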
Posted Content

Convex optimization for actionable & plausible counterfactual explanations.

TL;DR: In this paper, a mechanism for ensuring actionability and plausibility of the resulting counterfactual explanations is proposed, based on a convex model of the decision-making process.
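To make the idea concrete, here is a minimal sketch of a counterfactual posed as a convex program with cvxpy, assuming a linear classifier; actionability is approximated by freezing an immutable feature and plausibility by box constraints. It illustrates the general recipe only, not the paper's exact formulation.

```python
# Counterfactual as a convex program for an assumed linear model w·x + b.
import numpy as np
import cvxpy as cp

w = np.array([1.5, -2.0, 0.7])   # linear model weights (toy values)
b = -0.3
x0 = np.array([0.2, 0.8, 0.1])   # factual instance, predicted negative

x_cf = cp.Variable(3)
objective = cp.Minimize(cp.norm1(x_cf - x0))   # stay close to the factual
constraints = [
    w @ x_cf + b >= 0.1,                       # cross the boundary with a margin
    x_cf[2] == x0[2],                          # immutable feature (actionability)
    x_cf >= 0, x_cf <= 1,                      # plausible value range
]
cp.Problem(objective, constraints).solve()
print("counterfactual:", x_cf.value)
```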
Posted Content

Amortized Generation of Sequential Counterfactual Explanations for Black-box Models.

TL;DR: In this article, a stochastic control-based approach is proposed to generate sequential counterfactual explanations (CFEs) that move an input stochastically and sequentially across intermediate states towards a final, counterfactual state.
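A deliberately simplified, greedy illustration of the "sequential" idea follows: the instance moves through intermediate states one small feature change at a time until the prediction flips. The weights and step size are invented, and the paper's stochastic-control, amortized formulation is not reproduced here.

```python
# Greedy toy version of a sequential counterfactual path; not the paper's method.
import numpy as np

W, B = np.array([1.0, -1.5]), 0.2   # assumed linear scorer

def score(x):
    return W @ x + B

def sequential_cfe(x, step=0.1, max_steps=50):
    """Move x through small intermediate states until score(x) > 0."""
    path = [x.copy()]
    for _ in range(max_steps):
        if score(x) > 0:            # desired class reached
            break
        # candidate moves: +/- step on each single feature; keep the best one
        moves = [sgn * step * np.eye(len(x))[i]
                 for i in range(len(x)) for sgn in (+1, -1)]
        x = x + max(moves, key=lambda d: score(x + d))
        path.append(x.copy())
    return path

path = sequential_cfe(np.array([0.0, 0.5]))
print(f"{len(path) - 1} intermediate steps, final state {path[-1]}")
```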
Posted Content

Interpreting Process Predictions using a Milestone-Aware Counterfactual Approach

TL;DR: In this article, the authors explore the use of a recent model-agnostic counterfactual algorithm, DiCE, in the context of predictive process analytics, and propose an approach that supports deriving milestone-aware counterfactuals at different stages of a trace to promote interpretability.
Posted Content

Pitfalls of Explainable ML: An Industry Perspective.

TL;DR: In this article, the authors enumerate challenges in explainable ML from an industry perspective, which they hope will serve as promising future research directions and contribute to democratizing explainable ML.
References
Book

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

TL;DR: A complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, providing a coherent explication of probability as a language for reasoning with partial belief.
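The book's central theme, treating probability as a language for partial belief that is revised as evidence arrives, can be illustrated with a two-variable example (toy numbers, not taken from the book):

```python
# Belief in a hypothesis revised by Bayes' rule when evidence arrives.
p_disease = 0.01                      # prior belief
p_pos_given_disease = 0.95            # test sensitivity
p_pos_given_healthy = 0.05            # false-positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
posterior = p_pos_given_disease * p_disease / p_pos
print(f"belief after a positive test: {posterior:.3f}")   # ~0.161
```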
Book Chapter

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
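In the same spirit, intermediate feature maps of a convolutional network can be inspected with forward hooks in PyTorch, as sketched below with an untrained VGG-16 and a random input; the paper's deconvnet-based reconstruction is not reproduced.

```python
# Capture intermediate feature maps with forward hooks (sketch only).
import torch
import torchvision

model = torchvision.models.vgg16(weights=None).eval()  # untrained weights for the sketch

activations = {}
def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook the first convolutional layer and a deeper one.
model.features[0].register_forward_hook(save_activation("conv1_1"))
model.features[10].register_forward_hook(save_activation("conv3_1"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))          # stand-in for a real image

for name, fmap in activations.items():
    print(name, tuple(fmap.shape))              # conv1_1: (1, 64, 224, 224), conv3_1: (1, 256, 56, 56)
```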
Monograph

Causality: models, reasoning, and inference

TL;DR: Presents the art and science of cause and effect, covering, among other topics, the theory of inferred causation, causal diagrams, and the identification of causal effects.
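One concrete piece of that machinery, the back-door adjustment formula P(Y=1 | do(X=1)) = Σ_z P(Y=1 | X=1, Z=z) P(Z=z), shows how an interventional quantity differs from the observational conditional when a confounder Z is present. The numbers below are invented for illustration:

```python
# Toy comparison of observational vs. interventional quantities under confounding.
P_Z = {0: 0.5, 1: 0.5}                              # confounder
P_X1_given_Z = {0: 0.2, 1: 0.8}                     # treatment depends on Z
P_Y1_given_XZ = {(0, 0): 0.1, (1, 0): 0.3,          # outcome depends on X and Z
                 (0, 1): 0.5, (1, 1): 0.7}

# Observational P(Y=1 | X=1): weight Z by P(Z | X=1).
p_x1 = sum(P_X1_given_Z[z] * P_Z[z] for z in P_Z)
p_y1_given_x1 = sum(P_Y1_given_XZ[(1, z)] * P_X1_given_Z[z] * P_Z[z]
                    for z in P_Z) / p_x1

# Interventional P(Y=1 | do(X=1)): weight Z by its marginal P(Z).
p_y1_do_x1 = sum(P_Y1_given_XZ[(1, z)] * P_Z[z] for z in P_Z)

print(f"P(Y=1 | X=1)     = {p_y1_given_x1:.3f}")    # ~0.62, inflated by confounding
print(f"P(Y=1 | do(X=1)) = {p_y1_do_x1:.3f}")       # 0.50
```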
Proceedings Article

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
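A minimal sketch of LIME's tabular explainer (the lime package) on a scikit-learn classifier is shown below; it covers the per-prediction local explanation, while the submodular pick of representative predictions (SP-LIME) is not shown.

```python
# Local explanation of one prediction with LIME's tabular explainer.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
# Explain a single prediction with its top local feature contributions.
exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=5)
print(exp.as_list())
```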
Book

A Treatise of Human Nature

David Hume
TL;DR: Hume's early years and education are described, though this is not a complete account of his early life and education.