Open Access
Posted Content

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

TL;DR
An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and the data distribution empirically drives the adversary's notion of fairness.
Abstract
How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier belongs to the protected group? How can we train such a classifier when data on the protected group is difficult to attain? In many settings, finding out the sensitive input attribute can be prohibitively expensive even during model training, and sometimes impossible during model serving. For example, in recommender systems, if we want to predict if a user will click on a given recommendation, we often do not know many attributes of the user, e.g., race or age, and many attributes of the content are hard to determine, e.g., the language or topic. Thus, it is not feasible to use a different classifier calibrated based on knowledge of the sensitive attribute. Here, we use an adversarial training procedure to remove information about the sensitive attribute from the latent representation learned by a neural network. In particular, we study how the choice of data for the adversarial training affects the resulting fairness properties. We find two interesting results: a small amount of data is needed to train these adversarial models, and the data distribution empirically drives the adversary's notion of fairness.
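The page reproduces no code; as a rough illustration of the general adversarial-training idea described above (not the authors' implementation), one can pit an adversary that predicts the sensitive attribute against an encoder that tries to hide it. A minimal sketch, assuming a PyTorch-style setup with illustrative module names and sizes:

import torch
import torch.nn as nn

# Illustrative dimensions and architectures; the paper's actual models differ.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())   # input x -> latent z
task_head = nn.Linear(32, 1)                             # z -> task prediction (logit)
adversary = nn.Linear(32, 1)                             # z -> sensitive-attribute logit
bce = nn.BCEWithLogitsLoss()
lam = 1.0                                                 # weight of the fairness term

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def training_step(x, y, a):
    """x: features, y: task label, a: sensitive attribute (floats of shape [batch, 1])."""
    z = encoder(x)

    # 1) Train the adversary to recover the sensitive attribute a from z.
    opt_adv.zero_grad()
    adv_loss = bce(adversary(z.detach()), a)
    adv_loss.backward()
    opt_adv.step()

    # 2) Train encoder + task head: perform the task while fooling the adversary,
    #    i.e. remove information about a from the latent representation z.
    opt_main.zero_grad()
    task_loss = bce(task_head(z), y)
    fool_loss = -bce(adversary(z), a)   # maximize the adversary's error
    (task_loss + lam * fool_loss).backward()
    opt_main.step()

In a setup like this, the weight lam trades task accuracy against how much information about the sensitive attribute remains in the latent representation z.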


Citations
Journal Article

Inherent Limitations of Multi-Task Fair Representations

TL;DR: In this paper, it was shown that for tasks sharing the same marginal distribution, except for trivial cases, no representation can guarantee Equalized Odds fairness for any two different tasks while enabling accurate label predictions for both.
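For reference, Equalized Odds (the criterion this impossibility result concerns, defined in the "Equality of opportunity in supervised learning" reference below) requires the prediction to be independent of the sensitive attribute A conditional on the true label Y:

\[
\Pr(\hat{Y}=1 \mid A=0,\, Y=y) \;=\; \Pr(\hat{Y}=1 \mid A=1,\, Y=y), \qquad y \in \{0,1\}.
\]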

Generalized Demographic Parity for Group Fairness

TL;DR: This work proposes Generalized Demographic Parity (GDP), a group fairness metric for continuous and discrete attributes, explains GDP from a probabilistic perspective, and theoretically reveals the connection between the GDP regularizer and adversarial debiasing.
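For context, the standard Demographic Parity criterion that GDP generalizes requires equal positive-prediction rates across the groups of a binary sensitive attribute A; the cited work extends this notion to continuous attributes (the exact generalized definition is given there):

\[
\Pr(\hat{Y}=1 \mid A=0) \;=\; \Pr(\hat{Y}=1 \mid A=1).
\]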
Journal Article

On Disentangled and Locally Fair Representations

Yaron Gurovich, +2 more
05 May 2022
TL;DR: This work disentangles the embedding space into two representations, one correlated with the sensitive attribute and one that is not, and learns a locally fair representation such that, under the learned representation, the neighborhood of each sample is balanced in terms of the sensitive attribute.
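A minimal sketch of checking the "balanced neighborhood" property described above, assuming a scikit-learn nearest-neighbor search over the learned representation; the representation Z, binary attribute vector a, and neighborhood size k are illustrative choices, not the cited method:

from sklearn.neighbors import NearestNeighbors

def neighborhood_balance(Z, a, k=10):
    """Z: learned representation (n_samples x dim), a: NumPy 0/1 sensitive attribute.
    Returns, per sample, the fraction of its k nearest neighbors with a == 1;
    under a locally fair representation this should be close to a.mean() everywhere."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    _, idx = nn.kneighbors(Z)           # idx[:, 0] is the sample itself
    return a[idx[:, 1:]].mean(axis=1)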

Exploring Spurious Learning in Self-Supervised Representations

TL;DR: In this article, the authors explore whether self-supervised learning (SSL) methods would produce representations which exhibit similar behaviors under spurious correlation, and propose a method to remove spurious information from these representations during pretraining, by pruning or re-initializing later layers of the encoder.
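A toy sketch of the "re-initialize later layers of the encoder" idea mentioned above, assuming a PyTorch encoder composed of sequential blocks; the number of blocks reset and the module types handled are illustrative, not the cited procedure:

import torch.nn as nn

def reinit_last_layers(encoder: nn.Sequential, n_last: int = 2):
    """Re-initialize the parameters of the last n_last blocks of a sequential encoder."""
    for block in list(encoder.children())[-n_last:]:
        for m in block.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                m.reset_parameters()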
References
Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: Adaptive subgradient methods dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which makes it possible to find needles in haystacks in the form of very predictive but rarely seen features.
Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
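Both entries refer to the same method (AdaGrad). Its commonly stated diagonal update scales each coordinate i by the accumulated squared gradients, with a small constant \(\epsilon\) for numerical stability (the exact placement of \(\epsilon\) varies between presentations):

\[
G_{t,i} \;=\; \sum_{\tau=1}^{t} g_{\tau,i}^{2},
\qquad
x_{t+1,i} \;=\; x_{t,i} \;-\; \frac{\eta}{\sqrt{G_{t,i}} + \epsilon}\, g_{t,i}.
\]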
Book Chapter

Domain-adversarial training of neural networks

TL;DR: In this article, a new representation learning approach for domain adaptation is proposed for the setting in which data at training and test time come from similar but different distributions; it promotes the emergence of features that are discriminative for the main learning task on the source domain while being unable to discriminate between the training (source) and test (target) domains.
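This domain-adversarial approach is commonly implemented with a gradient-reversal layer: the forward pass is the identity, while the backward pass multiplies the gradient by a negative factor, pushing the feature extractor away from domain-discriminative features. A minimal PyTorch-style sketch; the class, function, and module names are illustrative, not the paper's code:

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam on the way back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Illustrative usage: domain_logits = domain_head(grad_reverse(features, lam)),
# where domain_head is a hypothetical domain classifier on top of the features.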
Proceedings Article

Equality of opportunity in supervised learning

TL;DR: This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
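The proposed criterion, equality of opportunity, requires equal true-positive rates across the groups defined by the sensitive attribute A:

\[
\Pr(\hat{Y}=1 \mid A=0,\, Y=1) \;=\; \Pr(\hat{Y}=1 \mid A=1,\, Y=1).
\]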