Journal ArticleDOI

Debiasing Vision-Language Models via Biased Prompts

TLDR
This article proposes a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding, which reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
Abstract
Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications like zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The proposed closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
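The core idea of projecting out biased directions can be sketched in a few lines. The sketch below is illustrative only: the function name and the toy bias direction are assumptions, and the paper's calibrated projection matrix is more involved than this plain orthogonal projection.

```python
import numpy as np

def debias_projection(bias_directions):
    """Build a matrix that projects out the span of the given bias directions.

    bias_directions: array of shape (k, d), e.g. differences between
    embeddings of biased prompts such as "a photo of a man" and
    "a photo of a woman". Returns P of shape (d, d) such that P @ v is
    orthogonal to every bias direction.
    """
    V = np.asarray(bias_directions, dtype=float)
    # Orthonormalize the bias directions via QR on the transpose.
    Q, _ = np.linalg.qr(V.T)  # Q: (d, k) with orthonormal columns
    return np.eye(V.shape[1]) - Q @ Q.T

# Toy example: remove a single bias direction in 3-D.
P = debias_projection(np.array([[1.0, 0.0, 0.0]]))
text_embedding = np.array([0.5, 0.2, 0.1])
debiased = P @ text_embedding
# The debiased embedding has no component along the bias direction.
```

In a real pipeline, the bias directions would come from encoding pairs of biased prompts with the model's text encoder, and the projected text embeddings would be used as-is by the downstream zero-shot classifier or generator, which is why no retraining is needed.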



Citations
Journal ArticleDOI

A Categorical Archive of ChatGPT Failures

Ali Borji
- 06 Feb 2023
TL;DR: In this paper, the authors present a comprehensive analysis of ChatGPT's failures, covering reasoning, factual errors, math, coding, and bias, and highlight the risks, limitations, and societal implications of ChatGPT.
Journal ArticleDOI

What does CLIP know about a red circle? Visual prompt engineering for VLMs

TL;DR: In this article, the authors explore visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text, and show the power of this simple approach by achieving state-of-the-art results in zero-shot referring expression comprehension and strong performance in keypoint localization tasks.
Journal ArticleDOI

The Hidden Language of Diffusion Models

TL;DR: Chefer et al. decompose an input text prompt into a small set of interpretable elements and learn a pseudo-token, a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept.

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

TL;DR: The authors investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs), empirically explore these structures in CLIP's embedding space, and evaluate their usefulness for solving different vision-language tasks such as classification, debiasing, and retrieval.

Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation

TL;DR: The authors propose a bias-to-text (B2T) framework to identify and mitigate biases in vision models, such as image classifiers and text-to-image generative models.
References
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Proceedings ArticleDOI

Deep Learning Face Attributes in the Wild

TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.
Journal ArticleDOI

Hierarchical Text-Conditional Image Generation with CLIP Latents

TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding, and shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
Posted Content

A Survey on Bias and Fairness in Machine Learning

TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.
Trending Questions (1)
How to find spurious correlations in vision-language models?

The study proposes a method to debias vision-language models by projecting out biased directions in the text embedding, effectively reducing spurious correlations.