Journal ArticleDOI
Debiasing Vision-Language Models via Biased Prompts
TLDR
This article proposes a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding, reducing social bias and spurious correlation in both discriminative and generative vision-language models without additional data or training.

Abstract
Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications such as zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The proposed closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
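The abstract's core idea, projecting text embeddings onto the subspace orthogonal to a set of biased prompt directions, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the NumPy-based `debias_projection` helper, the random embeddings, and the plain (uncalibrated) projection matrix are all assumptions for demonstration; the paper additionally calibrates the projection, which is omitted here.

```python
import numpy as np

def debias_projection(text_emb, bias_dirs):
    """Project `text_emb` onto the subspace orthogonal to `bias_dirs`.

    text_emb:  (d,) text embedding to debias
    bias_dirs: (k, d) embeddings of biased prompts (e.g. gendered phrases)
    """
    # Orthonormalize the biased directions: columns of Q span the bias subspace
    Q = np.linalg.qr(bias_dirs.T)[0]            # (d, k), orthonormal columns
    # Projection onto the orthogonal complement: P = I - Q Q^T
    P = np.eye(text_emb.shape[0]) - Q @ Q.T
    z = P @ text_emb
    return z / np.linalg.norm(z)                # re-normalize for cosine scoring

# Illustrative usage with random stand-ins for CLIP-style embeddings
rng = np.random.default_rng(0)
emb = rng.normal(size=256)                      # hypothetical class-prompt embedding
bias = rng.normal(size=(2, 256))                # hypothetical biased-prompt embeddings
debiased = debias_projection(emb, bias)
# `debiased` is numerically orthogonal to both bias directions
```

Because the projection is a fixed linear map applied only to text embeddings, it can be precomputed once and inserted into an existing zero-shot classification or generation pipeline without retraining, which is what makes the closed-form solution cheap to integrate.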
Citations
Journal ArticleDOI
A Categorical Archive of ChatGPT Failures
TL;DR: In this paper, the authors present a comprehensive analysis of ChatGPT's failures, including reasoning, factual errors, math, coding, and bias, and highlight the risks, limitations, and societal implications of ChatGPT.
Journal ArticleDOI
What does CLIP know about a red circle? Visual prompt engineering for VLMs
TL;DR: In this article, the authors explore visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text, and show the power of this simple approach by achieving state-of-the-art results in zero-shot referring expression comprehension and strong performance in keypoint localization tasks.
Journal ArticleDOI
The Hidden Language of Diffusion Models
Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, L. Wolf +7 more
TL;DR: The authors decompose an input text prompt into a small set of interpretable elements and learn a pseudo-token, a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept.
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
TL;DR: The authors investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs), empirically explore these structures in CLIP's embedding space, and evaluate their usefulness for solving different vision-language tasks such as classification, debiasing, and retrieval.
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
TL;DR: The authors propose a bias-to-text (B2T) framework to identify and mitigate biases in vision models, such as image classifiers and text-to-image generative models.
References
Posted Content
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Posted Content
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby +11 more
TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Proceedings ArticleDOI
Deep Learning Face Attributes in the Wild
TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.
Journal ArticleDOI
Hierarchical Text-Conditional Image Generation with CLIP Latents
TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
Posted Content
A Survey on Bias and Fairness in Machine Learning
TL;DR: This survey investigates different real-world applications that have exhibited bias in various ways, and creates a taxonomy of the fairness definitions that machine learning researchers have proposed to avoid existing bias in AI systems.