Open Access, Posted Content

Datasheets for Datasets

TLDR
Documentation to facilitate communication between dataset creators and dataset consumers is presented.
Abstract
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.


Citations
Posted Content

A Survey on Bias and Fairness in Machine Learning

TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.
Proceedings Article

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

TL;DR: This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
Journal Article

OPT: Open Pre-trained Transformer Language Models

TL;DR: This work presents Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which they aim to fully and responsibly share with interested researchers.
Proceedings Article

Model Cards for Model Reporting

TL;DR: This work proposes model cards, a framework that can be used to document any trained machine learning model in the application fields of computer vision and natural language processing, and provides cards for two supervised models: one trained to detect smiling faces in images, and one trained to detect toxic comments in text.
References

Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments

TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
Proceedings Article

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

TL;DR: This paper proposed a machine learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, which greatly facilitates incorporation of cross-sentence contextual constraints.

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

TL;DR: It is shown that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men, in commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition.
Journal Article

Semantics derived automatically from language corpora contain human-like biases

TL;DR: This article showed that applying machine learning to ordinary human language results in human-like semantic biases and replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web.
Trending Questions (1)
How can datasheets be used for compliance with the AI Act?

The provided paper does not mention the AI Act or how datasheets can be used for compliance with it.