Open Access, Posted Content
Datasheets for Datasets
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé, Kate Crawford
TL;DR: Documentation to facilitate communication between dataset creators and consumers is presented.
Abstract:
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
Citations
Posted Content
A Survey on Bias and Fairness in Machine Learning
TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.
Journal Article
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Velu Prabhakaran, Emily Reif, Nan Du, B. C. Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Peng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, L Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Oliveira Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zong Tuan Zhou, Xuezhi Wang, Brennan Saeta, Mark Díaz, Orhan Firat, M. Catasta, Jason Loh Seong Wei, Kathleen S. Meier-Hellstern, Douglas Eck, Jeffrey Dean, Slav Petrov, Noah Fiedel
TL;DR: PaLM, a 540-billion parameter, densely activated Transformer language model, achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark.
Proceedings Article
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, V. K. Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, Seyedeh Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mahmood Norouzi
TL;DR: This work presents Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding, and finds that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
Journal Article
OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Zidan Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
TL;DR: This work presents Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which the authors aim to share fully and responsibly with interested researchers.
Proceedings Article
Model Cards for Model Reporting
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru
TL;DR: This work proposes model cards, a framework that can be used to document any trained machine learning model in the application fields of computer vision and natural language processing, and provides cards for two supervised models: one trained to detect smiling faces in images, and one trained to detect toxic comments in text.
References
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
Journal Article
Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.
Alvis Brazma, Pascal Hingamp, John Quackenbush, Gavin Sherlock, Paul T. Spellman, Chris Stoeckert, John Aach, Wilhelm Ansorge, Catherine A. Ball, Helen C. Causton, Terry Gaasterland, Patrick Glenisson, Frank C. P. Holstege, Irene F. Kim, Victor Markowitz, John C. Matese, Helen Parkinson, Alan J. Robinson, Ugis Sarkans, Steffen Schulze-Kremer, Jason E. Stewart, Ronald C. Taylor, Jaak Vilo, Martin Vingron
TL;DR: The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools.
Proceedings Article
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
Bo Pang, Lillian Lee
TL;DR: This paper proposed a machine learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, which greatly facilitates incorporation of cross-sentence contextual constraints.
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
Joy Buolamwini, Timnit Gebru
TL;DR: It is shown that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men, in commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition.
Journal Article
Semantics derived automatically from language corpora contain human-like biases
TL;DR: This article showed that applying machine learning to ordinary human language results in human-like semantic biases and replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web.