Word embeddings quantify 100 years of gender and ethnic stereotypes.

doi:10.1073/PNAS.1720347115

Open Access, Journal Article

Nikhil Garg, +3 more

- 17 Apr 2018 -

Proceedings of the National Academy of S...

- Vol. 115, Iss: 16, pp 201720347

TLDR

This paper develops a framework showing how the temporal dynamics of word embeddings help quantify changes in stereotypes and attitudes toward women and ethnic minorities in the United States over the 20th and 21st centuries.

Abstract:

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embeddings opens up a fruitful intersection between machine learning and quantitative social science.
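The idea in the abstract, that distances between word vectors can be read as a measure of association and then tracked decade by decade, can be illustrated with a minimal sketch. The code below is not taken from the paper: the function names, the 2-D vectors, and the word lists are all invented for illustration, and the association score (a difference of Euclidean distances to group-average vectors) is one simple choice among several possible measures.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relative_distance_bias(embedding, neutral_words, group_a, group_b):
    """Sum over neutral words of dist(word, group A) - dist(word, group B).

    Positive values mean the neutral words sit closer to group B;
    negative values mean they sit closer to group A.
    """
    center_a = mean_vector([embedding[w] for w in group_a])
    center_b = mean_vector([embedding[w] for w in group_b])
    return sum(
        distance(embedding[w], center_a) - distance(embedding[w], center_b)
        for w in neutral_words
    )


# Hypothetical toy vectors, one embedding per decade (assumption: real
# embeddings are trained per decade and have hundreds of dimensions).
embeddings_by_decade = {
    1910: {"he": [1.0, 0.0], "she": [0.0, 1.0], "engineer": [0.9, 0.1]},
    1990: {"he": [1.0, 0.0], "she": [0.0, 1.0], "engineer": [0.6, 0.4]},
}

# Score the occupation word against the two group word lists in each decade.
bias_by_decade = {
    decade: relative_distance_bias(emb, ["engineer"], ["she"], ["he"])
    for decade, emb in embeddings_by_decade.items()
}

for decade, bias in sorted(bias_by_decade.items()):
    print(decade, round(bias, 3))
```

With these invented vectors the score for "engineer" stays positive but shrinks toward zero between the two decades. A time series of such scores is the kind of quantity that could then be compared against census occupation data.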

Citations

Posted Content

Language (Technology) is Power: A Critical Survey of "Bias" in NLP

Su Lin Blodgett, +3 more

- 28 May 2020 -

arXiv: Computation and Language

TL;DR: The authors survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing bias is an inherently normative process.

Journal Article

Unraveling the “Model Minority” Stereotype: Listening to Asian American Youth.

Robert B. Everhart

- 01 Mar 1998 -

Anthropology & Education Quarterly

TL;DR: A discussion of Lee's work on listening to Asian American youth and unraveling the "model minority" stereotype of Asian Americans.

The New York Times Annotated Corpus as a Large-Scale Summarization Resource

菊池悠太, +3 more

Journal Article

AI can be sexist and racist — it’s time to make it fair

James Zou, +1 more

- 01 Jul 2018 -

Nature

TL;DR: Computer scientists must identify sources of bias, de-bias training data and develop artificial-intelligence algorithms that are robust to skews in the data, argue James Zou and Londa Schiebinger.

Proceedings Article

Mitigating Gender Bias in Natural Language Processing: Literature Review

Tony Sun, +9 more

TL;DR: This paper discusses gender bias in terms of four forms of representation bias, analyzes methods for recognizing gender bias in NLP, and weighs the advantages and drawbacks of existing gender debiasing methods.

References

Proceedings Article

Glove: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: This paper introduces a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes negative sampling, a simple alternative to the hierarchical softmax.

Posted Content

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013 -

arXiv: Computation and Language

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.

Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013 -

arXiv: Computation and Language

TL;DR: The Skip-gram model learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. This paper presents extensions that improve both the quality of the vectors and the training speed.

Journal Article

Natural Language Processing (Almost) from Scratch

Ronan Collobert, +5 more

- 01 Feb 2011 -

Journal of Machine Learning Research

TL;DR: This paper proposes a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks, including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Related Papers (5)

Semantics derived automatically from language corpora contain human-like biases

Aylin Caliskan, +3 more

- 14 Apr 2017 -

Science

Man is to computer programmer as woman is to homemaker? debiasing word embeddings

Glove: Global Vectors for Word Representation

Distributed Representations of Words and Phrases and their Compositionality

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding