Author

Isabel Papadimitriou

Bio: Isabel Papadimitriou is an academic researcher from Stanford University. The author has contributed to research on the topics of computer science and language models. The author has an h-index of 5 and has co-authored 8 publications receiving 102 citations.

Papers

Posted Content•

On the Opportunities and Risks of Foundation Models.

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel¹, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang•Institutions (1)

Stanford University¹

16 Aug 2021-arXiv: Learning

TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.

Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

76 citations

Posted Content•

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

Isabel Papadimitriou¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

30 Apr 2020-arXiv: Computation and Language

TL;DR: Experiments on transfer between natural languages show that zero-shot performance on a test language is highly correlated with typological syntactic similarity to the training language, suggesting that representations induced from natural languages correspond to the cross-linguistic syntactic properties studied in linguistic typology.

Abstract: We propose transfer learning as a method for analyzing the encoding of grammatical structure in neural language models. We train LSTMs on non-linguistic data and evaluate their performance on natural language to assess which kinds of data induce generalizable structural features that LSTMs can use for natural language. We find that training on non-linguistic data with latent structure (MIDI music or Java code) improves test performance on natural language, despite no overlap in surface form or vocabulary. To pinpoint the kinds of abstract structure that models may be encoding to lead to this improvement, we run similar experiments with two artificial parentheses languages: one which has a hierarchical recursive structure, and a control which has paired tokens but no recursion. Surprisingly, training a model on either of these artificial languages leads to the same substantial gains when testing on natural language. Further experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological syntactic similarity to the training language, suggesting that representations induced by pre-training correspond to the cross-linguistic syntactic properties. Our results provide insights into the ways that neural models represent abstract syntactic structure, and also about the kind of structural inductive biases which allow for natural language acquisition.

32 citations

Posted Content•

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Isaac Caswell¹, Julia Kreutzer¹, Lisa Wang¹, Ahsan Wahab, Daan van Esch¹, Nasanbayar Ulzii-Orshikh, Allahsera Auguste Tapo, Nishant Subramani², Artem Sokolov¹, Claytone Sikasote³, Monang Setyawan¹, Supheakmungkol Sarin, Sokhar Samb⁴, Benoît Sagot, Clara E. Rivera¹, Annette Rios⁵, Isabel Papadimitriou⁶, Salomey Osei⁷, Pedro Javier Ortiz Suárez⁸, Iroro Orife, Kelechi Ogueji⁹, Rubungo Andre Niyongabo¹⁰, Toan Q. Nguyen¹¹, Mathias Müller⁵, André Müller⁵, Shamsuddeen Hassan Muhammad¹², Nanda Muhammad¹, Ayanda Mnyakeni¹, Jamshidbek Mirzakhalov¹³, Tapiwanashe Matangira¹, Colin Leong, Nze Lawson¹, Sneha Kudugunta¹, Yacine Jernite, Mathias Jenny⁵, Orhan Firat¹, Bonaventure F. P. Dossou¹⁴, Sakhile Dlamini¹, Nisansa de Silva¹⁵, Sakine Çabuk Ballı¹, Stella Biderman, Alessia Battisti⁵, Ahmed Baruwa¹⁶, Ankur Bapna¹, Pallavi Baljekar¹, Israel Abebe Azime⁴, Ayodele Awokoya¹⁷, Duygu Ataman⁵, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal¹⁸, Mofetoluwa Adeyemi•Institutions (18)

Google¹, Intel², University of Zambia³, African Institute for Mathematical Sciences⁴, University of Zurich⁵, Stanford University⁶, Kwame Nkrumah University of Science and Technology⁷, University of Paris⁸, University of Waterloo⁹, University of Electronic Science and Technology of China¹⁰, University of Notre Dame¹¹, Bayero University Kano¹², University of South Florida¹³, Jacobs University Bremen¹⁴, University of Moratuwa¹⁵, Obafemi Awolowo University¹⁶, University of Ibadan¹⁷, University of Maryland, Baltimore¹⁸

23 Mar 2021-arXiv: Computation and Language

TL;DR: In this paper, the authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and audit the correctness of language codes in a sixth (JW300).

Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. However, to date there has been no systematic analysis of the quality of these publicly available datasets, or whether the datasets actually contain content in the languages they claim to represent. In this work, we manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4), and audit the correctness of language codes in a sixth (JW300). We find that lower-resource corpora have systematic issues: at least 15 corpora are completely erroneous, and a significant fraction contains less than 50% sentences of acceptable quality. Similarly, we find 82 corpora that are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-speakers of the languages in question, and supplement the human judgements with automatic analyses. Inspired by our analysis, we recommend techniques to evaluate and improve multilingual corpora and discuss the risks that come with low-quality data releases.

24 citations

Proceedings Article•DOI•

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

Isabel Papadimitriou¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

01 Nov 2020

TL;DR: This article proposed transfer learning as a method for analyzing the encoding of grammatical structure in neural language models, finding that training on non-linguistic data with latent structure improves test performance on natural language, despite no overlap in surface form or vocabulary.

Abstract: We propose transfer learning as a method for analyzing the encoding of grammatical structure in neural language models. We train LSTMs on non-linguistic data and evaluate their performance on natural language to assess which kinds of data induce generalizable structural features that LSTMs can use for natural language. We find that training on non-linguistic data with latent structure (MIDI music or Java code) improves test performance on natural language, despite no overlap in surface form or vocabulary. To pinpoint the kinds of abstract structure that models may be encoding to lead to this improvement, we run similar experiments with two artificial parentheses languages: one which has a hierarchical recursive structure, and a control which has paired tokens but no recursion. Surprisingly, training a model on either of these artificial languages leads to the same substantial gains when testing on natural language. Further experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological syntactic similarity to the training language, suggesting that representations induced by pre-training correspond to the cross-linguistic syntactic properties. Our results provide insights into the ways that neural models represent abstract syntactic structure, and also about the kind of structural inductive biases which allow for natural language acquisition.

19 citations

Posted Content•

Pretraining on Non-linguistic Structure as a Tool for Analyzing Learning Bias in Language Models

Isabel Papadimitriou, Dan Jurafsky

30 Apr 2020

TL;DR: It is found that models trained on structured data such as music and Java code have internal representations that help in modelling human language, and that, surprisingly, adding minimal amounts of structure to the training data makes a large difference in transfer to natural language.

Abstract: We propose a novel methodology for analyzing the encoding of grammatical structure in neural language models through transfer learning. We test how a language model can leverage its internal representations to transfer knowledge across languages and symbol systems. We train LSTMs on non-linguistic, structured data and test their performance on human language to assess which kinds of data induce generalizable encodings that LSTMs can use for natural language. We find that models trained on structured data such as music and Java code have internal representations that help in modelling human language, and that, surprisingly, adding minimal amounts of structure to the training data makes a large difference in transfer to natural language. Further experiments on transfer between human languages show that zero-shot performance on a test language is highly correlated with syntactic similarity to the training language, even after removing any vocabulary overlap. This suggests that the internal representations induced from natural languages are typologically coherent: they encode the features and differences outlined in typological studies. Our results provide insights into how neural networks represent linguistic structure, and also about the kinds of structural biases that give learners the ability to model language.

13 citations

Cited by

Ergativity in English

Kim Sang Hyeog (김상혁)

01 Jun 2003

625 citations

Proceedings Article•DOI•

Can language models learn from explanations in context?

Andrew K. Lampinen, Ishita Dasgupta, Stephanie C.Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill

05 Apr 2022

TL;DR: Investigating whether explanations of few-shot examples can help in-context learning of large LMs on challenging tasks finds that explanations can improve performance—even without tuning.

Abstract: Language Models (LMs) can perform new tasks by adapting to a few in-context examples. For humans, explanations that connect examples to task principles can improve learning. We therefore investigate whether explanations of few-shot examples can help LMs. We annotate questions from 40 challenging tasks with answer explanations, and various matched control explanations. We evaluate how different types of explanations, instructions, and controls affect zero- and few-shot performance. We analyze these results using statistical multilevel modeling techniques that account for the nested dependencies among conditions, tasks, prompts, and models. We find that explanations can improve performance -- even without tuning. Furthermore, explanations hand-tuned for performance on a small validation set offer substantially larger benefits, and building a prompt by selecting examples and explanations together substantially improves performance over selecting examples alone. Finally, even untuned explanations outperform carefully matched controls, suggesting that the benefits are due to the link between an example and its explanation, rather than lower-level features. However, only large models benefit. In summary, explanations can support the in-context learning of large LMs on challenging tasks.

104 citations

Posted Content•

When BERT Plays the Lottery, All Tickets Are Winning

Sai Prasanna, Anna Rogers¹, Anna Rumshisky²•Institutions (2)

University of Copenhagen¹, University of Massachusetts Lowell²

01 May 2020-arXiv: Computation and Language

TL;DR: It is shown that the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful.

Abstract: Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the "good" subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.

104 citations

DOI•

Data and its (dis)contents: A survey of dataset development and use in machine learning research.

Amandalynne Paullada¹, Inioluwa Deborah Raji², Emily M. Bender¹, Emily Denton³, Alex Hanna³•Institutions (3)

University of Washington¹, Mozilla Foundation², Google³

12 Nov 2021

TL;DR: In this paper, the limitations of predominant practices for dataset collection and use in the field of machine learning are surveyed and a wide range of approaches to filtering and augmenting data and modeling techniques aimed at mitigating the impact of bias in datasets are discussed.

Abstract: In this work, we survey a breadth of literature that has revealed the limitations of predominant practices for dataset collection and use in the field of machine learning. We cover studies that critically review the design and development of datasets with a focus on negative societal impacts and poor outcomes for system performance. We also cover approaches to filtering and augmenting data and modeling techniques aimed at mitigating the impact of bias in datasets. Finally, we discuss works that have studied data practices, cultures, and disciplinary norms and discuss implications for the legal, ethical, and functional challenges the field continues to face. Based on these findings, we advocate for the use of both qualitative and quantitative approaches to more carefully document and analyze datasets during the creation and usage phases.

94 citations

Posted Content•

On the Opportunities and Risks of Foundation Models.

Stanford University¹

16 Aug 2021-arXiv: Learning

76 citations
