Author

Richard Futrell

Bio: Richard Futrell is an academic researcher at the University of California, Irvine. The author has contributed to research in the topics of Computer science and Syntax, has an h-index of 23, and has co-authored 88 publications receiving 1,801 citations. Previous affiliations of Richard Futrell include the Massachusetts Institute of Technology and Apple Inc.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: Using parsed corpora of 37 diverse languages, it is shown that overall dependency lengths for all languages are shorter than conservative random baselines, suggesting that dependency length minimization is a universal quantitative property of human languages.
Abstract: Explaining the variation between human languages and the constraints on that variation is a core goal of linguistics. In the last 20 y, it has been claimed that many striking universals of cross-linguistic variation follow from a hypothetical principle that dependency length—the distance between syntactically related words in a sentence—is minimized. Various models of human sentence production and comprehension predict that long dependencies are difficult or inefficient to process; minimizing dependency length thus enables effective communication without incurring processing difficulty. However, despite widespread application of this idea in theoretical, empirical, and practical work, there is not yet large-scale evidence that dependency length is actually minimized in real utterances across many languages; previous work has focused either on a small number of languages or on limited kinds of data about each language. Here, using parsed corpora of 37 diverse languages, we show that overall dependency lengths for all languages are shorter than conservative random baselines. The results strongly suggest that dependency length minimization is a universal quantitative property of human languages and support explanations of linguistic variation in terms of general properties of human information processing.
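A rough sketch of the quantity at issue: for a sentence annotated with head indices, total dependency length is the sum of linear distances between each word and its head. The toy Python below (hypothetical example sentence; a naive word-shuffling baseline rather than the paper's conservative, projectivity-preserving baselines) illustrates the comparison.

```python
import random

def total_dependency_length(heads):
    """Sum of linear distances between each word and its head.
    `heads` are 1-based head indices (CoNLL-style), 0 marking the root."""
    return sum(abs(i - h) for i, h in enumerate(heads, start=1) if h != 0)

def shuffled_baseline(heads, n_samples=1000):
    """Mean dependency length when word order is permuted at random while the
    dependency structure is held fixed. This is a naive baseline; the paper's
    baselines additionally preserve properties such as projectivity."""
    n = len(heads)
    total = 0.0
    for _ in range(n_samples):
        positions = list(range(1, n + 1))
        random.shuffle(positions)                # positions[i-1] = new slot of word i
        total += sum(abs(positions[i - 1] - positions[h - 1])
                     for i, h in enumerate(heads, start=1) if h != 0)
    return total / n_samples

# Hypothetical parse of "the dog chased the cat" with "chased" as root.
heads = [2, 3, 0, 5, 3]
print(total_dependency_length(heads))   # observed length: 5
print(shuffled_baseline(heads))         # expected length under random order (~8)
```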

278 citations

Journal ArticleDOI
TL;DR: These studies show how a pervasive pressure for efficiency guides the forms of natural language and indicate that a rich future for language research lies in connecting linguistics to cognitive psychology and mathematical theories of communication and inference.

182 citations

Journal ArticleDOI
TL;DR: The authors performed an exhaustive meta-analysis of 73 peer-reviewed journal articles on syntactic priming, from the seminal Bock (1986) paper through 2013, and found a robust effect with an average weighted odds ratio of 1.67 when there was no lexical overlap and 3.26 when there was.

166 citations

Journal ArticleDOI
TL;DR: It is suggested that the cross-linguistic similarity in color-naming efficiency reflects colors of universal usefulness and provides an account of a principle (color use) that governs how color categories come about.
Abstract: What determines how languages categorize colors? We analyzed results of the World Color Survey (WCS) of 110 languages to show that despite gross differences across languages, communication of chromatic chips is always better for warm colors (yellows/reds) than cool colors (blues/greens). We present an analysis of color statistics in a large databank of natural images curated by human observers for salient objects and show that objects tend to have warm rather than cool colors. These results suggest that the cross-linguistic similarity in color-naming efficiency reflects colors of universal usefulness and provide an account of a principle (color use) that governs how color categories come about. We show that potential methodological issues with the WCS do not corrupt information-theoretic analyses, by collecting original data using two extreme versions of the color-naming task, in three groups: the Tsimane', a remote Amazonian hunter-gatherer isolate; Bolivian-Spanish speakers; and English speakers. These data also enabled us to test another prediction of the color-usefulness hypothesis: that differences in color categorization between languages are caused by differences in overall usefulness of color to a culture. In support, we found that color naming among Tsimane' had relatively low communicative efficiency, and the Tsimane' were less likely to use color terms when describing familiar objects. Color-naming among Tsimane' was boosted when naming artificially colored objects compared with natural objects, suggesting that industrialization promotes color usefulness.
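As a loose illustration of the information-theoretic scoring described above (not the published WCS pipeline), the sketch below estimates, from hypothetical naming responses, the expected surprisal a listener incurs when recovering a color chip from a term, assuming a uniform prior over chips.

```python
import math
from collections import Counter

def chip_surprisal(naming_data):
    """For each chip, the expected surprisal (bits) a listener incurs when
    recovering the chip from the term a speaker produced, under a uniform
    prior over chips. `naming_data` is a list of (chip, term) responses.
    This mirrors the information-theoretic scoring only in spirit."""
    pair_counts = Counter(naming_data)
    chip_counts = Counter(c for c, _ in naming_data)
    chips = set(chip_counts)
    terms = {w for _, w in naming_data}

    # P(term | chip) estimated from the naming responses
    p_w_given_c = {(c, w): n / chip_counts[c] for (c, w), n in pair_counts.items()}

    def p_c_given_w(c, w):
        # Bayes with a uniform prior over chips
        den = sum(p_w_given_c.get((c2, w), 0.0) for c2 in chips)
        return p_w_given_c.get((c, w), 0.0) / den if den else 0.0

    scores = {}
    for c in chips:
        s = 0.0
        for w in terms:
            pw = p_w_given_c.get((c, w), 0.0)
            if pw > 0:
                s += pw * -math.log2(p_c_given_w(c, w))
        scores[c] = s
    return scores

# Hypothetical responses: the warm chip is always "red"; the cool chip is
# sometimes also called "red", so it is harder for a listener to recover.
data = [("warm_chip", "red"), ("warm_chip", "red"), ("warm_chip", "red"),
        ("cool_chip", "blue"), ("cool_chip", "blue"), ("cool_chip", "red")]
print(chip_surprisal(data))
```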

162 citations

Proceedings ArticleDOI
31 Aug 2018
TL;DR: These studies demonstrate that state-of-the-art RNN models are able to learn and generalize about empty syntactic positions, and show that RNNs exhibit evidence for wh-islands, adjunct islands, and complex NP islands.
Abstract: RNN language models have achieved state-of-the-art perplexity results and have proven useful in a suite of NLP tasks, but it is as yet unclear what syntactic generalizations they learn. Here we investigate whether state-of-the-art RNN language models represent long-distance filler–gap dependencies and constraints on them. Examining RNN behavior on experimentally controlled sentences designed to expose filler–gap dependencies, we show that RNNs can represent the relationship in multiple syntactic positions and over large spans of text. Furthermore, we show that RNNs learn a subset of the known restrictions on filler–gap dependencies, known as island constraints: RNNs show evidence for wh-islands, adjunct islands, and complex NP islands. These studies demonstrate that state-of-the-art RNN models are able to learn and generalize about empty syntactic positions.
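A minimal sketch of the surprisal-based logic behind such experiments, using GPT-2 from Hugging Face as a stand-in for the paper's RNN language models and invented example sentences: a wh-filler should reduce surprisal at material that follows a gap, relative to a no-filler control.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_surprisal(prefix, continuation):
    """Total surprisal (bits) of `continuation` given `prefix` under GPT-2."""
    prefix_ids = tokenizer(prefix).input_ids
    cont_ids = tokenizer(continuation).input_ids
    ids = torch.tensor([prefix_ids + cont_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for pos in range(len(prefix_ids), ids.shape[1]):
        # the prediction for the token at `pos` comes from the previous position
        total += -log_probs[0, pos - 1, ids[0, pos]].item() / math.log(2)
    return total

# A wh-filler ("what") should make the post-gap continuation less surprising
# than a no-filler control with "that" (illustrative, not the paper's stimuli).
filler  = "I know what the lion devoured"
control = "I know that the lion devoured"
gap_cont = " at sunrise."
print("with filler:   ", continuation_surprisal(filler, gap_cont))
print("without filler:", continuation_surprisal(control, gap_cont))
```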

146 citations


Cited by
01 Jan 2005
TL;DR: In “Constructing a Language,” Tomasello presents a contrasting theory of how the child acquires language: it is not a universal grammar that allows for language development; rather, two sets of cognitive skills resulting from biological/phylogenetic adaptations are fundamental to the ontogenetic origins of language.
Abstract: Child psychiatrists, pediatricians, and other child clinicians need to have a solid understanding of child language development. There are at least four important reasons that make this necessary. First, slowing, arrest, and deviation of language development are highly associated with, and complicate the course of, child psychopathology. Second, language competence plays a crucial role in emotional and mood regulation, evaluation, and therapy. Third, language deficits are the most frequent underpinning of the learning disorders, ubiquitous in our clinical populations. Fourth, clinicians should not confuse the rich linguistic and dialectal diversity of our clinical populations with abnormalities in child language development. The challenge for the clinician becomes, then, how to get immersed in the captivating field of child language acquisition without getting overwhelmed by its conceptual and empirical complexity.

In the past 50 years and since the seminal works of Roger Brown, Jerome Bruner, and Catherine Snow, child language researchers (often known as developmental psycholinguists) have produced a remarkable body of knowledge. Linguists such as Chomsky and philosophers such as Grice have strongly influenced the science of child language. One of the major tenets of Chomskian linguistics (known as generative grammar) is that children’s capacity to acquire language is “hardwired” with “universal grammar”—an innate language acquisition device (LAD), a language “instinct”—at its core. This view is in part supported by the assertion that the linguistic input that children receive is relatively dismal and of poor quality relative to the high quantity and quality of output that they manage to produce after age 2, and that only an advanced, innate capacity to decode and organize linguistic input can enable them to “get from here (prelinguistic infant) to there (linguistic child).”

In “Constructing a Language,” Tomasello presents a contrasting theory of how the child acquires language: it is not a universal grammar that allows for language development. Rather, human cognition universals of communicative needs and vocal-auditory processing result in some language universals, such as nouns and verbs as expressions of reference and predication (p. 19). The author proposes that two sets of cognitive skills resulting from biological/phylogenetic adaptations are fundamental to the ontogenetic origins of language. These sets of inherited cognitive skills are intention-reading, on the one hand, and pattern-finding, on the other. Intention-reading skills encompass the prelinguistic infant’s capacities to share attention to outside events with other persons, establishing joint attentional frames, to understand other people’s communicative intentions, and to imitate the adult’s communicative intentions (an intersubjective form of imitation that requires symbolic understanding and perspective-taking). Pattern-finding skills include the ability of infants as young as 7 months old to analyze concepts and percepts (most relevant here, auditory or speech percepts) and create concrete or abstract categories that contain analogous items. Tomasello, a most prominent developmental scientist with research foci on child language acquisition and on social cognition and social learning in children and primates, succinctly and clearly introduces the major points of his theory and his views on the origins of language in the initial chapters.
In subsequent chapters, he delves into the details by covering most language acquisition domains, namely word (lexical) learning, syntax and morphology, and conversation, narrative, and extended discourse. Although one of the remaining domains (pragmatics) is at the core of his theory and permeates the text throughout, the relative paucity of passages explicitly devoted to discussing acquisition and pro…

1,757 citations

Journal ArticleDOI
TL;DR: The author guides the reader in about 350 pages from descriptive and basic statistical methods, through classification and clustering, to (generalised) linear and mixed models, enabling researchers and students alike to reproduce the analyses and learn by doing.
Abstract: The complete title of this book runs ‘Analyzing Linguistic Data: A Practical Introduction to Statistics using R’ and as such it very well reflects the purpose and spirit of the book. The author guides the reader in about 350 pages from descriptive and basic statistical methods, through classification and clustering, to (generalised) linear and mixed models. Each of the methods is introduced in the context of concrete linguistic problems and demonstrated on exciting datasets from current research in the language sciences. In line with its practical orientation, the book focuses primarily on using the methods and interpreting the results. This implies that the mathematical treatment of the techniques is kept to a minimum, if not absent from the book. In return, the reader is provided with very detailed explanations on how to conduct the analyses using R [1]. The first chapter sets the tone, being a 20-page introduction to R. For this and all subsequent chapters, the R code is intertwined with the chapter text, and the datasets and functions used are conveniently packaged in the languageR package that is available on the Comprehensive R Archive Network (CRAN). With this approach, the author has done an excellent job of enabling researchers and students alike to reproduce the analyses and learn by doing. Another quality as a textbook is the fact that every chapter ends with Workbook sections where the user is invited to exercise his or her analysis skills on supplemental datasets. Full solutions including code, results, and comments are given in Appendix A (30 pages). Instructors are therefore very well served by this text, although they might want to balance the book with some more mathematical treatment depending on the target audience. After the introductory chapter on R, the book opens with graphical data exploration. Chapter 3 treats probability distributions and common sampling distributions. Under basic statistical methods (Chapter 4), distribution tests and tests on means and variances are covered. Chapter 5 deals with clustering and classification. Strangely enough, the clustering section has material on PCA, factor analysis, and correspondence analysis, and includes only one subsection on clustering, devoted notably to hierarchical partitioning methods. The classification part deals with decision trees, discriminant analysis, and support vector machines. The regression chapter (Chapter 6) treats linear models, generalised linear models, piecewise linear models, and a substantial section on models for lexical richness. The final chapter on mixed models is particularly interesting, as it is one of the few textbook accounts that introduce the reader to using the (innovative) lme4 package of Douglas Bates, which implements linear mixed-effects models. Moreover, the case studies included in this…
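For readers outside the R ecosystem, here is a hypothetical Python sketch of the kind of by-subject mixed-effects analysis the book demonstrates with lme4 (invented data and variable names; statsmodels is used purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a toy reading-time dataset with a per-subject random intercept.
rng = np.random.default_rng(0)
n_subj, n_item = 20, 10
rows = []
for s in range(n_subj):
    subj_offset = rng.normal(0, 50)            # subject-specific intercept shift
    for i in range(n_item):
        freq = rng.uniform(1, 5)               # e.g. log word frequency
        rt = 600 - 30 * freq + subj_offset + rng.normal(0, 40)
        rows.append({"subject": s, "item": i, "freq": freq, "rt": rt})
df = pd.DataFrame(rows)

# Reaction time as a function of frequency, with a random intercept for each
# subject (roughly the lme4 model lmer(rt ~ freq + (1 | subject))).
model = smf.mixedlm("rt ~ freq", df, groups=df["subject"]).fit()
print(model.summary())
```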

1,679 citations

01 Jan 2016

933 citations

Journal ArticleDOI
TL;DR: This article introduces a corpus of 10,657 English sentences labeled as grammatical or ungrammatical, drawn from published linguistics literature, and uses it to test the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of probing their linguistic competence.
Abstract: This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error-analysis on specific grammatical phenomena reveals that both Lau et al.’s models and ours learn systematic generalizations like subject-verb-object order. However, all models we test perform far below human level on a wide range of grammatical constructions.
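As a toy illustration of the task's shape (sentence in, binary acceptability label out), here is a deliberately simple bag-of-words baseline on invented CoLA-style examples. It is not the paper's RNN model; Matthews correlation is used here only as a convenient metric for unbalanced binary labels.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.pipeline import make_pipeline

# Tiny invented (sentence, label) pairs standing in for CoLA rows:
# 1 = acceptable, 0 = unacceptable.
train = [("The dog barked.", 1), ("Dog the barked.", 0),
         ("She seems to be happy.", 1), ("She seems sleeping.", 0)]
dev = [("The cat slept.", 1), ("Cat the slept.", 0)]

X_train, y_train = zip(*train)
X_dev, y_dev = zip(*dev)

# Bag-of-words + logistic regression: a weak but transparent baseline.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X_train, y_train)
pred = clf.predict(X_dev)
print("MCC:", matthews_corrcoef(y_dev, pred))
```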

903 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A structural probe is proposed that evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space; such transformations are shown to exist for both ELMo and BERT but not for baselines, providing evidence that entire syntax trees are embedded implicitly in deep models’ vector geometry.
Abstract: Recent work has improved our ability to detect linguistic knowledge in word representations. However, current methods for detecting syntactic knowledge do not test whether syntax trees are represented in their entirety. In this work, we propose a structural probe, which evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space. The probe identifies a linear transformation under which squared L2 distance encodes the distance between words in the parse tree, and one in which squared L2 norm encodes depth in the parse tree. Using our probe, we show that such transformations exist for both ELMo and BERT but not in baselines, providing evidence that entire syntax trees are embedded implicitly in deep models’ vector geometry.
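The probe's core objective is compact enough to sketch: learn a matrix B such that squared distances between transformed word vectors approximate parse-tree distances. Below is a toy re-implementation of that training loop, with random vectors standing in for ELMo/BERT states and a hypothetical parse tree; it is not the authors' released code.

```python
import torch

# Learn B so that ||B(h_i - h_j)||^2 approximates the tree distance between
# words i and j (the distance version of the structural probe).
torch.manual_seed(0)
n_words, dim, rank = 6, 32, 16

h = torch.randn(n_words, dim)                 # stand-in for contextual embeddings
# Pairwise distances in a hypothetical dependency tree over 6 words
# with edges (0-1), (1-2), (2-3), (1-4), (4-5).
tree_dist = torch.tensor([[0, 1, 2, 3, 2, 3],
                          [1, 0, 1, 2, 1, 2],
                          [2, 1, 0, 1, 2, 3],
                          [3, 2, 1, 0, 3, 4],
                          [2, 1, 2, 3, 0, 1],
                          [3, 2, 3, 4, 1, 0]], dtype=torch.float)

B = torch.nn.Parameter(torch.randn(rank, dim) * 0.1)
opt = torch.optim.Adam([B], lr=0.01)

for step in range(500):
    diffs = h.unsqueeze(1) - h.unsqueeze(0)   # (n, n, dim) pairwise differences
    probe_dist = (diffs @ B.T).pow(2).sum(-1) # squared L2 after the transform
    loss = (probe_dist - tree_dist).abs().mean()  # L1 gap to tree distances
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```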

888 citations