Topic
Word order
About: Word order is a research topic. Over its lifetime, 5,051 publications have been published within this topic, receiving 130,611 citations.
Papers
•
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
24,012 citations
•
TL;DR: This paper presents several extensions to the continuous Skip-gram model that improve both the quality of the learned word vectors and the training speed.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
11,343 citations
•
01 Jan 1984
TL;DR: Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 1984.
Abstract: Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 1984.
1,086 citations
•
TL;DR: The paper investigates asymmetries in the behavior of subjects in Germanic, Celtic/Arabic, Romance, and Greek, and argues that the languages split according to how the Extended Projection Principle (EPP) is checked: move/merge XP in Germanic versus move/merge X0 in Celtic, Greek, and Romance.
Abstract: The paper investigates a number of asymmetries in the behavior of subjects in Germanic, Celtic/Arabic, Romance, and Greek. The languages under investigation divide into two main groups with respect to a cluster of properties, including the availability of pro-drop with referential subjects, the possibility of VSO/VOS orders, the A/A′ status of subjects in SVO orders, the presence/absence of Definiteness Restriction (DR)-effects in unaccusative constructions, the existence of verb-raising independently of V-2, and others. We argue that the key factor in this split is a parametrization in the way the Extended Projection Principle (EPP) is checked: move/merge XP vs. move/merge X0. The first option is taken in Germanic, the second in Celtic, Greek, and Romance. According to our proposal, the EPP relates to checking of a nominal feature of AGR (cf. Chomsky 1995), and move/merge X0 languages satisfy the EPP via V-raising, as their verbal agreement morphology includes the requisite nominal feature (cf. Taraldsen 1978). Moreover, we demonstrate that the further differences that exist between Celtic/Arabic on the one hand and Romance/Greek on the other are related to the parametric availability of Spec,TP for subjects (cf. Jonas and Bobaljik 1993, Bobaljik and Jonas 1996). In Celtic and Arabic, Spec,TP for subjects is licensed, resulting in VSO orders with VP external subjects. In Greek and Romance, Spec,TP is not licensed, resulting in 'subject inverted' orders with VP internal subjects. In other words, we show that within the class of move/merge X0 languages, a further partition emerges which is due to the same parameter dividing Germanic languages into two major classes. We demonstrate that combining the proposed EPP/AGR parameter with the Spec,TP parameter gives four language-types with distinct properties.
769 citations
•
TL;DR: This second edition has been revised and updated to take full account of new research in universals and typology in the past decade, and more generally to consider how the approach advocated here relates to recent advances in generative grammatical theory.
Abstract: Since its first publication, "Language Universals and Linguistic Typology" has become established as the leading introductory account of one of the most productive areas of linguistics: the analysis, comparison, and classification of the common features and forms of the organization of languages. Adopting an approach to the subject pioneered by Greenberg and others, Bernard Comrie is particularly concerned with syntactico-semantic universals, devoting chapters to word order, case marking, relative clauses, and causative constructions. His book is informed throughout by the conviction that an exemplary account of universal properties of human language cannot restrict itself to purely formal aspects, nor focus on analysis of a single language. Rather, it must also consider language use, relate formal properties to testable claims about cognition and cognitive development, and treat data from a wide range of languages. This second edition has been revised and updated to take full account of new research in universals and typology in the past decade, and more generally to consider how the approach advocated here relates to recent advances in generative grammatical theory.
739 citations