
Showing papers in "Corpus Linguistics and Linguistic Theory" in 2016


Journal Article
TL;DR: The authors argue that fluctuating frequencies of grammatical variants in real time are not only a function of changing grammars but are also conditioned by what they call "environmental" changes (for example, content changes) that affect the textual habitat.
Abstract: This paper is concerned with the limitations of inferring grammar change from variable text frequencies in historical corpus data. We argue that fluctuating frequencies of grammatical variants in real time are a function not only of changing grammars but are also conditioned by what we call ‘environmental’ changes (for example, content changes) that affect the textual habitat. As a case study, we explore the English genitive alternation in the Late Modern English period and demonstrate that the English s-genitive is and always has been preferentially used with animate possessors; if for some reason animate NPs are rare in some specific historical period or text, this will trivially depress s-genitive rates and boost of-genitive rates. Against this backdrop, the paper advocates probing the probabilistic underpinning of grammatical variability in diachrony, for the sake of keeping apart trivial habitat-induced frequency change and grammar change proper.

38 citations
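
The habitat argument reduces to mixture arithmetic: if the s-genitive rate within each possessor-animacy class stays constant, the overall rate still shifts whenever the proportion of animate possessors shifts. The minimal Python sketch below makes this concrete; the function name and all rates are invented for illustration and are not figures from the paper.

```python
def overall_s_genitive_rate(share_animate, p_s_animate, p_s_inanimate):
    """Overall s-genitive rate as a weighted mix of two class-specific rates.

    share_animate: proportion of genitive contexts with an animate possessor.
    p_s_animate, p_s_inanimate: s-genitive rates within each animacy class,
    held constant here to model a period with no grammar change.
    """
    return share_animate * p_s_animate + (1 - share_animate) * p_s_inanimate


# Hypothetical class-specific rates, identical in both periods.
P_S_ANIMATE = 0.6
P_S_INANIMATE = 0.1

# Only the textual habitat (share of animate possessors) differs.
for period, share in [("period 1", 0.5), ("period 2", 0.2)]:
    rate = overall_s_genitive_rate(share, P_S_ANIMATE, P_S_INANIMATE)
    print(period, round(rate, 2))  # period 1 0.35, then period 2 0.2
```

With these invented numbers the raw s-genitive rate falls from 0.35 to 0.2 although neither class-specific rate has changed; comparing rates within animacy classes, rather than raw totals, is one simple way to keep habitat effects apart from grammar change.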


Journal Article
TL;DR: The article describes a research environment that aims to make large-scale annotation of, and research into, speech acts and other linguistic levels possible in an efficient manner, and uses a brief case study to show how the resulting annotations improve on existing models.
Abstract: Corpus-based research into pragmatics is suffering from a distinct lack of suitably annotated corpora. This problem has so far generally forced researchers in corpus-based pragmatics to focus on well-known fixed expressions (e.g. discourse markers, politeness formulae, etc.) in their research, rather than being able to investigate interaction on the level of speech acts and other pragmatics-relevant features on a larger scale. This article describes a research environment that aims to remedy this problem (currently for English only) by making large-scale annotation of, and research into, speech acts and other linguistic levels possible in an efficient manner, while also discussing the difficulties and complexities inherent in such an endeavour. It then uses a brief case study to illustrate the efficiency of the approach and to show how the resulting annotations improve on existing models. The case study includes an illustrative discussion of the tool's performance in annotating a subset of 100 files from the Switchboard corpus, plus a more detailed comparison of the automatically annotated version of one of the files with its original, manually annotated version.

23 citations


Journal Article
TL;DR: In this article, a combination of symmetric and asymmetric association measures is used to study the productivity of A as NP, which Kay (2013) classifies not as a construction but merely as a pattern of coining because of its limited type productivity; some partially filled subschemas of the pattern turn out to be arguably productive.
Abstract: Non-redundant taxonomic models of construction grammar posit that only fully productive patterns qualify as constructions because they license an infinity of expressions. Redundant models claim that, despite subregularities and exceptions, partially productive patterns also count as constructions, provided that the overall meanings of such patterns are not the strict sums of their parts. Because productivity is a major bone of contention between redundant and non-redundant construction grammar taxonomies, I examine the productivity of A as NP, which, according to Kay (2013), is not a 'construction' but merely a 'pattern of coining' due to its limited type productivity. Expanding on Gries (2013), this paper explores how a combination of symmetric and asymmetric association measures can contribute to the study of the 'Productivity Complex' described in Zeldes (2012). Although the productivity of A as NP is admittedly limited at its most schematic level, some partially filled subschemas such as white/black as NP or A as hell/death are arguably productive.

21 citations
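
To illustrate the difference between symmetric and asymmetric association, the sketch below computes ΔP in both directions (a directional measure) and pointwise mutual information (a symmetric one) from a 2x2 co-occurrence table. The choice of these two measures, the function name, and the counts are assumptions for illustration; the paper's own combination of measures may differ.

```python
import math


def association_measures(a, b, c, d):
    """Association between a slot filler X and a pattern P from a 2x2 table.

    a: X inside P                  b: X outside P
    c: other fillers inside P      d: other fillers outside P
    """
    n = a + b + c + d
    return {
        # Asymmetric (directional): Delta P = P(outcome | cue) - P(outcome | no cue).
        "delta_p_pattern_given_word": a / (a + b) - c / (c + d),
        "delta_p_word_given_pattern": a / (a + c) - b / (b + d),
        # Symmetric: pointwise mutual information, identical in both directions.
        "pmi": math.log2(a * n / ((a + b) * (a + c))),
    }


# Hypothetical counts for one filler (e.g. the adjective "white") and A as NP.
for name, value in association_measures(20, 30, 80, 870).items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the filler predicts the pattern (ΔP ≈ 0.32) more strongly than the pattern predicts the filler (ΔP ≈ 0.17), a directional difference that the single symmetric PMI value (2.0) cannot express.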


Journal Article
TL;DR: This paper presents ongoing work on Säily and Suomela’s (2009) method of comparing type frequencies across subcorpora and raises methodological issues involving periodization, multiple hypothesis testing, and the need for an interactive tool.
Abstract: This paper presents ongoing work on Säily and Suomela’s (2009) method of comparing type frequencies across subcorpora. The method is used here to study variation in the productivity of the suffixes -ness and -ity in the eighteenth-century sections of the Corpora of Early English Correspondence and of the Old Bailey Corpus (OBC). Unlike the OBC, the eighteenth-century section of the letter corpora differs from previously studied materials in that there is no significant gender difference in the productivity of -ity. The study raises methodological issues involving periodization, multiple hypothesis testing, and the need for an interactive tool. Several improvements have been implemented in a new version of our software.

15 citations
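
Säily and Suomela's (2009) approach, as we understand it, compares the number of types in a subcorpus with type-accumulation curves obtained by randomly permuting the samples (texts) of the whole corpus. The following is a simplified Python sketch of that idea, not the authors' software: the function names, the add-one correction, and the data are assumptions.

```python
import random


def types_at_token_size(samples, target_tokens):
    """Accumulate samples in the given order until at least target_tokens
    tokens have been seen; return the number of distinct types at that point."""
    seen = set()
    tokens = 0
    for sample in samples:
        if tokens >= target_tokens:
            break
        seen.update(sample)
        tokens += len(sample)
    return len(seen)


def type_count_test(all_samples, subcorpus, n_perm=1000, seed=0):
    """Compare a subcorpus's type count with random accumulations of samples
    drawn from the whole corpus at a comparable token size.

    Each sample is a list of tokens (e.g. the -ity derivatives in one text).
    Returns (p_low, p_high): estimated probabilities that a random accumulation
    has at most / at least as many types as the subcorpus.
    """
    rng = random.Random(seed)
    observed_types = len({t for s in subcorpus for t in s})
    observed_tokens = sum(len(s) for s in subcorpus)
    at_most = at_least = 0
    for _ in range(n_perm):
        order = rng.sample(all_samples, len(all_samples))  # random permutation
        k = types_at_token_size(order, observed_tokens)
        at_most += int(k <= observed_types)
        at_least += int(k >= observed_types)
    # Add-one correction keeps the estimated p-values above zero.
    return (at_most + 1) / (n_perm + 1), (at_least + 1) / (n_perm + 1)


# Hypothetical data: five texts, each with a different -ity type, versus
# twenty texts that all use the same type.
subcorpus = [["ability"], ["purity"], ["vanity"], ["curiosity"], ["necessity"]]
rest = [["authority"] for _ in range(20)]
p_low, p_high = type_count_test(subcorpus + rest, subcorpus)
print(p_low, p_high)
```

Here p_high is small: random accumulations of the same token size rarely reach as many types as the subcorpus, which is the kind of result read as higher productivity. Running one such test per subcorpus and period is also where the multiple-testing issue raised in the abstract comes in.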


Journal Article
TL;DR: In this paper, the authors trace the development and distribution of dummy it with the aim of shedding new light on the role of transitivity in language change and show that dummy it modulates transitivity according to the changing entrenchment of the verb it is used with.
Abstract: While dummy it in subject position (It is raining) has received much scholarly attention, its use in object position has rarely been investigated. When considering examples such as to leg it and to snuff it, we are faced with the question of what motivates the occurrence of dummy it. Using corpus-based techniques that permit context-sensitive retrieval in historical data, the present paper traces the development and distribution of dummy it with the aim of shedding new light on the role of transitivity in language change. The occurrence of non-referential it can be related to a series of (de-)transitivization processes that have recently been observed for weakly entrenched verbs or verb senses. Defining transitivity with Hopper and Thompson (1980: 251) as the effectiveness with which an action takes place, it is argued that one function of it is to enhance a verb’s transitivity by equipping it with a pseudo-object. Such moderately transitive uses have also been observed for other verbs that do not normally take direct objects, e.g. verbs occurring with cognate objects, way-constructions or reflexive structures. This article presents corpus-based findings showing that dummy it modulates transitivity in accordance with the changing entrenchment of the verb it is used with.

12 citations


Journal Article
TL;DR: The authors examined the diachronic development of the that/zero alternation with three verbs of cognition, viz. think, believe, and suppose by means of a stepwise logistic regression analysis.
Abstract: This corpus-based study examines the diachronic development of the that/zero alternation with three verbs of cognition, viz. think, believe, and suppose, by means of a stepwise logistic regression analysis. The data comprised a total of 18,570 tokens (9,720 think, 4,767 believe, and 4,083 suppose) from both spoken and written corpora from 1560 to 2012. We test the effect of 11 structural features that have been claimed to predict the presence of the zero complementizer form. Taking our cue from previous research suggesting that there has been a diachronic increase in zero use and applying a rigorous quantitative method to a large set of diachronic data, we examine (i) whether there is indeed a diachronic trend toward more zero use, (ii) whether the conditioning factors proposed in the literature indeed predict the zero form, (iii) to what extent these factors interact, and (iv) whether the predictive power of the conditioning factors becomes stronger or weaker over time. The analysis shows that, contrary to the aforementioned belief that the zero form has been on the increase, there is in fact a steady decrease in zero use. The extent of this decrease differs across verbs. Also, the analysis of interactions with verb type indicates differences between verbs in terms of the predictive power of the conditioning factors. Additional significant interactions emerged, notably with verb, mode (i.e., spoken or written data), and period. The interactions with period show that certain factors that are good predictors of the zero form overall lose predictive power over time.

10 citations


Journal Article
TL;DR: It is shown that significant benefits can be gained from the integration of corpus linguistics and grammaticalization theory, two subfields which, despite sharing considerable common ground, tended to remain as separate areas of linguistic analysis until quite recently.
Abstract: The article shows that significant benefits can be gained from the integration of corpus linguistics and grammaticalization theory, two subfields which, despite sharing considerable common ground, tended to remain as separate areas of linguistic analysis until quite recently. Making use of diachronic and contemporary corpora of English, such as the Helsinki Corpus, ARCHER, and COCA, the article illustrates how standard corpus practices can indeed contribute to our understanding of grammaticalization and related processes of language change. The selected case studies deal with the origin of existential there, the development of like-parentheticals in contemporary American English, and the history of the marker of expository apposition namely.

7 citations


Journal Article
TL;DR: A large-scale multivariate statistical analysis of the choice of subject expression in the 1st person singular in spontaneous Finnish conversation, with a focus on the choice between pronominal and zero subject indicates that the choices are affected by both constructional and cognitive/discourse factors.
Abstract: The variability of subject expression has been extensively investigated across languages. We present a large-scale multivariate statistical analysis of the choice of subject expression in the 1st person singular in spontaneous Finnish conversation, with a focus on the choice between pronominal and zero subject. Spoken Finnish represents an interesting case, as the dominant type of subject expression is double marking, i.e. the combination of a pronominal subject marker (subject pronoun) and a verbal subject marker (person marking). Siewierska (1999, From anaphoric pronoun to grammatical agreement marker: Why objects don’t make it. Folia Linguistica 33(2). 225–251) notes that this type of marking is typologically rare. Our findings indicate that the choice of subject expression is affected by both constructional and cognitive/discourse factors, and that an important role in the choice of subject expression is played by the sequential structure of the conversation.

5 citations


Journal Article
TL;DR: In an alternative corpus analysis, it has been found that, when these and other shortcomings in their research are dealt with, morphological irregularity and frequency are indeed strongly correlated variables also in Spanish.
Abstract: Fratini et al. (Fratini et al. 2014, Frequency and morphological irregularity are independent variables. Evidence from a corpus study of Spanish verbs. Corpus Linguistics and Linguistic Theory 10[2]. 289–314) concluded that in Spanish, unlike in English, frequency and morphological irregularity are independent variables. In this paper, I take issue with that claim. On the one hand, it is argued that the borders between regularity and irregularity are diffuse, so many of the verbs they classify as irregular might not be irregular after all. On the other hand, the choice of lexemes they analyzed was far from adequate. Their set of irregular verbs contained many verbs formed by adding a prefix to a more frequent irregular verb (e.g. a-venir, a-tener, con-decir, con-mover, etc.) and many highly infrequent lexemes in general, barely in use in the speech community (e.g. abnegar, ablandecer, amoblar, amodorrecer, etc.). In an alternative corpus analysis, it has been found that, when these and other shortcomings in their research are dealt with, morphological irregularity and frequency are indeed strongly correlated variables in Spanish as well.

5 citations


Journal Article
TL;DR: Findings show that the more verbal the head is, the more likely the structure of the phrase is governed by specifically the principle of complements-first, and this claim has consequences for considerations of prototypicality affecting verbal and nominal heads.
Abstract: This paper examines the design of verb phrases and noun phrases, focusing on the diachronic tendencies observed in Middle English, Early Modern English, and Late Modern English data. The approach is corpus-based, and the data, representing different periods and text types, are taken from the Penn-Helsinki Parsed Corpus of Middle English, the Penn-Helsinki Parsed Corpus of Early Modern English, and the Penn Parsed Corpus of Modern British English. The aim of this study is to look at the consequences that the placement of adjuncts (or modifiers) and complements has for the parsing of the phrases in which they occur. First, I will examine whether the historical English data are in keeping with two determinants of word order: complements-first (complement plus adjunct) and end-weight. Second, I will consider the connection between the type of head and the distribution of its adjuncts and complements in noun phrases and verb phrases. My findings show that the more verbal the head is, the more likely it is that the structure of the phrase is governed specifically by the principle of complements-first. On theoretical grounds, this claim has consequences for considerations of prototypicality affecting verbal and nominal heads. Third, I will show that a significant increase in complements-first phrases takes place once word order has become fixed in the language, which is in keeping with the process of syntacticization of English word order.

4 citations


Journal Article
TL;DR: This introduction to a special issue presents six contributions that showcase different corpus-based approaches to the study of historical developments in English; viewed in their mutual contexts, the papers serve to illuminate the unifying question given in the title of the introduction.
Abstract: This special issue brings together six contributions that showcase different corpus-based approaches to the study of historical developments in English. Each of the studies offers new empirical results on a given phenomenon of language change, but when viewed in their mutual contexts, the papers serve to illuminate the unifying question that is given in the title of this introduction. It is clear that during recent years, both corpus-linguistic resources and analytical techniques have been evolving at a remarkable rate. What is perhaps less clear is how the use of new resources and the application of new techniques can be put into the service of transforming our knowledge of how the English language changes. Beyond giving us more depth and precision, what do larger corpora and more sophisticated methodologies bring to the table in terms of description and theory? Despite all innovations, it is important to remember that the current developments in English historical corpus linguistics form part of a tradition that has been ongoing for some time, and that owes much to the creation of the Helsinki corpus (Kytö 1991), and also to the long and fruitful connection between corpus linguistics and grammaticalization studies (Lindquist and Mair 2004). More and more diachronic resources have become available in the meantime, among them ARCHER (Biber et al. 1994), the Penn Parsed Corpora (Kroch et al. 1997), the

Journal Article
TL;DR: Based on a series of corpus investigations using techniques from statistical natural language processing, the authors argue that the grammatical properties of Welsh grammatical gender are such that its unusual statistical properties follow from them.
Abstract: Welsh grammatical gender exhibits several unusual properties. This paper argues that these properties are necessarily connected. The argument is based on a series of corpus investigations using techniques from statistical natural language processing, in particular distinguishing properties that exhibit significant statistical patterns from those that can be used to make usable predictions. Specifically, it is shown that the grammatical properties of Welsh gender are such that its unusual statistical properties follow.

Journal Article
TL;DR: In line with the hypothesis, inalienability had a weaker effect on the choice of construction in younger than in older bloggers, which suggests that the change is best viewed as semantic bleaching of PD rather than as a process in which PD is gaining ground at the expense of OP.
Abstract: Hebrew has two constructions that are used to convey possessive relations: ordinary possession (OP) and possessive dative (PD). PD is most often used when the possessor is perceived as affected by the action or state described in the sentence. This study investigates the possibility that this tendency is gradually diminishing; in other words, that unaffected possessors in PD are in the process of becoming more acceptable. This hypothesis was evaluated in a blog corpus study, which focused on a central correlate of possessor affectedness: whether or not the possessed object was a body part (inalienability). In line with the hypothesis, inalienability had a weaker effect on the choice of construction in younger than in older bloggers. The overall proportion of PD constructions was similar across age groups. This suggests that the change is best viewed as semantic bleaching of PD rather than as a process in which PD is gaining ground at the expense of OP.
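
The reported pattern (a weaker inalienability effect among younger bloggers, with a similar overall PD proportion) can be illustrated as a difference of proportions. This is a minimal Python sketch with invented counts; the abstract does not say which statistical model the study used, and the function names are hypothetical.

```python
def pd_rate(pd_count, op_count):
    """Share of possessive dative (PD) among all PD + ordinary possession (OP) tokens."""
    return pd_count / (pd_count + op_count)


def inalienability_effect(counts):
    """PD rate with body-part possessees minus PD rate with other possessees.

    counts maps "body_part" and "other" to (pd_count, op_count) pairs.
    A smaller value means inalienability matters less for the choice.
    """
    return pd_rate(*counts["body_part"]) - pd_rate(*counts["other"])


def overall_pd_rate(counts):
    """PD rate pooled over both possessee types."""
    pd_total = sum(p for p, _ in counts.values())
    total = sum(p + o for p, o in counts.values())
    return pd_total / total


# Invented (PD, OP) counts per age group, for illustration only.
older = {"body_part": (60, 40), "other": (20, 80)}
younger = {"body_part": (45, 55), "other": (35, 65)}

for label, counts in [("older", older), ("younger", younger)]:
    print(label, round(inalienability_effect(counts), 2), round(overall_pd_rate(counts), 2))
```

In this toy data the inalienability effect shrinks from 0.4 to 0.1 while the overall PD rate stays at 0.4 in both groups, which is the configuration the authors interpret as semantic bleaching rather than PD gaining ground.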

Journal Article
TL;DR: This paper examined twenty coercing verbs in Chinese, creating a coercion profile for each verb and conducting a cluster analysis based on the coercion profiles, finding that semantically related verbs tend to have similar coercion profiles.
Abstract: While much attention has been paid to the complement coercion operation in English (e.g., began a book), the same phenomenon in Chinese is still under-researched. Our study examines twenty coercing verbs in Chinese, creating a coercion profile for each verb and conducting a cluster analysis based on the coercion profiles. The results suggest that semantically related verbs in Chinese tend to have similar coercion profiles. We also identify a diverse range of nouns that can be coerced in Chinese. Finally, it is demonstrated that generative approaches to the complement coercion operation in Chinese can be complemented by cognitive-functional approaches.
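
The abstract does not specify the clustering procedure. As one plausible illustration, the sketch below applies agglomerative clustering with average linkage and Euclidean distance to coercion profiles, here assumed to be vectors of proportions of coerced nouns in a few semantic classes. Verb labels, classes, and numbers are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    profiles maps a verb label to its coercion profile (a vector of proportions).
    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical profiles: share of coerced nouns denoting (artifacts, food, documents).
profiles = {
    "verb_A": (0.70, 0.20, 0.10),
    "verb_B": (0.65, 0.25, 0.10),
    "verb_C": (0.10, 0.20, 0.70),
    "verb_D": (0.15, 0.15, 0.70),
}
print(average_linkage(profiles, 2))
```

With these invented profiles, verb_A groups with verb_B and verb_C with verb_D. In practice a library routine such as SciPy's hierarchical clustering would normally replace this hand-rolled loop, which is quadratic in the number of clusters per step but adequate for twenty verbs.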