Journal · ISSN: 0268-1145

Literary and Linguistic Computing 

Oxford University Press
About: Literary and Linguistic Computing is an academic journal published by Oxford University Press. It publishes mainly in the areas of corpus linguistics and stylometry. Its ISSN is 0268-1145. Over its lifetime, the journal has published 854 papers, which have received 21,124 citations. The journal is also known as Literary and linguistic computing and Literary & linguistic computing.


Papers
Journal Article
TL;DR: At its best (in my opinion mainly the first four chapters), it provides a broad overview of humanities computing and literature in that field and at its worst, it threatens to prove that a reader does not necessarily need electronic texts to get lost in his/her quest for relevant information.
Abstract: methodologies, advances and insights in the different research areas that are discussed. However, some critical notes can be made about the organization of the materials. Of course, each of the subjects addressed fills whole bookshelves of monographs (and many of those are mentioned and listed in the bibliography). Yet, the limited space in some places downgrades the abundant amount of research reviews to a shallowly annotated list of references (for example, chapter 5 mentions about thirty studies in only eighteen pages). The lack of graphical materials (actually, chapter 4 is the only one containing figures, in the form of tables) is sometimes annoying, especially in descriptions of electronic products and their interfaces. Also, the combination of a loose structure and interdisciplinary approach is not always successful. Often Hockey hooks her narrative to methodological pegs, other times the reader has to notice a transition to a more thematic organization. This also causes some methodological-theoretical issues, like for example corpus design, to reappear in different places throughout the book, whereas it would perhaps be more suitable to treat them in a less fragmented manner. The boundaries between the different research areas in the chapters are not always very clear either. Some topics in the fifth chapter on literary analysis (echoed phrases, genre, gender analysis) and the sixth chapter on linguistic analysis (analysis of lexical features to characterize textual styles) seem to overlap with the seventh chapter on stylometry and attribution studies (in which only the latter subject is highlighted, however). This may result from an imbalance in assumed background knowledge about the techn(olog)ical aspects of electronic texts on the one hand, and the different research areas that make use of them on the other hand. Sometimes the narrative prevails over clear structuring of and motivation for the items covered. 
Hockey very rigidly introduces the technological concepts in a very comprehensible manner, yet neglects to introduce the chapters discussing their application in the different research fields with a clear theoretical definition of those fields. This leads me to a (maybe provoking, but hopefully balanced) conclusion about the book. At its best (in my opinion mainly the first four chapters), it provides a broad overview of humanities computing and literature in that field. At its worst (chapters 5 to 9), it threatens to prove that a reader does not necessarily need electronic texts to get lost in his/her quest for relevant information.

1,042 citations

Journal Article
TL;DR: The paper distinguishes among various ways that linguistic features can be distributed within and across texts; it analyzes the distributions of several particular features, and it discusses the implications of these distributions for corpus design.
Abstract: The present paper addresses a number of issues related to achieving ‘representativeness’ in linguistic corpus design, including: discussion of what it means to ‘represent’ a language, definition of the target population, stratified versus proportional sampling of a language, sampling within texts, and issues relating to the required sample size (number of texts) of a corpus. The paper distinguishes among various ways that linguistic features can be distributed within and across texts; it analyzes the distributions of several particular features, and it discusses the implications of these distributions for corpus design.

864 citations

Journal Article

764 citations

Journal Article
TL;DR: It is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy.
Abstract: The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98 per cent accuracy.

667 citations

Journal Article
TL;DR: A new way of using the relative frequencies of the very common words for comparing written texts and testing their likely authorship, which offers a simple but comparatively accurate addition to current methods of distinguishing the most likely author of texts exceeding about 1,500 words in length.
Abstract: This paper is a companion to my 'Questions of authorship: attribution and beyond', in which I sketched a new way of using the relative frequencies of the very common words for comparing written texts and testing their likely authorship. The main emphasis of that paper was not on the new procedure but on the broader consequences of our increasing sophistication in making such comparisons and the increasing (although never absolute) reliability of our inferences about authorship. My present objects, accordingly, are to give a more complete account of the procedure itself; to report the outcome of an extensive set of trials; and to consider the strengths and limitations of the new procedure. The procedure offers a simple but comparatively accurate addition to our current methods of distinguishing the most likely author of texts exceeding about 1,500 words in length. It is of even greater value as a method of reducing the field of likely candidates for texts of as little as 100 words in length. Not unexpectedly, it works least well with texts of a genre uncharacteristic of their author and, in one case, with texts far separated in time across a long literary career. Its possible use for other classificatory tasks has not yet been investigated.

457 citations

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2022    1
2015    2
2014    44
2013    60
2012    39
2011    37