Journal•ISSN: 0010-4817

Computers and The Humanities

Springer Nature

About: Computers and The Humanities is an academic journal. It publishes mainly in the areas of computational linguistics and natural language. Its ISSN is 0010-4817. Over its lifetime, the journal has published 894 publications, which have received 13691 citations.

Topics: Computational linguistics, Natural language, Higher education, Word processing, Literary criticism

Papers

Journal Article•DOI•

GATE, a General Architecture for Text Engineering

Hamish Cunningham¹•Institutions (1)

University of Sheffield¹

01 May 2002-Computers and The Humanities

TL;DR: GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.

Abstract: This paper presents the design, implementation and evaluation of GATE, a General Architecture for Text Engineering. GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.

634 citations

Journal Article•DOI•

A method for disambiguating word senses in a large corpus

William A. Gale, Kenneth Church, David Yarowsky

01 Dec 1992-Computers and The Humanities

TL;DR: The proposed method was designed to disambiguate senses that are usually associated with different topics using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.

Abstract: Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Both quantitative and qualitative methods have been tried, but much of this work has been stymied by difficulties in acquiring appropriate lexical resources. The availability of this testing and training material has enabled us to develop quantitative disambiguation methods that achieve 92% accuracy in discriminating between two very distinct senses of a noun. In the training phase, we collect a number of instances of each sense of the polysemous noun. Then in the testing phase, we are given a new instance of the noun, and are asked to assign the instance to one of the senses. We attempt to answer this question by comparing the context of the unknown instance with contexts of known instances using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval. The proposed method is probably most appropriate for those aspects of sense disambiguation that are closest to the information retrieval task. In particular, the proposed method was designed to disambiguate senses that are usually associated with different topics.

614 citations
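
The training and testing phases described in the abstract above can be illustrated with a small naive Bayes classifier over context words. This is only a minimal sketch under stated assumptions: the add-one smoothing, the function names `train` and `classify`, and the toy "bank" contexts are all hypothetical choices made for illustration, not the estimation procedure or data used in the paper.

```python
import math
from collections import Counter

def train(labeled_contexts):
    """Training phase: count context words for each sense.

    labeled_contexts is an iterable of (sense, list_of_context_words).
    """
    sense_counts = Counter()   # number of training instances per sense
    word_counts = {}           # sense -> Counter of context words
    vocab = set()
    for sense, words in labeled_contexts:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(context, model):
    """Testing phase: pick the sense with the highest log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        # Add-one smoothing (an assumption of this sketch).
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy contexts for two senses of the noun "bank".
training = [
    ("money", ["deposit", "loan", "interest", "account"]),
    ("money", ["loan", "cash", "account"]),
    ("river", ["water", "shore", "fishing"]),
    ("river", ["water", "mud", "shore"]),
]
model = train(training)
print(classify(["loan", "interest"], model))
print(classify(["water", "shore"], model))
```

Because the score is driven by which words tend to co-occur with each sense, a sketch like this separates senses best when they belong to different topics, which matches the scope the abstract claims for the method.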

Journal Article•DOI•

"I don't believe in word senses"*

Adam Kilgarriff¹•Institutions (1)

University of Brighton¹

23 Dec 1997-Computers and The Humanities

TL;DR: In this paper, an analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice, where the corpus citations are the basic objects in the ontology.

Abstract: Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The first part of the paper looks at problems with existing accounts of ‘word sense’ and describes the various kinds of ways in which a word's meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist. Word sense disambiguation also needs a set of word senses to disambiguate between. In most recent work, the set has been taken from a general-purpose lexical resource, with the assumption that the lexical resource describes the word senses of English/French/..., between which NLP applications will need to disambiguate. The implication of the first part of the paper is, by contrast, that word senses exist only relative to a task. The final part of the paper pursues this, exploring, by means of a survey, whether and how word sense ambiguity is in fact a problem for current NLP applications.

419 citations

Journal Article•DOI•

How variable may a constant be? Measures of lexical richness in perspective

Fiona J. Tweedie¹, R. Harald Baayen²•Institutions (2)

University of Glasgow¹, Max Planck Society²

01 Sep 1998-Computers and The Humanities

TL;DR: The results suggest that the empirical trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.

Abstract: A well-known problem in the domain of quantitative linguistics and stylistics concerns the evaluation of the lexical richness of texts. Since the most obvious measure of lexical richness, the vocabulary size (the number of different word types), depends heavily on the text length (measured in word tokens), a variety of alternative measures has been proposed which are claimed to be independent of the text length. This paper has a threefold aim. Firstly, we have investigated to what extent these alternative measures are truly textual constants. We have observed that in practice all measures vary substantially and systematically with the text length. We also show that in theory, only three of these measures are truly constant or nearly constant. Secondly, we have studied the extent to which these measures tap into different aspects of lexical structure. We have found that there are two main families of constants, one measuring lexical richness and one measuring lexical repetition. Thirdly, we have considered to what extent these measures can be used to investigate questions of textual similarity between and within authors. We propose to carry out such comparisons by means of the empirical trajectories of texts in the plane spanned by the dimensions of lexical richness and lexical repetition, and we provide a statistical technique for constructing confidence intervals around the empirical trajectories of texts. Our results suggest that the trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.

391 citations
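
The dependence of vocabulary size on text length noted in the abstract above can be seen on even a tiny text. The sketch below tracks the number of word types and the type-token ratio over growing prefixes of a token list. The type-token ratio is used here only as a familiar example of a richness measure; the abstract does not name the measures it studies, and the function names and sample sentence are hypothetical.

```python
def type_token_ratio(tokens):
    """Number of distinct word types divided by number of word tokens."""
    return len(set(tokens)) / len(tokens)

def richness_trajectory(tokens, step):
    """Return (text length, vocabulary size, type-token ratio) for
    prefixes of the text that grow by `step` tokens at a time."""
    points = []
    for n in range(step, len(tokens) + 1, step):
        prefix = tokens[:n]
        points.append((n, len(set(prefix)), type_token_ratio(prefix)))
    return points

# Hypothetical sample text, chosen only for illustration.
tokens = "the cat sat on the mat and the dog sat on the log".split()
trajectory = richness_trajectory(tokens, step=4)
for n, types, ttr in trajectory:
    print(n, types, round(ttr, 3))
```

On this sample the vocabulary size keeps growing while the ratio falls as the prefix gets longer, so neither number can be compared across texts of different lengths without some correction.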

Journal Article•DOI•

C-rater: Automated Scoring of Short-Answer Questions

Claudia Leacock¹, Martin Chodorow²•Institutions (2)

Princeton University¹, City University of New York²

01 Nov 2003-Computers and The Humanities

TL;DR: C-rater is an automated scoring engine that has been developed to score responses to content-based short answer questions using predicate argument structure, pronominal reference, morphological analysis and synonyms to assign full or partial credit to a short answer question.

Abstract: C-rater is an automated scoring engine that has been developed to score responses to content-based short answer questions. It is not simply a string matching program – instead it uses predicate argument structure, pronominal reference, morphological analysis and synonyms to assign full or partial credit to a short answer question. C-rater has been used in two studies: National Assessment for Educational Progress (NAEP) and a statewide assessment in Indiana. In both studies, c-rater agreed with human graders about 84% of the time.

363 citations

Network Information

Related Journals (5)

Computational Linguistics

1.4K papers, 154.8K citations

5.9K papers, 268.3K citations

78% related

Information Processing and Management

3.8K papers, 151.6K citations

77% related

arXiv: Computation and Language

24.8K papers, 481.5K citations

3.5K papers, 217.5K citations

73% related

Performance

Metrics

Papers: 894

Citations: 14,141

No. of papers from the Journal in previous years
Year	Papers
2004	23
2003	32
2002	25
2001	24
2000	38
1999	28