Journal ISSN: 0891-2017

Computational Linguistics 

Association for Computational Linguistics
About: Computational Linguistics is an academic journal published by the Association for Computational Linguistics. The journal publishes mainly in the areas of parsing and machine translation. Its ISSN is 0891-2017, and it is open access. Over its lifetime, 1,464 publications have been published, receiving 154,866 citations.


Papers
Report (DOI)
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Abstract: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, with over 3 million words of that material assigned skeletal grammatical structure. This material now includes a fully hand-parsed version of the classic Brown corpus. About one half of the papers at the ACL Workshop on Using Large Text Corpora this past summer were based on the materials generated by this grant.

8,377 citations
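The annotated material described above is the kind of resource now redistributed with standard NLP toolkits. As a minimal sketch, the snippet below uses NLTK (an assumption for illustration, not the original CD-ROM distribution) to read the POS-tagged Brown corpus and a freely available sample of the hand-parsed Treebank material.

```python
# A minimal sketch of reading the kind of annotated material this grant
# produced: word/POS pairs and skeletal parses. NLTK redistributes the
# tagged Brown corpus and a small sample of the Penn Treebank parses.
# Requires: pip install nltk, then nltk.download('brown') and
# nltk.download('treebank').
from nltk.corpus import brown, treebank

print(brown.tagged_words()[:10])   # e.g. [('The', 'AT'), ('Fulton', 'NP-TL'), ...]
print(treebank.parsed_sents()[0])  # a hand-assigned skeletal parse tree
```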

Journal Article
TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.
Abstract: We describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another. We define a concept of word-by-word alignment between such pairs of sentences. For any given pair of such sentences each of our models assigns a probability to each of the possible word-by-word alignments. We give an algorithm for seeking the most probable of these alignments. Although the algorithm is suboptimal, the alignment thus obtained accounts well for the word-by-word relationships in the pair of sentences. We have a great deal of data in French and English from the proceedings of the Canadian Parliament. Accordingly, we have restricted our work to these two languages; but we feel that because our algorithms have minimal linguistic content they would work well on other pairs of languages. We also feel, again because of the minimal linguistic content of our algorithms, that it is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus.

4,693 citations
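The simplest of the five models (Model 1) already illustrates the estimation scheme: translation probabilities t(f|e) are fit by expectation-maximization over sentence pairs. The sketch below is a toy implementation under that reading, on a hypothetical two-pair corpus; the full models add alignment, fertility, and distortion parameters that are omitted here.

```python
# Toy EM estimation of Model 1 translation probabilities t(f|e) from
# sentence pairs. The two-pair corpus is hypothetical; no NULL word,
# no alignment/fertility/distortion parameters.
from collections import defaultdict

pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"], ["le", "livre"])]

t = defaultdict(lambda: 0.25)  # uniform initialization of t(f|e)

for _ in range(10):  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in pairs:
        for f in f_sent:
            # E-step: distribute the count for f over its possible sources.
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("maison", "house")], 3))  # converges toward 1.0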

Journal Article (DOI)
TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Abstract: We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.

4,402 citations
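The symmetrization step the abstract mentions combines two directed alignments, one trained in each translation direction. Below is a minimal sketch of the basic combination operators, on hypothetical alignment links rather than the output of the paper's trained models.

```python
# Symmetrizing two directed word alignments by intersection and union.
# Alignments are sets of (source_index, target_index) links; the links
# here are hypothetical examples.
src_to_tgt = {(0, 0), (1, 2), (2, 1)}   # links from the e->f model
tgt_to_src = {(0, 0), (1, 2), (3, 3)}   # links from the f->e model

intersection = src_to_tgt & tgt_to_src  # high precision
union = src_to_tgt | tgt_to_src         # high recall

print(sorted(intersection))  # [(0, 0), (1, 2)]
print(sorted(union))
```

The paper's refined heuristics start from the high-precision intersection and selectively add links from the union to recover recall.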

Journal Article (DOI)
TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Abstract: The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure based on the information theoretic notion of mutual information, for estimating word association norms from computer readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.

4,272 citations
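The association ratio is pointwise mutual information estimated from corpus counts: the log (base 2) of the joint probability of seeing the two words together within a window, over the product of their marginal probabilities. A toy sketch with hypothetical counts:

```python
# Association ratio (pointwise mutual information) from corpus counts:
# log2( P(x, y) / (P(x) * P(y)) ). All counts below are hypothetical.
import math

N = 1_000_000   # corpus size in words (assumed)
c_doctor = 120  # frequency of "doctor"
c_nurse = 80    # frequency of "nurse"
c_pair = 20     # co-occurrences within a 5-word window (assumed)

p_x = c_doctor / N
p_y = c_nurse / N
p_xy = c_pair / N

pmi = math.log2(p_xy / (p_x * p_y))
print(round(pmi, 2))  # large positive value indicates strong association
```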

Journal Article (DOI)
TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented and how to implement this approach efficiently is described, using as examples several problems in natural language processing.
Abstract: The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.

3,392 citations
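A conditional maximum entropy model has the log-linear form p(y|x) proportional to exp(sum_i w_i f_i(x, y)), and maximum-likelihood training drives the model's expected feature counts toward the observed ones. The paper presents efficient algorithms in the iterative-scaling family; the toy sketch below uses plain gradient ascent instead, which optimizes the same likelihood, over hypothetical suffix features.

```python
# Toy maximum-likelihood training of a conditional maxent model
# p(y|x) ~ exp(sum_i w_i f_i(x, y)). Gradient of the log-likelihood:
# observed feature counts minus model-expected counts. The paper uses
# iterative scaling; plain gradient ascent is used here for brevity.
# Features and data are hypothetical.
import math

labels = ["NOUN", "VERB"]
data = [({"suffix=ing": 1.0}, "VERB"),
        ({"suffix=tion": 1.0}, "NOUN"),
        ({"suffix=ing": 1.0}, "VERB")]

w = {}  # feature weights, keyed by (feature, label)

def p_y_given_x(feats, y):
    scores = {lab: sum(w.get((f, lab), 0.0) * v for f, v in feats.items())
              for lab in labels}
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[y]) / z

for _ in range(100):  # gradient ascent, learning rate 0.1
    grad = {}
    for feats, gold in data:
        for f, v in feats.items():
            grad[(f, gold)] = grad.get((f, gold), 0.0) + v      # observed
            for lab in labels:                                  # expected
                grad[(f, lab)] = grad.get((f, lab), 0.0) - p_y_given_x(feats, lab) * v
    for k, g in grad.items():
        w[k] = w.get(k, 0.0) + 0.1 * g

print(round(p_y_given_x({"suffix=ing": 1.0}, "VERB"), 3))  # well above 0.9
```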

Performance Metrics

Number of papers from the journal in previous years:

Year    Papers
2023    20
2022    50
2021    25
2020    25
2019    20
2018    28