Automatic evaluation of summaries using N-gram co-occurrence statistics
Chin-Yew Lin, Eduard Hovy
- pp 71-78
Abstract:
Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
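To make the idea of n-gram co-occurrence between summary pairs concrete, here is a minimal Python sketch of a recall-style score: the fraction of reference n-grams that also appear in a candidate summary, with counts clipped so a repeated word is not credited more often than it occurs in the candidate. This is an illustration only, not the exact scoring procedure of the paper. The function names `ngrams` and `ngram_recall`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, references, n=1):
    """Share of reference n-grams that co-occur in the candidate.

    Assumption: naive lowercased whitespace tokenization; a real
    evaluation would likely add stemming or stopword handling.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each match at the number of times the candidate has the n-gram.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat was on the mat"]
    print(ngram_recall(candidate, references, n=1))  # 5 of 6 reference unigrams match
    print(ngram_recall(candidate, references, n=2))  # 3 of 5 reference bigrams match
```

With `n=1` this is the unigram co-occurrence case highlighted in the abstract; larger `n` rewards matching word order as well as word choice.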
Citations
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different ROUGE measures are introduced, ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
Proceedings Article
TextRank: Bringing Order into Text
Rada Mihalcea, Paul Tarau
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
Journal Article
LexRank: graph-based lexical centrality as salience in text summarization
Gunes Erkan, Dragomir R. Radev
TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing (NLP), based on the concept of eigenvector centrality.
Journal Article
Multimodal Machine Learning: A Survey and Taxonomy
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Journal Article
Inter-coder agreement for computational linguistics
TL;DR: It is argued that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks—but that their use makes the interpretation of the value of the coefficient even harder.
References
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Journal Article
An algorithm for suffix stripping
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Proceedings Article
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
TL;DR: DARPA commissioned NIST to develop an MT evaluation facility based on the IBM work; the resulting utility is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
Book
Evaluating Natural Language Processing Systems: An Analysis and Review
TL;DR: This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems, and provides a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations.
Proceedings Article
Tracking and summarizing news on a daily basis with Columbia's Newsblaster
Kathleen R. McKeown, Regina Barzilay, David Evans, Vasileios Hatzivassiloglou, Judith L. Klavans, Ani Nenkova, Carl Sable, Barry Schiffman, Sergey Sigelman
TL;DR: Columbia's Newsblaster system for online news summarization is presented, a system that crawls the web for news articles, clusters them on specific topics and produces multidocument summaries for each cluster.