Open Access · Proceedings Article · DOI

Automatic evaluation of summaries using N-gram co-occurrence statistics

TL;DR: The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations across various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
Abstract
Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations across various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
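The unigram co-occurrence idea described in the abstract can be sketched as a recall-oriented overlap score. This is a simplified illustration under assumed tokenization (lowercased whitespace splitting); the paper's exact formulation may differ:

```python
from collections import Counter

def unigram_cooccurrence(candidate: str, reference: str) -> float:
    """Recall-oriented unigram overlap between a candidate summary and a
    reference summary, with clipped counts (hypothetical sketch, not the
    paper's exact metric)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Count each candidate unigram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    # Normalize by reference length, making the score recall-oriented.
    return overlap / max(sum(ref.values()), 1)

score = unigram_cooccurrence("the cat sat on the mat",
                             "the cat lay on the mat")
# Five of the six reference unigrams are matched: score = 5/6
```

Normalizing by the reference length (recall) rather than the candidate length (precision) is what distinguishes this family of summary-evaluation scores from BLEU's precision orientation.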


Citations
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with evaluations of each.
Proceedings Article

TextRank: Bringing Order into Text

Rada Mihalcea et al.
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
Journal ArticleDOI

LexRank: graph-based lexical centrality as salience in text summarization

TL;DR: LexRank, a stochastic graph-based method for computing the relative importance of textual units in natural language processing, is introduced; it is based on the concept of eigenvector centrality.
Journal ArticleDOI

Multimodal Machine Learning: A Survey and Taxonomy

TL;DR: This paper surveys recent advances in multimodal machine learning and presents them in a common taxonomy, enabling researchers to better understand the state of the field and identify directions for future research.
Journal ArticleDOI

Inter-coder agreement for computational linguistics

TL;DR: It is argued that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks—but that their use makes the interpretation of the value of the coefficient even harder.
References
Proceedings ArticleDOI

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Journal ArticleDOI

An algorithm for suffix stripping

TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
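The suffix-stripping idea can be illustrated with a deliberately simplified single rule pass. This is a hypothetical sketch, not the Porter algorithm itself, which applies five ordered phases of rules guarded by stem-measure conditions:

```python
def strip_suffix(word: str) -> str:
    """One simplified suffix-stripping pass (illustration only; the real
    Porter stemmer uses ordered rule phases with measure conditions)."""
    # Rules are tried longest-suffix-first, as in Porter's step 1a.
    rules = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]
    for suffix, replacement in rules:
        # Require a remaining stem of at least two characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: len(word) - len(suffix)] + replacement
    return word

print(strip_suffix("caresses"))  # caress
print(strip_suffix("ponies"))    # poni
print(strip_suffix("cats"))      # cat
```

Trying longer suffixes before shorter ones matters: without that ordering, "caresses" would lose only its final "s" and stem to "caresse".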
Proceedings ArticleDOI

Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

TL;DR: DARPA commissioned NIST to develop an MT evaluation facility based on the IBM work; the resulting facility is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
Book

Evaluating Natural Language Processing Systems: An Analysis and Review

TL;DR: This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems, and provides a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations.
Proceedings Article

Tracking and summarizing news on a daily basis with Columbia's Newsblaster

TL;DR: Columbia's Newsblaster system for online news summarization is presented, a system that crawls the web for news articles, clusters them on specific topics and produces multidocument summaries for each cluster.