Open Access

Red-faced ROUGE: Examining the Suitability of ROUGE for Opinion Summary Evaluation.

TLDR
It is shown that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect, and three recommendations for future work that uses ROUGE to evaluate opinion summarisation are presented.
Abstract
One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) aspect or opinion target; and (2) polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.
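To make the overlap counting concrete, the following is a minimal sketch of a ROUGE-N style score. It is not the official ROUGE package and not the paper's experimental setup: the function names `ngrams` and `rouge_n`, the whitespace tokenisation, the absence of stemming and stop-word handling, and the single reference are all simplifying assumptions. The three example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N against one reference: clipped n-gram overlap.

    Assumes lowercased whitespace tokenisation; real ROUGE configurations
    (stemming, stop words, multiple references) are not modelled here.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example: same aspect (battery), differing polarity.
reference = "the battery life is great"
same_polarity = "the battery lasts a long time"
opposite_polarity = "the battery life is terrible"

score_same = rouge_n(same_polarity, reference)
score_opposite = rouge_n(opposite_polarity, reference)
print(score_same["f1"], score_opposite["f1"])
```

In this toy case the candidate with the opposite polarity shares four of five unigrams with the reference, while the agreeing paraphrase shares only two, so the opposite-polarity summary receives the higher score. Every word counts equally, so the single polarity word that flips the opinion carries no extra weight.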

Citations
Proceedings ArticleDOI

Few-Shot Learning for Opinion Summarization

TL;DR: This work shows that even a handful of summaries is sufficient to bootstrap generation of the summary text with all expected properties, such as writing style, informativeness, fluency, and sentiment preservation.
Journal ArticleDOI

Efficient Few-Shot Fine-Tuning for Opinion Summarization

TL;DR: This work utilizes an efficient few-shot method based on adapters which can easily store in-domain knowledge and improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets, respectively.
Journal ArticleDOI

Template-based Abstractive Microblog Opinion Summarization

TL;DR: The task of microblog opinion summarisation (MOS) is introduced, and a dataset of 3100 gold-standard opinion summaries is shared to facilitate research in this domain. A range of abstractive and extractive state-of-the-art summarisation models are benchmarked and achieve good performance.
Posted Content

Learning Opinion Summarizers by Selecting Informative Reviews

TL;DR: In this paper, a large dataset of summaries paired with user reviews for over 31,000 products is collected to train a summarizer trained on random review subsets to select informative subsets of reviews and summarize the opinions expressed in these subsets.
Proceedings ArticleDOI

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

TL;DR: This paper presented a dataset based on article summaries appearing on the WikiHow website, composed of how-to articles and coherent-paragraph summaries written in plain language, which made human evaluation significantly easier and thus, more effective.
References
Proceedings ArticleDOI

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Journal ArticleDOI

WordNet : an electronic lexical database

Christiane Fellbaum
- 01 Sep 2000 - 
TL;DR: The WordNet lexical database is presented, covering nouns in WordNet, a semantic network of English verbs, and applications of WordNet such as building semantic concordances.
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
Book

Sentiment Analysis and Opinion Mining

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Proceedings ArticleDOI

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

TL;DR: This paper proposed a machine learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, which greatly facilitates incorporation of cross-sentence contextual constraints.