Red-faced ROUGE: Examining the Suitability of ROUGE for Opinion Summary Evaluation.

Open Access

- pp 52-60

TLDR

It is shown that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect, and three recommendations are presented for future work that uses ROUGE to evaluate opinion summarisation.

Abstract:

One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) aspect or opinion target; and (2) polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.
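The overlap counting described in the abstract can be illustrated with a minimal sketch. This is not the official ROUGE package and not the authors' experimental setup: the function names `ngrams` and `rouge_n`, the lowercased whitespace tokenisation, and the three example sentences are all assumptions made for illustration. It shows how two candidates with opposite polarity for the same aspect can receive identical n-gram overlap scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap between two strings.

    Assumption: lowercased whitespace tokenisation, no stemming and no
    stopword removal (real ROUGE configurations may differ).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: same aspect ("battery life"), different polarity words.
reference = "the battery life is very good"
same_polarity = "the battery life is very great"
opposite_polarity = "the battery life is very bad"

score_same = rouge_n(same_polarity, reference)
score_opposite = rouge_n(opposite_polarity, reference)
print(score_same)
print(score_opposite)
```

In this sketch both candidates share five of the six reference unigrams, so the scores are equal even though one candidate contradicts the reference. Every token carries the same weight, so the single polarity word cannot change the outcome.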

Citations

Proceedings ArticleDOI

Few-Shot Learning for Opinion Summarization

Arthur Bražinskas, +2 more

TL;DR: This work shows that even a handful of summaries is sufficient to bootstrap generation of the summary text with all expected properties, such as writing style, informativeness, fluency, and sentiment preservation.

Journal ArticleDOI

Efficient Few-Shot Fine-Tuning for Opinion Summarization

Arthur Bražinskas, +3 more

TL;DR: This work utilizes an efficient few-shot method based on adapters which can easily store in-domain knowledge and improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets, respectively.

Journal ArticleDOI

Template-based Abstractive Microblog Opinion Summarization

Iman Munire Bilal, +5 more

- 08 Aug 2022 -

Transactions of the Association for Comp...

TL;DR: The task of microblog opinion summarisation (MOS) is introduced, a dataset of 3100 gold-standard opinion summaries is shared to facilitate research in this domain, and a range of abstractive and extractive state-of-the-art summarisation models is benchmarked, achieving good performance.

Posted Content

Learning Opinion Summarizers by Selecting Informative Reviews

Arthur Bražinskas, +2 more

- 09 Sep 2021 -

arXiv: Computation and Language

TL;DR: In this paper, a large dataset of summaries paired with user reviews for over 31,000 products is collected, and the task is formulated as jointly learning to select informative subsets of reviews and to summarize the opinions expressed in these subsets.

Proceedings ArticleDOI

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

Nachshon Cohen, +3 more

TL;DR: This paper presented a dataset based on article summaries appearing on the WikiHow website, composed of how-to articles and coherent-paragraph summaries written in plain language, which made human evaluation significantly easier and thus, more effective.

References

Proceedings ArticleDOI

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

Journal ArticleDOI

WordNet: an electronic lexical database

Christiane Fellbaum

- 01 Sep 2000 -

Language

TL;DR: The lexical database, including nouns in WordNet and a semantic network of English verbs, is presented together with applications of WordNet such as building semantic concordances.

Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, together with their evaluations.

Book

Sentiment Analysis and Opinion Mining

Bing Liu

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.

Proceedings ArticleDOI

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

Bo Pang, +1 more

TL;DR: This paper proposed a machine learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, which greatly facilitates incorporation of cross-sentence contextual constraints.

Related Papers (5)

ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

Kavita Ganesan

- 01 Mar 2015 -

arXiv: Information Retrieval

Looking for a Few Good Metrics: ROUGE and its Evaluation

Looking for a Few Good Metrics: Automatic Summarization Evaluation — How Many Samples Are Enough?

ROUGE: A Package for Automatic Evaluation of Summaries

Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised