Subtopic-Focused Sentence Scoring in Multi-document Summarization

doi:10.1109/ALPIT.2007.106

Proceedings Article

pp. 98-104

TL;DR

This paper proposes a subtopic-focused model for scoring sentences in extractive summarization: a hierarchical Bayesian model learns subtopic distributions over sentences, through which sentences are scored and extracted to form the summary.

Abstract:

Previous work on multi-document summarization has seldom considered subtopics, focusing instead on a single topic when extracting the summary. In this paper, we propose a subtopic-focused model to score sentences in the extractive summarization task. Unlike supervised methods, it does not require costly manual work to build a training set. Multiple documents are represented as a mixture over subtopics, each denoted by a term distribution learned without supervision. Our method learns the subtopic distribution over sentences via a hierarchical Bayesian model, through which sentences are scored and extracted to form the summary. Experiments are performed on the DUC 2006 data, and the ROUGE evaluation results show that the proposed method can reach state-of-the-art performance.
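The idea of scoring sentences by their learned subtopic distributions can be illustrated with a small, self-contained sketch. It rests on assumptions that are not taken from the paper: plain LDA (Blei et al., listed below) fitted by collapsed Gibbs sampling, with each sentence treated as one "document", stands in for the paper's hierarchical Bayesian model; and a hypothetical selection rule (visit subtopics in order of overall prevalence and, for each, extract the unchosen sentence that expresses it most strongly) stands in for the paper's scoring function. The names `lda_gibbs` and `subtopic_focused_summary`, the hyperparameters `alpha`, `beta`, `iters`, and the whitespace tokenizer are all illustrative.

```python
import random


def lda_gibbs(sentences, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA, treating each tokenized sentence as a
    'document'. Returns theta: per-sentence subtopic proportions (rows sum to 1).
    Assumption: plain LDA stands in for the paper's hierarchical Bayesian model."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    docs = [[word_id[w] for w in s] for s in sentences]

    # Count tables: sentence-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each sentence's subtopic mixture.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]


def subtopic_focused_summary(sentences, num_topics=2, seed=0):
    """Hypothetical extraction rule: one sentence per subtopic, subtopics
    visited from most to least prevalent, each sentence scored by theta[i][k]."""
    tokens = [s.lower().split() for s in sentences]
    theta = lda_gibbs(tokens, num_topics, seed=seed)
    # Prevalence of subtopic k: its average proportion over all sentences.
    prevalence = [sum(row[k] for row in theta) / len(theta) for k in range(num_topics)]
    chosen = []
    for k in sorted(range(num_topics), key=lambda t: -prevalence[t]):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=lambda i: theta[i][k]))
    # Return the extracted sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "the storm flooded coastal towns",
    "rescue teams evacuated flooded towns",
    "the central bank raised interest rates",
    "higher interest rates slowed lending",
]
print(subtopic_focused_summary(docs, num_topics=2))
```

Because Gibbs sampling is stochastic, the extracted sentences depend on the seed and on the toy data; with a fixed seed the result is reproducible. Picking one sentence per subtopic is a simple way to make the summary cover several subtopics rather than only the dominant one, which is the motivation stated in the abstract.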

References

Journal Article

Latent Dirichlet Allocation

David M. Blei et al.

1 March 2003

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei et al.

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Proceedings Article

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev et al.

TL;DR: A multi-document summarizer called MEAD is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system; two new techniques, based on sentence utility and subsumption, are also described.

Posted Content

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev et al.

12 May 2000

arXiv: Computation and Language

TL;DR: This article presented a multi-document summarizer called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system, and also described two new techniques, based on sentence utility and subsumption, which have been applied to the evaluation of both single- and multiple-document summaries.

Proceedings Article

Manifold-ranking based topic-focused multi-document summarization

Xiaojun Wan et al.

TL;DR: A novel extractive approach to topic-focused summarization, based on manifold-ranking of sentences, is proposed; it significantly outperforms both the top-performing systems in the DUC tasks and baseline approaches.
