scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
04 Jun 2009
TL;DR: This work proposes a one-step approach for document summarization that jointly performs sentence extraction and compression by solving an integer linear program.
Abstract: Text summarization is one of the oldest problems in natural language processing. Popular approaches rely on extracting relevant sentences from the original documents. As a side effect, sentences that are too long but partly relevant are doomed to either not appear in the final summary, or prevent inclusion of other relevant sentences. Sentence compression is a recent framework that aims to select the shortest subsequence of words that yields an informative and grammatical sentence. This work proposes a one-step approach for document summarization that jointly performs sentence extraction and compression by solving an integer linear program. We report favorable experimental results on newswire data.
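The joint formulation can be illustrated with a toy version of the objective: for each sentence, choose either nothing or one of its compressed variants, maximizing total relevance under a word budget. The sketch below brute-forces a tiny instance for clarity; the paper solves the real problem as an integer linear program, and every sentence, variant, and score here is invented for illustration.

```python
from itertools import product

# Each sentence offers several candidate compressions: (text, relevance score).
# The empty string means "drop this sentence". Scores are illustrative only.
candidates = [
    [("", 0.0), ("the court ruled on the appeal", 3.0),
     ("the court, after long deliberation, ruled on the appeal", 3.5)],
    [("", 0.0), ("markets rose sharply", 2.0)],
    [("", 0.0), ("officials declined to comment", 1.0)],
]

BUDGET = 10  # maximum summary length in words

def words(text):
    return len(text.split())

best_score, best_choice = -1.0, None
# Enumerate one choice per sentence; an ILP solver explores this space
# implicitly instead of exhaustively.
for choice in product(*candidates):
    length = sum(words(text) for text, _ in choice)
    score = sum(s for _, s in choice)
    if length <= BUDGET and score > best_score:
        best_score, best_choice = score, choice

summary = [text for text, _ in best_choice if text]
print(summary, best_score)
```

Note how the shorter compression of the first sentence wins: it frees enough of the budget to also include the second sentence, which is exactly the trade-off a pipeline of extraction followed by compression cannot make.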

132 citations

Posted Content
TL;DR: This work argues that faithfulness is also a vital prerequisite for a practical abstractive summarization system and proposes a dual-attention sequence-to-sequence framework to force the generation conditioned on both the source text and the extracted fact descriptions.
Abstract: Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which tends to produce fake facts. Our preliminary study reveals nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on improving informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text. A dual-attention sequence-to-sequence framework is then proposed to force the generation to be conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement in informativeness since they often condense the meaning of the source text.
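The core mechanism can be pictured as two attention contexts that are mixed before generation: one computed over the source-text states, one over the fact-description states. The sketch below is a deliberately simplified numeric illustration with dot-product attention, a fixed mixing gate, and made-up two-dimensional vectors; the paper's model is a trained sequence-to-sequence network, not this.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys):
    # Dot-product attention: softmax over query-key similarities,
    # then a weighted sum of the key vectors.
    scores = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(scores, keys)) for i in range(dim)]

def dual_attention(query, source_states, fact_states, gate=0.5):
    # Condition generation on both the source text and the extracted fact
    # descriptions by mixing the two context vectors (fixed gate here;
    # a real model would learn it).
    c_src = attend(query, source_states)
    c_fact = attend(query, fact_states)
    return [gate * a + (1 - gate) * b for a, b in zip(c_src, c_fact)]

query = [1.0, 0.0]
source_states = [[1.0, 0.0], [0.0, 1.0]]
fact_states = [[0.5, 0.5]]
print(dual_attention(query, source_states, fact_states))
```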

132 citations

Proceedings ArticleDOI
20 Jul 2008
TL;DR: The proposed summarization methods utilizing comments showed significant improvement over those not using comments, and the methods using the feature-biased sentence extraction approach were observed to outperform those using the uniform-document approach.
Abstract: Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and summarization. In this paper, we study the problem of comments-oriented document summarization and aim to summarize a Web document (e.g., a blog post) by considering not only its content, but also the comments left by its readers. We identify three relations (namely, topic, quotation, and mention) by which comments can be linked to one another, and model the relations in three graphs. The importance of each comment is then scored by: (i) a graph-based method, where the three graphs are merged into a multi-relation graph; (ii) a tensor-based method, where the three graphs are used to construct a 3rd-order tensor. To generate a comments-oriented summary, we extract sentences from the given Web document using either a feature-biased approach or a uniform-document approach. The former scores sentences to bias keywords derived from comments, while the latter scores sentences uniformly with comments. In our experiments using a set of blog posts with manually labeled sentences, our proposed summarization methods utilizing comments showed significant improvement over those not using comments. The methods using the feature-biased sentence extraction approach were observed to outperform those using the uniform-document approach.
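The graph-based scoring step can be sketched as follows: merge the three relation graphs into one weighted multi-relation graph, then rank comments with a PageRank-style iteration. All comment ids and edge weights below are invented, and the paper's exact scoring (and its tensor-based alternative) may differ from this minimal version.

```python
# Three comment-to-comment relation graphs, as edge -> weight maps.
# Ids and weights are made up for the example.
topic = {("c1", "c2"): 1.0, ("c2", "c3"): 1.0}
quotation = {("c3", "c1"): 1.0}
mention = {("c2", "c1"): 1.0}

# Merge into one multi-relation graph by summing edge weights.
merged = {}
for graph in (topic, quotation, mention):
    for edge, w in graph.items():
        merged[edge] = merged.get(edge, 0.0) + w

nodes = sorted({n for edge in merged for n in edge})
score = {n: 1.0 / len(nodes) for n in nodes}
DAMP = 0.85
for _ in range(50):
    new = {n: (1 - DAMP) / len(nodes) for n in nodes}
    for (u, v), w in merged.items():
        # Distribute u's score along its outgoing edges, proportional to weight.
        out = sum(w2 for (a, _), w2 in merged.items() if a == u)
        new[v] += DAMP * score[u] * (w / out)
    score = new

ranked = sorted(score, key=score.get, reverse=True)
print(ranked)
```

In this toy graph c1 comes out on top: it is quoted by c3 and mentioned by c2, so two of the three relations funnel importance into it.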

130 citations

Proceedings ArticleDOI
Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Sujian Li, Houfeng Wang
12 Aug 2012
TL;DR: This paper proposes an entity-centric topic-based opinion summarization framework, which aims to produce opinion summaries organized by topic while emphasizing the insight behind the opinions in Twitter.
Abstract: Microblogging services, such as Twitter, have become popular channels for people to express their opinions towards a broad range of topics. Twitter generates a huge volume of instant messages (i.e. tweets) carrying users' sentiments and attitudes every minute, which both necessitates automatic opinion summarization and poses great challenges to the summarization system. In this paper, we study the problem of opinion summarization for entities, such as celebrities and brands, in Twitter. We propose an entity-centric topic-based opinion summarization framework, which aims to produce opinion summaries organized by topic while emphasizing the insight behind the opinions. To this end, we first mine topics from #hashtags, the human-annotated semantic tags in tweets. We integrate the #hashtags as weakly supervised information into topic modeling algorithms to obtain better interpretation and representation for calculating the similarity among them, and adopt the Affinity Propagation algorithm to group #hashtags into coherent topics. Subsequently, we use templates generalized from paraphrasing to identify tweets with deep insights, which reveal reasons, express demands or reflect viewpoints. Afterwards, we develop a target-dependent (i.e. entity-dependent) sentiment classification approach to identify the opinion of tweets towards a given target. Finally, the opinion summary is generated by integrating information from the dimensions of topic, opinion and insight, as well as other factors (e.g. topic relevancy, redundancy and language styles), in a unified optimization framework. We conduct extensive experiments on a real-life data set to evaluate the performance of the individual opinion summarization modules as well as the quality of the produced summary. The promising experimental results show the effectiveness of the proposed framework and algorithms.
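The idea of target-dependent sentiment, that the same tweet can carry different sentiment for different entities, can be illustrated with a minimal distance-weighted lexicon sketch. The lexicon, weighting scheme, and example tweet below are all invented; the paper's classifier is a trained model, not this heuristic.

```python
# Tiny opinion lexicon (illustrative, not from the paper).
LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}

def target_sentiment(tweet, target):
    """Score sentiment towards `target`: opinion words count more the
    closer they sit to the target mention, so sentiment aimed at other
    entities in the same tweet is discounted."""
    tokens = tweet.lower().split()
    if target not in tokens:
        return 0.0
    t_pos = tokens.index(target)
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score += LEXICON[tok] / (1 + abs(i - t_pos))  # decay with distance
    return score

tweet = "i love pepsi but hate the new coke ad"
print(target_sentiment(tweet, "pepsi"))  # positive: "love" is adjacent
print(target_sentiment(tweet, "coke"))   # negative: "hate" is closer
```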

128 citations

Journal ArticleDOI
TL;DR: Two independent methods for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources are presented and it is shown that the best performance is achieved when the two methods are combined.
Abstract: Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly selected papers from an evaluation corpus to show that the author's abstract does not always reflect the entire contents of the full text.
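Since several of these papers evaluate with ROUGE, the core of that measure is worth making concrete. The sketch below computes ROUGE-1 recall, the fraction of reference unigrams that also appear in the system summary, with clipped counts; the real ROUGE toolkit adds stemming, stopword handling, higher-order n-grams, and precision/F variants. The example sentences are invented.

```python
from collections import Counter

def rouge1_recall(system, reference):
    # Count each reference unigram at most as often as it occurs in the
    # system summary ("clipped" counts), then normalize by reference length.
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(c, sys_counts[w]) for w, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

reference = "the trial enrolled two hundred patients"
system = "two hundred patients joined the trial"
print(rouge1_recall(system, reference))  # 5 of 6 reference words covered
```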

128 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52