scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
Sun Park
21 Jan 2009
TL;DR: The proposed personalized text summarization agent, which uses generic relevance weights based on non-negative matrix factorization (NMF), improves summarization quality by extracting sentences that reflect the inherent semantics of the search results through the weighted NMF.
Abstract: With the rapid growth of Internet access, the need for personalized summarization methods has increased. This paper proposes an automatic personalized text summarization agent that uses generic relevance weights based on non-negative matrix factorization (NMF). The proposed agent uses the generic relevance weights to produce a generic summary, so that it can extract sentences covering the major and minor topics of the search results with respect to the user's interests. Moreover, it improves summarization quality by extracting sentences that reflect the inherent semantics of the search results through the weighted NMF. The experimental results demonstrate that the proposed method achieves better performance than the other methods.
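The core idea, ranking sentences by NMF semantic features weighted by a generic relevance score, can be sketched roughly as follows. This is a minimal illustration with a toy multiplicative-update NMF, not the paper's actual formulation; the function names and the weighting scheme are assumptions.

```python
import numpy as np

def nmf(A, k, iters=200, seed=0):
    """Tiny multiplicative-update NMF: A (m x n) is approximated by W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + 1e-3
    H = rng.random((k, A.shape[1])) + 1e-3
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + 1e-9)
        W *= (A @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def rank_sentences(term_sentence, k=2):
    """Rank sentences (columns) by their weighted NMF semantic features."""
    W, H = nmf(term_sentence, k)      # H holds semantic-feature weights per sentence
    feature_weight = H.sum(axis=1)    # generic relevance of each semantic feature
    scores = feature_weight @ H       # sentences strong in relevant features score high
    return list(np.argsort(-scores))  # sentence indices, best first

# Toy term-sentence matrix: rows = terms, columns = sentences.
A = np.array([[2.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
order = rank_sentences(A)
```

A real system would build the term-sentence matrix from tokenized search-result sentences and return the top-ranked ones as the summary.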

3 citations

Book ChapterDOI
16 Aug 2010
TL;DR: This paper explores unsupervised extractive summarization as a feature selection technique for document categorization and shows that text summarization is a competitive approach for feature selection, particularly for small training sets, where it clearly outperforms the traditional information gain technique.
Abstract: The most common feature selection techniques for document categorization are supervised and require large amounts of training data to accurately capture the descriptive and discriminative information of the defined categories. Since training sets are extremely small in many classification tasks, this paper explores the use of unsupervised extractive summarization as a feature selection technique for document categorization. Our experiments with training sets of different sizes indicate that text summarization is a competitive approach for feature selection and show its suitability for situations with small training sets, where it clearly outperforms the traditional information gain technique.
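The general idea, keeping as features only the words that appear in an unsupervised extractive summary of each document, can be sketched as below. The frequency-based sentence scorer here is an illustrative stand-in, not the paper's exact extraction method.

```python
from collections import Counter

def summary_features(doc, ratio=0.5):
    """Keep only the words from the top-scoring sentences of a document.

    Sentences are scored by the average corpus frequency of their words,
    a simple unsupervised extractive-summarization heuristic.
    """
    sentences = [s.split() for s in doc.split(". ") if s]
    freq = Counter(w.lower() for s in sentences for w in s)
    scored = sorted(sentences,
                    key=lambda s: -sum(freq[w.lower()] for w in s) / len(s))
    top = scored[: max(1, int(len(scored) * ratio))]
    return {w.lower() for s in top for w in s}

feats = summary_features("the cat sat. the cat ran. dogs bark loudly")
```

The resulting vocabulary would then replace a supervised feature-selection step (such as information gain) when training the categorizer.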

3 citations

Journal Article
Zhou Feng
TL;DR: The experimental results show that the new method can solve the imbalance problem of the abstract and effectively reduce content redundancy.
Abstract: A method of automatic summarization for Web information retrieval is proposed based on the structure of the Web document. The document is partitioned into several topic blocks by parsing it into a DOM (Document Object Model) tree and comparing semantic similarity. The tag information is fully used to extract topic words and key sentences. Finally, the abstract is created dynamically by adjusting the weights of sentences. The experimental results show that the new method can solve the imbalance problem of the abstract and effectively reduce content redundancy.
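One ingredient of such a method, weighting text fragments by the HTML tag that encloses them, can be sketched with the standard-library parser. The tag weights here are illustrative assumptions, not values from the paper.

```python
from html.parser import HTMLParser

# Illustrative tag weights: titles and headings signal topic words more
# strongly than plain paragraph text.
TAG_WEIGHT = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "p": 1.0}

class TagTextExtractor(HTMLParser):
    """Collect text fragments together with the weight of their enclosing tag."""
    def __init__(self):
        super().__init__()
        self.stack, self.fragments = [], []
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self.stack[-1] if self.stack else "p"
            self.fragments.append((text, TAG_WEIGHT.get(tag, 1.0)))

parser = TagTextExtractor()
parser.feed("<title>Web Summarization</title><p>Plain body text.</p>")
best = max(parser.fragments, key=lambda f: f[1])
```

A full implementation would also group DOM subtrees into topic blocks by semantic similarity before scoring sentences within each block.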

3 citations

Dissertation
01 Jan 2008
TL;DR: Event-based and temporal-oriented summarization techniques are investigated and proposed, and an adaptive learning-based classification framework is developed to incorporate various types of features.
Abstract: Automatic summarization aims to produce a concise summary of source documents by identifying their focused topics. Normally, topics are represented by some essential events. Topics may evolve or shift over time, and tracking the trend of a topic requires anchoring events on the time line. Unfortunately, neither events nor their associated time features were well studied in previous work. Investigating event-based and temporal-oriented summarization techniques is a primary objective of this study. Moreover, the salience of content can hardly be evaluated from a single point of view, so exploiting a framework that can effectively integrate multiple impact factors is another objective. We define events by "action" words together with their associated named entities. Events weave documents into a map built either on event instances or on event concepts, and relevance between events is exploited to identify important ones. To utilize the temporal information associated with events, it is necessary to extract and normalize temporal expressions; we investigate rule-based approaches for these tasks. Two statistical measures are employed to evaluate the significance of events based on their temporal distributions. Sentence selection is a complicated process, so we explore various features, including surface, content, event, and relevance features, under a learning-based classification framework, into which the event-based and temporal-oriented approaches are incorporated as features. The contributions of this study are as follows. Event-based summarization approaches are proposed that achieve competitive results compared with successful word-based approaches. Temporal concepts are introduced into event-based summarization, and temporal information is found to be crucial when summarizing documents with evolving topics. An adaptive learning-based framework is developed to incorporate various types of features.
A system for temporal expression extraction and normalization is implemented. It is an effective tool not only for document summarization but also for many other applications.
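A rule-based temporal-expression extractor and normalizer of the kind described can be sketched with a single regular expression; the pattern below handles only "Month DD, YYYY" forms and is purely illustrative of the approach, not the thesis's actual rule set.

```python
import re

MONTHS = {"january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
          "june": 6, "july": 7, "august": 8, "september": 9, "october": 10,
          "november": 11, "december": 12}

def normalize_dates(text):
    """Find 'Month DD, YYYY' expressions and normalize them to ISO YYYY-MM-DD."""
    pattern = re.compile(r"(%s) (\d{1,2}), (\d{4})" % "|".join(MONTHS), re.I)
    out = []
    for m in pattern.finditer(text):
        month, day, year = m.groups()
        out.append("%s-%02d-%02d" % (year, MONTHS[month.lower()], int(day)))
    return out

dates = normalize_dates("The attack occurred on September 11, 2001.")
```

Normalized dates like these are what let events be anchored on a common time line and compared by temporal distribution.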

3 citations

Dissertation
01 Sep 2013
TL;DR: This thesis presents work on sentence compression using syntactic pruning methods to improve automatic text summarization, and shows that the pruning techniques perform better, and approximate human techniques more closely, than baseline techniques.
Abstract: Automatic text summarization is a dynamic area of Natural Language Processing that has gained much attention over the past few decades. As a vast amount of data accumulates and becomes available online, providing automatic summaries of specific subjects/topics has become an important user requirement. To encourage the growth of this research area, several shared tasks are held annually and different types of benchmarks are made available. Early work on automatic text summarization focused on improving the relevance of the summary content, but the trend is now towards generating more abstractive and coherent summaries. As a result, sentence simplification has become a prominent requirement in automatic summarization. This thesis presents our work on sentence compression using syntactic pruning methods to improve automatic text summarization. Sentence compression has several applications in Natural Language Processing, such as text simplification, topic and subtitle generation, removal of redundant information, and text summarization. Effective sentence compression techniques can contribute to text summarization by simplifying texts, avoiding redundant and irrelevant information, and leaving more space for useful information. In our work, we have focused on pruning individual sentences using their phrase structure grammar representations. We implemented several types of pruning techniques and evaluated the results in the context of automatic summarization using standard evaluation metrics. In addition, we performed a series of human evaluations and a comparison with other sentence compression techniques used in automatic summarization. Our results show that our syntactic pruning techniques achieve compression rates comparable to those of previous work and to what humans achieve.
However, automatic evaluation using ROUGE shows that any type of sentence compression causes a loss of content compared to the original summary, and adding extra content does not yield a significant improvement in ROUGE. The human evaluation shows that our syntactic pruning techniques remove syntactic structures similar to those humans remove, and inter-annotator content evaluation using ROUGE shows that our techniques perform well compared to other baseline techniques. However, when we evaluate our techniques with a grammar-structure-based F-measure, the results show that our pruning techniques perform better and approximate human techniques more closely than the baseline techniques.
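Syntactic pruning of a phrase-structure parse can be sketched on a toy tree encoded as nested tuples (label, children...). Dropping prepositional phrases (PP) and subordinate clauses (SBAR) is one simple compression rule, illustrative of the general approach rather than the thesis's actual rule set.

```python
# Labels whose subtrees are pruned away (an illustrative choice).
PRUNE_LABELS = {"PP", "SBAR"}

def prune(node):
    """Return a copy of the tree with PP/SBAR subtrees removed; leaves are strings."""
    if isinstance(node, str):
        return node
    label, children = node[0], node[1:]
    kept = tuple(prune(c) for c in children
                 if isinstance(c, str) or c[0] not in PRUNE_LABELS)
    return (label,) + kept

def leaves(node):
    """Collect the leaf words of a tree in order."""
    if isinstance(node, str):
        return [node]
    return [w for c in node[1:] for w in leaves(c)]

# "The cat sat on the mat" as a phrase-structure tree.
parse = ("S",
         ("NP", ("DT", "The"), ("NN", "cat")),
         ("VP", ("VBD", "sat"),
                ("PP", ("IN", "on"), ("NP", ("DT", "the"), ("NN", "mat")))))
compressed = " ".join(leaves(prune(parse)))  # "The cat sat"
```

A real system would apply such rules to parser output and check that the pruned sentence remains grammatical before using it in a summary.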

3 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52