
Showing papers on "Multi-document summarization published in 2008"


Proceedings ArticleDOI
20 Jul 2008
TL;DR: Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed summarization models, and show that the ClusterCMRW model is more robust than the ClusterHITS model with respect to different cluster numbers.
Abstract: The Markov Random Walk model has been recently exploited for multi-document summarization by making use of the link relationships between sentences in the document set, under the assumption that all the sentences are indistinguishable from each other. However, a given document set usually covers a few topic themes, with each theme represented by a cluster of sentences. The topic themes are usually not equally important, and the sentences in an important theme cluster are deemed more salient than the sentences in a trivial theme cluster. This paper proposes the Cluster-based Conditional Markov Random Walk Model (ClusterCMRW) and the Cluster-based HITS Model (ClusterHITS) to fully leverage the cluster-level information. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of our proposed summarization models. The results also demonstrate that the ClusterCMRW model is more robust than the ClusterHITS model with respect to different cluster numbers.
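The random-walk idea underlying this line of work is sentence-graph ranking in the LexRank/TextRank style. As a rough illustration only (a plain sentence-level random walk, not the authors' cluster-conditioned model), the following sketch ranks sentences by iterating a damped walk over cosine similarities; the damping factor and bag-of-words representation are standard choices, not taken from the paper:

```python
import math

def cosine(a, b):
    # cosine similarity between two bag-of-words dicts
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank_sentences(sentences, damping=0.85, iters=50):
    # sentences: list of bag-of-words dicts; returns one salience score per sentence
    n = len(sentences)
    sim = [[0.0 if i == j else cosine(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    rowsum = [sum(row) or 1.0 for row in sim]   # guard against dangling nodes
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n +
                  damping * sum(scores[j] * sim[j][i] / rowsum[j] for j in range(n))
                  for i in range(n)]
    return scores
```

Sentences that are well connected to the rest of the graph accumulate higher scores than isolated ones, which is the salience signal the cluster-based models refine.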

312 citations


Proceedings ArticleDOI
20 Jul 2008
TL;DR: A new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization is proposed, which aims to create a compressed summary while retaining the main characteristics of the original set of documents.
Abstract: Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each group to form the summary. Experimental results on DUC2005 and DUC2006 data sets demonstrate the improvement of our proposed framework over the implemented existing summarization systems. A further study on the factors that benefit the high performance is also conducted.
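The clustering step described above can be sketched with symmetric nonnegative matrix factorization of the sentence similarity matrix. This is a minimal illustration using a common damped multiplicative update; the 0.5 damping, random seed, and iteration count are illustrative choices, not details from the paper:

```python
import numpy as np

def symnmf(W, k, iters=300, seed=0):
    # Factor a symmetric nonnegative similarity matrix W (n x n) as W ~ H @ H.T,
    # with H >= 0; each row of H gives one sentence's affinity to the k clusters.
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(iters):
        # damped multiplicative update; keeps H nonnegative
        H *= 0.5 + 0.5 * (W @ H) / np.maximum(H @ (H.T @ H), 1e-12)
    return H

def cluster_labels(W, k):
    # cluster label per sentence = strongest column of H
    return symnmf(W, k).argmax(axis=1)
```

On a similarity matrix with clear block structure, the argmax over rows of H recovers the sentence clusters from which the most informative sentences would then be selected.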

304 citations


Journal Article
TL;DR: While all of the 71 submitted runs were automatically scored with the ROUGE and BE metrics, NIST assessors manually evaluated only 57 of the submitted runs for readability, content, and overall responsiveness.
Abstract: The summarization track at the Text Analysis Conference (TAC) is a direct continuation of the Document Understanding Conference (DUC) series of workshops, focused on providing common data and evaluation framework for research in automatic summarization. In the TAC 2008 summarization track, the main task was to produce two 100-word summaries from two related sets of 10 documents, where the second summary was an update summary. While all of the 71 submitted runs were automatically scored with the ROUGE and BE metrics, NIST assessors manually evaluated only 57 of the submitted runs for readability, content, and overall responsiveness.

215 citations


Proceedings ArticleDOI
20 Jul 2008
TL;DR: The proposed summarization methods utilizing comments showed significant improvement over those not using comments, and the methods using the feature-biased sentence extraction approach were observed to outperform those using the uniform-document approach.
Abstract: Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks, including document search, visualization, and summarization. In this paper, we study the problem of comments-oriented document summarization and aim to summarize a Web document (e.g., a blog post) by considering not only its content, but also the comments left by its readers. We identify three relations (namely, topic, quotation, and mention) by which comments can be linked to one another, and model the relations in three graphs. The importance of each comment is then scored by: (i) a graph-based method, where the three graphs are merged into a multi-relation graph; (ii) a tensor-based method, where the three graphs are used to construct a 3rd-order tensor. To generate a comments-oriented summary, we extract sentences from the given Web document using either a feature-biased approach or a uniform-document approach. The former scores sentences to bias keywords derived from comments, while the latter scores sentences uniformly with comments. In our experiments using a set of blog posts with manually labeled sentences, our proposed summarization methods utilizing comments showed significant improvement over those not using comments. The methods using the feature-biased sentence extraction approach were observed to outperform those using the uniform-document approach.

130 citations


Proceedings ArticleDOI
24 Jul 2008
TL;DR: This article uses Latent Dirichlet Allocation to capture the events being covered by the documents and form the summary with sentences representing these different events and shows that the algorithms gave significantly better ROUGE-1 recall measures compared to DUC 2002 winners.
Abstract: Extraction-based multi-document summarization algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. In this article we use Latent Dirichlet Allocation to capture the events being covered by the documents and form the summary with sentences representing these different events. Our approach is distinguished from existing approaches in that we use mixture models to capture the topics and pick the sentences without paying attention to the details of grammar and structure of the documents. Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.

114 citations


Proceedings ArticleDOI
16 Jun 2008
TL;DR: FastSum, a fast query-based multi-document summarizer based solely on word-frequency features of clusters, documents, and topics, can rely on a minimal set of features, leading to fast processing times: 1250 news documents in 60 seconds.
Abstract: We present a fast query-based multi-document summarizer called FastSum based solely on word-frequency features of clusters, documents and topics. Summary sentences are ranked by a regression SVM. The summarizer does not use any expensive NLP techniques such as parsing, tagging of names or even part of speech information. Still, the achieved accuracy is comparable to the best systems presented in recent academic competitions (i.e., Document Understanding Conference (DUC)). Because of a detailed feature analysis using Least Angle Regression (LARS), FastSum can rely on a minimal set of features leading to fast processing times: 1250 news documents in 60 seconds.
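FastSum's exact feature set and SVM ranker are only described at a high level above. As a hedged illustration of the general word-frequency idea (a SumBasic-style scorer, not the paper's actual features or learned model), sentence salience can be approximated as the mean corpus-wide probability of the sentence's words:

```python
from collections import Counter

def frequency_scores(sentences):
    # sentences: list of token lists; a sentence's score is the mean
    # corpus-wide probability of its words
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}
    return [sum(probs[w] for w in s) / len(s) if s else 0.0
            for s in sentences]
```

Features of this kind are cheap to compute, which is what makes the no-parsing, no-tagging design feasible at the reported speeds.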

101 citations


Proceedings ArticleDOI
25 Oct 2008
TL;DR: The hypothesis that encyclopedic knowledge is a useful addition to a summarization system is confirmed by the system implemented, which ranks high compared to the participating systems in the DUC competitions.
Abstract: Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic -- a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To "understand" the query we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic-expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.

99 citations


Journal ArticleDOI
TL;DR: It is found that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre.
Abstract: We present two approaches to email thread summarization: collective message summarization (CMS) applies a multi-document summarization approach, while individual message summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in our general framework driven by sentence compression. Instead of a purely extractive approach, we employ linguistic and statistical methods to generate multiple compressions, and then select from those candidates to produce a final summary. We demonstrate these ideas on the Enron email collection - a very challenging corpus because of the highly technical language. Experimental results point to two findings: that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre.

91 citations


Proceedings ArticleDOI
Xiaojun Wan1
25 Oct 2008
TL;DR: A document-based graph model is proposed to incorporate the document-level information and the sentence-to-document relationship into the graph-based ranking process and the results show the robustness of the proposed model.
Abstract: The graph-based ranking algorithm has been recently exploited for multi-document summarization by making use only of the sentence-to-sentence relationships in the documents, under the assumption that all the sentences are indistinguishable. However, given a document set to be summarized, different documents are usually not equally important, and moreover, different sentences in a specific document are usually of different importance. This paper aims to explore document impact on summarization performance. We propose a document-based graph model to incorporate the document-level information and the sentence-to-document relationship into the graph-based ranking process. Various methods are employed to evaluate the two factors. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed model. Moreover, the results show the robustness of the proposed model.

91 citations


Proceedings ArticleDOI
20 Jul 2008
TL;DR: The MR is extended to the mutual reinforcement chain (MRC) of three different text granularities, i.e., document, sentence, and term, and a query-sensitive similarity is developed to measure the affinity between a pair of texts.
Abstract: Sentence ranking is the issue of most concern in document summarization. Early researchers have presented the mutual reinforcement principle (MR) between sentence and term for simultaneous key phrase and salient sentence extraction in generic single-document summarization. In this work, we extend the MR to the mutual reinforcement chain (MRC) of three different text granularities, i.e., document, sentence, and term. The aim is to provide a general reinforcement framework and a formal mathematical modeling for the MRC. Going one step further, we incorporate the query influence into the MRC to cope with the need for query-oriented multi-document summarization. While previous summarization approaches often calculate similarity regardless of the query, we develop a query-sensitive similarity to measure the affinity between a pair of texts. When evaluated on the DUC 2005 dataset, the experimental results suggest that the proposed query-sensitive MRC (Qs-MRC) is a promising approach for summarization.

90 citations


Journal ArticleDOI
Xiaojun Wan1
TL;DR: This study aims to differentiate the cross-document and within-document relationships between sentences for generic multi-document summarization and adapt the graph-ranking based algorithm for topic-focused summarization.
Abstract: In recent years graph-ranking based algorithms have been proposed for single document summarization and generic multi-document summarization. The algorithms make use of the "votings" or "recommendations" between sentences to evaluate the importance of the sentences in the documents. This study aims to differentiate the cross-document and within-document relationships between sentences for generic multi-document summarization and to adapt the graph-ranking based algorithm for topic-focused summarization. The contributions of this study are two-fold: (1) For generic multi-document summarization, we apply the graph-based ranking algorithm based on each kind of sentence relationship and explore their relative importance for summarization performance. (2) For topic-focused multi-document summarization, we propose to integrate the relevance of the sentences to the specified topic into the graph-ranking based method. Each individual kind of sentence relationship is also differentiated and investigated in the algorithm. Experimental results on DUC 2002 to DUC 2005 data demonstrate the great importance of the cross-document relationships between sentences for both generic and topic-focused multi-document summarization. Even the approach based only on the cross-document relationships can perform better than, or at least as well as, the approaches based on both kinds of relationships between sentences.

Book ChapterDOI
17 Feb 2008
TL;DR: CLASSY (Clustering, Linguistics, And Statistics for Summarization Yield) is an automatic, extract-generating summarization system that uses linguistic trimming and statistical methods to generate generic or topic-driven summaries for single documents or clusters of documents.
Abstract: Automatic document summarization has become increasingly important due to the quantity of written material generated worldwide. Generating good quality summaries enables users to cope with larger amounts of information. English-document summarization is a difficult task, yet it alone is not sufficient: environmental, economic, and other global issues make it imperative for English speakers to understand how other countries and cultures perceive and react to important events. CLASSY (Clustering, Linguistics, And Statistics for Summarization Yield) is an automatic, extract-generating summarization system that uses linguistic trimming and statistical methods to generate generic or topic(/query)-driven summaries for single documents or clusters of documents. CLASSY has performed well in the Document Understanding Conference (DUC) evaluations and the Multi-lingual (Arabic/English) Summarization Evaluations (MSE). We present a description of CLASSY, follow this with experiments and results from the MSE evaluations, and conclude with a discussion of on-going work to improve the quality of the summaries, both English-only and multi-lingual, that CLASSY generates.

Proceedings ArticleDOI
15 Dec 2008
TL;DR: This work presents the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator, and shows that the algorithms gave significantly better ROUGE-1 recall measures.
Abstract: Multi-Document Summarization deals with computing a summary for a set of related articles such that it gives the user a general view of the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can break these documents down into different topics or events. However, to reduce the common information content, the sentences of the summary need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition is used to get the orthogonal representations of vectors; representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture-model weighted term domain. Thus, using LDA we find the different topics in the documents, and using SVD we find the sentences that best represent these topics. Finally, we present the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
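The SVD step can be illustrated in the spirit of LSA-style extraction: take one sentence per leading right singular vector of a term-sentence matrix, so the picks lie along (near-)orthogonal latent topics. This sketch omits the LDA weighting entirely, so it shows only the orthogonality idea, not the authors' full LDA+SVD pipeline:

```python
import numpy as np

def pick_orthogonal_sentences(term_sentence, k):
    # term_sentence: (n_terms, n_sentences) weighted matrix;
    # choose one sentence per leading right singular vector
    U, s, Vt = np.linalg.svd(term_sentence, full_matrices=False)
    chosen = []
    for row in Vt[:k]:  # row: loadings of every sentence on one latent topic
        order = np.argsort(-np.abs(row))
        pick = next(i for i in order if i not in chosen)  # skip already-picked
        chosen.append(int(pick))
    return chosen
```

Because the right singular vectors are mutually orthogonal, sentences dominating different vectors tend to carry distinct content, which is what drives the redundancy reduction described above.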

Journal ArticleDOI
TL;DR: An automatic patent summarization method for accurate knowledge abstraction and effective R&D knowledge management combining the concepts of key phrase recognition and significant information density is developed.
Abstract: Purpose – In an era of rapidly expanding digital content, the number of e‐documents and the amount of knowledge frequently overwhelm the R&D teams and often impede intellectual property management. The purpose of this paper is to develop an automatic patent summarization method for accurate knowledge abstraction and effective R&D knowledge management.Design/methodology/approach – This paper develops an integrated approach for automatic patent summary generation combining the concepts of key phrase recognition and significant information density. Significant information density is defined based on the domain‐specific key concepts/phrases, relevant phrases, title phrases, indicator phrases and topic sentences of a given patent document.Findings – The document compression ratio and the knowledge retention ratio are used to measure both quantitative and qualitative outcomes of the new summarization methodology. Both measurements indicate the significant benefits and superior results of the method.Research lim...

Proceedings ArticleDOI
24 Aug 2008
TL;DR: This paper introduces an ontology-based extractive method for summarization, based on mapping the text to concepts and representing the document and its sentences as graphs, and applies it to summarize biomedical literature.
Abstract: One of the main problems in research on automatic summarization is the inaccurate semantic interpretation of the source. Using specific domain knowledge can considerably alleviate the problem. In this paper, we introduce an ontology-based extractive method for summarization. It is based on mapping the text to concepts and representing the document and its sentences as graphs. We have applied our approach to summarize biomedical literature, taking advantage of free resources such as UMLS. Preliminary empirical results are presented and pending problems are identified.

01 Jan 2008
TL;DR: This paper addresses some of the design issues to improve the scalability and readability of the multi-document summarizer included in WebInEssence.
Abstract: In this paper, we present our recent work on the development of a scalable personalized web-based multi-document summarization and recommendation system: WebInEssence. WebInEssence is designed to help end users effectively search for useful information and automatically summarize selected documents based on the users’ personal profiles. We address some of the design issues to improve the scalability and readability of our multi-document summarizer included in WebInEssence. Some evaluation results with different configurations are also presented.

Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper proposes a novel graph-based sentence ranking algorithm, namely PNR2, for update summarization, inspired by the intuition that "a sentence receives a positive influence from the sentences that correlate to it in the same collection, whereas a sentence receives a negative influence from the sentences that correlate to it in a different (perhaps previously read) collection".
Abstract: Query-oriented update summarization is a recently emerging summarization task. It brings new challenges to sentence ranking algorithms, which are required not only to locate the important and query-relevant information, but also to capture the new information when document collections evolve. In this paper, we propose a novel graph-based sentence ranking algorithm, namely PNR2, for update summarization. Inspired by the intuition that "a sentence receives a positive influence from the sentences that correlate to it in the same collection, whereas a sentence receives a negative influence from the sentences that correlate to it in a different (perhaps previously read) collection", PNR2 models both the positive and the negative mutual reinforcement in the ranking process. Automatic evaluation on the DUC 2007 pilot task data set demonstrates the effectiveness of the algorithm.
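The positive/negative intuition can be sketched in a much-simplified form (this is not PNR2 itself, which couples both influences inside one iterative ranking): reinforce scores within the new collection by power iteration, then subtract a penalty proportional to similarity with the already-read collection. The `beta` weight and the averaging are illustrative assumptions:

```python
def update_rank(sim_new, sim_old, beta=0.5, iters=50):
    # sim_new[i][j]: similarity among sentences of the new collection
    # sim_old[i][k]: similarity of new sentence i to old-collection sentence k
    n = len(sim_new)
    s = [1.0 / n] * n
    for _ in range(iters):  # positive reinforcement within the new collection
        nxt = [sum(sim_new[i][j] * s[j] for j in range(n)) for i in range(n)]
        total = sum(nxt) or 1.0
        s = [v / total for v in nxt]
    # negative influence: penalize resemblance to already-read content
    penalty = [beta * (sum(row) / len(row)) if row else 0.0 for row in sim_old]
    return [s[i] - penalty[i] for i in range(n)]
```

A sentence that restates the earlier collection is pushed down even when it is central to the new one, which is the behavior an update summarizer needs.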

Proceedings ArticleDOI
30 Oct 2008
TL;DR: A novel scheme incorporates a topic network and representativeness-based sampling to achieve semantic and visual summarization and visualization of large-scale collections of Flickr images; experiments on large-scale image collections with diverse semantics have provided very positive results.
Abstract: In this paper, we develop a novel scheme that incorporates a topic network and representativeness-based sampling to achieve semantic and visual summarization and visualization of large-scale collections of Flickr images. First, a topic network is automatically generated for summarizing and visualizing large-scale collections of Flickr images at a semantic level, so that users can select more suitable keywords for more precise query formulation. Second, the diverse visual similarities between semantically-similar images are characterized more precisely by using a mixture-of-kernels, and a representativeness-based image sampling algorithm is developed to achieve similarity-based summarization and visualization of large amounts of images under the same topic, so that users can find particular images of interest more effectively. Our experiments on large-scale image collections with diverse semantics have provided very positive results.

Journal IssueDOI
TL;DR: The fractal summarization model is the first attempt to apply fractal theory to document summarization and significantly improves the divergence of information coverage of summary and the precision of summary.
Abstract: Many automatic text summarization models have been developed in the last decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take into account the human abstractor's behavior of sentence extraction and only consider the document as a sequence of sentences during the process of extraction of sentences as a summary. In general, a document exhibits a well-defined hierarchical structure that can be described as fractals—mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on the fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in the fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the divergence of information coverage of summary and the precision of summary. User evaluations have been conducted. Results have indicated that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents. © 2008 Wiley Periodicals, Inc.

Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper proposes a new language model to simultaneously cluster and summarize documents, yielding a better document clustering method with more meaningful interpretation and a better document summarization method that takes document context information into consideration.
Abstract: Document understanding techniques such as document clustering and multi-document summarization have been receiving much attention in recent years. Current document clustering methods usually represent documents as a term-document matrix and perform clustering algorithms on it. Although these clustering methods can group the documents satisfactorily, it is still hard for people to capture the meanings of the documents since there is no satisfactory interpretation for each document cluster. In this paper, we propose a new language model to simultaneously cluster and summarize the documents. By utilizing the mutual influence of document clustering and summarization, our method makes (1) a better document clustering method with more meaningful interpretation and (2) a better document summarization method taking the document context information into consideration.

Book ChapterDOI
27 Oct 2008
TL;DR: The hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences); for composing the summary, the most representative sentence is then selected from each cluster.
Abstract: The main problem for generating an extractive automatic text summary is to detect the most relevant information in the source document. Although some approaches claim to be domain- and language-independent, they rely on highly dependent knowledge such as key phrases, or on golden samples for machine-learning approaches. In this work, we propose a language- and domain-independent automatic text summarization approach based on sentence extraction using an unsupervised learning algorithm. Our hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences). Then, to compose the summary, the most representative sentence is selected from each cluster. Several experiments on the standard DUC-2002 collection show that the proposed method obtains more favorable results than other approaches.

Journal ArticleDOI
TL;DR: A new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration and the user evaluation carried out in the study indicated that the majority of subjects preferred the concept- based summaries generated using the system to the sentence-based summariesgenerated using traditional sentence extraction techniques.
Abstract: This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps — (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.

Proceedings ArticleDOI
28 Oct 2008
TL;DR: This research introduces a method to make extractions based on three factors, Readability, Cohesion, and Topic relation, to create a text summary.
Abstract: Currently, vast amounts of textual information exist in large repositories such as the Web. To process such a huge amount of information, automatic text summarization has been of great interest. Unlike many approaches that focus on sentence or paragraph extraction, in this research we introduce a method that makes extractions based on three factors: Readability, Cohesion, and Topic relation. We use Harmony Search-based sentence selection to build such a summary. Once the summary is created, it is evaluated using a fitness function based on those three factors. The evaluation of the algorithm on a test collection is also presented in the paper. Our results indicate that the summaries extracted by our proposed scheme have better precision and recall than the other approaches.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work introduces a simple yet robust algorithm to automatically extract keyphrases (KP) from a meeting which can then be used as a query in the MMR algorithm, and shows that the KP based system significantly outperforms both baseline and centroid based systems.
Abstract: Rooted in multi-document summarization, maximum marginal relevance (MMR) is a widely used algorithm for meeting summarization (MS). A major problem in extractive MS using MMR is finding a proper query: the centroid based query which is commonly used in the absence of a manually specified query, can not significantly outperform a simple baseline system. We introduce a simple yet robust algorithm to automatically extract keyphrases (KP) from a meeting which can then be used as a query in the MMR algorithm. We show that the KP based system significantly outperforms both baseline and centroid based systems. As human refined KPs show even better summarization performance, we outline how to integrate the KP approach into a graphical user interface allowing interactive summarization to match the user's needs in terms of summary length and topic focus.
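MMR itself has a simple greedy form: at each step pick the candidate maximizing λ·rel(s, q) − (1 − λ)·max redundancy against what is already selected. A minimal sketch follows; here the query relevance would come from the extracted keyphrases, and the λ value in the example is an illustrative choice:

```python
def mmr_select(query_sim, sent_sim, k, lam=0.5):
    # query_sim[i]: relevance of sentence i to the query (e.g., keyphrase overlap)
    # sent_sim[i][j]: similarity between sentences i and j
    selected, candidates = [], set(range(len(query_sim)))
    while candidates and len(selected) < k:
        def mmr(i):
            # redundancy = worst-case similarity to anything already chosen
            redundancy = max((sent_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a strong redundancy weight, a near-duplicate of an already-selected sentence loses to a less relevant but novel one, which is the diversity behavior MMR is used for.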

Journal ArticleDOI
01 Jan 2008
TL;DR: A combination of the algorithms, which achieves co-operation of the categorization and summarization mechanisms, is introduced in order to enhance text labeling through the personalized summaries that are constructed.
Abstract: In this manuscript we present the summarization and categorization subsystems of a complete mechanism that begins with web-page fetching and concludes with the presentation of the collected data to end users through a personalized portal. The system collects articles from major news portals and, following an algorithmic procedure, creates a more user-friendly and personalized "view" of the articles. Before presenting the information back to the end user, the core of our mechanism automatically categorizes the data and then extracts personalized summaries. We focus on the core of the mechanism and, more specifically, present the algorithms used for the summarization and the categorization of texts. The algorithms are not used only to produce isolated data targeted at a specific subsystem; rather, a combination of the algorithms, which achieves co-operation between the categorization and summarization mechanisms, is introduced in order to enhance text labeling through the personalized summaries that are constructed.

Proceedings Article
01 Jan 2008
TL;DR: Summaries produced using entity-driven rewrite have higher linguistic quality than a comparison non-extractive system, and some improvement is also seen in content selection over extractive summarization as measured by pyramid method evaluation.
Abstract: In this paper we explore the benefits from and shortcomings of entity-driven noun phrase rewriting for multi-document summarization of news. The approach leads to 20% to 50% different content in the summary in comparison to an extractive summary produced using the same underlying approach, showing the promise the technique has to offer. In addition, summaries produced using entity-driven rewrite have higher linguistic quality than a comparison non-extractive system. Some improvement is also seen in content selection over extractive summarization as measured by pyramid method evaluation.


Proceedings Article
01 Jan 2008
TL;DR: The results of a quantitative analysis on data from large-scale evaluations of multi-document summarization confirm that features measuring the cohesiveness of the input are highly correlated with eventual summary quality and that it is possible to use these as features to predict the difficulty of new, unseen, summarization inputs.
Abstract: Different summarization requirements could make the writing of a good summary more difficult, or easier. Summary length and the characteristics of the input are such constraints influencing the quality of a potential summary. In this paper we report the results of a quantitative analysis on data from large-scale evaluations of multi-document summarization, empirically confirming this hypothesis. We further show that features measuring the cohesiveness of the input are highly correlated with eventual summary quality and that it is possible to use these as features to predict the difficulty of new, unseen, summarization inputs.

Proceedings Article
01 Jan 2008
TL;DR: Experiments and evaluations on DUC04 data show that this cluster-adjacency based method to order sentences for multi-document summarization tasks gets better performance than other existing sentence ordering methods.
Abstract: In this paper, we propose a cluster-adjacency based method to order sentences for multi-document summarization tasks. Given a group of sentences to be organized into a summary, each sentence is mapped to a theme in the source documents by a semi-supervised classification method, and the adjacency of pairs of sentences is learned from the source documents based on the adjacency of the clusters they belong to. The ordering of the summary sentences can then be derived once the first sentence is determined. Experiments and evaluations on DUC04 data show that this method achieves better performance than other existing sentence ordering methods.

Proceedings ArticleDOI
31 Oct 2008
TL;DR: ROUGE-C applies the ROUGE method by replacing the reference summaries with the source documents as well as query-focused information (if any), and therefore enables a fully manual-independent way of evaluating multi-document summarization.
Abstract: This paper presents how to use ROUGE to evaluate summaries without human reference summaries. ROUGE is a widely used evaluation tool for multi-document summarization and has great advantages in the area of summarization evaluation. However, manual reference summaries written beforehand by assessors are indispensable for a ROUGE test, and there was still no research on ROUGE's ability to evaluate summaries without manual reference summaries. By considering a summary as a consensus speaker for the original input information, we discovered and developed ROUGE-C. ROUGE-C applies the ROUGE method by replacing the reference summaries with the source documents as well as query-focused information (if any), and therefore enables a fully manual-independent way of evaluating multi-document summarization. Experiments conducted on the 2001 to 2005 DUC data showed that, under appropriate conditions and with some acceptable loss of efficiency, ROUGE-C correlates well with methods that depend on reference summaries, including human judgments.
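The computation ROUGE-C reuses is ordinary ROUGE-n; for n = 1 it is just clipped unigram overlap divided by reference length. A minimal sketch of ROUGE-1 recall follows (the real ROUGE toolkit adds options such as stemming, stopword removal, and jackknifing that are omitted here):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    # clipped unigram overlap between candidate and reference,
    # divided by the number of reference unigrams
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values()) if ref else 0.0
```

Under the ROUGE-C idea, `reference` would be the source documents (plus any query text) rather than a human-written summary, leaving the scoring formula itself unchanged.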