
Showing papers on "Multi-document summarization" published in 2010


Journal ArticleDOI
TL;DR: A survey of extractive text summarization techniques is presented, covering methods that select important sentences, paragraphs, etc. from the source text and concatenate them into a shorter form that conveys the most important information of the original document.
Abstract: Text summarization is the task of condensing a source text into a shorter version that preserves its information content and overall meaning. It is very difficult for human beings to manually summarize large text documents. Text summarization methods can be classified into extractive and abstractive summarization. An extractive method selects important sentences, paragraphs, etc. from the original document and concatenates them into a shorter form; the importance of sentences is decided based on statistical and linguistic features. An abstractive method involves understanding the original text and re-telling it in fewer words: it uses linguistic methods to examine and interpret the text, and then finds new concepts and expressions to best describe it, generating a new, shorter text that conveys the most important information from the original document. In this paper, a survey of extractive text summarization techniques is presented.
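To make the extractive approach concrete, here is a minimal sketch (not any specific system from the survey): sentences are scored by the average document-level frequency of their words, and the top-scoring ones are concatenated in document order. The function name and the scoring choice are illustrative.

```python
from collections import Counter
import re

def extractive_summary(text, num_sentences=2):
    """Minimal extractive summarizer: score each sentence by the
    average document-level frequency of its words, then return the
    top-scoring sentences in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return ' '.join(sentences[i] for i in sorted(ranked[:num_sentences]))
```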

559 citations


Proceedings Article
23 Aug 2010
TL;DR: A novel graph-based summarization framework (Opinosis) generates concise abstractive summaries of highly redundant opinions; its summaries show better agreement with human summaries than a baseline extractive method.
Abstract: We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.
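A toy sketch of the word-graph intuition behind this style of abstractive summarization, under simplifying assumptions: nodes are surface words, edges count adjacency across redundant sentences, and heavily traversed paths yield candidate summary phrases. The real Opinosis graph attaches POS tags to nodes and validates paths, which is omitted here.

```python
from collections import defaultdict

def build_word_graph(sentences):
    """Directed word-adjacency graph: edge weights count how often one
    word directly follows another across the input sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    starts = defaultdict(int)
    for s in sentences:
        tokens = s.lower().split()
        if tokens:
            starts[tokens[0]] += 1
        for a, b in zip(tokens, tokens[1:]):
            graph[a][b] += 1
    return graph, starts

def greedy_path(graph, start, max_len=8):
    """Follow the heaviest outgoing edge from a start word; heavily
    traversed paths correspond to highly redundant phrasings."""
    path, node = [start], start
    for _ in range(max_len - 1):
        if not graph[node]:
            break
        node = max(graph[node], key=graph[node].get)
        if node in path:  # avoid cycles
            break
        path.append(node)
    return ' '.join(path)

reviews = ["the battery life is very good",
           "the battery life is great",
           "very good battery life overall"]
graph, starts = build_word_graph(reviews)
print(greedy_path(graph, max(starts, key=starts.get)))
# -> "the battery life is very good"
```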

500 citations


Proceedings Article
02 Jun 2010
TL;DR: It is shown, both theoretically and empirically, that a modified greedy algorithm can efficiently solve the budgeted submodular maximization problem near-optimally, and new approximation bounds are derived in doing so.
Abstract: We treat the text summarization problem as maximizing a submodular function under a budget constraint. We show, both theoretically and empirically, that a modified greedy algorithm can efficiently solve the budgeted submodular maximization problem near-optimally, and we derive new approximation bounds in doing so. Experiments on the DUC'04 task show that our approach is superior to the best-performing method from the DUC'04 evaluation on ROUGE-1 scores.
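The modified greedy selection rule described in the abstract can be sketched as follows: at each step, pick the sentence with the highest marginal gain per (scaled) cost that still fits the budget. The word-coverage objective and the scaling exponent r below are illustrative stand-ins for the paper's submodular objective.

```python
def budgeted_greedy(items, costs, marginal_gain, budget, r=1.0):
    """Cost-scaled greedy for budgeted submodular maximization: at each
    step add the item maximizing marginal_gain / cost**r among those
    that still fit in the budget. (The full algorithm also compares the
    result against the best single item within the budget.)"""
    selected, spent = [], 0
    remaining = set(range(len(items)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue
            ratio = marginal_gain(selected, i) / (costs[i] ** r)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected

sentences = ["the cat sat on the mat",
             "a dog chased the cat",
             "dogs and cats make good pets"]
costs = [len(s.split()) for s in sentences]  # cost = sentence length

def coverage_gain(selected, i):
    """Submodular word-coverage objective: the gain of sentence i is
    the number of words it covers that selected sentences do not."""
    covered = {w for j in selected for w in sentences[j].split()}
    return len(set(sentences[i].split()) - covered)

print(budgeted_greedy(sentences, costs, coverage_gain, budget=12))
```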

432 citations


Proceedings ArticleDOI
13 Oct 2010
TL;DR: The paper studies a solution that mediates between the two approaches: short and accurate textual descriptions that characterize software entities without requiring developers to read the implementation details.
Abstract: During maintenance, developers cannot read the entire code of large systems. They need a way to get a quick understanding of source code entities (such as classes, methods, packages, etc.) so they can efficiently identify and then focus on the ones related to the task at hand. Sometimes reading just a method header or a class name does not tell enough about its purpose and meaning, while reading the entire implementation takes too long. We study a solution that mediates between the two approaches: short and accurate textual descriptions that characterize software entities without requiring developers to read the implementation details. We create such descriptions using techniques from automatic text summarization. The paper presents a study that investigates the suitability of various such techniques for generating source code summaries. The results indicate that a combination of text summarization techniques is most appropriate for source code summarization and that developers generally agree with the summaries produced.

356 citations


Proceedings Article
11 Jul 2010
TL;DR: This paper formulates extractive summarization as a two-step learning problem: a generative model for pattern discovery and a regression model for inference based on the lexical and structural characteristics of the sentences.
Abstract: Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem, building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances the current state of the art, improving ROUGE scores by ~7%. Generated summaries are less redundant and more coherent according to manual quality evaluations.
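The second (regression) step can be sketched generically: given feature vectors for sentences and target scores from the generative model, fit a regressor and use it to score sentences of unseen documents. The least-squares fit below is a stand-in; the paper's exact regression model and features are not reproduced.

```python
import numpy as np

def fit_sentence_scorer(train_features, train_scores):
    """Least-squares regression from sentence feature vectors (e.g.
    lexical and structural features) to target scores (e.g. scores
    produced by a generative topic model); returns a weight vector."""
    x = np.hstack([train_features, np.ones((train_features.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x, train_scores, rcond=None)
    return w

def score_new_sentences(features, w):
    """Apply the trained regression weights to unseen sentences."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return x @ w

# Toy usage: four training sentences with two features each.
train_x = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]])
train_y = np.array([0.8, 0.3, 0.5, 0.7])
w = fit_sentence_scorer(train_x, train_y)
print(score_new_sentences(np.array([[0.6, 0.4]]), w))
```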

156 citations


Proceedings Article
24 Sep 2010
TL;DR: The results establish the usefulness of discourse features and show that lexical overlap provides a simple and cheap alternative to discourse for computing text structure, with comparable performance for the task of content selection.
Abstract: We present analyses aimed at eliciting which specific aspects of discourse provide the strongest indication for text importance. In the context of content selection for single document summarization of news, we examine the benefits of both the graph structure of text provided by discourse relations and the semantic sense of these relations. We find that structure information is the most robust indicator of importance. Semantic sense only provides constraints on content selection but is not indicative of important content by itself. However, sense features complement structure information and lead to improved performance. Further, both types of discourse information prove complementary to non-discourse features. While our results establish the usefulness of discourse features, we also find that lexical overlap provides a simple and cheap alternative to discourse for computing text structure with comparable performance for the task of content selection.

155 citations


Proceedings Article
23 Aug 2010
TL;DR: It is shown that four well-known summarization tasks including generic, query-focused, update, and comparative summarization can be modeled as different variations derived from the proposed framework.
Abstract: Multi-document summarization has been an important problem in information retrieval. It aims to distill the most important information from a set of documents to generate a compressed summary. Given a sentence graph generated from a set of documents where vertices represent sentences and edges indicate that the corresponding vertices are similar, the extracted summary can be described using the idea of graph domination. In this paper, we propose a new principled and versatile framework for multi-document summarization using the minimum dominating set. We show that four well-known summarization tasks including generic, query-focused, update, and comparative summarization can be modeled as different variations derived from the proposed framework. Approximation algorithms for performing summarization are also proposed and empirical experiments are conducted to demonstrate the effectiveness of our proposed framework.
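A minimal sketch of the dominating-set view of summarization: build a sentence similarity graph, then apply the standard greedy approximation, repeatedly choosing the sentence that dominates the most not-yet-dominated sentences. The Jaccard similarity and the threshold are illustrative choices, not the paper's exact settings.

```python
from itertools import combinations

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dominating_set_summary(sentences, threshold=0.2):
    """Greedy approximation of a minimum dominating set on the sentence
    similarity graph: each chosen sentence dominates itself and every
    sentence whose similarity to it reaches the threshold."""
    n = len(sentences)
    adj = {i: {i} for i in range(n)}
    for i, j in combinations(range(n), 2):
        if jaccard(sentences[i], sentences[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    undominated, summary = set(range(n)), []
    while undominated:
        # pick the vertex covering the most not-yet-dominated vertices
        best = max(range(n), key=lambda v: len(adj[v] & undominated))
        summary.append(sentences[best])
        undominated -= adj[best]
    return summary
```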

150 citations


Proceedings Article
11 Jul 2010
TL;DR: This work presents the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text, and trains and tests linguistic quality models on consecutive years of NIST evaluation data to show the generality of results.
Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.

105 citations


Proceedings Article
23 Aug 2010
TL;DR: Comparisons show how this methodology excels at the task of single-paper summarization, and how it outperforms other multi-document summarization methods.
Abstract: This paper presents an approach to summarizing a single scientific paper by extracting its contributions from the set of citation sentences written in other papers. Our methodology is based on extracting significant keyphrases from the set of citation sentences and using these keyphrases to build the summary. Comparisons show how this methodology excels at the task of single-paper summarization, and how it outperforms other multi-document summarization methods.

103 citations


Proceedings Article
11 Jul 2010
TL;DR: This paper proposes to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process: English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary.
Abstract: Cross-language document summarization is the task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, with the result that the quality of the cross-language summary is usually very poor in both readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.
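The core selection idea, combining each English sentence's predicted translation quality with its informativeness before choosing what to translate, can be sketched as below. The linear interpolation and its weight alpha are assumptions; the paper predicts quality with SVM regression, replaced here by precomputed score lists.

```python
def select_for_cross_language_summary(sentences, informativeness,
                                      translation_quality, k=3, alpha=0.5):
    """Rank English sentences by interpolating informativeness with
    predicted translation quality, keeping the top k in document order;
    only these would then be machine-translated into Chinese."""
    def combined(i):
        return (alpha * informativeness[i]
                + (1 - alpha) * translation_quality[i])

    ranked = sorted(range(len(sentences)), key=combined, reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```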

98 citations


Journal ArticleDOI
TL;DR: This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues.
Abstract: Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.
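A sketch of the expansion idea under common graph-ranking assumptions: pool the target document's sentences with those of its nearest neighbor documents, run a PageRank-style iteration over the cosine-similarity sentence graph, and keep only the target document's sentences. The damping factor and similarity measure follow standard practice, not necessarily the paper's exact configuration.

```python
from collections import Counter
import math

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_with_neighbors(target_sents, neighbor_sents, d=0.85, iters=50):
    """PageRank-style power iteration over the sentence graph of the
    target document expanded with neighbor documents' sentences; the
    cross-document edges contribute to target-sentence scores."""
    sents = target_sents + neighbor_sents
    n = len(sents)
    w = [[cosine(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) or 1.0 for row in w]  # guard against dangling nodes
    score = [1.0 / n] * n
    for _ in range(iters):
        score = [(1 - d) / n +
                 d * sum(w[j][i] / out[j] * score[j] for j in range(n))
                 for i in range(n)]
    # return target-document sentence indices, best first
    return sorted(range(len(target_sents)),
                  key=lambda i: score[i], reverse=True)
```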

Proceedings Article
23 Aug 2010
TL;DR: This work applies a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions in text summarization tasks, including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish.
Abstract: We study correlation of rankings of text summarization systems using evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as coverage, Responsiveness, Pyramids and Rouge studying their associations in various text summarization tasks including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions.
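Fresa-style evaluation compares a summary's word distribution against its source's. A common divergence for this purpose is Jensen-Shannon, sketched below; the add-one smoothing is our simplification, not necessarily what Fresa implements.

```python
from collections import Counter
import math

def js_divergence(source_text, summary_text):
    """Jensen-Shannon divergence between the unigram distributions of a
    source and a summary; lower values mean the summary's word
    distribution is closer to the source's."""
    src = Counter(source_text.lower().split())
    summ = Counter(summary_text.lower().split())
    vocab = set(src) | set(summ)

    def dist(counts):
        total = sum(counts.values()) + len(vocab)  # add-one smoothing
        return {w: (counts[w] + 1) / total for w in vocab}

    p, q = dist(src), dist(summ)
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in vocab)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```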

Proceedings Article
06 Jun 2010
TL;DR: Evidence is provided that intrinsic evaluation of summaries using Amazon's Mechanical Turk is quite difficult: non-expert judges are not able to recover system rankings derived from experts.
Abstract: We provide evidence that intrinsic evaluation of summaries using Amazon's Mechanical Turk is quite difficult. Experiments mirroring evaluation at the Text Analysis Conference's summarization track show that non-expert judges are not able to recover system rankings derived from experts.

Journal ArticleDOI
TL;DR: A novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation and develops an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking), which outperforms previous graph-based models in both generic and query-oriented summarization tasks.
Abstract: In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the difference among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In such a way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.

Journal ArticleDOI
TL;DR: The average continuity, an automatic evaluation measure of sentence ordering in a summary, is introduced and its appropriateness for this task is investigated.
Abstract: Ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. However, improper ordering of information in a summary can confuse the reader and deteriorate the readability of the summary. Therefore, it is vital to properly order the information in multi-document summarization. We present a bottom-up approach to arrange sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are integrated into a criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. We evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. We introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task.
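A sketch of the average continuity measure as commonly formulated: take the geometric mean, over n = 2..max_n, of the precision of length-n contiguous sentence runs in the predicted ordering that also appear contiguously and in order in the reference, with a small smoothing constant. The exact n range and smoothing value here are assumptions; consult the article for the published definition.

```python
import math

def average_continuity(predicted, reference, max_n=5, alpha=0.01):
    """Average continuity for sentence ordering: geometric mean of the
    precisions P_n of length-n contiguous runs of the predicted order
    that also appear contiguously (and in order) in the reference.
    Assumes both orderings contain the same sentences."""
    ref_pos = {s: i for i, s in enumerate(reference)}
    log_sum, count = 0.0, 0
    for n in range(2, max_n + 1):
        windows = [predicted[i:i + n]
                   for i in range(len(predicted) - n + 1)]
        if not windows:
            continue
        m = sum(1 for w in windows
                if all(ref_pos[w[k + 1]] == ref_pos[w[k]] + 1
                       for k in range(n - 1)))
        log_sum += math.log(m / len(windows) + alpha)
        count += 1
    return math.exp(log_sum / count) if count else 0.0
```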

Journal ArticleDOI
31 Dec 2010
TL;DR: A new content-based method for evaluating text summarization systems without human models is studied; it is used to produce system rankings and computes a variety of divergences among probability distributions.
Abstract: We study a new content-based method for the evaluation of text summarization systems without human models which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.

Proceedings Article
23 Aug 2010
TL;DR: Two new LSA-based summarization algorithms are proposed, and their performance is compared with existing LSA-based algorithms using ROUGE-L scores.
Abstract: Text summarization addresses the problem of extracting important information from huge amounts of text data. There are various methods in the literature that aim to produce well-formed summaries. One of the most commonly used methods is Latent Semantic Analysis (LSA). In this paper, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish documents, and their performances are compared using their ROUGE-L scores. One of our algorithms produces the best scores.
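The generic LSA pipeline the paper builds on can be sketched as follows: build a term-by-sentence matrix, take its SVD, and select for each leading latent topic the sentence with the largest weight in the corresponding row of V^T (the classic Gong and Liu-style selection). The paper's two new algorithms differ in the selection step and are not reproduced here.

```python
import numpy as np

def lsa_summary(sentences, k=2):
    """Gong & Liu-style LSA selection: SVD of the term-by-sentence
    count matrix; for each of the k strongest latent topics, pick the
    sentence with the largest weight in that row of V^T."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    a = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            a[index[w], j] += 1
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    chosen = []
    for topic in range(min(k, vt.shape[0])):
        j = int(np.argmax(vt[topic]))
        if j not in chosen:
            chosen.append(j)
    return [sentences[j] for j in sorted(chosen)]
```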

Proceedings ArticleDOI
26 Oct 2010
TL;DR: A new summarization method based on an incremental hierarchical clustering framework updates summaries as soon as a new document arrives; extensive experiments demonstrate its effectiveness and efficiency.
Abstract: Document summarization has become a hot topic in recent years. However, most existing summarization methods work on a batch of documents and do not consider that documents may arrive in a sequence and the corresponding summaries need to be updated in real time. In this paper, we propose a new summarization method based on an incremental hierarchical clustering framework to update summaries as soon as a new document arrives. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed method.

Proceedings ArticleDOI
19 Nov 2010
TL;DR: Innovative unsupervised methods for automatic sentence extraction using graph-based ranking algorithms and a shortest-path algorithm are presented.
Abstract: A summary is a brief and accurate representation of an input text such that the output covers the most important concepts of the source in a condensed manner. Text summarization is an emerging technique for understanding the main purpose of any kind of document. To visualize a large text document within a short duration and a small visible area like a PDA screen, summarization provides greater flexibility and convenience. This paper presents innovative unsupervised methods for automatic sentence extraction using graph-based ranking algorithms and a shortest-path algorithm.

Journal ArticleDOI
01 Nov 2010
TL;DR: The experimental results on open benchmark data sets from DUC2005 and DUC2007 show that the proposed generic multi-document summarization method significantly outperforms the baseline methods for multi-document summarization.
Abstract: Multi-document summarization is a process of automatic creation of a compressed version of a given collection of documents that provides useful information to users. In this article we propose a generic multi-document summarization method based on sentence clustering. We introduce five clustering methods, which optimize various aspects of intra-cluster similarity, inter-cluster dissimilarity and their combinations. To solve the clustering problem, a modification of the discrete particle swarm optimization algorithm is proposed. The experimental results on open benchmark data sets from DUC2005 and DUC2007 show that our method significantly outperforms the baseline methods for multi-document summarization.

Proceedings Article
23 Aug 2010
TL;DR: The prototype Related Work Summarization system, ReWoS, takes in a set of keywords, arranged in a hierarchical fashion, that describes a target paper's topics, and drives the creation of an extractive summary using two different strategies for locating appropriate sentences for general topics as well as detailed ones.
Abstract: We introduce the novel problem of automatic related work summarization. Given multiple articles (e.g., conference/journal papers) as input, a related work summarization system creates a topic-biased summary of related work specific to the target paper. Our prototype Related Work Summarization system, ReWoS, takes in a set of keywords, arranged in a hierarchical fashion, that describes a target paper's topics, to drive the creation of an extractive summary using two different strategies for locating appropriate sentences for general topics as well as detailed ones. Our initial results show an improvement over generic multi-document summarization baselines in a human evaluation.

Proceedings Article
23 Aug 2010
TL;DR: An extractive summarization model is proposed to provide an evaluation framework for position information, and results show that word position information is more effective and adaptive than sentence position information.
Abstract: Position information has proved very effective in document summarization, especially in generic summarization. Existing approaches mostly consider the information of sentence positions in a document, based on a sentence position hypothesis that the importance of a sentence decreases with its distance from the beginning of the document. In this paper, we consider another kind of position information, i.e., word position information, which is based on the ordinal positions of word appearances instead of sentence positions. An extractive summarization model is proposed to provide an evaluation framework for the position information. The resulting systems are evaluated on various data sets to demonstrate the effectiveness of the position information in different summarization tasks. Experimental results show that word position information is more effective and adaptive than sentence position information.
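The word-position hypothesis can be illustrated with a small sketch: weight each word by a function that decays with the ordinal position at which the word first appears, and score a sentence by the mean weight of its words. The 1/ordinal decay is an illustrative choice, not the paper's exact model.

```python
def word_position_scores(document_sentences):
    """Score sentences by word position: each word's weight decays with
    the ordinal position of its first appearance in the document (an
    illustrative 1/ordinal decay); a sentence's score is the mean
    weight of its words."""
    weight, ordinal = {}, 0
    for s in document_sentences:
        for w in s.lower().split():
            ordinal += 1
            if w not in weight:
                weight[w] = 1.0 / ordinal

    def score(sentence):
        tokens = sentence.lower().split()
        return sum(weight[t] for t in tokens) / (len(tokens) or 1)

    return [score(s) for s in document_sentences]
```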

Proceedings ArticleDOI
09 Jan 2010
TL;DR: This paper proposes a knowledge discovery approach for the Web that provides an overview of the information on a Website using an integration of summarization and visualization techniques, capable of reducing the time required to identify and search for information or knowledge on the Web.
Abstract: The number of Web sites has noticeably increased to roughly 225 million in the last ten years. This means there is a rapid growth of knowledge and information on the Internet. Although search engines can help users filter their desired information based on keywords, the search result is normally presented in the form of a list, and users have to visit each Web page in order to determine the appropriateness of the result. A considerable amount of time therefore has to be spent on finding the required information. To address this issue, this paper proposes a knowledge discovery approach for the Web that provides an overview of the information on a Website using an integration of summarization and visualization techniques. This includes text summarization, a tag cloud, a Document Type View, and interactive features such as drill-down and thumbnails. This approach is capable of reducing the time required to identify and search for information or knowledge on the Web.

Journal ArticleDOI
TL;DR: This paper reframes the extractive summarization task using a regression scheme instead of binary classification, evaluates the approaches using the ICSI meeting corpus on both human transcripts and speech recognition output, and shows performance improvements using different sampling methods and regression models.

Proceedings ArticleDOI
02 Apr 2010
TL;DR: This paper considers document summarization as a multi-objective optimization problem involving four objective functions, namely information coverage, significance, redundancy and text coherence, and chooses the DUC 2005 and 2006 query-oriented summarization tasks to examine the proposed model.
Abstract: In this paper, we consider document summarization as a multi-objective optimization problem involving four objective functions, namely information coverage, significance, redundancy and text coherence. These functions measure the possible summaries based on the identified core terms and main topics (i.e. clusters of semantically or statistically related core terms). We choose the DUC 2005 and 2006 query-oriented summarization tasks to examine the proposed model. The encouraging results indicate that the multi-objective optimization based framework for document summarization is truly a promising research direction.

Journal ArticleDOI
TL;DR: This article presents eight different methods of generating multidocument summaries and evaluates each of these methods on a large set of topics used in past DUC workshops, showing a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
Abstract: The problem of using topic representations for multidocument summarization (MDS) has received considerable attention recently. Several topic representations have been employed for producing informative and coherent summaries. In this article, we describe five previously known topic representations and introduce two novel representations of topics based on topic themes. We present eight different methods of generating multidocument summaries and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.

Proceedings Article
16 Jul 2010
TL;DR: This work evaluates deep content selection methods for multidocument summarization based on the CST model (Cross-document Structure Theory) and shows that the use of the CST model helps to improve the informativeness and quality of automatic summaries.
Abstract: With the huge and growing amount of information on the web and the little time available to read and process all this information, automatic summaries have become very important resources. In this work, we evaluate deep content selection methods for multidocument summarization based on the CST model (Cross-document Structure Theory). Our methods consider summarization preferences and focus on the main overall problems of multidocument treatment: redundancy, complementarity, and contradiction among different information sources. We also evaluate the impact of the CST model on superficial summarization systems. Our results show that the use of the CST model helps to improve the informativeness and quality of automatic summaries.

Proceedings ArticleDOI
19 Jul 2010
TL;DR: Evaluation results on the collection of press releases by Miami-Dade County Department of Emergency Management during Hurricane Wilma in 2005 demonstrate the efficacy of Ontology-enriched Multi-Document Summarization.
Abstract: In this poster, we propose a novel document summarization approach named Ontology-enriched Multi-Document Summarization (OMS) for utilizing background knowledge to improve summarization results. OMS first maps the sentences of input documents onto an ontology, then links the given query to a specific node in the ontology, and finally extracts the summary from the sentences in the subtree rooted at the query node. By using the domain-related ontology, OMS can better capture the semantic relevance between the query and the sentences, and thus lead to better summarization results. As a byproduct, the final summary generated by OMS can be represented as a tree showing the hierarchical relationships of the extracted sentences. Evaluation results on the collection of press releases by the Miami-Dade County Department of Emergency Management during Hurricane Wilma in 2005 demonstrate the efficacy of OMS.

Proceedings Article
09 Oct 2010
TL;DR: This paper proposes an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run, and proposes a discriminative training algorithm which directly maximises the quality of the best summary.
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
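A sketch of the A* formulation for extractive selection: states fix an include/skip decision per sentence, g is the score of the chosen sentences, and the heuristic optimistically fills the remaining budget at the best score-per-word among unseen sentences, which keeps it admissible. Sentence scores are assumed precomputed; the paper's exact scoring model and heuristic are not reproduced.

```python
import heapq

def astar_summary(sentences, scores, budget):
    """A* search over include/skip decisions per sentence, maximizing
    total score under a word-length budget. h is an optimistic bound:
    remaining budget times the best score-per-word among unseen
    sentences, so the first goal state popped is optimal."""
    lengths = [len(s.split()) for s in sentences]
    n = len(sentences)

    def h(i, remaining):
        densities = [scores[j] / lengths[j] for j in range(i, n)
                     if lengths[j]]
        return remaining * max(densities, default=0.0)

    # priority queue of (-(g + h), g, next index, used length, chosen)
    heap = [(-h(0, budget), 0.0, 0, 0, ())]
    while heap:
        _, g, i, used, chosen = heapq.heappop(heap)
        if i == n:  # all decisions made: optimal summary found
            return [sentences[j] for j in chosen]
        # option 1: skip sentence i
        heapq.heappush(heap, (-(g + h(i + 1, budget - used)),
                              g, i + 1, used, chosen))
        # option 2: include sentence i if it fits the budget
        if used + lengths[i] <= budget:
            g2 = g + scores[i]
            heapq.heappush(heap,
                           (-(g2 + h(i + 1, budget - used - lengths[i])),
                            g2, i + 1, used + lengths[i], chosen + (i,)))
    return []

sents = ["summarization is useful", "dogs bark",
         "useful systems compress text"]
print(astar_summary(sents, scores=[2.0, 0.5, 1.8], budget=7))
```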

Proceedings Article
23 Aug 2010
TL;DR: The development of an opinion summarization system that works on a Bengali news corpus is described, along with the building of an annotated gold-standard corpus and the acquisition of linguistic tools for lexico-syntactic, syntactic and discourse-level feature extraction.
Abstract: In this paper, the development of an opinion summarization system that works on a Bengali news corpus is described. The system identifies the sentiment information in each document, aggregates it, and presents the summary information as text. The present system follows a topic-sentiment model: sentiment identification is designed as discourse-level theme identification, and topic-sentiment aggregation is achieved by theme clustering (k-means) and a Document-Level Theme Relational Graph representation. The Document-Level Theme Relational Graph is finally used for candidate summary sentence selection by the standard PageRank algorithms used in Information Retrieval (IR). As Bengali is a resource-constrained language, the building of an annotated gold-standard corpus and the acquisition of linguistic tools for lexico-syntactic, syntactic and discourse-level feature extraction are described in this paper. The reported accuracy of the theme detection technique is 83.60% precision, 76.44% recall and 79.85% F-measure. The summarization system has been evaluated at 72.15% precision, 67.32% recall and 69.65% F-measure.