
Showing papers on "Multi-document summarization published in 2005"


Proceedings Article
01 Oct 2005
TL;DR: This paper presents a language-independent algorithm for both single- and multi-document summarization.
Abstract: This paper discusses a language independent algorithm for single and multiple document summarization.

209 citations


Proceedings ArticleDOI
02 Oct 2005
TL;DR: An improved method for feature extraction, drawing on an existing unsupervised method, is introduced; it turns the task of feature extraction into one of term similarity by mapping crude (learned) features into a user-defined taxonomy of the entity's features.
Abstract: Capturing knowledge from free-form evaluative texts about an entity is a challenging task. New techniques of feature extraction, polarity determination and strength evaluation have been proposed. Feature extraction is particularly important to the task as it provides the underpinnings of the extracted knowledge. The work in this paper introduces an improved method for feature extraction that draws on an existing unsupervised method. By including user-specific prior knowledge of the evaluated entity, we turn the task of feature extraction into one of term similarity by mapping crude (learned) features into a user-defined taxonomy of the entity's features. Results show promise both in terms of the accuracy of the mapping as well as the reduction in the semantic redundancy of crude features.

209 citations


Journal ArticleDOI
TL;DR: The paper thoroughly discusses promising paths for future research in medical document summarization, including scaling to large collections of documents in various languages and from different media, personalization, portability to new sub-domains, and the integration of summarization technology in practical applications.

201 citations


Proceedings ArticleDOI
15 Aug 2005
TL;DR: This paper presents eight different methods of generating MDS and evaluates each of them on a large set of topics used in past DUC workshops, showing a significant improvement in the quality of summaries based on topic themes over MDS methods that use alternative topic representations.
Abstract: The problem of using topic representations for multi-document summarization (MDS) has received considerable attention recently. In this paper, we describe five different topic representations and introduce a novel representation of topics based on topic themes. We present eight different methods of generating MDS and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.

174 citations


Proceedings ArticleDOI
04 Sep 2005
TL;DR: It is shown that a summarization system using a combination of lexical, prosodic, structural and discourse features produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.
Abstract: We present results of an empirical study of the usefulness of different types of features in selecting extractive summaries of news broadcasts for our Broadcast News Summarization System. We evaluate lexical, prosodic, structural and discourse features as predictors of those news segments which should be included in a summary. We show that a summarization system that uses a combination of these feature sets produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.

169 citations


Proceedings ArticleDOI
Ani Nenkova1
09 Jul 2005
TL;DR: An overview of the results achieved in the different types of summarization tasks, comparing both the broader classes of baselines, systems and humans, and individual pairs of summarizers (both human and automatic).
Abstract: Since 2001, the Document Understanding Conferences have been the forum for researchers in automatic text summarization to compare methods and results on common test sets. Over the years, several types of summarization tasks have been addressed--single document summarization, multi-document summarization, summarization focused by question, and headline generation. This paper is an overview of the achieved results in the different types of summarization tasks. We compare both the broader classes of baselines, systems and humans, as well as individual pairs of summarizers (both human and automatic). An analysis of variance model is fitted, with summarizer and input set as independent variables, and the coverage score as the dependent variable, and simulation-based multiple comparisons are performed. The results document the progress in the field as a whole, rather than focusing on a single system, and thus can serve as a future reference on the work done to date, as well as a starting point in the formulation of future tasks. Results also indicate that most progress in the field has been achieved in generic multi-document summarization and that the most challenging task is that of producing a focused summary in answer to a question/topic.

167 citations


Journal ArticleDOI
TL;DR: A news delivery and summarization system, acting as a user's agent, gathers and recaps news items based on specifications and interests.
Abstract: A news delivery and summarization system, acting as a user's agent, gathers and recaps news items based on specifications and interests.

135 citations


Journal ArticleDOI
27 Nov 2005
TL;DR: This paper formulates the problem of summarizing a dataset of transactions with categorical attributes as an optimization problem involving two objective functions, compaction gain and information loss, and proposes metrics to characterize the output of any summarization algorithm.
Abstract: In this paper, we formulate the problem of summarization of a dataset of transactions with categorical attributes as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation of clustering and the second approach makes use of frequent item sets from the association analysis domain. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998 DARPA Off-line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance program.
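The compaction-gain/information-loss trade-off described in the abstract can be illustrated with a toy sketch. This is an illustrative simplification, not the paper's exact metrics or algorithms: a set of categorical transactions is covered by a single generalized pattern, and we measure how much the data shrinks versus how many attribute values the pattern fails to preserve.

```python
# Toy model of transaction summarization: cover all transactions with one
# generalized pattern and measure the two competing objectives.

def summarize(transactions):
    """Return a covering pattern plus compaction gain and information loss.

    A position keeps its value if all transactions agree on it; otherwise
    it is generalized to "*", and every generalized value in every covered
    transaction counts as one unit of information loss.
    """
    pattern = tuple(
        vals[0] if len(set(vals)) == 1 else "*"
        for vals in zip(*transactions)
    )
    compaction_gain = len(transactions) / 1  # input size / summary size (one pattern)
    information_loss = sum(1 for _ in transactions for v in pattern if v == "*")
    return pattern, compaction_gain, information_loss

txns = [("tcp", "http", "allow"),
        ("tcp", "http", "deny"),
        ("tcp", "ftp", "allow")]
pattern, gain, loss = summarize(txns)
# The pattern keeps the shared protocol field and generalizes the rest.
```

Here the single pattern compacts three transactions into one record at the cost of the generalized service and action fields; the paper's algorithms search for covers that balance these two objectives over many patterns.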

117 citations


Patent
04 Mar 2005
TL;DR: In this paper, a system for generating a summary of a plurality of documents and presenting the summary information to a user is provided, which includes a computer-readable document collection containing a plurality of related documents stored in electronic form.
Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple document clusters. A summarizer router is employed to determine a relationship of the documents in a cluster and select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event. A dissimilarity engine for multiple document summary generation is provided which generates summaries of document clusters having documents with varying degrees of relatedness. A user interface is provided to display categories, cluster titles, summaries, and related images.

91 citations


Journal ArticleDOI
TL;DR: The research shows that customization is feasible in a medical digital library and employs a unified user model to create a tailored summary of relevant documents for either a physician or lay person.

83 citations



Proceedings Article
26 Jan 2005
TL;DR: The authors use sentence classification and ideas from information retrieval to generate multi-document biographies, placing among the top performers in task 5 (short summaries focused by person questions) at DUC 2004.
Abstract: In this paper we describe a biography summarization system using sentence classification and ideas from information retrieval. Although the individual techniques are not new, assembling and applying them to generate multi-document biographies is new. Our system was evaluated in DUC 2004 and is among the top performers in task 5, short summaries focused by person questions.

Proceedings ArticleDOI
19 Sep 2005
TL;DR: A text summarization method is proposed that creates a summary by defining a relevance score for each sentence and extracting sentences from the original documents, with feature weights determined using genetic algorithms.
Abstract: In this paper, we propose a text summarization method that creates a summary by defining a relevance score for each sentence and extracting sentences from the original documents. The method takes into account the weight of each sentence in the document. Its essence lies in first identifying every sentence in the document with a characteristic vector of words appearing in the document, and then calculating a relevance score for each sentence. The relevance score of a sentence is determined by comparing it, using the cosine measure, with all the other sentences in the document and with the document title. Before the method is applied, a set of features is defined, and the weight of each word in a sentence is calculated with respect to those features. The feature weights, which influence the relevance of words, are determined using genetic algorithms.
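A minimal sketch of the relevance-scoring idea the abstract describes (plain term-frequency vectors stand in for the feature-weighted word vectors, and the genetic-algorithm weight tuning is omitted): each sentence's score is its average cosine similarity to the other sentences plus its similarity to the title.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevance_scores(sentences, title):
    """Score each sentence by its average similarity to the other
    sentences plus its similarity to the document title."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    tvec = Counter(title.lower().split())
    scores = []
    for i, v in enumerate(vecs):
        others = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        avg_sim = sum(others) / len(others) if others else 0.0
        scores.append(avg_sim + cosine(v, tvec))
    return scores
```

Sentences with the highest scores would then be extracted as the summary.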

Journal ArticleDOI
TL;DR: A novel approach to multi-document summarization, which explicitly addresses the problem of detecting, and retaining for the summary, multiple themes in document collections, and applies Iterative Residual Rescaling (IRR).
Abstract: This paper describes a novel approach to multi-document summarization, which explicitly addresses the problem of detecting, and retaining for the summary, multiple themes in document collections. We place equal emphasis on the processes of theme identification and theme presentation. For the former, we apply Iterative Residual Rescaling (IRR); for the latter, we argue for graphical display elements. IRR is an algorithm designed to account for correlations between words and to construct multi-dimensional topical space indicative of relationships among linguistic objects (documents, phrases, and sentences). Summaries are composed of objects with certain properties, derived by exploiting the many-to-many relationships in such a space. Given their inherent complexity, our multi-faceted summaries benefit from a visualization environment. We discuss some essential features of such an environment.

01 Jan 2005
TL;DR: The Embra system is presented, a first-time entry to DUC for 2005 which performed at or above median for the manual assessment of responsiveness and on 4 out of 5 linguistic quality questions.
Abstract: We present the Embra system, a first-time entry to DUC for 2005 which performed at or above median for the manual assessment of responsiveness and on 4 out of 5 linguistic quality questions. The system takes a novel approach to relevance and redundancy, modeling sentence similarity using a latent semantic space constructed over a very large corpus. We present a simple approach to modeling specificity based on named entities which shows a small improvement over baseline. Finally, we discuss coherence and present a sentence reordering algorithm with a component-level evaluation demonstrating a positive effect.

Proceedings ArticleDOI
10 May 2005
TL;DR: The goal of this project is to make the Web more accessible to users with visual impairments by providing some of the features naturally available to sighted users, which can emerge from simplification and summarization.
Abstract: The goal of this project is to make the Web more accessible by providing some of the features naturally available to sighted users to users with visual impairments. These features are direct access and gestalt understanding, which can emerge from simplification and summarization. Simplification is achieved by retaining sections of the web page that are considered important while removing the clutter. The purpose of summarization is to provide the users with a preview of the web page. Simplification and summarization are implemented as a "guide dog" that helps users navigate the entire web site.

Journal ArticleDOI
TL;DR: In this article, an extension of the standard hidden Markov model is proposed to generate word-to-word and phrase-tophrase alignments between documents and their human-written abstracts.
Abstract: Current research in automatic single-document summarization is dominated by two effective, yet naive approaches: summarization by sentence extraction and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document/abstract pairs.

Journal ArticleDOI
TL;DR: The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.
Abstract: The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They refer to the extraction of important sentences from the documents, compressing the sentences to their essential or relevant content, and detecting redundant content across sentences. The technologies were tested at the Document Understanding Conferences, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in these competitions. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.

Journal ArticleDOI
TL;DR: This paper proposes a method to calculate sentence importance using scores generated by a question-answering engine for responses to multiple questions, and describes the integration of this method with a generic multi-document summarization system.
Abstract: In recent years, answer-focused summarization has gained attention as a technology complementary to information retrieval and question answering. In order to realize multi-document summarization focused by multiple questions, we propose a method to calculate sentence importance using scores, for responses to multiple questions, generated by a Question-Answering engine. Further, we describe the integration of this method with a generic multi-document summarization system. The evaluation results demonstrate that the performance of the proposed method is better than not only several baselines but also other participants' systems at the evaluation workshop NTCIR4 TSC3 Formal Run. However, it should be noted that some of the other systems do not use the information of questions.

Proceedings ArticleDOI
30 Oct 2005
TL;DR: A new approach under the hub-authority framework is proposed that combines the text content with cues and explores the sub-topics in the document set by bringing the features of these sub-topics into graph-based sentence ranking algorithms.
Abstract: Multi-document extractive summarization relies on the concept of sentence centrality to identify the most important sentences in a document. Although some research has introduced the graph-based ranking algorithms such as PageRank and HITS into the text summarization, we propose a new approach under the hub-authority framework in this paper. Our approach combines the text content with some cues such as "cue phrase", "sentence length" and "first sentence" and explores the sub-topics in the multi-documents by bringing the features of these sub-topics into graph-based sentence ranking algorithms. We provide an evaluation of our method on DUC 2004 data. The results show that our approach is an effective graph-ranking schema in multi-document generic text summarization.
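A hedged sketch of graph-based sentence ranking in the spirit of the PageRank/HITS methods the abstract mentions. This is a generic centrality computation over a cosine-similarity sentence graph, not the paper's hub-authority schema with cue-phrase, sentence-length, and sub-topic features.

```python
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sentences, damping=0.85, iters=50):
    """PageRank-style power iteration over a sentence-similarity graph:
    central sentences (similar to many others) accumulate high scores."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    sim = [[_cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) or 1.0 for row in sim]  # row sums; guard against zero
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * sim[j][i] / out[j] for j in range(n))
                  for i in range(n)]
    return scores
```

The top-scoring sentences would then be selected for the extractive summary, typically with a redundancy check before inclusion.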

Proceedings ArticleDOI
31 Oct 2005
TL;DR: This work presents a method to create query-specific summaries by adding structure to documents by extracting associations between their fragments.
Abstract: Summarization of text documents is increasingly important with the amount of data available on the Internet. The large majority of current approaches view documents as linear sequences of words and create query-independent summaries. However, ignoring the structure of the document degrades the quality of summaries. Furthermore, the popularity of web search engines requires query-specific summaries. We present a method to create query-specific summaries by adding structure to documents by extracting associations between their fragments.

01 Jan 2005
TL;DR: The MuST (Multimodal Summarization for Trend Information) workshop was designed to encourage cooperative and competitive studies on summarization and visualization for trend information and is expected to encourage studies in a wide variety of research fields.
Abstract: The MuST (Multimodal Summarization for Trend Information) workshop was designed to encourage cooperative and competitive studies on summarization and visualization for trend information. The main objective of the workshop is to develop technologies using multimedia presentation that will allow intelligent systems to provide users with appropriate answers to their queries on trend information. These technologies not only involve multimedia presentation generation and multimodal dialogue processing, but also rely largely on information access technologies such as automatic summarization, information extraction, and information visualization. Therefore, the workshop is expected to encourage studies in a wide variety of research fields. A noteworthy feature of the workshop is that the participants share the same research resources, whereby they address common or related themes, with the expectation of encouraging active research and discussion, forming communities, and constructing and accumulating resources such as tools and corpora.

01 Jan 2005
TL;DR: This paper presents the team TUT/NII results at DUC 2005 and additional experiments on improving multi-document summarization, investigating improvements of ROUGE and BE scores with an approach based on sentence extraction, weighted by sentence type annotation and combined with polarity term frequencies.
Abstract: In this paper, we present our team TUT/NII results at DUC 2005 and additional experiments on improving multi-document summarization. Summarization systems have typically focused on the factual aspect of information needs. Subjectivity analysis is another essential aspect for better understanding of information needs. Our approach is based on sentence extraction, weighted by sentence type annotation, and combined with polarity term frequencies. We selected 10 topics related to subjectivity with analysis of “narratives”, and investigated improvements of ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BE (Basic Elements) scores with our approach. In addition, the factual aspect of information needs was also investigated.

DOI
01 Jan 2005
TL;DR: This work presents a new approach to multilingual multi-document summarization that uses text similarity to choose sentences from English documents based on the content of the machine translated documents.
Abstract: We present a new approach for summarizing clusters of documents on the same event, some of which are machine translations of foreign-language documents and some of which are English. Our approach to multilingual multi-document summarization uses text similarity to choose sentences from English documents based on the content of the machine translated documents. A manual evaluation shows that 68% of the sentence replacements improve the summary, and the overall summarization approach outperforms first-sentence extraction baselines in automatic ROUGE-based evaluations.

Patent
Benyu Zhang1, Dou Shen1, Hua-Jun Zeng1, Wei-Ying Ma1, Zheng Chen1 
10 Aug 2005
TL;DR: A method and system for calculating the significance of a sentence within a document is provided; it identifies significant sentences based on the important words each sentence contains and selects them as a summary of the document.
Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the "important" words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
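A minimal sketch of the general scheme the patent describes. Raw word frequency stands in for the patent's combination of scoring techniques, and the 30% cutoff is an arbitrary illustrative threshold: words are scored and classified as important, then sentences are ranked by the important words they contain.

```python
from collections import Counter

def significant_sentences(sentences, top_word_fraction=0.3, num_sentences=1):
    """Classify the most frequent words as important, then return the
    sentences containing the most important words."""
    freq = Counter(w for s in sentences for w in s.lower().split())
    cutoff = max(1, int(len(freq) * top_word_fraction))
    important = {w for w, _ in freq.most_common(cutoff)}
    ranked = sorted(
        sentences,
        key=lambda s: sum(w in important for w in s.lower().split()),
        reverse=True,
    )
    return ranked[:num_sentences]
```

A real system would combine several word-importance scores (as the patent suggests) and normalize for sentence length, but the extract-by-important-words pipeline stays the same.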

Journal ArticleDOI
TL;DR: An intelligent system for automatic document summarization, the event indexing and summarization (EIS) system, is introduced; it is based on a cognitive psychology model (the event-indexing model) and on the roles and importance of sentences and their syntax in document understanding.

01 Jan 2005
TL;DR: CATS is a multi-document summarization system developed at the Université de Montréal for DUC 2005; from a set of topic-related documents, it produces an integrated summary answering the need for information at a given level of granularity.
Abstract: CATS is a multi-document summarization system developed at the Université de Montréal for DUC 2005. From a set of topic-related documents, it produces an integrated summary answering the need for information at a given level of granularity. It starts from a thematic analysis of the documents to identify a list of text segments containing interesting aspects related to the subject. It then matches these themes with the ones detected in the question. The very good results obtained at the DUC competition are described and discussed.

Proceedings ArticleDOI
07 Nov 2005
TL;DR: This paper for the first time investigates using lexical chains as a model of multiple documents written in Chinese to generate an indicative, moderately fluent summary, and finds that lexical chains are effective for multi-document summarization.
Abstract: This paper for the first time investigates using lexical chains as a model of multiple documents written in Chinese to generate an indicative, moderately fluent summary. The algorithm that computes lexical chains based on the HowNet knowledge database is modified to improve performance and suit Chinese summarization. Based on an analysis of semantemes, the algorithm can remove redundant similarities and retain differences in information content among multiple documents. The method pre-processes the text first, then constructs lexical chains and identifies strong chains. Significant sentences are then extracted from each document and ordered, and redundant information is recognized and removed. Finally, the summary is generated in chronological order, and anaphora resolution is applied to improve the fluency of the summary. Evaluation results show that the performance of the presented system is clearly better than that of the baseline system, and that lexical chains are effective for multi-document summarization.

Proceedings ArticleDOI
06 Oct 2005
TL;DR: This paper presents an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input, and examines two specific distinctions---whether a person in the news can be assumed to be known to a target audience (hearer-old vs hearer-new)
Abstract: Machine summaries can be improved by using knowledge about the cognitive status of news article referents. In this paper, we present an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input. We focus on modeling references to people, both because news often revolve around people and because existing natural language tools for named entity identification are reliable. We examine two specific distinctions---whether a person in the news can be assumed to be known to a target audience (hearer-old vs hearer-new) and whether a person is a major character in the news story. We report on machine learning experiments that show that these distinctions can be learned with high accuracy, and validate our approach using human subjects.

01 Jan 2005
TL;DR: An initial application of a sentence-trimming approach (Trimmer) to multi-document summarization in the MSE2005 and DUC2005 tasks shows that Trimmer ports easily to the new problem, although the direct impact of sentence trimming was minimal compared to other features used in the system.
Abstract: We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called Multi-Document Trimmer (MDT), by using sentence trimming as both a preprocessing stage and a feature for sentence ranking. We demonstrate that we were able to port Trimmer easily to this new problem. Although the direct impact of sentence trimming was minimal compared to other features used in the system, the interaction of the other features resulted in trimmed sentences accounting for nearly half of the selected summary sentences.