
Showing papers on "Multi-document summarization" published in 2004


Proceedings Article
25 Jul 2004
TL;DR: Introduces the four ROUGE measures (ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S) included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures (ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S) included in the ROUGE summarization evaluation package, together with their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.
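
A minimal sketch of the ROUGE-N idea, n-gram recall against a set of reference summaries (the function names are illustrative, not the official package's API):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset of word n-grams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=2):
    """Recall-oriented n-gram overlap: clipped matches over reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    match = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        # Count each reference n-gram at most as often as it occurs in the candidate.
        match += sum(min(c, cand.get(g, 0)) for g, c in ref_counts.items())
    return match / total if total else 0.0
```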

9,293 citations


Journal ArticleDOI
TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing (NLP), based on the concept of eigenvector centrality.
Abstract: We introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper, we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most cases. Furthermore, LexRank with threshold outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to noise in the data that may result from an imperfect topical clustering of documents.
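
The thresholded LexRank computation can be sketched as TF-IDF cosine similarity plus power iteration over the resulting graph; the damping factor, threshold, and tolerance below are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, threshold=0.1, damping=0.85, tol=1e-6):
    """Score sentences by eigenvector centrality of their cosine-similarity graph."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    adj = (sim >= threshold).astype(float)      # thresholded adjacency matrix
    adj /= adj.sum(axis=1, keepdims=True)       # make rows stochastic
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(100):                        # power iteration
        new = (1 - damping) / n + damping * adj.T @ scores
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores
```

The highest-scoring sentences are then extracted, typically after a redundancy check, to form the summary.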

2,367 citations


Journal ArticleDOI
TL;DR: A multi-document summarizer, MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and an evaluation scheme based on sentence utility and subsumption is applied.
Abstract: We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.
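
A toy sketch of centroid-based sentence scoring in the spirit of MEAD; the full system combines the centroid value with positional and other features, which are omitted here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def centroid_scores(cluster_sentences):
    """Score each sentence by the centroid weight of the words it contains."""
    X = TfidfVectorizer().fit_transform(cluster_sentences).toarray()
    centroid = X.mean(axis=0)            # pseudo-sentence for the cluster
    # A sentence's score is the centroid mass of the terms it shares with it.
    return [(centroid * (row > 0)).sum() for row in X]
```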

1,121 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: It is argued that the method presented is reliable, predictive, and diagnostic, and thus improves considerably on the human evaluation method currently used in the Document Understanding Conference.
Abstract: We present an empirically grounded method for evaluating content selection in summarization. It incorporates the idea that no single best model summary exists for a collection of documents. Our method quantifies the relative importance of the facts to be conveyed. We argue that it is reliable, predictive, and diagnostic, and thus improves considerably on the shortcomings of the human evaluation method currently used in the Document Understanding Conference.
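
One way to make "quantifies the relative importance of facts" concrete: weight each content unit by the number of model summaries that express it, and normalize a peer summary's total weight by the best total achievable with the same number of units. A sketch under those assumptions (identifying the units themselves is a manual step):

```python
from collections import Counter

def content_score(peer_units, model_unit_sets):
    """Score a peer summary's content units against several model summaries."""
    # A unit's weight = how many model summaries express it.
    weights = Counter(u for units in model_unit_sets for u in set(units))
    peer = set(peer_units)
    observed = sum(weights.get(u, 0) for u in peer)
    # Best achievable total for a summary with the same number of units.
    ideal = sum(sorted(weights.values(), reverse=True)[:len(peer)])
    return observed / ideal if ideal else 0.0
```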

640 citations


Proceedings ArticleDOI
01 May 2004
TL;DR: The functionality of MEAD is described, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has thus far been downloaded by more than 500 organizations.
Abstract: This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.

378 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: This paper gives empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms and proposes a new Web summarization-based classification algorithm that achieves an approximately 8.8% improvement over pure-text-based methods.
Abstract: Web-page classification is much more difficult than pure-text classification due to the large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement compared to a pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about a 12.9% improvement over pure-text-based methods.

204 citations


Proceedings ArticleDOI
06 May 2004
TL;DR: A semantic abstraction approach to automatic summarization in the biomedical domain relies on a semantic processor that functions as the source interpreter and produces a list of predications, ultimately generating a conceptual condensate for a disorder input topic.
Abstract: We explore a semantic abstraction approach to automatic summarization in the biomedical domain. The approach relies on a semantic processor that functions as the source interpreter and produces a list of predications. A transformation stage then generalizes and condenses this list, ultimately generating a conceptual condensate for a disorder input topic. The final condensate is displayed in graphical form. We provide a set of principles for the transformation stage and describe the application of this approach to multidocument input. Finally, we examine the characteristics and quality of the condensates produced.

126 citations


01 Jul 2004
TL;DR: SmartMail, a prototype system for automatically identifying action items (tasks) in email messages, presents the user with a task-focused summary of a message that contains a list of action items extracted from the message.
Abstract: We describe SmartMail, a prototype system for automatically identifying action items (tasks) in email messages. SmartMail presents the user with a task-focused summary of a message. The summary consists of a list of action items extracted from the message. The user can add these action items to their “to do” list.

119 citations


Proceedings ArticleDOI
23 Aug 2004
TL;DR: It is shown how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information.
Abstract: In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conference and report that simplifying parentheticals results in significant improvement on the automated evaluation metric Rouge.
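
A rough approximation of the simplification step using a modern dependency parser (spaCy here, not the tooling the authors used in 2004): drop any subtree attached as an appositive or relative clause:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def simplify(sentence):
    """Remove appositives and relative clauses before clustering."""
    doc = nlp(sentence)
    drop = set()
    for tok in doc:
        if tok.dep_ in ("appos", "relcl"):      # parenthetical attachments
            drop.update(t.i for t in tok.subtree)
    # Note: leftover punctuation around removed spans is not cleaned up here.
    return " ".join(t.text for t in doc if t.i not in drop)
```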

106 citations


Proceedings ArticleDOI
02 May 2004
TL;DR: The new multilingual version of the Columbia Newsblaster news summarization system automatically collects, organizes, and summarizes news in multiple source languages, allowing the user to browse news topics with English summaries, and compare perspectives from different countries on the topics.
Abstract: We present the new multilingual version of the Columbia Newsblaster news summarization system. The system addresses the problem of user access to news in multiple languages from multiple sites on the Internet. It automatically collects, organizes, and summarizes news in multiple source languages, allowing the user to browse news topics with English summaries and compare perspectives from different countries on those topics.

84 citations


Journal ArticleDOI
TL;DR: This article proposes, in addition to the classification capacity of clustering techniques, the possibility of offering an indicative extract of the contents of several sources by means of multidocument summarization techniques.
Abstract: An increasingly common problem in effective information access is the presence in the same corpus of multiple documents that contain similar information. Generally, users may be interested in locating, for a topic addressed by a group of similar documents, one or several particular aspects. This kind of task, called instance or aspectual retrieval, has been explored in several TREC Interactive Tracks. In this article, we propose, in addition to the classification capacity of clustering techniques, the possibility of offering an indicative extract of the contents of several sources by means of multidocument summarization techniques. Two kinds of summaries are provided. The first covers the similarities of each cluster of documents retrieved. The second shows the particularities of each document with respect to the common topic in the cluster. The multitopic structure of the documents has been used to determine similarities and differences of topics in the cluster of documents. The system is independent of document domain and genre. An evaluation of the proposed system with users shows significant improvements in effectiveness. The results of previous experiments comparing clustering algorithms are also reported.

01 Jan 2004
TL;DR: LetSum (Legal text Summarizer), a prototype system, is described, which determines the thematic structure of a judgment in four themes (INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS, and CONCLUSION) and identifies the relevant sentences for each theme.
Abstract: This paper presents our work on the development of a new methodology for the automatic summarization of court decisions. We describe LetSum (Legal text Summarizer), a prototype system, which determines the thematic structure of a judgment in four themes: INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS, and CONCLUSION. It then identifies the relevant sentences for each theme. We discuss the evaluation of the produced summaries with a statistical method and also a human evaluation based on jurists' judgments. The results so far indicate good performance of the system when compared with other summarization technologies.

Journal ArticleDOI
01 Jul 2004
TL;DR: The outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3, is reported.
Abstract: We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we briefly describe the previous evaluation, Text Summarization Challenge (TSC1), as an introduction to TSC2. Then we explain TSC2, including the participants, the two tasks in TSC2, the data used, the evaluation methods for each task, and a brief report on the results. Lastly, we describe plans for the next evaluation, TSC3.

01 Jan 2004
TL;DR: Describes the structure of the system and the various compaction techniques developed in order to produce 10-word summaries of news articles, and presents the scores obtained using two different machine translation systems.
Abstract: This paper describes the Arabic summarization system that we have developed and evaluated on the very short summary of noisy text task of DUC 2004. We describe the structure of the system and the various compaction techniques we developed in order to produce 10-word summaries of news articles. We also present the scores we obtained using two different machine translation systems.


01 Jan 2004
TL;DR: A sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents, while the second is an entity-based summary of documents related to a particular person.
Abstract: We describe a sentence extraction system that produces two sorts of multi-document summaries: the first is a general-purpose summary of a cluster of related documents, while the second is an entity-based summary of documents related to a particular person. The general-purpose summary is generated by a process that ranks sentences based on their document and cluster "worthiness". The personality-based summary is constructed by a process that ranks sentences according to a metric that uses coreference and lexical information in a person profile. In both cases, a process of redundancy removal is applied to exclude repeated information.
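
The abstract does not spell out the redundancy-removal mechanism; a common realization is a greedy similarity filter over the ranked sentences, sketched here with TF-IDF cosine similarity and an assumed threshold:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def drop_redundant(ranked_sentences, max_sim=0.6):
    """Keep high-ranked sentences that are not too similar to already kept ones."""
    kept = []
    for sent in ranked_sentences:               # assumed sorted by worthiness
        if kept:
            sims = cosine_similarity(
                TfidfVectorizer().fit_transform(kept + [sent]))[-1, :-1]
            if sims.max() >= max_sim:
                continue                        # too close to a kept sentence
        kept.append(sent)
    return kept
```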

Proceedings ArticleDOI
28 Aug 2004
TL;DR: FarsiSum is an attempt to create an automatic text summarization system for Persian that uses modules implemented in an existing summarizer geared towards the Germanic languages, a Persian stop-list in Unicode format and a small set of heuristic rules.
Abstract: FarsiSum is an attempt to create an automatic text summarization system for Persian. The system is implemented as an HTTP client/server application written in Perl. It uses modules implemented in an existing summarizer geared towards the Germanic languages, a Persian stop-list in Unicode format, and a small set of heuristic rules.

Book ChapterDOI
22 Nov 2004
TL;DR: The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task.
Abstract: The task of automatic text summarization consists of generating a summary of the original text that allows the user to obtain the main pieces of information available in that text, but with a much shorter reading time. This is an increasingly important task in the current era of information overload, given the huge amount of text available in documents. In this paper, automatic text summarization is cast as a classification (supervised learning) problem, so that machine learning-oriented classification methods are used to produce summaries for documents based on a set of attributes describing those documents. The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task. Computational results are reported for experiments with a document base formed by news extracted from The Wall Street Journal of the TIPSTER collection, a collection that is often used as a benchmark in the text summarization literature.
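
A minimal GA for attribute selection in this setting: individuals are attribute bitmasks, and fitness is the cross-validated accuracy of a classifier trained on the selected attributes. The classifier choice and GA parameters below are placeholders, not the paper's configuration:

```python
import random
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
    """Evolve attribute bitmasks toward higher classification accuracy."""
    n = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

    pop = [np.random.rand(n) < 0.5 for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        pop = list(parents)                     # elitism: keep the best half
        while len(pop) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= np.random.rand(n) < p_mut  # bit-flip mutation
            pop.append(child)
    return max(pop, key=fitness)
```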

Proceedings ArticleDOI
02 May 2004
TL;DR: Empirically characterize human-written summaries provided in a widely used summarization corpus and suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
Abstract: Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically characterize human-written summaries provided in a widely used summarization corpus by attempting to answer the questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our results suggest that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.


Proceedings ArticleDOI
14 Sep 2004
TL;DR: Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.
Abstract: Automatic summarization is an important research issue in natural language processing. This paper presents a summarization method that generates a single-document summary with maximum topic completeness and minimum redundancy. It first builds semantic-class-based vector representations of various kinds of linguistic units in a document by means of HowNet (an existing ontology), which can improve the representation quality of the traditional term-based vector space model to a certain degree. Then, by adopting the K-means clustering algorithm together with a clustering analysis algorithm, we can adaptively capture the number of distinct latent topic regions in a document. Finally, topic-representative sentences are selected from each topic region to form the final summary. In order to evaluate the effectiveness of the proposed summarization method, a novel metric known as representation entropy is used for summarization redundancy evaluation. Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under this evaluation scheme when dealing with diverse genres of Chinese documents with free writing styles and flexible topic distributions.
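
The cluster-then-select step can be sketched as follows, with plain TF-IDF vectors standing in for the paper's HowNet-based semantic-class representations and a fixed k in place of its adaptive cluster-number analysis (both assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_summary(sentences, k=3):
    """Cluster sentences into topic regions and pick one representative each."""
    X = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    picks = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        # Representative = sentence closest to its cluster center.
        dists = np.linalg.norm(X[idx].toarray() - km.cluster_centers_[c], axis=1)
        picks.append(idx[dists.argmin()])
    return [sentences[i] for i in sorted(picks)]   # restore document order
```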

Proceedings ArticleDOI
23 Aug 2004
TL;DR: A large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus, which annotates not only the important sentences in a document set, but also those among them that have the same content.
Abstract: In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.

Book ChapterDOI
30 Nov 2004
TL;DR: The Lagrangian multiplier approach was employed to optimize the allocation of time-lengths for all the segmented shots and to obtain the best perceived motion activity in the summarized video.
Abstract: In this paper, an efficient and effective summarization algorithm for MPEG news video, based on the extraction and analysis of spatial and motion features, is proposed. We focus on video feature analysis techniques that work in the compressed domain (i.e., on MVs and DCT coefficients), without the need to transform back to the pixel domain. To give viewers a quick but sufficient browse of the news content, we adopt a strategy in which the anchor audio is overlaid on the summarized news video. Hence, the detection of anchor shots and the summarization of news segments subject to a time-budget constraint constitute the two main tasks in this paper. In summarizing news segments, the Lagrangian multiplier approach is employed to optimally allocate time-lengths to all the segmented shots and to obtain the best perceived motion activity in the summarized video. Experiments show that our summarized news videos achieve an average MOS score above 4.0 in a subjective test.
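
The constrained time allocation can be illustrated with a generic Lagrangian water-filling scheme; the concave utility w*log(1+t) below is a stand-in for the paper's perceived-motion-activity objective, so treat this purely as a sketch of the optimization pattern:

```python
def allocate_times(weights, budget, lo=1e-6, hi=1e6, iters=60):
    """Maximize sum_i w_i*log(1+t_i) subject to sum_i t_i = budget.

    Setting the derivative w_i/(1+t_i) equal to the multiplier lam gives
    t_i = max(0, w_i/lam - 1); bisect on lam until the budget is met.
    """
    def total(lam):
        return sum(max(0.0, w / lam - 1.0) for w in weights)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(mid) > budget:
            lo = mid        # allocation too generous: raise the multiplier
        else:
            hi = mid
    lam = (lo + hi) / 2
    return [max(0.0, w / lam - 1.0) for w in weights]
```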

Journal ArticleDOI
TL;DR: A relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks and showing that time information extracted from the text does indeed help to significantly improve both precision and recall.
Abstract: Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question-answering. In the topic detection and tracking (TDT) area, the temporal information often used is the publication date of a message, which is readily available but limited in its usefulness. We developed a relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving performance of TDT tasks. To extract temporal information, we make use of finite state automata and a lexicon containing time-revealing vocabulary. Extracted information is converted into a canonicalized representation of a time point or a time duration. We first evaluated and investigated the extraction and canonicalization methods for their accuracy and the extent to which temporal information extracted as such can help TDT tasks. The experimental results show that time information extracted from the text does indeed help to significantly improve both precision and recall.
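
A toy illustration of lexicon-driven canonicalization against the publication date, using English stand-ins for the Korean time-revealing vocabulary and a far simpler matcher than the paper's finite state automata:

```python
import re
from datetime import date, timedelta

# Tiny lexicon of time-revealing words mapped to day offsets.
RELATIVE = {"today": 0, "yesterday": -1, "tomorrow": 1}

def canonicalize(text, pub_date):
    """Resolve relative time expressions to canonical dates via the publication date."""
    points = []
    for word, offset in RELATIVE.items():
        if re.search(rf"\b{word}\b", text, re.IGNORECASE):
            points.append(pub_date + timedelta(days=offset))
    return points

# canonicalize("The minister resigned yesterday.", date(2004, 7, 1))
# -> [datetime.date(2004, 6, 30)]
```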

01 Jul 2004
TL;DR: This evaluation system can not only grade the quality of a video summary, but also compare different automatic summarization algorithms and make stepwise improvements on algorithms, without the need for new user feedback.
Abstract: This paper describes a system for automated performance evaluation of video summarization algorithms. We call it SUPERSIEV (System for Unsupervised Performance Evaluation of Ranked Summarization in Extended Videos). It is primarily designed for evaluating video summarization algorithms that perform frame ranking. The task of summarization is viewed as a kind of database retrieval, and we adopt some of the concepts developed for performance evaluation of retrieval in database systems. First, ground truth summaries are gathered in a user study from many assessors and for several video sequences. For each video sequence, these summaries are combined to generate a single reference file that represents the majority of the assessors' opinions. Then the system determines the best target reference frame for each frame of the whole video sequence and computes matching scores to form a lookup table that rates each frame. Given a summary from a candidate summarization algorithm, the system can then evaluate this summary from different aspects by computing recall, cumulated average precision, redundancy rate, and average closeness. With this evaluation system, we can not only grade the quality of a video summary, but also (1) compare different automatic summarization algorithms and (2) make stepwise improvements on algorithms, without the need for new user feedback.

01 Jan 2004
TL;DR: A summarization system is proposed that automatically classifies the type of a document set and summarizes the set with an appropriate summarization mechanism.
Abstract: In this paper, we propose a summarization system that automatically classifies the type of a document set and summarizes the set with an appropriate summarization mechanism. The system classifies a document set into three types: (a) single-topic, (b) multi-topic, and (c) others. These types are identified using information about high-frequency nouns and named entities. In our multi-document summarization system, unnecessary parts are deleted after each document is summarized, and then the multi-document summary is generated. In type (a), the unnecessary parts are the parts that are similar across the single-document summaries. In type (b), the unnecessary parts are the dissimilar parts of the documents. In type (c), unnecessary parts are identified using the scores employed for single-document summarization.

01 Jul 2004
TL;DR: It is argued that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and the approach to identifying this intended message and using it to construct the summary is outlined.
Abstract: Information graphics (non-pictorial graphics such as bar charts or line graphs) are an important component of multimedia documents. Often such graphics convey information that is not contained elsewhere in the document. Thus document summarization must be extended to include summarization of information graphics. This paper addresses our work on graphic summarization. It argues that the message that the graphic designer intended to convey must play a major role in determining the content of the summary, and it outlines our approach to identifying this intended message and using it to construct the summary.

Book ChapterDOI
22 Nov 2004
TL;DR: A new method for temporal web page summarization based on trend and variance analysis is presented, which can be also used for summarization of dynamic collections of topically related web pages.
Abstract: In recent years the Web has become an important medium for communication and information storage. As this trend is predicted to continue, it is necessary to provide efficient solutions for retrieving and processing information found on the WWW. In this paper we present a new method for temporal web page summarization based on trend and variance analysis. In temporal summarization, web documents are treated as dynamic objects with changing contents and characteristics. The sequential versions of a single web page are retrieved during a predefined time interval for which the summary is to be constructed. The resulting summary should represent the most popular, evolving concepts found in the web document versions. The proposed method can also be used for summarization of dynamic collections of topically related web pages.

01 Jul 2004
TL;DR: The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding, and the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.
Abstract: Some document genres contain a large number of figures. This position paper outlines approaches to diagram summarization that can augment the many well-developed techniques of text summarization. We discuss figures as surrogates for entire documents, thumbnails, extraction, the relations between text and figures as well as how automation might be achieved. The focus is on diagrams (line drawings) because they allow parsing techniques to be used, in contrast to the difficulties of general image understanding. We describe the advances in raster image vectorization and parsing needed to produce corpora for diagram summarization.

01 Jan 2004
TL;DR: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts; the Knowledge Management System was extended to include a refined capability for identifying multiword units for use in keyword generation.
Abstract: CL Research's participation in the Document Understanding Conference for 2004 was primarily intended to conduct further experiments in the use of XML-tagged documents containing increasingly richer characterizations of texts. We extended the Knowledge Management System to include (1) a refined capability for identifying multiword units (phrases) for use in keyword generation, (2) the incorporation of word-sense disambiguation to tag senses and identify semantic types, and (3) the integration of question-answering functionality into the summarization framework. We did not devote much effort to refining our system to create summaries for the five tasks, but achieved reasonable levels of performance. We viewed the length restrictions imposed on the tasks as not providing sufficient flexibility to investigate different modes of summarization, and the tasks of summarizing machine translations of poor quality as not very interesting. We used Tasks 1 and 3 to develop and refine a keyword generation capability, achieving levels of fourth of 18 and fourth of 10 among priority 1 systems. In the more general summarization tasks, our performance was near the bottom of the participating systems, but still reached acceptable levels. We performed much better on quality measures with our extraction-based summaries, with an overall level of third of 14 systems for Task 5. For several quality measures, our performance was somewhat lower; these levels identify specifically those areas of summarization analysis where the use of an XML representation is particularly amenable to improvement. While we will continue to improve our summarization capability within the general guidelines, we believe that summarization is only one part of document understanding and may not represent the needs of users for document exploration at a much deeper level.