
Showing papers on "Multi-document summarization published in 2013"


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work applies a novel insight to develop a summarization algorithm that uses web-image-based prior information in an unsupervised manner, and proposes a framework that relies on multiple summaries obtained through crowdsourcing to automatically evaluate summarization algorithms on a large scale.
Abstract: Given the enormous growth in user-generated videos, it is becoming increasingly important to be able to navigate them efficiently. As these videos are generally of poor quality, summarization methods designed for well-produced videos do not generalize to them. To address this challenge, we propose to use web-images as a prior to facilitate summarization of user-generated videos. Our main intuition is that people tend to take pictures of objects to capture them in a maximally informative way. Such images could therefore be used as prior information to summarize videos containing a similar set of objects. In this work, we apply our novel insight to develop a summarization algorithm that uses the web-image based prior information in an unsupervised manner. Moreover, to automatically evaluate summarization algorithms on a large scale, we propose a framework that relies on multiple summaries obtained through crowdsourcing. We demonstrate the effectiveness of our evaluation framework by comparing its performance to that of multiple human evaluators. Finally, we present results for our framework tested on hundreds of user-generated videos.

318 citations


Journal ArticleDOI
TL;DR: A quantitative and qualitative assessment of 15 algorithms for sentence scoring available in the literature is described, and directions to improve the sentence extraction results obtained are suggested.
Abstract: Text summarization is the process of automatically creating a shorter version of one or more text documents. It is an important way of finding relevant information in large text libraries or on the Internet. Essentially, text summarization techniques are classified as Extractive and Abstractive. Extractive techniques perform text summarization by selecting sentences of documents according to some criteria. Abstractive summaries attempt to improve the coherence among sentences by eliminating redundancies and clarifying the context of sentences. Sentence scoring is the technique most used for extractive text summarization. This paper describes and performs a quantitative and qualitative assessment of 15 algorithms for sentence scoring available in the literature. Three different datasets (News, Blogs and Article contexts) were evaluated. In addition, directions to improve the sentence extraction results obtained are suggested.
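As a minimal illustration of the sentence-scoring paradigm surveyed in this paper (not any specific algorithm it evaluates), a Luhn-style frequency scorer can be sketched in a few lines; the sentence-splitting regex and the normalized-frequency scoring function are simplifying assumptions:

```python
import re
from collections import Counter

def summarize(text, n=2):
    """Score each sentence by the average corpus frequency of its words
    (a classic Luhn-style criterion) and return the top-n sentences in
    their original order."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n]
    # preserve document order in the extract
    return [s for s in sentences if s in ranked]
```

Real systems surveyed by the paper add many further signals (sentence position, cue words, title overlap), but they share this score-then-select skeleton.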

278 citations


Journal ArticleDOI
01 Nov 2013
TL;DR: In this paper, the authors presented a framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary, which can be used to generate summaries tailored to a model of the user preferences.
Abstract: In many decision-making scenarios, people can benefit from knowing what other people's opinions are. As more and more evaluative documents are posted on the Web, summarizing these useful resources becomes a critical task for many organizations and individuals. This paper presents a framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary. We propose two summarizers: an extractive summarizer and an abstractive one. As an additional contribution, we show how our abstractive summarizer can be modified to generate summaries tailored to a model of the user preferences that is solidly grounded in decision theory and can be effectively elicited from users. We have tested our framework in three user studies. In the first one, we compared the two summarizers. They performed equally well relative to each other quantitatively, while significantly outperforming a baseline standard approach to multidocument summarization. Trends in the results as well as qualitative comments from participants suggest that the summarizers have different strengths and weaknesses. After this initial user study, we realized that the diversity of opinions expressed in the corpus (i.e., its controversiality) might play a critical role in comparing abstraction versus extraction. To clearly pinpoint the role of controversiality, we ran a second user study in which we controlled for the degree of controversiality of the corpora that were summarized for the participants. The outcome of this study indicates that for evaluative text abstraction tends to be more effective than extraction, particularly when the corpus is controversial. In the third user study we assessed the effectiveness of our user tailoring strategy. The results of this experiment confirm that user tailored summaries are more informative than untailored ones.

202 citations


Book ChapterDOI
01 Jan 2013
TL;DR: This paper gives a short overview of summarization methods and evaluation, motivated by the insufficient quality of automatic summaries and the number of interesting summarization topics being proposed in different contexts by end users.
Abstract: Automatic text summarization, the computer-based production of condensed versions of documents, is an important technology for the information society. Without summaries it would be practically impossible for human beings to get access to the ever growing mass of information available online. Although research in text summarization is over 50 years old, some efforts are still needed given the insufficient quality of automatic summaries and the number of interesting summarization topics being proposed in different contexts by end users (“domain-specific summaries”, “opinion-oriented summaries”, “update summaries”, etc.). This paper gives a short overview of summarization methods and evaluation.

157 citations


Patent
31 Jan 2013
TL;DR: In this article, a search head is associated with one or more indexers containing event records, and queries directed towards summarizing and reporting on event records may be received at the search head.
Abstract: Embodiments are directed towards the transparent summarization of events. Queries directed towards summarizing and reporting on event records may be received at a search head. Search heads may be associated with one or more indexers containing event records. The search head may forward the query to the indexers that can resolve the query for concurrent execution. If a query is a collection query, indexers may generate summarization information based on event records located on the indexers. Event record fields included in the summarization information may be determined based on terms included in the collection query. If a query is a stats query, each indexer may generate a partial result set from previously generated summarization information, returning the partial result sets to the search head. Collection queries may be saved and scheduled to run and periodically update the summarization information.

153 citations


Proceedings ArticleDOI
20 May 2013
TL;DR: A new topic modeling based approach to source code summarization is proposed, and via a study of 14 developers, source code summaries generated using the proposed technique are evaluated.
Abstract: During software evolution a developer must investigate source code to locate and then understand the entities that must be modified to complete a change task. To help developers in this task, Haiduc et al. proposed text summarization based approaches to the automatic generation of class and method summaries, and via a study of four developers, they evaluated source code summaries generated using their techniques. In this paper we propose a new topic modeling based approach to source code summarization, and via a study of 14 developers, we evaluate source code summaries generated using the proposed technique. Our study partially replicates the original study by Haiduc et al. in that it uses the objects, the instruments, and a subset of the summaries from the original study, but it also expands the original study in that it includes more subjects and new summaries. The results of our study both support the findings of the original and provide new insights into the processes and criteria that developers use to evaluate source code summaries. Based on our results, we suggest future directions for research on source code summarization.

123 citations


Proceedings Article
Lu Wang, Hema Raghavan, Vittorio Castelli, Radu Florian, Claire Cardie
01 Aug 2013
TL;DR: This work considers the problem of using sentence compression techniques to facilitate query-focused multi-document summarization, presents a sentence-compression-based framework, and designs a series of learning-based compression models built on parse trees.
Abstract: We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g., 8.0% and 5.4% improvements in ROUGE-2 on the DUC 2006 and 2007 summarization tasks, respectively).
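The beam-search decoding idea can be illustrated with a toy word-deletion compressor. The paper's models operate on parse trees with learned scoring functions; the `score` function below is a hypothetical stand-in supplied by the caller:

```python
def beam_search_compress(words, score, beam_width=3):
    """Toy beam-search decoder for sentence compression: each word is either
    kept or dropped, and partial hypotheses are pruned to the top
    `beam_width` by cumulative score."""
    beams = [([], 0.0)]  # (kept words so far, cumulative score)
    for w in words:
        candidates = []
        for kept, s in beams:
            candidates.append((kept + [w], s + score(w, keep=True)))   # keep w
            candidates.append((kept, s + score(w, keep=False)))        # drop w
        # prune to the highest-scoring partial compressions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # best complete compression
```

For example, with a score function that rewards keeping content words and dropping stopwords, the decoder returns just the content words of the sentence.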

123 citations


Proceedings Article
01 Jun 2013
TL;DR: G-FLOW is evaluated on Mechanical Turk, and it is found that it generates dramatically better summaries than an extractive summarizer based on a pipeline of state-of-the-art sentence selection and reordering components, underscoring the value of the joint model.
Abstract: This paper presents G-FLOW, a novel system for coherent extractive multi-document summarization (MDS). Where previous work on MDS considered sentence selection and ordering separately, G-FLOW introduces a joint model for selection and ordering that balances coherence and salience. G-FLOW’s core representation is a graph that approximates the discourse relations across sentences based on indicators including discourse cues, deverbal nouns, co-reference, and more. This graph enables G-FLOW to estimate the coherence of a candidate summary. We evaluate G-FLOW on Mechanical Turk, and find that it generates dramatically better summaries than an extractive summarizer based on a pipeline of state-of-the-art sentence selection and reordering components, underscoring the value of our joint model.

112 citations


Proceedings Article
28 Jun 2013
TL;DR: This work proposes a search and summarization framework to extract relevant representative tweets from a time-ordered sample of tweets to generate a coherent and concise summary of an event.
Abstract: Social media services such as Twitter generate phenomenal volume of content for most real-world events on a daily basis. Digging through the noise and redundancy to understand the important aspects of the content is a very challenging task. We propose a search and summarization framework to extract relevant representative tweets from a time-ordered sample of tweets to generate a coherent and concise summary of an event. We introduce two topic models that take advantage of temporal correlation in the data to extract relevant tweets for summarization. The summarization framework has been evaluated using Twitter data on four real-world events. Evaluations are performed using Wikipedia articles on the events as well as using Amazon Mechanical Turk (MTurk) with human readers (MTurkers). Both experiments show that the proposed models outperform traditional LDA and lead to informative summaries.

104 citations


Journal ArticleDOI
TL;DR: A novel and general-purpose graph-based summarizer is presented, namely GraphSum (Graph-based Summarizer), which discovers and exploits association rules to represent the correlations among multiple terms that have been neglected by previous approaches.

103 citations


Journal ArticleDOI
TL;DR: Experimental results provide strong evidence that the proposed optimization-based approach is a viable method for document summarization and an improved differential evolution algorithm is created to solve the optimization problem.
Abstract: This paper proposes an optimization-based model for generic document summarization. The model generates a summary by extracting salient sentences from documents. This approach uses the sentence-to-document collection, the summary-to-document collection and the sentence-to-sentence relations to select salient sentences from a given document collection and reduce redundancy in the summary. An improved differential evolution algorithm has been created to solve the optimization problem. The algorithm can adjust its crossover rate adaptively according to the fitness of individuals. We implemented the proposed model on the multi-document summarization task. Experiments have been performed on the DUC2002 and DUC2004 data sets. The experimental results provide strong evidence that the proposed optimization-based approach is a viable method for document summarization.
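A minimal DE/rand/1 sketch with a fitness-adaptive crossover rate conveys the flavor of such an algorithm. The adaptation rule, bounds, and parameters here are illustrative assumptions, not the paper's exact scheme, and it is shown minimizing a generic test function rather than the summarization objective:

```python
import random

def differential_evolution(f, dim, pop_size=20, gens=100, F=0.5):
    """Minimal DE/rand/1 sketch that minimizes f over [-5, 5]^dim.
    The crossover rate CR is adapted per individual from its fitness rank:
    better individuals get a lower CR (change less), worse ones a higher CR."""
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # three distinct individuals other than the target
            a, b, c = random.sample([j for j in range(pop_size) if j != i], 3)
            # fitness-adaptive crossover rate in [0.1, 0.9]
            rank = sorted(fit).index(fit[i]) / (pop_size - 1)
            CR = 0.1 + 0.8 * rank
            # mutate-and-crossover: mix the donor vector into the target
            trial = [pop[a][k] + F * (pop[b][k] - pop[c][k])
                     if random.random() < CR else pop[i][k]
                     for k in range(dim)]
            ft = f(trial)
            if ft < fit[i]:  # greedy selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

In the paper's setting the candidate vectors would encode sentence selections and f would combine coverage and redundancy terms; here f is left abstract.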

Proceedings ArticleDOI
03 Dec 2013
TL;DR: A novel solution to target-oriented sentiment summarization and SA of short informal texts, with a main focus on Twitter posts known as "tweets", is introduced; the hybrid polarity detection system not only outperforms the unigram state-of-the-art baseline, but also offers an advantage over other methods when used as part of a sentiment summarization system.
Abstract: Sentiment Analysis (SA) and summarization have recently become the focus of many researchers, because analysis of online text is beneficial and in demand in many different applications. One such application is product-based sentiment summarization of multi-documents with the purpose of informing users about the pros and cons of various products. This paper introduces a novel solution to target-oriented (i.e. aspect-based) sentiment summarization and SA of short informal texts, with a main focus on Twitter posts known as "tweets". We compare different algorithms and methods for SA polarity detection and sentiment summarization. We show that our hybrid polarity detection system not only outperforms the unigram state-of-the-art baseline, but also offers an advantage over other methods when used as part of a sentiment summarization system. Additionally, we illustrate that our SA and summarization system exhibits high performance with various useful functionalities and features.

Journal ArticleDOI
TL;DR: SumView is developed, a Web-based review summarization system, to automatically extract the most representative expressions and customer opinions in the reviews on various product features by selecting the most representative review sentences for each extracted product feature.
Abstract: In this paper, we develop SumView, a Web-based review summarization system, to automatically extract the most representative expressions and customer opinions in the reviews on various product features. Different from existing review analysis which makes more efforts on sentiment classification and opinion mining, our system mainly focuses on summarization, i.e., delivering the majority of information contained in the review documents by selecting the most representative review sentences for each extracted product feature. Comprehensive case studies and experiments demonstrate the effectiveness of our system, and the user study shows users' satisfaction.

01 Jan 2013
TL;DR: This article presents a bottom-up approach to arrange sentences extracted for multi-document summarization, where chronology, topical-closeness, precedence, and succession are integrated into a single criterion by a supervised learning approach.
Abstract: Ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. However, improper ordering of information in a summary can confuse the reader and deteriorate the readability of the summary. Therefore, it is vital to properly order the information in multi-document summarization. We present a bottom-up approach to arrange sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are integrated into a criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. We evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. We introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task.
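The bottom-up concatenation procedure can be caricatured as agglomerative merging driven by an association score; the `assoc` function below is a hypothetical stand-in for the paper's learned combination of the four criteria:

```python
def bottom_up_order(sentences, assoc):
    """Toy bottom-up ordering: repeatedly concatenate the ordered pair of
    segments whose association score assoc(left, right) is highest, until a
    single fully ordered segment remains."""
    segs = [[s] for s in sentences]  # start with one segment per sentence
    while len(segs) > 1:
        # pick the best ordered pair (i precedes j)
        i, j = max(((i, j) for i in range(len(segs)) for j in range(len(segs))
                    if i != j),
                   key=lambda ij: assoc(segs[ij[0]], segs[ij[1]]))
        merged = segs[i] + segs[j]
        segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [merged]
    return segs[0]
```

In the paper, `assoc` would be trained to reflect chronology, topical-closeness, precedence, and succession between the two segments.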

Journal ArticleDOI
TL;DR: A two-stage scene-based movie summarization method based on mining the relationship between role-communities, which achieves better subjective performance than attention-based and role-based summarization methods in terms of semantic content preservation for a movie summary.
Abstract: Video summarization techniques aim at condensing a full-length video to a significantly shortened version that still preserves the major semantic content of the original video. Movie summarization, being a special class of video summarization, is particularly challenging since a large variety of movie scenarios and film styles complicate the problem. In this paper, we propose a two-stage scene-based movie summarization method based on mining the relationship between role-communities, since the role-communities in earlier scenes are usually used to develop the role relationship in later scenes. In the analysis stage, we construct a social network to characterize the interactions between role-communities. As a result, the social power of each role-community is evaluated by the community's centrality value and the role communities are clustered into relevant groups based on the centrality values. In the summarization stage, a set of feasible summary combinations of scenes is identified and an information-rich summary is selected from these candidates based on social power preservation. Our evaluation results show that in most test cases the proposed method achieves better subjective performance than attention-based and role-based summarization methods in terms of semantic content preservation for a movie summary.

Proceedings ArticleDOI
04 Feb 2013
TL;DR: This work studies how user influence models, which project user interaction information onto a Twitter context tree, can help Twitter context summarization within a supervised learning framework and shows that pairwise user influence signals can significantly improve the task performance.
Abstract: Twitter has become one of the most popular platforms for users to share information in real time. However, as an individual tweet is short and lacks sufficient contextual information, users cannot effectively understand or consume information on Twitter, which can either make users less engaged or even detached from using Twitter. In order to provide informative context to a Twitter user, we propose the task of Twitter context summarization, which generates a succinct summary from a large but noisy Twitter context tree. Traditional summarization techniques only consider text information, which is insufficient for Twitter context summarization task, since text information on Twitter is very sparse. Given that there are rich user interactions in Twitter, we thus study how to improve summarization methods by leveraging such signals. In particular, we study how user influence models, which project user interaction information onto a Twitter context tree, can help Twitter context summarization within a supervised learning framework. To evaluate our methods, we construct a data set by asking human editors to manually select the most informative tweets as a summary. Our experimental results based on this editorial data set show that Twitter context summarization is a promising research topic and pairwise user influence signals can significantly improve the task performance.

Journal ArticleDOI
TL;DR: Evaluated on two 100-topic datasets, the summaries generated by the novel speech act-guided summarization approach outperform two kinds of representative extractive summaries and rival human-written summaries in terms of explanatoriness and informativeness.
Abstract: With the growth of the social media service of Twitter, automatic summarization of Twitter messages (tweets) is in urgent need for efficient processing of the massive tweeted information. Unlike multi-document summarization in general, Twitter topic summarization must handle the numerous, short, dissimilar, and noisy nature of tweets. To address this challenge, we propose a novel speech act-guided summarization approach in this work. Speech acts characterize tweeters' communicative behavior and provide an organized view of their messages. Speech act recognition is a multi-class classification problem, which we solve by using word-based and symbol-based features that capture both the linguistic features of speech acts and the particularities of Twitter text. The recognized speech acts in tweets are then used to direct the extraction of key words and phrases to fill in templates designed for speech acts. Leveraging high-ranking words and phrases as well as topic information for major speech acts, we propose a round-robin algorithm to generate template-based summaries. Different from the extractive method adopted in most previous works, our summarization method is abstractive. Evaluated on two 100-topic datasets, the summaries generated by our method outperform two kinds of representative extractive summaries and rival human-written summaries in terms of explanatoriness and informativeness.

Journal ArticleDOI
TL;DR: A novel summarizer, namely Yago-based Summarizer, that relies on an ontology-based evaluation and selection of the document sentences and an established entity recognition and disambiguation step based on the Yago ontology is integrated into the summarization process.
Abstract: Sentence-based multi-document summarization is the task of generating a succinct summary of a document collection, which consists of the most salient document sentences. In recent years, the increasing availability of semantics-based models (e.g., ontologies and taxonomies) has prompted researchers to investigate their usefulness for improving summarizer performance. However, semantics-based document analysis is often applied as a preprocessing step, rather than integrating the discovered knowledge into the summarization process. This paper proposes a novel summarizer, namely Yago-based Summarizer, that relies on an ontology-based evaluation and selection of the document sentences. To capture the actual meaning and context of the document sentences and generate sound document summaries, an established entity recognition and disambiguation step based on the Yago ontology is integrated into the summarization process. The experimental results, which were achieved on the DUC'04 benchmark collections, demonstrate the effectiveness of the proposed approach compared to a large number of competitors as well as the qualitative soundness of the generated summaries.

Journal ArticleDOI
01 Nov 2013
TL;DR: Results show that extractive and abstractive-oriented summaries perform similarly as far as the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective, when a user satisfaction assessment is carried out.
Abstract: This article analyzes the appropriateness of a text summarization system, COMPENDIUM, for generating abstracts of biomedical papers. Two approaches are suggested: an extractive one (COMPENDIUM-E), which only selects and extracts the most relevant sentences of the documents, and an abstractive-oriented one (COMPENDIUM-E-A), thus also facing the challenge of abstractive summarization. This novel strategy combines extractive information with some pieces of information from the article that have been previously compressed or fused. Specifically, in this article, we want to study: i) whether COMPENDIUM produces good summaries in the biomedical domain; ii) which summarization approach is more suitable; and iii) the opinion of real users towards automatic summaries. Therefore, two types of evaluation were performed: quantitative and qualitative, evaluating both the information contained in the summaries and the user satisfaction. Results show that extractive and abstractive-oriented summaries perform similarly as far as the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective when a user satisfaction assessment is carried out. This also confirms the suitability of our suggested approach for generating summaries following an abstractive-oriented paradigm.

Book ChapterDOI
Qi Guo, Fernando Diaz, Elad Yom-Tov
24 Mar 2013
TL;DR: This work presents the problem of updating users about time critical news events, and proposes a solution which incorporates techniques from information retrieval and multi-document summarization and introduces an evaluation method which is significantly less expensive than traditional approaches to temporal summarization.
Abstract: During unexpected events such as natural disasters, individuals rely on the information generated by news outlets to form their understanding of these events. This information, while often voluminous, is frequently degraded by the inclusion of unimportant, duplicate, or wrong information. It is important to be able to present users with only the novel, important information about these events as they develop. We present the problem of updating users about time critical news events, and focus on the task of deciding which information to select for updating users as an event develops. We propose a solution to this problem which incorporates techniques from information retrieval and multi-document summarization and evaluate this approach on a set of historic events using a large stream of news documents. We also introduce an evaluation method which is significantly less expensive than traditional approaches to temporal summarization.

Journal ArticleDOI
TL;DR: The findings of this work indicate that properly summarized learning content is not only able to satisfy learning achievements, but also able to align content size with the unique characteristics and affordances of mobile devices.
Abstract: Mobile learning benefits from the unique merits of mobile devices and mobile technology to give learners capability to access information anywhere and anytime. However, mobile learning also has many challenges, especially in the processing and delivery of learning content. With the aim of making the learning content suitable for the mobile environment, this study investigates automatic text summarization to provide a tool set that reduces the quantity of textual content for mobile learning support. Text summarization is used to condense texts into the most important ideas. However, reducing the amount of content transmitted may negatively impact the meaning conveyed within. Although many solutions of text summarization have been applied by intelligent tutoring systems for learning support, few of them have been quantitatively investigated for learning achievements of learners, especially in mobile learning context. This study focuses on a methodology for investigating the effectiveness of automatic text summarization used in mobile learning context. The experimental results demonstrate that our proposed summarization approach is able to generate summaries effectively, and those generated summaries are perceived as helpful to support mobile learning. The findings of this work indicate that properly summarized learning content is not only able to satisfy learning achievements, but also able to align content size with the unique characteristics and affordances of mobile devices.

Journal ArticleDOI
TL;DR: A novel approach that directly generates clusters integrated with ranking is proposed; its effectiveness is demonstrated by both the cluster quality analysis and the summarization evaluation conducted on the DUC 2004-2007 datasets.
Abstract: Multi-document summarization aims to create a condensed summary while retaining the main characteristics of the original set of documents. Under such background, sentence ranking has hitherto been the issue of most concern. Since documents often cover a number of topic themes, with each theme represented by a cluster of highly related sentences, sentence clustering has been explored in the literature in order to provide more informative summaries. For each topic theme, the rank of terms conditional on this topic theme should be very distinct, and quite different from the rank of terms in other topic themes. Existing cluster-based summarization approaches apply clustering and ranking in isolation, which leads to incomplete, or sometimes rather biased, analytical results. A newly emerged framework uses sentence clustering results to improve or refine the sentence ranking results. Under this framework, we propose in this paper a novel approach that directly generates clusters integrated with ranking. The basic idea of the approach is that the ranking distribution of sentences in each cluster should be quite different from the others, which may serve as a feature of clusters, and new clustering measures of sentences can be calculated accordingly. Meanwhile, better clustering results can achieve better ranking results. As a result, ranking and clustering mutually and simultaneously update each other so that the performance of both can be improved. The effectiveness of the proposed approach is demonstrated by both the cluster quality analysis and the summarization evaluation conducted on the DUC 2004-2007 datasets.

Proceedings ArticleDOI
16 Dec 2013
TL;DR: This paper intends to investigate techniques and methods used by researchers for automatic text summarization, with special attention paid to Bio-inspired methods for text summarizing.
Abstract: The existence of the World Wide Web has caused an information explosion. Readers are overloaded with lengthy text documents where a shorter version would suffice. All computer users, be it professionals or novice users, are particularly affected by this predicament. There exists an urgent need for the discovery of knowledge embedded in digital documents. This paper intends to investigate techniques and methods used by researchers for automatic text summarization. Special attention is paid to Bio-inspired methods for text summarization.

Journal ArticleDOI
TL;DR: This paper reports a study of researchers' preferences in selecting information from cited papers to include in a literature review, and of the kinds of transformations and editing applied to the selected information.
Abstract: Purpose – This paper aims to report a study of researchers' preferences in selecting information from cited papers to include in a literature review, and the kinds of transformations and editing applied to the selected information.Design/methodology/approach – This is a part of a larger project to develop an automatic summarization method that emulates human literature review writing behaviour. Research questions were: how are literature reviews written – where do authors select information from, what types of information do they select and how do they transform it? What is the relationship between styles of literature review (integrative and descriptive) and each of these variables (source sections, types of information and types of transformation)? The authors analysed the literature review sections of 20 articles from the Journal of the American Society for Information Science and Technology, 2001‐2008, to answer these questions. Referencing sentences were mapped to 279 source papers to determine the s...

Journal ArticleDOI
TL;DR: The proposed method of text summarization chooses a subset of sentences from a document that maximizes the important concepts in the final summary and outperforms the existing systems to which it is compared.
Abstract: Many previous research studies on extractive text summarization consider a subset of words in a document as keywords and use a sentence ranking function that ranks sentences based on their similarities with the list of extracted keywords. But the use of key concepts in the automatic text summarization task has received less attention in the summarization literature. The proposed work uses key concepts identified from a document for creating a summary of the document. We view single-word or multi-word keyphrases of a document as the important concepts that a document elaborates on. Our work is based on the hypothesis that an extract is an elaboration of the important concepts to some permissible extent, controlled by the given summary length restriction. In other words, our method of text summarization chooses a subset of sentences from a document that maximizes the important concepts in the final summary. To allow diverse information in the summary, for each important concept, we select one sentence that is the best possible elaboration of the concept. Accordingly, the most important concept contributes to the summary first, then the second-best concept, and so on. To prove the effectiveness of our proposed summarization method, we have compared it to some state-of-the-art summarization systems and the results show that the proposed method outperforms the existing systems to which it is compared.
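The concept-driven selection described above can be sketched as a simple greedy loop: for each key concept, ranked by importance, pick the sentence that best elaborates it, until the length budget is exhausted. The names and the concept-density scoring below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of concept-driven extractive summarization: one sentence per
# key concept, most important concept first, under a word budget.

def summarize(sentences, concepts, max_words):
    """sentences: list of str in document order; concepts: ranked by importance."""
    chosen, used_words = [], 0
    for concept in concepts:                       # most important concept first
        # candidate sentences that mention the concept and are not yet chosen
        candidates = [s for s in sentences
                      if concept.lower() in s.lower() and s not in chosen]
        if not candidates:
            continue
        # "best elaboration" approximated here by concept-term density
        best = max(candidates,
                   key=lambda s: s.lower().count(concept.lower()) / len(s.split()))
        if used_words + len(best.split()) > max_words:
            continue
        chosen.append(best)
        used_words += len(best.split())
    # restore original document order for readability
    return [s for s in sentences if s in chosen]
```

Because each concept contributes at most one sentence, the summary stays diverse, matching the intent described in the abstract.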

Proceedings Article
01 Aug 2013
TL;DR: A series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.
Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.

Journal Article
TL;DR: Thanks to the World Wide Web, the corpus of online information is gigantic in volume; search engines have been developed to retrieve specific information from this huge amount of data, but their output alone often fails to meet users' expectations, motivating automatic text summarization.
Abstract: Thanks to the World Wide Web, the corpus of online information is gigantic in volume. Search engines such as Google, AltaVista, and Yahoo have been developed to retrieve specific information from this huge amount of data. But search-engine output often fails to provide the expected result, as the quantity of information increases enormously day by day and the findings are abundant. So automatic text summarization is in demand for salient information retrieval. Automatic text summarization is a system of summarizing text by computer, where a text is given to the computer as input and the output is a shorter, less redundant form of the original text. An informative pr
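The definition above — text in, shorter and less redundant text out — is commonly realized by scoring sentences on the frequency of their content words and keeping the top-scoring ones in document order. The following is a toy sketch of that idea under assumed tokenization and a tiny stopword list, not any particular published system.

```python
# Minimal frequency-based extractive summarizer: score each sentence by the
# average corpus frequency of its content words, keep the top-k in document order.
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "for"}

def word_freq_summary(text, k=2):
    # split on sentence-final punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]   # keep original order
```

Averaging over sentence length keeps long sentences from winning on word count alone.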

Journal ArticleDOI
TL;DR: This paper presents the design and evaluation of an extractive summarization approach to assist learners with reading difficulties; the results show a significant improvement in readability for the target audience when using the assistive summaries.

Journal ArticleDOI
TL;DR: A new multi-document summarization framework which combines rhetorical roles and corpus-based semantic analysis is proposed which is able to capture the semantic and rhetorical relationships between sentences so as to combine them to produce coherent summaries.
Abstract: In this paper, a new multi-document summarization framework which combines rhetorical roles and corpus-based semantic analysis is proposed. The approach is able to capture the semantic and rhetorical relationships between sentences so as to combine them to produce coherent summaries. Experiments were conducted on datasets extracted from web-based news using standard evaluation methods. Results show the promise of our proposed model as compared to state-of-the-art approaches.
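A framework like the one above needs a measure of semantic relatedness between sentences before rhetorical and semantic signals can be combined. A common corpus-based starting point is cosine similarity over term-frequency vectors; the sketch below is an illustrative baseline, not the paper's actual semantic analysis.

```python
# Cosine similarity over term-frequency vectors: a simple corpus-based
# measure of how semantically related two sentences are.
import math
import re
from collections import Counter

def tf_vector(sentence):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)          # Counter returns 0 for missing keys
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Richer corpus-based methods (e.g., latent semantic analysis) replace the raw term vectors with vectors in a learned semantic space, but the cosine step stays the same.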

Journal ArticleDOI
TL;DR: A novel Probabilistic-modeling Relevance, Coverage, and Novelty (PRCN) framework is proposed, which exploits a reference topic model incorporating the user query for dependent relevance measurement; topic coverage is also modeled under this framework.
Abstract: Summarization plays an increasingly important role with the exponential document growth on the Web. Specifically, for query-focused summarization, there exist three challenges: (1) how to retrieve query-relevant sentences; (2) how to concisely cover the main aspects (i.e., topics) in the document; and (3) how to balance these two requests. Especially for the relevance issue, many traditional summarization techniques assume that there is independent relevance between sentences, which may not hold in reality. In this paper, we go beyond this assumption and propose a novel Probabilistic-modeling Relevance, Coverage, and Novelty (PRCN) framework, which exploits a reference topic model incorporating the user query for dependent relevance measurement. Along this line, topic coverage is also modeled under our framework. To further address the issues above, various sentence features regarding relevance and novelty are constructed, while moderate topic coverage is maintained through a greedy algorithm for topic balance. Finally, experiments on DUC2005 and DUC2006 datasets validate the effectiveness of the proposed method.
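The greedy topic-balance step mentioned above can be sketched as marginal-gain selection under a concave coverage function: at each step, add the sentence that most increases total topic coverage, where diminishing returns discourage piling onto an already-covered topic. This is an illustrative submodular-coverage sketch under assumed per-sentence topic scores, not the paper's exact PRCN formulation.

```python
# Hedged sketch of greedy topic-coverage selection with diminishing returns.
import math

def greedy_topic_cover(sent_topic_scores, budget):
    """sent_topic_scores: per-sentence lists of topic relevance scores."""
    n_topics = len(sent_topic_scores[0])
    covered = [0.0] * n_topics

    def coverage(vec):
        # sqrt is concave: re-covering a topic yields smaller and smaller gains
        return sum(math.sqrt(v) for v in vec)

    selected = []
    while len(selected) < budget:
        best_gain, best_i = 0.0, None
        for i, scores in enumerate(sent_topic_scores):
            if i in selected:
                continue
            trial = [c + s for c, s in zip(covered, scores)]
            gain = coverage(trial) - coverage(covered)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:       # no sentence adds coverage
            break
        selected.append(best_i)
        covered = [c + s for c, s in zip(covered, sent_topic_scores[best_i])]
    return selected
```

With two sentences on topic A and one on topic B, the second pick goes to topic B: the concave coverage function makes a new topic worth more than more of an old one, which is exactly the "moderate topic coverage" behavior the abstract describes.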