
Showing papers on "Multi-document summarization published in 2017"


Journal ArticleDOI
TL;DR: A comprehensive survey of extractive text summarization approaches developed in the last decade is presented, along with a discussion of useful future directions that can help researchers identify areas where further research is needed.
Abstract: As information on every topic is available in abundance on the internet, condensing the important information in the form of a summary would benefit a number of users. Hence, there is growing interest among the research community in developing new approaches to automatically summarize text. An automatic text summarization system generates a summary, i.e. a short text that includes all the important information of the document. Since the advent of text summarization in the 1950s, researchers have been trying to improve techniques for generating summaries so that machine-generated summaries match human-made summaries. Summaries can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex as they need extensive natural language processing. Therefore, the research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful summaries. Over the past decade, several extractive approaches have been developed for automatic summary generation that implement a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent extractive text summarization approaches developed in the last decade. Their needs are identified and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field. Therefore, both intrinsic and extrinsic methods of summary evaluation are described in detail, along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally, this paper concludes with a discussion of useful future directions that can help researchers identify areas where further research is needed.

581 citations


Journal ArticleDOI
TL;DR: This study proposes a novel multi-text summarization technique for identifying the top-k most informative sentences of hotel reviews, and develops a new sentence importance metric.
Abstract: Text summarization technique can extract essential information from online reviews. Our method can identify top-k most informative sentences from online hotel reviews. We jointly considered author, review time, usefulness, and opinion factors. Online hotel reviews were collected from TripAdvisor in experimental evaluation. The results show that our approach provides more comprehensive hotel information. Online travel forums and social networks have become the most popular platform for sharing travel information, with enormous numbers of reviews posted daily. Automatically generated hotel summaries could aid travelers in selecting hotels. This study proposes a novel multi-text summarization technique for identifying the top-k most informative sentences of hotel reviews. Previous studies on review summarization have primarily examined content analysis, which disregards critical factors like author credibility and conflicting opinions. We considered such factors and developed a new sentence importance metric. Both the content and sentiment similarities were used to determine the similarity of two sentences. To identify the top-k sentences, the k-medoids clustering algorithm was used to partition sentences into k groups. The medoids from these groups were then selected as the final summarization results. To evaluate the performance of the proposed method, we collected two sets of reviews for the two hotels posted on TripAdvisor.com. A total of 20 subjects were invited to review the text summarization results from the proposed approach and two conventional approaches for the two hotels. The results indicate that the proposed approach outperforms the other two, and most of the subjects believed that the proposed approach can provide more comprehensive hotel information.
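As a rough illustration of the clustering step this abstract describes, the sketch below clusters review sentences with a small k-medoids loop over a combined content/sentiment distance and returns the medoids as the summary. It is a minimal sketch, not the paper's method: TF-IDF cosine similarity, the fixed weight `alpha`, and the placeholder `sentiment_score()` are assumptions standing in for the authors' sentence importance metric and similarity measures.

```python
# Minimal sketch: cluster review sentences with k-medoids over a combined
# content/sentiment distance, then return the medoids as the summary.
# The paper's exact importance metric is NOT reproduced; TF-IDF cosine and a
# hypothetical sentiment_score() stand in for its content/sentiment factors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sentiment_score(sentence: str) -> float:
    """Placeholder polarity in [-1, 1]; swap in any sentiment model."""
    return 0.0

def combined_distance(sentences, alpha=0.7):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    content_sim = cosine_similarity(tfidf)                     # content similarity
    pol = np.array([sentiment_score(s) for s in sentences])
    sent_sim = 1.0 - np.abs(pol[:, None] - pol[None, :]) / 2   # sentiment similarity
    sim = alpha * content_sim + (1 - alpha) * sent_sim
    return 1.0 - sim                                           # distance matrix

def k_medoids(dist, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)           # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # new medoid = member minimising total distance to its cluster
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

def summarize(sentences, k=5):
    dist = combined_distance(sentences)
    return [sentences[i] for i in k_medoids(dist, min(k, len(sentences)))]
```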

243 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: The authors employed a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features for salience estimation.
Abstract: We propose a neural multi-document summarization system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
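To make the pipeline concrete, here is a toy numpy sketch of the computation pattern described above: one pass of symmetrically normalised GCN propagation over a sentence relation graph, a linear salience head, and greedy redundancy-aware extraction. The weights are random and untrained, and the random `X` stands in for GRU sentence embeddings, so this only illustrates the forward computation, not the authors' trained model.

```python
# Sketch of the salience pipeline described above: GCN-style propagation over a
# sentence relation graph, a linear salience head, and a greedy, redundancy-aware
# extraction step. Weights are random (untrained); the GRU sentence encoder is
# assumed to have produced `X` already.
import numpy as np

def gcn_layer(A, X, W):
    """Symmetric-normalised propagation: relu(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

def greedy_extract(scores, sim, budget=3, max_sim=0.6):
    """Pick high-salience sentences, skipping any too similar to ones already picked."""
    chosen = []
    for i in np.argsort(-scores):
        if len(chosen) == budget:
            break
        if all(sim[i, j] < max_sim for j in chosen):
            chosen.append(int(i))
    return chosen

rng = np.random.default_rng(0)
n, d, h = 6, 32, 16
X = rng.normal(size=(n, d))              # sentence embeddings (e.g. from a GRU)
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T           # symmetric relation graph, no self-loops
H = gcn_layer(A, X, rng.normal(size=(d, h)))
H = gcn_layer(A, H, rng.normal(size=(h, h)))
scores = H @ rng.normal(size=(h,))       # linear salience head
norms = np.linalg.norm(X, axis=1)
cos = X @ X.T / (norms[:, None] * norms[None, :])
print(greedy_extract(scores, cos))
```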

148 citations


Journal ArticleDOI
TL;DR: Significant contributions made in recent years are emphasized, including progress on modern sentence extraction approaches that improve concept coverage, information diversity and content coherence, as well as attempts from summarization frameworks that integrate sentence compression, and more abstractive systems that are able to produce completely new sentences.
Abstract: The task of automatic document summarization aims at generating short summaries for originally long documents. A good summary should cover the most important information of the original document or a cluster of documents, while being coherent, non-redundant and grammatically readable. Numerous approaches for automatic summarization have been developed to date. In this paper, we give a self-contained, broad overview of recent progress made in document summarization within the last five years. Specifically, we emphasize significant contributions made in recent years that represent the state of the art of document summarization, including progress on modern sentence extraction approaches that improve concept coverage, information diversity and content coherence, as well as summarization frameworks that integrate sentence compression, and more abstractive systems that are able to produce completely new sentences. In addition, we review progress made in document summarization in domains, genres and applications that differ from traditional settings. We also point out some of the latest trends and highlight a few possible future directions.

143 citations


Posted Content
TL;DR: The main approaches to automatic text summarization are described and the effectiveness and shortcomings of the different methods are discussed.
Abstract: In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.

142 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper reviews extractive text summarization methods that aim for summaries that are less redundant, cohesive, coherent, and rich in information.
Abstract: Text Summarization is the process of obtaining salient information from an authentic text document. In this technique, the extracted information is produced as a summarized report and presented to the user as a concise summary. It is very crucial for humans to understand and describe the content of the text. Text summarization techniques are classified into abstractive and extractive summarization. The extractive summarization technique focuses on choosing paragraphs, important sentences, etc., that reproduce the original documents in a precise form. The importance of sentences is determined based on linguistic and statistical features. In this work, a comprehensive review of extractive text summarization methods has been carried out. The various techniques, popular benchmark datasets and challenges of extractive summarization are reviewed. This paper discusses extractive text summarization methods that aim for summaries that are less redundant, cohesive, coherent, and rich in information.

142 citations


Posted Content
TL;DR: This work argues that faithfulness is also a vital prerequisite for a practical abstractive summarization system and proposes a dual-attention sequence-to-sequence framework to force the generation conditioned on both the source text and the extracted fact descriptions.
Abstract: Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which tends to create fake facts. Our preliminary study reveals that nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on improving informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parsing technologies to extract actual fact descriptions from the source text. A dual-attention sequence-to-sequence framework is then proposed to force the generation to be conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement in informativeness since they often condense the meaning of the source text.

132 citations


Journal ArticleDOI
TL;DR: A novel word-sentence co-ranking model named CoRank is proposed, which combines the word-sentence relationship with the graph-based unsupervised ranking model and can serve as an important building block of intelligent summarization systems.
Abstract: A principled word-sentence co-ranking model called CoRank is proposed. The convergence of CoRank with matrix notation is proved. A redundancy elimination technique is presented to further improve the performance of CoRank. Extractive summarization aims to automatically produce a short summary of a document by concatenating several sentences taken exactly from the original material. Due to their simplicity and ease of use, extractive summarization methods have become the dominant paradigm in the realm of text summarization. In this paper, we address the sentence scoring technique, a key step of extractive summarization. Specifically, we propose a novel word-sentence co-ranking model named CoRank, which combines the word-sentence relationship with a graph-based unsupervised ranking model. CoRank is quite concise in terms of matrix operations, and its convergence can be theoretically guaranteed. Moreover, a redundancy elimination technique is presented as a supplement to CoRank, so that the quality of automatic summarization can be further enhanced. As a result, CoRank can serve as an important building block of intelligent summarization systems. Experimental results on two real-life datasets including nearly 600 documents demonstrate the effectiveness of the proposed methods.
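The exact CoRank update rules and their convergence proof are in the paper; the sketch below only illustrates the general word-sentence mutual-reinforcement idea in matrix form that such co-ranking models build on. The uniform initialisation and L1 normalisation are assumptions made for the sake of a small, runnable example.

```python
# Generic word-sentence mutual-reinforcement ranking in matrix form. This is
# NOT the paper's exact CoRank update; it only illustrates the co-ranking idea:
# sentence scores and word scores reinforce each other through the
# word-in-sentence relation matrix until the iteration stabilises.
import numpy as np

def co_rank(W, iters=100, tol=1e-8):
    """W[i, j] = weight of word j in sentence i (e.g. TF-IDF)."""
    n_sent, n_word = W.shape
    s = np.ones(n_sent) / n_sent            # sentence scores
    w = np.ones(n_word) / n_word            # word scores
    for _ in range(iters):
        s_new = W @ w                       # sentences scored by their words
        w_new = W.T @ s                     # words scored by their sentences
        s_new /= s_new.sum() or 1.0         # L1 normalisation keeps scores bounded
        w_new /= w_new.sum() or 1.0
        if np.abs(s_new - s).sum() < tol and np.abs(w_new - w).sum() < tol:
            break
        s, w = s_new, w_new
    return s, w

# toy usage: 3 sentences, 4 words
W = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.5, 0.0, 1.0, 1.0]])
sent_scores, word_scores = co_rank(W)
print(np.argsort(-sent_scores))  # sentence ranking
```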

96 citations


Posted Content
TL;DR: This model improves upon other traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
Abstract: We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.

84 citations


Journal ArticleDOI
TL;DR: This study proposes a novel Cat Swarm Optimization (CSO) based multi-document summarizer that outperforms the other summarizers included in the study in terms of the non-redundancy, cohesiveness and readability of the summary.
Abstract: Today, the World Wide Web has brought us an enormous quantity of on-line information. As a result, extracting relevant information from massive data has become a challenging issue. In the recent past, text summarization has been recognized as one of the solutions for extracting useful information from vast amounts of documents. Based on the number of documents considered for summarization, it is categorized as single-document or multi-document summarization. Compared with single-document summarization, multi-document summarization is more challenging for researchers, who must derive an accurate summary from multiple documents. Hence, in this study, a novel Cat Swarm Optimization (CSO) based multi-document summarizer is proposed to address the problem of multi-document summarization. The proposed CSO based model is also compared with two other nature-inspired summarizers, namely a Harmony Search (HS) based summarizer and a Particle Swarm Optimization (PSO) based summarizer. On the benchmark Document Understanding Conference (DUC) datasets, the performance of all algorithms is compared in terms of different evaluation metrics such as ROUGE score, F score, sensitivity, positive predictive value, summary accuracy, inter-sentence similarity and a readability metric, to validate the non-redundancy, cohesiveness and readability of the summary, respectively. The experimental analysis clearly reveals that the proposed approach outperforms the other summarizers included in the study.

65 citations


Proceedings Article
12 Feb 2017
TL;DR: An unsupervised data reconstruction framework is proposed, which jointly considers the reconstruction for latent semantic space and observed term vector space and can capture the salience of sentences from these two different and complementary vector spaces.
Abstract: We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which can be divided into two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model called Variational Auto-Encoders (VAEs) is employed to describe the observed sentences and the corresponding latent semantic representations. Neural variational inference is used for the posterior inference of the latent variables. For salience estimation, we propose an unsupervised data reconstruction framework, which jointly considers the reconstruction for latent semantic space and observed term vector space. Therefore, we can capture the salience of sentences from these two different and complementary vector spaces. Thereafter, the VAEs-based latent semantic model is integrated into the sentence salience estimation component in a unified fashion, and the whole framework can be trained jointly by back-propagation via multi-task learning. Experimental results on the benchmark datasets DUC and TAC show that our framework achieves better performance than the state-of-the-art models.

Journal ArticleDOI
TL;DR: This work presents a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure, and proposes three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning.
Abstract: The rapid growth of scientific literature has made it difficult for the researchers to quickly learn about the developments in their respective fields. Scientific document summarization addresses this challenge by providing summaries of the important contributions of scientific papers. We present a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure. Citation texts often lack the evidence and context to support the content of the cited paper and are even sometimes inaccurate. We first address the problem of inaccuracy of the citation texts by finding the relevant context from the cited paper. We propose three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning. We then train a model to identify the discourse facets for each citation. We finally propose a method for summarizing scientific papers by leveraging the faceted citations and their corresponding contexts. We evaluate our proposed method on two scientific summarization datasets in the biomedical and computational linguistics domains. Extensive evaluation results show that our methods can improve over the state of the art by large margins.

Journal ArticleDOI
TL;DR: The experiment results show that compared to those from other candidate algorithms, each automatic summary generated by the proposed topic modeling based approach has not only a higher compression ratio, but also better summarization quality.
Abstract: A topic model based approach for novel summarization is proposed. An importance evaluation function for sentence candidates is designed. A summary smoothing approach is presented to improve the summary readability. Most existing automatic text summarization algorithms target multi-document inputs of relatively short length, and are thus difficult to apply directly to novels, which are long and loosely structured. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidates and thus generate an initial summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate our proposed approach experimentally on a real novel dataset. The experimental results show that, compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality.
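A minimal sketch of the first stage described here, under stated assumptions: fit an LDA topic model over the novel's sentences, collect the top words of each topic, and keep sentences containing them as candidates. The paper's importance evaluation function and smoothing step are not reproduced, and treating individual sentences as LDA documents is an illustrative simplification.

```python
# Sketch of the candidate-extraction stage only: fit a topic model, take the
# top words of each topic, and keep sentences containing them. The paper's
# importance function and smoothing step are not reproduced here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def candidate_sentences(sentences, n_topics=5, top_words=10):
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
    vocab = vec.get_feature_names_out()
    topic_words = set()
    for topic in lda.components_:                 # one weight vector per topic
        top = topic.argsort()[::-1][:top_words]
        topic_words.update(vocab[i] for i in top)
    # keep sentences that mention at least one top topic word
    analyze = vec.build_analyzer()
    return [s for s in sentences if topic_words & set(analyze(s))]
```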

Proceedings ArticleDOI
07 Aug 2017
TL;DR: A novel unsupervised query-focused multi-document summarization approach that generates a summary by extracting a subset of sentences using the Cross-Entropy (CE) Method is presented.
Abstract: We present a novel unsupervised query-focused multi-document summarization approach. To this end, we generate a summary by extracting a subset of sentences using the Cross-Entropy (CE) Method. The proposed approach is generic and requires no domain knowledge. Using an evaluation over the DUC 2005-2007 datasets against several other state-of-the-art baseline methods, we demonstrate that our approach is both effective and efficient.
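The Cross-Entropy method itself is standard, so a small sketch can show the selection loop: keep a Bernoulli inclusion probability per sentence, sample subsets, score them, and re-estimate the probabilities from the elite samples. The toy `score_fn` below is a placeholder, not the paper's query-focused objective, and the hyperparameters are illustrative only.

```python
# Minimal Cross-Entropy (CE) method for selecting a sentence subset. Each
# sentence has a Bernoulli inclusion probability; subsets are sampled, scored,
# and the probabilities are re-estimated from the elite samples. The scoring
# function here is a placeholder, not the paper's query-focused objective.
import numpy as np

def ce_select(score_fn, n_sentences, samples=200, elite_frac=0.1,
              iters=30, smooth=0.7, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n_sentences, 0.5)                     # inclusion probabilities
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        draws = rng.random((samples, n_sentences)) < p    # sampled subsets
        scores = np.array([score_fn(mask) for mask in draws])
        elite = draws[np.argsort(-scores)[:n_elite]]      # best subsets
        p = smooth * elite.mean(axis=0) + (1 - smooth) * p
    return p > 0.5                                    # final selection

# toy objective: prefer subsets of ~3 sentences with high toy relevance
relevance = np.array([0.9, 0.1, 0.8, 0.3, 0.7, 0.2])
def score_fn(mask):
    return relevance[mask].sum() - 0.5 * abs(mask.sum() - 3)

print(np.where(ce_select(score_fn, len(relevance)))[0])
```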

Proceedings Article
12 Feb 2017
TL;DR: This paper proposes a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization, and also utilizes the classification results to produce summaries of different styles.
Abstract: Multi-document summarization has reached a bottleneck due to the lack of sufficient training data and of diverse categories of documents. Text classification can make up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that TCSum achieves state-of-the-art performance without using any hand-crafted features and has the capability to capture the variations of summary styles across different text categories.

Journal ArticleDOI
30 Jun 2017
TL;DR: A comparative study of various text summarization techniques is given, covering extractive methods and abstractive methods, which aim to understand the main concepts in a given document and then express those concepts in clear natural language.
Abstract: Text summarization compresses the source text into a shorter version while conserving its information content and overall meaning. Because of the great amount of information available and the development of Internet technologies, text summarization has become an important tool for interpreting text information. Text summarization methods can be classified into extractive and abstractive summarization. An extractive summarization method involves selecting high-ranked sentences from the document based on word and sentence features and putting them together to generate a summary. The importance of the sentences is decided based on statistical and linguistic features of the sentences. Abstractive summarization aims to understand the main concepts in a given document and then express those concepts in clear natural language. This paper gives a comparative study of various text summarization techniques.

Proceedings ArticleDOI
14 Jul 2017
TL;DR: A novel coarse-to-fine attention model that hierarchically reads a document, using coarse attention to select top-level chunks of text and fine attention to read the words of the chosen chunks, which achieves the desired behavior of sparsely attending to subsets of the document for generation.
Abstract: Sequence-to-sequence models with attention have been successful for a variety of NLP problems, but their speed does not scale well for tasks with long source sequences such as document summarization. We propose a novel coarse-to-fine attention model that hierarchically reads a document, using coarse attention to select top-level chunks of text and fine attention to read the words of the chosen chunks. While the computation for training standard attention models scales linearly with source sequence length, our method scales with the number of top-level chunks and can handle much longer sequences. Empirically, we find that while coarse-to-fine attention models lag behind state-of-the-art baselines, our method achieves the desired behavior of sparsely attending to subsets of the document for generation.
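A toy numpy sketch of the two-level attention pattern described above: coarse attention over chunk representations selects a chunk, and fine attention then runs only over that chunk's words, so cost grows with the number of chunks rather than the full source length. Random vectors stand in for learned encoder/decoder states; this shows the computation shape, not the trained model.

```python
# Toy sketch of coarse-to-fine attention: coarse attention over chunk
# representations picks one chunk, then fine attention runs only over the words
# of that chunk. Random vectors stand in for learned states.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, chunk_len, n_chunks = 16, 20, 8
words = rng.normal(size=(n_chunks, chunk_len, d))   # word states per chunk
chunks = words.mean(axis=1)                         # coarse chunk representations
query = rng.normal(size=d)                          # decoder state

coarse = softmax(chunks @ query)                    # attention over chunks
top = int(np.argmax(coarse))                        # hard selection of one chunk
fine = softmax(words[top] @ query)                  # attention over its words only
context = fine @ words[top]                         # context vector fed to the decoder
print(top, context.shape)
```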

Journal ArticleDOI
TL;DR: This paper investigates the generation of extractive and abstractive summaries of opinions, studies and compares some well-known methods in the area, and develops new methods that build on the main advantages of the earlier ones.
Abstract: Research about some aspect-based opinion summarization methods. A new content selection strategy to produce extractive summaries is proposed. A novel NLG template-based system to generate abstractive summaries is proposed. Extractive and abstractive opinion summarization methods are compared. In recent years, the opinion summarization task has gained much importance because of the large amount of online information and the increasing interest in learning the users' evaluation of products, services, companies, and people. Although there are many works in this area, there is room for improvement, as the results are far from ideal. In this paper, we present our investigations into generating extractive and abstractive summaries of opinions. We study some well-known methods in the area and compare them. Besides using these methods, we also develop new methods that build on the main advantages of the earlier ones. We evaluate them according to three traditional summarization evaluation measures: informativeness, linguistic quality, and utility of the summary. We show that we produce interesting results and that our methods outperform some methods from the literature.

Journal ArticleDOI
TL;DR: This paper introduces for the first time the features of the human visual system within the summarization framework itself to allow for the emphasis of perceptually significant events while simultaneously eliminating perceptual redundancy from the summaries.
Abstract: The enormous growth of video content in recent times has raised the need to abbreviate the content for human consumption. Thus, there is a need for summaries of a quality that meets the requirements of human users. This also means that the summarization must incorporate the peculiar features of human perception. We present a new framework for video summarization in this paper. Unlike many available summarization algorithms that utilize only statistical redundancy, we introduce for the first time the features of the human visual system within the summarization framework itself to allow for the emphasis of perceptually significant events while simultaneously eliminating perceptual redundancy from the summaries. The framework has been evaluated with both subjective and objective evaluation scores.

Proceedings Article
12 Feb 2017
TL;DR: This paper introduces Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary, and introduces a new dataset for customized video summarization (CSumm).
Abstract: To facilitate the browsing of long videos, automatic video summarization provides an excerpt that represents its content. In the case of egocentric and consumer videos, due to their personal nature, adapting the summary to specific user's preferences is desirable. Current approaches to customizable video summarization obtain the user's preferences prior to the summarization process. As a result, the user needs to manually modify the summary to further meet the preferences. In this paper, we introduce Active Video Summarization (AVS), an interactive approach to gather the user's preferences while creating the summary. AVS asks questions about the summary to update it on-line until the user is satisfied. To minimize the interaction, the best segment to inquire about next is inferred from the previous feedback. We evaluate AVS on the commonly used UTEgo dataset. We also introduce a new dataset for customized video summarization (CSumm) recorded with a Google Glass. The results show that AVS achieves an excellent compromise between usability and quality. In 41% of the videos, AVS is considered the best of all tested baselines, including manually generated summaries. Also, when looking for specific events in the video, AVS provides a higher average level of satisfaction than all other baselines after only six questions to the user.

Journal ArticleDOI
TL;DR: This paper proposes a population-based multicriteria optimization method with multiple objective functions which generates an extractive generic summary with maximum relevance and minimum redundancy by representing each sentence of the input document as a vector of words in the Proper Noun, Noun, Verb and Adjective set.
Abstract: Multi-document summarization is the process of extracting salient information from a set of source texts and presenting that information to the user in a condensed form. In this paper, we propose a multi-document summarization system which generates an extractive generic summary with maximum relevance and minimum redundancy by representing each sentence of the input document as a vector of words in the Proper Noun, Noun, Verb and Adjective set. Five features, namely TF_ISF, Aggregate Cross Sentence Similarity, Title Similarity, Proper Noun and Sentence Length, are extracted for each sentence, and scores are assigned to sentences based on these features. The weights assigned to different features may vary depending upon the nature of the document, and it is hard to discover the most appropriate weight for each feature, which makes generation of a good summary a very tough task without human intelligence. The multi-document summarization problem has a large number of decision parameters and a large number of possible solutions from which the most optimal summary is to be generated. A generated summary may not guarantee the essential quality and may be far from the ideal human-generated summary. To address this issue, we propose a population-based multicriteria optimization method with multiple objective functions. Three objective functions are selected to determine an optimal summary, with maximum relevance, diversity, and novelty, from a global population of summaries by considering both the statistical and semantic aspects of the documents. Semantic aspects are considered using Latent Semantic Analysis (LSA) and Non-negative Matrix Factorization (NMF) techniques. Experiments have been performed on the DUC 2002, DUC 2004 and DUC 2006 datasets using the ROUGE toolkit. Experimental results show that our system outperforms state-of-the-art works in terms of Recall and Precision.
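As a concrete illustration of the feature-scoring step, the sketch below combines three of the listed features (TF_ISF, Title Similarity, Sentence Length) with illustrative weights. The POS-restricted word vectors, Aggregate Cross Sentence Similarity, Proper Noun feature and the multicriteria optimization itself are not reproduced here; the weighting and TF-IDF proxy for TF_ISF are assumptions, not the paper's formulation.

```python
# Sketch of the sentence-scoring step: three of the listed features (TF_ISF,
# title similarity, sentence length) combined with illustrative weights. The
# remaining features and the multi-objective optimisation are not reproduced.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_sentences(sentences, title, weights=(0.5, 0.3, 0.2)):
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences + [title])
    sent_vecs, title_vec = X[:-1], X[-1]
    tf_isf = np.asarray(sent_vecs.mean(axis=1)).ravel()      # mean TF-ISF weight (proxy)
    title_sim = cosine_similarity(sent_vecs, title_vec).ravel()
    lengths = np.array([len(s.split()) for s in sentences], dtype=float)
    length_score = lengths / lengths.max()                   # favour longer sentences
    feats = [tf_isf / (tf_isf.max() or 1.0), title_sim, length_score]
    return sum(w * f for w, f in zip(weights, feats))

sents = ["The cat sat on the mat.",
         "Multi-document summarization extracts salient sentences.",
         "Weights for each feature are hard to choose by hand."]
print(score_sentences(sents, "Multi-document summarization"))
```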

Journal ArticleDOI
TL;DR: A novel technique for linguistic summarization of event logs is presented, which generates linguistic summaries that are concise enough to be used in a practical setting, while at the same time enriching the summaries that are produced by also enabling conjunctive statements.

Proceedings Article
01 Nov 2017
TL;DR: A new model for concept-map-based multi-document summarization is proposed that learns to identify and merge coreferent concepts to reduce redundancy, determines their importance with a strong supervised model and finds an optimal summary concept map via integer linear programming.
Abstract: Concept-map-based multi-document summarization is a variant of traditional summarization that produces structured summaries in the form of concept maps. In this work, we propose a new model for the task that addresses several issues in previous methods. It learns to identify and merge coreferent concepts to reduce redundancy, determines their importance with a strong supervised model and finds an optimal summary concept map via integer linear programming. It is also computationally more efficient than previous methods, allowing us to summarize larger document sets. We evaluate the model on two datasets, finding that it outperforms several approaches from previous work.
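The final selection step can be pictured as a small integer linear program; the PuLP sketch below maximises total concept importance under a size budget. This is only a hedged illustration of the ILP idea: the paper's actual objective also handles relations between concepts and merged coreferent mentions, which are not modelled here, and the concepts and importance values are made up.

```python
# Sketch of the final ILP selection step: choose concepts maximising total
# importance under a map-size budget. The paper's full objective (relations
# between concepts, merged coreferent mentions) is not modelled here.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

concepts = ["neural networks", "summarization", "concept maps", "ILP"]
importance = [0.9, 0.8, 0.7, 0.4]     # e.g. from the supervised importance model
max_concepts = 2                      # size budget of the summary concept map

prob = LpProblem("concept_map_summary", LpMaximize)
x = [LpVariable(f"x_{i}", cat=LpBinary) for i in range(len(concepts))]
prob += lpSum(importance[i] * x[i] for i in range(len(concepts)))   # objective
prob += lpSum(x) <= max_concepts                                    # budget constraint
prob.solve()

print([c for c, var in zip(concepts, x) if value(var) == 1])
```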

Proceedings ArticleDOI
01 Jun 2017
TL;DR: A rank based sentence selection using continuous vector representations along with key-phrases is implemented and a model to tackle summary coherence for increasing readability is proposed.
Abstract: In this work, we aim at developing an extractive summarizer in the multi-document setting. We implement a rank-based sentence selection using continuous vector representations along with key-phrases. Furthermore, we propose a model to tackle summary coherence for increasing readability. We conduct experiments on the Document Understanding Conference (DUC) 2004 datasets using the ROUGE toolkit. Our experiments demonstrate that the methods bring significant improvements over state-of-the-art methods in terms of informativeness and coherence.

Journal ArticleDOI
TL;DR: The main challenges for Arabic text summarization are described and the various methodologies and systems in the literature are surveyed; this survey would be a good basis for the design of an Arabic automatic text summarization system that combines the various "good" features of the existing systems and dismisses the "not-so-good" ones.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new supervised framework is presented that learns to estimate automatic Pyramid scores and uses them for optimization-based extractive multi-document summarization and there is much room for improvement in comparison with the upper-bound for automatic Pyramid.
Abstract: We present a new supervised framework that learns to estimate automatic Pyramid scores and uses them for optimization-based extractive multi-document summarization. For learning automatic Pyramid scores, we developed a method for automatic training data generation which is based on a genetic algorithm using automatic Pyramid as the fitness function. Our experimental evaluation shows that our new framework significantly outperforms strong baselines regarding automatic Pyramid, and that there is much room for improvement in comparison with the upper-bound for automatic Pyramid.

Journal ArticleDOI
TL;DR: A novel unsupervised integrated score framework to generate generic extractive multi-document summaries by ranking sentences based on a dynamic programming (DP) strategy that comprehensively takes the relevance, diversity, informativeness and length constraint of sentences into consideration.
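One way to picture DP-based extraction under a length constraint is as a 0/1 knapsack over precomputed sentence scores, as in the sketch below. This is an assumption for illustration only: the framework's integrated relevance/diversity/informativeness score and its actual DP formulation are not reproduced.

```python
# Sketch of DP-based extraction under a length budget, treated as a 0/1
# knapsack: maximise the summed sentence scores subject to a word limit. The
# framework's integrated score is assumed to be precomputed into `scores`.
def dp_select(scores, lengths, budget):
    n = len(scores)
    # best[j] = (total score, chosen indices) using at most j words
    best = [(0.0, [])] * (budget + 1)
    for i in range(n):
        new_best = best[:]
        for j in range(lengths[i], budget + 1):
            cand = best[j - lengths[i]][0] + scores[i]
            if cand > new_best[j][0]:
                new_best[j] = (cand, best[j - lengths[i]][1] + [i])
        best = new_best
    return max(best, key=lambda t: t[0])[1]

scores = [2.1, 1.4, 3.0, 0.9]       # integrated sentence scores (precomputed)
lengths = [18, 12, 25, 9]           # sentence lengths in words
print(dp_select(scores, lengths, budget=40))   # indices of selected sentences
```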

Proceedings ArticleDOI
20 Jul 2017
TL;DR: This research focuses on improving summarization accuracy with sentiment analysis of movie review posts using RapidMiner operators, including the Aylien Text Analysis extension.
Abstract: Automatic text summarization is one of the important challenges in natural language processing. It helps readers save time by automatically extracting the important information from a lengthy document. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text to determine whether the writer's attitude towards a particular topic, product, etc., is positive, negative, or neutral. This research focuses on improving summarization accuracy with sentiment analysis of movie review posts using RapidMiner operators. The first summarization model is built using the Aylien Text Analysis extension. The proposed second model is built using the Text Processing extension. For both methods, sentiment analysis is done using the same Aylien Text Analysis extension to evaluate the summarization results. An accuracy of 90% is achieved for sentiment analysis using the first model and 96% for the second model.

Journal ArticleDOI
TL;DR: This paper provides a methodology to model temporal context globally and locally, and proposes a novel unsupervised summarization framework with social-temporal context for Twitter data; experiments on a manually labeled dataset demonstrate the importance of social-temporal context in Twitter summarization.
Abstract: Twitter is one of the most popular social media platforms for online users to create and share information. Tweets are short, informal, and large-scale, which makes it difficult for online users to find reliable and useful information, giving rise to the problem of Twitter summarization. On the one hand, tweets are short and highly unstructured, which makes it difficult for traditional document summarization methods to handle Twitter data. On the other hand, Twitter provides rich social-temporal context beyond texts, bringing about new opportunities. In this paper, we investigate how to exploit social-temporal context for Twitter summarization. In particular, we provide a methodology to model temporal context globally and locally, and propose a novel unsupervised summarization framework with social-temporal context for Twitter data. To assess the proposed framework, we manually label a real-world Twitter dataset. Experimental results on the dataset demonstrate the importance of social-temporal context in Twitter summarization.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A novel interactive summarization system is presented that is based on abstractive summarization, derived from a recent consolidated knowledge representation for multiple texts, providing a bullet-style summary while allowing users to obtain the most important information first and interactively drill down to more specific details.
Abstract: We present a novel interactive summarization system that is based on abstractive summarization, derived from a recent consolidated knowledge representation for multiple texts. We incorporate a couple of interaction mechanisms, providing a bullet-style summary while allowing users to obtain the most important information first and interactively drill down to more specific details. A usability study of our implementation, for event news tweets, suggests the utility of our approach for text exploration.