Showing papers on "Multi-document summarization" published in 2018

Proceedings Article•DOI•

Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

Logan Lebanoff¹, Kaiqiang Song¹, Fei Liu¹•Institutions (1)

01 Aug 2018

TL;DR: The authors exploit the maximal marginal relevance method to select representative sentences from multi-document input, and leverage an abstractive encoder-decoder model to fuse disparate sentences into an abstractive summary.

Abstract: Generating a text abstract from a set of documents remains a challenging task. The neural encoder-decoder framework has recently been exploited to summarize single documents, but its success can in part be attributed to the availability of large parallel data automatically acquired from the Web. In contrast, parallel data for multi-document summarization are scarce and costly to obtain. There is a pressing need to adapt an encoder-decoder model trained on single-document summarization data to work with multiple-document input. In this paper, we present an initial investigation into a novel adaptation method. It exploits the maximal marginal relevance method to select representative sentences from multi-document input, and leverages an abstractive encoder-decoder model to fuse disparate sentences to an abstractive summary. The adaptation method is robust and itself requires no training data. Our system compares favorably to state-of-the-art extractive and abstractive approaches judged by automatic metrics and human assessors.
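
Note: the sentence-selection step named in the abstract above, maximal marginal relevance (MMR), can be illustrated with a minimal sketch. This is a generic MMR loop, not the authors' implementation. The bag-of-words cosine similarity, the use of the whole input as the query, and the names `bow`, `cosine`, `mmr_select` and `lam` are assumptions made for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words vector as a Counter (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Greedily pick up to k sentence indices.

    Each step picks the candidate maximizing
    lam * relevance - (1 - lam) * redundancy, where relevance is the
    similarity to the whole input and redundancy is the highest
    similarity to any sentence already selected.
    """
    vecs = [bow(s) for s in sentences]
    query = Counter()
    for v in vecs:
        query.update(v)  # the whole input acts as the "query" (assumption)
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], query)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` close to 1 the loop favors relevance alone and may pick near-duplicate sentences; lowering `lam` penalizes sentences that repeat what has already been selected.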

119 citations

Posted Content•

Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

Logan Lebanoff¹, Kaiqiang Song¹, Fei Liu¹•Institutions (1)

University of Central Florida¹

19 Aug 2018-arXiv: Computation and Language

TL;DR: An initial investigation into a novel adaptation method that exploits the maximal marginal relevance method to select representative sentences from multi-document input, and leverages an abstractive encoder-decoder model to fuse disparate sentences into an abstractive summary.

88 citations

Proceedings Article•

Abstract Meaning Representation for Multi-Document Summarization

Kexin Liao, Logan Lebanoff¹, Fei Liu¹•Institutions (1)

University of Central Florida¹

01 Aug 2018

TL;DR: In this article, the authors study the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation.

Abstract: Generating an abstract from a collection of documents is a desirable capability for many real-world applications. However, abstractive approaches to multi-document summarization have not been thoroughly investigated. This paper studies the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation. Our approach condenses source documents to a set of summary graphs following the AMR formalism. The summary graphs are then transformed to a set of summary sentences in a surface realization step. The framework is fully data-driven and flexible. Each component can be optimized independently using small-scale, in-domain training data. We perform experiments on benchmark summarization datasets and report promising results. We also describe opportunities and challenges for advancing this line of research.

72 citations

Journal Article•DOI•

The challenging task of summary evaluation: an overview

Elena Lloret¹, Laura Plaza², Ahmet Aker³•Institutions (3)

University of Alicante¹, National University of Distance Education², University of Duisburg-Essen³

01 Mar 2018

TL;DR: A clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.

Abstract: Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.

72 citations

Journal Article•DOI•

An evolutionary framework for multi document summarization using Cuckoo search approach: MDSCSA

Rasmita Rautray¹, Rakesh Chandra Balabantaray²•Institutions (2)

Siksha O Anusandhan University¹, Indian Institutes of Information Technology²

01 Jul 2018-Applied Computing and Informatics

TL;DR: A novel Cuckoo search based multi-document summarizer (MDSCSA) is proposed to address the problem of multi-document summarization; the reported results show that it outperforms the other summarizers included in the study.

61 citations

Journal Article•DOI•

Extractive multi-document summarization using multilayer networks

Jorge A. V. Tohalino¹, Diego R. Amancio², Diego R. Amancio¹•Institutions (2)

University of São Paulo¹, Indiana University²

01 Aug 2018-Physica A-statistical Mechanics and Its Applications

TL;DR: This work evaluates the performance of a multilayer-based method to select the most relevant sentences in the context of an extractive multi document summarization (MDS) task and makes a distinction between edges linking sentences from different documents (inter-layer) and those connecting sentences from the same document (intra-layer).

Abstract: Huge volumes of textual information are produced every single day. In order to organize and understand such large datasets, summarization techniques have become popular in recent years. These techniques aim at finding relevant, concise and non-redundant content in such large volumes of data. While network methods have been adopted to model texts in some scenarios, a systematic evaluation of multilayer network models in the multi-document summarization task has been limited to a few studies. Here, we evaluate the performance of a multilayer-based method to select the most relevant sentences in the context of an extractive multi-document summarization (MDS) task. In the adopted model, nodes represent sentences and edges are created based on the number of shared words between sentences. Unlike previous studies in multi-document summarization, we make a distinction between edges linking sentences from different documents (inter-layer) and those connecting sentences from the same document (intra-layer). As a proof of principle, our results reveal that such a discrimination between intra- and inter-layer edges in a multilayered representation is able to improve the quality of the generated summaries. This piece of information could be used to improve current statistical methods and related textual models.
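
Note: the graph construction described in the abstract above (sentences as nodes, edges weighted by shared words, inter-layer and intra-layer edges kept distinct) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The `inter_weight` factor, the ranking by weighted degree, and the names `build_multilayer_edges` and `rank_by_strength` are hypothetical choices; the paper may use other weighting and centrality measures.

```python
from itertools import combinations


def build_multilayer_edges(docs, inter_weight=1.0):
    """Build a sentence graph over several documents.

    docs: list of documents, each a list of sentence strings.
    Returns (nodes, edges). Nodes are (doc_index, sentence_index) pairs.
    edges maps a node pair to (weight, kind), where kind is "intra" for
    sentences of the same document and "inter" otherwise.
    """
    nodes = [(d, s) for d, doc in enumerate(docs) for s in range(len(doc))]
    words = {(d, s): set(docs[d][s].lower().split()) for d, s in nodes}
    edges = {}
    for u, v in combinations(nodes, 2):
        shared = len(words[u] & words[v])
        if shared == 0:
            continue  # no shared words, no edge
        kind = "intra" if u[0] == v[0] else "inter"
        # inter_weight rescales cross-document edges (assumed parameter)
        weight = shared * (inter_weight if kind == "inter" else 1.0)
        edges[(u, v)] = (weight, kind)
    return nodes, edges


def rank_by_strength(nodes, edges):
    """Rank nodes by weighted degree, highest first (assumed centrality)."""
    strength = {n: 0.0 for n in nodes}
    for (u, v), (w, _) in edges.items():
        strength[u] += w
        strength[v] += w
    return sorted(nodes, key=lambda n: strength[n], reverse=True)
```

Setting `inter_weight` above or below 1.0 is one simple way to treat the two edge types differently when ranking sentences for extraction.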

59 citations

Proceedings Article•

Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

Mir Tafseer Nayeem¹, Tanvir Ahmed Fuad¹, Yllias Chali¹•Institutions (1)

University of Lethbridge¹

01 Aug 2018

TL;DR: A paraphrastic sentence fusion model is designed that jointly performs sentence fusion and paraphrasing at the sentence level using a skip-gram word embedding model, improving both the information coverage and the abstractiveness of the generated sentences.

Abstract: In this work, we aim at developing an unsupervised abstractive summarization system in the multi-document setting. We design a paraphrastic sentence fusion model which jointly performs sentence fusion and paraphrasing using a skip-gram word embedding model at the sentence level. Our model improves the information coverage and at the same time the abstractiveness of the generated sentences. We conduct our experiments on the human-generated multi-sentence compression datasets and evaluate our system on several newly proposed Machine Translation (MT) evaluation metrics. Furthermore, we apply our sentence-level model to implement an abstractive multi-document summarization system where documents usually contain a related set of sentences. We also propose an optimal solution for the classical summary length limit problem, which was not addressed in past research. For the document-level summary, we conduct experiments on datasets from two different domains (news articles and user reviews) which are well suited for multi-document abstractive summarization. Our experiments demonstrate that the methods bring significant improvements over the state-of-the-art methods.

58 citations

Journal Article•DOI•

Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task

Kokil Jaidka¹, Muthu Kumar Chandrasekaran², Sajal Rustagi³, Min-Yen Kan²•Institutions (3)

University of Pennsylvania¹, National University of Singapore², Indian Institute of Technology Roorkee³

01 Sep 2018-International Journal on Digital Libraries

TL;DR: The participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as a part of the BIRNDL workshop at the Joint Conference for Digital Libraries 2016 in Newark, New Jersey, are described.

Abstract: We describe the participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as a part of the BIRNDL workshop at the Joint Conference for Digital Libraries 2016 in Newark, New Jersey. CL-SciSumm is the first medium-scale Shared Task on scientific document summarization in the computational linguistics (CL) domain. Participants were provided a training corpus of 30 topics, each comprising a reference paper (RP) and 10 or more citing papers, all of which cite the RP. For each citation, the text spans (i.e., citances) that pertain to the RP have been identified. Participants solved three sub-tasks in automatic research paper summarization using this text corpus. Fifteen teams from six countries registered for the Shared Task, of which ten teams ultimately submitted and presented their results. The annotated corpus comprised 30 target papers, currently the largest available corpus of its kind. The corpus is available for free download and use at https://github.com/WING-NUS/scisumm-corpus.

45 citations

Proceedings Article•DOI•

Adapting Neural Single-Document Summarization Model for Abstractive Multi-Document Summarization: A Pilot Study.

Jianmin Zhang¹, Jiwei Tan², Xiaojun Wan¹•Institutions (2)

Peking University¹, Alibaba Group²

01 Nov 2018

TL;DR: This paper proposes an approach to extend the neural abstractive model trained on large scale SDS data to the MDS task, which makes use of a small number of multi-document summaries for fine tuning.

Abstract: Until now, neural abstractive summarization methods have achieved great success for single document summarization (SDS). However, due to the lack of large scale multi-document summaries, such methods can hardly be applied to multi-document summarization (MDS). In this paper, we investigate neural abstractive methods for MDS by adapting a state-of-the-art neural abstractive summarization model for SDS. We propose an approach to extend the neural abstractive model trained on large scale SDS data to the MDS task. Our approach only makes use of a small number of multi-document summaries for fine tuning. Experimental results on two benchmark DUC datasets demonstrate that our approach can outperform a variety of baseline neural models.

39 citations

Journal Article•DOI•

CIBS: A biomedical text summarizer using topic-based sentence clustering.

Milad Moradi¹•Institutions (1)

Isfahan University of Technology¹

13 Nov 2018-Journal of Biomedical Informatics

TL;DR: The results show that the CIBS method can improve the performance of single- and multi-document biomedical text summarization, and that the topic-based sentence clustering approach can be used effectively to increase the informative content of summaries and to decrease redundant information.

32 citations

Posted Content•

Towards a Neural Network Approach to Abstractive Multi-Document Summarization

Jianmin Zhang, Jiwei Tan, Xiaojun Wan

24 Apr 2018-arXiv: Computation and Language

Journal Article•DOI•

Event summarization for sports games using twitter streams

Yue Huang¹, Chao Shen², Tao Li¹, Tao Li²•Institutions (2)

Nanjing University of Posts and Telecommunications¹, Florida International University²

01 May 2018-World Wide Web

TL;DR: To make the system applicable in real data, an online clustering approach is developed for participant detection and an online temporal-content mixture model is proposed to conduct sub-event detection.

Abstract: Given a textual data stream related to an event, social event summarization aims to generate an informative textual description that can capture all the important moments, and it plays a critical role in mining and analyzing social media streams. In this paper, we present a general social event summarization framework using Twitter streams. The proposed framework consists of three key components: participant detection, sub-event detection, and summary tweet extraction. To make the system applicable to real data, an online clustering approach is developed for participant detection and an online temporal-content mixture model is proposed to conduct sub-event detection. Experiments show that the proposed framework can achieve performance similar to that of its batch counterpart.

Proceedings Article•

auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus

Markus Zopf¹•Institutions (1)

Technische Universität Darmstadt¹

01 May 2018

TL;DR: A large heterogeneous multilingual multi-document summarization corpus with 7,316 topics in English and German is created, which has varying summary lengths and varying numbers of source documents.

Abstract: Automatic text summarization is a challenging natural language processing (NLP) task which has been researched for several decades. The available datasets for multi-document summarization (MDS) are, however, rather small and usually focused on the newswire genre. Nowadays, machine learning methods are applied to more and more NLP problems such as machine translation, question answering, and single-document summarization. Modern machine learning methods such as neural networks require large training datasets which are available for the three tasks but not yet for MDS. This lack of training data limits the development of machine learning methods for MDS. In this work, we automatically generate a large heterogeneous multilingual multi-document summarization corpus. The key idea is to use Wikipedia articles as summaries and to automatically search for appropriate source documents. We created a corpus with 7,316 topics in English and German, which has varying summary lengths and varying numbers of source documents. More information about the corpus can be found at the corpus GitHub page at https://github.com/AIPHES/auto-hMDS.

Journal Article•DOI•

Section mixture models for scientific document summarization

John M. Conroy, Sashka T. Davis

01 Sep 2018-International Journal on Digital Libraries

TL;DR: A system for summarization of scientific and structured documents is presented that has three components: section mixture models for estimating the weights of terms; a hypothesis test to select a subset of these terms; and a sentence extractor based on techniques for combinatorial optimization.

Abstract: In this paper, we present a system for summarization of scientific and structured documents that has three components: section mixture models are used for estimation of the weights of terms; a hypothesis test to select a subset of these terms; and a sentence extractor based on techniques for combinatorial optimization. The section mixture models approach is an adaptation of a bigram mixture model based on the main sections of a scientific document and a collection of citing sentences (citances) from papers that reference the document. The model was adapted from earlier work done on Biomedical documents used in the summarization task of the 2014 Text Analysis Conference (TAC 2014). The mixture model trained on the Biomedical data was used also on the data for the Computational Linguistics scientific summarization task of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (CL-SciSumm 2016). This model gives rise to machine-generated summaries with ROUGE scores that are nearly as strong as those seen on the Biomedical data and was also the highest scoring submission to the task of generating a human summary. For sentence extraction, we use the OCCAMS algorithm (Davis et al., in: Vreeken, Ling, Zaki, Siebes, Yu, Goethals, Webb, Wu (eds.) ICDM Workshops, IEEE Computer Society, pp. 454–463, 2012), which takes the sentences from the original document and the assignment of weights of the terms computed by the language models and outputs a set of minimally overlapping sentences whose combined term coverage is maximized. Finally, we explore the importance of an appropriate background model for the hypothesis test to select terms to achieve the best quality summaries.

Journal Article•DOI•

Creating a reference data set for the summarization of discussion forum threads

Suzan Verberne¹, Emiel Krahmer², Iris Hendrickx¹, Sander Wubben², Antal van den Bosch¹•Institutions (2)

Radboud University Nijmegen¹, Tilburg University²

01 Jun 2018

TL;DR: An elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set is presented and shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries.

Abstract: In this paper we address extractive summarization of long threads in online discussion fora. We present an elaborate user evaluation study to determine human preferences in forum summarization and to create a reference data set. We showed long threads to ten different raters and asked them to create a summary by selecting the posts that they considered to be the most important for the thread. We study the agreement between human raters on the summarization task, and we show how multiple reference summaries can be combined to develop a successful model for automatic summarization. We found that although the inter-rater agreement for the summarization task was slight to fair, the automatic summarizer obtained reasonable results in terms of precision, recall, and ROUGE. Moreover, when human raters were asked to choose between the summary created by another human and the summary created by our model in a blind side-by-side comparison, they judged the model’s summary equal to or better than the human summary in over half of the cases. This shows that even for a summarization task with low inter-rater agreement, a model can be trained that generates sensible summaries. In addition, we investigated the potential for personalized summarization. However, the results for the three raters involved in this experiment were inconclusive. We release the reference summaries as a publicly available dataset.

Journal Article•DOI•

An improved method of automatic text summarization for web contents using lexical chain with semantic-related terms

Htet Myet Lynn¹, Chang Choi¹, Pankoo Kim¹•Institutions (1)

Chosun University¹

01 Jun 2018

TL;DR: An improved extractive text summarization method for documents is proposed by enhancing the conventional lexical chain method to produce better relevant information of the text using three distinct features or characteristics of keyword in a text.

Abstract: Much research has converged on automatic text summarization as the number of text documents keeps increasing with the constant expansion of information diffusion. The objective of this proposal is to obtain the most reliable and substantial context, or the most relevant brief summary of the text, in an extractive manner. Extractive text summarization produces a short summary of a text that contains the most important information of the original by extracting a set of sentences from the original document. This paper proposes an improved extractive text summarization method for documents by enhancing the conventional lexical chain method to produce better relevant information of the text using three distinct features or characteristics of keywords in a text. The keywords of the document are labeled using our previous work, a transition probability distribution generator model, which can learn the characteristics of the keywords in a document and generates their probability distribution for each feature.

Journal Article•DOI•

Solving Multi-Document Summarization as an Orienteering Problem

Asma Bader Al-Saleh, Mohamed El Bachir Menai

30 Jun 2018-Algorithms

TL;DR: A novel extractive graph-based approach to solve the multi-document summarization (MDS) problem is proposed and it is shown that MDS-OP achieved the best F-measure scores on both tasks in terms of ROUGE-1 and ROUGE-L (DUC 2004), ROUGE-SU4, and three other evaluation methods (MultiLing 2015).

Abstract: With advances in information technology, people face the problem of dealing with tremendous amounts of information and need ways to save time and effort by summarizing the most important and relevant information. Thus, automatic text summarization has become necessary to reduce the information overload. This article proposes a novel extractive graph-based approach to solve the multi-document summarization (MDS) problem. To optimize the coverage of information in the output summary, the problem is formulated as an orienteering problem and heuristically solved by an ant colony system algorithm. The performance of the implemented system (MDS-OP) was evaluated on DUC 2004 (Task 2) and MultiLing 2015 (MMS task) benchmark corpora using several ROUGE metrics, as well as other methods. Its comparison with the performances of 26 systems shows that MDS-OP achieved the best F-measure scores on both tasks in terms of ROUGE-1 and ROUGE-L (DUC 2004), ROUGE-SU4, and three other evaluation methods (MultiLing 2015). Overall, MDS-OP ranked among the best 3 systems.
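
Note: the ROUGE-1 F-measure mentioned in the abstract above is based on unigram overlap between a candidate summary and a reference. A minimal sketch of that computation follows. It assumes whitespace tokenization, lowercasing and a single reference; the official ROUGE toolkit offers further options (stemming, stopword removal, multiple references) that are not reproduced here, and the function name `rouge1_f` is hypothetical.

```python
from collections import Counter


def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F-measure from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For the candidate "the cat sat" against the reference "the cat sat on the mat", precision is 3/3 and recall is 3/6, which gives an F-measure of 2/3.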

Proceedings Article•DOI•

Multi-Document Summarization Using Distributed Bag-of-Words Model

Kaustubh Mani¹, Ishan Verma², Hardik Meisheri², Lipika Dey²•Institutions (2)

International Institute of Information Technology, Hyderabad¹, Tata Consultancy Services²

01 Dec 2018

TL;DR: This paper presents an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model that selects summary sentences in order to minimize the reconstruction error between the summary and the documents.

Abstract: As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in short time. In this paper, we present an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model. Specifically, our approach selects summary sentences in order to minimize the reconstruction error between the summary and the documents. We apply sentence selection and beam search, to further improve the performance of our model. Experimental results on two different datasets show significant performance gains compared with the state-of-the-art baselines.
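
Note: the idea in the abstract above of selecting summary sentences so as to minimize a reconstruction error against the documents can be illustrated with a small greedy sketch. This is a simplification under stated assumptions and not the authors' system. Sentence vectors are taken as given (stand-ins for distributed bag-of-words embeddings), the error is the squared Euclidean distance between the summary centroid and the centroid of all sentences, and a greedy loop replaces the beam search mentioned in the abstract. The names `centroid`, `sq_error` and `greedy_select` are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_select(sentence_vecs, k):
    """Pick up to k sentence indices whose centroid best matches the
    centroid of all sentences, adding one sentence per step."""
    if not sentence_vecs:
        return []
    target = centroid(sentence_vecs)
    selected = []
    remaining = list(range(len(sentence_vecs)))
    while remaining and len(selected) < k:
        def error_with(i):
            chosen = [sentence_vecs[j] for j in selected + [i]]
            return sq_error(centroid(chosen), target)
        best = min(remaining, key=error_with)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A beam search would keep several partial summaries per step instead of one, at a higher cost; the greedy version is shown only because it is the shortest correct instance of error-driven selection.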

Journal Article•DOI•

Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms

Jonathan Rojas Simon¹, Yulia Ledeneva¹, René Arnulfo García Hernández¹•Institutions (1)

Universidad Autónoma del Estado de México¹

30 Mar 2018-Computación Y Sistemas

TL;DR: A method based on Genetic Algorithms is proposed for calculating the best sentence combinations of the DUC01 and DUC02 datasets in MDS through a meta-document representation, and three heuristics mentioned in several state-of-the-art works are calculated to rank the most recent MDS methods.

Abstract: Over the last years, several Multi-Document Summarization (MDS) methods have been presented in Document Understanding Conference (DUC) workshops. Since DUC01, several methods have been presented in approximately 268 state-of-the-art publications, which has allowed the continuous improvement of MDS; however, in most works the upper bounds were unknown. Recently, some works have focused on calculating the best sentence combinations of a set of documents, and in previous works we calculated the significance for the single-document summarization task on the DUC01 and DUC02 datasets. However, no analysis of significance has been performed for the MDS task to rank the best multi-document summarization methods. In this paper, we propose a method based on Genetic Algorithms for calculating the best sentence combinations of the DUC01 and DUC02 datasets in MDS through a meta-document representation. Moreover, we have calculated three heuristics mentioned in several state-of-the-art works to rank the most recent MDS methods, through the calculation of upper bounds and lower bounds.

Proceedings Article•DOI•

Template Oriented Text Summarization via Knowledge Graph

Pin Wu¹, Quan Zhou¹, Zhidan Lei¹, Wei Qiu¹, Xiaoqiang Li¹•Institutions (1)

Shanghai University¹

01 Jul 2018

TL;DR: This paper proposes a method based on knowledge graph technology to automatically extract abstract texts; it can not only obtain higher-level extraction from the text, but can also select template and question and answer to obtain a personalized abstract.

Abstract: People are flooded with massive amounts of semi-structured and unstructured text in their daily work life. The fast-paced lifestyle forces us to get more focused information from these large amounts of text more quickly, so people urgently need a technology that can automatically extract abstracts from text. The traditional extractive automatic abstract method can only extract keywords or key sentences. Although the currently popular sequence-to-sequence extraction methods have greatly improved on the traditional methods, they cannot be combined with background information to obtain higher-level abstraction. Therefore, we propose a method based on knowledge graph technology to automatically extract abstract texts. This method can not only obtain higher-level extraction from the text, but can also select template and question and answer to obtain a personalized abstract. We experimented on the CNN DAILYMAIL dataset. The results show that the abstract obtained by this method reflects more textual information, is more in line with human reading habits, achieves personalized extraction, and obtains results close to the best ROUGE scores.

Posted Content•

CQASUMM: Building References for Community Question Answering Summarization Corpora

Tanya Chowdhury, Tanmoy Chakraborty¹•Institutions (1)

Indraprastha Institute of Information Technology¹

12 Nov 2018-arXiv: Computation and Language

TL;DR: Popular MDS techniques are compared and evaluated on the CQA corpora, and OpinioSumm, a new multi-document summarizer that outperforms the best baseline by 4.6% w.r.t. ROUGE-1 score, is introduced.

Abstract: Community Question Answering forums such as Quora and Stackoverflow are rich knowledge resources, often catering to information on topics overlooked by major search engines. Answers submitted to these forums are often elaborate, contain spam, and are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion. As a result, summarization becomes a prioritized task for CQA forums. While a number of efforts have been made to summarize factoid CQA, little work exists in summarizing non-factoid CQA. We believe this is due to the lack of a considerably large, annotated dataset for CQA summarization. We create CQASUMM, the first huge annotated CQA summarization dataset, by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double up as a reference summary and build hundred-word summaries from them. We treat other answers as candidate documents for summarization. We provide a script to generate the dataset and introduce the new task of Community Question Answering Summarization. Multi-document summarization has been widely studied with news article datasets, especially in the DUC and TAC challenges using news corpora. However, documents in CQA have higher variance, contradicting opinions and less overlap. We compare the popular multi-document summarization techniques and evaluate their performance on our CQA corpora. We look into the state-of-the-art and understand the cases where existing multi-document summarizers (MDS) fail. We find that most MDS workflows are built for entirely factual news corpora, whereas our corpus has a fair share of opinion-based instances too. We therefore introduce OpinioSumm, a new MDS which outperforms the best baseline by 4.6% w.r.t. ROUGE-1 score.

Posted Content•

Abstract Meaning Representation for Multi-Document Summarization

Kexin Liao, Logan Lebanoff¹, Fei Liu¹•Institutions (1)

University of Central Florida¹

14 Jun 2018-arXiv: Computation and Language

TL;DR: In this paper, the authors study the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation.

Journal Article•DOI•

Automatic summarization of scientific publications using a feature selection approach

Hazem Al Saied, Nicolas Dugué, Jean-Charles Lamirel

01 Sep 2018-International Journal on Digital Libraries

TL;DR: The Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, providing better results than the known results so far, and obtaining high recall.

Abstract: Feature Maximization is a feature selection method that deals efficiently with textual data, making it possible to design systems that are altogether language-agnostic, parameter-free and do not require additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment with this approach in a single-document summarization context. We evaluate it on the DUC AQUAINT corpus and show that despite the unstructured nature of the corpus, our system is above the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to the summarization of research publications. These publications are structured in sections and our class-based approach is thus relevant. We focus more specifically on the task that aims to summarize papers using those that refer to them. We consider and evaluate several systems using our approach, each dealing with a specific bag of words. Furthermore, in these systems, we also evaluate cosine and graph-based distance for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, providing better results than the known results so far, and obtaining high recall. We thus demonstrate the flexibility and the relevance of Feature Maximization in this context.

Proceedings Article•DOI•

Towards State-of-the-art Baselines for Vietnamese Multi-document Summarization

Minh-Tien Nguyen¹, Hoang-Diep Nguyen¹, Thi-Hai-Nang Nguyen¹, Van-Hau Nguyen¹•Institutions (1)

Hung Yen University of Technology and Education¹

01 Nov 2018

TL;DR: According to experimental results, learning-to-rank methods achieve promising ROUGE-scores in many cases and one of them surpasses the state-of-the-art unsupervised learning method.

Abstract: Text summarization is a challenging but interesting task in natural language processing. While this task has been widely studied for English, it is still at an early stage for Vietnamese. This paper introduces an investigation of extractive summarization methods for Vietnamese. To do that, we implement and compare several well-known summarization methods in three directions: unsupervised, supervised, and deep learning. We validate the performance of the methods on two Vietnamese datasets. From the experimental results, we find two interesting points. Firstly, learning-to-rank methods achieve promising ROUGE scores in many cases. In particular, one of them surpasses the state-of-the-art unsupervised learning method. Secondly, formulating the scoring step in the form of learning-to-rank benefits the selection step.

Journal Article•DOI•

Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization

Agus Zainal Arifin¹, Moch. Zawaruddin Abdullah¹, Ahmad Wahyu Rosyadi¹, Desepta Isna Ulumi¹, Aminul Wahib¹, Rizka Wakhidatus Sholikah¹•Institutions (1)

Sepuluh Nopember Institute of Technology¹

01 Apr 2018-TELKOMNIKA Telecommunication Computing Electronics and Control

TL;DR: This paper proposes a new sentence weighting method by incorporating sentence distribution and POS tagging for multi-document summarization and achieves better results with an increasing rate of 5.41% on ROUGE-1 and 0.62% on ROUGE-2.

Abstract: Automatic multi-document summarization needs to find representative sentences not only by sentence distribution to select the most important sentence but also by how informative a term is in a sentence. Sentence distribution is suitable for obtaining important sentences by determining frequent and well-spread words in the corpus but ignores the grammatical information that indicates instructive content. The presence or absence of informative content in a sentence can be indicated by grammatical information which is carried by part of speech (POS) labels. In this paper, we propose a new sentence weighting method by incorporating sentence distribution and POS tagging for multi-document summarization. Similarity-based Histogram Clustering (SHC) is used to cluster sentences in the data set. Cluster ordering is based on cluster importance to determine the important clusters. Sentence extraction based on sentence distribution and POS tagging is introduced to extract the representative sentences from the ordered clusters. The results of the experiment on the Document Understanding Conferences (DUC) 2004 are compared with those of the Sentence Distribution Method. Our proposed method achieved better results with an increasing rate of 5.41% on ROUGE-1 and 0.62% on ROUGE-2.

Journal Article•DOI•

Multi-Document Summarization Using K-Means and Latent Dirichlet Allocation (LDA) – Significance Sentences

Shiva Twinandilla¹, Satriyo Adhy¹, Bayu Surarso¹, Retno Kusumaningrum¹•Institutions (1)

Diponegoro University¹

01 Jan 2018-Procedia Computer Science

TL;DR: This research proposes a novel summarization method that combines K-Means clustering with LDA - Significance Sentences to generate document summaries based on topic; it performs well when K-Means clusters the documents correctly by topic.

Proceedings Article•DOI•

Learning to Estimate the Importance of Sentences for Multi-Document Summarization

Minh-Tien Nguyen¹, Thi-Hai-Nang Nguyen¹, Hoang-Diep Nguyen¹, Van-Hau Nguyen¹•Institutions (1)

Hung Yen University of Technology and Education¹

01 Nov 2018

TL;DR: A model for improving the quality of the scoring step, which benefits sentence selection to extract high-quality summaries and achieves sufficient improvements over traditional methods and competitive results with state-of-the-art deep learning models is presented.

Abstract: Sentence scoring is a vital step in an extractive summarization system. This paper presents a model for improving the quality of the scoring step, which benefits sentence selection to extract high-quality summaries. Different from previous methods, our model takes advantage of local information (inside a single document) and global information (on the whole corpus). The combination allows defining a rich set of features used for learning. Under a learning-to-rank formulation, the model learns to estimate the importance of sentences. After ranking, summaries are finally extracted by selecting top-ranked sentences with the consideration of diversity. Experiments on three benchmark datasets (DUC 2001, 2002, and 2004) indicate that our model achieves sufficient improvements over traditional methods and competitive results with state-of-the-art deep learning models.

Patent•

Method and device for generating multi-document summarization

Piji Li, Lyu Zhengdong, Hang Li

02 Nov 2018

TL;DR: This patent presents a method and a device for generating a multi-document summarization; it relates to the field of data processing and addresses the poor performance of summarizations generated by existing automatic multi-document summarization technology.

Abstract: The embodiment of the invention discloses a method and a device for generating a multi-document summarization, relates to the field of data processing and solves the problem of poor performance of a summarization generated by an existing automatic multi-document summarization technology. A specific scheme of the method comprises the steps of dividing multiple documents into n sentences; generating an input word bag vector; performing unsupervised training on each sentence represented by the input word bag vector to obtain an encoding hidden layer vector of each sentence and a potential semantic vector of each sentence; collecting m potential semantic vectors; obtaining m decoding hidden layer vectors and m output word bag vectors according to the m potential semantic vectors; updating the m decoding hidden layer vectors and the m output word bag vectors; estimating an importance degree of each sentence; acquiring the importance degree and a redundancy degree of a verb phrase of each sentence and the importance degree and the redundancy degree of a noun phrase of each sentence; and generating the summarization of multiple documents according to the importance degree and the redundancy degree of all noun phrases and the importance degree and the redundancy degree of all verb phrases. The embodiment of the invention is used for a process for generating the multi-document summarization.

Journal Article•DOI•

Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

M. S. Bewoor¹, S. H. Patil¹•Institutions (1)

Bharati Vidyapeeth University¹

20 Feb 2018-Engineering, Technology & Applied Science Research

TL;DR: Experimental results show the degree of effectiveness in text summarization over different clustering algorithms and analysis of treating a query sentence as a common one, segmented from documents for text summarization.

Abstract: The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.

Journal Article•DOI•

On redundancy in multi-document summarization

Hiram Calvo¹, Pabel Carrillo-Mendoza¹, Alexander Gelbukh¹•Institutions (1)

Instituto Politécnico Nacional¹

01 Jan 2018-Journal of Intelligent and Fuzzy Systems