Showing papers on "Multi-document summarization published in 2021"


Posted Content
TL;DR: This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature that facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
Abstract: To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a summarization system based on BART, with promising early results. We formulate our summarization inputs and targets in both free text and structured forms and modify a recently proposed metric to assess the quality of our system's generated summaries. Data and models are available at this https URL

49 citations


Journal ArticleDOI
TL;DR: A novel deep-learning-based method, RDLS, is presented for generic opinion-oriented extractive summarization of multiple documents, comprising a sentiment analysis embedding space (SAS), a text summarization embedding space (TSS) and an opinion summarizer module (OSM).
Abstract: Opinion summarization is the process of producing concise summaries from a large number of opinionated texts. In this paper, we present RDLS, a novel deep-learning-based method for generic opinion-oriented extractive summarization of multiple documents. The method comprises a sentiment analysis embedding space (SAS), a text summarization embedding space (TSS) and an opinion summarizer module (OSM). SAS employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and overcome several flaws of traditional methods, in which word order and word information are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules and multiple strategies to overcome the existing drawbacks. TSS exploits multiple sources of statistical and linguistic knowledge features to augment word-level embeddings and extract a proper set of sentences from multiple documents. TSS also uses the Restricted Boltzmann Machine algorithm to enhance and optimize those features and improve the resulting accuracy without losing any important information. OSM consists of two phases, sentence classification and sentence selection, which work together to produce a useful summary. Experimental results show that RDLS outperforms other existing methods. Moreover, the ensemble of statistical and linguistic knowledge, sentiment knowledge, sentiment shifter rules and the word-embedding model allows RDLS to achieve significant accuracy.

22 citations


Journal ArticleDOI
TL;DR: Experiments on benchmark datasets show that the proposed summarization approach significantly outperforms relevant state-of-the-art baselines and the Semantic Link Network plays an important role in representing and understanding documents.
Abstract: The key to realizing advanced document summarization is the semantic representation of documents. This paper investigates the role of the Semantic Link Network in representing and understanding documents for multi-document summarization. It proposes a novel abstractive multi-document summarization framework that first transforms documents into a Semantic Link Network of concepts and events and then transforms the Semantic Link Network into the summary of the documents based on the selection of important concepts and events while keeping semantic coherence. Experiments on benchmark datasets show that the proposed summarization approach significantly outperforms relevant state-of-the-art baselines and that the Semantic Link Network plays an important role in representing and understanding documents.

19 citations


Proceedings ArticleDOI
Iftah Gamzu, Hila Gonen, Gilad Kutiel, Ran Levy, Eugene Agichtein
01 Jun 2021
TL;DR: This work suggests a novel task of extracting a single representative helpful sentence from a set of reviews for a given product, and describes a complete model that extracts representative helpful sentences with positive and negative sentiment towards the product and demonstrates that it outperforms several baselines.
Abstract: In recent years online shopping has gained momentum and become an important venue for customers wishing to save time and simplify their shopping process. A key advantage of shopping online is the ability to read what other customers are saying about products of interest. In this work, we aim to maintain this advantage in situations where extreme brevity is needed, for example, when shopping by voice. We suggest a novel task of extracting a single representative helpful sentence from a set of reviews for a given product. The selected sentence should meet two conditions: first, it should be helpful for a purchase decision, and second, the opinion it expresses should be supported by multiple reviewers. This task is closely related to Multi-Document Summarization in the product reviews domain but differs in its objective and its level of conciseness. We collect a dataset in English of sentence helpfulness scores via crowd-sourcing and demonstrate its reliability despite the inherent subjectivity involved. Next, we describe a complete model that extracts representative helpful sentences with positive and negative sentiment towards the product and demonstrate that it outperforms several baselines.

18 citations
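
As a rough illustration of the two selection criteria named above (helpfulness and support by multiple reviewers), the sketch below scores each review sentence by a weighted mix of a predicted helpfulness score and a support count. This is a hypothetical baseline, not the authors' trained model; the random embeddings stand in for a real sentence encoder and the helpfulness scores for a real helpfulness predictor.

import numpy as np

def support_count(idx, embeddings, threshold=0.8):
    # Count how many OTHER review sentences express a similar opinion
    # (cosine similarity above a threshold).
    norms = np.linalg.norm(embeddings, axis=1) + 1e-12
    e = embeddings[idx] / norms[idx]
    sims = (embeddings / norms[:, None]) @ e
    return int(np.sum(sims >= threshold)) - 1  # exclude the sentence itself

def pick_representative(sentences, embeddings, helpfulness_scores, alpha=0.5):
    # Pick the single sentence that balances predicted helpfulness with
    # how many reviewers support the same opinion.
    embeddings = np.asarray(embeddings, dtype=float)
    supports = np.array([support_count(i, embeddings) for i in range(len(sentences))], dtype=float)
    supports = supports / (supports.max() + 1e-9)   # normalize to [0, 1]
    scores = alpha * np.asarray(helpfulness_scores) + (1 - alpha) * supports
    return sentences[int(np.argmax(scores))]

# Toy usage with random vectors standing in for a real sentence encoder.
rng = np.random.default_rng(0)
reviews = ["Battery easily lasts two days.", "Battery life is great.", "Arrived late."]
embs = rng.normal(size=(3, 8))
print(pick_representative(reviews, embs, helpfulness_scores=[0.9, 0.8, 0.2]))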


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper develops an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session.
Abstract: Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session. Our framework includes a procedure of collecting real user sessions, as well as evaluation measures relying on summarization standards, but adapted to reflect interaction. All of our solutions and resources are available publicly as a benchmark, allowing comparison of future developments in interactive summarization, and spurring progress in its methodological evaluation. We demonstrate the use of our framework by evaluating and comparing baseline implementations that we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis motivate the proposed evaluation framework design and support its viability.

17 citations


Journal ArticleDOI
TL;DR: An unsupervised method for generic extractive multi-document summarization based on the sentence embedding representations and the centroid approach that outperforms several state-of-the-art methods and achieves promising results compared to the best performing methods including supervised deep learning based methods.
Abstract: Extractive multi-document summarization (MDS) is the process of automatically summarizing a collection of documents by ranking sentences according to their importance and informativeness. Text representation is a fundamental process that affects the effectiveness of many text summarization methods. Word embedding representations have been shown to be effective for several Natural Language Processing (NLP) tasks, including Automatic Text Summarization (ATS). However, most of these representations do not consider the order and the semantic relationships between words in a sentence. This does not fully allow grasping the sentence semantics and the syntactic relationships between sentence constituents. In this paper, to overcome this problem, we propose an unsupervised method for generic extractive multi-document summarization based on sentence embedding representations and the centroid approach. The proposed method selects relevant sentences according to the final score obtained by combining three scores: sentence content relevance, sentence novelty, and sentence position. The sentence content relevance score is computed as the cosine similarity between the centroid embedding vector of the cluster of documents and the sentence embedding vectors. The sentence novelty metric is explicitly adopted to deal with redundancy. The sentence position metric assumes that the first sentences of a document are more relevant to the summary and assigns high scores to these sentences. Moreover, this paper provides a comparative analysis of nine sentence embedding models used to represent sentences as dense vectors in a low-dimensional vector space in the context of extractive multi-document summarization. Experiments are performed on the standard DUC’2002–2004 benchmark datasets and the Multi-News dataset. The overall results show that our method outperforms several state-of-the-art methods and achieves promising results compared to the best performing methods, including supervised deep learning based methods.

14 citations
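
A minimal sketch of the scoring recipe described above, assuming generic sentence embeddings in place of the nine embedding models the paper compares; the weights, the novelty definition and the greedy loop are illustrative choices rather than the paper's exact formulation.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def centroid_summarize(sent_embs, positions, budget=3,
                       w_rel=0.6, w_nov=0.2, w_pos=0.2, sim_thresh=0.95):
    # Greedy extractive selection combining content relevance (cosine similarity
    # to the cluster centroid), novelty w.r.t. already selected sentences,
    # and sentence position within its document.
    sent_embs = np.asarray(sent_embs, dtype=float)
    centroid = sent_embs.mean(axis=0)
    relevance = np.array([cosine(e, centroid) for e in sent_embs])
    pos_score = 1.0 / (1.0 + np.asarray(positions, dtype=float))  # earlier sentences score higher
    selected = []
    while len(selected) < budget:
        best, best_score = None, -np.inf
        for i in range(len(sent_embs)):
            if i in selected:
                continue
            max_sim = max((cosine(sent_embs[i], sent_embs[j]) for j in selected), default=0.0)
            if max_sim > sim_thresh:       # near-duplicate of the current summary: skip
                continue
            novelty = 1.0 - max_sim
            score = w_rel * relevance[i] + w_nov * novelty + w_pos * pos_score[i]
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break
        selected.append(best)
    return selected

# Toy usage: 5 sentence embeddings (random stand-ins), positions within their documents.
rng = np.random.default_rng(1)
embs = rng.normal(size=(5, 16))
print(centroid_summarize(embs, positions=[0, 1, 2, 0, 1]))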


Proceedings Article
02 Mar 2021
TL;DR: In this article, two training datasets for query-focused multi-document summarization (QMDS) are constructed using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset; new hierarchical encoders for the query and multiple documents are also introduced.
Abstract: The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient largescale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.

14 citations


Journal ArticleDOI
TL;DR: Text representation is a fundamental cornerstone that impacts the effectiveness of several text summarization methods, and transfer learning using pre-trained word embedding models has shown promising results.
Abstract: Text representation is a fundamental cornerstone that impacts the effectiveness of several text summarization methods. Transfer learning using pre-trained word embedding models has shown promising ...

8 citations


Book ChapterDOI
21 Sep 2021
TL;DR: The CLEF 2021 SimpleText track as discussed by the authors addresses the opportunities and challenges of text simplification approaches to improve scientific information access head-on, and provides appropriate data and benchmarks, starting with pilot tasks in 2021, while creating a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.
Abstract: Information retrieval has moved from traditional document retrieval, in which search is an isolated activity, to modern information access, where search and the use of the information are fully integrated. But non-experts tend to avoid authoritative primary sources such as scientific literature due to their complex language, internal vernacular, or a lack of prior background knowledge. Text simplification approaches can remove some of these barriers, thereby helping to prevent users from relying on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track addresses the opportunities and challenges of text simplification approaches to improve scientific information access head-on. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to create a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.

8 citations


Book ChapterDOI
Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie
05 Sep 2021
TL;DR: This paper proposes an approach for automatically generating a dataset for both extractive and abstractive summaries, and designs a neural model, SIBERT, for extractive summarization that exploits the hierarchical nature of the input.
Abstract: Document summarization compresses source document(s) into succinct, information-preserving text. A variant of this is query-based multi-document summarization (QMDS), which targets summaries at specific informational needs, contextualized by the query. However, progress has been hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating datasets for both extractive and abstractive summaries and release a version publicly. Second, we design a neural model, SIBERT, for extractive summarization that exploits the hierarchical nature of the input. It also infuses queries to extract query-specific summaries. We evaluate this model on the CoMSum dataset, showing significant improvements in performance. This should provide a baseline and enable using CoMSum for future research on QMDS.

8 citations


Journal ArticleDOI
TL;DR: This paper proposes to leverage transfer learning from pre-trained sentence embedding models to represent documents’ sentences and users’ queries using embedding vectors that capture the semantic and the syntactic relationships between their constituents (words, phrases).
Abstract: Extractive query-focused multi-document summarization (QF-MDS) is the process of automatically generating an informative summary from a collection of documents that answers a given query. Sentence and query representation is a fundamental cornerstone that affects the effectiveness of several QF-MDS methods. Transfer learning using pre-trained word embedding models has shown promising performance in many applications. However, most of these representations do not consider the order and the semantic relationships between words in a sentence, and thus they do not carry the meaning of a full sentence. In this paper, to deal with this issue, we propose to leverage transfer learning from pre-trained sentence embedding models to represent documents’ sentences and users’ queries using embedding vectors that capture the semantic and the syntactic relationships between their constituents (words, phrases). Furthermore, BM25 and a semantic similarity function are linearly combined to retrieve a subset of sentences based on their relevance to the query. Finally, the maximal marginal relevance criterion is applied to re-rank the selected sentences by maintaining query relevance and minimizing redundancy. The proposed method is unsupervised, simple, efficient, and requires no labeled text summarization training data. Experiments are conducted using three standard datasets from the DUC evaluation campaign (DUC’2005–2007). The overall results show that our method outperforms several state-of-the-art systems and achieves comparable results to the best performing systems, including supervised deep learning-based methods.
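
A compact sketch of the retrieve-then-rerank pipeline described above. The BM25 scores are assumed to be precomputed by any standard implementation, the embeddings stand in for a pre-trained sentence-embedding model, and the combination weight and MMR trade-off are illustrative.

import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_rerank(sent_embs, relevance, k=5, lam=0.7):
    # Maximal Marginal Relevance: trade off query relevance against redundancy
    # with sentences already placed in the summary.
    selected, candidates = [], list(range(len(sent_embs)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((cos(sent_embs[i], sent_embs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

def qfmds_select(query_emb, sent_embs, bm25_scores, beta=0.5, k=5):
    # Linearly combine (normalized) BM25 and embedding similarity to the query,
    # then re-rank the candidates with MMR.
    bm25 = np.asarray(bm25_scores, dtype=float)
    bm25 = bm25 / (bm25.max() + 1e-12)
    semantic = np.array([cos(e, query_emb) for e in sent_embs])
    combined = beta * bm25 + (1 - beta) * semantic
    return mmr_rerank(sent_embs, combined, k=k)

# Toy usage with random embeddings and BM25 scores.
rng = np.random.default_rng(2)
query, sents = rng.normal(size=16), rng.normal(size=(10, 16))
print(qfmds_select(query, sents, bm25_scores=rng.random(10), k=3))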

Journal ArticleDOI
TL;DR: A survey of extractive multi-document summarization can be found in this paper, where the authors present an extensive survey of the extractive MDS techniques over the last decade and compare their strengths and weaknesses.
Abstract: With the tremendous growth in the number of electronic documents, it is becoming challenging to manage the volume of information. Much research has focused on automatically summarizing the information available in the documents. Multi-Document Summarization (MDS) is one approach that aims to extract the information from the available documents in such a concise way that none of the important points are missed from the summary while avoiding the redundancy of information at the same time. This study presents an extensive survey of extractive MDS over the last decade to show the progress of research in this field. We present different techniques of extractive MDS and compare their strengths and weaknesses. Research work is presented by category and evaluated to help the reader understand the work in this field and to guide them in defining their own research directions. Benchmark datasets and standard evaluation techniques are also presented. This study concludes that most of the extractive MDS techniques are successful in developing salient and information-rich summaries of the documents provided.

Journal ArticleDOI
TL;DR: The authors address the ability to convey relevant and diverse information, which is critical in multi-document summarization and yet remains elusive for neural seq-to-seq models, whose outputs are often redundant.
Abstract: The ability to convey relevant and diverse information is critical in multi-document summarization and yet remains elusive for neural seq-to-seq models whose outputs are often redundant and fail to...

Posted Content
TL;DR: This article presents a comprehensive comparison of several transformer-based pre-trained models for text summarization, using the BBC news dataset and its human-generated summaries to evaluate and compare the summaries generated by the models.
Abstract: The amount of text data available online is increasing at a very fast pace; hence, text summarization has become essential. Most modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a very tiresome and time-consuming task. Hence, generating automated summaries for the data and using them to train machine learning models will make these models space- and time-efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences from the original document and extracts only those from the text, whereas in abstractive summarization the summary is generated after interpreting the original text, making it more complicated. In this paper, we present a comprehensive comparison of a few transformer architecture based pre-trained models for text summarization. For analysis and comparison, we have used the BBC news dataset, which contains text data that can be used for summarization together with human-generated summaries for evaluating and comparing the summaries generated by the machine learning models.

Posted Content
TL;DR: In this article, the authors propose a comparative summarization framework CoCoSum, which consists of two few-shot summarization models that are jointly used to generate contrastive and common summaries.
Abstract: Opinion summarization focuses on generating summaries that reflect popular opinions of multiple reviews for a single entity (e.g., a hotel or a product). While generated summaries offer general and concise information about a particular entity, the information may be insufficient to help the user compare multiple entities. Thus, the user may still struggle with the question "Which one should I pick?" In this paper, we propose a comparative opinion summarization task, which is to generate two contrastive summaries and one common summary from two given sets of reviews from different entities. We develop a comparative summarization framework, CoCoSum, which consists of two few-shot summarization models that are jointly used to generate contrastive and common summaries. Experimental results on a newly created benchmark, CoCoTrip, show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Compared to conventional methods, the hybrid generation approach inspired by traditional concept-to-text systems leads to more faithful, relevant and aggregation-sensitive summarization – while being equally fluent.
Abstract: We present a method for generating comparative summaries that highlight similarities and contradictions in input documents. The key challenge in creating such summaries is the lack of large parallel training data required for training typical summarization systems. To this end, we introduce a hybrid generation approach inspired by traditional concept-to-text systems. To enable accurate comparison between different sources, the model first learns to extract pertinent relations from input documents. The content planning component uses deterministic operators to aggregate these relations after identifying a subset for inclusion into a summary. The surface realization component lexicalizes this information using a text-infilling language model. By separately modeling content selection and realization, we can effectively train them with limited annotations. We implemented and tested the model in the domain of nutrition and health – rife with inconsistencies. Compared to conventional methods, our framework leads to more faithful, relevant and aggregation-sensitive summarization – while being equally fluent.

Journal ArticleDOI
01 Apr 2021
TL;DR: This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates the summary based on temporally related events extracted from multiple documents and finds that the performance of MDTS is better when compared with other methods.
Abstract: The Internet consists of a massive amount of information, and handling it is a tedious task. Summarization plays a crucial role in extracting or abstracting key content from multiple sources while retaining its meaning, thereby reducing the complexity of handling the information. Multi-document summarization gives the gist of the content collected from multiple documents, while temporal summarization concentrates on temporally related events. This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates the summary based on temporally related events extracted from multiple documents. This technique extracts events with their time stamps, using TimeML standard tags for events and times. These event-times are stored in a structured database form for easier operations. Sentence ranking methods are built based on the frequency of event occurrences in the sentence, and sentence similarity measures are computed to eliminate redundant sentences from the extracted summary. Depending on the required summary length, top-ranked sentences are selected to form the summary. Experiments are conducted on the DUC 2006 and DUC 2007 datasets released for the multi-document summarization task. The extracted summaries are evaluated using ROUGE to determine the precision, recall and F-measure of the generated summaries. The performance of the proposed method is compared with a particle swarm optimization-based algorithm (PSOS), cat swarm optimization-based summarization (CSOS), and Cuckoo Search based multi-document summarization (MDSCSA). It is found that the performance of MDTS is better than that of the other methods. DOI: 10.28991/esj-2021-01268
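
The ranking and redundancy-elimination steps can be illustrated with a small sketch: sentences are scored by the corpus-wide frequency of the events they mention, and near-duplicate sentences are skipped. The TimeML-based event extraction is assumed to have happened upstream, so events are passed in here as plain lists; thresholds are illustrative.

from collections import Counter

def rank_by_event_frequency(sentences, sentence_events):
    # Score each sentence by the corpus-wide frequency of the events it mentions.
    event_freq = Counter(ev for evs in sentence_events for ev in evs)
    scores = [sum(event_freq[ev] for ev in evs) for evs in sentence_events]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

def build_summary(sentences, sentence_events, max_sents=3, sim_thresh=0.6):
    # Take top-ranked sentences, skipping any that are too similar to
    # sentences already chosen (redundancy elimination).
    chosen = []
    for i in rank_by_event_frequency(sentences, sentence_events):
        if all(jaccard(sentences[i], sentences[j]) < sim_thresh for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sents:
            break
    return [sentences[i] for i in chosen]

# Toy usage; sentence_events would come from a TimeML-style event extractor upstream.
sents = ["The flood hit the city on Monday.",
         "Rescue teams arrived after the flood.",
         "The flood hit the city on Monday morning."]
events = [["flood"], ["arrive", "flood"], ["flood"]]
print(build_summary(sents, events, max_sents=2))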

Posted Content
TL;DR: This work proposes a novel reinforcement learning based framework, PoBRL, for solving multi-document summarization, which decouples this multi-objective optimization into different sub-problems that can be solved individually by reinforcement learning.
Abstract: We propose a novel reinforcement learning based framework, PoBRL, for solving multi-document summarization. PoBRL jointly optimizes over the following three objectives necessary for a high-quality summary: importance, relevance, and length. Our strategy decouples this multi-objective optimization into different subproblems that can be solved individually by reinforcement learning. Utilizing PoBRL, we then blend the learned policies together to produce a summary that is a concise and complete representation of the original input. Our empirical analysis shows state-of-the-art performance on several multi-document datasets. Human evaluation also shows that our method produces high-quality output.

Book ChapterDOI
13 Aug 2021
TL;DR: Wang et al. as mentioned in this paper proposed a query expansion method that combines multiple query expansion methods to better represent query information and, at the same time, makes a useful attempt at manifold ranking.
Abstract: Manifold ranking has been successfully applied in query-oriented multi-document summarization. It makes use not only of the relationships among the sentences, but also of the relationships between the given query and the sentences. However, the information in the original query is often insufficient, so we present a query expansion method that is combined with manifold ranking to resolve this problem. Our method not only utilizes the information of the query term itself and the knowledge base WordNet to expand it by synonyms, but also uses the information of the document set itself to expand the query in various ways (mean expansion, variance expansion and TextRank expansion). Compared with previous query expansion methods, our method combines multiple query expansion methods to better represent query information and, at the same time, makes a useful attempt at manifold ranking. In addition, we use the degree of word overlap and the proximity between words to calculate the similarity between sentences. We performed experiments on the DUC 2006 and DUC 2007 datasets, and the evaluation results show that the proposed query expansion method can significantly improve system performance and make our system comparable to state-of-the-art systems.
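
For readers unfamiliar with manifold ranking, the sketch below shows the score-propagation step, f <- alpha * S * f + (1 - alpha) * y, where S is a normalized sentence-similarity matrix and y is the prior relevance to the (expanded) query. Query expansion and the proximity component of the similarity are assumed to be handled elsewhere; plain word overlap is used here for brevity.

import numpy as np

def overlap_similarity(s1, s2):
    # Simple word-overlap similarity; the paper additionally uses word proximity.
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / np.sqrt(len(w1) * len(w2))

def manifold_ranking(sentences, expanded_query, alpha=0.85, iters=50):
    n = len(sentences)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = overlap_similarity(sentences[i], sentences[j])
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    d = W.sum(axis=1) + 1e-12
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    # Prior relevance of each sentence to the (expanded) query.
    y = np.array([overlap_similarity(s, expanded_query) for s in sentences])
    f = y.copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y   # score propagation over the sentence graph
    return f

sents = ["The new vaccine reduced infections sharply.",
         "Football season starts next week.",
         "Trials showed the vaccine cut infection rates."]
scores = manifold_ranking(sents, expanded_query="vaccine infection rates trial")
print(sorted(range(len(sents)), key=lambda i: float(scores[i]), reverse=True))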

Proceedings ArticleDOI
08 Apr 2021
TL;DR: In this paper, a transformer model is used to generate individual sentence summaries of the respective review text, and a combination of the Universal Sentence Encoder, statistical methods and a graph reduction algorithm is then used to select the most relevant sentences to best represent the whole text.
Abstract: With abundant user data and computing power, using machine learning to automate or simplify tasks has become a recent trend. These resources are useful for finding correlations in data and drawing conclusions. The field of AI/ML has progressed rapidly over the past decade, with areas like computer vision and natural language processing at the forefront. Text summarization is one of the tasks that has been explored extensively, but practical applications have largely been limited to news or book summarization based on extractive methods. One of the main disadvantages of abstractive summarization is that too much focus is placed on generating good results for a particular sentence and too little on a corpus containing thousands of such sentences. There has been some work on multi-document summarization, but it does not account for the fact that the text can change or be appended with new text, which makes those methods obsolete. The proposed method, by contrast, can be applied to a large corpus containing thousands of entities/sentences as well as to a changing text corpus. New data can be added separately without rerunning the model on the whole corpus, giving the power of batch processing, which can be leveraged according to space and time constraints; to the best of our knowledge, there has been no prior research on this. A transformer model is used to generate individual sentence summaries of the respective review text, and a combination of the Universal Sentence Encoder, statistical methods and a graph reduction algorithm is then used to select the most relevant sentences to best represent the whole text. The same mechanism is applied to incorporate new data added to the corpus, with little effect on accuracy. Our results show that even when increasing the degree of contraction of the text corpus (particularly for large text corpora), the same accuracy can be achieved.
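
The abstract names a graph reduction algorithm without spelling it out, so the sketch below shows one plausible wiring under stated assumptions: sentence embeddings (random vectors stand in for Universal Sentence Encoder output) feed a cosine-similarity graph, degree centrality picks representative sentences, and new batches are appended without reprocessing the whole corpus. This illustrates the batch-processing idea only; it is not the authors' exact pipeline.

import numpy as np

def select_representatives(embs, k=3, sim_thresh=0.3):
    # Build a cosine-similarity graph over sentence embeddings and keep the k
    # sentences with the highest degree centrality (the most "connected" ones).
    normed = embs / (np.linalg.norm(embs, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)
    degree = (sim > sim_thresh).sum(axis=1)
    return list(np.argsort(-degree)[:k])

def add_batch(old_embs, new_embs):
    # New sentences are appended; only similarities involving the new rows
    # need recomputing when the selection is refreshed.
    return np.vstack([old_embs, new_embs])

# Toy usage with random vectors standing in for Universal Sentence Encoder output.
rng = np.random.default_rng(3)
corpus = rng.normal(size=(20, 16))
print(select_representatives(corpus, k=3))
corpus = add_batch(corpus, rng.normal(size=(5, 16)))
print(select_representatives(corpus, k=3))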

Posted Content
TL;DR: The authors analyzed the paragraph-level attention weights of GraphSum's multi-heads and decoding layers in order to improve the explainability of a transformer-based multi-document summarization (MDS) model.
Abstract: Modern multi-document summarization (MDS) methods are based on transformer architectures. They generate state of the art summaries, but lack explainability. We focus on graph-based transformer models for MDS as they gained recent popularity. We aim to improve the explainability of the graph-based MDS by analyzing their attention weights. In a graph-based MDS such as GraphSum, vertices represent the textual units, while the edges form some similarity graph over the units. We compare GraphSum's performance utilizing different textual units, i.e., sentences versus paragraphs, on two news benchmark datasets, namely WikiSum and MultiNews. Our experiments show that paragraph-level representations provide the best summarization performance. Thus, we subsequently focus on analyzing the paragraph-level attention weights of GraphSum's multi-heads and decoding layers in order to improve the explainability of a transformer-based MDS model. As a reference metric, we calculate the ROUGE scores between the input paragraphs and each sentence in the generated summary, which indicate source origin information via text similarity. We observe a high correlation between the attention weights and this reference metric, especially on the later decoding layers of the transformer architecture. Finally, we investigate if the generated summaries follow a pattern of positional bias by extracting which paragraph provided the most information for each generated summary. Our results show that there is a high correlation between the position in the summary and the source origin.
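
The analysis reduces to correlating two aligned signals per generated sentence: the attention mass each input paragraph received, and a ROUGE-style overlap between that paragraph and the sentence. The sketch below uses a crude unigram overlap in place of ROUGE and assumes the paragraph-level attention matrix has already been exported from the model.

import numpy as np

def unigram_overlap(summary_sentence, paragraph):
    # Fraction of summary-sentence unigrams found in the paragraph; a crude
    # stand-in for ROUGE between a source paragraph and a summary sentence.
    s = summary_sentence.lower().split()
    p = set(paragraph.lower().split())
    return sum(w in p for w in s) / (len(s) or 1)

def attention_vs_overlap_correlation(attention, paragraphs, summary_sentences):
    # attention: [n_summary_sentences, n_paragraphs] aggregated paragraph-level
    # attention mass exported from the model (an assumed input here).
    overlap = np.array([[unigram_overlap(s, p) for p in paragraphs]
                        for s in summary_sentences])
    att = np.asarray(attention, dtype=float).ravel()
    return float(np.corrcoef(att, overlap.ravel())[0, 1])

# Toy usage with a hand-made attention matrix.
paras = ["the storm damaged the harbour overnight", "markets rallied on friday"]
summ = ["the harbour was damaged by the storm", "stocks rose sharply on friday"]
attn = [[0.9, 0.1], [0.2, 0.8]]
print(round(attention_vs_overlap_correlation(attn, paras, summ), 3))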

Posted Content
TL;DR: In this article, a comparison between different multilingual and monolingual BERT models for extractive text summarization in Vietnamese is presented. And the experiment results indicate that monolinguistic models produce promising results compared to other multilingual models and previous text summarizing models for Vietnamese.
Abstract: Recent research has demonstrated that BERT shows potential in a wide range of natural language processing tasks. It is adopted as an encoder for many state-of-the-art automatic summarization systems, which achieve excellent performance. However, so far, not much work has been done for Vietnamese. In this paper, we showcase how BERT can be implemented for extractive text summarization in Vietnamese. We introduce a novel comparison between different multilingual and monolingual BERT models. The experimental results indicate that monolingual models produce promising results compared to other multilingual models and previous text summarization models for Vietnamese.

Proceedings ArticleDOI
20 Jan 2021
TL;DR: A new Social Spider Optimization (SSO) algorithm based multi-document summarization model is proposed, which involves preprocessing, input representation, and summary representation.
Abstract: At present, text summarization is considered an effective technique for extracting useful data from a massive quantity of documents. Depending upon the number of documents involved in the summarization process, it is classified as single- or multi-document summarization. Compared to single-document summarization, multi-document summarization remains the more difficult task of finding a precise summary across many documents. A new Social Spider Optimization (SSO) algorithm based multi-document summarization model is proposed. This model involves preprocessing, input representation, and summary representation. The aim of the summary representation process is the generation of a summary of the documents that retains the meaningful data. Through an optimal sentence selection procedure using the SSO algorithm, the essential sentences representing the summary are chosen. A detailed experimental validation is carried out using the DUC 2006 and 2007 datasets. The experimental results verified the effectiveness of the SSO algorithm compared to other optimization algorithms.

Proceedings Article
Chen Moye, Wei Li, Jiachen Liu, Xinyan Xiao, Hua Wu, Haifeng Wang
25 Oct 2021
TL;DR: SgSum, a sub-graph selection framework for multi-document summarization, regards source documents as a relation graph of sentences (e.g., a similarity graph or discourse graph) whose candidate summaries are its sub-graphs; instead of selecting salient sentences, it selects a salient sub-graph from the relation graph as the summary.
Abstract: Most existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary, which has two main drawbacks: (1) neglecting both the intra- and cross-document relations between sentences; (2) neglecting the coherence and conciseness of the whole summary. In this paper, we propose a novel MDS framework (SgSum) that formulates the MDS task as a sub-graph selection problem, in which source documents are regarded as a relation graph of sentences (e.g., a similarity graph or discourse graph) and the candidate summaries are its sub-graphs. Instead of selecting salient sentences, SgSum selects a salient sub-graph from the relation graph as the summary. Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent. Extensive experiments on the MultiNews and DUC datasets show that our proposed method brings substantial improvements over several strong baselines. Human evaluation results also demonstrate that our model can produce significantly more coherent and informative summaries compared with traditional MDS methods. Moreover, the proposed architecture has strong transfer ability from single- to multi-document input, which can reduce the resource bottleneck in MDS tasks.
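
A toy illustration of framing summarization as sub-graph selection: candidate summaries are small sentence sets, and each candidate is scored by the salience of its nodes plus the edge weights of the sub-graph it induces in a sentence relation graph. This brute-force version conveys the formulation only; it is not the SgSum neural architecture, and the weights are illustrative.

import numpy as np
from itertools import combinations

def best_subgraph_summary(sentences, sim_matrix, salience, k=2, w_node=1.0, w_edge=0.5):
    # Enumerate k-sentence candidates and score each candidate sub-graph by the
    # salience of its nodes plus the weight of the edges it induces.
    sim = np.asarray(sim_matrix, dtype=float)
    best, best_score = None, -np.inf
    for cand in combinations(range(len(sentences)), k):
        node_score = sum(salience[i] for i in cand)
        edge_score = sum(sim[i, j] for i, j in combinations(cand, 2))
        score = w_node * node_score + w_edge * edge_score
        if score > best_score:
            best, best_score = cand, score
    return [sentences[i] for i in best]

sents = ["The dam broke overnight.",
         "Flooding forced thousands to evacuate.",
         "The league postponed two matches."]
sim = [[0.0, 0.6, 0.1],
       [0.6, 0.0, 0.1],
       [0.1, 0.1, 0.0]]
salience = [0.8, 0.7, 0.3]
print(best_subgraph_summary(sents, sim, salience, k=2))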

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors proposed Abstractive Text Summarization using Deep Learning with Attention Mechanism, which removes duplicate data and generates new sentences by rephrasing them or adding words originally absent in the source text.
Abstract: With the advancement of the Internet, many people depend on the Web to obtain the information they need. As data increases exponentially, there is a high chance of duplication, and manually reading all the documents, rejecting duplicates and extracting useful information is difficult and tedious. One solution to this issue is “Text Summarization,” through which a huge volume of data can be read quickly; but since summarizing documents manually is very hard, an automatic tool is needed to perform this task. Abstractive text summarization is one such automated technique for producing a short and accurate summary of a document while preserving essential information and comprehensive meaning. In this paper, abstractive text summarization using deep learning with an attention mechanism is proposed. The designed framework removes duplicate data and generates new sentences by rephrasing them or adding words originally absent in the source text. Experimental results on the Amazon Fine Food Review dataset are evaluated using performance metrics such as ROUGE scores.

Journal ArticleDOI
10 Apr 2021
TL;DR: The proposed model makes use of a naive Bayes classifier (NBC) for CLMDS, enabling the user to provide a query in Tamil, generate a summary from multiple English documents, and finally translate the summary into Tamil.
Abstract: Cross-Language Multi-Document Summarization (CLMDS) produces a summary generated from multiple documents in which the summary language is different from the source document language. The CLMDS model allows the user to provide a query in a particular language (e.g., Tamil) and generates a summary in that language from source documents in a different language. The proposed model enables the user to provide a query in Tamil, generate a summary from multiple English documents, and finally translate the summary into Tamil. The proposed model makes use of a naive Bayes classifier (NBC) for the CLMDS task. An extensive experimental analysis was performed and the results were investigated from distinct aspects. The experimental results confirmed the superiority of the presented CLMDS model.

Journal ArticleDOI
TL;DR: A novel multi-document summarization method based on sentence overlapping, using pairwise overlap between sentences, is proposed, and the results are promising when compared with some existing methods.
Abstract: Extractive multi-document summarization receives a set of documents and extracts the important sentences to form a summary. This paper proposes a novel multi-document summarization method based on sentence overlapping. First, we preprocess the documents and calculate 12 features for each sentence. This paper introduces four new features, the ROUGE-1 and ROUGE-2 scores between the sentence and a single document and the ROUGE-1 and ROUGE-2 scores between the sentence and multiple documents, as well as a new definition of the sentence overlapping feature. Then, we assign each sentence a score using the learned model. We calculate the pairwise overlap between sentences and finally select the sentences with higher scores and less redundancy. These sentences form the final summary, produced under a length constraint. Our method is language-independent and can be applied to other languages with minor changes. The proposed method is tested on the DUC 2006 and 2007 datasets. The effectiveness of this technique is measured using the ROUGE score, and the results are promising when compared with some existing methods.
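
A small sketch of the final selection step described above: sentence scores are assumed to come from the learned model, and sentences are added greedily under a length constraint while skipping those that overlap too much with what is already selected. The unigram-F1 overlap, weights and thresholds are illustrative assumptions.

def unigram_f1(a, b):
    # ROUGE-1-style unigram F1 between two sentences.
    ta, tb = a.lower().split(), b.lower().split()
    common = len(set(ta) & set(tb))
    if common == 0:
        return 0.0
    p, r = common / len(tb), common / len(ta)
    return 2 * p * r / (p + r)

def select_summary(sentences, model_scores, max_words=20, overlap_thresh=0.5):
    # Greedy selection: highest-scoring sentences first, skip redundant ones,
    # stop adding a sentence if it would exceed the word budget.
    order = sorted(range(len(sentences)), key=lambda i: model_scores[i], reverse=True)
    chosen, words = [], 0
    for i in order:
        length = len(sentences[i].split())
        if words + length > max_words:
            continue
        if any(unigram_f1(sentences[i], sentences[j]) > overlap_thresh for j in chosen):
            continue
        chosen.append(i)
        words += length
    return [sentences[i] for i in sorted(chosen)]

sents = ["The election results were announced on Sunday.",
         "Results of the election were announced Sunday.",
         "Turnout reached a record high."]
print(select_summary(sents, model_scores=[0.9, 0.85, 0.6], max_words=15))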