Extraction of relevant figures and tables for multi-document summarization

doi:10.1007/978-3-642-28601-8_34

Book Chapter•DOI•

Ashish Sadh¹, Amit Sahu¹, Devesh Srivastava¹, Ratna Sanyal¹, Sudip Sanyal¹

Indian Institute of Information Technology, Allahabad¹

11 Mar 2012-pp 402-413

TL;DR: Evaluation experiments show that the system-generated ranked list correlates with the human evaluators' ranking judgments to a statistically significant degree, and the feasibility of using the proposed system to summarize a document set in which figures/tables are salient units is made clear.

Abstract: We propose a system that extracts the most relevant figures and tables from a set of topically related source documents. These are then integrated into the extractive text summary produced from the same set. The proposed method is domain independent. It focuses predominantly on generating a ranked list of relevant candidate units (figures/tables), ordered by their computed relevancy. The relevancy measure is based on local and global scores that include direct and indirect references. To test the system's performance, we created a test collection of document sets that do not adhere to any specific domain. Evaluation experiments show that the system-generated ranked list correlates with the human evaluators' ranking judgments to a statistically significant degree. The feasibility of using the proposed system to summarize a document set in which figures/tables are salient units is made clear in our concluding remarks.
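The abstract does not say which correlation statistic was used to compare the system ranking with the human rankings. As an illustration only, the sketch below computes Spearman's rank correlation, one common choice for comparing two rankings. The function names and the sample scores are hypothetical, ties are assumed not to occur, and no significance test is included.

```python
def ranks(scores):
    """Return 1-based ranks (1 = highest score). Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(system_scores, human_scores):
    """Spearman's rho for two equal-length score lists without ties."""
    n = len(system_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    system_ranks = ranks(system_scores)
    human_ranks = ranks(human_scores)
    # Sum of squared rank differences, then the closed-form no-ties formula.
    d_squared = sum((s - h) ** 2 for s, h in zip(system_ranks, human_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data: relevancy scores for five candidate figures/tables.
system = [0.91, 0.40, 0.75, 0.10, 0.55]   # assumed system scores
human = [5, 2, 3, 1, 4]                   # assumed human grades, higher = more relevant
rho = spearman_rho(system, human)
```

A value of `rho` near 1 means the two rankings largely agree; whether that agreement is statistically significant depends on the number of ranked items and needs a separate test.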

Citations

PubMed Central

黄亚明

01 Jun 2009

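TL;DR: PubMed Central (PMC) is a free digital archive of biomedical and life sciences journal literature, developed and maintained by the National Center for Biotechnology Information at the U.S. National Library of Medicine.

Abstract: PubMed Central (PMC) is a free digital archive of biomedical and life sciences journal literature, developed and maintained by the National Center for Biotechnology Information at the National Library of Medicine of the U.S. National Institutes of Health. Its mission is to take on the role of a world-class library in the digital age. It is not a journal publisher. Publishers join PMC voluntarily and must meet certain standards of scientific and editorial quality.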

108 citations

Journal Article•DOI•

QMOS: Query-based multi-documents opinion-oriented summarization

Asad Abdi¹, Siti Mariyam Shamsuddin¹, Ramiz M. Aliguliyev²

Universiti Teknologi Malaysia¹, Azerbaijan National Academy of Sciences²

01 Mar 2018-Information Processing and Management

TL;DR: QMOS is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. It combines multiple sentiment dictionaries to overcome the limited word coverage of any individual lexicon and employs the Semantic Sentiment Approach.

Abstract: Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which combines sentiment analysis and summarization approaches. It is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. QMOS combines multiple sentiment dictionaries to overcome the limited word coverage of any individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word given by a lexicon and the word's polarity in a specific context, because the polarity of a word depends on the context in which it is used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. To tackle these challenges, QMOS integrates multiple strategies to adjust a word's prior sentiment orientation while also considering the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word that is not included in a sentiment lexicon. Most existing methods fail to distinguish the meaning of a review sentence from that of a user's query when the two share a similar bag of words; hence there is often a conflict between the extracted opinionated sentences and users' needs. The summarization phase of QMOS, however, avoids extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs a greedy algorithm to reduce redundancy and a query expansion approach to bridge the lexical gaps between similar contexts expressed in different wording. Our experiment shows that the QMOS method significantly improves performance, making QMOS comparable to other existing methods.

43 citations

Journal Article•DOI•

A hybrid deep learning architecture for opinion-oriented multi-document summarization based on multi-feature fusion

Asad Abdi¹,², Shafaatunnur Hasan², Siti Mariyam Shamsuddin², Norisma Idris³, Jalil Piran⁴

University of Twente¹, Universiti Teknologi Malaysia², Information Technology University³, Sejong University⁴

15 Feb 2021-Knowledge Based Systems

TL;DR: The paper presents RDLS, a novel deep-learning-based method for generic opinion-oriented extractive multi-document summarization, which comprises a sentiment analysis embedding space (SAS), a text summarization embedding space (TSS) and an opinion summarizer module (OSM).

Abstract: Opinion summarization is a process that produces concise summaries from a large number of opinionated texts. In this paper, we present a novel deep-learning-based method for generic opinion-oriented extractive multi-document summarization (also known as RDLS). The method comprises a sentiment analysis embedding space (SAS), a text summarization embedding space (TSS) and an opinion summarizer module (OSM). SAS employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and overcome several flaws in traditional methods, in which word order and information about a word are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules and multiple strategies to overcome the existing drawbacks. TSS exploits multiple sources of statistical and linguistic knowledge features to augment word-level embeddings and extract a proper set of sentences from multiple documents. TSS also uses the Restricted Boltzmann Machine algorithm to enhance and optimize those features and improve the resulting accuracy without losing any important information. OSM consists of two phases, sentence classification and sentence selection, which work together to produce a useful summary. Experimental results show that RDLS outperforms other existing methods. Moreover, the ensemble of statistical and linguistic knowledge, sentiment knowledge, sentiment shifter rules and a word-embedding model allows RDLS to achieve significant accuracy.

22 citations

Patent•

Generating language sections from tabular data

Amit P. Bohra¹, Krishna Kummamuru¹, Alexander Pikovsky¹, Swapnasarit Sahu¹

IBM¹

13 Jun 2014

TL;DR: A computer-implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize the tabular data.

Abstract: A computer-implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize tabular data. The method may include identifying a content characteristic for each category of the set of categories in the first tabular portion. The method may also include generating a first language section from at least two distinct categories of the set of categories, wherein a format of the first language section is based on the content characteristics for the at least two distinct categories.
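The three steps in the patent abstract (identify categories, identify a content characteristic per category, generate a language section from at least two categories) can be illustrated with a toy sketch. This is not the patented method: the two-way "numeric"/"text" characteristic, the sentence template, and all names below are assumptions made for illustration.

```python
def content_characteristic(values):
    """Classify a column as 'numeric' or 'text' (a deliberately coarse, assumed scheme)."""
    try:
        for value in values:
            float(value)
        return "numeric"
    except ValueError:
        return "text"


def generate_sentences(header, rows):
    """Build one sentence per row from the first text category and the first numeric category."""
    # Step 1: the header cells are treated as the categories that organize the table.
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    # Step 2: one content characteristic per category.
    kinds = {name: content_characteristic(values) for name, values in columns.items()}
    text_cols = [name for name in header if kinds[name] == "text"]
    num_cols = [name for name in header if kinds[name] == "numeric"]
    if not text_cols or not num_cols:
        return []
    # Step 3: the sentence format depends on the characteristics of the two chosen categories.
    subject, measure = text_cols[0], num_cols[0]
    s_idx, m_idx = header.index(subject), header.index(measure)
    return [f"The {measure} for {subject} {row[s_idx]} is {row[m_idx]}." for row in rows]


# Hypothetical table used only to exercise the sketch.
header = ["city", "population"]
rows = [["Oslo", "700000"], ["Bergen", "285000"]]
sentences = generate_sentences(header, rows)
```

Here `sentences` holds one generated sentence per table row; a fuller treatment would need more characteristics (dates, units, ranges) and more varied templates.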

19 citations

Journal Article•

Image retrieval: Techniques, challenge, and trend

Hui Hui Wang¹, Dzulkifli Mohamad¹, Nor Azman Ismail

Universiti Teknologi Malaysia¹

01 Dec 2009-World academy of science, engineering and technology

TL;DR: This paper discusses the evolution of image retrieval techniques, focusing on their development, challenges and trends, and highlights both the issues already addressed and those still outstanding.

Abstract: This paper discusses the evolution of image retrieval techniques, focusing on their development, challenges and trends. It highlights both the issues already addressed and those still outstanding. The explosive growth of image data has created a need for research and development in image retrieval. Image retrieval research is moving from keywords, to low-level features, and on to semantic features. The drive towards semantic features stems from two problems: keywords can be very subjective and time-consuming to assign, while low-level features cannot always describe the high-level concepts in users' minds.

Keywords: content based image retrieval, keyword based image retrieval, semantic gap, semantic image retrieval.

2 citations

References

Book Chapter•DOI•

Developing a test collection for the evaluation of integrated search

Marianne Lykke, Birger Larsen, Haakon Lund, Peter Ingwersen

28 Mar 2010

TL;DR: The characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, are discussed, and a new test collection constructed for this purpose is described and analysed.

Abstract: The poster discusses the characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, and describes and analyses a new test collection constructed for this purpose. The test collection consists of approx. 18,000 monographic records, 160,000 papers and journal articles in PDF and 275,000 abstracts with a varied set of metadata and vocabularies from the physics domain, 65 topics based on real work tasks and corresponding graded relevance assessments. The test collection may be used for systems- as well as user-oriented evaluation.

1,039 citations

Proceedings Article•DOI•

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev¹, Hongyan Jing², Malgorzata Budzikowska³

University of Michigan¹, Columbia University², IBM³

30 Apr 2000

TL;DR: A multi-document summarizer, called MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and two new techniques, based on sentence utility and subsumption, are described.

Abstract: We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.
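As a rough illustration of the centroid idea named in this abstract, the sketch below scores each sentence by the summed centroid weight of its words and keeps the top-scoring ones. It is a simplification under stated assumptions, not MEAD itself: the centroid is built from the input documents rather than taken from a topic detection and tracking system, the IDF is estimated from the same small cluster, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def centroid(documents):
    """Map each word to average term frequency times a smoothed IDF, computed over the cluster."""
    n_docs = len(documents)
    doc_tokens = [tokenize(doc) for doc in documents]
    doc_freq = Counter(word for tokens in doc_tokens for word in set(tokens))
    total_tf = Counter(word for tokens in doc_tokens for word in tokens)
    return {
        word: (total_tf[word] / n_docs) * math.log((1 + n_docs) / doc_freq[word])
        for word in total_tf
    }


def rank_sentences(documents, top_k=2):
    """Return the top_k sentences by summed centroid weight of their words."""
    weights = centroid(documents)
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    scored = [(sum(weights.get(w, 0.0) for w in tokenize(s)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


# Hypothetical two-document cluster used only to exercise the sketch.
docs = [
    "The flood damaged the bridge. Rescue teams arrived.",
    "The flood closed the bridge for days.",
]
top = rank_sentences(docs, top_k=2)
```

Because the score is a plain sum, longer sentences are favoured; a practical system would normally add length normalization, positional features and redundancy checks.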

493 citations

Posted Content•

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev¹, Hongyan Jing², Malgorzata Budzikowska³

University of Michigan¹, Columbia University², IBM³

12 May 2000-arXiv: Computation and Language

TL;DR: The authors present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system, and describe two new techniques, based on sentence utility and subsumption, which are applied to the evaluation of both single and multiple document summaries.

488 citations

Proceedings Article•

Proceedings of the 17th ACM conference on Information and knowledge management

James G. Shanahan, Sihem Amer-Yahia¹, Ioana Manolescu², Yi Zhang³, David A. Evans, Alek Kolcz⁴, Key-Sun Choi⁵, Abdur Chowdury⁶

Yahoo!¹, French Institute for Research in Computer Science and Automation², University of California, Santa Cruz³, Microsoft⁴, KAIST⁵, Twitter⁶

26 Oct 2008

TL;DR: Front matter for CIKM'08, the 17th ACM Conference on Information and Knowledge Management, which brings together researchers and developers from the database, information retrieval, and knowledge management communities; the program committee accepted 132 papers and 103 posters.

Abstract: It is our great pleasure to welcome you to the 17th ACM Conference on Information and Knowledge Management -- CIKM'08. Since 1992, the ACM Conference on Information and Knowledge Management (CIKM) has been successfully bringing together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high quality, applied and theoretical research findings. In CIKM 2008, we continued the tradition of promoting collaboration among the general areas of databases, information retrieval, and knowledge management. This year's call for papers attracted almost 800 submissions from Asia, Canada, Europe, Africa, and the United States. The program committee accepted 132 papers and 103 posters giving CIKM'08 an acceptance rate of 17%.

281 citations

Proceedings Article•DOI•

Group-by skyline query processing in relational engines

Ming-Hay Luk¹, Man Lung Yiu¹, Eric Lo¹

Hong Kong Polytechnic University¹

02 Nov 2009

TL;DR: The authors present a comprehensive study on processing group-by skyline queries in the context of relational engines, examine the composition of a query plan for a group-by skyline query, and develop the missing cost model for the BBS algorithm.

Abstract: The skyline operator was first proposed in 2001 for retrieving interesting tuples from a dataset. Since then, 100+ skyline-related papers have been published; however, we discovered that one of the most intuitive and practical types of skyline query, namely the group-by skyline query, remains unaddressed. Group-by skyline queries find the skyline for each group of tuples. In this paper, we present a comprehensive study on processing group-by skyline queries in the context of relational engines. Specifically, we examine the composition of a query plan for a group-by skyline query and develop the missing cost model for the BBS algorithm. Experimental results show that our techniques are able to devise the best query plans for a variety of group-by skyline queries. Our focus is on algorithms that can be directly implemented in today's commercial database systems without the addition of new access methods (which would require addressing the associated challenges of maintenance with updates, concurrency control, etc.).
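To make "the skyline for each group of tuples" concrete, here is a minimal sketch of the query's semantics. It uses a naive quadratic dominance check per group, not the BBS algorithm or the query plans studied in the paper. The smaller-is-better convention, the function names and the hotel data are all assumptions.

```python
from collections import defaultdict


def dominates(a, b):
    """True if a is no worse than b in every attribute and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def group_by_skyline(rows):
    """rows: iterable of (group_key, attribute_tuple); returns one skyline per group."""
    groups = defaultdict(list)
    for key, attributes in rows:
        groups[key].append(attributes)
    return {key: skyline(points) for key, points in groups.items()}


# Hypothetical hotels: (price, distance to centre), grouped by city.
hotels = [
    ("Paris", (100, 3)), ("Paris", (80, 5)), ("Paris", (120, 4)),
    ("Rome", (60, 2)), ("Rome", (70, 1)), ("Rome", (90, 3)),
]
result = group_by_skyline(hotels)
```

In this data, the Paris hotel `(120, 4)` is dropped because `(100, 3)` is both cheaper and closer, and the Rome hotel `(90, 3)` is dropped because `(60, 2)` beats it on both attributes; the remaining hotels in each city are mutually incomparable.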

281 citations