Book Chapter

Extraction of relevant figures and tables for multi-document summarization

11 Mar 2012-pp 402-413
TL;DR: Evaluation experiments show that the system-generated ranked list correlates, with statistical significance, with the human evaluators' ranking judgments, and the concluding remarks establish the feasibility of using the proposed system to summarize document sets in which figures and tables are salient units.
Abstract: We propose a system that extracts the most relevant figures and tables from a set of topically related source documents. These are then integrated into the extractive text summary produced from the same set. The proposed method is domain independent. It focuses predominantly on generating a ranked list of relevant candidate units (figures/tables), in order of their computed relevancy. The relevancy measure is based on local and global scores that include direct and indirect references. To test the system's performance, we have created a test collection of document sets that do not adhere to any specific domain. Evaluation experiments show that the system-generated ranked list is in statistically significant correlation with the human evaluators' ranking judgments. The feasibility of using the proposed system to summarize a document set whose salient units include figures and tables is made clear in our concluding remarks.
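
The abstract does not give the scoring formula or name the correlation statistic, so the following is only a minimal sketch under stated assumptions: the local and global scores are combined by a weighted sum (the weight `alpha` is hypothetical), and agreement with a human ranking is measured with Spearman's rank correlation for tie-free rankings. All function names and the sample scores are illustrative and do not come from the paper.

```python
from typing import Dict, List


def rank_units(local: Dict[str, float], global_: Dict[str, float],
               alpha: float = 0.5) -> List[str]:
    """Rank candidate figures/tables by a weighted sum of two scores.

    The weighted-sum form and `alpha` are assumptions for illustration only.
    """
    combined = {u: alpha * local[u] + (1 - alpha) * global_[u] for u in local}
    # Highest combined score first; ties broken by unit name for determinism.
    return sorted(combined, key=lambda u: (-combined[u], u))


def spearman_rho(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Spearman rank correlation of two tie-free rankings of the same items."""
    n = len(ranking_a)
    pos_b = {u: i for i, u in enumerate(ranking_b)}
    d2 = sum((i - pos_b[u]) ** 2 for i, u in enumerate(ranking_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up scores for four hypothetical candidate units.
local_scores = {"fig1": 0.9, "tab1": 0.4, "fig2": 0.2, "tab2": 0.6}
global_scores = {"fig1": 0.7, "tab1": 0.8, "fig2": 0.1, "tab2": 0.3}

system_ranking = rank_units(local_scores, global_scores)
human_ranking = ["fig1", "tab2", "tab1", "fig2"]  # made-up human judgment
rho = spearman_rho(system_ranking, human_ranking)
print(system_ranking, rho)
```

A value of `rho` close to 1 means the two rankings largely agree. Whether that agreement is statistically significant would need a separate test over many document sets, which this sketch does not attempt.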
Citations
01 Jun 2009
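TL;DR: PubMed Central (PMC) is a free digital archive of biomedical and life sciences journal literature that publishers join voluntarily, subject to standards of scientific and editorial quality.
Abstract: PubMed Central (PMC) is a free digital archive of biomedical and life sciences journal literature, developed and maintained by the National Center for Biotechnology Information at the National Library of Medicine of the U.S. National Institutes of Health. Its aim is to take on the role of a world-class library in the digital age. It is not a journal publisher. Publishers join PMC voluntarily and must meet certain standards of scientific and editorial quality.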

108 citations

Journal Article
TL;DR: The QMOS method is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. It combines multiple sentiment dictionaries to improve on the limited word coverage of any individual lexicon and employs the Semantic Sentiment Approach.
Abstract: Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. QMOS combines multiple sentiment dictionaries to improve on the limited word coverage of any individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word presented by a lexicon and the word's polarity in a specific context, because the polarity of a word depends on the context in which it is used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle these challenges, QMOS integrates multiple strategies to adjust a word's prior sentiment orientation while also considering the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word that is not included in a sentiment lexicon. Moreover, most existing methods fail to distinguish the meaning of a review sentence from that of the user's query when the two share a similar bag of words; hence there is often a conflict between the extracted opinionated sentences and users’ needs. The summarization phase of QMOS, however, is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs a greedy algorithm and a query expansion approach to reduce redundancy and to bridge the lexical gaps between similar contexts expressed in different wording, respectively. Our experiment shows that the QMOS method can significantly improve the performance and make QMOS comparable to other existing methods.

43 citations

Journal Article
TL;DR: A novel deep-learning-based method for the generic opinion-oriented extractive summarization of multi-documents (also known as RDLS), which comprises sentiment analysis embedding space (SAS), text summarization embedding spaces (TSS) and opinion summarizer module (OSM).
Abstract: Opinion summarization is a process to produce concise summaries from a large number of opinionated texts. In this paper, we present a novel deep-learning-based method for the generic opinion-oriented extractive summarization of multi-documents (also known as RDLS). The method comprises sentiment analysis embedding space (SAS), text summarization embedding spaces (TSS) and opinion summarizer module (OSM). SAS employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and overcome several flaws in traditional methods, in which word order and information about a word are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules and multiple strategies to overcome the existing drawbacks. TSS exploits multiple sources of statistical and linguistic knowledge features to augment word-level embedding and extract a proper set of sentences from multiple documents. TSS also uses the Restricted Boltzmann Machine algorithm to enhance and optimize those features and improve the resulting accuracy without losing any important information. OSM consists of two phases, sentence classification and sentence selection, which work together to produce a useful summary. Experimental results show that RDLS outperforms other existing methods. Moreover, the ensemble of statistical and linguistic knowledge, sentiment knowledge, sentiment shifter rules and word-embedding model allows RDLS to achieve significant accuracy.

22 citations

Patent
13 Jun 2014
TL;DR: A computer-implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize tabular data.
Abstract: A computer-implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize tabular data. The method may include identifying a content characteristic for each category of the set of categories in the first tabular portion. The method may also include generating a first language section from at least two distinct categories of the set of categories, wherein a format of the first language section is based on the content characteristics for the at least two distinct categories.

19 citations

Journal Article
TL;DR: This paper discusses the evolution of retrieval techniques, focusing on the development, challenges and trends of image retrieval, and highlights both the already addressed and the outstanding issues.
Abstract: This paper attempts to discuss the evolution of retrieval techniques, focusing on the development, challenges and trends of image retrieval. It highlights both the already addressed and the outstanding issues. The explosive growth of image data creates a need for research and development in image retrieval. Image retrieval research is moving from keywords, to low-level features, and on to semantic features. The drive towards semantic features arises because keywords can be very subjective and time consuming, while low-level features cannot always describe the high-level concepts in the users’ mind. Keywords: content based image retrieval, keyword based image retrieval, semantic gap, semantic image retrieval.

2 citations

References
Book Chapter
28 Mar 2010
TL;DR: The characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, are discussed, and a new test collection is described and analysed.
Abstract: The poster discusses the characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, and describes and analyses a new test collection constructed for this purpose. The test collection consists of approx. 18,000 monographic records, 160,000 papers and journal articles in PDF and 275,000 abstracts with a varied set of metadata and vocabularies from the physics domain, 65 topics based on real work tasks and corresponding graded relevance assessments. The test collection may be used for systems- as well as user-oriented evaluation.

1,039 citations

Proceedings Article
30 Apr 2000
TL;DR: A multi-document summarizer, called MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and two new techniques, based on sentence utility and subsumption, are described.
Abstract: We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

493 citations
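
The idea of scoring sentences against a cluster centroid can be sketched as follows. This is a simplified illustration and not the MEAD implementation: it assumes the centroid is the average term count per word across the cluster's documents, and that a sentence's score is the sum of the centroid values of its distinct words. The function names and the toy documents are hypothetical.

```python
from collections import Counter
from typing import List


def centroid(docs: List[List[str]]) -> Counter:
    """Average count of each word across the documents of one cluster."""
    total: Counter = Counter()
    for doc in docs:
        total.update(doc)
    return Counter({word: count / len(docs) for word, count in total.items()})


def score_sentence(sentence: List[str], cen: Counter) -> float:
    """Sum the centroid values of the distinct words in a sentence."""
    # Words absent from the centroid contribute 0 (Counter default).
    return sum(cen[word] for word in set(sentence))


# Two toy tokenized documents forming one topical cluster.
cluster = [
    ["storm", "hits", "coast"],
    ["storm", "damage", "coast", "coast"],
]
cen = centroid(cluster)

candidates = [
    ["storm", "hits", "coast"],
    ["officials", "report", "damage"],
]
scores = [score_sentence(s, cen) for s in candidates]
best = candidates[scores.index(max(scores))]
print(scores, best)
```

A fuller system would typically weight terms (for example by inverse document frequency) and penalize redundant sentences; those steps are omitted here.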

Posted Content
TL;DR: This article presents a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system, and describes two new techniques, based on sentence utility and subsumption, which were applied to the evaluation of both single and multiple document summaries.
Abstract: We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

488 citations

Proceedings Article
26 Oct 2008
TL;DR: The composition of a query plan for a group-by skyline query is examined and the missing cost model for the BBS algorithm is developed. Experimental results show that the techniques are able to devise the best query plans for a variety of group-by skyline queries.
Abstract: It is our great pleasure to welcome you to the 17th ACM Conference on Information and Knowledge Management -- CIKM'08. Since 1992, the ACM Conference on Information and Knowledge Management (CIKM) has been successfully bringing together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high quality, applied and theoretical research findings. In CIKM 2008, we continued the tradition of promoting collaboration among the general areas of databases, information retrieval, and knowledge management. This year's call for papers attracted almost 800 submissions from Asia, Canada, Europe, Africa, and the United States. The program committee accepted 132 papers and 103 posters giving CIKM'08 an acceptance rate of 17%.

281 citations

Proceedings Article
02 Nov 2009
TL;DR: The authors present a comprehensive study on processing group-by skyline queries in the context of relational engines, examining the composition of a query plan for a group-by skyline query and developing the missing cost model for the BBS algorithm.
Abstract: The skyline operator was first proposed in 2001 for retrieving interesting tuples from a dataset. Since then, 100+ skyline-related papers have been published; however, we discovered that one of the most intuitive and practical types of skyline queries, namely group-by skyline queries, remains unaddressed. Group-by skyline queries find the skyline for each group of tuples. In this paper, we present a comprehensive study on processing group-by skyline queries in the context of relational engines. Specifically, we examine the composition of a query plan for a group-by skyline query and develop the missing cost model for the BBS algorithm. Experimental results show that our techniques are able to devise the best query plans for a variety of group-by skyline queries. Our focus is on algorithms that can be directly implemented in today's commercial database systems without the addition of new access methods (which would require addressing the associated challenges of maintenance with updates, concurrency control, etc.).

281 citations
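
To make "the skyline for each group of tuples" concrete, here is a naive sketch of a group-by skyline. It is a quadratic-time illustration, not the BBS algorithm or the query-planning techniques studied in the paper, and it assumes that smaller values are preferred in every dimension. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse everywhere and strictly better somewhere.

    Assumption: smaller is better in every dimension.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def group_by_skyline(
    rows: Iterable[Tuple[Hashable, Point]]
) -> Dict[Hashable, List[Point]]:
    """Partition rows by group key, then compute one skyline per group."""
    groups: Dict[Hashable, List[Point]] = defaultdict(list)
    for key, point in rows:
        groups[key].append(point)
    return {key: skyline(pts) for key, pts in groups.items()}


# Made-up rows: (city, (price, distance_km)); lower is better for both.
hotels = [
    ("Paris", (120.0, 2.0)),
    ("Paris", (90.0, 5.0)),
    ("Paris", (130.0, 3.0)),
    ("Rome", (80.0, 1.0)),
    ("Rome", (100.0, 4.0)),
]
result = group_by_skyline(hotels)
print(result)
```

Here the Paris tuple (130.0, 3.0) is dropped because (120.0, 2.0) is both cheaper and closer, while (120.0, 2.0) and (90.0, 5.0) both survive because neither dominates the other.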