Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Proceedings Article

Document image retrieval using signatures as queries

Sargur N. Srihari¹, Shravya Shetty¹, Siyuan Chen¹, Harish Srinivasan¹, Chen Huang¹, Gady Agam², Ophir Frieder²

University at Buffalo¹, Illinois Institute of Technology²

27 Apr 2006-Scopus

TL;DR: A novel signature retrieval strategy is presented, which includes a technique for removing noise and printed text from signature images previously extracted from business documents. Signature matching is based on a normalized correlation similarity measure using global shape-based binary feature vectors.

Abstract: In searching a repository of business documents, a task of interest is using a query signature image to retrieve other matching signatures from a database. The signature retrieval task involves a two-step process of extracting all the signatures from the documents and then performing a match on these signatures. This paper presents a novel signature retrieval strategy, which includes a technique for noise and printed text removal from signature images previously extracted from business documents. Signature matching is based on a normalized correlation similarity measure using global shape-based binary feature vectors. In a retrieval task involving a database of 447 signatures, on average 4.43 of the top 5 choices were signatures belonging to the writer of the queried signature. Considering the top 10 ranks, an F-measure of 76.3 was obtained, with precision and recall at this F-measure of 74.5% and 78.28%, respectively.

47 citations
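
The signature-retrieval abstract above reports an F-measure alongside precision and recall, and describes matching with a normalized correlation over binary feature vectors. The sketch below shows how those quantities relate. It is only an illustration under stated assumptions: the listing does not give the paper's exact similarity formula, so `binary_correlation` uses the standard Pearson (phi) correlation for 0/1 vectors as a stand-in, and the function names and the `top_k` helper are hypothetical.

```python
import math


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_correlation(a, b):
    """Pearson (phi) correlation between two equal-length 0/1 vectors.

    Assumption: a stand-in for the paper's normalized correlation measure,
    whose exact definition is not given in this listing.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one vector is constant; correlation is undefined
    return (n11 * n00 - n10 * n01) / denom


def top_k(query, database, k=5):
    """Return the ids of the k database vectors most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: binary_correlation(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# The reported precision and recall are consistent with the reported F-measure.
print(round(f_measure(74.5, 78.28), 1))
```

With the abstract's figures, the harmonic mean of 74.5 and 78.28 rounds to the reported 76.3.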

Proceedings Article

Supporting entity search: a large-scale prototype search engine

Tao Cheng¹, Xifeng Yan¹, Kevin Chen-Chuan Chang¹

University of Illinois at Urbana-Champaign¹

11 Jun 2007

TL;DR: In the WISDM project at UIUC, a prototype search engine is built and evaluated over a 2TB Web corpus, showing the feasibility and promise of a large-scale system architecture to support entity search.

Abstract: As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are increasingly inadequate. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. Therefore, we propose the concept of entity search, a significant departure from traditional document retrieval. Towards our goal of supporting entity search, in the WISDM project at UIUC we build and evaluate our prototype search engine over a 2TB Web corpus. Our demonstration shows the feasibility and promise of a large-scale system architecture to support entity search.

47 citations

Journal Article

Knowledge transfer for cross domain learning to rank

Depin Chen¹, Yan Xiong¹, Jun Yan², Gui-Rong Xue³, Gang Wang², Zheng Chen²

University of Science and Technology of China¹, Microsoft², Shanghai Jiao Tong University³

01 Jun 2010-Information Retrieval

TL;DR: This paper aims to improve the learning of a ranking model in a target domain by leveraging knowledge from outdated or out-of-domain data, and proposes two novel methods that conduct knowledge transfer at the feature level and the instance level.

Abstract: Recently, learning to rank technology is attracting increasing attention from both academia and industry in the areas of machine learning and information retrieval. A number of algorithms have been proposed to rank documents according to the user-given query using a human-labeled training dataset. A basic assumption behind general learning to rank algorithms is that the training and test data are drawn from the same data distribution. However, this assumption does not always hold true in real world applications. For example, it can be violated when the labeled training data become outdated or originally come from another domain different from its counterpart of test data. Such situations bring a new problem, which we define as cross domain learning to rank. In this paper, we aim at improving the learning of a ranking model in target domain by leveraging knowledge from the outdated or out-of-domain data (both are referred to as source domain data). We first give a formal definition of the cross domain learning to rank problem. Following this, two novel methods are proposed to conduct knowledge transfer at feature level and instance level, respectively. These two methods both utilize Ranking SVM as the basic learner. In the experiments, we evaluate these two methods using data from benchmark datasets for document retrieval. The results show that the feature-level transfer method performs better with steady improvements over baseline approaches across different datasets, while the instance-level transfer method comes out with varying performance depending on the dataset used.

47 citations
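
The learning-to-rank abstract above names Ranking SVM as the base learner for both transfer methods. As a rough illustration of that base learner only (not of the paper's feature-level or instance-level transfer, which the listing does not specify), the sketch below turns graded documents into preference pairs within each query and fits a linear scorer by subgradient descent on a regularized hinge loss. The function names, hyperparameters and toy data are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_differences(features, labels, query_ids):
    """Build (x_i - x_j, sign) pairs for same-query documents with different labels."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if query_ids[i] != query_ids[j] or labels[i] == labels[j]:
            continue  # only compare documents retrieved for the same query
        diff = [a - b for a, b in zip(features[i], features[j])]
        sign = 1 if labels[i] > labels[j] else -1
        pairs.append((diff, sign))
    return pairs


def train_linear_ranker(pairs, dim, epochs=100, lr=0.1, reg=0.01):
    """Subgradient descent on an L2-regularized hinge loss over preference pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for diff, sign in pairs:
            margin = sign * sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1:  # pair is misordered or inside the margin
                    grad -= sign * diff[k]
                w[k] -= lr * grad
    return w


def score(w, x):
    """Ranking score of one document; higher means ranked earlier."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy data (hypothetical): two queries, two features, graded relevance labels.
features = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.9, 0.1], [0.1, 0.8]]
labels = [2, 1, 0, 1, 0]
query_ids = [1, 1, 1, 2, 2]

pairs = pairwise_differences(features, labels, query_ids)
weights = train_linear_ranker(pairs, dim=2)
ranking = sorted(range(3), key=lambda i: score(weights, features[i]), reverse=True)
print(ranking)  # document indices for query 1, best first
```

In a cross-domain setting, the pairs would come from both source and target data; how the two are combined is exactly what the paper's two methods address, and that part is not reproduced here.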

Journal Article

A model for enhancing Internet medical document retrieval with "medical core metadata".

Gary Malet¹, Felix Munoz¹, Richard Appleyard¹, William R. Hersh¹

Oregon Health & Science University¹

01 Mar 1999-Journal of the American Medical Informatics Association

TL;DR: This paper defines a set of document content description tags, or metadata encodings, that promote disciplined search access to Internet medical documents and facilitate their retrieval by Internet search engines.

47 citations

Dissertation

Automatic identification of causal relations in text and their use for improving precision in information retrieval

Christopher S. G. Khoo¹

Syracuse University¹

03 Oct 1996

TL;DR: This study investigated whether the information obtained by matching causal relations expressed in documents with the causal relations expressed in users' queries could be used to improve document retrieval results in comparison to using just term matching without considering relations.

Abstract: This study represents one attempt to make use of relations expressed in text to improve information retrieval effectiveness. In particular, the study investigated whether the information obtained by matching causal relations expressed in documents with the causal relations expressed in users' queries could be used to improve document retrieval results in comparison to using just term matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. The method uses linguistic clues to identify causal relations without recourse to knowledge-based inferencing. The method was successful in identifying and extracting about 68% of the causal relations that were clearly expressed within a sentence or between adjacent sentences in Wall Street Journal text. Of the instances that the computer program identified as causal relations, 72% can be considered to be correct. The automatic method was used in an experimental information retrieval system to identify causal relations in a database of full-text Wall Street Journal documents. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query, as in an SDI or routing queries situation. The best results were obtained when causal relation matching was combined with word proximity matching (matching pairs of causally related words in the query with pairs of words that co-occur within document sentences). An analysis using manually identified causal relations indicates that bigger retrieval improvements can be expected with more accurate identification of causal relations. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match with any term. The study also investigated whether using Roget's International Thesaurus (3rd ed.) to expand query terms with synonymous and related terms would improve retrieval effectiveness. Using Roget category codes in addition to keywords did give better retrieval results. However, the Roget codes were better at identifying the nonrelevant documents than the relevant ones.

47 citations
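
The dissertation abstract above describes two ideas that are easy to picture in code: causal relation matching in which one side of the relation is a wildcard, and a document score that combines several match types with per-query weights. The sketch below is a minimal illustration under assumptions. Representing a relation as a `(cause, effect)` tuple, the `"*"` wildcard symbol, the function names, the weights and the sample terms are all hypothetical and are not taken from the study.

```python
WILDCARD = "*"


def relation_matches(query_rel, doc_rel):
    """True if each side of the query relation equals the document side or is a wildcard."""
    return all(q == WILDCARD or q == d for q, d in zip(query_rel, doc_rel))


def causal_match_score(query_rels, doc_rels):
    """Count query relations matched by at least one relation found in the document."""
    return sum(
        1 for q in query_rels if any(relation_matches(q, d) for d in doc_rels)
    )


def term_match_score(query_terms, doc_terms):
    """Count distinct query terms that also occur in the document."""
    return len(set(query_terms) & set(doc_terms))


def combined_score(query, doc, w_term=1.0, w_causal=1.0):
    """Weighted sum of term matching and causal relation matching.

    The weights would be tuned per query in a routing-style setting;
    the defaults here are placeholders.
    """
    return (
        w_term * term_match_score(query["terms"], doc["terms"])
        + w_causal * causal_match_score(query["relations"], doc["relations"])
    )


# Hypothetical query: "what do rising interest rates cause?"
query = {
    "terms": ["interest", "rates", "inflation"],
    "relations": [("interest rates", WILDCARD)],
}
doc = {
    "terms": ["interest", "rates", "housing", "slowdown"],
    "relations": [("interest rates", "housing slowdown")],
}
print(combined_score(query, doc, w_term=1.0, w_causal=0.5))
```

Leaving the effect as a wildcard lets the query match any document that states some consequence of the named cause, which is the looser matching the abstract found most effective.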

Performance metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
