Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers

User-Centered Perspective of Information Retrieval Research and Analysis Methods.

William Sugar

01 Jan 1995

TL;DR: The author describes a paradigm shift from the system perspective to the user perspective, and a resulting need to design and redesign systems that focus on user needs, which in turn requires analyses of users, their needs, and their habits.

Abstract: The author in his chapter refers us back to a previous ARIST chapter by Brenda Dervin and Michael Nilan, in which the dichotomy between the system perspective and the user perspective is shown. He points out that the user perspective shows up in retrieval studies and questions how these studies have affected information retrieval research methods. There has been a paradigm shift from the system perspective to the user perspective, with a resulting need to design and redesign systems that focus on user needs and that requires analyses of users, their needs, and their habits. Two approaches that advocate the user-centered perspective are: (1) the cognitive approach, and (2) the holistic approach. Systems designed from the user-centered perspective would not only serve the intended audience but would further the user-centered perspective of the entire information retrieval discipline.

58 citations

Book Chapter

Retrieval from document image collections

A. Balasubramanian¹, Million Meshesha¹, C. V. Jawahar¹•Institutions (1)

International Institute of Information Technology, Hyderabad¹

13 Feb 2006

TL;DR: The system achieves effective search and retrieval from a large collection of printed document images by matching image features at the word level, using a novel DTW-based partial matching scheme to handle morphologically variant words.

Abstract: This paper presents a system for retrieval of relevant documents from large document image collections. We achieve effective search and retrieval from a large collection of printed document images by matching image features at the word level. For representations of the words, profile-based and shape-based features are employed. A novel DTW-based partial matching scheme is employed to take care of morphologically variant words. This is useful for grouping together similar words during the indexing process. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval (e.g., scalability, effective delivery, etc.) are addressed in this paper.
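
The dynamic time warping (DTW) matching mentioned in the abstract can be illustrated with a minimal sketch of classic, full-sequence DTW over two numeric feature sequences. This is not the paper's partial matching scheme, which the abstract does not specify; the function name `dtw_distance` and the idea of feeding it one number per column of a word image are assumptions made for illustration only.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Hypothetical helper: uses absolute difference as the local cost and
    aligns the sequences end to end (no partial matching).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Assumed toy profiles; the second is a stretched copy of the first.
profile_a = [1, 2, 3]
profile_b = [1, 2, 2, 3]
distance = dtw_distance(profile_a, profile_b)
```

Because DTW allows one element to align with several, a stretched copy of a sequence can match it at zero cost, which is the property that makes the technique attractive for comparing word images of different widths.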

58 citations

Journal Article

Fuzzy information retrieval

Valerie Cross¹•Institutions (1)

Miami University¹

01 Feb 1994

TL;DR: A general description of the main components of fuzzy information retrieval is given: document representation, query representation, computer-aided query formulation, document retrieval status, and performance measures.

Abstract: Over the past decade, information retrieval has emerged as an active research area in the application of fuzzy set theory. Fuzzy information retrieval utilizes fuzzy sets to represent documents, membership degrees for query term relevance, fuzzy logical operators to define queries, and fuzzy compatibility measures to assess the retrieval status value of a document. This paper presents an overview of fuzzy relational databases and fuzzy information retrieval. A general description of the main components of fuzzy information retrieval is given: document representation, query representation, computer-aided query formulation, document retrieval status, and performance measures. Examples of areas currently being researched are provided. The relation between fuzzy information retrieval and fuzzy relational databases is examined.
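
To make the components named in the abstract concrete, here is a minimal sketch in which documents are fuzzy sets of terms, a query combines terms with fuzzy logical operators, and the result is a retrieval status value per document. It assumes the standard min/max/complement operators, which are one common choice and not necessarily the ones the paper surveys; the documents, membership degrees, and function names are all hypothetical.

```python
def f_and(*degrees):
    """Fuzzy AND as the minimum of the membership degrees (assumed operator)."""
    return min(degrees)


def f_or(*degrees):
    """Fuzzy OR as the maximum of the membership degrees (assumed operator)."""
    return max(degrees)


def f_not(degree):
    """Fuzzy NOT as the complement of the membership degree."""
    return 1.0 - degree


# Hypothetical collection: each document is a fuzzy set of index terms.
docs = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.7, "database": 0.2},
    "d2": {"fuzzy": 0.4, "retrieval": 0.8},
    "d3": {"database": 0.9, "retrieval": 0.3},
}


def mu(doc_id, term):
    """Membership degree of a term in a document (0.0 if absent)."""
    return docs[doc_id].get(term, 0.0)


def rsv(doc_id):
    """Retrieval status value for: fuzzy AND retrieval AND NOT database."""
    return f_and(
        mu(doc_id, "fuzzy"),
        mu(doc_id, "retrieval"),
        f_not(mu(doc_id, "database")),
    )


# Rank documents by descending retrieval status value.
ranking = sorted(docs, key=rsv, reverse=True)
```

Unlike a Boolean system, every document receives a graded score, so the output is a ranking and not just a matched set.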

58 citations

Proceedings Article

Automatic extraction of titles from general documents using machine learning

Yunhua Hu¹, Hang Li², Yunbo Cao², Dmitriy Meyerzon², Qinghua Zheng¹•Institutions (2)

Xi'an Jiaotong University¹, Microsoft²

07 Jun 2005

TL;DR: It turns out that the use of formatting information can lead to quite accurate extraction from general documents, and one can significantly improve search ranking results in document retrieval by using the extracted titles.

Abstract: In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of specific genres, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods have been proposed mainly for title extraction from research papers. It has not been clear whether it could be possible to conduct automatic title extraction from general documents. As a case study, we consider extraction from Office documents, including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize formatting information such as font size as features in the models. It turns out that the use of formatting information can lead to quite accurate extraction from general documents. Precision and recall for title extraction from Word are 0.810 and 0.837 respectively, and precision and recall for title extraction from PowerPoint are 0.875 and 0.895 respectively in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to another domain, and more surprisingly we can even train models in one language and apply them to another language. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles.
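
For readers unfamiliar with the precision and recall figures quoted in the abstract, the following sketch shows how the two measures are commonly computed from a set of predicted items and a set of correct items. The abstract does not say at what granularity the authors scored extraction, so the set-based formulation, the function names, and the toy data are assumptions for illustration, not the paper's evaluation procedure.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted vs. gold item sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted items that are correct
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of correct items that were predicted
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy data, not taken from the paper.
predicted_titles = {"a", "b", "c", "d"}
gold_titles = {"a", "b", "e"}
p, r = precision_recall(predicted_titles, gold_titles)
```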

58 citations

Proceedings Article

A loosely-coupled integration of a text retrieval system and an object-oriented database system

W. Bruce Croft¹, Lisa A. Smith¹, Howard R. Turtle²•Institutions (2)

University of Massachusetts Amherst¹, West²

01 Jun 1992

TL;DR: This paper describes an approach to complex object retrieval using a probabilistic inference net model and an implementation of this approach using a loose coupling of an object-oriented database system (IRIS) and a text retrieval system based on inference nets (INQUERY).

Abstract: Document management systems are needed for many business applications. This type of system would combine the functionality of a database system (for describing, storing, and maintaining documents with complex structure and relationships) with a text retrieval system (for effective retrieval based on full text). The retrieval model for a document management system is complicated by the variety and complexity of the objects that are represented. In this paper, we describe an approach to complex object retrieval using a probabilistic inference net model, and an implementation of this approach using a loose coupling of an object-oriented database system (IRIS) and a text retrieval system based on inference nets (INQUERY). The resulting system is used to store long, structured documents and can retrieve document components (sections, figures, etc.) based on their contents or the contents of related components. The lessons learnt from the implementation are discussed.

58 citations

Performance Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
