Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Patent•

Hypertext document retrieval system and method

Yanhong Li

05 Feb 1997

TL;DR: In this patent, an indexer traverses the hypertext database and collects hypertext information, including the address of the document each hyperlink points to and the anchor text of each hyperlink.

Abstract: A search engine for retrieving documents pertinent to a query indexes documents in accordance with hyperlinks pointing to those documents. The indexer traverses the hypertext database and finds hypertext information including the address of the document the hyperlinks point to and the anchor text of each hyperlink. The information is stored in an inverted index file, which may also be used to calculate document link vectors for each hyperlink pointing to a particular document. When a query is entered, the search engine finds all document vectors for documents having the query terms in their anchor text. A query vector is also calculated, and the dot product of the query vector and each document link vector is calculated. The dot products relating to a particular document are summed to determine the relevance ranking for each document.

373 citations
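
The ranking scheme described in the abstract above (index anchor text, compute one vector per hyperlink, then sum the query/link dot products per target document) can be illustrated with a minimal sketch. The toy link data, the whitespace tokenizer, and the use of raw term counts as vector weights are all assumptions made for illustration; the abstract does not specify how the patent weights terms.

```python
from collections import defaultdict

# Hypothetical toy data: (target document address, anchor text) per hyperlink.
LINKS = [
    ("http://a.example/", "python tutorial"),
    ("http://a.example/", "learn python"),
    ("http://b.example/", "python snake facts"),
    ("http://c.example/", "cooking recipes"),
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(links):
    """Build an inverted index (term -> link ids) and one term-count vector per link."""
    index = defaultdict(set)
    link_vectors = []
    for link_id, (target, anchor) in enumerate(links):
        vec = defaultdict(int)
        for term in tokenize(anchor):
            vec[term] += 1
            index[term].add(link_id)
        link_vectors.append((target, dict(vec)))
    return index, link_vectors


def rank(query, index, link_vectors):
    """Sum the query/link-vector dot products for each target document."""
    qvec = defaultdict(int)
    for term in tokenize(query):
        qvec[term] += 1
    # Only links whose anchor text contains at least one query term are scored.
    candidates = set()
    for term in qvec:
        candidates |= index.get(term, set())
    scores = defaultdict(float)
    for link_id in candidates:
        target, vec = link_vectors[link_id]
        scores[target] += sum(w * vec.get(t, 0) for t, w in qvec.items())
    # Highest score first; ties broken by address for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index, link_vectors = build_index(LINKS)
    print(rank("python tutorial", index, link_vectors))
```

In this sketch a document that is pointed to by several links with matching anchor text accumulates a higher score than one with a single matching link, which is the effect the summation step is meant to capture.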

Journal Article•DOI•

GlOSS: text-source discovery over the Internet

Luis Gravano¹, Hector Garcia-Molina², Anthony Tomasic³•Institutions (3)

Columbia University¹, Stanford University², French Institute for Research in Computer Science and Automation³

01 Jun 1999-ACM Transactions on Database Systems

TL;DR: This article describes GlOSS (Glossary of Servers Server) in two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. It also describes in detail the methodology for measuring the retrieval effectiveness of these systems.

Abstract: The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.

371 citations
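
The two-phase approach in the abstract above (sources export content summaries to a central service, which then orders sources for a query) might be sketched as follows for a Boolean, all-terms-must-match query. The `SourceSummary` type, the sample statistics, and the estimator (expected number of matching documents under the assumption that terms occur independently) are illustrative assumptions, not a reproduction of the article's exact formulas.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Hypothetical summary a text source exports to the central service."""
    name: str
    num_docs: int
    doc_freq: dict = field(default_factory=dict)  # term -> documents containing it


def estimate_matches(summary, terms):
    """Estimate how many documents contain ALL terms, assuming term independence."""
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate


def rank_sources(summaries, query):
    """Return (source name, estimate) pairs, most promising source first."""
    terms = query.lower().split()
    scored = [(s.name, estimate_matches(s, terms)) for s in summaries]
    # Drop sources that cannot match; break ties by name for a stable order.
    scored = [(name, est) for name, est in scored if est > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up statistics for three sources.
SUMMARIES = [
    SourceSummary("A", 100, {"retrieval": 50, "boolean": 20}),
    SourceSummary("B", 1000, {"retrieval": 100, "boolean": 10}),
    SourceSummary("C", 10, {"retrieval": 5}),
]

if __name__ == "__main__":
    for name, est in rank_sources(SUMMARIES, "boolean retrieval"):
        print(name, round(est, 2))
```

The point of the design is that the service never sees the documents themselves, only per-term statistics, so a source with fewer documents can still outrank a larger one when the query terms are relatively more common in it.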

Book Chapter•DOI•

Semantic annotation, indexing, and retrieval

Atanas Kiryakov¹, Borislav Popov¹, Damyan Ognyanoff¹, Dimitar Manov¹, Angel Kirilov¹, Miroslav Goranov¹, and 2 more•Institutions (1)

Ontotext¹

20 Oct 2003

TL;DR: A simplistic upper-level ontology is introduced which starts with some basic philosophic distinctions and goes down to the most popular entity types, thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions.

Abstract: The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system allowing annotation, indexing, and retrieval of documents with respect to real-world entities. A system (called KIM) that partially implements this concept is briefly presented and used for evaluation and demonstration. Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than being indifferent to any ontological commitments and general knowledge. To assure efficiency and reusability of the metadata, we introduce a simplistic upper-level ontology which starts with some basic philosophic distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entity descriptions is maintained. A semantically enhanced information extraction system is presented, which provides automatic annotation with references to classes in the ontology and instances in the knowledge base. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and knowledge about the specific entities.

366 citations

Patent•

Method for document retrieval and for word sense disambiguation using neural networks

Stephen I. Gallant

07 Nov 1990

TL;DR: In this patent, a dictionary of context vectors provides a context vector for each word stem in the dictionary, and a normalized summary vector is stored for each document by combining the context vectors of the words remaining in the document after uninteresting words are removed.

Abstract: A method for storing and searching documents also useful in disambiguating word senses and a method for generating a dictionary of context vectors. The dictionary of context vectors provides a context vector for each word stem in the dictionary. A context vector is a fixed length list of component values corresponding to a list of word-based features, the component values being an approximate measure of the conceptual relationship between the word stem and the word-based feature. Documents are stored by combining the context vectors of the words remaining in the document after uninteresting words are removed. The summary vector obtained by adding all of the context vectors of the remaining words is normalized. The normalized summary vector is stored for each document. The data base of normalized summary vectors is searched using a query vector and identifying the document whose vector is closest to that query vector. The normalized summary vectors of each document can be stored using cluster trees according to a centroid consistent algorithm to accelerate the searching process. Said searching process also gives an efficient way of finding nearest neighbor vectors in high-dimensional spaces.

361 citations
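
The storage and search steps in the abstract above (sum the context vectors of the remaining words, normalize the result, then find the stored vector closest to the query vector) can be sketched as follows. The three-feature context vectors, the stop-word list, and the sample documents are invented for illustration. This sketch also skips stemming and the cluster-tree acceleration, and it uses the dot product of unit vectors as the closeness measure, which is an assumption.

```python
import math

# Hypothetical context vectors: one fixed-length feature list per word.
CONTEXT = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [1.0, 0.0, 0.0],
    "river": [0.0, 1.0, 0.1],
    "fish": [0.0, 0.8, 0.6],
}
STOP_WORDS = {"the", "a", "of", "in", "by"}  # assumed "uninteresting" words
DIMENSIONS = 3


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def summary_vector(text):
    """Add the context vectors of the remaining words, then normalize the sum."""
    total = [0.0] * DIMENSIONS
    for word in text.lower().split():
        if word in STOP_WORDS or word not in CONTEXT:
            continue  # drop stop words and words missing from the dictionary
        total = [a + b for a, b in zip(total, CONTEXT[word])]
    return normalize(total)


def search(query, stored):
    """Return the name of the document whose summary vector is closest to the query."""
    q = summary_vector(query)
    return max(stored, key=lambda name: sum(a * b for a, b in zip(q, stored[name])))


# Store one normalized summary vector per document.
DOCUMENTS = {
    "finance": "the bank approved a loan",
    "nature": "fish in the river by the bank",
}
STORED = {name: summary_vector(text) for name, text in DOCUMENTS.items()}

if __name__ == "__main__":
    print(search("river fish", STORED))
    print(search("loan", STORED))
```

Because the word "bank" contributes to both documents, the example also hints at the disambiguation use the abstract mentions: the surrounding words pull each summary vector toward a different region of the feature space.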

Journal Article•DOI•

Why are online catalogs still hard to use?

Christine L. Borgman¹•Institutions (1)

University of California, Los Angeles¹

01 Jul 1996-Journal of the Association for Information Science and Technology

TL;DR: This article discusses the problems with query matching systems, which were designed for skilled search intermediaries rather than end‐users, and the knowledge and skills such systems require in the information‐seeking process, illustrated with examples of searching card and online catalogs.

Abstract: We return to arguments made 10 years ago (Borgman, 1986a) that online catalogs are difficult to use because their design does not incorporate sufficient understanding of searching behavior. The earlier article examined studies of information retrieval system searching for their implications for online catalog design; this article examines the implications of card catalog design for online catalogs. With this analysis, we hope to contribute to a better understanding of user behavior and to lay to rest the card catalog design model for online catalogs. We discuss the problems with query matching systems, which were designed for skilled search intermediaries rather than end-users, and the knowledge and skills they require in the information-seeking process, illustrated with examples of searching card and online catalogs. Searching requires conceptual knowledge of the information retrieval process (translating an information need into a searchable query); semantic knowledge of how to implement a query in a given system (how and when to use system features); and technical skills in executing the query (basic computing skills and the syntax of entering queries as specific search statements). In the short term, we can help make online catalogs easier to use through improved training and documentation that is based on information-seeking behavior, with the caveat that good training is not a substitute for good system design. Our long-term goal should be to design intuitive systems that require a minimum of instruction. Given the complexity of the information retrieval problem and the limited capabilities of today's systems, we are far from achieving that goal. If libraries are to provide primary information services for the networked world, they need to put research results on the information-seeking process into practice in designing the next generation of online public access information retrieval systems.

357 citations

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
