Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers

Journal Article

The application of morpho-syntactic language processing to effective phrase matching

Paraic Sheridan¹, Alan F. Smeaton•Institutions (1)

Dublin City University¹

03 Jan 1992-Information Processing and Management

TL;DR: This paper describes a process whereby a morpho-syntactic analysis of phrases or user queries is used to generate a structured representation of text to evaluate the effectiveness or quality of the matching and scoring of phrases.

Abstract: The application of automatic natural language processing techniques to the indexing and the retrieval of text information has been a target of information retrieval researchers for some time. Incorporating semantic-level processing of language into retrieval has led to conceptual information retrieval, which is effective but usually restricted in its domain. Using syntactic-level analysis is domain-independent, but has not yet yielded significant improvements in retrieval quality. This paper describes a process whereby a morpho-syntactic analysis of phrases or user queries is used to generate a structured representation of text. A process of matching these structured representations is then described that generates a metric value or score indicating the degree of match between phrases. This scoring can then be used for ranking the phrases. In order to evaluate the effectiveness or quality of the matching and scoring of phrases, some experiments are described that indicate the method to be quite useful. Ultimately the phrase-matching technique described here would be used as part of an overall document retrieval strategy, and some future work towards this direction is outlined.

40 citations

Proceedings Article

An adaptive teleportation random walk model for learning social tag relevance

Xiaofei Zhu¹, Wolfgang Nejdl¹, Mihai Georgescu¹•Institutions (1)

Leibniz University of Hanover¹

03 Jul 2014

TL;DR: This paper models the relationships among images by constructing a voting graph, and proposes an adaptive teleportation random walk process on that graph, in which a confidence factor is introduced to control the teleportation probability.

Abstract: Social tags are known to be a valuable source of information for image retrieval and organization. However, contrary to the conventional document retrieval, rich tag frequency information in social sharing systems, such as Flickr, is not available, thus we cannot directly use the tag frequency (analogous to the term frequency in a document) to represent the relevance of tags. Many heuristic approaches have been proposed to address this problem, among which the well-known neighbor voting based approaches are the most effective methods. The basic assumption of these methods is that a tag is considered as relevant to the visual content of a target image if this tag is also used to annotate the visual neighbor images of the target image by lots of different users. The main limitation of these approaches is that they treat the voting power of each neighbor image either equally or simply based on its visual similarity. In this paper, we cast the social tag relevance learning problem as an adaptive teleportation random walk process on the voting graph. In particular, we model the relationships among images by constructing a voting graph, and then propose an adaptive teleportation random walk, in which a confidence factor is introduced to control the teleportation probability, on the voting graph. Through this process, direct and indirect relationships among images can be explored to cooperatively estimate the tag relevance. To quantify the performance of our approach, we compare it with state-of-the-art methods on two publicly available datasets (NUS-WIDE and MIR Flickr). The results indicate that our method achieves substantial performance gains on these datasets.
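The random walk with a confidence-controlled teleportation probability described in this abstract can be illustrated with a small sketch. This is not the paper's formulation: the graph weights, the prior scores, the per-node `confidence` values, and the rule "teleport with probability equal to the node's confidence" are all assumptions chosen for illustration, and the function name `adaptive_teleport_walk` is hypothetical.

```python
def adaptive_teleport_walk(weights, prior, confidence, iters=200):
    """Power iteration for a random walk with per-node teleportation.

    weights[i][j] : non-negative edge weight from node i to node j
                    (a stand-in for a voting graph; assumed input).
    prior[j]      : teleportation target distribution, sums to 1.
    confidence[i] : assumed probability in [0, 1] that a walker at
                    node i teleports instead of following an edge.
    """
    n = len(weights)
    # Row-normalise edge weights; rows with no outgoing weight stay None.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else None)

    scores = list(prior)
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            # A node without outgoing edges always teleports.
            c = confidence[i] if trans[i] is not None else 1.0
            for j in range(n):
                step = c * prior[j]
                if trans[i] is not None:
                    step += (1.0 - c) * trans[i][j]
                nxt[j] += scores[i] * step
        scores = nxt
    return scores


if __name__ == "__main__":
    # Node 0 links to nodes 1 and 2; both link back to node 0.
    graph = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
    uniform = [1 / 3] * 3
    print(adaptive_teleport_walk(graph, uniform, [0.15, 0.15, 0.15]))
```

Setting every confidence to 1 reduces the walk to the prior, while lower values let scores propagate along direct and indirect links, which is the general behaviour the abstract attributes to the teleportation control.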

40 citations

Proceedings Article

On the allocation of documents in multiprocessor information retrieval systems

Ophir Frieder¹, Hava T. Siegelmann•Institutions (1)

George Mason University¹

01 Sep 1991

TL;DR: A genetic algorithm for MDAP is developed and the effects of varying the communication cost matrix representing the interprocessor communication topology and the uniformity of the distribution of documents to the clusters are studied.

Abstract: Information retrieval is the selection of documents that are potentially relevant to a user’s information need. Given the vast volume of data stored in modern information retrieval systems, searching the document database requires vast computational resources. To meet these computational demands, various researchers have developed parallel information retrieval systems. As efficient exploitation of parallelism demands fast access to the documents, data organization and placement significantly affect the total processing time. We describe and evaluate a data placement strategy for distributed memory, distributed I/O multicomputers. Initially, a formal description of the Multiprocessor Document Allocation Problem (MDAP) and a proof that MDAP is NP-complete are presented. A document allocation algorithm for MDAP based on Genetic Algorithms is developed. This algorithm assumes that the documents are clustered using any one of the many clustering algorithms. We define a cost function for the derived allocation and evaluate the performance of our algorithm using this function. As part of the experimental analysis, the effects of varying the number of documents and their distribution across the clusters as well as the exploitation of various differing architectural interconnection topologies are studied. We also experiment with the several parameters common to Genetic Algorithms, e.g., the probability of mutation and the population size.

Hava Tova Siegelmann, Dept. of Computer Science, Rutgers University, New Brunswick, NJ 08903

This work was partially supported by grants from DCS, Inc. under contract number 5-35071 and the Center for Innovative Technology under contract number 5-34042.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1991 ACM 0-89791-448-1/91/0009/0230...$1.50

1.0 Introduction

An efficient multiprocessor information retrieval system must maintain a low system response time and require relatively little storage overhead. As the volume of stored data continues to increase daily, the multiprocessor engines must likewise scale to a large number of processors. This demand for system scalability necessitates a distributed memory architecture as a large number of processors is not currently possible in a shared-memory configuration. A distributed memory system, however, introduces the problem associated with the proper placement of data onto the given architecture. We refer to this problem as the Multiprocessor Document Allocation Problem (MDAP), a derivative of the Mapping Problem originally described by Bokhari [Bok81].

We assume a clustered document database. A clustered approach is taken since an index file organization can introduce vast storage overhead (up to roughly 300% according to Haskin [Has81]) and a full-text or signature analysis technique results in lengthy search times. In this context, a proper solution to MDAP is any mapping of the documents onto the processors such that the average cluster diameter is kept to a minimum while still providing for an even document distribution across the nodes.

To achieve a significant reduction in the total query processing time using parallelism, the allocation of data among the processors should be distributed as evenly as possible and the interprocessor communication among the nodes should be minimized. Achieving such an allocation is NP-complete. Thus, it is necessary to use heuristics to obtain satisfactory mappings, which may indeed be suboptimal. Genetic Algorithms [DeJ89, Gol89, Gre85, Gre87, Hol87, Rag87] approximate optimal solutions to computationally intractable problems. We develop a genetic algorithm for MDAP and examine the effects of varying the communication cost matrix representing the interprocessor communication topology and the uniformity of the distribution of documents to the clusters.

1.1 Mapping Problem Approximations

As the Mapping Problem and some of its derivatives are NP-complete, heuristic algorithms are commonly employed to approximate the optimal solutions. Some of these approaches [Bok81, Bol88, Lee87] deal, in some manner,
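The idea of searching for a document-to-processor mapping with a genetic algorithm can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the cost function (pairwise communication cost between same-cluster documents plus a load-imbalance penalty), the truncation selection, the one-point crossover, and all names (`allocation_cost`, `genetic_allocate`, `balance_weight`) are hypothetical choices. The mutation probability and population size are exposed as parameters because the abstract names them as quantities the authors varied.

```python
import random


def allocation_cost(alloc, cluster_of, comm_cost, balance_weight=1.0):
    """Assumed cost: communication between processors that hold documents
    of the same cluster, plus a penalty for uneven processor loads."""
    cost = 0.0
    n_docs = len(alloc)
    for a in range(n_docs):
        for b in range(a + 1, n_docs):
            if cluster_of[a] == cluster_of[b]:
                cost += comm_cost[alloc[a]][alloc[b]]
    loads = [alloc.count(p) for p in range(len(comm_cost))]
    return cost + balance_weight * (max(loads) - min(loads))


def genetic_allocate(cluster_of, comm_cost, pop_size=30, generations=200,
                     p_mutation=0.05, seed=0):
    """Search for a low-cost allocation; alloc[d] is the processor of doc d.

    Requires at least two documents and pop_size >= 4.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_docs, n_proc = len(cluster_of), len(comm_cost)

    def fitness(alloc):
        return allocation_cost(alloc, cluster_of, comm_cost)

    population = [[rng.randrange(n_proc) for _ in range(n_docs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_docs)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: reassign a document to a random processor.
            child = [rng.randrange(n_proc) if rng.random() < p_mutation else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


if __name__ == "__main__":
    clusters = [0, 0, 1, 1]   # document index -> cluster id
    topology = [[0, 1],       # communication cost between 2 processors
                [1, 0]]
    best = genetic_allocate(clusters, topology)
    print(best, allocation_cost(best, clusters, topology))
```

Changing `topology` to a different cost matrix, or skewing how documents are spread over `clusters`, is one way to explore the kind of sensitivity to interconnection topology and cluster distribution that the introduction describes.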

40 citations

Proceedings Article

Image-Mediated Learning for Zero-Shot Cross-Lingual Document Retrieval

Ruka Funaki¹, Hideki Nakayama¹•Institutions (1)

University of Tokyo¹

01 Sep 2015

TL;DR: This work uses the images in image-text documents of each language as the hub and derives a common semantic subspace bridging two languages by means of generalized canonical correlation analysis, which substantially enhances retrieval accuracy in zero-shot and few-shot scenarios where text-to-text examples are scarce.

Abstract: We propose an image-mediated learning approach for cross-lingual document retrieval where no or only a few parallel corpora are available. Using the images in image-text documents of each language as the hub, we derive a common semantic subspace bridging two languages by means of generalized canonical correlation analysis. For the purpose of evaluation, we create and release a new document dataset consisting of three types of data (English text, Japanese text, and images). Our approach substantially enhances retrieval accuracy in zero-shot and few-shot scenarios where text-to-text examples are scarce.

40 citations

Proceedings Article

Visual information retrieval with the SuperTable + Scatterplot

Peter Klein¹, Frank Müller¹, Harald Reiterer¹, Maximilian Eibl•Institutions (1)

University of Konstanz¹

07 Nov 2002

TL;DR: A new visualization approach for metadata combining different visualizations into a so-called SuperTable accompanied by a Scatterplot, solving the problem which seemed to be immanent to visualizations in document retrieval: the change of modalities.

Abstract: We present a new visualization approach for metadata combining different visualizations into a so-called SuperTable accompanied by a Scatterplot. The goal is to improve user experience during the information seeking process. Our new visualizations are based on our experiences developing a visual information retrieval system called INSYDER to supply small and medium size enterprises with business information from the Internet. Based on extensive user tests the original visualizations have been redesigned in two different design variants. Instead of offering multiple visualizations to choose from, the SuperTable + Scatterplot combines them in a new way. Therefore, the user has the feeling that he is working with one single visualization in different states. Further, the SuperTable solves a problem which seemed to be immanent to visualizations in document retrieval: the change of modalities.

40 citations

Metrics

Papers: 6,870

Citations: 224,615

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
