Topic
Document retrieval
About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.
Papers
07 Dec 2006
TL;DR: The new 'wave of documents' is 'threatening' to bring the two areas of databases and information retrieval closer to each other.
Abstract: Approximately three decades ago, researchers realized that they would have to structure data to be able to store and access the large data streams produced each day. As a result, database management systems were designed and developed to keep the data in one place and to find relevant information in it. On the other hand, large numbers of textual documents were still stored and accessed in unstructured form. Retrieval of such textual documents, containing relevant information with respect to a user query, has been an open research question studied in the information retrieval area for half a century.
Information retrieval studies resulted in numerous retrieval models and retrieval systems whose goal is to rank relevant documents according to their estimated relevance to a user query. Although they have similar goals, the research areas of databases and information retrieval developed mostly independently of each other. Recently, the new 'wave of documents' is 'threatening' to bring these two areas closer to each other.
38 citations
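The ranking goal stated in the 2006 abstract, ordering documents "according to their estimated relevance to a user query", can be illustrated with a minimal sketch. It assumes plain TF-IDF scoring with lowercase whitespace tokenization; this is one classical retrieval model chosen only for illustration, not a method from the paper, and the function names `tokenize` and `rank_tfidf` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase whitespace tokenization; a real system would also strip
    # punctuation, remove stopwords and apply stemming.
    return text.lower().split()


def rank_tfidf(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Rare terms get a larger inverse document frequency.
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(toks)) * idf
        scores.append((i, score))
    # sorted() is stable, so ties keep their original document order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, with the three documents "databases store structured data", "information retrieval ranks text documents" and "ranking documents by relevance to a query", the query "ranking documents" ranks the third document first, because it is the only one containing the rarer term "ranking".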
TL;DR: A structured approach to the retrieval of images by content is proposed, together with a description logic devised for the semantic indexing and retrieval of images containing complex objects; the resulting system allows a user to pose both queries by sketch and queries by example.
Abstract: We propose a structured approach to the problem of retrieval of images by content and present a description logic that has been devised for the semantic indexing and retrieval of images containing complex objects.
As other approaches do, we start from low-level features extracted with image analysis to detect and characterize regions in an image. However, in contrast with feature-based approaches, we provide a syntax to describe segmented regions as basic objects and complex objects as compositions of basic ones. Then we introduce a companion extensional semantics for defining reasoning services, such as retrieval, classification, and subsumption. These services can be used for both exact and approximate matching, using similarity measures.
Using our logical approach as a formal specification, we implemented a complete client-server image retrieval system, which allows a user to pose both queries by sketch and queries by example. A set of experiments has been carried out on a testbed of images to assess the retrieval capabilities of the system in comparison with rankings by expert users. Results are presented using a well-established measure of quality borrowed from textual information retrieval.
38 citations
07 Jul 2016
TL;DR: PFSDM and PFFDM, two novel models for entity retrieval from knowledge graphs, are proposed; they infer the user's intent behind each individual query concept by dynamically estimating its projection onto the fields of structured entity representations based on a small number of statistical and linguistic features.
Abstract: Accurate projection of terms in free-text queries onto structured entity representations is one of the fundamental problems in entity retrieval from knowledge graphs. In this paper, we demonstrate that existing retrieval models for ad-hoc structured and unstructured document retrieval fall short of addressing this problem, due to their rigid assumptions. According to these assumptions, either all query concepts of the same type (unigrams and bigrams) are projected onto the fields of entity representations with identical weights, or such projection is determined based on only one simple statistic, which makes it sensitive to data sparsity. To address this issue, we propose the Parametrized Fielded Sequential Dependence Model (PFSDM) and the Parametrized Fielded Full Dependence Model (PFFDM), two novel models for entity retrieval from knowledge graphs, which infer the user's intent behind each individual query concept by dynamically estimating its projection onto the fields of structured entity representations based on a small number of statistical and linguistic features. Experimental results obtained on several publicly available benchmarks indicate that PFSDM and PFFDM consistently outperform state-of-the-art retrieval models for the task of entity retrieval from knowledge graphs.
38 citations
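The central idea of the PFSDM abstract, projecting each query concept onto entity fields with concept-specific weights computed from features, can be sketched for unigram concepts. The sketch assumes per-field Dirichlet-smoothed language models mixed with feature-driven weights; the softmax normalization, the `mu` default, the data layout and all function names are illustrative assumptions rather than the paper's exact formulation, and bigram concepts are omitted.

```python
import math


def dirichlet_prob(term, field_tokens, field_stats, mu=100.0):
    """Dirichlet-smoothed P(term | one field of one entity).

    field_stats = {"tf": {term: count}, "len": total tokens} for that field
    across the whole collection (hypothetical layout).
    """
    tf = field_tokens.count(term)
    background = field_stats["tf"].get(term, 0) / field_stats["len"]
    return (tf + mu * background) / (len(field_tokens) + mu)


def field_weights(features, alpha):
    """Map per-field feature vectors of ONE query concept to field weights.

    features[f]: feature values of the concept for field f (computed by the caller).
    alpha[f]: learned coefficients for field f (hypothetical).
    The softmax keeps weights positive and summing to one; this is an assumption.
    """
    raw = {f: sum(a * x for a, x in zip(alpha[f], features[f])) for f in features}
    top = max(raw.values())  # subtract the max for numerical stability
    exp = {f: math.exp(v - top) for f, v in raw.items()}
    total = sum(exp.values())
    return {f: v / total for f, v in exp.items()}


def concept_score(term, entity, collection, features, alpha, mu=100.0):
    """Log of the concept's field-mixture probability for one entity."""
    weights = field_weights(features, alpha)
    mixture = sum(
        w * dirichlet_prob(term, entity[f], collection[f], mu)
        for f, w in weights.items()
    )
    # A term unseen in every field of the collection has zero probability.
    return math.log(mixture) if mixture > 0 else float("-inf")
```

Under these assumptions, the unigram part of a query's score for an entity would be the sum of `concept_score` over its terms, each with its own `features`. Replacing the per-concept weights with one fixed weight vector shared by all unigrams corresponds to the first rigid assumption the abstract criticizes.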
01 Jul 2004
TL;DR: This work proposes a principled, data-driven Instance-Based approach to Question Answering that treats answer extraction as a binary classification problem in which text snippets are labeled as correct or incorrect answers.
Abstract: Anticipating the availability of large question-answer datasets, we propose a principled, data-driven Instance-Based approach to Question Answering. Most question answering systems incorporate three major steps: classify questions according to answer types, formulate queries for document retrieval, and extract actual answers. Under our approach, strategies for answering new questions are directly learned from training data. We learn models of answer type, query content, and answer extraction from clusters of similar questions. We view the answer type as a distribution, rather than a class in an ontology. In addition to query expansion, we learn general content features from training data and use them to enhance the queries. Finally, we treat answer extraction as a binary classification problem in which text snippets are labeled as correct or incorrect answers. We present a basic implementation of these concepts that achieves good performance on TREC test data.
38 citations
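The instance-based view of the answer type "as a distribution, rather than a class in an ontology" can be illustrated with a small sketch. It assumes that the answer-type distribution for a new question is estimated from its k most similar training questions, using word-overlap (Jaccard) similarity; k-nearest neighbours stands in here for the paper's clusters of similar questions, and the data format and function names are hypothetical.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    # Word-overlap similarity; an illustrative choice, not the paper's measure.
    return len(a & b) / len(a | b) if a | b else 0.0


def answer_type_distribution(question, training, k=3):
    """Estimate P(answer type | question) from the k most similar training questions.

    `training` is a list of (question_text, answer_type) pairs (hypothetical format).
    """
    words = set(question.lower().split())
    neighbours = sorted(
        training,
        key=lambda pair: jaccard(words, set(pair[0].lower().split())),
        reverse=True,
    )[:k]
    # Each neighbour votes for its answer type; votes are normalized to sum to one.
    counts = Counter(answer_type for _, answer_type in neighbours)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

With the training pairs ("when was lincoln born", "DATE"), ("when did the war end", "DATE"), ("where was lincoln born", "LOCATION") and ("who wrote hamlet", "PERSON"), the question "when was mozart born" with `k=3` receives a distribution of two thirds DATE and one third LOCATION, rather than a single hard label.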
20 Oct 2004
TL;DR: CPSRS is a Web-based application that provides a familiar and efficient way to search for and identify plant species in the field; it is built on the Java Web infrastructure to support platform independence.
Abstract: In this paper, a computerized plant species recognition system (CPSRS) is presented. CPSRS is a Web-based application that provides a familiar and efficient way to search for and identify plant species in the field. It is built on the Java Web infrastructure to support platform independence. Java applets and servlets are adopted to balance the computing load between client and server. The architecture of CPSRS is introduced to show how it is designed and how it works. Two types of plant species retrieval methods, text-based information retrieval and content-based leaf retrieval, are discussed. With the text-based information retrieval method, the exact information about a plant species is retrieved from the database according to the input search criteria. For content-based leaf retrieval, experimental results show that a recall rate of about 71.4% can be achieved when the top five returned images are considered.
38 citations
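The CPSRS abstract reports a recall rate "when the top five returned images are considered" without defining it. One common reading, assumed here, is recall at k: the fraction of a query's relevant images that appear among the top k results, averaged over queries. The sketch below implements that reading; the function names are hypothetical and the paper may define its metric differently.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear among the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over queries.

    runs: list of (ranked_ids, relevant_ids) pairs, one per query image.
    """
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

For instance, if a query leaf has two relevant images and only one of them is among the first five results, its recall@5 is 0.5; a system-level figure such as the reported 71.4% would then be the mean over all query images.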