Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

A Survey of Concept-based Information Retrieval Tools on the Web

Hele-Mai Haav, Tanel-Lauri Lubi

01 Jan 2001

TL;DR: An overview of concept-based information retrieval techniques and software tools currently available as prototypes or commercial products, evaluated using a feature classification that incorporates general characteristics of tools and their information retrieval features.

Abstract: In order to solve the problem of information overkill on the web, current information retrieval tools need to be improved. Much more "intelligence" should be embedded in search tools to manage search, retrieval, filtering and presentation of relevant information effectively. This can be done by concept-based (or ontology-driven) information retrieval, which is considered one of the high-impact technologies for the next ten years. Nevertheless, most commercial search and retrieval products do not report concept-based search features. The paper provides an overview of concept-based information retrieval techniques and software tools currently available as prototypes or commercial products. Tools are evaluated using a feature classification that incorporates general characteristics of tools and their information retrieval features.

109 citations

Journal Article

The relevance of recall and precision in user evaluation

Louise T. Su (University of Pittsburgh)

01 Apr 1994, Journal of the Association for Information Science and Technology

TL;DR: Analysis of users' verbal data shows that high precision does not always mean high quality to users because users' expectations differ, and four related measures of recall and precision are found to be significantly correlated with success.

Abstract: The appropriateness of evaluation criteria and measures has been a subject of debate and a vital concern in the information retrieval evaluation literature. A study was conducted to investigate the appropriateness of 20 measures for evaluating interactive information retrieval performance, representing four major evaluation criteria. Among the 20 measures studied were the two best-known relevance-based measures of effectiveness, recall and precision. The user's judgment of information retrieval success was used as the criterion measure with which all 20 measures were to be correlated. A sample of 40 end-users with individual information problems from an academic environment was observed interacting with six professional intermediaries who searched on their behalf in large operational systems. Quantitative data, consisting of values for all measures studied, and verbal data, containing users' reasons for assigning certain values to selected measures, were collected. Statistical analysis of the quantitative data showed that precision, one of the most important traditional measures of effectiveness, is not significantly correlated with the user's judgment of success. Users appear to be more concerned with absolute recall than with precision, although absolute recall was not directly tested in the study. Four related measures of recall and precision were found to be significantly correlated with success. Among these are the user's satisfaction with the completeness of search results and the user's satisfaction with the precision of the search. This article explores possible explanations for this outcome through content analysis of users' verbal data. The analysis shows that high precision does not always mean high quality (relevance, completeness, etc.) to users, because users' expectations differ. The user's purpose in obtaining information is suggested to be the primary cause of the high concern for recall. Implications for research and practice are discussed. © 1994 John Wiley & Sons, Inc.

109 citations
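
The abstract turns on two set-based measures. As a minimal sketch, assuming each search's full relevant set is known (i.e. absolute recall, which the study did not directly test), the snippet below computes precision and recall per search and correlates each with a user's success rating using Pearson's r. The function names, the 1-5 rating scale and all data are hypothetical illustrations, not the study's instruments or results; the abstract does not state which correlation coefficient the study used.

```python
# Minimal sketch: precision, recall, and their Pearson correlation with a
# user's success rating. All data below is hypothetical toy data.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Assumes the full relevant set is known (absolute recall).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical searches: (retrieved docs, relevant docs, user's success rating 1-5).
searches = [
    (["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"], 4),
    (["d1", "d6"], ["d1", "d7", "d8", "d9"], 2),
    (["d2", "d3", "d4", "d5", "d6"], ["d2", "d3"], 5),
]

scores = [precision_recall(ret, rel) for ret, rel, _ in searches]
ratings = [rating for _, _, rating in searches]
r_precision = pearson([p for p, _ in scores], ratings)
r_recall = pearson([r for _, r in scores], ratings)
print(f"r(precision, rating) = {r_precision:.3f}")
print(f"r(recall, rating)    = {r_recall:.3f}")
```

On this toy data recall tracks the ratings and precision does not, but only because the data was constructed that way; the numbers say nothing about the actual study.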

Proceedings Article

A Statistical Analysis of the TREC-3 Data

Jean Tague-Sutcliffe, James Blustein

01 Apr 1995

TL;DR: A statistical analysis of the TREC-3 data shows that performance differences across queries are greater than performance differences across participants' runs.

Abstract: A statistical analysis of the TREC-3 data shows that performance differences across queries are greater than performance differences across participants' runs. Generally, groups of runs that do not differ significantly are large, sometimes accounting for over half the runs. Correlation among the various performance measures is high.

108 citations
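
The abstract compares variation across queries with variation across runs. One standard way to make that comparison, assumed here rather than taken from the paper, is a two-way analysis of variance on a queries × runs table with one score per cell. The sketch below partitions the total sum of squares into query, run and residual components and forms the corresponding F ratios; `two_way_anova` and the score table are hypothetical, not TREC-3 data.

```python
# Minimal sketch: two-way ANOVA without replication on a queries x runs table.
# The scores are hypothetical, not TREC-3 results.

def two_way_anova(table):
    """Partition the total sum of squares of a queries x runs table
    (one score per cell) into query, run and residual parts.

    Returns (sums_of_squares, f_ratios) as dicts.
    """
    q, r = len(table), len(table[0])          # number of queries, number of runs
    grand = sum(map(sum, table)) / (q * r)
    row_means = [sum(row) / r for row in table]
    col_means = [sum(row[j] for row in table) / q for j in range(r)]

    ss = {
        "query": r * sum((m - grand) ** 2 for m in row_means),
        "run": q * sum((m - grand) ** 2 for m in col_means),
        "total": sum((x - grand) ** 2 for row in table for x in row),
    }
    ss["residual"] = ss["total"] - ss["query"] - ss["run"]

    ms_resid = ss["residual"] / ((q - 1) * (r - 1))   # residual mean square
    f = {
        "query": ss["query"] / (q - 1) / ms_resid,
        "run": ss["run"] / (r - 1) / ms_resid,
    }
    return ss, f


# Rows: 4 hypothetical queries; columns: 3 hypothetical runs (e.g. average precision).
scores = [
    [0.80, 0.75, 0.70],
    [0.20, 0.15, 0.10],
    [0.55, 0.50, 0.40],
    [0.35, 0.30, 0.30],
]

ss, f = two_way_anova(scores)
for source in ("query", "run", "residual"):
    print(f"SS_{source:<8} = {ss[source]:.4f}")
print(f"F_query = {f['query']:.1f}, F_run = {f['run']:.1f}")
```

In this toy table the query effect dwarfs the run effect, the pattern the abstract reports for TREC-3. Judging significance would additionally require comparing each F ratio with the F distribution (for example via SciPy's `scipy.stats.f.sf`), which the sketch omits.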

Posted Content

Modeling Documents with Deep Boltzmann Machines

Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey E. Hinton

26 Sep 2013, arXiv: Learning

TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced, and the model is shown to assign better log probability to unseen data than the Replicated Softmax model.

Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations
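
Replicated Softmax, the baseline named in the abstract, is an RBM whose hidden units receive word counts and whose hidden biases are scaled by document length D, so that p(h_j = 1 | v) = σ(D·a_j + Σ_k v_k W_kj). The sketch below shows only that inference step and one simple way to use the resulting features for retrieval (ranking by cosine similarity). It does not implement the paper's Deep Boltzmann Machine, its parameter tying or any training; the weights are random stand-ins for trained parameters, and cosine ranking is an assumed retrieval procedure, not necessarily the paper's evaluation protocol.

```python
import numpy as np

# Minimal sketch of Replicated Softmax feature extraction plus cosine-similarity
# retrieval. The weights are random stand-ins for trained parameters.

def replicated_softmax_features(counts, W, a):
    """Return p(h_j = 1 | v) = sigmoid(D * a_j + sum_k v_k W_kj) for each document.

    counts: (n_docs, K) word-count matrix
    W:      (K, F) word-to-hidden weights
    a:      (F,) hidden biases, scaled by each document's length D
    """
    D = counts.sum(axis=1, keepdims=True)          # (n_docs, 1) document lengths
    return 1.0 / (1.0 + np.exp(-(counts @ W + D * a)))


def rank_by_cosine(features, query_idx):
    """Rank all documents by cosine similarity to document `query_idx`."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit @ unit[query_idx]
    return np.argsort(-sims), sims


rng = np.random.default_rng(0)
K, F = 6, 4                                        # vocabulary size, hidden units
W = rng.normal(scale=1.0, size=(K, F))
a = rng.normal(scale=0.1, size=F)

counts = np.array([
    [3, 0, 1, 0, 2, 0],
    [0, 4, 0, 2, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [3, 0, 1, 0, 2, 0],                            # duplicate of document 0
], dtype=float)

features = replicated_softmax_features(counts, W, a)
order, sims = rank_by_cosine(features, query_idx=0)
print("ranking:", order.tolist())
print("similarities:", np.round(sims, 3).tolist())
```

Scaling the hidden bias by D is what lets one set of weights handle documents of different lengths; with trained weights, the same two functions would give a feature-based retrieval baseline.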

Journal Article

A fuzzy model of document retrieval systems

Valiollah Tahani

01 Jan 1976, Information Processing and Management

TL;DR: To deal with data organization problems in this conceptual model, the conventional concept of a list is extended to a fuzzy list, so that the notion of an inverted file structure can be extended to the fuzzy data in the retrieval model.

Abstract: This paper is concerned with the organization and retrieval of records in document retrieval systems which admit of imprecision in the form of fuzziness in document characterization and retrieval rules. A mathematical model for such systems, based on the theory of fuzzy sets, is introduced. A document retrieval system, as defined in this paper, is a quadruple $(X, D, Q, \gamma)$, where $X$ is a collection of document descriptions (also referred to as index records, or records); $D$ is the descriptor set; $Q$ is a query set; and $\gamma: Q \times X \to [0, 1]$ (called the matching function) assigns to each pair $(q, x)$, where $q \in Q$ and $x \in X$, a number $\gamma(q, x)$ in the interval $[0, 1]$, called the matching index for the query $q$ and the document description $x$. In our system model, each document description $x$ is defined as a fuzzy set in the descriptor set $D$. As a fuzzy subset of $D$, each $x$ is characterized by a membership function $\mu_x: D \to [0, 1]$, where $\mu_x(d)$, representing the grade of membership of $d$ in $x$, is referred to as the index weight of the descriptor $d$ for the document representation $x$. The retrieval response of the system is defined in terms of the matching function $\gamma$. More specifically, given a query $q$, the index record retrieval response $f(q)$ is defined to be a fuzzy set in $X$ whose membership function is given by $\mu_{f(q)}(x) = \gamma(q, x)$. To deal with the organization problems of data in our conceptual model, the conventional concept of a list is extended to a fuzzy list. Specifically, $L(d)$, the fuzzy list corresponding to a descriptor $d$, is defined as a fuzzy set in the document description set $X$ whose membership function is given by $\mu_{L(d)}(x) = \mu_x(d)$. In this way, the notion of an inverted file structure can be extended to the fuzzy data in our retrieval model.

108 citations
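
The fuzzy model above leaves the matching function γ general. The sketch below builds the fuzzy inverted lists L(d), with μ_L(d)(x) = μ_x(d), from hypothetical document descriptions and evaluates Boolean queries over them to produce the fuzzy response f(q). It assumes Zadeh's operators (AND as min, OR as max, NOT as 1 - value), a common choice in fuzzy-set retrieval but an assumption here, not something the abstract specifies; `fuzzy_inverted_lists`, `evaluate` and the sample data are illustrative only.

```python
from collections import defaultdict

# Hypothetical document descriptions x: fuzzy sets over descriptors D,
# given as {descriptor: index weight mu_x(d)} with weights in [0, 1].
docs = {
    "x1": {"fuzzy": 0.9, "retrieval": 0.6},
    "x2": {"retrieval": 0.8, "indexing": 0.7},
    "x3": {"fuzzy": 0.4, "indexing": 0.5},
}


def fuzzy_inverted_lists(docs):
    """Build L(d) for every descriptor d: a fuzzy set over X with
    mu_{L(d)}(x) = mu_x(d). Documents with weight 0 are simply absent."""
    lists = defaultdict(dict)
    for x, weights in docs.items():
        for d, w in weights.items():
            if w > 0:
                lists[d][x] = w
    return dict(lists)


def evaluate(query, lists, universe):
    """Return the fuzzy response f(q) as {x: gamma(q, x)}.

    query is a descriptor string, ("and" | "or", q1, q2, ...) or ("not", q1).
    Hypothetical operator choice: AND = min, OR = max, NOT = 1 - value.
    """
    if isinstance(query, str):
        return {x: lists.get(query, {}).get(x, 0.0) for x in universe}
    op, *args = query
    parts = [evaluate(arg, lists, universe) for arg in args]
    if op == "and":
        return {x: min(p[x] for p in parts) for x in universe}
    if op == "or":
        return {x: max(p[x] for p in parts) for x in universe}
    if op == "not":
        return {x: 1.0 - parts[0][x] for x in universe}
    raise ValueError(f"unknown operator: {op}")


lists = fuzzy_inverted_lists(docs)
query = ("or", ("and", "fuzzy", "retrieval"), "indexing")
response = evaluate(query, lists, list(docs))
for x, grade in sorted(response.items(), key=lambda kv: kv[1], reverse=True):
    print(x, grade)
```

For clarity the sketch scores every document in the universe; a real inverted-file implementation would touch only the documents that appear in the lists for the query's descriptors.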

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
