
Showing papers on "Document retrieval published in 1988"


Journal ArticleDOI
TL;DR: This article reviews algorithms that allow hierarchic agglomerative clustering methods to be implemented for document retrieval; experimental evidence suggests that nearest neighbor clusters provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.
Abstract: This article reviews recent research into the use of hierarchic agglomerative clustering methods for document retrieval. After an introduction to the calculation of interdocument similarities and to clustering methods that are appropriate for document clustering, the article discusses algorithms that can be used to allow the implementation of these methods on databases of nontrivial size. The validation of document hierarchies is described using tests based on the theory of random graphs and on empirical characteristics of document collections that are to be clustered. A range of search strategies is available for retrieval from document hierarchies and the results are presented of a series of research projects that have used these strategies to search the clusters resulting from several different types of hierarchic agglomerative clustering method. It is suggested that the complete linkage method is probably the most effective method in terms of retrieval performance; however, it is also difficult to implement in an efficient manner. Other applications of document clustering techniques are discussed briefly; experimental evidence suggests that nearest neighbor clusters, possibly represented as a network model, provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.
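The complete linkage method the article singles out can be sketched in a few lines. This is an illustrative pure-Python version, not the efficient implementations the article surveys: representing documents as sparse term-weight dicts and stopping at a similarity threshold are my assumptions, and the pairwise search is quadratic in the number of clusters.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

def complete_linkage(docs, threshold):
    """Repeatedly merge the two clusters whose *least* similar member
    pair is highest, stopping when no pair exceeds the threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: similarity of the worst pair across clusters
                sim = min(cosine(docs[a], docs[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

For example, with `docs = [{"cat": 1.0, "dog": 1.0}, {"cat": 1.0}, {"car": 1.0}]` and a threshold of 0.5, the first two documents merge (cosine ≈ 0.71) and the third stays a singleton.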

842 citations


Book
01 Dec 1988
TL;DR: This paper considers the situation where no relevance information is available, that is, at the start of the search; based on a probabilistic model, strategies are proposed for the initial search and an intermediate search.
Abstract: Most probabilistic retrieval models incorporate information about the occurrence of index terms in relevant and non‐relevant documents. In this paper we consider the situation where no relevance information is available, that is, at the start of the search. Based on a probabilistic model, strategies are proposed for the initial search and an intermediate search. Retrieval experiments with the Cranfield collection of 1,400 documents show that this initial search strategy is better than conventional search strategies both in terms of retrieval effectiveness and in terms of the number of queries that retrieve relevant documents. The intermediate search is shown to be a useful substitute for a relevance feedback search. Experiments with queries that do not retrieve relevant documents at high rank positions indicate that a cluster search would be an effective alternative strategy.
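The no-relevance-information setting can be illustrated with a Croft/Harper-style initial weight: a constant plus an idf-like component derived only from document frequencies. The exact formula and the 0.5 smoothing below are a common textbook form, assumed here for illustration rather than taken from this paper.

```python
import math

def initial_weights(doc_freqs, N, C=1.0):
    """Initial-search term weights usable before any relevance feedback.

    doc_freqs: term -> number of documents containing the term.
    N: collection size. Rare terms get high weights; a term occurring
    in half the collection gets weight C."""
    return {t: C + math.log((N - n + 0.5) / (n + 0.5))
            for t, n in doc_freqs.items()}

def score(query_terms, doc_terms, weights):
    """Simple best-match score: sum the weights of matching query terms."""
    return sum(weights.get(t, 0.0) for t in query_terms if t in doc_terms)
```

On a Cranfield-sized collection (N = 1400), a term in 10 documents outweighs a term in 700, so documents matching the rarer term rank first.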

453 citations


Journal ArticleDOI
TL;DR: Implementing a popular medical handbook in hypertext underscores the need to study hypertext in the context of full-text document retrieval, machine learning, and user interface issues.
Abstract: Medicine is an ideal domain for hypertext applications and research. Implementing a popular medical handbook in hypertext underscores the need to study hypertext in the context of full-text document retrieval, machine learning, and user interface issues.

395 citations


Journal ArticleDOI
TL;DR: Competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.
Abstract: Document retrieval systems are built to provide inquirers with computerized access to relevant documents. Such systems often miss many relevant documents while falsely identifying many non-relevant documents. Here, competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.
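The idea of evolving competing document descriptions under relevance judgments can be sketched as below. The fitness function, the crossover and mutation operators, and all rates are illustrative choices of mine, not the paper's design; real adaptive-indexing fitness would come from retrieval behavior over many queries.

```python
import random

def fitness(description, relevant_queries):
    """Illustrative fitness: fraction of relevant-query terms covered."""
    hits = sum(len(description & q) for q in relevant_queries)
    total = sum(len(q) for q in relevant_queries)
    return hits / total if total else 0.0

def evolve(descriptions, relevant_queries, vocab, generations=20, seed=0):
    """Keep the fitter half as parents, recombine, occasionally mutate."""
    rng = random.Random(seed)
    pop = [set(d) for d in descriptions]
    for _ in range(generations):
        pop.sort(key=lambda d: fitness(d, relevant_queries), reverse=True)
        parents = pop[: max(2, len(pop) // 2)]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            # uniform crossover over the union of both parents' terms
            child = {t for t in a | b if rng.random() < 0.5} or set(a)
            if rng.random() < 0.1:  # mutation: toggle one random vocab term
                child ^= {rng.choice(sorted(vocab))}
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda d: fitness(d, relevant_queries))
```

Because the fittest descriptions survive unchanged each generation, the best fitness in the population never decreases.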

252 citations


Book
01 Jan 1988

80 citations


Journal ArticleDOI
TL;DR: A document retrieval system is presented, based upon the vector processing model, which employs an automatic indexing procedure with a weighting scheme to reflect term importance and an emphasis on nearest neighbour searching to locate documents closest to a given query.
Abstract: Document filing and retrieval systems can be designed using advanced techniques resulting from recent research in information retrieval. In this paper, a document retrieval system is presented, based upon the vector processing model. The system employs an automatic indexing procedure with a weighting scheme to reflect term importance. Documents are stored using an inverted file organization. Natural language queries are supported with a retrieval strategy based on best match techniques and relevance feedback. The emphasis is on nearest neighbour searching to locate documents closest to a given query. That means, after having defined a similarity function, the identification of those documents in the collection which exhibit a higher degree of resemblance to the query. The problem is introduced with reference to a straightforward search procedure that returns the nearest neighbour set by manipulating the inverted file entries. Then, an improved algorithm is presented which optimizes both the number of documen...
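The pipeline described above — weighted automatic indexing, an inverted file, and nearest neighbour scoring that touches only documents sharing a query term — can be sketched as follows. The generic tf·idf weighting is a stand-in assumption, not necessarily the paper's scheme.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Inverted file: term -> list of (doc_id, tf-idf weight).

    docs is a list of token lists; also returns each document's
    vector length for cosine normalisation."""
    N = len(docs)
    df = defaultdict(int)
    for d in docs:
        for t in set(d):
            df[t] += 1
    index, norms = defaultdict(list), [0.0] * N
    for i, d in enumerate(docs):
        for t in set(d):
            w = d.count(t) * math.log(N / df[t])
            index[t].append((i, w))
            norms[i] += w * w
    return index, [math.sqrt(x) for x in norms]

def nearest_neighbours(query, index, norms, k=3):
    """Accumulate weights from the query terms' postings only, then
    rank by the document-normalised score (query norm is constant)."""
    scores = defaultdict(float)
    for t in query:
        for doc_id, w in index.get(t, []):
            scores[doc_id] += w
    ranked = sorted(scores.items(),
                    key=lambda kv: kv[1] / norms[kv[0]], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Documents with no query term in common are never scored, which is exactly the efficiency the inverted file buys over a full scan.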

71 citations


Journal ArticleDOI
01 Sep 1988
TL;DR: A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system; in particular, a novel term-similarity function could be defined.
Abstract: Information retrieval can be defined as the extraction of specific information out of a great number of stored information items. Information retrieval systems, used for the retrieval of documents, try to answer more or less precise questions about interesting topics with a number of suitable documents or references to documents. Such systems should contain 'knowledge' about the meaning of questions, about the content of the stored information, and about the particular user's needs for information. Knowledge-based systems claim to be able to store knowledge and draw conclusions from it. The goal of this thesis is to investigate the use of knowledge-based methods and technologies for information retrieval. A knowledge-based information retrieval system should represent its information structures, as well as its knowledge, in a common knowledge representation formalism. The retrieval process of the system should employ the inferential methods of the chosen knowledge representation formalism. A subset of first order logic is chosen for this thesis to represent knowledge. Specially designed retrieval rules represent knowledge for the purpose of retrieval; they capture knowledge about the user's vocabulary, working domain, and way of performing document retrieval. The problem of the recall and precision of the answers of an information retrieval system is approached by an explicit representation of control knowledge. A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system. In particular, a novel term-similarity function could be defined. Properties like completeness and termination could be derived, and bounds on the amount of overhead caused by false control strategies could be investigated. The proposed model is implemented in a prototype of a knowledge-based information retrieval system, called KIR.
KIR is a single-user system for personal document and knowledge retrieval running on computer workstations. It is implemented using Prolog and Modula-2.

68 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: Investigating whether linguistic processing can be used as part of a document retrieval strategy, by predefining a level of syntactic analysis of user queries only, suggests that the approach of using linguistic processing in retrieval is valid.
Abstract: Traditional information retrieval has relied on the extensive use of statistical parameters in the implementation of retrieval strategies. This paper sets out to investigate whether linguistic processes can be used as part of a document retrieval strategy. This is done by predefining a level of syntactic analysis of user queries only, to be used as part of the retrieval process. A large series of experiments on an experimental test collection is reported, using a parser for noun phrases as part of the retrieval strategy. The results obtained do yield improvements in the level of retrieval effectiveness; given the crude linguistic process used, and the fact that it was applied to queries and not to document texts, this suggests that the approach of using linguistic processing in retrieval is valid.

65 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type, and the role of links is shown to be especially beneficial.
Abstract: This report considers combining information to improve retrieval. The vector space model has been extended so different classes of data are associated with distinct concept types and their respective subvectors. Two collections with multiple concept types are described, ISI-1460 and CACM-3204. Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type. After sampling and transformation of data, the coefficient of determination for the best model was .48 (.66) for ISI (CACM). Average precision for the two collections was 11% (31%) better for probabilistic feedback with all types versus with terms only. These findings may be of particular interest to designers of document retrieval or hypertext systems since the role of links is shown to be especially beneficial.

51 citations


Book
01 Dec 1988
TL;DR: The origins of information retrieval research: the first information retrieval system tests; testing indexing systems (Cranfield I); research on relevance judgement; relevance as a performance criterion; the Cranfield tradition and the information retrieval model.
Abstract: Part 1 Introduction - the origins of information retrieval research: the first information retrieval system tests; testing indexing systems - Cranfield I; testing indexing devices - Cranfield II; relevance judgement and retrieval system tests; research on relevance judgement; relevance as a performance criterion; the Cranfield tradition and the information retrieval model.
Part 2 Statistical and probabilistic retrieval: automatic indexing, classification and searching - SMART; document clustering; probabilistic models of relevance and relevance feedback; achievements and limitations of the statistical approach.
Part 3 Cognitive user modelling: information retrieval through man-machine dialogue - the THOMAS program; anomalous states of knowledge; ASK-based retrieval; stereotype-based fiction retrieval - the GRUNDY program; cognitive models for retrieval.
Part 4 Expert intermediary systems: expert intermediary systems - CONIT, CANSEARCH and PLEXUS; distributed expert-based intermediary system - MONSTRAT; intelligent intermediary for information retrieval - I3R; COmposite Document Expert/extended/effective Retrieval - CODER; expert systems for information retrieval.
Part 5 Associations, relations and hypertext: 'as we may think' - MEMEX; database browsing and navigation - TINman; the origins of hypertext - Xanadu and NLS/Augment; card-based hypertext systems - NoteCards and HyperCard; transforming text to hypertext - Guide; hypercatalog; information retrieval by association; potential and problems of hypertext.

44 citations


Journal ArticleDOI
TL;DR: This article will present a relational logical model to support a sophisticated document retrieval system in which flexible forms of inferential and associative searching can be performed and several problems of particular importance to document retrieval will be discussed.
Abstract: Relational Data Base Management Systems offer a commercially available tool with which to build effective document retrieval systems. The full potential of the relational model for supporting the kind of ad hoc inquiry characteristic of document retrieval has only recently been explored. In addition, commercially available relational DBMS's also provide effective tools for managing document data bases by providing facilities for, inter alia , concurrency control, data migration and reorganization routines, authorization mechanisms, enforcement of integrity constraints, dynamic data definition, etc. This article will present a relational logical model to support a sophisticated document retrieval system in which flexible forms of inferential and associative searching can be performed. Examples of ad hoc inquiry will be presented in SQL. Several problems of particular importance to document retrieval will be discussed, including the importance of Conjunctive Normal Form in query formulation, unique aspects of document retrieval storage and processing overhead, and techniques for reducing the size of storage without severely impacting retrieval effectiveness.

Journal ArticleDOI
TL;DR: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work, content-based document retrieval and mail distribution, and closes the gap between electronic document entry systems and processing of (semi-)structured document content.
Abstract: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work: content-based document retrieval and mail distribution. In this system, documents are described and classified by their semantic structure, which provides access to abstract concepts contained in the document. The derivation of the semantic structure of a document supports both efficient retrieval by content and intelligent mail filtering through document semantics. A knowledge-based classification system automatically generates the conceptual description of a document to be inserted into the system by means of content analysis, and associates the document with an appropriate predefined type. The classification system closes the gap between electronic document entry systems and processing of (semi-)structured document content.

Proceedings ArticleDOI
01 May 1988
TL;DR: The approach to plausible inference for retrieval is explained and some preliminary experiments designed to test this approach using a spreading activation search to implement the plausible inference process show that significant effectiveness improvements are possible.
Abstract: Choosing an appropriate document representation and search strategy for document retrieval has been largely guided by achieving good average performance instead of optimizing the results for each individual query. A model of retrieval based on plausible inference gives us a different perspective and suggests that techniques should be found for combining multiple sources of evidence (or search strategies) into an overall assessment of a document's relevance, rather than attempting to pick a single strategy. In this paper, we explain our approach to plausible inference for retrieval and describe some preliminary experiments designed to test this approach. The experiments use a spreading activation search to implement the plausible inference process. The results show that significant effectiveness improvements are possible using this approach.
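The spreading activation search used in these experiments can be sketched over a weighted graph linking query, term, and document nodes. The decay factor, iteration count, and graph shape below are illustrative assumptions, not the paper's configuration.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=2):
    """Propagate activation from query-matched seed nodes along weighted links.

    graph: node -> list of (neighbour, link_weight).
    seeds: node -> initial activation level.
    Returns (node, activation) pairs sorted by activation, so documents
    reachable through more / stronger evidence paths rank higher."""
    activation = dict(seeds)
    for _ in range(iterations):
        nxt = dict(activation)
        for node, level in activation.items():
            for neighbour, weight in graph.get(node, []):
                nxt[neighbour] = nxt.get(neighbour, 0.0) + decay * weight * level
        activation = nxt
    return sorted(activation.items(), key=lambda kv: kv[1], reverse=True)
```

A node two hops from the query still receives some activation, which is how indirect evidence (e.g. related terms) contributes to a document's score.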

Journal ArticleDOI
TL;DR: The role of the bibliographical agency in the browsing situation is changed from one which derives concepts to one which maximizes a user's physical inspection capabilities.

Journal ArticleDOI
01 Nov 1988-Online
TL;DR: The authors present the different types of controlled vocabularies: post-controlled vocabularies, subject headings and descriptors, category codes, and hierarchical and faceted classifications.
Abstract: A presentation of the different types of controlled vocabularies: post-controlled vocabularies, subject headings and descriptors, category codes, and hierarchical and faceted classifications. Discussion of the best ways to use these different types of vocabulary in online document retrieval.

Journal ArticleDOI
TL;DR: Experimental results compare the performance of a sequential learning probabilistic retrieval model with both the proposed integrated Boolean-probabilistic model and with a fuzzy-set model.
Abstract: Most commercial document retrieval systems require queries to be valid Boolean expressions that may be used to split the set of available documents into a subset consisting of documents to be retrieved and a subset of documents not to be retrieved. Research has suggested that the ranking of documents and use of relevance feedback may significantly improve retrieval performance. We suggest that by placing Boolean database queries into Conjunctive Normal Form, a conjunction of disjunctions, and by making the assumption that the disjunctions represent a hyperfeature, documents to be retrieved can be probabilistically ranked and relevance feedback incorporated, improving retrieval performance. Experimental results compare the performance of a sequential learning probabilistic retrieval model with both the proposed integrated Boolean-probabilistic model and with a fuzzy-set model.
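The hyperfeature idea can be sketched directly: once a Boolean query is in Conjunctive Normal Form, each disjunctive clause is treated as a single feature that a document either matches or not, which yields a graded ranking instead of a retrieved/not-retrieved split. The uniform clause weights below are an illustrative default; in the paper's setting they would be learned from relevance feedback.

```python
def rank_cnf(cnf_query, docs, clause_weights=None):
    """Rank documents against a CNF query treated as hyperfeatures.

    cnf_query: list of clauses, each a set of alternative terms
               (e.g. [{"cat", "feline"}, {"dog"}] for
                (cat OR feline) AND dog).
    docs: list of term sets. A clause 'fires' on a document if any
    of its terms occurs; documents are ranked by summed clause weights,
    so partial matches rank below full matches instead of vanishing."""
    if clause_weights is None:
        clause_weights = [1.0] * len(cnf_query)
    scores = []
    for doc_id, terms in enumerate(docs):
        s = sum(w for clause, w in zip(cnf_query, clause_weights)
                if clause & terms)
        scores.append((doc_id, s))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```

A strict Boolean evaluation would return only documents satisfying every clause; here a document matching one clause of two still appears, ranked lower.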

Journal ArticleDOI
TL;DR: It is proposed that an appropriate metric for gauging the performances of information retrieval systems is a measure of the (relative) total relevance that a user can obtain from a set of documents sequentially scanned and evaluated in an information retrieval environment.
Abstract: The article presents a model based on the notion of the total relevance of a set of documents. The concept of a total relevance function is subsequently derived from the notion of cumulated relevance implied in the traditional summation of relevance ratings over the documents in a collection or in retrieved sets of documents. The model is intended to make explicit the perceptual underpinnings of relevance assessments while allowing for the consideration of interdocument dependencies as perceived by the user. Within this framework, it is proposed that an appropriate metric for gauging the performances of information retrieval systems is a measure of the (relative) total relevance that a user can obtain from a set of documents sequentially scanned and evaluated in an information retrieval environment. Some implications of the model are noted.
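The cumulated-relevance idea behind the proposed metric is simple to state in code: sum the user's relevance ratings over the documents scanned so far. The "relative" normalisation against the best possible ordering is my reading of the "(relative) total relevance" phrase, included as an assumption.

```python
def total_relevance(ratings, depth=None):
    """Cumulated relevance obtained by scanning documents in ranked order.

    ratings: relevance rating of each document, in the order scanned."""
    if depth is None:
        depth = len(ratings)
    return sum(ratings[:depth])

def relative_total_relevance(ratings, depth):
    """Fraction of the best achievable cumulated relevance at this depth,
    i.e. against the same ratings sorted into the ideal order."""
    ideal = sum(sorted(ratings, reverse=True)[:depth])
    return total_relevance(ratings, depth) / ideal if ideal else 0.0
```

For a scan [1, 0, 1, 1] cut off at depth 2, the user obtained relevance 1 out of an ideal 2, so the relative total relevance is 0.5. (Note this simple sum treats documents independently; the article's model additionally allows for perceived interdocument dependencies.)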

Journal ArticleDOI
03 Jan 1988
TL;DR: INSTRUCT as discussed by the authors is a multi-user text retrieval system which was developed as an interactive teaching package for demonstrating modern information retrieval techniques, these including natural language query processing, best match searching and automatic relevance feedback based on probabilistic term weighting.
Abstract: INSTRUCT is a multi‐user, text retrieval system which was developed as an interactive teaching package for demonstrating modern information retrieval techniques, these including natural language query processing, best match searching and automatic relevance feedback based on probabilistic term weighting. INSTRUCT has recently been extended and now additionally has facilities for query expansion using both relevance and term co‐occurrence data, for cluster‐based searching, and for two browsing search strategies. These retrieval mechanisms are used to search a file of 26,280 titles and abstracts from the Library and Information Science Abstracts database; both menu‐based and command‐based searching are allowed.

Journal ArticleDOI
TL;DR: The touch screen was found to be the fastest and the least accurate, but the overall favorite of the participants.
Abstract: This study measured the speed, error rates, and subjective evaluation of arrow-jump keys, a jump-mouse, number keys, and a touch screen in an interactive encyclopedia. A summary of previous studies comparing selection devices and strategies is presented to provide the background for this study. We found the touch screen to be the fastest and the least accurate, but the overall favorite of the participants. The results are discussed and improvements are suggested accordingly.

Journal ArticleDOI
TL;DR: This article reviews recent advances in information retrieval research and examines their practical potential for overcoming the deficiencies of present-day operational retrieval systems; earlier results published elsewhere have also been considered.
Abstract: Operational retrieval systems are firmly embedded within the pure Boolean framework, and the theoretical model underlying these systems is based on the implicit assumption that documents and user information needs can be precisely and completely characterized by sets of index terms and Boolean search request formulations, respectively. However, this assumption must be considered grossly inaccurate since uncertainty is intrinsic to the document retrieval process. The inability of the standard Boolean model to deal effectively with the inherent fallibility of retrieval decisions is the main reason for a number of serious deficiencies exhibited by present-day operational retrieval systems. This article reviews recent advances in information retrieval research and examines their practical potential for overcoming these deficiencies. The primary source for this review is the subsequent articles that comprise this special issue of Information Processing & Management, although earlier results published elsewhere have also been considered.

Book
01 Dec 1988
TL;DR: A Navigator of Natural Language Organized Data (ANNOD) as discussed by the authors is a retrieval system which combines use of probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for their similarity to natural language queries proposed by users.
Abstract: “A Navigator of Natural Language Organized Data” (ANNOD) is a retrieval system which combines use of probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for their similarity to natural language queries proposed by users. ANNOD includes common word deletion, word root isolation, query expansion by a thesaurus, and application of a complex empirical matching (ranking) algorithm. The Hepatitis Knowledge Base, the text of a prototype information system, was the file used for testing ANNOD. Responses to a series of users' unrestricted natural language queries were evaluated by three testers. Information needed to answer 85 to 95% of the queries was located and displayed in the first few selected paragraphs. It was successful in locating information in both the classified (listed in Table of Contents) and unclassified portions of text. Development of this retrieval system resulted from the complementarity of and interaction between computer science and medical domain expert knowledge. Extension of these techniques to larger knowledge bases is needed to clarify their proper role.

Journal ArticleDOI
TL;DR: That the presented theory is useful for the retrieval of information in natural language information systems is shown by the results of the prototype TRIGIR, based on trigrams.

Journal ArticleDOI
TL;DR: OAKDEC is a program that uses expert system techniques to assess the status of a database search done through the intermediary program, OAK, and to provide a recommendation on how to proceed, based on decision-making logic in the original OAK system.
Abstract: OAKDEC is a program that uses expert system techniques to assess the status of a database search done through the intermediary program, OAK, and to provide a recommendation to the user on how to proceed. Based on decision-making logic in the original OAK system, OAKDEC works at a far greater degree of detail, or finer grain size, in resolving the situation and in making the decision on the next step to be recommended. OAKDEC is intended as a research tool for studying user behavior and, in particular, for studying the effect of decision detail on user behavior and search outcome.

Proceedings ArticleDOI
Nick Belkin1
01 May 1988
TL;DR: A general model of clarity in human-computer systems, of which explanation is one component, is proposed, and a model for explanation by the computer intermediary in information retrieval is proposed.
Abstract: We discuss the complexity of explanation activity in human-human goal-directed dialogue, and suggest that this complexity ought to be taken account of in the design of explanation in human-computer interaction. We propose a general model of clarity in human-computer systems, of which explanation is one component. On the bases of: this model; of a model of human-intermediary interaction in the document retrieval situation as one of cooperative model-building for the purpose of developing an appropriate search formulation; and, on the results of empirical observation of human user-human intermediary interaction in information systems, we propose a model for explanation by the computer intermediary in information retrieval.

Journal ArticleDOI
TL;DR: An efficient algorithm for the calculation of term discrimination values that may be used when the interdocument similarity measure used is the cosine coefficient and when the document representatives have been weighted using one particular term-weighting scheme is described.
Abstract: The term discrimination model provides a means of evaluating indexing terms in automatic document retrieval systems. This article describes an efficient algorithm for the calculation of term discrimination values that may be used when the interdocument similarity measure used is the cosine coefficient and when the document representatives have been weighted using one particular term-weighting scheme. The algorithm has an expected running time proportional to Nn2 for a collection of N documents, each of which has been assigned an average of n terms.
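The quantity being computed can be shown with a naive baseline: a term's discrimination value is the change in the collection's "space density" (average pairwise cosine similarity) when the term is removed. This O(N²) sketch is the brute-force version the article's Nn² algorithm improves on; the unit term weights in the example are an assumption.

```python
import math

def density(docs):
    """Average pairwise cosine similarity of the collection."""
    def cos(a, b):
        num = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return num / (na * nb) if na and nb else 0.0
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cos(docs[i], docs[j]) for i, j in pairs) / len(pairs)

def discrimination_value(docs, term):
    """Density without the term minus density with it.

    Positive: removing the term packs documents closer together, so the
    term was spreading them apart -- a good discriminator. Negative:
    the term (typically a very frequent one) made documents look alike."""
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return density(without) - density(docs)
```

With two documents sharing only the term "x", removing "x" drops their similarity to zero, so "x" gets a negative value, while a term unique to one document gets a positive one.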



Journal ArticleDOI
TL;DR: It turns out that a front end designed to permit searchers to attach probabilistically interpreted weights to their query terms could be adapted for conventional IR systems, and such an enhancement could lead to improved performance.
Abstract: In order for conventionally designed commercial document retrieval systems to perform perfectly, the following two (logical) conditions must be satisfied for every search: (1) There exists a document property (or combination of properties) that belongs to those (and only those) documents that are relevant. (2) That property (or combination of properties) can be correctly guessed by the searcher. In general, the first assumption is false, and the second is impossible to satisfy; hence no conventional IR system can perform at a maximum level of effectiveness. (We are painfully aware of the current poor performance values for Recall and Precision. Furthermore, Recall deteriorates rapidly as document corpora continue to grow in size.) However, different design principles can lead to improved performance. This article presents a view of the document retrieval problem that shows that since the relationship between document properties (whether they be humanly assigned index terms or words that occur in the running text) and relevance is at best probabilistic, one should approach the design problem using probabilistic principles. It turns out that a front end designed to permit searchers to attach probabilistically interpreted weights to their query terms could be adapted for conventional IR systems. Such an enhancement could lead to improved performance.
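The proposed front end can be sketched as follows: the searcher attaches a probability estimate to each query term, the estimate is converted to a log-odds weight, and documents are ranked by summed weights. The log-odds transform is a standard probabilistic-indexing form assumed here for illustration; the clamping guard is mine.

```python
import math

def query_weight(p):
    """Convert a searcher's probability estimate for a query term into
    a log-odds weight; p near 0.5 contributes almost nothing."""
    p = min(max(p, 0.01), 0.99)  # guard against log(0) at the extremes
    return math.log(p / (1.0 - p))

def rank(query, docs):
    """query: term -> searcher's probability estimate.
    docs: list of term sets. Returns (doc_id, score) best first."""
    scores = [(i, sum(query_weight(p) for t, p in query.items() if t in d))
              for i, d in enumerate(docs)]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```

A term the searcher is confident about (p = 0.9) dominates the ranking, while an uncertain term (p = 0.5) has no effect — the graded judgment a pure Boolean query cannot express.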

Journal Article
01 May 1988-Online
TL;DR: A comparison of the effectiveness of document retrieval in ERIC, CD-ROM version versus printed version.
Abstract: A comparison of the effectiveness of document retrieval in ERIC, CD-ROM version versus printed version.

Book
01 Dec 1988
TL;DR: In this article, the use of inverted files for the calculation of similarity coefficients and other types of matching function is discussed in the context of mechanised document retrieval systems and a critical evaluation is presented of a range of algorithms which have been described for the matching of documents with queries.
Abstract: The use of inverted files for the calculation of similarity coefficients and other types of matching function is discussed in the context of mechanised document retrieval systems. A critical evaluation is presented of a range of algorithms which have been described for the matching of documents with queries. Particular attention is paid to the computational efficiency of the various procedures, and improved search heuristics are given in some cases. It is suggested that the algorithms could be implemented sufficiently efficiently to permit the provision of nearest neighbour searching as a standard retrieval option.