Showing papers in "Journal of Documentation" in 1979


Journal Article
TL;DR: In this paper, the authors consider the situation where no relevance information is available, that is, at the start of the search, and propose strategies based on a probabilistic model for the initial search and an intermediate search.
Abstract: Most probabilistic retrieval models incorporate information about the occurrence of index terms in relevant and non‐relevant documents. In this paper we consider the situation where no relevance information is available, that is, at the start of the search. Based on a probabilistic model, strategies are proposed for the initial search and an intermediate search. Retrieval experiments with the Cranfield collection of 1,400 documents show that this initial search strategy is better than conventional search strategies both in terms of retrieval effectiveness and in terms of the number of queries that retrieve relevant documents. The intermediate search is shown to be a useful substitute for a relevance feedback search. Experiments with queries that do not retrieve relevant documents at high rank positions indicate that a cluster search would be an effective alternative strategy.

399 citations
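
The abstract above does not give the weighting formula used for the initial search. As a rough illustration only, the sketch below assumes an inverse-document-frequency-style weight, log((N - n) / n), which is the form a probabilistic term weight takes when no relevance information is available and every query term is assumed equally likely to occur in a relevant document. The function names, the 0.5 smoothing constant, and the toy collection are all hypothetical and are not taken from the paper.

```python
import math


def initial_weight(n_docs, doc_freq):
    """IDF-style weight log((N - n) / n), assumed for illustration.

    The 0.5 terms are a smoothing assumption that avoids division by zero.
    """
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def rank(query_terms, docs):
    """Rank document ids by the summed weights of the query terms they contain.

    docs maps a document id to the set of index terms assigned to it.
    Ties are broken by document id so the ordering is deterministic.
    """
    n_docs = len(docs)
    doc_freq = {
        term: sum(1 for terms in docs.values() if term in terms)
        for term in query_terms
    }
    scores = {
        doc_id: sum(
            initial_weight(n_docs, doc_freq[term])
            for term in query_terms
            if term in terms
        )
        for doc_id, terms in docs.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy collection (hypothetical data, not the Cranfield collection).
toy_docs = {
    "d1": {"boundary", "layer", "flow"},
    "d2": {"flow", "heat"},
    "d3": {"heat"},
    "d4": {"wing"},
    "d5": {"wing", "heat"},
    "d6": {"layer"},
}
ranking = rank(["boundary", "flow"], toy_docs)
```

In this toy run the document matching both query terms ranks first, and the rarer term contributes the larger weight. Under this assumed weight, a term occurring in more than half of the collection receives a negative weight.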


Journal Article
Gerard Salton
TL;DR: The main mathematical approaches to information retrieval are examined in this study, including both algebraic and probabilistic models, and the difficulties which impede the formalization of information retrieval processes are described.
Abstract: The development of a given discipline in science and technology often depends on the availability of theories capable of describing the processes which control the field and of modelling the interactions between these processes. The absence of an accepted theory of information retrieval has been blamed for the relative disorder and the lack of technical advances in the area. The main mathematical approaches to information retrieval are examined in this study, including both algebraic and probabilistic models, and the difficulties which impede the formalization of information retrieval processes are described. A number of developments are covered where new theoretical understandings have directly led to the improvement of retrieval techniques and operations.

90 citations


Journal Article
TL;DR: It is shown that citation data conform well to the Brookes model, but the chief findings regard the nature of the aging process and its apparent range within scientific literatures.
Abstract: Scientific writings age; individual documents, issues or volumes of scientific journals are, eventually, less valued and less used with the passage of time. Long periods of time, say more than several decades, render portions of the literature obsolete, and ‘aging’ is evident. However, controversy has developed recently about quantitative models, particularly Brookes, which proposes a systematic exponential aging process for the corpus of library periodical holdings. In disagreement with these models, Sandison presents use patterns showing no aging; and Line points to methodological difficulties in demonstrating aging. Both the models, and the questions raised regarding their validity, are of considerable interest and importance to our understanding of the nature of scientific information and the management of collections. We show, here, that citation data conform well to the Brookes model, but the chief findings regard the nature of the aging process and its apparent range within scientific literatures. A scientific journal which is used as an archive ages slowly; one which supports a research front ages quickly. Aging depends not merely on the material itself, but its user, and a single journal may be aged very differently by different user communities. Lastly, aging rates vary among journals, and it is relatively easy to identify journals which age at about the rate at which the literature grows and journals which appear to exhaust most of their utility within a few years.

68 citations
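
The exponential aging process mentioned in the abstract above can be sketched in general terms: if use falls by a constant factor each year, the half-life follows directly from that factor. The snippet below is a minimal sketch under that assumption only; the function names, the parameter `annual_factor`, and the numeric values are hypothetical and are not figures from the paper.

```python
import math


def expected_use(initial_use, annual_factor, age_years):
    """Expected use at a given age when use shrinks by annual_factor each year."""
    if not 0 < annual_factor < 1:
        raise ValueError("annual_factor must lie strictly between 0 and 1")
    return initial_use * annual_factor ** age_years


def half_life(annual_factor):
    """Years needed for use to fall to half its initial level."""
    if not 0 < annual_factor < 1:
        raise ValueError("annual_factor must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(annual_factor)


# Illustrative values only: a slowly aging 'archive' journal versus a
# quickly aging 'research front' journal.
archive_half_life = half_life(0.9)
research_front_half_life = half_life(0.7)
```

A factor closer to 1 gives a longer half-life, which matches the abstract's contrast between journals used as archives and journals supporting a research front.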


Journal Article
TL;DR: Experiments with the Cranfield test collection show that trigram encoding of words performs noticeably better than the use of digrams; however, use of the least frequent digram in each term produces more acceptable results.
Abstract: This paper describes the use of fixed‐length character strings for controlling the size of indexing vocabularies in reference retrieval systems. Experiments with the Cranfield test collection show that trigram encoding of words performs noticeably better than the use of digrams; however, use of the least frequent digram in each term produces more acceptable results. Hashing of terms gives a better performance than that obtained from a vocabulary of comparable size produced by right‐hand truncation. The application of small indexing vocabularies to the sequential searching of large document files is discussed.

54 citations
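
To make the fixed-length encodings in the abstract above concrete, the sketch below extracts the digrams or trigrams of a word and picks the least frequent digram of a term, given digram counts over a vocabulary. The abstract does not specify details such as padding, tie-breaking, or how frequencies were gathered, so those choices here are assumptions, and all function names are hypothetical.

```python
from collections import Counter


def ngrams(word, n):
    """Return the overlapping character n-grams of a word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def digram_frequencies(vocabulary):
    """Count how often each digram occurs across all words in the vocabulary."""
    return Counter(gram for word in vocabulary for gram in ngrams(word, 2))


def least_frequent_digram(word, digram_counts):
    """Pick the word's rarest digram; ties are broken alphabetically (an assumption)."""
    candidates = ngrams(word, 2)
    if not candidates:
        return word  # words shorter than two characters are kept unchanged
    return min(candidates, key=lambda gram: (digram_counts[gram], gram))


# Toy vocabulary (hypothetical, not the Cranfield test collection).
counts = digram_frequencies(["wing", "wind", "ring"])
trigram_code = ngrams("wing", 3)
digram_code = least_frequent_digram("wind", counts)
```

Each word is thereby represented either by all of its trigrams or by a single digram, which keeps the indexing vocabulary bounded by the number of possible character strings of that length.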


Journal Article
Maurice B. Line
TL;DR: In the social science citation analyses carried out as part of the DISISS programme, references were collected from 140 journals, including forty‐seven drawn at random from a comprehensive list, and also from 148 monographs.
Abstract: Most citation analyses are based on references taken from two or three source journals. There are good theoretical reasons for believing that these may not be representative of all references. In the social science citation analyses carried out as part of the DISISS programme, references were collected from 140 journals, including forty‐seven drawn at random from a comprehensive list, and also from 148 monographs. Analyses of references drawn from high ranking and randomly selected journals showed differences in date distribution, forms of material cited and rank order of journals cited. Analyses of references drawn from journals and monographs showed differences, some of them large, in date distributions, forms of material cited, subject self‐citation and citations beyond the social sciences, and countries of publication cited. These differences may be peculiar to the social sciences, but any citation analyses that are based on only a limited number and type of sources without specific justification must be regarded with suspicion.

50 citations



Journal Article
TL;DR: A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections.
Abstract: A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable‐length character strings chosen from the fronts of document and query terms so as to occur with approximate equifrequency. Sets containing between 120 and 720 members were tested both using an application of the Cluster Hypothesis and in a series of linear associative retrieval experiments. The effectiveness of the smaller sets is low but the larger ones exhibit retrieval characteristics comparable to those of words.

25 citations


Journal Article
TL;DR: The aim of this paper is to summarize some of the research which has already been carried out and which is of relevance to present‐day problems, and to suggest where further research is most needed.
Abstract: In the past, legibility research has been mainly concerned with the conventionally typeset and printed word. ‘Printed’ materials are now produced by a variety of other methods, however, and other media such as microforms and cathode ray tubes (CRTs) are commonly used for information display. The effects of these new methods and media on legibility are often given scant consideration, but because of their visual limitations, it is all the more important that the legibility and ease of use of the information should be taken into account. The scope of legibility research must therefore be extended to cover the products of modern information technology. The aim of this paper is to summarize some of the research which has already been carried out and which is of relevance to present‐day problems, and to suggest where further research is most needed.

18 citations


Journal Article
TL;DR: A survey of current work on database systems is presented, with a tutorial component and an evaluation, both related to the application of database ideas to documentation.
Abstract: A survey of current work on database systems is presented. The area is divided into three main sectors: data models, data languages and support for database operations. Data models are presented as the link between the database and the real world. Languages range from formal algebraic languages to attempts to use a dialogue in English to formulate queries. The support includes hardware for content addressing, database machines and software techniques for optimizing and evaluating group expressions. Mathematical models are used to organize this support. Throughout there is a tutorial component and evaluation, which in both cases is related to the application of database ideas to documentation.

9 citations


Journal Article
TL;DR: This survey draws attention to the characteristics, problems and achievements particular to the documentation and handling of non‐book materials (NBM) in many types of libraries.
Abstract: In addition to providing a review of the literature recently published in the librarianship of non‐book materials this survey aims to draw attention to the characteristics, problems and achievements particular to the documentation and handling of non‐book materials (NBM) in many types of libraries. The materials are briefly described and considerations of selection, acquisition, organization, storage and in particular bibliographic control are dealt with in some detail. Other areas of concern to the librarian dealing with media resources, including the organization and training of staff, planning, equipment, exploitation and copyright, are also discussed. The past decade has seen the widespread introduction of NBM into libraries as additional or alternative sources of information. Librarians have been given an opportunity to rethink many basic principles and adapt existing practice to encompass the new materials. The survey reflects the achievements and some of the failures or problems remaining to be solved in this rapidly expanding area of library work.

4 citations


Journal Article
TL;DR: A thesaurus of terms used in the British Library PRECIS indexes is needed, and some consideration should be given to the possible simplification of PRECIS or modification to suit the needs of different users.
Abstract: The background to the Liverpool Polytechnic study of indexer reactions to the PRECIS indexing system and the methodology of the study are described. Some of the findings are discussed, special attention being given to points which some indexers regarded as advantages and others as disadvantages; the alleged labour‐intensiveness of PRECIS; the British Library and PRECIS; and the impact of PRECIS on the British library community. A thesaurus of terms used in the British Library PRECIS indexes is needed, and some consideration should be given to the possible simplification of PRECIS or modification to suit the needs of different users. Feedback from users of PRECIS indexes is required.


Journal Article
A. Sandison
TL;DR: The use per issue‐day of recent issues of 125 periodicals on open access at the Science Reference Library has been studied; among the regular periodicals in the two chemical technologies, ‘magazines’ were more heavily used than ‘journals’, and issues in the English language more heavily than those in foreign languages.
Abstract: The use per issue‐day of recent issues of 125 periodicals on open access at the Science Reference Library (SRL) has been studied for over four months. There is statistical evidence for the expected separate phase within ‘updating’ of ‘current awareness scanning’ of issues which have arrived since the reader's last visit to the library: its half‐life of two to three months reflects the pattern of reader‐visit frequencies rather than any characteristics of the literature. The isolated uses of issues of periodicals, representing a part of ‘following up searches’, showed no significant relation between use and age. The irregular monographic and conference series, with their unpredictable dates of arrival at the shelf, were not subject to current awareness scanning. Of the regular periodicals in the two chemical technologies, ‘magazines’ were more heavily used than ‘journals’ and, as expected, issues in the English language more heavily than those in foreign languages: these differences were less marked in che...

Journal Article
TL;DR: Many bibliometric investigations study changes in different variables as a function of time, where the unit of time is the year of publication.
Abstract: Many bibliometric investigations study changes in different variables as a function of time, where the unit of time is the year of publication.