scispace - formally typeset
Author

Cyril W. Cleverdon

Other affiliations: Cranfield University
Bio: Cyril W. Cleverdon is an academic researcher from Vaughn College of Aeronautics and Technology. The author has contributed to research in the topics Index (economics) and Search engine indexing. The author has an h-index of 14 and has co-authored 28 publications, which have received 1,978 citations. Previous affiliations of Cyril W. Cleverdon include Cranfield University.

Papers
Journal Article
01 Dec 1997
TL;DR: The authors investigated the effect of different devices on the performance of index languages and found that the most important consideration was the specificity of the index terms; within the context of the conditions existing in this test, single-word terms were more effective than concept terms or a controlled vocabulary.
Abstract: The investigation dealt with the effect which different devices have on the performance of index languages. It appeared that the most important consideration was the specificity of the index terms; within the context of the conditions existing in this test, single-word terms were more effective than concept terms or a controlled vocabulary.

554 citations

01 Jan 1966
TL;DR: An essential requirement of the project involved cooperation of a large number of research scientists, and the response to the request was most satisfactory, and I acknowledge with thanks the generous assistance of some two hundred scientists.
Abstract: and is still engaged on the final stages of the project. In addition, some sixty-three other persons have worked part-time at some stage. To all these people, I have to express my appreciation for their efforts. An essential requirement of the project involved cooperation of a large number of research scientists. The response to our request was most satisfactory, and I acknowledge with thanks the generous assistance of some two hundred scientists, many of whom are known to me only by name. As before, Aslib administered the grant and also, on this occasion, made accommodation available in their headquarters in London, and I am grateful for the help given by the Director, Mr. Leslie Wilson, and many members of his staff. I would also express my appreciation to the Principal and Senate of the College of Aeronautics for agreeing to my taking part in this project while continuing my normal duties. Finally, there are many friends and colleagues with whom, during the past three years, I have had the opportunity of discussing the Aslib-Cranfield projects. Their comments and suggestions have always been helpful, and I am most grateful for the interest which they have shown.

195 citations

01 Jan 1966
TL;DR: The detailed analysis of the reasons for failure to retrieve relevant documents or for the retrieval of non-relevant documents was an important part of Cranfield II.
Abstract: SUMMARY: The test results are presented for a number of different index languages using various devices which affect recall or precision. Within the environment of this test, it is shown that the best performance was obtained with the K group of eight index languages which used single terms. The group of fifteen index languages which were based on concepts gave the worst performance, while a group of six index languages based on the Thesaurus of Engineering Terms of the Engineers Joint Council were intermediary. Of the single term index languages, the only method of improving performance was to group synonyms and word forms, and any broader groupings of terms depressed performance. The use of precision devices such as links gave no advantage as compared to the basic device of simple coordination. All results have to be considered within the context of the experimental environment, but they can be said to substantiate or clarify many of the findings of Cranfield I. It is conclusively shown that an inverse relationship exists between recall and precision, whatever the variable may be that is being changed. The two factors which appear most likely to affect performance are the level of exhaustivity of indexing and the level of specificity of the terms in the index language. For any given operational situation, the optimum levels cannot be categorically stated in advance, but can only be determined by an evaluation of the system, the main consideration probably being the subject field. It would be unusual if the characteristics of the subject field used for this test were such as to make it unique, so the high performance obtained with the single terms in natural language can be considered to be of some importance in regard to the use of natural language text as input to mechanised systems. PREFACE: It was intended that this should be the final volume of the Report on Cranfield II. 
This may still be the case, but as the results were being prepared for publication, we were continually aware of the gaps that needed to be filled. The delay in the appearance of this volume is partly due to attempts to obtain some of the missing data, but a great deal still remains to be done. The detailed analysis of the reasons for failure to retrieve relevant documents or for the retrieval of non-relevant documents was an important part of Cranfield …

112 citations
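The summary above describes ranking by simple coordination and an inverse relationship between recall and precision. The sketch below illustrates both ideas on a made-up five-document collection (not Cranfield data): documents are retrieved if they share at least a given number of single-word index terms with the query, and recall and precision are computed at each coordination level. The function names and toy data are hypothetical.

```python
"""Coordination-level matching with recall/precision per level (hypothetical toy data)."""


def coordination_level(query_terms: set[str], doc_terms: set[str]) -> int:
    # Simple coordination: how many query terms appear among the document's index terms.
    return len(query_terms & doc_terms)


def recall_precision(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    hits = sum(1 for d in retrieved if d in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def evaluate_by_level(query, index, relevant):
    # From the strictest match (all query terms) down to the loosest (one term).
    results = []
    for level in range(len(query), 0, -1):
        retrieved = [d for d, terms in index.items()
                     if coordination_level(query, terms) >= level]
        results.append((level, *recall_precision(retrieved, relevant)))
    return results


# Hypothetical collection: document id -> set of single-word index terms.
index = {
    "d1": {"boundary", "layer", "flow", "heat"},
    "d2": {"boundary", "layer", "transition"},
    "d3": {"heat", "transfer", "flow"},
    "d4": {"flow", "wing"},
    "d5": {"boundary", "condition", "flow"},
}
query = {"boundary", "layer", "flow"}
relevant = {"d1", "d2"}

for level, r, p in evaluate_by_level(query, index, relevant):
    print(f"level >= {level}: recall={r:.2f} precision={p:.2f}")
```

On this toy data, lowering the required coordination level raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.40, which is the kind of trade-off the summary reports.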


Cited by
Journal Article
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Abstract: The experimental evidence accumulated over the past 20 years indicates that text-indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

9,460 citations
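The abstract does not fix a single weighting formula. As one illustrative baseline of the kind it describes, the sketch below assumes raw term frequency multiplied by log(N / df), with documents and query compared by cosine similarity. This is a common variant chosen for illustration, not necessarily the paper's recommended scheme, and the documents are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs: dict[str, list[str]]) -> dict[str, float]:
    # df(t): number of documents containing term t at least once.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    n_docs = len(docs)
    return {term: math.log(n_docs / count) for term, count in df.items()}


def weigh(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Assumed scheme: raw tf * idf; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    idf = idf_table(docs)
    q = weigh(query.split(), idf)
    scores = {d: cosine(q, weigh(tokens, idf)) for d, tokens in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical single-term documents.
docs = {
    "d1": "term weighting improves retrieval".split(),
    "d2": "single term indexing with term weighting".split(),
    "d3": "boolean retrieval systems".split(),
}
for doc_id, score in rank("term weighting", docs):
    print(f"{doc_id}: {score:.3f}")
```

Cosine normalisation is what keeps the longer d2 from outranking d1 despite its repeated "term"; other normalisations would change the ordering.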

Journal Article
03 Jun 1988 · Science
TL;DR: For diagnostic systems used to distinguish between two classes of events, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy.
Abstract: Diagnostic systems of several kinds are used to distinguish between two classes of events, essentially "signals" and "noise". For them, analysis in terms of the "relative operating characteristic" of signal detection theory provides a precise and valid measure of diagnostic accuracy. It is the only measure available that is uninfluenced by decision biases and prior probabilities, and it places the performances of diverse systems on a common, easily interpreted scale. Representative values of this measure are reported here for systems in medical imaging, materials testing, weather forecasting, information retrieval, polygraph lie detection, and aptitude testing. Though the measure itself is sound, the values obtained from tests of diagnostic systems often require qualification because the test data on which they are based are of unsure quality. A common set of problems in testing is faced in all fields. How well these problems are handled, or can be handled in a given field, determines the degree of confidence that can be placed in a measured value of accuracy. Some fields fare much better than others.

8,569 citations
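A relative (receiver) operating characteristic is traced by sweeping a decision threshold over a system's scores and plotting the true-positive rate against the false-positive rate. A common single-number summary is the area under that curve. The sketch below assumes a nonparametric estimate (trapezoidal area, cross-checked against the equivalent pairwise-ranking count) rather than any fitted curve model the paper may use; scores and labels are hypothetical, and both classes must be non-empty.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    # Sweep a threshold from high to low; tied scores move together as one step.
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # (false-positive rate, true-positive rate)
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_pairwise(scores: list[float], labels: list[int]) -> float:
    # Probability that a random "signal" case outscores a random "noise" case (ties count half).
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical diagnostic scores; 1 = signal, 0 = noise.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_pairwise(scores, labels))
```

Because the area depends only on how signal and noise cases are ordered, not on where a single threshold is placed, it is unaffected by the decision bias the abstract mentions.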

Journal Article
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.

7,572 citations
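The "measure of its syllable length" in the abstract corresponds, in the published algorithm (widely known as the Porter stemmer), to the count m in the stem pattern [C](VC)^m[V], where C and V are runs of consonants and vowels and "y" counts as a vowel when it follows a consonant. The sketch below implements only that measure, Step 1a, and the single Step 1b rule (m>0) EED → EE, to show how a suffix's removal depends on the remaining stem. It is not the full multi-step algorithm, and the function names are hypothetical.

```python
from itertools import groupby


def is_consonant(word: str, i: int) -> bool:
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # "y" is a consonant at the start or after a vowel, a vowel after a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    # Map letters to c/v, collapse runs, then count VC sequences: m in [C](VC)^m[V].
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    collapsed = "".join(key for key, _ in groupby(pattern))
    return collapsed.count("vc")


def step_1a(word: str) -> str:
    # Longest matching suffix wins: SSES -> SS, IES -> I, SS -> SS, S -> (nothing).
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def step_1b_eed(word: str) -> str:
    # Only the rule (m>0) EED -> EE; the rest of Step 1b is omitted here.
    if word.endswith("eed"):
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    return word


for w in ("caresses", "ponies", "cats", "agreed", "feed"):
    print(w, step_1b_eed(step_1a(w)))
```

"agreed" loses its suffix because the stem "agr" has m = 1, while "feed" is kept whole because "f" has m = 0.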

Journal Article
TL;DR: The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.
Abstract: Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalency class were strongly correlated, while metrics from different equivalency classes were uncorrelated.

5,686 citations
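The abstract compares prediction-accuracy metrics without naming them here. Two widely used metrics for rating prediction are mean absolute error (MAE) and root mean squared error (RMSE); the sketch below computes both for hypothetical predicted and actual ratings. Whether these particular metrics fall into the same equivalence class in the paper's analysis is not stated in this abstract.

```python
import math


def mae(predicted: list[float], actual: list[float]) -> float:
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: list[float], actual: list[float]) -> float:
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    # Squaring weights large errors more heavily than MAE does.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical ratings on a 1-5 scale.
predicted = [4.0, 3.5, 2.0, 5.0]
actual = [5, 3, 2, 4]
print(mae(predicted, actual), rmse(predicted, actual))
```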

Journal Article
TL;DR: It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
Abstract: The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing in particular that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.

3,559 citations
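The idea of weighting a match by collection frequency can be illustrated with a small comparison between unweighted coordination matching and frequency-weighted matching. The sketch assumes a weight of log2(N / n) + 1, where N is the number of documents and n the number containing the term, so that even a term found in every document keeps a weight of 1. The paper's exact weighting function may differ, and the collection is hypothetical.

```python
import math


def term_weight(term: str, index: dict[str, set[str]]) -> float:
    # Assumed collection-frequency weight: log2(N / n) + 1; 0 for terms absent from the collection.
    n = sum(1 for terms in index.values() if term in terms)
    return math.log2(len(index) / n) + 1 if n else 0.0


def score(query: set[str], doc_terms: set[str], index: dict[str, set[str]],
          weighted: bool = True) -> float:
    matched = query & doc_terms
    if not weighted:
        return float(len(matched))  # plain coordination level
    return sum(term_weight(t, index) for t in matched)


# Hypothetical collection: "system" is frequent, "cavitation" is rare.
index = {
    "d1": {"system", "design"},
    "d2": {"system", "cavitation"},
    "d3": {"system", "pump"},
    "d4": {"system", "design", "pump"},
}
query = {"system", "design", "cavitation"}

ranked = sorted(index, key=lambda d: score(query, index[d], index), reverse=True)
for d in ranked:
    print(d, score(query, index[d], index, weighted=False), score(query, index[d], index))
```

Unweighted coordination ties d1, d2 and d4 at two matched terms; the weighted score breaks the tie in favour of d2, whose match on the rarer term "cavitation" counts for more.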