
Showing papers in "Journal of the Association for Information Science and Technology in 1978"


Journal ArticleDOI
TL;DR: This article describes a study to ascertain the reasons why certain old papers are still highly cited many years after their publication, and it was found that about 40% of the citations were for historical reasons, but that in the remaining 60% of cases, the old paper is still being actively used.
Abstract: This article describes a study to ascertain the reasons why certain old papers are still highly cited many years after their publication. Twenty-three old papers in the subject fields of physics and physical chemistry which are still highly cited were selected, and 978 of the papers that cited them in the period 1974–1975 were studied. A new typology of reasons for citing the papers was devised, and using this typology, it was found that about 40% of the citations were for historical reasons, but that in the remaining 60% of the cases, the old paper is still being actively used. We discuss some discrepancies between our citation figures and citation figures quoted by Garfield. We also discuss a number of errors which were found, both in citing articles and in Science Citation Index. Finally, calculations indicate that there is a rule that each cited paper is referred to, on average, 1.05-1.15 times in every paper that cites it, and that this rule has general validity.

182 citations
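The 1.05-1.15 figure above is simply the total number of in-text mentions of a cited paper divided by the number of distinct papers citing it. A toy illustration (the counts are invented, not the paper's data):

```python
# Mentions-per-citing-paper ratio, as in the 1.05-1.15 rule above.
# The list of per-paper mention counts below is hypothetical.

def mentions_per_citing_paper(mention_counts):
    """mention_counts[i] = times citing paper i mentions the cited work in its text."""
    return sum(mention_counts) / len(mention_counts)

# 10 citing papers, one of which mentions the cited work twice:
print(mentions_per_citing_paper([1] * 9 + [2]))   # 1.1
```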


Journal ArticleDOI
TL;DR: A method of determining core journals for a discipline, using data from the Journal Citation Reports to generate discipline impact factors, is described.
Abstract: A method of determining core journals for a discipline, using data from the Journal Citation Reports to generate discipline impact factors, is described.

135 citations


Journal ArticleDOI
TL;DR: It is shown that the number of papers referred to per article is much lower for Soviet journals than for European, US, or Japanese journals, and that high energy physics shows a significantly higher percentage of organic citations and a somewhat higher percentage of evolutionary citations than either nuclear physics or solid state physics.
Abstract: The classification of scientific citations according to their quality and function, established in a previous paper, is strengthened by tests of its reproducibility and universality. The method is then applied to articles in various specialties of theoretical physics published in various journals, and conclusions are drawn about differences by specialty and by geographical areas. Specifically, it is shown for the sample investigated (a) that the number of papers referred to per article is much lower for Soviet journals than for European, US, or Japanese journals; (b) that this number is much lower for solid state physics than for high energy or nuclear physics; (c) that US journals have a higher percentage of conceptual citations, and a much higher percentage of organic citations than Soviet journals; (d) that the Soviet Journal of Nuclear Physics has citation patterns markedly different from those of the other two Soviet journals investigated and rather similar to those of the US journals; (e) that high energy physics shows a significantly higher percentage of organic citations and a somewhat higher percentage of evolutionary citations than either nuclear physics or solid state physics. Some speculations are presented to “explain” these effects.

91 citations



Journal ArticleDOI
TL;DR: It is shown that the ratings of departments within a university are not independent, and that these dependencies are associated with the bibliometric size of the university, as well as the extent to which a university's prestige in one department correlates with the assessment of other departments within that university.
Abstract: This paper presents a quantitative comparison of peer versus bibliometric procedures for rating the quality of US universities. The peer ratings used are the Roose-Andersen rating of the quality of graduate faculty in 10 scientific fields. The bibliometric ratings used are (1) the number of university papers in each field; (2) the average “quality” of the papers based on their citation rates, expressed as influence per paper in each field; and (3) total influence, the product of number of papers and influence per paper. The bibliometric ratings are based on 127,000 university papers, from 450 journals in 10 fields for 1965 to 1973. Roose-Andersen ranks and scores are found to correlate most highly with the total influence of the university's papers, followed closely by correlations with the total number of papers, and much less closely with the average influence per paper. A partial correlation and regression analysis indicates that the Roose-Andersen scores have two additive components: bibliometric size and bibliometric quality. Further analysis explores the extent to which a university's prestige in one department correlates with the assessment of other departments within that university, and the extent to which the university's overall bibliometric size correlates with the assessment of departments within that university. It is shown that the ratings of departments within a university are not independent, and that these dependencies are associated with the bibliometric size of the university. University ranks and scores in different fields are shown to be much more highly correlated when based on peer assessment than when based on bibliometric measures.

68 citations


Journal ArticleDOI
Gertrud Herlach
TL;DR: In this article, the author tested and accepted the hypothesis that the mechanistically identifiable citation link characteristic, mention of a given reference more than once within the same research paper, indicates a close and useful relationship of a citing paper to a given cited paper.
Abstract: The hypothesis is tested and accepted that the mechanistically identifiable citation link characteristic, mention of a given reference more than once within the same research paper, indicates a close and useful relationship of a citing to a given cited paper. Closeness and usefulness of the relationship between papers linked by citation were determined by means of users' judgments. It is shown that as a selection criterion for document retrieval, multiple mention of a reference would yield good precision but low recall, since a considerable number of papers with corresponding single mention were judged closely related to the given cited paper. Frequency counts showed that approximately one-third of all bibliographic references in the research papers checked are mentioned in the text more than once.

62 citations




Journal ArticleDOI
TL;DR: This paper attempts to extend the theoretical foundations of the decision theory approach to information retrieval by reexamining the Swets model, and shows that under certain conditions precision and recall need not be inversely related.
Abstract: The Swets model of information retrieval, based on a decision theory approach, is discussed, with the overall performance measure being the crucial element reexamined in this paper. The Neyman-Pearson criterion from statistical decision theory, and based on likelihood ratios, is used to determine an optimal range of Z, the variable assigned to each document by the retrieval system in an attempt to discriminate between relevant and nonrelevant documents. This criterion is shown to be directly related to both precision and recall, and is equivalent to the maximization of the expected value of the retrieval decision for a specific query and a given document under certain conditions. Thus, a compromise can be reached between those who advocate precision as a measure, due partially to its ability to be easily measurable empirically, and those who advocate consideration of recall. Several cases of the normal and Poisson distributions for the variable Z are discussed in terms of their implications for the Neyman-Pearson decision rule. It is seen that when the variances are unequal, the Swets rule of retrieving a document if its Z value is large enough is not optimal. Finally, the situation of precision and recall not being inversely related is shown to be possible under certain conditions. Thus, this paper attempts to extend the understanding of the theoretical foundations of the decision theory approach to information retrieval.

55 citations
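The Neyman-Pearson rule discussed in the abstract above can be sketched as a likelihood-ratio test on the document variable Z. All distributional parameters below are illustrative assumptions, not the paper's data; the point they demonstrate is the abstract's observation that with unequal variances, "retrieve if Z is large enough" is not the optimal rule.

```python
# Sketch of a Neyman-Pearson retrieval decision: retrieve a document when the
# likelihood ratio of its score Z under the relevant vs. nonrelevant normal
# distributions exceeds a threshold. Parameter values are illustrative only.
import math

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def retrieve(z, mu_rel=2.0, s_rel=0.5, mu_non=0.0, s_non=2.0, lam=1.0):
    """Retrieve iff P(z | relevant) / P(z | nonrelevant) > lam."""
    return normal_pdf(z, mu_rel, s_rel) / normal_pdf(z, mu_non, s_non) > lam

# With unequal variances the optimal region need not be "Z large enough":
print(retrieve(2.0))   # True  (near the relevant mean)
print(retrieve(6.0))   # False (deep in the tail, the wider nonrelevant spread wins)
```

Here the retrieval region is an interval around the relevant mean rather than a one-sided threshold, which is exactly why the simple Swets cutoff rule fails to be optimal in this case.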


Journal ArticleDOI
TL;DR: The new edition of Journal Citation Reports (JCR), based on a complete year of 1974 data, extends the scope of the preliminary edition which was based on one quarter year of 1969 data and lists 1974 citing journals and the earlier journals cited by them.
Abstract: The new edition of Journal Citation Reports (JCR), based on a complete year of 1974 data, extends the scope of the preliminary edition, which was based on one quarter year of 1969 data. It is published as a bound companion volume to the 1975 Science Citation Index (SCI) and lists 1974 citing journals and the earlier journals cited by them. Some 2,400 citing journals from 1974 are covered, and these journals contained about 4.2 million references to earlier journal items. For each journal, JCR shows the total number of citations received by its 1972 and 1973 volumes, and by all years. It also shows the number of articles published in the cited journal in 1972 and 1973, that is, the articles available for citing in those years. From these data the impact factor is calculated: the average number of 1972/1973 citations received by each 1972/1973 article. This discounts the effect of size (larger journals receive more citations). JCR also shows the number of articles published in each cited journal during 1974, and the number of 1974 citations to them from all other journals. The immediacy index is the average number of times each 1974 article was cited in 1974. Finally, JCR shows, for each journal, the journals which cite it most heavily and the journals which it cites most heavily.

43 citations
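The impact-factor and immediacy-index arithmetic described in the abstract above reduces to two ratios. A minimal sketch, with hypothetical journal counts rather than figures from the 1974 JCR:

```python
# JCR's two per-journal averages, as described above. Counts are invented.

def impact_factor(citations_to_prev_two_years, articles_prev_two_years):
    """Average citations in the census year to the journal's two prior years of articles."""
    return citations_to_prev_two_years / articles_prev_two_years

def immediacy_index(citations_to_current_year, articles_current_year):
    """Average citations in the census year to articles published that same year."""
    return citations_to_current_year / articles_current_year

# Hypothetical journal: 400 citations in 1974 to its 200 articles of 1972-1973,
# and 30 citations in 1974 to its 120 articles of 1974.
print(impact_factor(400, 200))    # 2.0
print(immediacy_index(30, 120))   # 0.25
```

Dividing by the article count is what "discounts the effect of size": a journal twice as large collects roughly twice the citations but keeps the same average.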


Journal ArticleDOI
TL;DR: Decision theory and utility theory have been proposed as a possible theoretical framework within which to formulate decision rules for indexing; these rules may be interpreted operationally as guides to the thought processes appropriate to making sound indexing decisions.
Abstract: Although it is widely accepted that indexing and cataloging are central to the document and reference retrieval process, there is as yet little unanimity on the issue of how these operations ought to be carried out. Since indexing is a decision-making process, decision theory and utility theory have been proposed as a possible theoretical framework within which to formulate decision rules for indexing. These decision rules may be interpreted operationally as guides to the thought processes appropriate to the making of sound indexing decisions. One such thought process involves “gedanken experimentation,” in which the indexer is led to an indexing decision by performing suitable thought experiments. This approach to indexing, in addition to providing a theoretical basis for what the indexer does, lends itself to various training schemes for the education of indexers, to the provision of graphic aids to indexing, and to the evaluation of indexing performance. There are also ramifications for automatic indexing.

40 citations


Journal ArticleDOI
TL;DR: Attempts have been made to rank the output of a Boolean search in terms of the overlap of index terms in the request and document, but a closer examination reveals disturbing ambiguities.
Abstract: Attempts have been made to rank the output of a Boolean search in terms of the overlap of index terms in the request and document. As attractive as the merger of two approaches may seem, a closer examination reveals disturbing ambiguities, with equivalent Boolean expressions yielding different weights for the retrieval documents. An alternative approach is suggested.
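The ambiguity noted above can be made concrete with a small example. The weighting scheme below (count query-term occurrences matched in the document) is a deliberately naive assumption, not the scheme any particular system uses; it shows how two logically equivalent Boolean forms can assign different weights to the same document:

```python
# Naive overlap weighting applied to two equivalent Boolean queries.
# The weighting scheme here is an illustrative assumption.

def overlap_weight(query_terms, doc_terms):
    """Count query-term occurrences found in the document's term set."""
    return sum(1 for t in query_terms if t in doc_terms)

doc = {"a", "b"}
# A AND (B OR C)         -> flattened term list [a, b, c]
# (A AND B) OR (A AND C) -> flattened term list [a, b, a, c]; 'a' appears twice
print(overlap_weight(["a", "b", "c"], doc))        # 2
print(overlap_weight(["a", "b", "a", "c"], doc))   # 3
```

The two queries retrieve exactly the same documents, yet the distributed form weights this document higher, so the ranking depends on how the query happens to be written.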

Journal ArticleDOI
TL;DR: A procedure for automated indexing of pathology diagnostic reports at the National Institutes of Health is described and it is of interest that effective automatic encoding can be based upon an existing vocabulary and code designed for manual methods.
Abstract: A procedure for automated indexing of pathology diagnostic reports at the National Institutes of Health is described. Diagnostic statements in medical English are encoded by computer into the Systematized Nomenclature of Pathology (SNOP). SNOP is a structured indexing language constructed by pathologists for manual indexing. It is of interest that effective automatic encoding can be based upon an existing vocabulary and code designed for manual methods. Morphosyntactic analysis, a simple syntax analysis, matching of dictionary entries consisting of several words, and synonym substitutions are techniques utilized.

Journal ArticleDOI
TL;DR: Results seem to indicate that the automated selection of index terms from a frequency list holds some promise for automatic indexing.
Abstract: A method of selecting index terms directly from a word frequency list is described. The original idea was suggested by Goffman who reasoned that the most content-bearing words of a given text would occur at the transition region at which Zipf's First Law of words of high frequency of occurrences begins to take on properties of words of low frequency of occurrences. Word frequencies of two articles were analyzed. Results seem to indicate that the automated selection of index terms from a frequency list holds some promise for automatic indexing.
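The selection idea above can be sketched as ranking words by frequency and keeping those in a mid-frequency "transition" band. The band boundaries below are arbitrary illustrative choices, not Goffman's actual derivation of the transition region from Zipf's law:

```python
# Pick candidate index terms from a word-frequency list, keeping only words
# in an assumed mid-frequency band. Boundaries are illustrative, not derived.
from collections import Counter

def candidate_index_terms(text, low=2, high=4):
    """Return words whose frequency falls inside the (assumed) transition band."""
    freqs = Counter(text.lower().split())
    return sorted(w for w, f in freqs.items() if low <= f <= high)

text = ("retrieval retrieval retrieval indexing indexing "
        "the the the the the zipf")
print(candidate_index_terms(text))   # ['indexing', 'retrieval']
```

Very frequent words ("the") and hapax words ("zipf") fall outside the band; the content-bearing mid-frequency words survive, which is the intuition the abstract attributes to Goffman.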

Journal ArticleDOI
TL;DR: The findings show that Bradford's Law is the reflection of some underlying process not related to the characteristics of the search mechanism or the nature of the literature, and there is instead a basic probabilistic mechanism underlying the law.
Abstract: Twenty-three data sets representing the documents retrieved by a wide variety of searches were examined for correspondence to Bradford's Law. Regression lines fit to the data sets showed all correlations in excess of 0.96. Thus, the fitted line, as customarily specified by slope and intercept, can serve as a good representation of an entire data set. Slope can be shown to be almost entirely determined by the total number of articles retrieved. Over two-thirds of the variance in the intercept is accounted for by the total number of journal titles retrieved. These findings weigh against earlier speculation that slope and intercept depended on such characteristics as breadth of subject area, topic, time period, or search technique. The findings show that Bradford's Law is the reflection of some underlying process not related to the characteristics of the search mechanism or the nature of the literature. The authors conclude that there is instead a basic probabilistic mechanism underlying the law.
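The regression described above fits the Bradford relation, in which the cumulative number of articles R(n) grows linearly with the logarithm of journal rank n. A minimal sketch with a fabricated, exactly log-linear data set:

```python
# Fit slope and intercept of R(n) = a + b * log(n), the linearized Bradford
# form discussed above. The data set is invented for illustration.
import math

def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

ranks = [1, 2, 4, 8, 16]
cumulative_articles = [10, 20, 30, 40, 50]   # exactly linear in log2(rank)
slope, intercept = fit_line([math.log(r) for r in ranks], cumulative_articles)
print(round(slope, 2), round(intercept, 2))  # 14.43 10.0
```

On real search-output data the fit is of course not exact; the paper's point is that the observed correlations all exceeded 0.96, so the (slope, intercept) pair summarizes a whole data set well.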

Journal ArticleDOI
TL;DR: In this paper, the authors compared the cost effectiveness of retrospective manual and on-line bibliographic searching and found that on-line searching is generally faster, less costly, and more effective than manual searching.
Abstract: A study to compare the cost effectiveness of retrospective manual and on-line bibliographic searching is described. Forty search queries were processed against seven abstracting and indexing publications and the corresponding SDC/ORBIT data bases. Equivalent periods of coverage and searcher skill levels were used for both search modes. Separate task times were measured for question analysis, searching, photocopying, shelving, and output distribution. Component costs were calculated for labor, information, reproduction, equipment, physical space, and telecommunications. Results indicate that on-line searching is generally faster, less costly, and more effective than manual searching. However, for certain query/information-source combinations, manual searching may offer some advantages in precision and turn-around time. The results of a number of related studies are reviewed. 1 figure, 6 tables, 43 references.

Journal ArticleDOI
TL;DR: Support is developed for the hypothesis that the indicativity measure does not fully reflect the value of the longer fields, leaving the question of their cost effectiveness unresolved.
Abstract: The indicativity of a type of catalog information (or catalog field) is intended as a measure of how well the information in the field conveys the contents of the document it represents. In the experiments reported here, indicativity is measured for several catalog fields by comparing users' evaluations of the relevance of documents on the basis of the information in a given field with their judgments on the basis of full text. A small but statistically significant increase in indicativity is found as the length of a catalog field (as measured by the number of different content-word stems) is increased. The title field is found to have an indicativity of 0.64; matching subjects, 0.67; subjects, 0.70; abstract, 0.73. Despite the relatively small gain in indicativity for the longer fields, users value the longer fields highly for determining relevance, if one judges by the amount of time they spend on them. Support for the hypothesis that the indicativity measure does not fully reflect the value of the fields is developed. Thus, the question of the cost effectiveness of the longer fields remains unresolved. Other aspects of catalog field utility studied with the Project Intrex equipment are also reported.

Journal ArticleDOI
TL;DR: The paper concludes with a critical discussion of the correlation of frequency-rank distributions and offers a new measure, the transfer coefficient.
Abstract: Frequency, rank, and frequency-rank distributions are distinguished and discussion then focuses on frequency-rank distributions. The key example of frequency-rank distributions is the so-called “law of anomalous numbers” which, modeled as a “very mixed” Poisson process and simulated by computer, provides a means of exploring the stability of ranks in sample frequency-rank distributions. The paper concludes with a critical discussion of the correlation of frequency-rank distributions and offers a new measure, the transfer coefficient.
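The "very mixed" Poisson simulation mentioned above can be sketched as follows: when Poisson means are spread over several orders of magnitude, the leading digits of the sampled counts approximate the law of anomalous numbers (Benford's law), P(d) = log10(1 + 1/d). The range of means and the sample size below are illustrative assumptions, not the paper's actual simulation design:

```python
# Simulate a "very mixed" Poisson process and tabulate leading digits,
# comparing them with Benford's predicted shares. Parameters are illustrative.
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Sample a Poisson variate by Knuth's product method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(42)
digits = Counter()
for _ in range(5000):
    lam = 10 ** rng.uniform(0, 2)          # means spread over two decades
    n = poisson(lam, rng)
    if n > 0:
        digits[int(str(n)[0])] += 1

total = sum(digits.values())
for d in (1, 2, 9):
    observed = digits[d] / total
    benford = math.log10(1 + 1 / d)        # Benford's predicted share
    print(d, round(observed, 2), round(benford, 2))
```

The mixing is essential: a single Poisson mean produces counts clustered around that mean, while a wide mixture of means spreads counts across scales, which is what lets the first-digit bias toward 1 emerge.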

Journal ArticleDOI
TL;DR: The proposed solution is to provide the user with a simple, clearly designed subset of the language that nevertheless includes all important query functions, while the additions to modify, shorten, improve, and extend it are left to the experienced user.
Abstract: Query languages for document retrieval systems should be simple and easy to learn for the casual user; they should provide all conceivable facilities for the experienced user. These goals comprise the most serious contradictions that evolve between all the design criteria collected, compared, and evaluated in this paper. The proposed solution or, at least, relief to this conflict is to provide the user with a simple, clearly designed subset of the language that nevertheless includes all important query functions, while the additions to modify, shorten, improve, and extend it are left to the experienced user. It is stressed that the simple data formats available with most systems are insufficient; the need for more elaborate structures is substantiated. A point is made for a formal rather than a natural language for document retrieval.

Journal ArticleDOI
TL;DR: An analysis of the various kinds of imprecision that can occur indicates strongly that fuzzy set theory is not an appropriate formalism for these models.
Abstract: The imprecision of some of the concepts which are used in formal models in information science has led to a spate of attempts to apply fuzzy set theory to aspects of information science. An analysis of the various kinds of imprecision that can occur indicates strongly that fuzzy set theory is not an appropriate formalism for these models.

Journal ArticleDOI
TL;DR: This paper proves a statistical relationship linking two types of productivity distributions, size-frequency and rank-frequency.
Abstract: If g(x) is the number of journals having x references, we speak of a size-frequency relationship. If f(r) is the number of references in a journal of rank r, we speak of a rank-frequency relationship. In the former we can estimate the number of journals, given the number of references; in the latter we can estimate the number of references given the rank of the journal. In this paper we prove a statistical relationship which links these two types of productivity distributions.
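The two forms linked above are mechanically interconvertible: given the size-frequency distribution g(x), listing each journal's reference count in descending order yields the rank-frequency form f(r). A small sketch with invented counts:

```python
# Convert a size-frequency distribution g[x] = number of journals with exactly
# x references into the rank-frequency list f(r), r = 1, 2, ...
# The counts below are invented for illustration.

def size_to_rank(g):
    """Expand g into f(r): the r-th largest journal's reference count."""
    f = []
    for x in sorted(g, reverse=True):   # journals with the most references rank first
        f.extend([x] * g[x])
    return f

g = {100: 1, 50: 2, 10: 3}             # 1 journal with 100 refs, 2 with 50, 3 with 10
print(size_to_rank(g))                 # [100, 50, 50, 10, 10, 10]
```

Reading the output, f(1) = 100 and f(4) = 10; the paper's contribution is the statistical relationship between the two functional forms, which this expansion only illustrates combinatorially.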

Journal ArticleDOI
TL;DR: Recent mathematical descriptions of Bradford's distribution are examined and a new formulation presented that provides a more accurate estimation of the total number of papers and sources on a given scientific subject.
Abstract: Recent mathematical descriptions of Bradford's distribution are examined and a new formulation presented that provides a more accurate estimation of the total number of papers and sources on a given scientific subject. The deviation from linearity at the end of the distribution is shown to be an intrinsic feature.

Journal ArticleDOI
T. D. C. Kuch
TL;DR: The number of authors of papers in four scientific journals is positively correlated with the number of significant words in the titles of the same papers; a fifth journal showed no correlation.
Abstract: The number of authors of papers in four scientific journals is positively correlated with the number of significant words in the titles of the same papers. A fifth journal showed no correlation. Implications for information retrieval are explored and tentative explanations are offered.

Journal ArticleDOI
TL;DR: There are marked disparities in the extent to which the research produced in different countries is referenced, and these disparities may be accounted for by cultural, ethnocentric, and linguistic factors; by the varying quality of research performed inDifferent countries; and by simple ignorance of research that is being undertaken in locales out of the research mainstream.
Abstract: Cross-national information flows are investigated for three energy-related fields of physics using the referencing data appearing in physics journals. In all three fields, U.S. research appears to be the most heavily referenced research in the world, with Soviet research coming in second. An examination of “balance of information” flows shows that the U.S. consistently experiences a negative balance. That is, foreign scientists reference U.S. work more frequently than U.S. scientists reference foreign work. In contrast, the USSR consistently experiences a positive balance of information flow. An examination of cross-national information flows shows that there are marked disparities in the extent to which the research produced in different countries is referenced. These disparities may be accounted for by cultural, ethnocentric, and linguistic factors; by the varying quality of research performed in different countries; and by simple ignorance of research that is being undertaken in locales out of the research mainstream.

Journal ArticleDOI
TL;DR: CHEMLINE, the chemical nomenclature adjunct file to TOXLINE, was designed and implemented through a three‐step process based on the requirements of the three search and retrieval systems.
Abstract: CHEMLINE, the chemical nomenclature adjunct file to TOXLINE, was designed and implemented through a three-step process. The file was first available for on-line interactive searching in January 1974 as the TOXLINE Chemical Dictionary using the STIMS/RECON software system. The second version was available from the National Library of Medicine (NLM) using the ELHILL2 software system. The third version is currently available from the NLM using the ELHILL3 software system. The file content was determined by the availability of data and the needs of the TOXLINE users. The data in CHEMLINE is obtained from the CAS Registry system files. The organization of the file was based on the requirements of the three search and retrieval systems. Changes in file structures required evaluation of the data and the development of new specifications prior to creating the file in each software system. Techniques for creating inverted file terms for search purposes from chemical names, molecular formulas, ring information, and Wiswesser Line Notations are described.

Journal ArticleDOI
TL;DR: This paper has two objectives: to explain the theoretical links between the demand for library service and key economic variables, and to describe how demand was estimated in one specific experimental context: the demand for library service of institutional users of the Cleveland Health Sciences Library.
Abstract: Issues related to the demand for library service are not new to library administrators and other information scientists. However, formal economic analysis of that demand is just beginning. This paper has two objectives: to explain the theoretical links between the demand for library service and key economic variables and to describe how demand was estimated in one specific experimental context: the demand for library service of institutional users of the Cleveland Health Sciences Library. This study is cited to illustrate some problems that are likely to arise in attempting to estimate statistically the parameters of library demand functions. The need for more precise economic analysis of library demand has grown as forms of information that have traditionally been provided free to users begin to acquire explicit price tags. This trend is likely to continue. Most economic models for setting library user fees require specific inputs about the demand for information and the sensitivity of demand to changes in economic variables. Even in situations where user fees are not applicable, an understanding of the demand function can be useful in predicting how the amount of library service demanded might change if underlying economic or noneconomic variables change. As more complete data become available, economic analysis of library demand will be employed more frequently as a policymaking tool.


Journal ArticleDOI
TL;DR: The Neyman-Pearson lemma is shown to yield a better a priori decision rule for retrieval; maximize precision subject to a fixed level of recall, instead of setting a lower limit upon precision, as does the threshold rule.
Abstract: The retrieval decision problem is considered from the viewpoint of a decision theory approach. A threshold rule based on a suggested rule for indexing is considered and analyzed for retrieval decisions. The threshold rule is seen as a good descriptive measure of what a reasonable retrieval system should be able to accomplish. A retrieval mechanism of randomly drawing documents is analyzed to determine the relative strength of the threshold rule. The Neyman-Pearson lemma is shown to yield a better a priori decision rule for retrieval; maximize precision subject to a fixed level of recall, instead of setting a lower limit upon precision, as does the threshold rule. The threshold rule is seen as a necessary, but not sufficient, condition for effective retrieval. A sufficient condition for the threshold rule illustrates the relationship between it and the rule derived from the Neyman-Pearson lemma. Finally, a new measure of information retrieval system performance is presented, based on the threshold rule.

Journal ArticleDOI
TL;DR: It is advocated that collective action be undertaken by the information science community to formulate a curriculum plan to orient educators, prospective students, and present and prospective clienteles.
Abstract: This paper relates factors pertaining to the development of professional education programs to the current status of education for information science. Information science education has not yet matured to the level of being articulated in a program plan representing the consensus of members of the profession. Components of curriculum plans are discussed. A topic inventory for information science is presented with indications of its utility in curriculum design and assessment. The author advocates that collective action be undertaken by the information science community to formulate a curriculum plan to orient educators, prospective students, and present and prospective clienteles.


Journal ArticleDOI
TL;DR: A library network model is reviewed and a new derivation of the model is presented that requires fewer assumptions and allows more robust representations of library networks.
Abstract: A library network model is reviewed and a new derivation of the model is presented that requires fewer assumptions and allows more robust representations of library networks. It is then shown how uncertainty in model parameters, due to estimation on the basis of finite sample sizes, propagates through the model and thereby introduces uncertainty in the model's predictions. Using these results, it is shown how one can determine the appropriate amount of data to collect and/or the appropriate level of aggregation of data.