
Showing papers in "Journal of the Association for Information Science and Technology in 1973"


Journal ArticleDOI
TL;DR: A new form of document coupling called co-citation is defined as the frequency with which two documents are cited together, and clusters of co-cited papers provide a new way to study the specialty structure of science.
Abstract: A new form of document coupling called co-citation is defined as the frequency with which two documents are cited together. The co-citation frequency of two scientific papers can be determined by comparing lists of citing documents in the Science Citation Index and counting identical entries. Networks of co-cited papers can be generated for specific scientific specialties, and an example is drawn from the literature of particle physics. Co-citation patterns are found to differ significantly from bibliographic coupling patterns, but to agree generally with patterns of direct citation. Clusters of co-cited papers provide a new way to study the specialty structure of science. They may provide a new approach to indexing and to the creation of SDI profiles.

3,846 citations
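The counting procedure the abstract describes, comparing lists of citing documents and counting identical entries, can be sketched in a few lines. All document identifiers below are hypothetical.

```python
# Sketch of co-citation counting: the co-citation frequency of two
# documents is the number of citing documents whose reference lists
# contain both. Identifiers are invented for illustration.

def cocitation(citing_lists: dict[str, set[str]], a: str, b: str) -> int:
    """Count citing documents that reference both a and b."""
    return sum(1 for refs in citing_lists.values() if a in refs and b in refs)

# Each key is a citing paper; each value is the set of papers it cites.
corpus = {
    "citing1": {"A", "B", "C"},
    "citing2": {"A", "B"},
    "citing3": {"A"},
}
print(cocitation(corpus, "A", "B"))  # -> 2
```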


Journal ArticleDOI
TL;DR: It is argued that a user's subjective evaluation of the personal utility of a retrieval system's output to him, if it could be properly quantified, would be a near-ideal measure of retrieval effectiveness.
Abstract: It is argued that a user's subjective evaluation of the personal utility of a retrieval system's output to him, if it could be properly quantified, would be a near-ideal measure of retrieval effectiveness. A hypothetical methodology is presented for measuring this utility by means of an elicitation procedure. Because the hypothetical methodology is impractical, compromise methods are outlined and their underlying simplifying assumptions are discussed. The more plausible the simplifying assumptions on which a performance measure is based, the better the measure. This, along with evidence gleaned from ‘validation experiments’ of a certain kind, is suggested as a criterion for selecting or deriving the best measure of effectiveness to use under given test conditions.

253 citations


Journal ArticleDOI
TL;DR: A cluster analysis procedure is described in which 288 journals in the disciplines of physics, chemistry and molecular biology are grouped into clusters and two‐step citation maps linking the clusters are presented for each discipline.
Abstract: A cluster analysis procedure is described in which 288 journals in the disciplines of physics, chemistry and molecular biology are grouped into clusters. Most of the clusters are easily identified as subdisciplinary subject areas. The data source was the cross citing amongst the journals derived from the Journal Citation Index (JCI), a file derived in turn from the Science Citation Index (SCI)®. The JCI consists of journal by journal tabulation of citings to and from each journal processed in the SCI. Two-step citation maps linking the clusters are presented for each discipline. Within the disciplines the clusters of journals form fully transitive hierarchies with very few relational conflicts.

133 citations


Journal ArticleDOI
Larry J. Murphy1
TL;DR: It is stressed that Lotka's Law was originally only applicable in physical science, specifically chemistry and physics, and the more recent general application of Lotka's Law in non-physical science, without appropriate new tests of validity, is bemoaned.
Abstract: It is stressed that Lotka's Law was originally only applicable in physical science, specifically chemistry and physics. The more recent general application of Lotka's Law in non-physical science, without appropriate new tests of validity, is bemoaned. A recent test in the humanities is discussed, showing that Lotka's Law does apply reasonably in that specialty. A plea is made for more “spot checks” of so-called general “laws,” which were determined using specific subject samples: not only for Lotka's Law, which is used here as an example, but for all such “laws” applied in information science generally.

79 citations


Journal ArticleDOI
Susan Artandi1
TL;DR: The concept of information is examined within the framework of the Mathematical Theory of Communication and semiotics, the study of signs and sign systems, for the better understanding of information.
Abstract: The concept of information is examined within the framework of the Mathematical Theory of Communication and semiotics, the study of signs and sign systems. The implications of these theories for the better understanding of information as we deal with this concept in the context of information systems are discussed.

65 citations


Journal ArticleDOI
TL;DR: Two kinds of solution to the problems raised by the impracticality of the naive evaluation procedure are taken up; the first answers the questions in terms of the reasonableness of the simplifying assumptions needed to get from the naive measure to the proposed substitute.
Abstract: It was argued in Part I (see JASIS, March-April 1973, p. 87) that the best way to evaluate a retrieval system is, in principle at least, to elicit subjective estimates of the system's utility to its users, quantified in terms of the numbers of utiles (e.g. dollars) they would have been willing to give up in exchange for the privilege of using the system; and a naive methodology was outlined for evaluating retrieval systems on this basis. But the impracticality of the naive evaluation procedure as it stands raises the questions: How can one decide which practical measure is likely to yield results most closely resembling those of the naive methodology? And how can one tell whether the resemblance is close enough to make applying the measure worthwhile? In the present paper two kinds of solution to these problems are taken up. The first answers the questions in terms of the reasonableness of the simplifying assumptions needed to get from the naive measure to the proposed substitute. The second answers them by experimentation.

52 citations


Journal ArticleDOI
TL;DR: A combination of quantitative and qualitative analyses was used on the journal articles indexed in one volume of Library Literature, showing that the dispersion of articles among journals followed a Bradford-type distribution except for a “collapse” at the end.
Abstract: A combination of quantitative and qualitative analyses was used on the journal articles indexed in one volume of Library Literature. Findings include: the dispersion of articles among journals followed a Bradford-type distribution except for a “collapse” at the end, possibly showing a low level of interaction of librarianship with other fields; a considerable proportion of the articles was of the news type; administration was the largest single subject covered. The methodology may be appropriate for the analysis of activities in other fields.

50 citations


Journal ArticleDOI
TL;DR: A framework for developing analytical and conceptual relationships involving the flow of information is proposed, along with a convenient and reasonable rule for evaluating the decision state of a decision-maker at any point in time.
Abstract: A generalized framework for developing analytical and conceptual relationships involving the flow of information has previously been suggested. This paper provides further refinement, rigor, and extension for some of the earlier relationships suggested. In particular, a measure of the amount of information is defined as the difference of the value of the decision state of the decision-maker after and before receipt of the message. This measure is universally applicable for all information that is concerned with the effectiveness of the message upon the recipient. It is accordingly called pragmatic information. The definition is a direct consequence of the interdependence between information and decision-making and of the definition that information is data of value in decision-making. In order to evaluate this measure of information, it is convenient to use a generalized information system model which has previously been proposed and which has virtually universal applicability. The use of this model permits the evaluation of the measure of information in terms of the reduction of uncertainty to a decision maker. Six different types of uncertainty are identified. Specifically, a type of uncertainty which is generally overlooked in the decision-making literature is found to be important. This is called executional uncertainty. It is pointed out that the information science aspects of decision theory must cover comprehensively those decision-makers who not only are expert but those decision-makers who may be mediocre or even rather poor. Although any decision rule may be utilized in terms of the framework outlined, a suggested rule which is convenient and reasonable is proposed for evaluating the decision state of a decision-maker at any point in time. The measure of information suggested is situation, time, and decision-maker dependent. 
The framework and relationships developed are an important step toward the development of a true theory of information science. It is further suggested that each data set or document might have some average (over time) amount of information content for a decision-maker of any given “effectiveness”. The relationship of this average amount of information as a function of the decision-maker effectiveness is suggested as an important functional relationship that exists for every document. It is called an information profile of that document or data set. A typical information profile is suggested.

50 citations
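The pragmatic-information measure described above, the value of the recipient's decision state after receipt of a message minus its value before, can be illustrated with a toy decision problem. The payoffs and probabilities below are invented for illustration and are not from the paper.

```python
# Sketch of pragmatic information: the information in a message is the
# change in the value of the recipient's decision state. The decision
# state is valued here as the expected utility of the best action.
# All numbers are hypothetical.

def decision_value(probs: dict[str, float], payoff: dict[str, dict[str, float]]) -> float:
    """Value of the best available action under current beliefs."""
    return max(
        sum(p * payoff[action][state] for state, p in probs.items())
        for action in payoff
    )

payoff = {"act": {"good": 10.0, "bad": -5.0}, "wait": {"good": 0.0, "bad": 0.0}}
before = {"good": 0.5, "bad": 0.5}  # beliefs before receipt of the message
after = {"good": 0.9, "bad": 0.1}   # beliefs after receipt of the message

pragmatic_information = decision_value(after, payoff) - decision_value(before, payoff)
print(pragmatic_information)  # -> 6.0
```

As the abstract notes, the measure is situation, time, and decision-maker dependent: a different payoff table or prior belief yields a different value for the same message.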


Journal ArticleDOI
TL;DR: The purpose is to present a rationale for the modification phase of an abstracting system and to describe several modification rules whose implementation is an initial step toward the automated production of abstracts that contain sentences written especially for the abstract.
Abstract: We have undertaken to extend the capabilities of the abstracting system described by Rush, Salvador and Zamora by adding to the system a modification procedure that could be employed to make the abstracts produced by the system more acceptable to the reader. Results of this study are reported in this paper. Our purpose is to present a rationale for the modification phase of an abstracting system and to describe several modification rules whose implementation is an initial step toward the automated production of abstracts that contain sentences written especially for the abstract. We have described several methods for improving the readability of abstracts produced by computer program. The research described in this paper was performed as a part of a larger project whose aim is the development of an operational automatic abstracting system. 18 references are given.

31 citations


Journal ArticleDOI
TL;DR: Frequencies of the number of references per paper were obtained for the information science literature represented by all papers cited in Information Science Abstracts, volumes 1–6 (1966–1971), and information science was found to be less scholarly than pharmacy.
Abstract: Frequencies of the number of references per paper were obtained for the information science literature represented by all papers cited in Information Science Abstracts, volumes 1–6 (1966-1971). Almost one-third of the papers had no references. Half of them had four references or fewer; two-thirds had eight or fewer references. The ratio of papers without references to those with references was proposed as a measure of the scholarly status of a field. On this basis, information science was found to be less scholarly than pharmacy.

28 citations


Journal ArticleDOI
TL;DR: An overall framework for considering the information retrieval decision problem is presented, incorporating the aspects of cost-effectiveness and alternative evaluation, which allows one to better understand the contributions made by many researchers in this crucial area.
Abstract: A decision theory approach is used to model the information retrieval decision problem of which documents to retrieve from a library collection in response to a specific user query for information. A thorough discussion of decision theory, including the components of the alternatives, states-of-nature, outcomes, and evaluations, as well as the optimization process under the cases of certainty, risk, and uncertainty, is presented. Bayesian statistics are also discussed to show how prior information about the various documents via classification analysis can affect the decision process under risk. An example problem is used to illustrate the decision theory approach and to compare the overall performance of the retrieval system under risk with and without the document classification information. Thus, the operations research technique of decision theory is used to model the retrieval decision process, illustrate how important evaluation is, and to demonstrate the value of prior information via document classification analysis. Moreover, the paper presents, in a somewhat tutorial mode, an overall framework for considering the information retrieval decision problem, incorporating the aspects of cost-effectiveness and alternative evaluation, which allows one to better understand the contributions made by many researchers in this crucial area.

Journal ArticleDOI
TL;DR: The authors examined the journal citations that appeared in Education and English theses at the University of Rhode Island to develop one means of measuring the use of the collection and noted some of the implications for more rational decision‐making and inter‐library cooperation.
Abstract: The authors examined the journal citations that appeared in Education and English theses at the University of Rhode Island. The purpose of the investigation was to develop one means of measuring the use of the collection. Additionally, the study looks at the use/cost aspect and closes by noting some of the implications of the study for more rational decision-making and inter-library cooperation.

Journal ArticleDOI
TL;DR: Since little substantiated evidence exists concerning the features that should or should not be included in the man-machine interface of interactive bibliographic search and retrieval systems, an informal survey of scientists active in this research area was conducted; the responses showed a significant level of agreement concerning interface features.
Abstract: Since little substantiated evidence exists concerning the features that should or should not be included in the man-machine interface of interactive bibliographic search and retrieval (IBSR) systems, an informal survey tapping the opinions of scientists active in this research area was conducted. An analysis of the responses showed a significant level of agreement concerning interface features.

Journal ArticleDOI
TL;DR: Experiments with two distinct collections, using three levels of indexing exhaustivity for both documents and requests, show that substantially the same performance is obtained for very different levels of document indexing, if suitable choices are made of request level.
Abstract: Indexing exhaustivity, which may be broadly defined as the number of terms assigned to a document, is thought to be of some importance in retrieval, and it has been suggested that there may be an optimal level of exhaustivity for a particular collection. Experiments with two distinct collections, using three levels of indexing exhaustivity for both documents and requests, show that substantially the same performance is obtained for very different levels of document indexing, if suitable choices are made of request level.

Journal ArticleDOI
TL;DR: This study tends to support the conclusion of Sparck‐Jones that weighted index terms provide better retrieval performance than unweighted terms, and concludes that the results are highly dependent upon the document collection, and the technique should be employed with caution.
Abstract: The objectives of this paper are to describe the effect of using weighted index terms in a document retrieval system, and to evaluate retrieval performance when queries are expanded by terms occurring in clusters with the query terms. Three data collections, each indexed by several methods, two of which were studied and reported on in previous work, are used to develop explicit results. The study both expands upon and extends previous work at the University of Maryland. The effect of weighting index terms in the document collection, the queries and the formation of clusters is analyzed. Eight cases are investigated in which index terms are weighted and unweighted. The best results are obtained when weighted index terms are used in forming clusters, in queries, and in documents. In this case, the results on the new collection demonstrate a significant improvement in retrieval performance relative to the performance with the unmodified data base, when clustered terms are added to queries. The improvement is in contrast to the results in the previous study, where a degradation in performance, or at best an insignificant improvement, was obtained. Comparisons are made to related work by Sparck-Jones and her colleagues. This study tends to support the conclusion of Sparck-Jones that weighted index terms provide better retrieval performance than unweighted terms. The cluster addition of index terms to queries yields unpredictable results. Some collections show an improvement in retrieval performance, others a degradation or no change in performance. Sparck-Jones obtained an improvement in retrieval performance for her document collection. We conclude that the results are highly dependent upon the document collection, and the technique should be employed with caution.
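As a minimal illustration of the weighted-versus-unweighted contrast the abstract discusses, not the paper's actual experimental system, a match score over shared index terms can be computed with and without term weights. The terms and weights below are invented.

```python
# Illustrative sketch: scoring a query against a document by an
# inner product over shared index terms. With unit weights this
# reduces to counting term overlap. All values are hypothetical.

def score(doc_terms: dict[str, float], query_terms: dict[str, float]) -> float:
    """Inner-product match score over index terms shared by doc and query."""
    return sum(w * query_terms[t] for t, w in doc_terms.items() if t in query_terms)

doc = {"retrieval": 3.0, "index": 1.0}
query_weighted = {"retrieval": 2.0, "cluster": 1.0}
query_unweighted = {t: 1.0 for t in query_weighted}

print(score(doc, query_weighted))    # -> 6.0
print(score(doc, query_unweighted))  # -> 3.0
```

The weighted score ranks documents by how strongly, not merely whether, they share terms with the query, which is the intuition behind the reported improvement.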

Journal ArticleDOI
TL;DR: In efficient coding, Information-Communication-Coding Theory based on probabilities of occurrence assigns short codes to events with little information content and long codes to events with high information content, which provides a direct relationship of code size to amount of information content.
Abstract: In efficient coding, Information-Communication-Coding Theory based on probabilities of occurrence assigns short codes to events with little information content and long codes to events with high information content. This provides a direct relationship of code size to amount of information content. Entropies of surrogates such as citations, abstracts, first paragraphs, last paragraphs, and first and last paragraphs are measures of how well each class of surrogates predicts the relevancy of documents. They are measures of meaningful information in the text of surrogates. Such measures of information are important to information system designers.
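The relationship described above, short codes for frequent (low-information) events and long codes for rare (high-information) events, follows from Shannon's self-information, which can be sketched directly. The probabilities below are illustrative.

```python
import math

def information_content(p: float) -> float:
    """Self-information -log2(p) in bits: rarer events carry more information."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Average information content; a lower bound on mean code length in bits."""
    return sum(p * information_content(p) for p in probs if p > 0)

# A frequent event warrants a short code, a rare event a long one.
print(information_content(0.5))    # -> 1.0 bit
print(information_content(0.125))  # -> 3.0 bits
print(entropy([0.5, 0.25, 0.25]))  # -> 1.5 bits
```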

Journal ArticleDOI
TL;DR: Findings show that thesauri viewed from the communications point of view do not allow a cybernetic process of communication (“both‐way” communication) if a thesaurus utilized in a system is not updated by both indexers and question negotiators.
Abstract: It was argued that the present-day thesaurus-construction and maintenance rules and conventions are not theoretically based. For this reason, there are few rules and conventions for updating a thesaurus. Consequently, most of the thesauri adopted by operating information storage and retrieval systems are not systematically updated. In order to investigate how thesauri are actually updated, a survey was conducted. The working hypothesis was that the communication process between authors and readers is linear in nature (“one-way” communication allowing no reciprocal feedback) if a thesaurus utilized in a system is not updated by both indexers and question negotiators. Findings show that thesauri viewed from the communications point of view do not allow a cybernetic process of communication (“both-way” communication). The survey indicated that the present practice of updating thesauri is largely done by indexers alone. No attempt was made to develop a theory of thesaurus-construction and updating. It was, however, argued that such a theory, if developed, should at least account for the concepts of meaning and knowledge. Within this theoretical framework, two techniques are suggested to be considered for the systematic updating of a thesaurus.

Journal ArticleDOI
John O'Connor1
TL;DR: The results of the study are surprisingly good for retrieval in such a “soft science,” and it is reasonable to hope that in less "soft” sciences and technologies the techniques described will work even better.
Abstract: Some new text searching retrieval techniques are described which retrieve not documents but sentences from documents and sometimes (on occasions determined by the computer) multi-sentence sequences. Since the goal of the techniques is retrieval of answer-providing documents, “answer-passages” are retrieved. An “answer-passage” is a passage which is either answer-providing or “answer-indicative,” i.e., it permits inferring that the document containing it is answer-providing. In most cases answer-sentences, i.e., single-sentence answer-passages, are retrieved. This has great advantages for screening retrieval output. Two new automatic procedures for measuring closeness of relation between clue words in a sentence are described. One approximates syntactic closeness by counting the number of intervening “syntactic joints” (roughly speaking, prepositions, conjunctions and punctuation marks) between successive clue words. The other measure uses word proximity in a new way. The two measures perform about equally well. The computer uses “enclosure” and “connector words” for determining when a multi-sentence passage should be retrieved. However, no procedure was found in this study for retrieving multi-paragraph answer-passages, which were the only answer-passages occurring in 6% of the papers. In a test of the techniques they failed to retrieve two answer-providing documents (7% of those to be retrieved) because of one multi-paragraph answer-passage and one complete failure of clue word selection. For the other answer-providing documents they retrieved at all recall levels with greater precision than SMART, which has produced the best previously reported recall-precision results. The retrieval questions (mostly from real users) and documents used in this study were from the field of information science.
The results of the study are surprisingly good for retrieval in such a “soft science,” and it is reasonable to hope that in less “soft” sciences and technologies the techniques described will work even better. On this basis a dissemination and retrieval system of the near future is predicted.
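The first closeness measure described above, counting "syntactic joints" (prepositions, conjunctions, punctuation) between successive clue words, can be sketched as follows. The joint list and example sentence are illustrative, not the paper's actual word lists.

```python
# Sketch of the syntactic-joints closeness measure: fewer intervening
# prepositions, conjunctions, and punctuation marks suggests the two
# clue words are more closely related syntactically.
# The JOINTS set below is a hypothetical stand-in for the paper's lists.

JOINTS = {"of", "in", "on", "by", "for", "with", "and", "or", "but", ",", ";", ":"}

def joint_count(tokens: list[str], w1: str, w2: str) -> int:
    """Count syntactic joints strictly between the first occurrences of w1 and w2."""
    i, j = tokens.index(w1), tokens.index(w2)
    lo, hi = min(i, j), max(i, j)
    return sum(1 for tok in tokens[lo + 1 : hi] if tok.lower() in JOINTS)

sent = "retrieval of documents by weighted terms".split()
print(joint_count(sent, "retrieval", "documents"))  # -> 1 ("of")
print(joint_count(sent, "documents", "terms"))      # -> 1 ("by")
```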


Journal ArticleDOI
TL;DR: There does not appear to be any evidence to support the assumption that abstracts, accompanying a document, have any significant effect.
Abstract: Abstracts (and other short surrogates for the complete text) of a document have been generally considered a valuable aid to the reader in quickly determining the relevance of a document (as well as, in some cases, serving as a substitute for reading the text, and as a separate surrogate in secondary services). A test of this hypothesis was conducted in the form of a field experiment in three military laboratories. Based on what a sample of 85 scientists and engineers reported on the time they took and the relevance judgments they made with respect to the documents which came across their desks over a four week period, there does not appear to be any evidence to support the assumption that abstracts, accompanying a document, have any significant effect.

Journal ArticleDOI
TL;DR: It is observed that outstanding genius appears to pay scant regard to existing classifications and is more likely to be involved in an integrated approach to problems, which results in a paradox which is probably unresolvable.
Abstract: It has been suggested that information science is still at the stage of alchemy: if this is so, then mutual exclusivity must form its philosopher's stone. Mutual exclusivity appears to be alien to the observable universe: that this is so is displayed through a series of examples. Some of these relate to everyday things like trees, beaches and man himself, whilst others relate to more obscure phenomena like continental drift and black holes. The act of observation is also considered, as this has a considerable bearing on the problem. Nevertheless, mutual exclusivity must form part of man's mental powers, and this has found expression in the relatively exclusive series of symbols used in communication. The dangers of exclusive thinking in relation to environmental problems are considered, and this results in a paradox which is probably unresolvable. Finally, it is observed that outstanding genius appears to pay scant regard to existing classifications and is more likely to be involved in an integrated approach to problems.

Journal ArticleDOI
Gerard Salton1
TL;DR: The citations appearing in two recent comprehensive bibliographies in information science and technology are reviewed, and a comparison is made with bibliographies dating back to 1962.
Abstract: The citations appearing in two recent comprehensive bibliographies in information science and technology are reviewed, and a comparison is made with bibliographies dating back to 1962. Some conclusions are drawn concerning the development and current state of information science.

Journal ArticleDOI
TL;DR: It is shown that the mechanical procedure is capable of achieving simultaneous average relevance and recall figures above 80% in a corpus of 261 physics research papers.
Abstract: A study was undertaken to classify mechanically a document collection using the free-language words in the titles and abstracts of a corpus of 261 physics research papers. Using a clustering algorithm, results were obtained which closely duplicated the clusters obtained by previous experiments with citations. A brief comparison is made with a traditional manual classification system. It is shown that the mechanical procedure is capable of achieving simultaneous average relevance and recall figures above 80%.
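The abstract does not specify the clustering algorithm used; as a rough illustration of grouping documents by shared free-language words, here is a greedy single-link sketch using Jaccard overlap. The document word sets and threshold are invented.

```python
# Hypothetical sketch (not the paper's algorithm): greedy single-link
# clustering of documents by the Jaccard overlap of their title/abstract
# word sets. A document joins the first cluster containing a member
# whose overlap with it meets the threshold.

def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of shared words between two word sets."""
    return len(a & b) / len(a | b)

def cluster(docs: dict[str, set[str]], threshold: float) -> list[set[str]]:
    clusters: list[set[str]] = []
    for name, words in docs.items():
        for c in clusters:
            if any(jaccard(words, docs[m]) >= threshold for m in c):
                c.add(name)
                break
        else:
            clusters.append({name})
    return clusters

docs = {
    "d1": {"quark", "scattering"},
    "d2": {"quark", "model"},
    "d3": {"ion", "plasma"},
}
print(cluster(docs, 0.3))  # two clusters: {d1, d2} and {d3}
```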

Journal ArticleDOI
TL;DR: An experiment is described which attempts to define a quantitative methodology for the identification and evaluation of journal publications to be acquired for a specialized collection and indicates that the technique has value, but requires individual handling and refinement in the developmental stage.
Abstract: An experiment is described which attempts to define a quantitative methodology for the identification and evaluation of journal publications to be acquired for a specialized collection. In previous acquisition procedures, the subjective opinion of one or more persons has, sometimes idiosyncratically, determined the material available to the user. In an attempt to identify and recommend all possibly relevant periodical titles containing toxicological-biological information, a statistical decision model that permits an objective approach was designed and employed. A list of yes/no criteria questions was developed with the advice of a “subject-oriented” Advisory Committee. A training technique was developed in an attempt to assure uniformity in interpretation, and a quality control check was imposed to determine consistency of judgment. The design, implementation, and quality control testing of the model have indicated that the technique has value, but requires individual handling and refinement in the developmental stage.

Journal ArticleDOI
TL;DR: A statistical model for characterizing the growth patterns of data base utilization and for estimating future utilization levels of demand has been developed and illustrations of the model applied to a typical information retrieval organization are given.
Abstract: A statistical model for characterizing the growth patterns of data base utilization and for estimating future utilization levels of demand has been developed for information retrieval organizations. The model developed is γ = β(1 − e^(−αt)), where γ is the number of users of a data base at time t, and α and β are parameters to be estimated. Illustrations of the model applied to a typical information retrieval organization are given and discussed.
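The saturation model γ = β(1 − e^(−αt)) is easy to evaluate directly: usage starts at zero and approaches the ceiling β as t grows. The parameter values below are illustrative only, not fitted from the paper's data.

```python
import math

# Evaluate the growth model gamma(t) = beta * (1 - exp(-alpha * t)):
# gamma is the number of users of a data base at time t; alpha and beta
# are parameters to be estimated. Values here are hypothetical.

def users(t: float, alpha: float, beta: float) -> float:
    return beta * (1.0 - math.exp(-alpha * t))

beta = 1000.0   # saturation level: eventual number of users
alpha = 0.5     # growth-rate parameter
for t in (1, 5, 20):
    print(t, round(users(t, alpha, beta), 1))
# Utilization rises quickly at first, then flattens toward beta.
```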

Journal ArticleDOI
TL;DR: This article deals with the promotion of information services, specifically the formation and subsequent evaluation of different promotional programs for selective dissemination of information (SDI) services provided by the Mechanized Information Center at The Ohio State University.
Abstract: This article deals with the promotion of information services, specifically the formation and subsequent evaluation of different promotional programs for selective dissemination of information (SDI) services provided by the Mechanized Information Center (MIC) at The Ohio State University. Three programs—opinion leadership, “blitz,” and telephone solicitation—were developed. Data were collected to show, for each of the programs: (1) the level of market penetration achieved; (2) the level of user satisfaction generated from the service; (3) the effect in terms of influence, of the various media employed; and (4) cost effectiveness. Data analysis focused on a determination of the most effective methods to promote SDI services.

Journal ArticleDOI
TL;DR: A number of more basic terms have been selected as the starting point for a project of defining terms which are important in communicating about computer and information science.
Abstract: The impetus for the work described in this paper arose from observations reported in a letter to the Editor of this Journal (1). The sense of that letter was that those of us who have worked in the field now broadly called Computer and Information Science, have depended too long on the use of terms defined by example or through the vague specification of relationships (often weak) among entities. The term to which the author of the letter made specific reference (a term to which altogether too many interpretations are allied) was “thesaurus.” The reader will find that the term “thesaurus” is not defined in this paper but, as will be seen, a number of more basic terms have been selected as the starting point for a project of defining terms which are important in communicating about computer and information science.

Journal ArticleDOI
TL;DR: The scientific mission is a higher order concept and subsumes both idealized polar types of basic and applied science and there is a distinct tendency for elites to cluster at the basic research end of the continuum.
Abstract: Scientific elitism must be viewed as a multidimensional phenomenon. Ten variables of elitism are considered and a principal components factor analysis is used to scale this multivariate domain. Two significant dimensions of elitism were found; one in basic and one in applied science. Sociometric techniques were used to identify the elite of a scientific discipline in a large metropolitan area. An abstract analytical continuum, the scientific mission, was generated using a Thurstone-type scale. The scientific mission is a higher order concept and subsumes both idealized polar types of basic and applied science. A scientist's scale score reflects his professional interests and the breadth of his interest space. There is a distinct tendency for elites to cluster at the basic research end of the continuum. It was found that: (a) the ten variables of elitism provide a scale that successfully discriminates between elites and non-elites; (b) elites process more information than non-elites; (c) elites had more narrowly defined, less diffuse interest spaces than non-elites; and (d) elites prefer literature-oriented methods of procuring scientific information as opposed to person-oriented methods.

Journal ArticleDOI
TL;DR: The conclusion drawn is that the two models are very similar in their approach to understanding communication processes with the major difference being in the scope of methodology rather than in the model's construction.
Abstract: Goffman's epidemic theory is presented and compared to the contagion theory developed by Menzel and his co-author(s). An attempt is made to compare the two models presented and examine their similarities and differences. The conclusion drawn is that the two models are very similar in their approach to understanding communication processes with the major difference being in the scope of methodology rather than in the model's construction.

Journal ArticleDOI
TL;DR: The Educational Resource Information Center system is summarized, and the criticisms of that system in terms of decentralization, information quality, and indexing vocabulary and thesaurus structure are presented.
Abstract: This paper sets the ERIC information system in the context of the more general characteristics and problems of social science information storage and retrieval. Social science information studies and attempts at organizing social science information are surveyed. The Educational Resource Information Center system is summarized, and the criticisms of that system in terms of decentralization, information quality, and indexing vocabulary and thesaurus structure are presented. The recommendations of the Fry study are considered in the context of the whole problem of social science information handling.