
Showing papers in "Journal of the Association for Information Science and Technology" in 1974


Journal Article
TL;DR: Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner, but terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance, whereas terms with low frequency in the whole collection are useful for high precision.
Abstract: Most existing automatic content analysis and indexing techniques are based on word frequency characteristics applied largely in an ad hoc manner. Contradictory requirements arise in this connection, in that terms exhibiting high occurrence frequencies in individual documents are often useful for high recall performance (to retrieve many relevant items), whereas terms with low frequency in the whole collection are useful for high precision (to reject nonrelevant items).

422 citations




Journal Article
Henry Voos
TL;DR: The literature of information science has been examined between 1966 and 1970, and it was determined that a new constant, 1/n^3.5, fitted information science best.
Abstract: Productivity in terms of scientific publication was described by Lotka in 1926. He discovered that in the hard sciences he could predict the number of papers an author would write providing he knew how many authors wrote only one paper during a given time period. The factor for predicting the number of papers in a field like chemistry was found to be 1/n^2 of the number of authors writing only one paper. That is, if 100 authors wrote one paper, only 25 would write two papers, and only 11 would write three papers, etc. If the Lotka constant holds for the hard sciences it was hypothesized (and tested) that other disciplines would have other constants, and thereby form a continuum based on productivity from the hard sciences to the non-sciences. The literature of information science has been examined between 1966 and 1970. It was determined that a new constant, 1/n^3.5, fitted information science best.
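Lotka's relation, as summarized above, predicts the number of authors writing n papers as a(n) = a(1)/n^c, where a(1) is the number of single-paper authors and c is the field's constant (2 in Lotka's data, 3.5 for the information science literature examined here). A minimal Python sketch of that arithmetic; the function name is illustrative:

```python
def lotka_expected_authors(single_paper_authors: float, n: int, c: float = 2.0) -> float:
    """Expected number of authors writing exactly n papers under a(n) = a(1) / n**c."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return single_paper_authors / n ** c


if __name__ == "__main__":
    # Lotka's constant (c = 2) versus the information-science constant (c = 3.5).
    for c in (2.0, 3.5):
        counts = [round(lotka_expected_authors(100, n, c), 1) for n in range(1, 5)]
        print(f"c = {c}: {counts}")
```

With c = 3.5, only about 9 authors (100/2^3.5, roughly 8.8) would be expected to write two papers for every 100 single-paper authors, against 25 under Lotka's constant.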

75 citations


Journal Article
TL;DR: Chen's data for the raw frequency of use of 138 physics journals in the science library at MIT are re‐examined and converted to densities of use‐per‐metre of shelf, and other units of size for obtaining densities are discussed.
Abstract: Chen's data for the raw frequency of use of 138 physics journals in the science library at MIT are re-examined and converted to densities of use-per-metre of shelf. Other units of size for obtaining densities, and their measurement, are discussed. There is no evidence for synchronous obsolescence in the 1955 to 1968 volumes of these journals: instead there is some statistically significant evidence of greater density of use with greater age. Similar evidence elsewhere is cited. The ranking order for heaviness of use is also radically altered by converting raw frequencies to densities of use. It is suggested that, for comparing the relative values of different journals, or age groups, in library use or citation studies, analyses of raw frequencies are valueless, and indeed potentially dangerously misleading, until they are converted to allow for the numbers of available items in each group examined.
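The conversion from raw frequencies to densities of use is a division by the size of each group, and it can reorder a ranking completely. A small Python sketch, using invented journal names and figures (not Chen's data) and shelf metres as the unit of size:

```python
from dataclasses import dataclass


@dataclass
class JournalUse:
    title: str
    uses: int            # raw frequency of use recorded for the journal
    shelf_metres: float  # shelf space occupied by the volumes examined


def rank_by(items, key):
    """Return titles ordered from heaviest to lightest use under the given key."""
    return [j.title for j in sorted(items, key=key, reverse=True)]


# Hypothetical figures for illustration only; not Chen's data.
journals = [
    JournalUse("Journal A", uses=400, shelf_metres=20.0),
    JournalUse("Journal B", uses=150, shelf_metres=3.0),
    JournalUse("Journal C", uses=90, shelf_metres=1.5),
]

raw_ranking = rank_by(journals, key=lambda j: j.uses)
density_ranking = rank_by(journals, key=lambda j: j.uses / j.shelf_metres)

print("raw:", raw_ranking)
print("density (uses per metre):", density_ranking)
```

Here the raw ranking is A, B, C, while the densities (20, 50 and 60 uses per metre) reverse it to C, B, A.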

43 citations



Journal Article
C. T. Yu
TL;DR: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented and experimental results indicate that the proposed method is superior to the other methods.
Abstract: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented. It is compared to Bonner's Method, Rocchio's Method, Dattola's Method and the Single Link Method in three different aspects, namely system effectiveness, system efficiency and the time required for clustering. Experimental results using the Cranfield 424 collection indicate that the proposed method is superior to the other methods.

22 citations


Journal Article
TL;DR: The tradition, or world view, bearing on the scientific study of information is explored and the currently prevalent world view is the scientific tradition which extends from the Enlightenment to the present.
Abstract: Throughout my years of work as an information scientist I have been plagued by a personal and professional sense of doubt with respect to the field. A central theme of every conference that I have attended related to: “What is information science?” Or “Is information science a science?” In this paper I hope to take these questions head on. I must begin by saying that I do take information science seriously as a science. I see it as the quest for understanding of the nature of information and man's interaction with it. That we lack so much in this quest for understanding is the greatest challenge of the science. I intend to explore the tradition, or world view, bearing on the scientific study of information. The currently prevalent world view is the scientific tradition which extends from the Enlightenment to the present. I also intend to look critically at what I perceive to be the premises underlying most of our current efforts to understand the phenomenon of information. The criticism will by necessity be speculative. I intend to stick my neck out, not because I can prove my assertions, but because I believe these ideas must be discussed.

20 citations


Journal Article
TL;DR: Following the construction and assessment of a network describing the development of amorphous semiconductors, it is concluded that citation diagrams are a valuable aid for socio-scientific studies.
Abstract: The deductions which may be made from inspection of a citation map of the literature are briefly described, and a search algorithm is given for the construction of such maps. Following the construction and assessment of a network describing the development of amorphous semiconductors, it is concluded that citation diagrams are a valuable aid for socio-scientific studies.
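The abstract mentions a search algorithm for constructing citation maps without detailing it. As a generic illustration only (not the author's algorithm), a map can be grown from a seed paper by breadth-first traversal of a citation index; the index `CITES` and the depth limit below are hypothetical.

```python
from collections import deque

# Hypothetical citation index: paper -> papers it cites.
CITES = {
    "P5": ["P3", "P4"],
    "P4": ["P2"],
    "P3": ["P1", "P2"],
    "P2": ["P1"],
}


def build_citation_map(cites, seed, max_depth=2):
    """Collect (citing, cited) edges reachable from `seed` by following
    references back through at most `max_depth` generations (breadth-first)."""
    edges = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for ref in cites.get(paper, []):
            edges.add((paper, ref))
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))
    return edges


if __name__ == "__main__":
    for citing, cited in sorted(build_citation_map(CITES, "P5")):
        print(f"{citing} -> {cited}")
```

Raising `max_depth` pulls in older layers of the literature; the resulting edge set is what would be drawn as the citation diagram.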

19 citations


Journal Article
TL;DR: This work describes a new graphical method of storing and retrieving concept relations of various kinds that serves as a kind of presentation of the essentials of a document to the reader that is much more lucid than a natural language text.
Abstract: Successful information retrieval from a mechanized file is heavily dependent on the fidelity of the representation of concepts in the particular language of the system and on the predictability of this representation. If an index language is employed, predictability is guaranteed and the quality of the retrieval is predominantly governed by the fidelity of the representation, i.e., by the extent to which conceptual distortion of the concepts to be represented can be avoided. The various index languages vary widely with respect to their fidelity. Differences in their performance are correspondingly great. The lack of fidelity in most of the present day indexing languages is due mainly to insufficient representation of the relationships among concepts. We describe a new graphical method of storing and retrieving concept relations of various kinds. The points of such a graph are occupied by concepts, and the connecting lines between these points represent concept relations. In a special field of chemistry, these graphs also serve as a kind of presentation of the essentials of a document to the reader that is much more lucid than a natural language text.
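The graphical method stores concepts as points and relations as connecting lines. A minimal Python sketch, assuming each document's graph is kept as a set of labelled edges (concept, relation, concept) and that a query matches a document when all of its edges are present; the concepts, relation labels and matching rule are illustrative assumptions, not the authors' notation:

```python
# A labelled edge: (concept, relation, concept).
Edge = tuple[str, str, str]

# Hypothetical index; concepts and relation labels are illustrative only.
INDEX: dict[str, set[Edge]] = {
    "doc1": {("compound X", "reacts with", "reagent Y"),
             ("reagent Y", "catalysed by", "metal Z")},
    "doc2": {("compound X", "produced from", "reagent Y")},
}


def retrieve_by_graph(query: set[Edge], index: dict[str, set[Edge]]) -> list[str]:
    """Documents whose concept graph contains every edge of the query graph."""
    return sorted(doc for doc, edges in index.items() if query <= edges)


def retrieve_by_terms(terms: set[str], index: dict[str, set[Edge]]) -> list[str]:
    """Baseline: documents containing all query concepts, ignoring relations."""
    return sorted(doc for doc, edges in index.items()
                  if terms <= {c for a, _, b in edges for c in (a, b)})


if __name__ == "__main__":
    print(retrieve_by_terms({"compound X", "reagent Y"}, INDEX))
    print(retrieve_by_graph({("compound X", "reacts with", "reagent Y")}, INDEX))
```

Both documents contain the same two concepts, so the term-only match returns both, while the relation-aware query returns only the document in which the concepts stand in the requested relation.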

17 citations


Journal Article
TL;DR: A rationale for constructing the metalanguage is provided in the context of a long‐range program of research, and illustrated by reference to an existing Computer‐Assisted Language Analysis System (CALAS) for use with English language texts.
Abstract: Systematic research on human communication will be enhanced by access to a metalanguage, which analyzes natural language texts rapidly and accurately into their structural counterparts. A rationale for constructing the metalanguage is provided in the context of a long-range program of research, and illustrated by reference to an existing Computer-Assisted Language Analysis System (CALAS) for use with English language texts. Such a metalanguage has immediate practical applications. Its underlying rationale also may be extended to encompass the study of policy-oriented communications among persons or groups within or across human cultures.

Journal Article
TL;DR: Using sociometric techniques and ratings of professional status as a researcher, a highly elite invisible college was identified which crossed international boundaries and was considered by the professional community in high energy physics to be doing the most important work in the area.
Abstract: This paper presents the first empirical sociometric evidence of an international invisible college for information exchange. Using sociometric techniques and ratings of professional status as a researcher, a highly elite invisible college was identified which crossed international boundaries. Members of this elite group were found to be in relatively frequent contact with one another and were considered by the professional community in high energy physics, the context of this study, to be doing the most important work in the area. An important part of the information infrastructure of this invisible college consists of intermediaries who perform a gatekeeping and linkage function. Selected differences in communication behavior between the key people of the invisible college and the intermediaries are noted briefly. Selected communication differences are also noted between members of the invisible college and a matched sample wholly outside this special information network. The paper concludes with suggested questions for future studies.

Journal Article
TL;DR: Benefits of Swets's theory of information retrieval are the beginnings of a quantitative description of retrieval languages, a clear distinction between retrieval “systems” and “language”, a recognition that retrieval performance can be tailored to suit individual needs in a systematic way, and confirmation that question generality is a pivotal feature of the retrieval process.
Abstract: Swets's theory of information retrieval allows the threads of document weighting formulae, probabilistic measures of effectiveness, and management theory to be woven into a coherent pattern. Benefits of the theory are the beginnings of a quantitative description of retrieval languages, a clear distinction between retrieval “systems” and “language”, a recognition that retrieval performance can be tailored to suit individual needs in a systematic way, and confirmation that question generality is a pivotal feature of the retrieval process. The hypotheses involved are still in need of rigorous experimental testing.
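In the signal-detection form of Swets's model, scores of relevant and non-relevant documents follow two distributions, and a cutoff fixes recall (hit rate) R and fallout (false-alarm rate) F. Precision then depends on generality G, the proportion of relevant documents: P = RG / (RG + F(1 - G)). The Python sketch below assumes equal-variance normal score distributions with hypothetical means; it is one common reading of the theory, not the paper's own formulation.

```python
import math


def normal_sf(x: float, mean: float, sd: float = 1.0) -> float:
    """P(score > x) for a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def swets_performance(cutoff: float, mean_rel: float, mean_nonrel: float, generality: float):
    """Recall, fallout and precision for a retrieval cutoff under an
    equal-variance normal model of relevant and non-relevant scores."""
    recall = normal_sf(cutoff, mean_rel)
    fallout = normal_sf(cutoff, mean_nonrel)
    retrieved_relevant = recall * generality
    retrieved_nonrelevant = fallout * (1 - generality)
    total = retrieved_relevant + retrieved_nonrelevant
    precision = retrieved_relevant / total if total > 0 else 0.0
    return recall, fallout, precision


if __name__ == "__main__":
    # Same cutoff and score distributions, different question generality.
    for g in (0.5, 0.05, 0.005):
        r, f, p = swets_performance(cutoff=0.5, mean_rel=1.0, mean_nonrel=0.0, generality=g)
        print(f"G={g}: recall={r:.3f} fallout={f:.3f} precision={p:.3f}")
```

Recall and fallout stay fixed across the loop, while precision falls sharply as G shrinks, which is one way to see why generality is pivotal.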

Journal Article
TL;DR: The problem addressed is that of optimally choosing the source from which to order a book by determining the tradeoff between discount and service time most consistent with the librarian's preferences.
Abstract: The problem addressed is that of optimally choosing the source from which to order a book. Each acquisition source is characterized by frequency distributions for discount and service time. A technique called decision analysis is used to determine the tradeoff between discount and service time most consistent with the librarian's preferences.
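Decision analysis of this kind can be sketched as an expected-utility comparison over each source's distribution of discount and service time. In the Python sketch below, the vendors, their outcome distributions, the linear single-attribute utilities and the weight on discount are all hypothetical; in the paper these would come from recorded frequency distributions and the librarian's assessed preferences.

```python
# Hypothetical sources: each outcome is (probability, discount %, service time in days).
SOURCES = {
    "Vendor A": [(0.5, 20.0, 30), (0.5, 20.0, 90)],  # larger discount, slow and erratic
    "Vendor B": [(0.8, 10.0, 14), (0.2, 10.0, 28)],  # smaller discount, fast
}


def utility(discount, days, w_discount=0.5, max_discount=25.0, max_days=120.0):
    """Additive utility on [0, 1], linear in each attribute (an assumption)."""
    u_discount = discount / max_discount
    u_time = 1.0 - days / max_days
    return w_discount * u_discount + (1.0 - w_discount) * u_time


def best_source(sources, w_discount=0.5):
    """Return the source with the highest expected utility, plus all expected utilities."""
    expected = {
        name: sum(p * utility(d, t, w_discount) for p, d, t in outcomes)
        for name, outcomes in sources.items()
    }
    return max(expected, key=expected.get), expected


if __name__ == "__main__":
    for w in (0.5, 0.3):
        choice, scores = best_source(SOURCES, w)
        print(w, choice, {k: round(v, 3) for k, v in scores.items()})
```

With equal weights the larger discount of Vendor A wins narrowly (expected utility 0.65 against 0.63); shifting weight toward service time (w_discount = 0.3) makes Vendor B preferable.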

Journal Article
TL;DR: The particular DIS experiment under discussion investigates the effects of explicit DIS training on an individual's world-view as well as the structure of synthesizing contexts and the ability to “think” dialectically.
Abstract: This paper describes an experiment concerned with the investigation of presentation formats for Dialectic Information Systems (DIS). DIS are information systems which generate information for a decision maker by means of intense conflict between proponents of two radically opposing positions, theories, points of view, etc. In a DIS, information is generated through the confrontation between “data” and the world-views (Weltanschauungen) of opposing experts. The particular DIS experiment under discussion investigates the effects of explicit DIS training on an individual's world-view. The perception of drama and emotion, the structure of synthesizing contexts, and the ability to “think” dialectically are also discussed as important features of the particular experiment.

Journal Article
TL;DR: The problem of providing linguistic assistance to scientists in a large institution is reviewed on the basis of the author's personal experience, and specific solutions are recommended for the problems caused by the foreign‐language literature and by language‐induced difficulties in communication.
Abstract: The problem of providing linguistic assistance to scientists in a large institution is reviewed on the basis of the author's personal experience. The various aspects of the problems caused by the foreign-language literature and language-induced difficulties in communication are listed, and specific solutions are recommended.

Journal Article
TL;DR: Obsolescence rates of traditional geoscience fields seem to vary little between 1949 and 1969 in contrast to those of fast‐changing fields such as solid earth geophysics, which suggests the operation of factors that control research fronts.
Abstract: The United States (U.S.) geoscience literature is employed as a vehicle to study the phenomenon of obsolescence. Problems investigated include the classical and ephemeral aspects of subject literatures, diversity among narrowly defined literatures within broadly defined subject literatures, and the effect of literature growth on obsolescence. Comparisons are made: 1. among time-frequency bibliographs based on citation counts from each of twelve major journals published in 1969; 2. between bibliographs of three major journals for the years 1969 and 1949; and 3. between uncorrected and corrected obsolescence curves. Each journal yields citation patterns composed of both an ephemeral and a classical literature component. Within this framework apparent obsolescence varies across a broad spectrum, from physics/chemistry-oriented geoscience subdisciplines with relatively short “half-lives,” to biology-oriented subdisciplines with relatively long “half-lives.” Obsolescence rates of traditional geoscience fields seem to vary little between 1949 and 1969, in contrast to those of fast-changing fields such as solid earth geophysics. The relationship between obsolescence curves uncorrected and corrected for growth suggests the operation of factors that control research fronts. The effect of literature size on obsolescence, though minor for the recent literature, is more pronounced for the classical literature.
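Obsolescence is often summarized by a half-life, the median age of the cited items. The Python sketch below computes it from citation counts by age and applies one simple growth correction (citations per item published in each year); the counts and literature sizes are invented, and the paper's own correction may differ.

```python
# Hypothetical data, age 0 = most recent year.
COUNTS_BY_AGE = [10, 20, 30, 20, 10, 10]         # citations to items of each age
LITERATURE_SIZE = [400, 350, 300, 200, 100, 50]  # items published in each of those years


def half_life(counts_by_age):
    """Smallest age (in years) at which cumulative citations reach half the total."""
    total = sum(counts_by_age)
    if total <= 0:
        raise ValueError("no citations")
    running = 0.0
    for age, count in enumerate(counts_by_age):
        running += count
        if running >= total / 2:
            return age
    return len(counts_by_age) - 1  # not reached for positive totals


def growth_corrected(counts_by_age, literature_size_by_age):
    """Citations per item published, so a larger recent literature does not
    by itself make recent items look more heavily used."""
    return [c / n for c, n in zip(counts_by_age, literature_size_by_age)]


if __name__ == "__main__":
    print("uncorrected half-life:", half_life(COUNTS_BY_AGE))
    print("corrected half-life:", half_life(growth_corrected(COUNTS_BY_AGE, LITERATURE_SIZE)))
```

With these figures the uncorrected half-life is 2 years and the growth-corrected one is 4 years, illustrating how a growing literature can shorten the apparent half-life.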

Journal Article
TL;DR: A study of citations of scientific periodicals in technical reports prepared by the staff of an industrial research laboratory indicates that there exists a Bradford set of periodicals for a multidisciplinary library, and that the size and contents of the set have at least short‐term stability.
Abstract: A study of citations of scientific periodicals in technical reports prepared by the staff of an industrial research laboratory indicates that there exists a Bradford set of periodicals for a multidisciplinary library, and that the size and contents of the set have at least short-term stability. The effect on library coverage of varying the extent to which the Bradford set is held was determined. A discard policy based upon a single aging factor does not appear to be very well suited to a multidisciplinary library.
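The effect of holding more or less of the Bradford set can be expressed as a coverage curve: rank periodicals by citations and accumulate the share of all citations satisfied as each title is added. A Python sketch with invented citation counts (`CITATIONS` is hypothetical):

```python
# Hypothetical citation counts per periodical.
CITATIONS = {"J1": 120, "J2": 60, "J3": 40, "J4": 30, "J5": 20, "J6": 15, "J7": 10, "J8": 5}


def coverage_curve(citations_per_journal):
    """Rank journals by citations received and return cumulative coverage
    (fraction of all citations satisfied) as each journal is added."""
    ranked = sorted(citations_per_journal.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(citations_per_journal.values())
    curve, running = [], 0
    for title, count in ranked:
        running += count
        curve.append((title, running / total))
    return curve


def core_size(citations_per_journal, target):
    """Number of top-ranked journals needed to reach the target coverage."""
    for k, (_, cov) in enumerate(coverage_curve(citations_per_journal), start=1):
        if cov >= target:
            return k
    return len(citations_per_journal)


if __name__ == "__main__":
    for title, cov in coverage_curve(CITATIONS):
        print(f"{title}: {cov:.0%}")
```

Here two titles already satisfy 60% of the citations, while reaching 85% needs five, the diminishing return that motivates holding a core set.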

Journal Article
TL;DR: Question analysis and search strategy development, two major components in the process of answering reference questions, were characterized as nine decision making steps, concluding that it would be very difficult, if not impossible, to have these reference process steps performed by machine.
Abstract: Question analysis and search strategy development, two major components in the process of answering reference questions, were characterized as nine decision making steps. Twenty-eight reference questions were analyzed in terms of these nine steps. The answers to each step were examined to determine whether rules for performing the individual steps by machine could be developed. Reasons are given for concluding that it would be very difficult, if not impossible, to have these reference process steps performed by machine.

Journal Article
TL;DR: A mapping of a portion of Medical Subject Headings to three other controlled vocabularies has been constructed and a preliminary test of its effectiveness based on searches in the subject areas of applications of physics and engineering in medicine shows limitations.
Abstract: Two trends, the growth of interdisciplinary research and the proliferation of machine‐readable data bases, require that new techniques and tools be applied to facilitate use of scientific and medical literature in support of research. In order to permit systematic searching of multiple data bases to satisfy requests in interdisciplinary areas, a mapping of a portion of Medical Subject Headings (MeSH) to three other controlled vocabularies has been constructed. A description of the development of the mapping is followed by details of a preliminary test of its effectiveness based on searches in the subject areas of applications of physics and engineering in medicine. The test shows: (1) Index Medicus provides an average of 81% of the citations retrieved in searches using more than one source; (2) in some subject areas use of the mapping does allow an increase in the number of items retrieved, but with a loss in precision. Limitations of the mapping revealed by the test suggest possible alternative applications of the mapping in controlled vocabulary development and in on‐line searching.
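A vocabulary mapping of this kind can be pictured as a table from each MeSH heading to equivalent terms in the other vocabularies, used to translate a search before it is run against another data base. In the Python sketch below, the target vocabularies `VOCAB_A` and `VOCAB_B` and all target terms are hypothetical placeholders, since the three vocabularies are not named here; the MeSH headings are only illustrative.

```python
# Hypothetical mapping: MeSH heading -> {target vocabulary: [equivalent terms]}.
MAPPING = {
    "Biomedical Engineering": {"VOCAB_A": ["bioengineering"], "VOCAB_B": ["medical engineering"]},
    "Radiation Dosage": {"VOCAB_A": ["dosimetry"]},
}


def translate_query(mesh_terms, vocabulary):
    """Translate MeSH query terms into one target vocabulary.
    Returns (translated terms, MeSH terms with no mapping in that vocabulary)."""
    translated, unmapped = [], []
    for term in mesh_terms:
        targets = MAPPING.get(term, {}).get(vocabulary)
        if targets:
            translated.extend(targets)
        else:
            unmapped.append(term)
    return translated, unmapped


if __name__ == "__main__":
    query = ["Biomedical Engineering", "Radiation Dosage"]
    for vocab in ("VOCAB_A", "VOCAB_B"):
        print(vocab, translate_query(query, vocab))
```

The list of unmapped headings is where gaps in a partial mapping show up in practice.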

Journal Article
Ivo Steinacker
TL;DR: An algorithm is proposed for sequential indexing that does not use any grammatical or semantic analysis, but follows the principle of emulating human judgement by evaluating machine‐recognizable attributes of structured word assemblies (text).
Abstract: Intellectual indexing proceeds on three levels: The selection of phrases occurring in the document text (sequential indexing), the posting of specific phrases from the text to generic descriptors (generic indexing), and the choice of descriptors which are implicit to the document text (symbolic indexing). Automation has been attempted on all three levels: by concordance and autoposting. Here an algorithm is proposed to solve the problem of sequential indexing; it does not use any grammatical or semantic analysis, but follows the principle of emulating human judgement by evaluation of machine-recognizable attributes of structured word assemblies (text). The algorithm is based on producing “text cuts” of a few words in length and ordering them alphabetically. Afterwards, every “text cut” that appears at or above a certain limit frequency is considered significant (by human standards). The algorithm has been applied to a text body of about 220,000 words from the NASA bibliographic file, and an “established” dictionary of significant terms has been created by this algorithm. Because any phrase not occurring in the established dictionary is not suppressed but posted to a floating dictionary, from which it may be transferred to the established dictionary if its usage rises above the limit frequency, the algorithm presents a tool for the creation and maintenance of a “self-adaptive” data base of text information.
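The text-cut procedure can be sketched in a few lines of Python. The cut length, the limit frequency, whitespace tokenization and lowercasing are assumptions for illustration; the abstract specifies only cuts of "a few words" and "a certain limit frequency".

```python
from collections import Counter


def text_cuts(text: str, length: int = 2) -> list[str]:
    """All runs of `length` consecutive words, ordered alphabetically."""
    words = text.lower().split()
    return sorted(" ".join(words[i:i + length]) for i in range(len(words) - length + 1))


class SelfAdaptiveDictionary:
    """Established and floating dictionaries; a cut is promoted to the
    established dictionary once its frequency reaches the limit."""

    def __init__(self, limit: int = 3, length: int = 2):
        self.limit = limit
        self.length = length
        self.counts = Counter()
        self.established = set()

    def add_text(self, text: str) -> None:
        self.counts.update(text_cuts(text, self.length))
        for cut, n in self.counts.items():
            if n >= self.limit:
                self.established.add(cut)  # never demoted once established

    @property
    def floating(self) -> set:
        return set(self.counts) - self.established


if __name__ == "__main__":
    d = SelfAdaptiveDictionary(limit=2)
    for doc in ["solar cell efficiency", "solar cell array", "cell efficiency data"]:
        d.add_text(doc)
    print("established:", sorted(d.established))
    print("floating:", sorted(d.floating))
```

Once a cut reaches the limit it stays in the established dictionary; every other cut remains in the floating dictionary until its count rises.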


Journal Article
TL;DR: The problem of whether to bind, microcopy or discard back issues of journals in a university library branch system is considered and an algorithm is developed to solve the journal disposition problem, employing an analytical model.
Abstract: The problem of whether to bind, microcopy or discard back issues of journals in a university library branch system is considered, and an algorithm is developed to solve the journal disposition problem. A cost-effectiveness approach is pursued, employing an analytical model. This model uses a set of weighted factors to quantify the value of a specific journal to the library. Such factors as relevance, usage, availability elsewhere and capital investment are specified. Budget constraints, which involve the relevant costs of binding and microcopying, are considered, as are upper and lower threshold values on the worth of a journal to guarantee the retention of exceptionally good journals and the disposal of very poor ones. An example based on data from a real university special library situation is presented as an illustration of the model. Thus, this paper extends the work of others in modeling of the library collection development decision process.
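A weighted-factor model with thresholds and a budget can be sketched as follows. The factor weights, the threshold values, the costs and the greedy rule for journals between the thresholds (microcopy in order of worth per unit cost while money remains) are all hypothetical simplifications, not the paper's analytical model.

```python
from dataclasses import dataclass

# Hypothetical factor weights; the factor names follow the abstract.
WEIGHTS = {"relevance": 0.4, "usage": 0.3, "unavailable_elsewhere": 0.2, "investment": 0.1}


@dataclass
class Journal:
    title: str
    factors: dict  # factor name -> score on [0, 1]
    bind_cost: float
    microcopy_cost: float


def worth(j: Journal) -> float:
    """Weighted sum of factor scores; missing factors count as 0."""
    return sum(w * j.factors.get(name, 0.0) for name, w in WEIGHTS.items())


def dispose(journals, budget, upper=0.8, lower=0.2):
    """Return {title: 'bind' | 'microcopy' | 'discard'} under a budget."""
    decisions, middle = {}, []
    for j in journals:
        w = worth(j)
        if w >= upper:
            decisions[j.title] = "bind"  # exceptionally good: always retained
            budget -= j.bind_cost        # may exhaust the budget for the rest
        elif w <= lower:
            decisions[j.title] = "discard"  # very poor: always disposed of
        else:
            middle.append(j)
    # Greedy assumption: best worth per unit microcopy cost first.
    for j in sorted(middle, key=lambda j: worth(j) / j.microcopy_cost, reverse=True):
        if j.microcopy_cost <= budget:
            decisions[j.title] = "microcopy"
            budget -= j.microcopy_cost
        else:
            decisions[j.title] = "discard"
    return decisions
```

Journals above the upper threshold are bound even if that exhausts the budget, mirroring the guarantee described in the abstract.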

Journal Article
TL;DR: A mathematical model is presented that can be used to predict a library's circulation and requires little data other than what libraries normally collect.
Abstract: A mathematical model is presented that can be used to predict a library's circulation. Variables considered include changing user population, decreasing demand for older items in the collection, purchasing and circulation policies, weeding and other losses, and inflation. The model requires little data other than what libraries normally collect. An example of applying the model to a particular library is discussed. Possible extensions of the model are noted.
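A year-by-year projection using the variables listed above can be sketched as below. The functional forms (demand proportional to users, per-item demand decaying geometrically with age, a fixed proportional loss rate, a purchasing budget eroded by price inflation) and every parameter value are assumptions, not the paper's model.

```python
def project_circulation(years, users, user_growth, initial_holdings, budget,
                        price, inflation, loss_rate, uses_per_user_per_item, decay):
    """Return projected annual circulation for each of `years` future years.

    initial_holdings maps item age (years) -> number of items of that age.
    Demand per item is proportional to users and falls by factor `decay` per year of age.
    """
    holdings = dict(initial_holdings)
    result = []
    for _ in range(years):
        # Circulation this year: users x per-item demand, summed over age cohorts.
        result.append(sum(users * uses_per_user_per_item * decay ** age * n
                          for age, n in holdings.items()))
        # Age the collection, apply weeding and other losses, then add new purchases.
        holdings = {age + 1: n * (1 - loss_rate) for age, n in holdings.items()}
        holdings[0] = budget / price
        price *= 1 + inflation   # a fixed budget buys fewer items each year
        users *= 1 + user_growth
    return result


if __name__ == "__main__":
    forecast = project_circulation(
        years=5, users=2000, user_growth=0.03,
        initial_holdings={age: 1000 for age in range(10)},
        budget=50_000, price=12.0, inflation=0.08, loss_rate=0.02,
        uses_per_user_per_item=0.0005, decay=0.8,
    )
    print([round(c) for c in forecast])
```

Each mechanism can be switched off by setting its rate to zero, which makes it easy to see how much each one contributes to the forecast.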

Journal Article
TL;DR: On‐line interactive searching of several information bases through several service operators was introduced in an industrial research environment and is now an established search tool at Exxon Research and Engineering Company.
Abstract: On-line interactive searching of several information bases through several service operators was introduced in an industrial research environment. Thorough knowledge of the information base and its structure in the search system is a major factor for successful searching, and differences among search systems do not present serious barriers. This new technique was most effectively used when the information specialist and the scientist searched as a team. On-line searching is now an established search tool at Exxon Research and Engineering Company.

Journal Article
TL;DR: The terms from 228 individual SDI profiles of 104 research chemists were matched against the citations retrieved by these profiles in eight consecutive runs of Basic Journal Abstracts to determine the number of citations that would have been retrieved for each individual user if titles only had been searched.
Abstract: The terms from 228 individual SDI profiles of 104 research chemists were matched against the citations retrieved by these profiles in eight consecutive runs of Basic Journal Abstracts to determine the number of citations that would have been retrieved for each individual user if titles only had been searched. When the data are computed as a ratio of title hits to total hits, the mean percentage of citations retrieved by searching titles only, compared with searching titles and abstract text, was 27.0%. This represents an average loss of 73% of the potential output to the individual SDI user. One can expect a higher level of retrieval by titles alone if only a single term is required for a match than in searches requiring more than one term to be present to constitute a match (39% compared with 16.8% in this study).
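The reported figures are a mean of per-user ratios (title hits divided by total hits), and the 73% loss is simply 100% minus the 27.0% mean. A Python sketch with invented profile counts, also showing that the pooled ratio can differ from the mean of ratios:

```python
from statistics import mean

# Hypothetical (title_hits, total_hits) pairs, one per SDI profile.
PROFILES = [(3, 10), (1, 10), (10, 20)]


def title_retrieval_share(profiles):
    """Return (mean of per-profile title/total ratios, pooled ratio)."""
    shares = [t / total for t, total in profiles if total > 0]
    pooled = sum(t for t, _ in profiles) / sum(total for _, total in profiles)
    return mean(shares), pooled


if __name__ == "__main__":
    mean_share, pooled_share = title_retrieval_share(PROFILES)
    print(f"mean share {mean_share:.1%}, average loss {1 - mean_share:.1%}, "
          f"pooled share {pooled_share:.1%}")
```

With these counts the mean share is 30% but the pooled share is 35%, because profiles with more hits weigh more heavily in the pooled figure.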

Journal Article
TL;DR: An evaluation of a current awareness service in Physics and Astronomy at The University of Texas at Austin is presented and it was found that 67% rated the current awareness printouts favorably and 20% unfavorably.
Abstract: An evaluation of a current awareness service in Physics and Astronomy at The University of Texas at Austin is presented. The service is provided currently to over 130 physicists and astronomers; about half of them participated in the evaluation. The computer-produced printouts were derived from the monthly SPIN tapes of the American Institute of Physics. It was found that 67% rated the current awareness printouts favorably and 20% unfavorably. Participants tended to use SPIN to supplement their usual literature needs. The primary complaints about the effectiveness of the service were the broadness of the classification scheme, insufficient journal coverage, and slowness in receipt of abstracts on the tapes. A retrospective search facility is available but not heavily used at the present time, due to the short time span covered by the tapes.

Journal Article
TL;DR: It is suggested that if scholarliness is to be measured by the presence or absence of references, then it should be measured by reference to the work of others in the field.
Abstract: I would like to comment on the article, “Citation of the Literature by Information Scientists in Their Own Publication,” by Donald A. Windsor and Diane M. Windsor, Journal of the American Society for Information Science 24 (No. 5): 377 (1973). My comments pertain to the authors’ premise: “We consider the distinction between scholarly and nonscholarly literature to hinge on the presence or absence of references.” In their analysis of the references contained in papers, there is no distinction made between “self-citation” and “other-author citation.” I believe that this distinction has a direct bearing on their premise. Consider the following sequence of events. Author A submits a paper containing no references, which is reviewed, accepted, and published. Since it contains no references, it is “nonscholarly.” Author A continues research along the same lines, writes a paper containing a reference to his first paper, and it is published. The second paper, according to the Windsors’ criterion, is a “scholarly” paper. Has Author A indeed published a more scholarly paper the second time? Perhaps, but the scholarliness has nothing to do with the appearance of a reference in the paper. If third, fourth, and fifth papers are published by Author A as he continues to work in the same area of inquiry, does the scholarliness of each succeeding paper increase (since he can offer two, three and four references)? I suggest that if scholarliness is to be measured by the presence or absence of references, then it should be measured by reference to the work of others in the field.
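The letter's distinction between self-citation and other-author citation can be made operational by checking whether a reference shares an author with the citing paper. A minimal Python sketch; the shared-author rule is an assumption and ignores name disambiguation:

```python
def split_references(citing_authors, references):
    """Count (self-citations, other-author citations). A reference counts as a
    self-citation if it shares at least one author with the citing paper."""
    self_cites = sum(1 for ref_authors in references if citing_authors & ref_authors)
    return self_cites, len(references) - self_cites


if __name__ == "__main__":
    # Author A's second paper citing only his first paper, as in the letter's example.
    print(split_references({"A"}, [{"A"}]))
```

Under the letter's proposal only the second count, references to the work of others, would bear on scholarliness; Author A's second paper scores zero on it.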

Journal Article
TL;DR: As one who is not a librarian, but deeply interested in the application of computers to information services, I found the author's viewpoint disturbing, and the article left me with the overall impression of a study in apologetics for reference librarians.
Abstract: As one who is not a librarian, but deeply interested in the application of computers to information services, I found the author's viewpoint disturbing. The article left me with the overall impression of a study in apologetics for reference librarians. This impression is all the more unfortunate since the author is involved in teaching students of library science. There is no need for the reference librarian to be on the defensive. There is a need, however, for researchers to ask just where in the process of reference analysis new tools and capabilities (including computerized techniques) can most efficaciously be introduced. It is time that the spectre of the computer replacing librarians be laid to rest as it has been elsewhere. The computer represents a source of powerful tools which librarians, as others, must learn to employ, not excuse. October 1972 The area of general question answering is that of machine translation of natural language. Here, as elsewhere, the only truly successful role of the computer has been a synergistic one: that is, assisting the human intellect, not replacing it. The efforts of Meredith (ref. S), et al., are along these lines. It is hoped that a review of their more recent findings will be reported soon.