Showing papers in "Journal of the Association for Information Science and Technology in 1977"


Journal ArticleDOI
TL;DR: Investigators studying the applicability of “Lotka's law” to the humanities and to map librarianship may have misinterpreted Lotka's Law and concluded erroneously that the law applies to these fields.
Abstract: In 1926, Alfred Lotka examined the frequency distribution of scientific productivity of chemists and physicists. After analyzing the number of publications of chemists listed in Chemical Abstracts 1907–1916 and the contributions of physicists listed in Auerbach's Geschichtstafeln der Physik, he observed that the number of persons making n contributions is about 1/n² of those making one and the proportion of all contributors that make a single contribution is about 60%. Recently, investigators studying the applicability of “Lotka's law” to the humanities and to map librarianship may have misinterpreted Lotka's law and have concluded erroneously that the law applies to these fields. Corrected calculations indicate that Lotka's law does not apply.

200 citations
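
The inverse-square relation stated in the abstract above can be checked numerically. The following is a minimal sketch, not the calculation used in the paper: the function names and the truncation length `terms` are assumptions made for illustration. If the number of authors with n contributions is proportional to 1/n², the share of authors with exactly one contribution is 1 divided by the sum of 1/k² over all k, which tends to 6/π² (about 0.608), consistent with the "about 60%" figure.

```python
import math


def expected_authors(authors_with_one, n, exponent=2.0):
    """Expected number of authors making n contributions under a
    Lotka-type law, given the number making exactly one."""
    return authors_with_one / n ** exponent


def single_contribution_share(exponent=2.0, terms=100_000):
    """Approximate share of all authors who make exactly one contribution.

    The infinite sum over 1/k**exponent is truncated at `terms`
    (an illustrative choice, not a value from the source).
    """
    total = sum(1.0 / k ** exponent for k in range(1, terms + 1))
    return 1.0 / total


if __name__ == "__main__":
    # With 100 single-paper authors, the law predicts 25 two-paper authors.
    print(expected_authors(100, 2))
    # Compare the truncated sum with the closed form 6 / pi**2.
    print(single_contribution_share(), 6 / math.pi ** 2)
```

Passing a different `exponent` gives the generalized form of the law with an arbitrary exponent; the 60% figure holds only for an exponent of 2.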


Journal ArticleDOI
TL;DR: This paper shows how aboutness is related to probability of satisfaction and shows that about is, in fact, not the central concept in a theory of document retrieval.
Abstract: The primary objective of this paper is to examine the concept of about as it is used in its information retrieval sense when, for example, an indexer judges that a document is (or is not) about some given subject. The problem with about is that it is a very complex notion and we are unable to say precisely what it is we do when we make judgment of aboutness. Since about is at the heart of indexing, how are we to formulate any proper theory of indexing if we cannot explicate precisely the key concept of about? In this paper we look at this concept of about and offer a solution to the problem mentioned; it consists of an operational definition of about which interprets about in terms of search behavior. A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to provide a ranked output (in response to a search query) not according to the degree that they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person's information need. This paper shows how aboutness is related to probability of satisfaction.

170 citations


Journal ArticleDOI
TL;DR: The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query.
Abstract: This study examined the effectiveness and efficiency of employing a fully automatic algorithm for ranking the results of Boolean searches of an inverted file design document retrieval system. The study indicated that with minor modification of file designs, such as those implemented in the Syracuse Information Retrieval Experiment (SIRE), document retrieval systems could efficiently provide users with output lists on which the rank order of a document is a good indicator of its probable relevance to the user's information need. The study found that relevant documents were ranked significantly higher than nonrelevant documents in the set of documents retrieved in response to a Boolean query. By utilizing an augmented inverted file design the variable incremental cost for ranked output was only ten cents per query. There was no increased user effort.

92 citations


Journal ArticleDOI
TL;DR: A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.
Abstract: A method for clustering large files of documents using a clustering algorithm which takes O(n²) operations (single-link) is proposed. This method is tested on a file of 11,613 documents derived from an operational system. One property of the generated cluster hierarchy (hierarchy connection percentage) is examined and it indicates that the hierarchy is similar to those from other test collections. A comparison of clustering times with other methods shows that large files can be clustered by single-link in a time at least comparable to various heuristic algorithms which theoretically require fewer operations.

72 citations
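
For readers unfamiliar with single-link clustering, the sketch below shows one standard way to obtain it in O(n²) time: build a minimum spanning tree with Prim's algorithm over a full distance matrix, then keep only the tree edges at or below a chosen threshold. This is an illustration of the general technique, not the method proposed in the paper; the function names, the in-memory distance matrix, and the threshold parameter are all assumptions.

```python
def single_link_mst(dist):
    """Minimum spanning tree via Prim's algorithm on a full distance
    matrix. Runs in O(n^2) time; the returned edges (distance, a, b)
    are the merge links of the single-link hierarchy."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree with the cheapest link into it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def clusters_at(dist, threshold):
    """Single-link clusters obtained by keeping MST edges <= threshold."""
    n = len(dist)
    label = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while label[x] != x:
            label[x] = label[label[x]]
            x = label[x]
        return x

    for d, a, b in single_link_mst(dist):
        if d <= threshold:
            label[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    points = [0, 1, 5, 6, 20]  # toy one-dimensional "documents"
    dist = [[abs(p - q) for q in points] for p in points]
    print(clusters_at(dist, 2))
```

A file of the size reported in the abstract would not normally be held as a dense matrix; the matrix form is used here only to keep the example short.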


Journal ArticleDOI
TL;DR: It appears that people may be less successful than the authors have thought in using subject catalogs, as overall matching success was strikingly low.
Abstract: The study examined the effects of two variables on success in searching an academic library subject catalog that uses Library of Congress subject headings. The variables were “subject familiarity,” and “catalog familiarity,” representing patron knowledge of a subject field and of the principles of the subject heading system, respectively. Testing was done in a laboratory setting which reproduced a real search situation. The n varied with the particular test, but about 20 university students in each of the following majors participated: psychology, economics, librarianship. Success was measured as degree of match between search term and term used by the library for desired books on the subject. Catalog familiarity was found to have a very significant beneficial effect on search matching success, and subject familiarity a slight, but not significant, detrimental effect. An interview substudy of subject experts suggested causes for the failure of subject expertise to help in catalog search term formulation. Surprising results were that overall matching success was strikingly low. Since the methodology used enabled a more precise determination of match success than has been typical of catalog use studies, it appears that people may be less successful than we have thought in using subject catalogs.

71 citations


Journal ArticleDOI
Donald T. Hawkins
TL;DR: Much more than subject information is available in most of the data bases currently available, such as author names, corporate affiliations, journal titles, and CODEN, which are useful for bibliometric‐type studies, that is, quantitative analysis of the bibliographic features of a body of literature.
Abstract: On-line interactive literature searching systems have “come of age” and have revolutionized information retrieval techniques. They are now widely used for subject-oriented searching. Much more than subject information is available in most of the data bases currently available, such as author names, corporate affiliations, journal titles, and CODEN. These are useful for bibliometric-type studies, that is, quantitative analysis of the bibliographic features of a body of literature. Several examples are given, including journal comparison studies, corporate affiliation studies, and statistical studies. Inconsistencies and errors in data bases become important, and the searcher must be alert to their existence. Indexing policies of the different data bases must also be taken into consideration.

66 citations


Journal ArticleDOI
TL;DR: An index of concentration for rank-frequency distributions is proposed which permits comparison of subject and journal concentration in various fields and holds some promise of providing a common measure by which to compare the large number of specific usage and citation studies already completed, and providing a point of departure for new ones.
Abstract: An index of concentration for rank-frequency distributions is proposed which permits comparison of subject and journal concentration in various fields. A mathematical model of random dispersion (the Whitworth distribution) of articles is suggested. Applications of the measure to several different aspects of bibliometrics are suggested. The measure holds some promise of providing a common measure by which to compare the large number of specific usage and citation studies already completed, and providing a point of departure for new ones.

60 citations


Posted ContentDOI
TL;DR: This paper examines the application of symmetry principles to bibliometric laws, using Lotka's law for illustration, and it is shown that the function occurring in Lotka’s law is essentially the only one with desirable properties that is consistent with the symmetry constraint.
Abstract: This paper examines the application of symmetry principles to bibliometric laws, using Lotka's law for illustration. A general model describing scientific productivity is defined and modified to be consistent with a generalized form of Lotka's law; the model in this form is shown to be stable with regard to at least two forms of social change. Then the consequences of invariance under change of time span are investigated, and it is shown that the function occurring in Lotka's law, with an arbitrary exponent, is essentially the only one with desirable properties that is consistent with the symmetry constraint.

52 citations


Journal ArticleDOI
TL;DR: A reinterpretation of Shannon's mathematical theory of communication based on the definition of new symbol sets, comprising approximately equally-frequent strings of characters, is presented and is shown to have wide applicability in computer-processing of texts.
Abstract: The conventional interpretation of Shannon's mathematical theory of communication in relation to textual material is unduly restrictive and unhelpful. A reinterpretation which is based on the definition of new symbol sets, comprising approximately equally-frequent strings of characters, is presented. It is shown to have wide applicability in computer-processing of texts. Moreover, it provides a more general formalism for considering methods of representing, storing and retrieving the subject content of documents.

46 citations


Journal ArticleDOI
TL;DR: The pattern of the empirical results conforms to the prediction that approximately 30% of the relevance-decision variance is attributable to variables tapping openness to information, and it is argued that this finding is due to a decrease in power resulting from the decrease in sample size rather than from an inadequate or erroneous model.
Abstract: The possibility that the relevance decision may be affected by individual differences in openness to information is examined. Openness to information is operationally defined by a series of cognitive style variables (openmindedness, rigidity, category width, locus of control, anxiety, and defensiveness). A multiple-regression technique was utilized to simultaneously test the effect of cognitive variables and previously tested variables tapping judges' interest and expertise in the problem area. Subjects made relevance decisions on a randomly generated list of citations for a question provided by the experimenter. Divergent behavior on the dependent variable (number of citations deemed relevant) by two groups of subjects necessitated splitting the initial sample of 48 into two independent groups of 25 and 23. The pattern of the empirical results conforms to the prediction that approximately 30% of the relevance-decision variance is attributable to variables tapping openness to information. The empirical results do not reach the normative criterion of α = 0.05. It is argued that this finding is due to a decrease in power resulting from the decrease in sample size rather than from an inadequate or erroneous model. The results are discussed in terms of the relationship between information systems and the “epistemic who”.

41 citations


Journal ArticleDOI
TL;DR: A six-month journal evaluation study was conducted at the two NOAA libraries in Boulder, Colorado, from 28 October 1975 to 28 April 1976 to assist in subscription renewal activities, enhancement of collection relevance, and determination of efficient methodology.
Abstract: A six-month journal evaluation study was conducted at the two NOAA libraries in Boulder, Colorado, from 28 October 1975 to 28 April 1976 to assist in subscription renewal activities, enhancement of collection relevance, and determination of efficient methodology. Data were collected from a use study, circulation and interlibrary loan statistics, a core list, local availability, questionnaire returns, subscription costs, and librarian and patron input. Results showed that (1) monitoring use for three months pulled 84% of the low-use titles obtained in the six-month study indicating that a three-month study is sufficient; (2) considering the lower 31% of the collection in terms of six-month usage, the same titles were pulled 92% of the time when using raw use versus use density data, indicating that use density data need not be obtained for determination of low value titles; (3) when two or more scientists (about 0.8% of scientists responding to a questionnaire) recommended a title, it appeared as a low-use (two or fewer uses) title 5.5% of the time (13.7% of the time for one or more scientists) thus indicating the significance of scientists' recommendations; (4) a balance index for determination of collection balance was derived.

Journal ArticleDOI
TL;DR: A theoretical framework for future research in this area is proposed because reviews appear to fulfill two interlocking roles: that of forming an integral part of the development of science and that of supplying individual workers with information about the current development of science and its literature.
Abstract: Following a general discussion of reviews and of the problems of identification and definition, the literature on user studies, both of reviews and scientific literature in general, is briefly and selectively surveyed. Given the apparent usefulness of reviews and their high cost of production, there has been surprisingly little research into the uses made of reviews. A theoretical framework for future research in this area is proposed. Reviews appear to fulfill two interlocking roles: that of forming an integral part of the development of science (historical functions) and that of supplying individual workers with information about the current development of science and its literature (contemporary functions).

Journal ArticleDOI
TL;DR: The experimental results obtained indicate that the fast single‐pass method is effective in the assessment of the relationships between terms and the improvement achieved in retrieval performance over simple keyword matching is quite significant.
Abstract: A fast single-pass method for the automatic determination of the semantic relationships between terms is presented. The computing time required for the method is small enough for it to be feasible in a practical environment. The experimental results obtained indicate that the method is effective in the assessment of the relationships between terms. The improvement achieved in retrieval performance over simple keyword matching is quite significant.

Journal ArticleDOI
TL;DR: It is suggested that the core journals may not play an integrating role within sociology and that greater attention should be focused on the specialty journals.
Abstract: Citations to articles published during 1960 in three “core” sociological journals, cited in ten sociological journals (1961-70), are examined. One-third of the articles were not cited at all, while only eleven percent of the cited articles were cited in more than one-half (six or more) of the journals. Articles are more likely to be cited in core journals than in specialized journals. Studies of the reading and citing practices of authors are needed. Finally, it is suggested that the core journals may not play an integrating role within sociology and that greater attention should be focused on the specialty journals.

Journal ArticleDOI
TL;DR: A variety of pricing methods can be used to meet a given pricing objective, and several of these pricing practices are described using a basic classification scheme developed for the field of marketing: cost-oriented, competition-oriented and demand-oriented approaches to pricing.
Abstract: The pressures of limited budgets and of increased competition for use of funds are forcing information centers and libraries to initiate user fees. There have been, however, few precedents or guidelines to which the information service administrator might turn for assistance in formulating pricing policy. In this paper, components of the pricing decision are identified: the pricing objectives pursued, the pricing policies these aims are translated into, and the actual pricing methods employed in calculating what price to charge. A variety of pricing methods can be used to meet a given pricing objective, and several of these pricing practices are described using a basic classification scheme developed for the field of marketing: cost-oriented, competition-oriented, and demand-oriented approaches to pricing. Costs represent a starting point for developing pricing structure, and cost-oriented pricing techniques are taken up in detail. Cost functions are identified and their impact on price setting is suggested. Examples and data are provided from recent research into the pricing of computer-assisted selective dissemination of information (SDI) services. The techniques of price discrimination, marginal cost pricing, and break-even analysis are discussed. In making policy decisions, the need for accurate and complete cost data and demand information is emphasized, and several common approaches to estimating demand elasticity are suggested. The paper ends with a brief enumeration of sources of models to be explored for their applicability to the pricing of information services.

Journal ArticleDOI
TL;DR: Performance of interlibrary loan networks in terms of probability of success and average time to satisfy a request is enhanced when location and availability information can be accessed.
Abstract: Performance of interlibrary loan networks in terms of probability of success and average time to satisfy a request is enhanced when location and availability information can be accessed. Existing computer technologies such as shared cataloging networks and automated circulation systems can be of use in obtaining this information. A procedure is presented for quantitative assessment of the impact of these technologies and their various combinations on interlibrary loan activities. As an example, the procedure is utilized for predicting the impact of these computer technologies on the Illinois Library and Information Network (ILLINET). Results show that the value of location information as obtained from a shared cataloging network or similar technology is highly dependent on the information being specific enough to free the lending library from searching their own main catalog. The value of availability information is shown to be related to the processing time that can be avoided by having prior information about the circulation status of the desired item. These results are dependent on the policies employed in Illinois. However, the assessment procedures presented here have general applicability to interlibrary loan networks.

Journal ArticleDOI
TL;DR: In an evaluation of user reaction, over ninety percent of the respondents rated the machine translation (MT) service “good” or “acceptable” on translations of their subject specialty.
Abstract: Since 1964, as an adjunct to its automated technical information processing services to ERDA and other federal agencies, a generalized language translation system has been used by the Oak Ridge National Laboratory (ORNL) to translate Russian scientific text to English. The translation system, first implemented at Georgetown University around 1960, has been rewritten and improved through the years as computer models changed. Although the translations lack high literary quality, the system, by means of its context sensitive dictionary, nevertheless provides inexpensive, fast and highly useful translations of scientific literature. The method used involves a linguistically-oriented programming language called Simulated Linguistic Computer (SLC), with which a language-specific dictionary can be written for use by the translation system. The dictionary entry for any word can be augmented by procedures which permit its meaning to be modified by its context; more general linguistic procedures operate on the sentence as a whole. In an evaluation of user reaction, over ninety percent of the respondents rated the machine translation (MT) service “good” or “acceptable” on translations of their subject specialty. Development, implementation, and documentation of the system are continuing, as we meet increasing requests for service and attempt new applications of the MT system.

Journal ArticleDOI
TL;DR: NEPHIS is a system of computer-aided permuted subject indexing designed to be as easy as possible for the indexer, for the programmer, and for the user of the index.
Abstract: NEPHIS is a system of computer-aided permuted subject indexing designed to be as easy as possible for the indexer, for the programmer, and for the user of the index. The indexer needs to learn only four commands in order to be able to construct any input string. Yet the permutations produced are elegant and browsable, while providing a complete description of even quite complicated subjects. The program is cheap to run, requiring only an input file and an output file and 1K of core.


Journal ArticleDOI
TL;DR: Of the four independent variables, educational level exerts the largest effect followed by per capita library collection, and findings suggest several important policy implications to those who allocate library resources to competing libraries.
Abstract: Modern systems theory is applied in constructing a recursive, multivariate causal model for an analysis of intercounty variations of the public library output. In the model, drawing the data on population and libraries from the counties in Florida, selected sociodemographic variables of the community are considered as exogenous variables, while the resource characteristics of the library are designated as intermediate variables which link the exogenous variables to the endogenous variable, per capita circulation. The results of our analysis indicate that four independent variables (educational level, proportion of registered library users, per capita operating cost, and per capita library collection) account for about 74% of the variance in per capita library circulation. Of the four independent variables, educational level exerts the largest effect followed by per capita library collection. Findings suggest several important policy implications to those who allocate library resources to competing libraries.

Journal ArticleDOI
TL;DR: From the sample data it is concluded that the cost‐effectiveness measures are not sensitive to input‐variable changes and there is a good statistical relation between the cost of an on‐line search and the value of the input variables.
Abstract: A methodology is proposed to aid in system design and evaluation of on-line bibliographic search systems. It uses multiple linear regression analysis to ascertain if there is a relationship between measures of search output (such as cost-effectiveness and cost) and input (such as the characteristics of the search: number of descriptors used, number of Boolean operators, characteristics of the requestor, and characteristics of the searcher). Four measures of cost-effectiveness and cost alone are presented and their values are calculated using empirical data from the DIALOG system for a sample of searches. From the sample data it is concluded that the cost-effectiveness measures are not sensitive to input-variable changes. In contrast, there is a good statistical relation between the cost of an on-line search and the value of the input variables.

Journal ArticleDOI
TL;DR: A metatheory simplifies the evaluation of existing theories by making clear when theories are comparable in a meaningful way and, when they are not comparable, by showing what type of transformations are necessary to make them comparable.
Abstract: Information science would do well to develop more and better theories, for without adequate theoretical support, we may do a technically brilliant job of solving the wrong problems. However, the interdisciplinary nature of what information science is becoming means we must use a metatheory to guide the development of these theories. Without metatheory we cannot compare or unify the theories. A major statement in such a metatheory is to claim that there are three levels that should be used for information theorizing: casual, macroscopic, and microscopic. Use of these levels, and of the rules they imply, simplifies the evaluation of existing theories by making clear when theories are comparable in a meaningful way and, when they are not comparable, by showing what type of transformations are necessary to make them comparable.

Journal Article
TL;DR: Three models for the design of information systems are presented: microanalytic, macroanalytic, and esoteric, each with characteristics that are illustrated by a discussion of systems selected from the literature.
Abstract: Three models for the design of information systems are presented: microanalytic, macroanalytic, and esoteric. Each model has characteristics which are illustrated by a discussion of systems selected from the literature. The underlying design principles of each model are identified and evaluated in terms of their effectiveness in solving the central design problem. This central problem is the encroachment of technology into areas which traditionally have been the sole domain of human thought. The result has been conflict between subconscious expectations of users and explicit system goals. The discussion emphasizes those principles of design which are most useful in identifying and resolving this conflict.

Posted Content
TL;DR: The basic assumptions of the different methods of thesaurus construction are analyzed both from the epistemological point of view as well as from the information retrieval needs of the users.
Abstract: The basic assumptions of the different methods of thesaurus construction are analyzed both from the epistemological point of view as well as from the information retrieval needs of the users. The construction of thesauri for social sciences becomes particularly difficult, because of the inherent ambiguity of the terms used in social sciences. Since the idea of a “growing thesaurus” is impracticable, it is suggested that an empirical study of the semantic differential of different sociological groups would suffice for the purpose of thesaurus construction.

Journal ArticleDOI
TL;DR: A semantic differential scale was designed to assess users' attitudes toward a batch mode retrieval system based on the ERIC tape data base and credence was given to more recent semantic differential study results as opposed to the classic conclusions of Osgood, Suci, and Tannenbaum (1957).
Abstract: A semantic differential scale was designed to assess users' attitudes toward a batch mode retrieval system based on the ERIC tape data base operated by the Information/Knowledge Research Center of the Faculty of Education of the University of British Columbia. Ten concepts representative of the system were grouped (Input, Output, and General) and matched with sixteen adjective pairs (Evaluative, Desirability, and Enormity). Demographic information was obtained from the 35 faculty members and graduate student users responding to the questionnaire. Although questionnaire returns were somewhat low (37%), statistical analysis by grouping and item analysis generally confirmed the hypothetical clustering of both concepts and adjective pairs. Moreover, credence was given to more recent semantic differential study results (Katzer, 1972) as opposed to the classic conclusions of Osgood, Suci, and Tannenbaum (1957). A short form of the questionnaire was designed to provide for continuous evaluation of the system.

Journal ArticleDOI
TL;DR: During the past few years the National Center of Scientific and Technological Information in Israel has made an effort to plan the national information services.
Abstract: During the past few years the National Center of Scientific and Technological Information (COSTI) in Israel has made an effort to plan the national information services. This planning activity has required the formalization of a science information policy as a background for its development. Following a short description of present COSTI activities, some policy considerations specific to Israel are elaborated upon and their application to future planning is discussed.

Journal ArticleDOI
TL;DR: The structure and mechanism of a Very Early Warning System (VEWS) which was developed as a tool for the early recognition and transfer of new technology, affecting both present and future R&D activities in the pharmaceutical sector is described.
Abstract: The article describes the structure and mechanism of a Very Early Warning System (VEWS) which was developed as a tool for the early recognition and transfer of new technology, affecting both present and future R&D activities in the pharmaceutical sector. The VEWS described is concerned with the collection, interpretation, analysis, and exploitation of new data from the world-wide scientific and patent literature, scientific meetings, government reports, and personal communications. Systems activities are grouped into four phases: data acquisition, data expansion, critical analysis, and data exploitation. Iterative processing of an expanding computerized data base is a key feature of the system.

Journal ArticleDOI
TL;DR: The general theory and a specific case study are presented in an effort to indicate both the flexibility and practicality of the method.
Abstract: Many authors have studied statistics on time since last use as possible useful aids for weeding collections. These approaches are special cases of a more general identifier method which allows greater flexibility in the selection and determination of objective retirement criteria. The general theory and a specific case study are presented in an effort to indicate both the flexibility and practicality of the method. The report concludes with a list of some of the more important considerations for execution of the general approach.

Journal ArticleDOI
TL;DR: The paradigm statement for a predictive science of semantic information is presented in the form of an informative act, a one-sentence concatenation of a performative preamble with one bit of perceived “hard” information which produces stress in the receiver which can be measured subjectively using the Holmes-Rahe stress scale.
Abstract: The paradigm statement for a predictive science of semantic information is presented in the form of an informative act, a one-sentence concatenation of a performative preamble with one bit of perceived “hard” information. Such sentences produce stress in the receiver which can be measured subjectively using the Holmes-Rahe stress scale. A unit called the whomp is introduced which describes the net effect of receiving a message with both hard information and human stress points. Such messages can be said to “produce” records. In an extension of the concept of the informative act to a major personage, such as a Head of State, we examine the public records in the form of books produced. Although the prediction is not as refined as we would like, we project somewhere between 16 and 32 titles to be listed in Cumulative Book Index under the subject heading “Nixon, Richard M.” during the twenty-year period 1974-1994, all other things being equal.

Journal ArticleDOI
TL;DR: The “heart” of the industrial technical library universe is encompassed in the library systems of 311 of FORTUNE'S “500” Corporations, and a “Library Penetration” ratio was evolved from the three measures.
Abstract: The “heart” of the industrial technical library universe is encompassed in the library systems of 311 of FORTUNE'S “500” Corporations. Of the 30+ variables used to delineate these 311 systems, the number of professional librarians, libraries, and corporations were most important. A “Library Penetration” ratio was evolved from the three measures. To meet the budget-setting need for cross-industry comparison, FORTUNE'S 29 industry classifications were used, with Chemicals and Petroleum Refining most important. Systems including overseas units, or internationally operating systems, were statistical leaders in every measure. Twenty-four pairs of measures were found to have a correlation (R) of 0.80 (or higher) with significance R of 0.00001. How the remaining 189 of FORTUNE'S “500” operate without formal information services requires study.