
Showing papers in "Journal of the Association for Information Science and Technology in 1987"


Journal ArticleDOI
TL;DR: A system, I³R, that provides a number of facilities and search strategies, with an emphasis on domain knowledge used for refining the model of the information need and the provision of a browsing mechanism that allows the user to navigate through the knowledge base.
Abstract: The most effective method of improving the retrieval performance of a document retrieval system is to acquire a detailed specification of the user's information need. The system described in this paper, I³R, provides a number of facilities and search strategies based on this approach. The system uses a novel architecture to allow more than one system facility to be used at a given stage of a search session. Users influence the system actions by stating goals they wish to achieve, by evaluating system output, and by choosing particular facilities directly. The other main features of I³R are an emphasis on domain knowledge used for refining the model of the information need, and the provision of a browsing mechanism that allows the user to navigate through the knowledge base.

323 citations


Journal ArticleDOI
TL;DR: A geometric analysis is advanced and its utility demonstrated through its application to six conventional information retrieval similarity measures and a seventh spreading activation measure, intended to complement, and perhaps to guide, the empirical analysis of similarity measures.
Abstract: We want computer systems that can help us assess the similarity or relevance of existing objects (e.g., documents, functions, commands, etc.) to a statement of our current needs (e.g., the query). Towards this end, a variety of similarity measures have been proposed. However, the relationship between a measure's formula and its performance is not always obvious. A geometric analysis is advanced and its utility demonstrated through its application to six conventional information retrieval similarity measures and a seventh spreading activation measure. All seven similarity measures work with a representational scheme wherein a query and the database objects are represented as vectors of term weights. A geometric analysis characterizes each similarity measure by the nature of its iso-similarity contours in an n-space containing query and object vectors. This analysis reveals important differences among the similarity measures and suggests conditions in which these differences will affect retrieval performance. The cosine coefficient, for example, is shown to be insensitive to between-document differences in the magnitude of term weights while the inner product measure is sometimes overly affected by such differences. The context-sensitive spreading activation measure may overcome both of these limitations and deserves further study. The geometric analysis is intended to complement, and perhaps to guide, the empirical analysis of similarity measures. © 1987 John Wiley & Sons, Inc.
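The cosine/inner-product contrast the abstract describes is easy to demonstrate. The sketch below (illustrative only, with made-up term-weight vectors) shows the cosine coefficient ignoring a uniform rescaling of document weights while the inner product scales with it:

```python
import math

def inner_product(q, d):
    """Inner product: sensitive to the magnitude of term weights."""
    return sum(qi * di for qi, di in zip(q, d))

def cosine(q, d):
    """Cosine coefficient: normalizes out vector magnitude."""
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(x * x for x in d))
    return inner_product(q, d) / norm if norm else 0.0

query   = [1.0, 1.0, 0.0]
doc     = [2.0, 1.0, 0.0]
doc_x10 = [20.0, 10.0, 0.0]   # same direction, 10x the term weights

# Cosine is unchanged by the uniform rescaling; inner product grows 10x.
assert abs(cosine(query, doc) - cosine(query, doc_x10)) < 1e-12
assert inner_product(query, doc_x10) == 10 * inner_product(query, doc)
```

In the geometric terms of the paper, the two measures have differently shaped iso-similarity contours in term-weight space, which is what these two assertions exercise.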

279 citations


Journal ArticleDOI
TL;DR: It is demonstrated that certain unintended logical connections within the scientific literature are unmarked by reference citations or other bibliographic clues, and may be inherently and peculiarly difficult to solve because there are virtually no references in either literature to the other.
Abstract: This study demonstrates that certain unintended logical connections within the scientific literature, connections potentially revealing of new knowledge, are unmarked by reference citations or other bibliographic clues. Specifically, 25 biomedical articles central to the argument that dietary fish oil causes certain blood changes are compared with 34 articles on how similar blood changes might ameliorate Raynaud's disease. The two groups of articles are thus connected by a chain of reasoning implicitly suggesting that dietary fish oil might benefit Raynaud patients, an hypothesis not heretofore published explicitly. By retrieving and bringing together these two literatures, that implicit, unstated, and perhaps unnoticed hypothesis becomes apparent. The more general problem is posed of whether systematic search techniques for bringing together logically connected literatures can be developed and described, in the hope of discovering other implicit, unstated hypotheses. The example analyzed shows that the problem, while solved in this case by trial-and-error search methods, may be inherently and peculiarly difficult because there are virtually no references in either literature to the other, nor are there any clues from cocitation, bibliographic coupling, or statistical association of descriptors that the two literatures are logically related. © 1987 John Wiley & Sons, Inc.

181 citations


Journal ArticleDOI
Ronald R. Yager1
TL;DR: A method for weighting search terms in the framework of fuzzy-set-based retrieval systems.
Abstract: A method for weighting search terms in the framework of fuzzy-set-based retrieval systems.

149 citations


Journal ArticleDOI
TL;DR: The author compares two fuzzy-set-based approaches to the database concept.
Abstract: The author compares two fuzzy-set-based approaches to the database concept.

62 citations



Journal ArticleDOI
TL;DR: The lines of research leading up to and forming the subfield of bibliometrics are traced from earliest times to the year 1969, when this term was proposed as a substitute for "statistical bibliography" as discussed by the authors.
Abstract: The lines of research leading up to and forming the subfield of bibliometrics are traced from earliest times to the year 1969, when this term was proposed as a substitute for “statistical bibliography.” © 1987 John Wiley & Sons, Inc.

51 citations


Journal ArticleDOI
TL;DR: An interface based on fuzzy-set techniques is proposed to manage the uncertainties inherent in natural-language semantics.
Abstract: A description of the conceptual model and of the processing of natural-language queries in automated information systems. An interface based on fuzzy-set techniques is proposed to manage the uncertainties inherent in natural-language semantics.

48 citations


Journal ArticleDOI
TL;DR: The Indexing Aid Project is described, which aims to develop interactive knowledge-based systems for computer-assisted indexing of the periodical medical literature using an experimental frame-based knowledge representation language, FrameKit, implemented in Franz Lisp.
Abstract: This article describes the Indexing Aid Project for conducting research in the areas of knowledge representation and indexing for information retrieval in order to develop interactive knowledge-based systems for computer-assisted indexing of the periodical medical literature. The system uses an experimental frame-based knowledge representation language, FrameKit, implemented in Franz Lisp. The initial prototype is designed to interact with trained MEDLINE indexers who will be prompted to enter subject terms as slot values in filling in document-specific frame data structures that are derived from the knowledge-base frames. In addition, the automatic application of rules associated with the knowledge-base frames produces a set of Medical Subject Heading (MeSH) keyword indices to the document. Important features of the system are representation of explicit relationships through slots which express the relations; slot values, restrictions, and rules made available by inheritance through “is-a” hierarchies; slot values denoted by functions that retrieve values from other slots; and restrictions on slot values displayable during data entry.
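FrameKit's actual syntax is not shown in the abstract; the frame idea it relies on — slot values inherited through "is-a" links — might be sketched like this (a hypothetical illustration, not FrameKit code):

```python
class Frame:
    """A minimal frame: named slots, with values inherited via 'is-a' links.
    (An illustrative sketch only, not the FrameKit language itself.)"""
    def __init__(self, name, isa=None):
        self.name = name
        self.isa = isa          # parent frame in the "is-a" hierarchy
        self.slots = {}

    def fill(self, slot, value):
        self.slots[slot] = value

    def get(self, slot):
        # Look locally first, then climb the "is-a" chain.
        if slot in self.slots:
            return self.slots[slot]
        return self.isa.get(slot) if self.isa else None

# A document-specific frame inherits a rule from a knowledge-base frame.
disease = Frame("disease")
disease.fill("indexing-rule", "assign the MeSH heading for the disease")
raynaud = Frame("raynaud-disease", isa=disease)
assert raynaud.get("indexing-rule") == "assign the MeSH heading for the disease"
```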

33 citations



Journal ArticleDOI
TL;DR: This article presents a conceptual model of the retrieval process of a document-retrieval system, which has been prototypically implemented in modular form to test system response to changes in model parameters.
Abstract: This article presents our conceptual model of the retrieval process of a document-retrieval system. The retrieval mechanism input is an unambiguous intermediate form of a user query generated by the language processor using the method described previously. Our retrieval mechanism uses a two-step procedure. In the first step a list of documents pertinent to the query are obtained from the document database, and then an evidence-combination scheme is used to compute the degree of support between the query and individual documents. The second step uses a ranking procedure to obtain a final degree of support for each document chosen, as a function of individual degrees of support associated with one or more parts of the query. The end result is a set of document citations presented to the user in ranked order in response to the information request. Numerical examples are given to illustrate various facets of the overall system, which has been prototypically implemented in modular form to test system response to changes in model parameters. © 1987 John Wiley & Sons, Inc.
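A toy version of the two-step procedure might look as follows, with a simple average standing in for the paper's unspecified evidence-combination scheme (document names and weights are invented for illustration):

```python
# Per-document support for individual query terms (invented values).
docs = {
    "d1": {"retrieval": 0.9, "model": 0.4},
    "d2": {"retrieval": 0.5},
    "d3": {"model": 0.8, "ranking": 0.7},
}

def degree_of_support(query_terms, term_weights):
    # Step 2's combination scheme, here simply the mean support over
    # the query parts (the paper's actual scheme is not specified above).
    return sum(term_weights.get(t, 0.0) for t in query_terms) / len(query_terms)

query = ["retrieval", "model"]

# Step 1: obtain the list of documents pertinent to the query.
candidates = {d: w for d, w in docs.items() if any(t in w for t in query)}

# Step 2: rank by final degree of support.
ranked = sorted(candidates, key=lambda d: degree_of_support(query, docs[d]),
                reverse=True)
```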


Journal ArticleDOI
TL;DR: Natural language texts are used extensively in a range of information science tasks; one discourse-level device, anaphoric reference, was investigated in a frequently used text type, scientific abstracts, and testing of the compiled rules indicates high feasibility of future algorithmic recognition of anaphoric uses of terms.
Abstract: Natural language texts are used extensively in a range of information science tasks. Such use requires increased attention to discourse level linguistic phenomena which have the potential for impact on these tasks. One such device, anaphoric reference, was investigated in a frequently used text type, namely, scientific abstracts. Descriptive data on the extent of use of discourse anaphora in abstracts was gathered and rules for distinguishing anaphoric functioning of terms were compiled and tested. Results show a mean use of 3.67 functioning anaphors per abstract in a random sample of 600 abstracts from two databases. Testing of rules indicates high feasibility of future algorithmic recognition of anaphoric uses of terms. © 1987 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: A natural‐language text‐processing system designed as an automatic aid to subject indexing at BIOSIS with the most typical problems the system encounters, the problems of lexical and situational ambiguities, are discussed.
Abstract: This article describes a natural‐language text‐processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is a deep indexing with a controlled vocabulary of biological concepts — Concept Headings (CHs). On the average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural‐language biological titles into title‐semantic representations which are in the constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural‐language terms and their translations in the language of Concept Primitives. For the ambiguous terms, the SV contains the algorithmical rules of term disambiguation, rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, the problems of lexical and situational ambiguities, are discussed. The disambiguation techniques employed are described and demonstrated in many examples. © 1987 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: A model in which the user of an online system continually updates his/her estimated probability of success, and quits or continues according to the expected utility of each action is examined.
Abstract: We examine a model in which the user of an online system continually updates his/her estimated probability of success, and quits or continues according to the expected utility of each action. The prior distribution of the unknown probability is a beta distribution, with mean determined by the a priori expectation of success, and variance determined by the confidence with which the user has that prior expectation. The stopping criterion depends upon the accumulated number of positive and negative reinforcements, and is a straight line in a suitable coordinate system. The user of an information-retrieval system reasons at two different levels. At the “working” level, she decides what steps to take next in carrying out the search at hand. At the “monitor” level, she simultaneously decides whether to continue the search at all or to terminate it. The same is true of a researcher in a library, an auditor reviewing financial records, or a detective of any kind. In this article we propose a very general model for the “monitor” process and apply it to the information-retrieval situation. We show that there is a very simple cutoff criterion which determines whether the search will be terminated. The resilience of the searcher to repeated failure is found to depend in a natural way on both the a priori
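The beta-prior updating and the linear stopping rule can be sketched as below. The cost and reward values are invented; the paper's exact utility function is not reproduced here. Note that the continue/quit boundary is linear in the counts of positive and negative reinforcements, matching the straight-line criterion described above:

```python
def posterior_mean(a, b, successes, failures):
    """Posterior mean of the unknown success probability under a
    Beta(a, b) prior, after the observed reinforcements."""
    return (a + successes) / (a + b + successes + failures)

def should_continue(a, b, successes, failures, cost=0.1, reward=1.0):
    """Continue while the expected payoff of one more step is positive.
    reward * (a+s)/(a+b+s+f) > cost rearranges to a condition linear
    in (s, f): a straight-line cutoff in that coordinate system."""
    return reward * posterior_mean(a, b, successes, failures) > cost

# With a uniform Beta(1, 1) prior, seven straight failures still leave
# continuing worthwhile at cost 0.1; the eighth crosses the cutoff line.
assert should_continue(1, 1, 0, 7)
assert not should_continue(1, 1, 0, 8)
```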

Journal ArticleDOI
TL;DR: In this article, the authors show that the geometric distribution is more concentrated than the Lotka distribution only for high values of the maximal production a source can have, while data following Lotka's law are more highly concentrated than data following Zipf's law.
Abstract: Pratt's measure C on the class concentration of distributions is calculated and interpreted for the laws of Zipf, Mandelbrot, and Lotka, and for the geometric distribution. Comparisons between each are made. We show that phenomena agreeing with Zipf's law are more concentrated than phenomena agreeing with Mandelbrot's law. On the other hand, data following Lotka's law are more concentrated than data following Zipf's law. We also find that the geometric distribution is more concentrated than the Lotka distribution only for high values of the maximal production a source can have. An explicit mathematical formula (in case of the law of Lotka) between C and x(θ), the fraction of the sources needed to obtain a fraction θ of the items produced by these sources (see my earlier article on the 80/20 rule), is derived and tested, unifying these two theories on class concentration. So far, C and x(θ) appeared separate in the literature. © 1987 John Wiley & Sons, Inc.
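Pratt's concentration measure C, as standardly defined (rank classes by size, compare the share-weighted mean rank against the uniform case), can be computed directly; a sketch, not the paper's own code:

```python
def pratt_C(class_sizes):
    """Pratt's measure of class concentration: 0 for a perfectly even
    distribution, 1 when a single class holds all the items."""
    sizes = sorted(class_sizes, reverse=True)
    n = len(sizes)
    total = sum(sizes)
    q = [s / total for s in sizes]
    mu = sum((i + 1) * qi for i, qi in enumerate(q))  # share-weighted mean rank
    return (n + 1 - 2 * mu) / (n - 1)

assert abs(pratt_C([5, 5, 5, 5])) < 1e-12        # uniform -> 0
assert abs(pratt_C([10, 0, 0, 0]) - 1) < 1e-12   # total concentration -> 1
```

The paper's comparisons (Zipf more concentrated than Mandelbrot, Lotka more concentrated than Zipf) amount to evaluating C on samples drawn from each law.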

Journal ArticleDOI
TL;DR: A new concept called the cover coefficient (CC) will be used in computing TDVs, and it is shown that the computational cost of the CC approach in the calculation of TDVs is favorably comparable to the cost of a different approach that uses similarity coefficients.
Abstract: Indexing in information retrieval (IR) is used to obtain a suitable vocabulary of index terms and optimum assignment of these terms to documents for increasing the effectiveness and efficiency of an IR system. The concept of term discrimination value (TDV) is one of the criteria used for index-term selection. In this article a new concept called the cover coefficient (CC) will be used in computing TDVs. After a brief introduction to the theory of indexing and the CC concept, an efficient way of computing TDVs by use of the CC concept, index-term selection, and weight modification are discussed. It is also shown that the computational cost of the CC approach in the calculation of TDVs is favorably comparable to the cost of a different approach that uses similarity coefficients. Furthermore, the TDVs obtained by the CC approach are consistent with those of the latter approach.
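For contrast, the similarity-coefficient approach to TDVs that the cover-coefficient method is compared against can be sketched as the change in space density when a term is removed (an illustrative toy, not the CC computation itself):

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def density(docs):
    """Average similarity of each document to the collection centroid."""
    n, m = len(docs), len(docs[0])
    centroid = [sum(d[j] for d in docs) / n for j in range(m)]
    return sum(cosine(d, centroid) for d in docs) / n

def discrimination_value(docs, k):
    """TDV of term k: change in space density when term k is removed.
    Positive TDV -> removing the term packs documents closer together,
    i.e., the term was a good discriminator."""
    without_k = [[w for j, w in enumerate(d) if j != k] for d in docs]
    return density(without_k) - density(docs)

# Term 0 is constant across documents; term 1 discriminates them,
# so removing term 1 raises density and its TDV is positive.
docs = [[1, 1], [1, 2], [1, 0]]
assert discrimination_value(docs, 1) > 0
```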


Journal ArticleDOI
Padmini Das-Gupta1
TL;DR: If the two conjuncts are semantically similar then the conjunction is best interpreted as a Boolean OR, otherwise as an AND, which resulted in an algorithm which utilizes semantic information and some syntactic information to obtain the appropriate Boolean interpretation.
Abstract: It is generally recognized that the conjunction “and” plays an ambiguous role in natural language. When considered within the domain of Boolean document retrieval, this ambiguity makes the automatic Boolean interpretation of statements representing information needs a difficult task. The human analyst is able to resolve this ambiguity with relative ease. However, the processes employed appear complex and are not well understood. This article examines a semantic property of the conjunction, i.e., the semantic similarity between the conjuncts with a view to automatically resolving this ambiguity. Specifically, the idea examined is that if the two conjuncts are semantically similar then the conjunction is best interpreted as a Boolean OR, otherwise as an AND. The study resulted in an algorithm which utilizes semantic information and some syntactic information (both of which are derivable from a standard dictionary) to obtain the appropriate Boolean interpretation. The algorithm was successful when evaluated against human decisions. In addition to contributing the algorithm, this article draws attention to the effects of this ambiguity on the derivation of appropriate Boolean search specifications from natural‐language statements representing information needs. © 1987 John Wiley & Sons, Inc.
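The core decision rule reduces to a threshold test on conjunct similarity. In the sketch below the similarity oracle is a made-up lookup table; in the paper it is derived from a standard dictionary:

```python
# Hypothetical similarity scores between conjunct pairs (invented values;
# the paper derives these from dictionary-based semantic information).
SIMILAR = {
    frozenset(["cats", "dogs"]): 0.8,
    frozenset(["information", "retrieval"]): 0.2,
}

def interpret_conjunction(left, right, threshold=0.5):
    """Interpret natural-language 'and' as Boolean OR when the conjuncts
    are semantically similar, otherwise as AND."""
    sim = SIMILAR.get(frozenset([left, right]), 0.0)
    return "OR" if sim >= threshold else "AND"

# "cats and dogs" asks for the union; "information and retrieval"
# narrows to the intersection.
assert interpret_conjunction("cats", "dogs") == "OR"
assert interpret_conjunction("information", "retrieval") == "AND"
```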

Journal ArticleDOI
TL;DR: A bivariate beta-binomial should be of help to the librarian in predicting future book circulations and three examples are given which indicate the superiority of the beta—over the negative binomial distribution.
Abstract: Library book circulation does not appear to be a Poisson process. It is proposed that a binomial process is more logical and that the mixture distribution for individual book popularities is a continuous beta distribution. Three examples are given which indicate the superiority of the beta—over the negative binomial distribution. A bivariate beta-binomial should be of help to the librarian in predicting future book circulations. © 1987 John Wiley & Sons, Inc.
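The beta-binomial mixture described above has a closed-form pmf; a minimal standard-library sketch with illustrative parameters (the paper's fitted values are not reproduced here):

```python
from math import comb, gamma

def beta_fn(a, b):
    """Beta function via the gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(k circulations out of n periods) when each book's per-period
    circulation probability is itself drawn from Beta(a, b)."""
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)

# Sanity checks: the pmf sums to 1, and a uniform Beta(1, 1) mixture
# makes every circulation count equally likely (1 / (n + 1)).
assert abs(sum(beta_binomial_pmf(k, 5, 2.0, 3.0) for k in range(6)) - 1.0) < 1e-9
assert abs(beta_binomial_pmf(2, 4, 1.0, 1.0) - 0.2) < 1e-9
```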

Journal ArticleDOI
TL;DR: A nearest-neighbor algorithm is presented which allows probabilistic sequential models consistent with two-Poisson or binary-independence assumptions to easily locate the “best” document using temporary sets of documents at a given coordination level.
Abstract: Probabilistic models of document-retrieval systems incorporating sequential learning through relevance feedback may require frequent and time-consuming reevaluations of documents. Coordination level matching is shown to provide equivalent document rankings to binary models when term discrimination values are equal for all terms; this condition may be found, for example, in probabilistic systems with no feedback. A nearest-neighbor algorithm is presented which allows probabilistic sequential models consistent with two-Poisson or binary-independence assumptions to easily locate the “best” document using temporary sets of documents at a given coordination level. Conditions under which reranking is unnecessary are given. © 1987 John Wiley & Sons, Inc.
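Coordination level matching itself is just a count of shared terms; a sketch with invented documents shows the ranking it induces, which the paper shows coincides with the binary-model ranking when all term discrimination values are equal:

```python
def coordination_level(query_terms, doc_terms):
    """Coordination level: number of query terms the document contains."""
    return len(set(query_terms) & set(doc_terms))

docs = {
    "d1": {"fish", "oil", "blood"},
    "d2": {"fish", "raynaud"},
    "d3": {"oil"},
}
query = {"fish", "oil"}

# The "best" candidates sit at the highest coordination level; a
# nearest-neighbor search need only examine that temporary set.
by_level = sorted(docs, key=lambda d: coordination_level(query, docs[d]),
                  reverse=True)
assert by_level[0] == "d1"   # matches both query terms
```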

Journal ArticleDOI
TL;DR: In this paper, the authors consider the relevance for sub-Saharan Africa of the Western world's electronic information supply systems and summarize F.W. Lancaster's forecast of a coming electronic paperless society.
Abstract: This Opinion Paper considers the relevance for sub‐Saharan Africa of the Western world's electronic information supply systems. It summarizes F.W. Lancaster's forecast of a coming electronic paperless society. From the perspective of Africa's economic and social difficulties it looks at Lancaster's recommendation that libraries in developing countries should attempt to bypass the book and leap from oral to electronic communication. It discusses who benefits from libraries at present and who would most likely benefit from electronic libraries. It cites examples of the "book famine" from which the existing libraries now suffer. It criticizes the view that supplying facts to important people via computer will help poor areas to develop. It outlines Africa's dependence and instances some of the inappropriate foreign advice and aid it receives. Most of the examples are taken from ex‐British Africa, the countries that in all but one or two cases have kept on English as their official language. The conclusion is that the electronic library and indeed information science in general distract from what African librarians ought to be doing: helping the illiterate majority of their people learn to read and write. © 1987 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the statistical validity of the associated author partitions as a function of productivity and collaborative thresholds, and showed that the highly selective nature of the collaborative relationship produces a wide range of threshold values for which the associated partitions are statistically valid.
Abstract: The structure of coauthor graphs and the statistical validity of the associated author partitions are investigated as a function of productivity and collaborative thresholds. The productivity threshold determines the number of authors (points) in a coauthor graph, and the collaborative threshold determines the number of coauthor pairs (lines) in the graph. The statistical validity of author partitions is determined by the random‐graph hypothesis. The results show that for “small” databases, statistically preferred partitions occur when all authors and coauthor pairs appear in the graph. For “large” databases, statistically preferred partitions occur when authors and coauthor pairs who publish only one article are excluded from the graph. Unlike other bibliometric relationships, the highly selective nature of the collaborative relationship produces a wide range of threshold values for which the associated partitions are statistically valid. It remains to be shown how the statistical validity of partitions is related to the empirical significance of the same partitions. © 1987 John Wiley & Sons, Inc.
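The thresholding scheme can be sketched as a graph construction followed by a connected-components partition (author names and counts below are invented):

```python
from collections import defaultdict

def coauthor_components(pair_counts, author_counts,
                        productivity_threshold=1, collaborative_threshold=1):
    """Partition authors into connected components of the coauthor graph.
    Authors below the productivity threshold (points) and coauthor pairs
    below the collaborative threshold (lines) are excluded, as in the
    thresholding scheme described above."""
    authors = {a for a, c in author_counts.items()
               if c >= productivity_threshold}
    adj = defaultdict(set)
    for (a, b), c in pair_counts.items():
        if c >= collaborative_threshold and a in authors and b in authors:
            adj[a].add(b)
            adj[b].add(a)
    seen, parts = set(), []
    for a in authors:                      # depth-first component sweep
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        parts.append(comp)
    return parts
```

Raising either threshold prunes the graph, which is how the paper varies the partitions whose statistical validity it then tests against the random-graph hypothesis.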

Journal ArticleDOI
TL;DR: The challenge of ethical problem solving in the context of the information-science professions has fostered debate over whether a code of ethics might not provide a valuable guide to decision making in situations involving ethical conflicts as discussed by the authors.
Abstract: The challenge of ethical problem solving in the context of the information-science professions has fostered debate over whether a code of ethics might not provide a valuable guide to decision making in situations involving ethical conflicts. Sample issues and questions are presented with an aim toward placing them in historical and philosophical frameworks for considering the tension between knowledge and power. While the article concludes that ethical choices are too dynamic and unpredictable to make a fixed code useful, practical guidelines to help information professionals act out of wisdom are offered. © 1987 John Wiley & Sons, Inc.



Journal ArticleDOI
TL;DR: The National Library of Medicine celebrated its sesquicentennial in 1986 and research now going on in the Library's Lister Hill Center shows promise of greatly altering the way future scientists and health practitioners have access to biomedical information.
Abstract: The National Library of Medicine celebrated its sesquicentennial in 1986. The Library has played an important role in medical communication in this country for most of the 150 years since its founding. In the last 20 years, especially, the NLM's responsibilities have been expanded to include a variety of nontraditional library activities. Many of these new activities are based on computer and communications technology. Research now going on in the Library's Lister Hill Center shows promise of greatly altering the way future scientists and health practitioners have access to biomedical information.



Journal ArticleDOI
TL;DR: In this article, the authors present a list of events in their professional career which were of significance to them, including their final military assignment to the Air Documents Research Office in London, which exposed them to the state of information handling at that time.
Abstract: It has been my good fortune to have been involved in the information science field continuously from 1945 to the present. It all started with my final military assignment to the Air Documents Research Office in London. This experience exposed me to the state of information handling at that time. It opened up for me an interest, latent for a few years, in the contribution that could be made by an emerging information technology. Since that time I gained experience at Essex Chemicals (research chemist), Interscience Publishers (scientific editor), M.I.T. (research associate), Battelle Memorial Institute (principal documentation engineer), Western Reserve University, and the University of Pittsburgh. Early in 1986, Donald Kraft invited me to submit a contribution to JASIS on historical perspectives for the 1987 fiftieth anniversary of the Society. I submitted a list of events in my professional career which were of significance to me. Perspectives on those chosen follow.