
Showing papers in "Information Processing and Management in 1981"


Journal ArticleDOI
TL;DR: This paper tackles the problem of how one might select further search terms, using relevance feedback, given the search terms in the query, by generating a number of different spanning trees from a variety of association measures.
Abstract: This paper tackles the problem of how one might select further search terms, using relevance feedback, given the search terms in the query. These search terms are extracted from a maximum spanning tree connecting all the terms in the index term vocabulary. A number of different spanning trees are generated from a variety of association measures. The retrieval effectiveness for the different spanning trees is shown to be approximately the same. Effectiveness is measured in terms of precision and recall, and the retrieval tests are done on three different test collections.

143 citations
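
The term-selection idea above can be illustrated with a short sketch. The following Python fragment is illustrative only: the vocabulary and association values are invented, and the paper's own association measures and test collections are not reproduced. It builds a maximum spanning tree over term-association weights with Kruskal's algorithm and proposes the tree neighbours of the query terms as expansion candidates.

```python
# Hypothetical sketch of maximum-spanning-tree term selection (not the paper's code).
# Terms are nodes; edge weights are term-term association scores. A maximum spanning
# tree is built, and candidate expansion terms are the tree neighbours of the query terms.
from itertools import combinations

def maximum_spanning_tree(terms, association):
    """Kruskal's algorithm on descending weights; association is {(t1, t2): score}."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    edges = sorted(association.items(), key=lambda kv: kv[1], reverse=True)
    tree = []
    for (a, b), w in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                     # adding the edge does not create a cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree

def expansion_candidates(query_terms, tree):
    """Terms directly linked to a query term in the spanning tree."""
    neighbours = set()
    for a, b, _ in tree:
        if a in query_terms:
            neighbours.add(b)
        if b in query_terms:
            neighbours.add(a)
    return neighbours - set(query_terms)

# Toy association measure over a tiny vocabulary (illustrative values only).
terms = ["retrieval", "index", "query", "feedback", "relevance"]
association = {pair: 0.1 for pair in combinations(terms, 2)}
association[("query", "feedback")] = 0.9
association[("feedback", "relevance")] = 0.8
association[("retrieval", "index")] = 0.7

tree = maximum_spanning_tree(terms, association)
print(expansion_candidates({"query"}, tree))
```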


Journal ArticleDOI
TL;DR: The trigram analysis technique developed determined the error site within a misspelling accurately, but did not distinguish effectively between different error types or between valid words and misspellings.
Abstract: Work performed under the SPElling Error Detection COrrection Project (SPEEDCOP) supported by the National Science Foundation (NSF) at Chemical Abstracts Service (CAS) to devise effective automatic methods of detecting and correcting misspellings in scholarly and scientific text is described. The investigation was applied to 50,000 word/misspelling pairs collected from six datasets (Chemical Industry Notes (CIN), Biological Abstracts (BA), Chemical Abstracts (CA), American Chemical Society primary journal keyboarding (ACS), Information Science Abstracts (ISA), and Distributed On-Line Editing (DOLE), a CAS internal dataset especially suited to spelling error studies). The purpose of this study was to determine the utility of trigram analysis in the automatic detection and/or correction of misspellings. Computer programs were developed to collect data on trigram distribution in each dataset and to explore the potential of trigram analysis for detecting spelling errors, verifying correctly-spelled words, locating the error site within a misspelling, and distinguishing between the basic kinds of spelling errors. The results of the trigram analysis were largely independent of the dataset to which it was applied but trigram compositions varied with the dataset. The trigram analysis technique developed determined the error site within a misspelling accurately, but did not distinguish effectively between different error types or between valid words and misspellings. However, methods for increasing its accuracy are suggested.

133 citations
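
As a rough illustration of how trigram analysis can locate an error site (a sketch only, not the SPEEDCOP programs; the padding scheme and the reference word list are assumptions), the following Python fragment flags the character positions of a misspelling that are covered only by trigrams unseen in a reference vocabulary.

```python
# Illustrative sketch: use trigram statistics from a word list to locate the likely
# error site inside a misspelling. Positions covered by trigrams never seen in the
# reference vocabulary are flagged.
from collections import Counter

def trigrams(word):
    padded = f"  {word.lower()} "          # pad so initial/final letters form trigrams
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def build_trigram_counts(vocabulary):
    counts = Counter()
    for word in vocabulary:
        counts.update(trigrams(word))
    return counts

def error_site(misspelling, counts):
    """Return character positions of the misspelling covered by unseen trigrams."""
    padded_offset = 2                      # two leading pad characters
    suspect = set()
    for i, tri in enumerate(trigrams(misspelling)):
        if counts[tri] == 0:
            for j in range(i, i + 3):      # trigram i covers padded positions i..i+2
                pos = j - padded_offset
                if 0 <= pos < len(misspelling):
                    suspect.add(pos)
    return sorted(suspect)

vocab = ["absorption", "absorbent", "adsorption", "abstract", "apt"]
counts = build_trigram_counts(vocab)
print(error_site("absorptoin", counts))    # flags positions around the transposed letters
```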


Journal ArticleDOI
TL;DR: It is shown that the concept of threshold values resolves the problems inherent with relevance weights, and possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations are explored.
Abstract: Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.

99 citations
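
One possible reading of threshold values in fuzzy Boolean evaluation, sketched below, treats a document's term weight as its degree of satisfaction when it meets the query term's threshold and zero otherwise, with AND/OR evaluated by min/max. This is an illustrative interpretation with invented data, not necessarily the exact evaluation mechanism proposed in the paper.

```python
# A minimal sketch: documents carry fuzzy index-term weights in [0, 1], query terms
# carry threshold values, and Boolean AND/OR are evaluated with min/max over the
# per-term degrees obtained after applying the thresholds.
def term_degree(doc_weight, threshold):
    """Degree to which a document satisfies a thresholded query term."""
    return doc_weight if doc_weight >= threshold else 0.0

def evaluate(query, doc_weights):
    """query: ('term', name, threshold) | ('and', q1, q2) | ('or', q1, q2)."""
    op = query[0]
    if op == "term":
        _, name, threshold = query
        return term_degree(doc_weights.get(name, 0.0), threshold)
    left, right = evaluate(query[1], doc_weights), evaluate(query[2], doc_weights)
    return min(left, right) if op == "and" else max(left, right)

doc = {"fuzzy": 0.8, "retrieval": 0.4, "boolean": 0.9}
query = ("and", ("term", "fuzzy", 0.5),
                ("or", ("term", "retrieval", 0.6), ("term", "boolean", 0.3)))
print(evaluate(query, doc))   # retrieval falls below its threshold, boolean carries the OR
```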


Journal ArticleDOI
TL;DR: The present article assesses the validity of inter-indexer consistency as a measure of indexing quality or effectiveness in various environments.
Abstract: Indexing quality determines whether the information content of an indexed document is accurately represented. Indexing effectiveness measures whether an indexed document is correctly retrieved every time it is relevant to a query. Measurement of these criteria is cumbersome and costly; data base producers therefore prefer inter-indexer consistency as a measure of indexing quality or effectiveness. The present article assesses the validity of this substitution in various environments.

94 citations
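
For concreteness, the consistency figure whose validity the article questions is typically computed along the following lines (a Hooper-style measure; the term sets below are invented):

```python
# Illustrative sketch: Hooper-style inter-indexer consistency, i.e. the number of
# terms two indexers assign in common divided by the number of distinct terms either
# assigns. This only shows how the raw figure is usually computed, not whether it is
# a valid proxy for indexing quality or effectiveness.
def inter_indexer_consistency(terms_a, terms_b):
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

indexer_1 = {"information retrieval", "indexing", "evaluation"}
indexer_2 = {"information retrieval", "indexing", "consistency", "measurement"}
print(round(inter_indexer_consistency(indexer_1, indexer_2), 2))  # 2 shared / 5 distinct = 0.4
```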


Journal ArticleDOI
TL;DR: It is shown that clusters identified by the journal concentration method also cohere in a natural way through cluster co-citation.
Abstract: A co-citation cluster analysis of a three year (1975–1977) cumulation of the Social Sciences Citation Index is described, and clusters of information science documents contained in this data-base are identified using a journal subset concentration measure. The internal structure of the information science clusters is analyzed in terms of co-citations among clusters, and external linkages to fields outside information science are explored. It is shown that clusters identified by the journal concentration method also cohere in a natural way through cluster co-citation. Conclusions are drawn regarding the relationship of information science to the social sciences, and suggestions are made on how these data might be used in planning an agenda for research in the field.

60 citations
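
A toy sketch of the underlying co-citation machinery (not the actual SSCI processing, and not the journal subset concentration measure): co-citation counts are accumulated over the reference lists of citing papers, and documents are grouped single-link above a co-citation threshold.

```python
# Toy co-citation counting and threshold-based single-link clustering; document ids
# and reference lists are invented.
from collections import Counter
from itertools import combinations

def co_citation_counts(citing_papers):
    """citing_papers: iterable of sets of cited-document ids."""
    counts = Counter()
    for refs in citing_papers:
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts

def single_link_clusters(counts, threshold):
    """Cluster documents whose co-citation count meets the threshold."""
    clusters = []
    for (a, b), n in counts.items():
        if n < threshold:
            continue
        merged = [c for c in clusters if a in c or b in c]
        new_cluster = {a, b}.union(*merged) if merged else {a, b}
        clusters = [c for c in clusters if c not in merged] + [new_cluster]
    return clusters

papers = [{"D1", "D2", "D3"}, {"D1", "D2"}, {"D2", "D3"}, {"D4", "D5"}, {"D4", "D5"}]
counts = co_citation_counts(papers)
print(single_link_clusters(counts, threshold=2))   # two clusters: {D1, D2, D3} and {D4, D5}
```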


Journal ArticleDOI
TL;DR: The various query-processing methods are discussed and compared and some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms.
Abstract: Most current document retrieval systems require that user queries be specified in the form of Boolean expressions. Although Boolean queries work, they have flaws. Some of the attempts to overcome these flaws have involved “partial-match” retrieval or the use of fuzzy-subset theory. Recently, some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms. The various query-processing methods are discussed and compared.

36 citations


Journal ArticleDOI
TL;DR: A fast algorithm is described for comparing the lists of terms representing documents in automatic classification experiments, using an inverted file to the terms in the document collection.
Abstract: A fast algorithm is described for comparing the lists of terms representing documents in automatic classification experiments. The speed of the procedure arises from the fact that all of the non-zero-valued coefficients for a given document are identified together, using an inverted file to the terms in the document collection. The complexity and running time of the algorithm are compared with previously described procedures.

30 citations
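
The speed argument can be sketched as follows: by walking the postings lists for a document's terms, only documents sharing at least one term are ever touched, and coefficients are computed just for those. The Python below is a minimal illustration using the Dice coefficient; the coefficient choice and the data are assumptions, not the paper's.

```python
# Minimal sketch of the inverted-file approach: all documents with at least one term
# in common with the target document are found together through the postings lists,
# and similarity coefficients are computed only for those.
from collections import defaultdict

def build_inverted_file(documents):
    postings = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings

def nonzero_similarities(doc_id, documents, postings):
    overlap = defaultdict(int)                    # other doc -> number of shared terms
    for term in documents[doc_id]:
        for other in postings[term]:
            if other != doc_id:
                overlap[other] += 1
    size = len(documents[doc_id])
    return {other: 2 * shared / (size + len(documents[other]))   # Dice coefficient
            for other, shared in overlap.items()}

documents = {
    "D1": {"cluster", "algorithm", "inverted", "file"},
    "D2": {"cluster", "algorithm", "speed"},
    "D3": {"economics", "library"},
}
postings = build_inverted_file(documents)
print(nonzero_similarities("D1", documents, postings))   # D3 never considered: no shared terms
```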


Journal ArticleDOI
Karen Markey
TL;DR: It is proposed that Taylor's four levels of question formulation may be inadequate for describing question negotiation in the online presearch interview.
Abstract: In a seminal article on question negotiation, Taylor outlines four levels of question formulation which pertain to the client-information professional interview session. The literature which supports Taylor's theory is covered. It is proposed that Taylor's four levels may be inadequate for describing question negotiation in the online presearch interview. An altered model is given with suggestions for testing the model in the online environment. Some recommendations concerning the importance of discovering such a model are offered.

26 citations


Journal ArticleDOI
TL;DR: The results show that between 1967 and 1972, the information sector grew at a slightly slower rate than did the entire economy, but other elements of the sector out-paced the economy, including electronic instruments and telecommunications.
Abstract: The paper presents the results of an Input-Output study of the U.S. information sector, constructed from 1972 tables compiled by the Bureau of Economic Analysis of the U.S. Department of Commerce. The study updates the 1967 transaction table published in The Information Economy (U.S. Department of Commerce, 1977). The results show that between 1967 and 1972, the information sector grew at a slightly slower rate than did the entire economy. While the sector accounted for 25.1% of GNP in 1967, its share of GNP had actually declined to 24.8% by 1972. Many of the elements of the sector lagged behind the national economy, including printing and publishing and radio and TV equipment; however, other elements of the sector out-paced the economy, including electronic instruments and telecommunications.

26 citations


Journal ArticleDOI
TL;DR: A dynamic model for determining the proper stopping point using decision theory under risk with changing utilities is used as the basis for a Bayesian model of user scanning behavior, which has implications for retrieval systems design and evaluation.
Abstract: A model of a user's scan of the output of an information storage and retrieval system in response to a query is presented. Rules for determining the user's optimal stopping point are discussed and compared. A dynamic model for determining the proper stopping point using decision theory under risk with changing utilities is used as the basis for a Bayesian model of user scanning behavior. An algorithm to implement the Bayesian model is introduced and examples of the model are given. The implications for retrieval systems design and evaluation are discussed.

18 citations
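
A toy version of such a stopping rule, under illustrative assumptions throughout (a Beta posterior on the probability that the next document is relevant, fixed benefit and examination cost per document; not the paper's exact utilities or algorithm), might look like this:

```python
# The user holds a Beta posterior over the probability that the next scanned document
# is relevant, updates it after each judgement, and stops when the expected utility of
# examining one more document (benefit of a relevant item minus examination cost)
# drops below zero. All parameter values are invented.
def scan(judgements, benefit=1.0, cost=0.6, prior=(3.0, 1.0)):
    alpha, beta = prior                      # Beta(alpha, beta) prior on precision
    examined = 0
    for relevant in judgements:
        p_next = alpha / (alpha + beta)      # posterior mean probability of relevance
        if p_next * benefit - cost < 0:      # expected utility of one more document
            break
        examined += 1
        alpha, beta = (alpha + 1, beta) if relevant else (alpha, beta + 1)
    return examined

# Ranked output where relevance thins out towards the tail.
output = [True, True, False, False, False, False, False, False]
print(scan(output))   # stops after five documents, once estimated precision falls below cost/benefit
```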


Journal ArticleDOI
D. Kropp, Georg Walch
TL;DR: By classifying the set of Superstrings belonging to a fragment according to the position of the fragment in the Superstring, one gains a novel possibility of supporting exact-match, partial-match, and masked partial-match retrieval by an index.
Abstract: An indexing technique for text data based on word fragments is described. In contrast to earlier approaches, the fragments are allowed to overlap and are linked in a directed graph structure, reflecting the fact that many fragments ("Superstrings") contain other fragments as substrings. This leads to a redundancy-free set of primary data pointers. By classifying the set of Superstrings belonging to a fragment according to the position of the fragment in the Superstring, one gains a novel possibility of supporting exact-match, partial-match, and masked partial-match retrieval by an index. The search strategies for the various retrieval cases are described.
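
A much-simplified sketch of the fragment/Superstring indexing idea follows; the position classes, fragments and data are illustrative assumptions, not the authors' actual design.

```python
# Each fragment is mapped to the superstrings containing it, classified by where the
# fragment occurs in the superstring, so different retrieval cases can be served from
# the same index.
from collections import defaultdict

def build_fragment_index(superstrings, fragments):
    """Map each fragment to the superstrings containing it, classified by position."""
    index = defaultdict(list)
    for s in superstrings:
        for frag in fragments:
            pos = s.find(frag)
            if pos == -1:
                continue
            if pos == 0 and len(frag) == len(s):
                kind = "exact"
            elif pos == 0:
                kind = "prefix"
            elif pos + len(frag) == len(s):
                kind = "suffix"
            else:
                kind = "infix"
            index[frag].append((s, kind))
    return index

superstrings = ["inform", "information", "formation", "format"]
fragments = ["form", "inform", "mat"]
index = build_fragment_index(superstrings, fragments)

# Exact or left-anchored match on a fragment vs. partial match (fragment anywhere).
print([s for s, kind in index["inform"] if kind in ("exact", "prefix")])
print([s for s, kind in index["form"]])
```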

Journal ArticleDOI
TL;DR: It is argued that extension of the scope of Information Theory as well as development of new theories of information science presupposes better understanding of relevant empirical regularities and laws.
Abstract: The empirical import of Shannon's Information Theory and its impact on information science are discussed. It is argued that extension of the scope of Information Theory as well as development of new theories of information science presupposes better understanding of relevant empirical regularities and laws. Possibilities of broadening the empirical foundation of Information Theory by introduction of appropriate least effort criteria are discussed.

Journal ArticleDOI
TL;DR: An approach toward a functional structure analysis of abstract text is presented as a part of a semantic information representation method of scientific and technical documents together with an extraction method of semantic information in the text.
Abstract: An approach toward a functional structure analysis of abstract text is presented as part of a semantic information representation method for scientific and technical documents. This analysis method forms a fundamental study on the construction methodology of an advanced document information system, together with a method for extracting semantic information from the text. As "information functions" in the abstract, the four concepts of Theme, Method, Result and Discussion are selected, and a model set of "functional patterns" consisting of meta-terms in each sentence of the abstract is prepared. A preliminary computer experiment is implemented with a pattern matching procedure that identifies the information function of each sentence of the abstract based on the model set. The results are examined and discussed in terms of the reproducibility of the model set and the effectiveness of the procedure on other data sets. The procedure shows some validity as a whole, and suggestions for future improvements are given.
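
A minimal sketch of the pattern-matching step is given below; the cue patterns are invented stand-ins for the paper's model set of functional patterns, and the sample sentences are illustrative.

```python
# Each sentence of an abstract is assigned an information function -- Theme, Method,
# Result or Discussion -- by matching meta-term patterns and taking the best score.
import re

FUNCTIONAL_PATTERNS = {
    "Theme":      [r"\bthis paper\b", r"\bis presented\b", r"\bwe describe\b"],
    "Method":     [r"\bmethod\b", r"\bprocedure\b", r"\balgorithm\b", r"\bexperiment\b"],
    "Result":     [r"\bresults?\b", r"\bshowed\b", r"\bwas found\b"],
    "Discussion": [r"\bsuggest(s|ed)?\b", r"\bfuture\b", r"\bdiscuss(ed)?\b"],
}

def classify_sentence(sentence):
    scores = {
        function: sum(bool(re.search(p, sentence, re.IGNORECASE)) for p in patterns)
        for function, patterns in FUNCTIONAL_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unclassified"

abstract = [
    "This paper presents a new approach to the indexing of abstracts.",
    "An experiment was carried out with a pattern matching procedure.",
    "The results showed acceptable accuracy.",
    "Improvements are suggested for future work.",
]
for sentence in abstract:
    print(classify_sentence(sentence), "-", sentence)
```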


Journal ArticleDOI
TL;DR: The study examines selected roles of the information sector in the national economy and finds that the high technology elements of the sector, such as electronic components, computers and telecommunications equipment have experienced appreciably less price rise than has the economy as a whole.
Abstract: The study examines selected roles of the information sector in the national economy. Among the findings are the following: (1) the information sector conducts relatively little international trade, in comparison to its domestic activity. Roughly 12% of U.S. exports are attributable to the information sector; over 97% of the sector's output is sold within the U.S.; and the sector's exports account for only a small fraction of 1% of GNP. (2) The historical pattern of employment shows that the portion of information workers has risen from 8% of the U.S. work force in 1870 to 41% in 1970. Relatively little of this growth is the result of new technological innovations such as telephones, radio, television and, more recently, computers. Rather, the growth of public and private bureaucracies, which now total 26% of our total work force, largely explains the growth of the sector. (3) Unemployment within the information sector has consistently been lower than in either the manufacturing or agricultural sectors of the national economy. (4) Since 1967, the high technology elements of the information sector, such as electronic components, computers and telecommunications equipment have experienced appreciably less price rise than has the economy as a whole. However, over the same time period, the service elements of the sector, including finance and insurance, education and medical care, have experienced greater rates of inflation than has the economy as a whole.

Journal ArticleDOI
TL;DR: It has been found that there is a considerable degree of agreement between the results of journal acquisition planning based on the described method and the demands of journal users.
Abstract: The large and constantly growing number of scientific journals demands strict selection during the planning of their purchase. This article describes a method of journal selection based on data from an information service system. The primary value of a journal is defined as the amount of information retrieved for readers concerning the articles published in that journal. This parameter and the subscription costs form the basis for ranking journals and determining the number of copies to be bought. The method has been verified using data from the SDI system operated at Wroclaw Technical University. Comparison of the results with those of a simultaneously conducted questionnaire investigation shows a considerable degree of agreement between journal acquisition planning based on the described method and the demands of journal users.
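
The ranking idea can be sketched as follows; the figures and the simple value measure (SDI items retrieved per journal) are invented, and the paper's actual statistics and questionnaire comparison are not reproduced.

```python
# A journal's primary value is taken here as the number of its articles delivered by
# the SDI service; journals are ranked by value per unit of subscription cost and
# selected greedily within a fixed budget.
def rank_journals(journals, budget):
    """journals: {title: (items_retrieved, subscription_cost)}."""
    ranked = sorted(journals.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],     # retrieved items per cost unit
                    reverse=True)
    selected, spent = [], 0.0
    for title, (retrieved, cost) in ranked:
        if spent + cost <= budget:
            selected.append(title)
            spent += cost
    return ranked, selected

journals = {
    "Journal A": (120, 300.0),
    "Journal B": (45, 90.0),
    "Journal C": (10, 250.0),
}
ranking, to_buy = rank_journals(journals, budget=400.0)
print([title for title, _ in ranking])
print(to_buy)
```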

Journal ArticleDOI
TL;DR: Results showed the impracticality of the procedure in an operational setting, but indicated the value of the analyses with sample data in the development and maintenance of keyword dictionaries and thesauri.
Abstract: A study was carried out of (1) the relationship between the vocabulary of user queries and the vocabulary of documents relevant to the queries, and (2) the value of adding to the document description record in a retrieval system keywords from previous queries for which the document had proven useful. Two test databases incorporating user query keywords were implemented at the School of Library and Information Science, University of Western Ontario. Clustering of the documents via title and user keywords, a statistical analysis of title-user keyword co-occurrences, and retrieval tests were used to examine the effect of the added keywords. Results showed the impracticality of the procedure in an operational setting, but indicated the value of the analyses with sample data in the development and maintenance of keyword dictionaries and thesauri.

Journal ArticleDOI
TL;DR: A 13-year analysis of the finances of a major database producer organization, which is also a publisher of abstracting and indexing (A&I) products, finds that there have been decreases in the number of database leases and licences, the number of print product subscriptions, and the excess of income over expenses.
Abstract: A 13-year analysis is presented of the finances of a major database producer organization which is also a publisher of abstracting and indexing (A&I) products. Over this period there have been decreases in the number of database leases and licences, the number of print product subscriptions, and the excess of income over expenses. In constant dollars, the cost of producing an abstract has decreased, the subscription charge for printed products has increased only slightly, the lease and license fees have decreased, royalty charges have increased, and hourly connect fees have remained steady (even though the size of the online file has increased greatly). The problem is that of maintaining a balanced financial status in light of increased income from one class of products, decreased income from another class of products, and increased cost of operation. Income, expenses, excess of income over expenses, and prices have been plotted in both real and constant dollars. The product a user gets for his money, including database quality and growth, is discussed in relation to cost. Possible approaches to ensure economic viability are considered in terms of expenses, efficiency of operation, marketing, products, services, and pricing. The most promising approaches lie in the areas of pricing and the development of new products and services. The possibility of developing a consortium of database producers for offering online services is proposed. There appears to be no alternative to increasing the prices for online users. The data provided relate to one database producing organization, but the trends are considered representative of numerous A&I databases.

Journal ArticleDOI
TL;DR: This paper demonstrates the property of inclusiveness of document retrieval systems where documents are indexed by unweighted descriptors, and in which query search patterns are Boolean functions of descriptors (systems using the Inverted File Method, the Canonical Structure File Method or the Sequential File Method).
Abstract: One of the means of reducing information retrieval time is to take advantage of the property of inclusiveness of information retrieval systems. When the system response to a query that is more general than another query is known, then in an inclusive retrieval system it suffices, in order to retrieve the response to the more specific query, to limit the search to the system response to the more general query. This paper demonstrates the property of inclusiveness for document retrieval systems in which documents are indexed by unweighted descriptors and query search patterns are Boolean functions of descriptors (systems using the Inverted File Method, the Canonical Structure File Method, or the Sequential File Method). The paper presents three methods for determining a partial ordering relation on a set of Boolean search patterns of queries, implying a partial ordering on the set of system responses to these queries, and discusses the adequacy of each method depending on the information retrieval method used. The paper also proves general theorems concerning the property of inclusiveness for the class of information retrieval systems considered. Furthermore, methods for calculating the degree of generality/specificity between queries are given. Utilization of the property of inclusiveness may considerably reduce the operating costs of information retrieval systems (particularly systems using the Sequential File Method, e.g. SDI systems). Moreover, the results presented give the possibility of automatically and gradually narrowing or broadening a given Boolean search pattern of a query, which is of vital importance for on-line information retrieval systems.
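
One simple sufficient test for this ordering can be phrased over queries in disjunctive normal form, as sketched below. This is an illustration only; the paper itself presents three methods and formal theorems, which are not reproduced here.

```python
# With queries in disjunctive normal form over unweighted descriptors, the response to
# Q_specific is contained in the response to Q_general whenever every conjunct of
# Q_specific extends some conjunct of Q_general.
def is_more_specific(q_specific, q_general):
    """Each query is a list of conjuncts; a conjunct is a frozenset of descriptors."""
    return all(any(general <= specific for general in q_general)
               for specific in q_specific)

def answer(query, documents):
    """documents: {doc_id: set of descriptors}; return ids matching any conjunct."""
    return {doc_id for doc_id, descriptors in documents.items()
            if any(conjunct <= descriptors for conjunct in query)}

docs = {1: {"a", "b", "c"}, 2: {"a", "d"}, 3: {"b", "c"}}
q_general  = [frozenset({"a"}), frozenset({"b"})]             # a OR b
q_specific = [frozenset({"a", "b"}), frozenset({"b", "c"})]   # (a AND b) OR (b AND c)

print(is_more_specific(q_specific, q_general))                # True
print(answer(q_specific, docs) <= answer(q_general, docs))    # response inclusion holds
```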

Journal ArticleDOI
TL;DR: It is shown that if one approaches the computer handling of bibliographic records from the point of view of what a computer is capable of doing, rather than adapting and simulating manual methods, it is possible to dispense with virtually all skilled preparation of formalised and highly structured records.
Abstract: The conventional methods of composing bibliographic records, developed over a long period for manual filing systems, are discussed from the point of view of what can be done in a computer system. The tendency has been to adapt the highly skilled and time-consuming process of composing formalised bibliographic records in bibliographic language by introducing a further degree of formalisation and structure in order to enable the computer to interpret the records in a way which simulates manual searching. It is shown that if one approaches the computer handling of bibliographic records from the point of view of what a computer is capable of doing, rather than adapting and simulating manual methods, it is possible to dispense with virtually all skilled preparation of formalised and highly structured records. A working system on this basis is briefly described. A data-base containing over 40,000 bibliographic references has been compiled by simply transcribing the descriptive data found on the title pages and elsewhere on the documents, using clerical personnel with minimal professional supervision. This data-base has been in regular use for 1 1/2 years as an on-line library catalogue for document retrieval, as well as a retrieval system for subject enquiries. The system is based on the Status-2 software developed by AERE, Harwell. Retrieval from natural language text is not new as a means of interrogating data-bases from the point of view of subject matter. The principle is here extended to cover the purely bibliographic elements as well, and to serve the purpose of library cataloguing. The wider implications of the use of natural language bibliographic descriptions, transcribed without modification, beyond the application to individual libraries, are discussed.

Journal ArticleDOI
TL;DR: A technique of online instruction and assistance to bibliographic data base searchers called Individualized Instruction for Data Access (IIDA) assists searchers by providing feedback based on real-time analysis while searches are being performed.
Abstract: A technique of online instruction and assistance to bibliographic data base searchers called Individualized Instruction for Data Access (IIDA) is being developed by Drexel University. IIDA assists searchers by providing feedback based on real-time analysis while searches are being performed. Extensive help facilities which draw on this analysis are available to users. Much of the project's experimental work, as described elsewhere [1–3], is concerned with the process of searching and the behavior of searchers. This paper will largely address itself to the project's computer system, which is being developed by subcontract with the Franklin Institute's Science Information Services.

Journal ArticleDOI
TL;DR: The state-of-the-art of predictive modelling is discussed with respect to syntactic, semantic, and pragmatic criteria, emphasizing the need for concentrated effort in further development of the empirical foundation of information science.
Abstract: The problem of modelling information systems is studied with focus on predictability. Predictability presupposes discovery and knowledge of empirical laws and theories, which are in the domain of information science. Discovery of such laws and theories goes hand in hand with the development of the capability to measure important variables in that domain. The state-of-the-art of predictive modelling is discussed with respect to syntactic, semantic, and pragmatic criteria, emphasizing the need for concentrated effort in further development of the empirical foundation of information science.

Journal ArticleDOI
TL;DR: The problem of information retrieval is discussed from a theoretical point of view, followed by an analysis of the reference process and data thereby gathered, leading to a description of REFLES in terms of its hardware and software.
Abstract: REFLES is a microcomputer-based system for data retrieval in library environments. The problem of information retrieval is discussed from a theoretical point of view, followed by an analysis of the reference process and data thereby gathered, leading to a description of REFLES in terms of its hardware and software. REFLES, a prototype system at present, currently functions in a test environment. Examples of data contained in the system and of its use are presented. Future considerations and speculations on other versions of the system conclude the paper.

Journal ArticleDOI
TL;DR: The work reported in this paper uses subject headings, structured according to the postulates and principles of facet analysis, as the input for generating a thesaurus, using test data of about 1500 subject-propositions from Leather Technology.
Abstract: Several experiments were conducted at the Documentation Research and Training Centre on thesaurus generation. The work reported in this paper uses subject headings, structured according to the postulates and principles of facet analysis, as the input for generating a thesaurus. The system uses a coding scheme for augmenting the subject headings to make them suitable for computer manipulation. Once the subject headings are coded and input, the other processes are carried out automatically. The system has five phases, namely the Translation Phase, Term-pair Generation Phase, Coordinate Term-pair Generation Phase, Retranslation Phase and Printing Phase. The system is described briefly, giving the system flow-chart, the inputs and outputs of the different phases, and a sample printout of a model thesaurus generated using test data of about 1500 subject-propositions from Leather Technology.
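
A highly simplified sketch of the term-pair generation idea follows; the sample headings, the facet positions and the pairing rules are illustrative assumptions, not DRTC's actual coding scheme or phases.

```python
# A faceted subject heading is treated as an ordered chain of terms; hierarchical
# term pairs follow chain order within a heading, and coordinate pairs link terms
# occupying the same facet position across headings.
from itertools import combinations
from collections import defaultdict

def hierarchical_pairs(heading):
    """heading: list of terms from a facet-analysed subject heading, general to specific."""
    return [(broader, narrower) for broader, narrower in combinations(heading, 2)]

def coordinate_pairs(headings):
    """Terms occupying the same facet position in different headings are coordinates."""
    by_position = defaultdict(set)
    for heading in headings:
        for position, term in enumerate(heading):
            by_position[position].add(term)
    pairs = set()
    for terms in by_position.values():
        pairs.update(combinations(sorted(terms), 2))
    return pairs

headings = [
    ["Leather", "Tanning", "Chrome tanning"],
    ["Leather", "Finishing", "Dyeing"],
]
print(hierarchical_pairs(headings[0]))
print(coordinate_pairs(headings))
```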

Journal ArticleDOI
TL;DR: The Multilevel Information System (MLIS), an extension of the typical information retrieval system towards more complete data processing, is discussed; it integrates functions typical of database management systems and retrieval-oriented systems.
Abstract: The Multilevel Information System (MLIS), an extension of the typical information retrieval system towards more complete data processing, is discussed. MLIS integrates functions typical of database management systems and retrieval-oriented systems. Several levels of data access are provided, each developed for a different class of users. The end-user level is based on a simple query language, the trained-user level on a relational model, and the application-programmer level on a Data Manipulation Language nested in a high-level programming language. The last two levels are discussed in detail.

Journal ArticleDOI
TL;DR: A simple mathematical model for the dynamics of the interaction between libraries and publishers is analyzed and it is argued that present trends are unlikely to continue, but that a discontinuous shift in the production of scholarly output is likely to occur within a decade or two.
Abstract: Libraries and publishers have evolved together. Publishers rely on libraries as a minimum market for their scholarly products. Inflationary pressures have caused publishers to increase prices that, in turn, strain library budgets that have not increased as fast, and which, in turn, undermine the minimal demand publishers can count on, adding to inflationary pressure. A simple mathematical model for the dynamics of the interaction between libraries and publishers is analyzed. The model derives a function for the supply curve of scholarly publications and is used to estimate when an institution would have to spend as much per person on library support as on that person's salary if present trends continued. This is used to argue that present trends are unlikely to continue, but that a discontinuous shift in the production of scholarly output is likely to occur within a decade or two. Likely new forms of communication among scholars in "communicating classes", involving nearly simultaneous communication and a new kind of organized cumulative record, are discussed. The implications for institutional change, not only in libraries and publishers and their interrelation but also in new kinds of institutions, are sketched.
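
The feedback loop can be caricatured in a few lines of simulation; all parameters below are invented, and this is not the paper's model or its supply-curve derivation, only a sketch of the price/budget spiral it describes.

```python
# Journal prices inflate faster than library budgets; subscriptions equal budget
# divided by price; falling subscriptions feed back into further price increases
# because the publisher spreads fixed costs over a smaller guaranteed market.
def simulate(years=30, price=100.0, budget=10_000.0,
             price_inflation=0.06, budget_growth=0.03, feedback=0.4):
    initial_subscriptions = budget / price
    history = []
    for year in range(1, years + 1):
        subscriptions = budget / price                     # library buys what it can afford
        lost_share = 1.0 - subscriptions / initial_subscriptions
        # publisher passes inflation plus part of the shrinking market on to prices
        price *= 1.0 + price_inflation + feedback * max(0.0, lost_share)
        budget *= 1.0 + budget_growth
        history.append((year, round(price, 2), round(subscriptions, 1)))
    return history

for year, price, subs in simulate()[::5]:
    print(f"year {year:2d}  price {price:9.2f}  subscriptions {subs:6.1f}")
```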

Journal ArticleDOI
TL;DR: The principles and methodology of the Warsaw Clearinghouse activity are presented, together with an information retrieval system, TEKLA, designed to improve the Clearinghouse service.
Abstract: The principles and methodology of the Warsaw Clearinghouse activity are presented. The Clearinghouse is a specialized bibliographic service for collecting, processing and disseminating information on thesauri, classification systems and schedules, descriptors, keywords, and subject heading lists. An information retrieval system, TEKLA, designed to improve the Clearinghouse service is also presented.

Journal ArticleDOI
TL;DR: The special requirements of a bibliographic database are reviewed and how they are met in the database system of DOBIS-LIBIS (Dortmund Library System-Leuven Library System) are shown.
Abstract: The database to be used with an online bibliographic information system must meet a number of requirements which are often not satisfied by conventional database management systems. Most important of these is the requirement for full authority file control over the indexes to the database. This paper reviews the special requirements of a bibliographic database and shows how they are met in the database system of DOBIS-LIBIS (Dortmund Library System-Leuven Library System).


Journal ArticleDOI
TL;DR: By the introduction of labeled terms, the expressive power of the description of the semantic information of documents and queries is considerably increased in comparison with a version using simple terms.
Abstract: This paper deals with a mathematical formulation of document information systems. The model of document information systems is given as an interpretation of a formal language. In the formal language, the concept of labeled terms is introduced as the most basic concept, from the viewpoint of a hierarchical semantic structure analysis of documents. The labels are intended to denote the context information, called information functions in this paper, in the text of documents. By the introduction of labeled terms, the expressive power of the description of the semantic information of documents and queries is considerably increased in comparison with a version using simple terms. Various conventional retrieval methods, and a more general retrieval method within the augmented framework, are systematically described.
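
An illustrative sketch of what labeled-term indexing buys over simple terms is given below; the labels, terms and documents are invented, and the formal-language machinery of the paper is not reproduced.

```python
# Index entries are (term, label) pairs, where the label names the information
# function (e.g. Theme, Method, Result) of the text the term came from, so a query
# can ask for a term in a particular role rather than anywhere in the document.
from collections import defaultdict

def build_index(documents):
    """documents: {doc_id: [(term, label), ...]}; index: (term, label) -> doc ids."""
    index = defaultdict(set)
    for doc_id, labeled_terms in documents.items():
        for term, label in labeled_terms:
            index[(term, label)].add(doc_id)
    return index

def search(index, term, label=None):
    """label=None reduces to ordinary simple-term retrieval."""
    if label is not None:
        return set(index.get((term, label), set()))
    return set().union(*[ids for (t, _), ids in index.items() if t == term])

documents = {
    "D1": [("clustering", "Theme"), ("precision", "Result")],
    "D2": [("clustering", "Method"), ("recall", "Result")],
}
index = build_index(documents)
print(search(index, "clustering"))            # simple term: both documents
print(search(index, "clustering", "Method"))  # labeled term: only D2
```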