
Showing papers in "Information Processing and Management in 1996"


Journal ArticleDOI
TL;DR: This volume is neither an original contribution to the LIS literature, nor a particularly insightful addition to the voluminous rhetoric that addresses the form that future information services will take; it is characterized by platitudes and unsubstantiated claims.
Abstract: electronically. This kind of proposition, offered without convincing argument, is typical of articles in this book. Another example is Haack et al's assumption, when describing the University of Hawaii's plans to create a Library External Services Plan to stimulate economic development, that commercialization of library services and products results in only positive outcomes. Apparently the authors do not recognize they are making such an assumption; thus, their suggestions for service delivery ought not be taken at face value. The one exception to the uncritical presentation of ideas is Dusenbury and Pease's insightful discussion of changes in bibliographic instruction resulting from pressures as diverse as changing student expectations, demographics, technology and user education goals. This chapter stands alone in suggesting that technological solutions will not comprehensively solve all the challenges facing information service delivery systems. However, more typical of the book is Morrison's discussion of future reference service and suggestion that future reference librarians essentially will become little more than expert system designers. The remainder of this volume similarly fails to deliver on the title's promise, providing only banal descriptions of how information services are being or ought to be delivered. In summary, for the most part this volume is neither an original contribution to the LIS literature, nor a particularly insightful addition to the voluminous rhetoric to which the LIS community is subjected that addresses the form that future information services will take. It is characterized by platitudes and unsubstantiated claims. If used by professionals to assist their decision-making, or by students of LIS to explore the changes occurring in information service delivery, this use should start with a critical examination of the authors' various assumptions and contentions.

398 citations


Journal ArticleDOI
TL;DR: In this article, the authors recommend a valuable book written by an experienced author and give reasons why it is worth reading.
Abstract: Any book that you read, no matter how you came by it, will do you good. Here we recommend one book that you need to read: Does Technology Drive History? The Dilemma of Technological Determinism. We will show you the reasonable reasons why you need to read this book, a valuable book written by an experienced author.

364 citations


Journal ArticleDOI
TL;DR: These measures deal with phonetic similarity, typing errors and plain string similarity and it is shown experimentally that all three approaches lead to significantly higher retrieval quality than plain identity.
Abstract: Searching for names, e.g. author names or company names, is still an open problem. This paper reviews known similarity measures. These measures deal with phonetic similarity, typing errors and plain string similarity. It is shown experimentally that all three approaches lead to significantly higher retrieval quality than plain identity. Further improvements are possible by combining different methods; a probabilistic interpretation of string similarity is developed that leads to better results than an ad-hoc approach.
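The three families of measures compared here can be sketched with generic textbook implementations (illustrative stand-ins, not the paper's exact formulations): Soundex for phonetic similarity, Levenshtein edit distance for typing errors, and character-bigram overlap for plain string similarity.

```python
def soundex(name):
    # Phonetic similarity: classic Soundex code (first letter + 3 digits).
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    out, prev = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        d = codes.get(ch, "")
        if d and d != prev:
            out += d
        if ch not in "HW":          # H and W do not separate equal codes
            prev = d
    return (out + "000")[:4]

def levenshtein(a, b):
    # Typing errors: minimum number of insertions, deletions, substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def bigram_dice(a, b):
    # Plain string similarity: Dice coefficient over character bigrams.
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 1.0
```

All three judge variant spellings such as "Meyer" and "Meier" as close where exact matching fails, which is the effect the experiments measure against plain identity.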

108 citations


Journal ArticleDOI
TL;DR: This paper used text similarity measurements to determine relationships between natural language texts and text excerpts, and the resulting linked hypertext maps can be decomposed into text segments and text themes, and these decompositions are usable to identify different text types and text structures.
Abstract: Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text excerpts. The resulting linked hypertext maps can be decomposed into text segments and text themes, and these decompositions are usable to identify different text types and text structures, leading to improved text access and utilization. Examples of text decomposition are given for expository and non-expository texts.
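A minimal sketch of the underlying idea (plain bag-of-words cosine similarity with an arbitrary threshold, far simpler than the paper's measurements): text excerpts whose similarity exceeds a threshold are linked, and the resulting links form the hypertext map.

```python
from collections import Counter
from math import sqrt

def cosine(text_a, text_b):
    # Bag-of-words cosine similarity between two text excerpts.
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def link_segments(segments, threshold=0.3):
    # Link every pair of segments whose similarity exceeds the threshold,
    # yielding the edges of a hypertext-like similarity map.
    return [(i, j) for i in range(len(segments)) for j in range(i + 1, len(segments))
            if cosine(segments[i], segments[j]) >= threshold]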

105 citations


Journal ArticleDOI
TL;DR: In this paper, a review of the literature on organizational effectiveness suggests that it may not be possible to find a precise measure of IS effectiveness and the criteria for effectiveness may vary from organization to organization.
Abstract: Information systems (IS) effectiveness is a complex variable. The literature on organizational effectiveness suggests that it may not be possible to find a precise measure of IS effectiveness and the criteria for effectiveness may vary from organization to organization. A popular perceptual construct, user satisfaction, is examined through a review of IS effectiveness literature. Problems with this construct are highlighted and the social psychology literature is used to clarify these problems. It is noted that theories and models from the behavioural sciences offer a sound basis for understanding the problems with conceptualization and operationalization of user satisfaction. As a result of this review, we offer some principles to keep in mind when utilizing user satisfaction as a measure of IS effectiveness.

104 citations


Journal ArticleDOI
TL;DR: This study recommends query expansion using retrieval feedback for adding MeSH search terms to a user's initial query.
Abstract: This paper evaluates the retrieval effectiveness of query expansion strategies on a MEDLINE test collection using Cornell University's SMART retrieval system. Three expansion strategies are tested on their ability to identify appropriate MeSH terms for user queries: expansion using an inter-field statistical thesaurus, expansion via retrieval feedback and expansion using a combined approach. These expansion strategies do not require prior relevance decisions. The study compares retrieval effectiveness using the original unexpanded and the alternative expanded user queries on a collection of 75 queries and 2334 MEDLINE citations. Retrieval effectiveness is assessed using eleven-point average precision scores (11-AvgP). The combination of expansion using the thesaurus followed by retrieval feedback gives the best improvement of 17% over a baseline performance of 0.5169 11-AvgP. However, this improvement is almost identical to that achieved by expansion via retrieval feedback alone (16.4%). Query expansion using the inter-field thesaurus gives a significant but lower performance improvement (9.9%) over the same baseline. This study recommends query expansion using retrieval feedback for adding MeSH search terms to a user's initial query.
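The retrieval-feedback strategy can be sketched generically (function and parameter names here are illustrative, and the study's SMART-based expansion weights terms rather than simply counting them): the top-ranked documents from an initial run are assumed relevant, and their most frequent new terms are appended to the query.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs=3, add_terms=2):
    # Retrieval ("blind") feedback: treat the top-ranked documents as
    # relevant and add their most frequent new terms to the query.
    # No prior relevance decisions are required.
    pool = Counter()
    for doc in ranked_docs[:top_docs]:
        pool.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in pool.most_common(add_terms)]
```
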

97 citations


Journal ArticleDOI
TL;DR: Applications of discourse analysis to information include investigation of the social, political, and technical uses of the word “information” as they have implications for theory and practice.
Abstract: Library and information science (LIS) is a discipline based on communication. Research questions in LIS include those focusing on the retrieval and use of information, information services, and information technology. Moreover, the questions asked and the thought relevant to the study of information are communicated formally within the profession, primarily through the literature. This sensitivity to communication suggests that discourse analysis is a valuable method for inquiry in LIS. Discourse analysis has the advantage of being able to address questions regarding both spoken and written communications and so can be applied to matters of articulations of purpose and practice of information study that appear in books and journals in the field. Two key elements of language form the heart of discourse analysis: form (the structure of the language as code, including grammar and semantics) and function (language as a social phenomenon). Applications of discourse analysis to information include investigation of the social, political, and technical uses of the word “information” as they have implications for theory and practice.

96 citations


Journal ArticleDOI
TL;DR: Examination of the technological background and of the Graduate Library School, University of Chicago, suggests that there was a temporary paradigm change away from design and technological innovation.
Abstract: Three related questions are addressed: why was the work of the European documentalists largely ignored in the U.S.A. before the Second World War? What was the “information science vs library science” argument about? Technological innovation was a vital force in library science in the late 19th century and again after 1950. Why was it not a vital force in between? Examination of the technological background and of the Graduate Library School, University of Chicago, suggests that there was a temporary paradigm change away from design and technological innovation. Arguments over “information science” reflected a reversal of that paradigm.

83 citations


Journal ArticleDOI
TL;DR: The paper concludes that the history of information science is an historical interdiscipline and those interested in it need to draw on a range of related historical studies such as the history of science and technology, the history of printing and publishing, and the history of information institutions such as libraries, archives and museums.
Abstract: The first part of this paper examines some of the difficulties for the historian of information science that arise from the lack of agreement as to what precisely constitutes information science and from its commonly accepted interdisciplinary nature. It examines in this connection Machlup and Mansfield's ideas about a “narrow” information science and information science as a composite of disciplinary chunks. Regardless of these issues, it demonstrates that the history of information science is gaining an identity both bibliographically and socially. The second part of the paper suggests that as a condition of their organization, reproduction, and control all societies have evolved their own distinctive ways of managing information. Ultimately, then, the history of information science can be considered to extend far beyond the last 50 years where attention is commonly focused. Drawing on Braudel's notions of the longue, moyenne, and courte durée, the paper suggests an approach to periodicity that provides a new perspective for the history of information science. The paper also introduces the notions of synchrony and diachrony to suggest other approaches to the historical study of aspects of information science. The paper concludes that the history of information science is an historical interdiscipline and those interested in it need to draw on a range of related historical studies such as the history of science and technology, the history of printing and publishing, and the history of information institutions such as libraries, archives and museums.

77 citations


Journal ArticleDOI
TL;DR: What is it about the human mind that accounts for the fact that the authors can speak and understand a language?
Abstract: What is it about the human mind that accounts for the fact that we can speak and understand a language? Why can't other creatures do the same? And what does this tell us about the rest of human abilities? Recent dramatic...

71 citations


Journal ArticleDOI
TL;DR: Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.
Abstract: This paper describes experiments in the retrieval of spoken documents in multimedia systems. Speech documents pose a particular problem for retrieval, since the words they contain, and hence their contents, are not known in advance. The work reported addresses this problem, for a video mail application, by combining state-of-the-art speech recognition with established document retrieval technologies so as to provide an effective and efficient retrieval tool. Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.

Journal ArticleDOI
TL;DR: It is shown that average precision and recall is not affected for the full text document collection when the OCR version is compared to its corresponding corrected set and that even though feedback improves retrieval for both collections, it can not be used to compensate for OCR errors caused by badly degraded documents.
Abstract: We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall is not affected for our full text document collection when the OCR version is compared to its corresponding corrected set. We do see divergence though between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it can not be used to compensate for OCR errors caused by badly degraded documents.
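The cosine normalization singled out here can be sketched in a minimal vector-space weighting function (a generic tf-idf illustration, not the paper's exact weighting combinations): dividing each document vector by its Euclidean length prevents long documents, or OCR output inflated with spurious tokens, from being favoured merely for containing more terms.

```python
from math import log, sqrt

def tfidf_vectors(docs, normalize=True):
    # Vector-space tf-idf weighting with optional cosine normalization.
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = {}
    for toks in tokenized:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    vecs = []
    for toks in tokenized:
        tf = {}
        for t in toks:
            tf[t] = tf.get(t, 0) + 1
        v = {t: c * log(n / df[t]) for t, c in tf.items()}
        if normalize:
            # Cosine normalization: scale the vector to unit length.
            length = sqrt(sum(w * w for w in v.values()))
            if length:
                v = {t: w / length for t, w in v.items()}
        vecs.append(v)
    return vecs
```
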

Journal ArticleDOI
TL;DR: A detailed analysis of prior results and their underlying experimental designs indicates that there are a number of open questions relevant to the overall debate on indexing vocabularies for MEDLINE, and results from a new experiment assessing eight different retrieval strategies indicate that MeSH does have an important role in text retrieval.
Abstract: The indexing vocabulary is an important determinant of success in text retrieval. Researchers have compared the effectiveness of indexing using free-text and controlled vocabularies in a variety of text contexts. A number of studies have investigated the relative merits of free-text, MeSH and UMLS Metathesaurus indexing vocabularies for MEDLINE document indexing. Most of these studies conclude that controlled vocabularies offer no advantages in retrieval performance over free-text. This paper offers a detailed analysis of prior results and their underlying experimental designs. The analysis indicates that there are a number of open questions relevant to the overall debate on indexing vocabularies for MEDLINE. This paper also offers results from a new experiment assessing eight different retrieval strategies. These strategies involve document indexing via free-text, MeSH and several alternative combinations of the two vocabularies. The results indicate that MeSH does have an important role in text retrieval.

Journal ArticleDOI
TL;DR: There is much in the story of the ICD that remains to be more fully examined, given the classification's long life, its international reach, and the developments to which it has been subject.
Abstract: The author presents a brief account of the history of the International Classification of Diseases, showing that, as information infrastructure, it is as much a social construct as the product of a rigorous scientific process of development. He introduces an arresting concept of information infrastructure inversion which it would be interesting to see explored in a range of other contexts. But there is also much in the story of the ICD that remains to be more fully examined, given the classification's long life, its international reach, and the developments to which it has been subject. One wonders also what a comparative study of the development of the International Classification of Diseases and of the International Catalogue of Scientific Papers, for example, would reveal about the social process of international information infrastructure organisation and support, one having been adaptable and still alive, the other apparently unable to adapt and now an historical monument.

Journal ArticleDOI
TL;DR: Results indicate that relevance has strong relationships with process, product and overall user satisfaction measures while relevance and cost-benefit satisfaction measures have no significant relationship and that understanding the proper units of analysis for these measures helps resolve the paradox of the management information system and information science literatures not informing each other concerning user-based information system performance measures.
Abstract: The goal of this research was to better understand the relationship between relevance and user-satisfaction, the two predominant aspects of user-based performance in information systems. This project unconfounds relevance and user-satisfaction assessments of system performance at the retrieved item level. To minimize the idiosyncrasies of any one system, a generalized, naturalistic information system was employed in this study. Respondents completed sense-making timeline questionnaires in which they described a recent need they had for geographic information. Retrieved documents from the generalized system consisted of the responses users obtained while resolving their information needs. Respondents directly provided process, product, cost-benefit, and overall satisfaction assessments with the generalized geographic system. Relevance judgments of retrieved items were obtained through content analysis from sense-making questionnaires as a secondary observation technique. The content analysis provided relevance values on both five-category and two-category scales. Results indicate that relevance has strong relationships (gamma values from 0.74 to 0.89) with process, product and overall user satisfaction measures while relevance and cost-benefit satisfaction measures have no significant relationship (gamma value of 0.049). This analysis also indicates that neither relevance nor user-satisfaction subsumes the other concept, and that understanding the proper units of analysis for these measures helps resolve the paradox of the management information system and information science literatures not informing each other concerning user-based information system performance measures.
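The gamma values reported are presumably Goodman-Kruskal gamma, the standard rank-association measure for ordinal scales such as the five-category relevance judgments used here; a minimal implementation:

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    # Gamma = (concordant - discordant) / (concordant + discordant),
    # over all pairs of observations; tied pairs are ignored.
    # Ranges from -1 (perfect disagreement) to 1 (perfect agreement).
    conc = disc = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc) if conc + disc else 0.0
```
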

Journal ArticleDOI
TL;DR: This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods; the relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach.
Abstract: Latent Semantic Indexing (LSI) is an effective automated method for determining if a document is relevant to a reader based on a few words or an abstract describing the reader's needs. A particular feature of LSI is its ability to deal automatically with synonyms. LSI generally is explained in terms of a mathematical concept called the Singular Value Decomposition and statistical methods such as factor analysis. This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods. The relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach.
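A compact sketch of the SVD view of LSI described above (a generic illustration, not the paper's regression or Bayesian formulation): the term-document matrix is truncated to k latent dimensions, the query is folded into that space, and documents are scored by cosine similarity there.

```python
import numpy as np

def lsi_scores(term_doc, query_vec, k=2):
    # Truncated SVD of the term-document matrix (rows = terms,
    # columns = documents), keeping the k largest singular values.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    docs_k = (np.diag(sk) @ Vtk).T      # documents in latent space
    q_k = query_vec @ Uk                # query folded into latent space

    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na and nb else 0.0

    return [cos(q_k, d) for d in docs_k]
```

With terms (car, automobile, petal) and a query containing only "car", a document containing only "automobile" still scores highly in the truncated space, which illustrates the automatic handling of synonyms mentioned above.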

Journal ArticleDOI
TL;DR: The LUST system as discussed by the authors learns the characteristics of the language or sublanguage used in document abstracts by learning from the document rankings obtained from the parsed abstracts, without the prior imposition of some common grammatical assumptions (e.g. part-of-speech assumptions).
Abstract: The grammars of natural languages may be learned by using genetic algorithms that reproduce and mutate grammatical rules and part-of-speech tags, improving the quality of later generations of grammatical components. Syntactic rules are randomly generated and then evolve; those rules resulting in improved parsing and occasionally improved retrieval and filtering performance are allowed to further propagate. The LUST system learns the characteristics of the language or sublanguage used in document abstracts by learning from the document rankings obtained from the parsed abstracts. Unlike the application of traditional linguistic rules to retrieval and filtering applications, LUST develops grammatical structures and tags without the prior imposition of some common grammatical assumptions (e.g. part-of-speech assumptions), producing grammars that are empirically based and are optimized for this particular application.

Journal ArticleDOI
TL;DR: The design and implementation of TACHIR, a tool for the automatic construction of hypertexts for Information Retrieval through the use of an authoring methodology employing a set of well known Information retrieval techniques, automatically builds up a hypertext from a document collection.
Abstract: The paper describes the design and implementation of TACHIR, a tool for the automatic construction of hypertexts for Information Retrieval. Through the use of an authoring methodology employing a set of well known Information Retrieval techniques, TACHIR automatically builds up a hypertext from a document collection. The structure of the hypertext reflects a three-level conceptual model that has proved to be quite effective for Information Retrieval. Using this model it is possible to navigate among documents, index terms, and concepts using automatically determined links. The hypertext is implemented using HTML, the markup language of the World Wide Web project. It can be distributed on different sites and different machines over the Internet, and it can be navigated using any of the interfaces developed in the framework of the World Wide Web project, for example Netscape.

Journal ArticleDOI
Young C. Park1, Key-Sun Choi1
TL;DR: A formal approach for the data-sparseness problem, which is crucial in constructing a thesaurus, is developed and the validity of this approach is shown by experiments.
Abstract: Automatic thesaurus construction is accomplished by extracting term relations mechanically. A popular method uses statistical analysis to discover the term relations. For low-frequency terms, however, the statistical information of the terms cannot be reliably used for deciding the relationship of terms. This problem is generally referred to as the data-sparseness problem. Unfortunately, many studies have shown that low-frequency terms are of most use in thesaurus construction. This paper characterizes the statistical behavior of terms by using an inference network. A formal approach for the data-sparseness problem, which is crucial in constructing a thesaurus, is developed. The validity of this approach is shown by experiments.
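The statistical baseline the paper improves on can be sketched as document-level co-occurrence scored by pointwise mutual information (an illustrative stand-in for the paper's inference-network model). Note how the min_freq cutoff simply discards low-frequency terms: this is exactly the data-sparseness problem the paper addresses, since those discarded terms are the ones most useful for thesaurus construction.

```python
from collections import Counter
from itertools import combinations
from math import log

def cooccurrence_thesaurus(docs, min_freq=2):
    # Relate terms by pointwise mutual information (PMI) over
    # document-level co-occurrence counts.
    tf, pair = Counter(), Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        tf.update(terms)
        pair.update(combinations(sorted(terms), 2))
    n = len(docs)
    related = {}
    for (a, b), c in pair.items():
        if tf[a] >= min_freq and tf[b] >= min_freq:
            # PMI = log P(a,b) / (P(a) P(b)); positive => associated.
            related[(a, b)] = log((c / n) / ((tf[a] / n) * (tf[b] / n)))
    return related
```
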


Journal ArticleDOI
TL;DR: The aim of this study was to design an information retrieval system permitting the “personalization” of search, by taking into account user profile, and to test such a hypothesis using an existing information retrieval system incorporating full-text natural language processing tools.
Abstract: Due to the ever-increasing quantity of available information, which users have to scan in order to find relevant items, noise has become a major issue in the implementation and use of information retrieval systems. The aim of this study was to design an information retrieval system permitting the “personalization” of search, by taking into account user profile. A pre-orientation system was first developed to give access to a personalized subcorpus. To limit noise in information retrieval systems, the textual material offered to the user is reduced and contains only those sections (units) of the document that interest him and are significant to him (where textual material is used in the sense of document units to be processed by content analysis in order to build descriptions of the documents). In this way, the documents are structured on the basis of utility functions. The selected document units are part of the sub-corpus defined by the pre-orientation system. Next, the profile of each user is characterized by determining competence in a given field and at different levels. Each user is characterized by:
• stable information, related to the person rather than to a particular search. This information provides a general description of the user and his habits;
• variable information, related to a specific search. The priority here is to describe the objective of the search (search may be either exhaustive or non-exhaustive; it may concern specialized or popular publications, etc.).
The function of the pre-orientation system is to associate a set of characteristics applying to document units to a given user profile. Search is then applied only to the subset of the selected document units that are relevant to the user and established following his profile. Document units are not characterized on the basis of thematic criteria related to content, but rather on the basis of criteria relating to utility.
The objective was to propose a hypothesis on the different parameters determining user profile and document unit characteristics, and to test such a hypothesis using an existing information retrieval system incorporating full-text natural language processing tools.
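The pre-orientation step can be caricatured as a filter over document units (the attribute and profile names below are invented for illustration; the paper's actual utility criteria are richer):

```python
def select_units(units, profile):
    # Keep only document units matching the user's fields of competence
    # and not exceeding the level of specialisation in the profile.
    return [u for u in units
            if u["field"] in profile["fields"]
            and u["level"] <= profile["max_level"]]
```
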

Journal ArticleDOI
TL;DR: The Getty Online Searching Project studied the end-user searching behavior of 27 humanities scholars over a 2-year period and found that a number of scholars anticipated—and found—that they were already familiar with a very high percentage of the records their searches retrieved.
Abstract: The Getty Online Searching Project studied the end-user searching behavior of 27 humanities scholars over a 2-year period. Surprising results were that a number of scholars anticipated—and found—that they were already familiar with a very high percentage of the records their searches retrieved. Previous familiarity with documents has been mentioned in discussion of relevance and information retrieval (IR) theory, but it has generally not been considered a significant factor. However, these experiences indicate that high document familiarity can be a significant factor in searching. Some implications are drawn regarding the impact of high document familiarity on relevance and IR theory. Finally, some speculations are made regarding high document familiarity and Bradford's Law.

Journal ArticleDOI
TL;DR: Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it; a new method of forming relevance judgments that is suitable for assessing recall is shown to give different results.
Abstract: Recall and precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognized as problematic. To compare performance of different systems, standard collections of documents, queries, and relevance judgments have been used. Unfortunately the standard collections, such as SMART and TREC, have locked in a particular approach to relevance that is suitable for assessing precision but not recall. The problem is demonstrated by comparing two information retrieval methods over several queries, and showing how a new method of forming relevance judgments that is suitable for assessing recall gives different results. Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it.
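The asymmetry the paper exploits is visible in the definitions themselves: precision needs relevance judgments only for the retrieved set, while recall needs the complete set of relevant documents, which standard pooled judgments cannot guarantee.

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved documents that are relevant.
    # Recall:    fraction of relevant documents that were retrieved --
    # computable only if the full relevant set is actually known.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```
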

Journal ArticleDOI
TL;DR: This study proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness and builds a hypertext based on two medium-size collections, CACM and CISI.
Abstract: When searching for information in a hypertext is limited to navigation, it is not an easy task, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must therefore be provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models which consider documents as independent entities, and ignore hypertext links. To promote the use of other information retrieval mechanisms adapted to hypertext systems, this study attempts to respond to the following questions: (1) How can we integrate information given by hypertext links into an information retrieval scheme? (2) Are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness? (3) If so, how can we use them? Two solutions are: (a) using a default weight function based on link type or assigning the same strength to all link types; or (b) using a specific weight for each particular link, i.e. the level of association or a similarity measure. This study proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness. To carry out our investigations, we have built a hypertext based on two medium-size collections, CACM and CISI. The hypergraph is composed of explicit links (bibliographic references), computed links based on bibliographic information (bibliographic coupling, co-citation), or hypertext links established according to document representatives (nearest neighbor).
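Solution (a), a single default strength for every link, can be sketched as a one-step propagation of retrieval scores along the link graph (alpha and the additive scheme are illustrative assumptions, not the paper's exact extended vector formulation):

```python
def link_boosted_scores(base_scores, links, alpha=0.3):
    # Boost each document's vector-space score by a fraction alpha of
    # its linked neighbours' scores; every link carries the same
    # default strength alpha.
    boosted = dict(base_scores)
    for a, b in links:
        boosted[a] += alpha * base_scores[b]
        boosted[b] += alpha * base_scores[a]
    return boosted
```
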

Journal ArticleDOI
TL;DR: This review encourages the reader to try Ethics of Information Management as a worthwhile piece of reading material.
Abstract: Feel lonely? What about reading books? A book is one of the greatest friends to accompany you in your lonely time. When you have no friends and activities somewhere and sometimes, reading a book can be a great choice. This is not only for spending the time; it will increase the knowledge. Of course the benefits you take will relate to what kind of book you are reading. And now, we would encourage you to try reading Ethics of Information Management as one such reading material to finish quickly.


Journal ArticleDOI
TL;DR: Another perspective on key MIS issues is provided by examining published MIS articles, revealing differences between the issues that appeared important in MIS publications and those that appeared significant to top executives in key issue studies.
Abstract: Reports of key MIS issues based on the perceptions of senior IS executives appear periodically in the MIS literature. In this article, we provide another perspective on key MIS issues by examining published MIS articles. A content analysis of MIS articles appearing between 1989 and mid-year 1993 in prominent academic and practitioner journals has been conducted in order to: identify, classify, and prioritize by meta-categories the key issues in MIS publications; to perform a trend analysis of the various meta-categories; and to examine the relevance of issues by providing a comparison with the issues that emerged out of previous key issue studies. Twenty-six key issues are ranked according to their frequency of occurrence as the topic of inquiry in the 630 articles surveyed. Further, a year-by-year analysis of publications from 1989 to 1992 provides some visible trends. The study also reveals differences that exist between the issues that appeared as important in MIS publications and those that appeared significant to the top executives in key issue studies. Reasons for and implications of these differences are offered.

Journal ArticleDOI
TL;DR: Examination of the published and unpublished documentation and transcripts of oral histories reveals a sense of a significant era and a vital, exciting time in the individual professional lives of the online pioneers.
Abstract: The historical development of online systems and services is not just a story of tapes and disks, terminals and telephones, search engines and algorithms, demonstrations and downtime; it is also a story of people. Examination of the published and unpublished documentation and transcripts of oral histories reveals a sense of a significant era and a vital, exciting time in the individual professional lives of the online pioneers. The leaders of the online age can be divided into three groups: the developers, the managers and trainers, and the users. The developers were diverse in their geographic and disciplinary backgrounds and their underlying goals, but they all were aggressive, competitive, and imaginative in creating opportunities to exploit the latest hardware and software of the period. The second group, managers and trainers, energetically demonstrated the unreliable online systems. With zeal, perseverance, charm, and even chicanery, they recruited and trained the first users. The users were the third group, playing a critical role in evaluating new systems, testing documentation, and assessing training programs.


Journal ArticleDOI
TL;DR: In this article, the statistical significance of windows is computed, based on the presence of terms in titles, abstracts, citations, and section headers, as well as binary-independent and inverse-document-frequency weightings.
Abstract: Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs practice spectrum. This distinction is also studied here using the type-token ratio to differentiate between sublanguages. The statistical significance of windows is computed, based on the presence of terms in titles, abstracts, citations, and section headers, as well as binary-independent and inverse-document-frequency weightings. The characteristics of windows are studied by examining their within-window density and the S concentration, the concentration of terms from various document fields (e.g. title, abstract) in the fulltext. The rate of window occurrences from the beginning to the end of document fulltext differs between academic fields. Different syntactic structures in sublanguages are examined, and their use is considered for discriminating between specific academic disciplines and, more generally, between theory vs practice or knowledge vs applications-oriented documents.
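The type-token ratio used here to differentiate between sublanguages is simple to state:

```python
def type_token_ratio(text):
    # Type-token ratio: distinct words (types) over total words (tokens).
    # Higher values indicate a more varied vocabulary, which is one way
    # sublanguages of different disciplines can be told apart.
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```
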