
Showing papers in "Journal of the Association for Information Science and Technology in 1996"




Journal ArticleDOI
TL;DR: In this paper, a conceptual framework that links the world of the information poor with a world of insiders is developed, showing that insiders, because of their status, reinforce information poverty by neglecting to accept sources of information not created by themselves.
Abstract: Drawing upon a series of studies that examines the information world of poor people, the author discovers four critical concepts that serve as the basis for defining an impoverished life-world. These concepts are risk-taking, secrecy, deception, and situational relevance. Moving back and forth among the worlds of janitors, single mothers, and an aging population, the author develops a conceptual framework that links the world of the information poor—the outsiders—with a world of insiders. Paradoxically, the author finds that the very existence of two worlds is in itself a hindrance to information seeking and sharing behaviors. Insiders, because of their status, reinforce information poverty by neglecting to accept sources of information not created by themselves. The author's findings thus indicate that the world of insiders is one in which outsiders are not sought for information and advice and is a world in which norms and mores define what is important and what is not. © 1996 John Wiley & Sons, Inc.

533 citations


Journal ArticleDOI
David A. Hull
TL;DR: A case study of stemming algorithms is presented that describes a number of novel approaches to evaluation and demonstrates their value.
Abstract: The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This article is a case study of stemming algorithms which describes a number of novel approaches to evaluation and demonstrates their value. © 1996 John Wiley & Sons, Inc.
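The per-query analysis the article argues for can be sketched in a few lines. The functions below (names are illustrative, not from the paper) compute average precision for a single query and a two-sided exact sign test over per-query differences between two retrieval runs; a minimal stand-in for the fuller statistical toolkit Hull describes.

```python
import math

def average_precision(ranked_ids, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def sign_test(deltas):
    """Two-sided exact sign test on per-query differences (ties dropped)."""
    wins = sum(1 for d in deltas if d > 0)
    losses = sum(1 for d in deltas if d < 0)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Comparing two stemmers would then mean computing `average_precision` per query for each run and applying `sign_test` (or a stronger paired test) to the differences, rather than comparing only the two averages.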

430 citations



Journal ArticleDOI
TL;DR: The problems with query-matching systems, which were designed for skilled search intermediaries rather than end-users, are discussed, along with the knowledge and skills they require in the information-seeking process, illustrated with examples of searching card and online catalogs.
Christine L. Borgman
Abstract: We return to arguments made 10 years ago (Borgman, 1986a) that online catalogs are difficult to use because their design does not incorporate sufficient understanding of searching behavior. The earlier article examined studies of information retrieval system searching for their implications for online catalog design; this article examines the implications of card catalog design for online catalogs. With this analysis, we hope to contribute to a better understanding of user behavior and to lay to rest the card catalog design model for online catalogs. We discuss the problems with query matching systems, which were designed for skilled search intermediaries rather than end-users, and the knowledge and skills they require in the information-seeking process, illustrated with examples of searching card and online catalogs. Searching requires conceptual knowledge of the information retrieval process—translating an information need into a searchable query; semantic knowledge of how to implement a query in a given system—the how and when to use system features; and technical skills in executing the query—basic computing skills and the syntax of entering queries as specific search statements. In the short term, we can help make online catalogs easier to use through improved training and documentation that is based on information-seeking behavior, with the caveat that good training is not a substitute for good system design. Our long term goal should be to design intuitive systems that require a minimum of instruction. Given the complexity of the information retrieval problem and the limited capabilities of today's systems, we are far from achieving that goal. If libraries are to provide primary information services for the networked world, they need to put research results on the information-seeking process into practice in designing the next generation of online public access information retrieval systems.

357 citations


Journal ArticleDOI
TL;DR: An evaluation technique that uses early recognition of which documents are likely to be highly ranked to reduce costs is proposed and it is shown that frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
Abstract: Ranking techniques are effective at finding answers in document collections but can be expensive to evaluate. We propose an evaluation technique that uses early recognition of which documents are likely to be highly ranked to reduce costs; for our test data, queries are evaluated in 2% of the memory of the standard implementation without degradation in retrieval effectiveness. CPU time and disk traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. The principle of the index design is that inverted lists are sorted by decreasing within-document frequency rather than by document number, and this method experimentally reduces CPU time and disk traffic to around one third of the original requirement. We also show that frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
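A minimal sketch of the frequency-sorting idea, with toy data structures whose names are illustrative rather than taken from the paper: postings are ordered by decreasing within-document frequency, so an evaluator that reads only the head of each inverted list still sees the documents most likely to rank highly.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted lists sorted by decreasing within-document frequency."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index[term].append((freq, doc_id))
    for term in index:
        index[term].sort(key=lambda p: (-p[0], p[1]))
    return index

def rank(index, query_terms, postings_budget=3):
    """Crude early-termination ranking: read only the first few postings
    of each list, where the best documents for that term appear first."""
    scores = Counter()
    for term in query_terms:
        for freq, doc_id in index.get(term, [])[:postings_budget]:
            scores[doc_id] += freq
    return scores.most_common()
```

A document-number-sorted list would force the evaluator to scan every posting to find the high-frequency documents; here they sit at the front, which is what makes the memory and disk savings possible.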

225 citations




Journal ArticleDOI
TL;DR: The use of statistics based both on the frequency of tokens within a literature and on the number of records containing various tokens is discussed, suggesting that token and record frequencies are good indicators of literatures profitably related to some source literature, and that relative record frequencies are useful in isolating literatures with the potential of containing a discovery.
Abstract: Don R. Swanson has undertaken a program of research to use the published medical literature as a source of discoveries. We have attempted to replicate his discovery of a connection between Raynaud's disease and dietary fish oil, as well as develop computer-based searching methods that could usefully support literature-based discoveries. We have been successful in replicating Swanson's discovery and have developed a method of discovery support based on the complete text of MEDLINE records. From these, we compute statistics based both on the frequency of tokens within a literature and on the number of records containing various tokens. We discuss the use of these statistics, suggesting that token and record frequencies are good indicators of literatures profitably related to some source literature, and that relative record frequencies are useful in isolating literatures with the potential of containing a discovery. © 1996 John Wiley & Sons, Inc.
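The two statistics the authors describe — token frequency within a literature and record (document) frequency — can be sketched as follows. The ranking function and its +1 smoothing are illustrative assumptions, not Swanson's or the authors' exact method.

```python
from collections import Counter

def literature_statistics(records):
    """Token frequency (total occurrences across the literature) and
    record frequency (number of records containing the token)."""
    token_freq, record_freq = Counter(), Counter()
    for text in records:
        tokens = text.lower().split()
        token_freq.update(tokens)
        record_freq.update(set(tokens))
    return token_freq, record_freq

def relative_record_frequency(record_freq, n_records, baseline_df, n_baseline):
    """Record frequency in the source literature relative to a baseline
    corpus; high ratios flag tokens worth exploring for a discovery link.
    The +1 smoothing for unseen baseline tokens is an assumption."""
    ratios = {}
    for tok, rf in record_freq.items():
        base = baseline_df.get(tok, 0) + 1
        ratios[tok] = (rf / n_records) / (base / n_baseline)
    return sorted(ratios.items(), key=lambda kv: -kv[1])
```

Tokens that are common in the source literature but rare in the baseline float to the top; in the Raynaud's case, terms like "blood viscosity" would be the kind of intermediate concept such a ranking is meant to surface.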

163 citations



Journal ArticleDOI
TL;DR: A series of thorough, rigorous, and extensive tests is needed of precisely how, and under what conditions, variations in relevance assessments do, and do not, affect measures of retrieval performance.
Abstract: The purpose of this article is to bring attention to the problem of variations in relevance assessments and the effects that these may have on measures of retrieval effectiveness. Through an analytical review of the literature, I show that despite known wide variations in relevance assessments in experimental test collections, their effects on the measurement of retrieval performance are almost completely unstudied. I will further argue that what we know about the many variables that have been found to affect relevance assessments under experimental conditions, as well as our new understanding of psychological, situational, user-based relevance, point to a single conclusion. We can no longer rest the evaluation of information retrieval systems on the assumption that such variations do not significantly affect the measurement of information retrieval performance. A series of thorough, rigorous, and extensive tests is needed, of precisely how, and under what conditions, variations in relevance assessments do, and do not, affect measures of retrieval performance. We need to develop approaches to evaluation that are sensitive to these variations and to human factors and individual differences more generally. Our approaches to evaluation must reflect the real world of real users. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: The analysis of the development of AI in terms of stability and coherency of the journal‐sets during the period 1982–1992 teaches us that AI has emerged as a set of journals with the characteristics of a discipline only since 1988.
Abstract: Has an identifiable core of activities called AI been established, during the AI-boom in the eighties? Is AI already in a “paradigmatic” phase? There has been a lot of disagreement among commentators and specialists about the nature of Artificial Intelligence as a specialty. This makes AI an interesting case of an emerging specialty. We use aggregated journal-journal citations for describing Artificial Intelligence as sets of journals; factor analytic techniques are used to analyze the development of AI in terms of (an emerging) stability and coherency of the journal-sets during the period 1982–1992. The analysis teaches us that AI has emerged as a set of journals with the characteristics of a discipline only since 1988. The thereafter relatively stable set of journals includes both fundamental and applied AI-journals, and journals with a focus on expert systems. Additionally, specialties related to artificial intelligence (like pattern analysis, computer science, cognitive psychology) are identified. Neural network research is a part neither of AI nor of its direct citation environment. Information science is related to AI only in the early eighties. The citation environment of AI is more stable than AI itself. © 1996 John Wiley & Sons, Inc.



Journal ArticleDOI
TL;DR: Novice end users were given 2 hours of training in searching a full‐text magazine database (Magazine ASAP™) on DIALOG and found that most of the searches were performed for the self and were work‐related.
Abstract: Novice end users were given 2 hours of training in searching a full-text magazine database (Magazine ASAP™) on DIALOG. Subjects searched during three to four sessions in the presence of a trained monitor who prompted them to think aloud throughout the sessions. Qualitative analysis of the transcripts and transaction logs yielded empirical information on user variables (purpose, motivation, satisfaction), uses of the database, move types, and every question users asked during the searches. The spontaneous, naturalistic questions were categorized according to affective, cognitive, and sensorimotor speech acts. Results show that most of the searches were performed for the self and were work-related. The most common use of the database was to retrieve full-text articles online and to download and print them out rather than read them on screen. The majority of searches were judged satisfactory. Innovative uses included browsing for background information and obtaining contextualized sentences for language teaching. Searchers made twice as many moves to limit sets as moves to expand sets. Affective questions outnumbered cognitive and sensorimotor questions by two to one. This preponderance of affective micro-information needs during searching might be addressed by new system functions. © 1996 John Wiley & Sons, Inc.


Journal ArticleDOI
TL;DR: A multiple search session model of end‐users' interaction with information retrieval systems based on results from an exploratory study investigating end‐ users' search sessions over time with online public access catalogs or CD‐ROM databases at different stages of their information seeking related to a current research project is discussed.
Abstract: This article discusses a multiple search session model of end-users' interaction with information retrieval systems based on results from an exploratory study investigating end-users' search sessions over time with online public access catalogs (OPAC) or CD-ROM databases at different stages of their information seeking related to a current research project. Interviews were conducted with 200 academic end-users to investigate the occurrence of multiple search sessions. Results show that at the time of the interview, 57% of end-users had conducted multiple search sessions during their research project and 86% of end-users conducted their first search session at the beginning stage of their information-seeking process. Forty-nine percent of end-users had conducted between 1 and 6 search sessions and 8% more than 6 search sessions. Seventy percent of multiple search session end-users had modified their search terms since their first search session. The implications of the findings for end-user training, information retrieval systems design, and further research are discussed. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: Performance is assessed by counting the number of identifiable errors during the stemming of words from various text samples and it appears that the Lovins stemmer is inferior to the other two in terms of general accuracy.
Abstract: In most previous studies, the effectiveness of stemming algorithms has been compared by determining the retrieval performance for various experimental test collections. The present work assesses performance by counting the number of identifiable errors during the stemming of words from various text samples. This entails manual grouping of the words in each sample; software has been developed to facilitate this. After grouping, the words are stemmed and indices are then computed which represent the rate of understemming and overstemming. Results are presented for three stemmers (Lovins, Porter, and Paice/Husk), in each case using three distinct text samples. Although the results are not entirely clear cut, it appears that the Lovins stemmer is inferior to the other two in terms of general accuracy. The way in which the indices vary with the size of the text sample is also investigated. © 1996 John Wiley & Sons, Inc.
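A simplified version of this error-counting evaluation can be sketched as follows, using a hypothetical toy stemmer in place of Lovins, Porter, or Paice/Husk. The indices here are unweighted pair counts, a simplification of Paice's actual understemming/overstemming indices.

```python
from itertools import combinations

def toy_stemmer(word):
    """Hypothetical suffix stripper, standing in for a real stemmer."""
    for suffix in ("ational", "ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stemming_error_indices(groups, stem=toy_stemmer):
    """Understemming index: share of same-group word pairs NOT conflated.
    Overstemming index: share of cross-group word pairs wrongly conflated.
    `groups` is the manual grouping of words described in the article."""
    under_err = within = 0
    for g in groups:
        for a, b in combinations(g, 2):
            within += 1
            under_err += stem(a) != stem(b)
    over_err = across = 0
    for g1, g2 in combinations(groups, 2):
        for a in g1:
            for b in g2:
                across += 1
                over_err += stem(a) == stem(b)
    ui = under_err / within if within else 0.0
    oi = over_err / across if across else 0.0
    return ui, oi
```

A heavy stemmer drives the understemming index down and the overstemming index up; a light stemmer does the opposite, which is why the two indices are reported together.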

Journal ArticleDOI
TL;DR: In this article, a quantitative study is reported on the resistance that scientists may encounter when they do innovative work or attempt to publish articles that later become highly cited; a set of 205 commentaries by authors of some of the most-cited papers of all time was examined to identify articles whose authors encountered difficulty in getting their work published.
Abstract: In this article a quantitative study is reported on the resistance that scientists may encounter when they do innovative work or when they attempt to publish articles that later become highly cited. A set of 205 commentaries by authors of some of the most-cited papers of all time has been examined in order to identify those articles whose authors encountered difficulty in getting their work published. There are 22 commentaries (10.7%) in which authors mention some difficulty or resistance in doing or publishing the research reported in the article. Three of the articles that had problems being published are the most cited from their respective journals. According to the authors' commentaries, although referees' negative evaluations can sometimes help improve the articles, in other instances referees and editors wrongly rejected the highly cited articles. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: A task‐oriented assessment of two MEDLINE searching systems, one which promotes traditional Boolean searching on human‐indexed thesaurus terms and the other natural language searching on words in the title, abstract, and indexing terms, demonstrates that both types of systems can be used equally well with minimal training.
Abstract: As retrieval systems become more oriented towards end-users, there is an increasing need for improved methods to evaluate their effectiveness. We performed a task-oriented assessment of two MEDLINE searching systems, one which promotes traditional Boolean searching on human-indexed thesaurus terms and the other natural language searching on words in the title, abstract, and indexing terms. Medical students were randomized to one of the two systems and given clinical questions to answer. The students were able to use each system successfully, with no significant differences in questions correctly answered, time taken, relevant articles retrieved, or user satisfaction between the systems. This approach to evaluation was successful in measuring effectiveness of system use and demonstrates that both types of systems can be used equally well with minimal training. © 1996 John Wiley & Sons, Inc.


Journal ArticleDOI
TL;DR: A theoretical model for understanding DL use in different social worlds is provided, and preliminary DL use patterns to pursue are suggested in a follow-on study.
Abstract: Behind the expectations that Digital Libraries (DLs) will provide access to any document at any time to anyone in any place are questions about whether digital collection, storage, and transmission are useful to people who depend upon library materials. This study focuses on DL use within the context of research activities in Ph.D.-granting institutions. We examine what constitutes effective DL use, how faculty members are using DLs, and how useful they find them. We conducted our study in faculty workplaces: The laboratories and offices where they conduct scholarly research. Our focus is on the human activity systems that unite readers, authors, librarians and researchers with electronic materials, resource streams, computer equipment and know-how. We examine practices involving three clusters of informants: Faculty researchers who produce and make use of scholarly materials, librarians who facilitate access to digital and nondigital collections, and computer support providers who manage the arrangements of electronic resources. Our study, which included two research universities in two disciplines (molecular biology and literary theory), provides a theoretical model for understanding DL use in different social worlds, and suggests preliminary DL use patterns to pursue in a follow-on study. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: It is argued that the dilemma of measurement has remained intractable even given the different assumptions of the different approaches for three connected reasons—the nature of the subject matter of the field; the nature of relevance judgment; and thenature of cognition and knowledge.
Abstract: The problem of measurement in information retrieval research is traced to its source in the first retrieval tests. The problem is seen as presenting a chronic dilemma for the field. This dilemma has taken three forms as the discipline has evolved: (1) The dilemma of measurement in the archetypal approach: Stated relevance versus user relevance; (2) the dilemma of measurement in the probabilistic approach: Realism versus formalism; and (3) the dilemma of measurement in the Information Retrieval-Expert System (IR-ES) approach: Linear measures of relevance versus logarithmic measures of knowledge. It is argued that the dilemma of measurement has remained intractable even given the different assumptions of the different approaches for three connected reasons—the nature of the subject matter of the field; the nature of relevance judgment; and the nature of cognition and knowledge. Finally, it is concluded that the original vision of information retrieval research as a discipline founded on quantification proved restricting for its theoretical and methodological development and that increasing recognition of this is reflected in growing interest in qualitative methods in information retrieval research in relation to the cognitive, behavioral, and affective aspects of the information retrieval interaction. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: In this paper, the authors explored factors that influence the adoption and use of electronic networks and network services by science and engineering faculty in small universities and colleges, and found that physical access to a networked workstation seems to be the biggest determinant of network adoption.
Abstract: Adoption of an NSFnet connection at an institutional level is a costly undertaking. The decision to connect requires a hierarchy of subordinate decisions relating to the network connection. If any group of faculty resist adopting and using the network, the potential benefits of the network and its services will not be realized for the institution as a whole. A study was undertaken to explore factors that influence the adoption and use of electronic networks and network services by science and engineering faculty in small universities and colleges. Adoption was measured by the dichotomous variable of use and non-use for the network and for five individual services. Intensity of use was selected as a measure of use. In general, factors found to influence the adoption of the network are different from those that influence the intensity of use and the number of services used. For this reason, different actions are necessary to enhance adoption and increase use. Physical access to a networked workstation seems to be the biggest determinant of network adoption. Expanding training programs to include a broader audience and a broader scope will increase use. © 1996 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: The role of animation in enhancing recall and comprehension of text by grade 6 primary school students and the relationship between students' spatial skills and their ability to recall and comprehend a text enhanced with still images and animation was investigated.
Abstract: This article reports the third and final phase of a research project to investigate the role of animation in enhancing recall and comprehension of text by grade 6 primary school students. This phase had three objectives: To determine whether a complex descriptive text is enhanced by animation so long as the animation exhibits close semantic links with the text; to explore the importance of captions in linking an animation with a text so as to increase comprehension of the text; and to investigate the relationship between students' spatial skills and their ability to recall and comprehend a text enhanced with still images and animation. A descriptive text on the structure and functions of the heart from Compton's Multimedia Encyclopedia was linked to a still image and two animation sequences developed by the research team, which were both more extensive and more completely integrated semantically with the text than in the original Compton's version. Four presentation conditions were produced: Text; text and still image; text, still image, and animations; and text, still image, animations, and captions. Students were tested for spatial ability and divided into two groups: Low and high spatial ability. Their comprehension was tested using three tasks: Written recall, multiple choice questions, and problem-solving. Animation significantly improved performance only on the problem-solving task, but this was the measure that involved the highest level of cognitive effort. Students with high spatial ability in general performed better than students with low spatial ability regardless of presentation condition, and in the case of propositional and thematic recall, this difference was significant. The addition of captions to the animation sequences had no significant effect, but this may be because the sequences also included labels which could have obviated the need for captions. — Authors' Abstract


Journal ArticleDOI
TL;DR: It is evident in this study that automated word removal based on corpus statistics has a practical and significant impact on the computational tractability of categorization methods in large databases.
Abstract: This article studies aggressive word removal in text categorization to reduce the noise in free texts and to enhance the computational efficiency of categorization. We use a novel stop word identification method to automatically generate domain specific stoplists which are much larger than a conventional domain-independent stoplist. In our tests with three categorization methods on text collections from different domains/applications, significant numbers of words were removed without sacrificing categorization effectiveness. In the test of the Expert Network method on CACM documents, for example, an 87% removal of unique words reduced the vocabulary of documents from 8,002 distinct words to 1,045 words, which resulted in a 63% time savings and a 74% memory savings in the computation of category ranking, with a 10% precision improvement on average over not using word removal. It is evident in this study that automated word removal based on corpus statistics has a practical and significant impact on the computational tractability of categorization methods in large databases.
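A minimal sketch of stop word identification from corpus statistics; the thresholds and names below are illustrative assumptions, and the paper's method for generating domain-specific stoplists is more involved.

```python
from collections import Counter

def domain_stoplist(docs, high_df=0.5, low_tf=2):
    """Build a domain-specific stoplist from corpus statistics: words
    appearing in more than `high_df` of the documents carry little
    discriminating power for categorization, and words occurring fewer
    than `low_tf` times are mostly noise. Thresholds are illustrative."""
    n = len(docs)
    df, tf = Counter(), Counter()
    for text in docs:
        words = text.lower().split()
        tf.update(words)
        df.update(set(words))
    return {w for w in tf if df[w] / n > high_df or tf[w] < low_tf}

def remove_stopwords(text, stoplist):
    return [w for w in text.lower().split() if w not in stoplist]
```

Because the thresholds are computed from the target collection itself, a corpus of, say, CACM abstracts would also discard domain-pervasive words like "algorithm" that no general-purpose stoplist would contain; that is what makes the removal "aggressive".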

Journal ArticleDOI
TL;DR: In this article, Correspondence Factorial Analysis (CFA) is used to show how the 48 most prolific nations stand in relation to each other with regard to their publication interests in 17 specific disciplinary areas and one multidisciplinary field over the period 1981-1992.
Abstract: This study illustrates the application of a descriptive multivariate statistical method, Correspondence Factorial Analysis (CFA), to the analysis of a dataset of over 6 million bibliometric entries (data from ISI). CFA is used to show how the 48 most prolific nations stand in relation to each other with regard to their publication interests in 17 specific disciplinary areas and one multidisciplinary field over the period 1981-1992. The output of a CFA is a map displaying proximity among variables (countries and disciplines) and constitutes an impartial working document for experts interested in the evaluation of science. The present study focuses on three aspects of a CFA: (1) The normalized publication patterns of countries with a common feature (e.g., that belong to the same geopolitical zone, economic union, etc.) can be pooled in order to highlight the position of the union with respect to individual countries; (2) complex CFA maps can be simplified by selecting reference countries or disciplines and observing how the remaining countries and disciplines relate to these references; (3) data on additional countries (new publication profiles) or on additional variables (e.g., socio-economic data on all the countries under study) can be introduced into the CFA maps used as mathematical models. Our CFA of the ISI dataset reveals the scientific interests of nations in relative terms. The main cleavage (the first factorial axis) is between countries that still concentrate on the disciplines of the industrial revolution such as physics and chemistry (or that have turned toward their offspring, materials sciences) and those that have veered toward more modern disciplines such as the life sciences (e.g., clinical medicine), the environment, and computer sciences.
The second cleavage, along the second factorial axis, is between countries that focus on the agricultural sciences (the land surface) and those interested in the geosciences (the sea, earth's mantle, and mining). The third and fourth axes discriminate even further between earth, life, and abstract sciences highlighting the ostensible relationship between (organic) chemistry and all life science disciplines and between physics and disciplines related to engineering, materials sciences, etc. The CFA maps disclose the specific behavior of each country with respect to these cleavages.
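The preprocessing at the heart of CFA can be sketched as follows: each country's counts become a row profile (its publication pattern in relative terms), and distances between profiles are chi-square weighted so that small disciplines count more. The factorial axes themselves come from a singular value decomposition of the standardized residuals, omitted here; names and data are illustrative.

```python
def row_profiles(counts):
    """Normalize each country's publication counts to a profile
    summing to 1 — the relative-terms representation CFA works on."""
    return {c: [v / sum(row) for v in row] for c, row in counts.items()}

def column_masses(counts):
    """Overall share of each discipline in the whole table."""
    totals = [sum(col) for col in zip(*counts.values())]
    grand = sum(totals)
    return [t / grand for t in totals]

def chi_square_distance(p, q, masses):
    """Chi-square distance between two profiles; rare disciplines
    (small column mass) are up-weighted, as in CFA."""
    return sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)) ** 0.5
```

On a toy table with two disciplines (say, physics and life sciences), a physics-heavy and a life-sciences-heavy country sit far apart, while a balanced country sits between them; the factorial axes of the map are the directions that best preserve these distances.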

Journal ArticleDOI
TL;DR: The Malay stemming algorithm developed by Othman is studied and new versions are proposed to enhance its performance; the improvements relate to the order in which the dictionary is looked up, the order in which the morphological rules are applied, and the number of rules.
Abstract: Stemming is used in information retrieval systems to reduce variant word forms to common roots in order to improve retrieval effectiveness. As in other languages, there is a need for an effective stemming algorithm for the indexing and retrieval of Malay documents. The Malay stemming algorithm developed by Othman is studied and new versions are proposed to enhance its performance. The improvements relate to the order in which the dictionary is looked up, the order in which the morphological rules are applied, and the number of rules. © 1996 John Wiley & Sons, Inc.
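The kind of order sensitivity the article studies can be illustrated with a toy dictionary-checked affix stripper; the rules and the mini root dictionary below are illustrative, not Othman's actual rule set.

```python
# Hypothetical affix rules and mini root dictionary; the real algorithm
# uses a Malay root-word dictionary and a much larger rule set.
PREFIX_RULES = ["mem", "me", "di", "ber"]  # applied in this order
SUFFIX_RULES = ["kan", "an", "i"]

ROOTS = {"baca", "ajar", "main"}

def stem_malay(word, roots=ROOTS):
    """Dictionary-checked affix stripping: each rule that fires yields a
    candidate, and the first candidate found in the dictionary wins, so
    both rule order and look-up order change the result."""
    if word in roots:
        return word
    candidates = [word]
    for pre in PREFIX_RULES:
        if word.startswith(pre):
            candidates.append(word[len(pre):])
    suffixed = []
    for c in candidates:
        for suf in SUFFIX_RULES:
            if c.endswith(suf):
                suffixed.append(c[: -len(suf)])
    for c in candidates + suffixed:
        if c in roots:
            return c
    return word  # no dictionary hit: leave the word unchanged
```

Reordering `PREFIX_RULES` or checking the dictionary after a different rule can send an ambiguous word to a different root, which is exactly the behavior the proposed variants of the algorithm tune.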