Showing papers in "Journal of the Association for Information Science and Technology in 2001"


Journal ArticleDOI
TL;DR: It is found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features, and the language of Web queries is distinctive.
Abstract: In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public practices and choices in Web searching.

1,153 citations


Journal ArticleDOI
TL;DR: The results from published studies of Web searching are reviewed and the searching characteristics of Web users are compared and contrasted with users of traditional information retrieval and online public access systems to discover if there is a need for more studies that focus predominantly or exclusively on Web searching.
Abstract: Research on Web searching is at an incipient stage. This aspect provides a unique opportunity to review the current state of research in the field, identify common trends, develop a methodological framework, and define terminology for future Web searching studies. In this article, the results from published studies of Web searching are reviewed to present the current state of research. The analysis of the limited Web searching studies available indicates that research methods and terminology are already diverging. A framework is proposed for future studies that will facilitate comparison of results. The advantages of such a framework are presented, and the implications for the design of Web information retrieval systems studies are discussed. Additionally, the searching characteristics of Web users are compared and contrasted with users of traditional information retrieval and online public access systems to discover if there is a need for more studies that focus predominantly or exclusively on Web searching. The comparison indicates that Web searching differs from searching in other environments.

466 citations


Journal ArticleDOI
Blaise Cronin
TL;DR: The wider implications of the ‘hyperauthorship’ phenomenon for scholarly publication are considered and it is proposed that authors be replaced by lists of contributors (the radical model), whose specific inputs to a given study would be recorded unambiguously.
Abstract: Classical assumptions about the nature and ethical entailments of authorship (the standard model) are being challenged by developments in scientific collaboration and multiple authorship. In the biomedical research community, multiple authorship has increased to such an extent that the trustworthiness of the scientific communication system has been called into question. Documented abuses, such as honorific authorship, have serious implications in terms of the acknowledgment of authority, allocation of credit, and assigning of accountability. Within the biomedical world it has been proposed that authors be replaced by lists of contributors (the radical model), whose specific inputs to a given study would be recorded unambiguously. The wider implications of the ‘hyperauthorship’ phenomenon for scholarly publication are considered.

443 citations


Journal ArticleDOI
TL;DR: Part II of a research project investigated the cognitive and physical behaviors of middle school students using Yahooligans! Seventeen seventh-grade students searched Yahooligans! to locate relevant information for an assigned research task; 69% partially succeeded, while 31% failed.
Abstract: This study reports the results of Part II of a research project that investigated the cognitive and physical behaviors of middle school students in using Yahooligans! Seventeen students in the seventh grade searched Yahooligans! to locate relevant information for an assigned research task. Sixty-nine percent partially succeeded, while 31% failed. Children had difficulty completing the task mainly because they lacked adequate level of research skills and approached the task by seeking specific answers. Children's cognitive and physical behaviors varied by success levels. Similarities and differences in children's cognitive and physical behaviors were found between the research task and the fact-based task they performed in the previous study. The present study considers the impact of prior experience in using the Web, domain knowledge, topic knowledge, and reading ability on children's success. It reports the overall patterns of children's behaviors, including searching and browsing moves, backtracking and looping moves, and navigational styles, as well as the time taken to complete the research task. Children expressed their information needs and provided recommendations for improving the interface design of Yahooligans! Implications for formal Web training and system design improvements are discussed.

241 citations


Journal ArticleDOI
TL;DR: In this paper, the role of individual differences in Internet searching was investigated and the results showed that retrieval effectiveness was linked to male gender, low cognitive complexity, an imager cognitive style, and a number of Internet perceptions and study approaches.
Abstract: This article reports the results of a study of the role of individual differences in Internet searching. The dimensions of individual differences forming the focus of the research consisted of: cognitive styles; levels of prior experience; Internet perceptions; study approaches; age; and gender. Sixty-nine Masters students searched for information on a prescribed topic using the AltaVista search engine. Results were assessed using simple binary relevance judgements. Factor analysis and multiple regression revealed interesting differences, retrieval effectiveness being linked to: male gender; low cognitive complexity; an imager (as opposed to verbalizer) cognitive style; and a number of Internet perceptions and study approaches grouped here as indicating low self-efficacy. The implications of these findings for system development and for future research are discussed.

226 citations


Journal ArticleDOI
TL;DR: A two-step model of the discovery process in which hypotheses are generated and subsequently tested is proposed and implemented in a Natural Language Processing system that uses biomedical Unified Medical Language System (UMLS) concepts as its unit of analysis.
Abstract: Literature-based discovery has resulted in new knowledge. In the biomedical context, Don R. Swanson has generated several literature-based hypotheses that have been corroborated experimentally and clinically. In this paper, we propose a two-step model of the discovery process in which hypotheses are generated and subsequently tested. We have implemented this model in a Natural Language Processing system that uses biomedical Unified Medical Language System (UMLS) concepts as its unit of analysis. We use the semantic information that is provided with these concepts as a powerful filter to successfully simulate Swanson's discoveries of connecting Raynaud's disease with fish oil and migraine with a magnesium deficiency.

225 citations
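
The hypothesis-generation step of the two-step model described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `generate_hypotheses`, the toy co-occurrence table, and the simple counting of linking concepts are hypothetical, and the sketch leaves out the UMLS semantic filtering and the subsequent testing step that the authors describe.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def generate_hypotheses(start: str, cooccur: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank concepts that are linked to `start` only through intermediate concepts."""
    linking = cooccur.get(start, set())      # concepts co-occurring with the start concept
    counts: Counter = Counter()
    for b in linking:
        for c in cooccur.get(b, set()):
            # Skip the start concept itself and anything already directly linked to it.
            if c != start and c not in linking:
                counts[c] += 1               # one more linking concept supports c
    return counts.most_common()


# Toy co-occurrence table (hypothetical, not the authors' data).
cooccur = {
    "Raynaud's disease": {"blood viscosity", "platelet aggregation"},
    "blood viscosity": {"Raynaud's disease", "fish oil"},
    "platelet aggregation": {"Raynaud's disease", "fish oil", "aspirin"},
}

hypotheses = generate_hypotheses("Raynaud's disease", cooccur)
```

In this toy table, "fish oil" is reached through two linking concepts and so ranks first; a real system would also need the filtering and testing stages before any candidate is treated as a hypothesis.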


Journal ArticleDOI
TL;DR: An evaluation of Ingwersen's proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research shows that four different WIFs do, in fact, correlate with the conventional academic research measures.
Abstract: Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen's ([1998]) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications.

205 citations
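
The best-performing measure named in the abstract above is a simple ratio: Web pages with links pointing at a university's research-based pages, divided by its faculty numbers. A minimal sketch of that calculation follows; the function name `web_impact_factor` and all figures are hypothetical and are not taken from the study.

```python
def web_impact_factor(linking_pages: int, faculty_count: int) -> float:
    """Ratio of pages linking to an institution's research pages to its faculty numbers."""
    if faculty_count <= 0:
        raise ValueError("faculty_count must be positive")
    return linking_pages / faculty_count


# Hypothetical (linking pages, faculty) figures for two made-up institutions.
universities = {
    "University A": (1200, 400),
    "University B": (300, 250),
}

wifs = {name: web_impact_factor(pages, staff) for name, (pages, staff) in universities.items()}
ranked = sorted(wifs, key=wifs.get, reverse=True)  # highest ratio first
```

Dividing by faculty numbers normalizes for institution size, so a large university does not rank highly on raw link counts alone.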


Journal ArticleDOI
TL;DR: The study explains how to retrieve citation identities from the Institute for Scientific Information's files on Dialog and how to deal with idiosyncrasies of these files.
Abstract: This study explores the tendency of authors to recite themselves and others in multiple works over time, using the insights gained to build citation theory. The set of all authors whom an author cites is defined as that author's citation identity. The study explains how to retrieve citation identities from the Institute for Scientific Information's files on Dialog and how to deal with idiosyncrasies of these files. As the author's oeuvre grows, the identity takes the form of a core-and-scatter distribution that may be divided into authors cited only once (unicitations) and authors cited at least twice (recitations). The latter group, especially those recited most frequently, are interpretable as symbols of a citer's main substantive concerns. As illustrated by the top recitees of eight information scientists (Marcia J. Bates, Christine L. Borgman, William S. Cooper, Michael H. MacRoberts, Henry Small, Karen Sparck Jones, Don R. Swanson, Patrick Wilson), identities are intelligible, individualized, and wide-ranging. They are ego-centered without being egotistical. They are often affected by social ties between citers and citees, but the universal motivator seems to be the perceived relevance of the citees' works. Citing styles in identities differ: scientific-paper style authors recite heavily, adding to core; bibliographic-essay style authors are heavy on unicitations, adding to scatter; literature-review style authors do both at once. Identities distill aspects of citers' intellectual lives, such as orienting figures, interdisciplinary interests, bidisciplinary careers, and conduct in controversies. They can also be related to past schemes for classifying citations in categories such as positive-negative and perfunctory-organic; indeed, one author's frequent recitation of another, whether positive or negative, may be the readiest indicator of an organic relation between them. 
The shape of the core-and-scatter distribution of names in identities can be explained by the principle of least effort. Citers economize on effort by frequently reciting only a relatively small core of names in their identities. They also economize by frequent use of perfunctory citations, which require relatively little context, and infrequent use of negative citations, which require contexts more laborious to set.

202 citations



Journal ArticleDOI
TL;DR: A formal model and evaluative criteria are herein suggested and explained to provide a means for accurately ascribing cognitive authority in a networked environment; the model is unique in its representation of overt and covert affiliations as a mechanism for ascribing proper authority to Internet information.
Abstract: Many people fail to properly evaluate Internet information. This is often due to a lack of understanding of the issues surrounding evaluation and authority, and, more specifically, a lack of understanding of the structure and modi operandi of the Internet and the Domain Name System. The fact that evaluation is not being properly performed on Internet information means both that questionable information is being used recklessly, without adequately assessing its authority, and good information is being disregarded, because trust in the information is lacking. Both scenarios may be resolved by ascribing proper amounts of cognitive authority to Internet information. Traditional measures of authority present in a print environment are lacking on the Internet, and, even when occasionally present, are of questionable veracity. A formal model and evaluative criteria are herein suggested and explained to provide a means for accurately ascribing cognitive authority in a networked environment; the model is unique in its representation of overt and covert affiliations as a mechanism for ascribing proper authority to Internet information.

185 citations


Journal ArticleDOI
TL;DR: In this paper, the authors' use of theory in 1,160 articles that appeared in six information science (IS) journals from 1993-1998 was investigated, and it was found that theory was discussed in 34.1% of the articles (0.93 theory incidents per article; 2.73 incidents per article when considering only those articles employing theory).
Abstract: We report on our findings regarding authors' use of theory in 1,160 articles that appeared in six information science (IS) journals from 1993-1998. Our findings indicate that theory was discussed in 34.1% of the articles (0.93 theory incidents per article; 2.73 incidents per article when considering only those articles employing theory). The majority of these theories were from the social sciences (45.4%), followed by IS (29.9%), the sciences (19.3%), and humanities (5.4%). New IS theories were proposed by 71 authors. When compared with previous studies, our results suggest an increase in the use of theory within IS. However, clear discrepancies were evident in terms of how researchers working in different subfields define theory. Results from citation analysis indicate that IS theory is not heavily cited outside the field, except by IS authors publishing in other literatures. Suggestions for further research are discussed.
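
The two per-article rates in the abstract above are consistent with each other, which a short calculation can confirm. This is only an arithmetic check on the reported figures; the intermediate totals it derives (roughly 396 articles and 1,079 incidents) are not stated in the abstract.

```python
total_articles = 1160
share_with_theory = 0.341       # 34.1% of articles discuss theory
incidents_per_article = 0.93    # averaged over all articles

articles_with_theory = total_articles * share_with_theory    # about 396 articles
total_incidents = total_articles * incidents_per_article      # about 1,079 incidents

# Average over only the articles that employ theory.
incidents_per_theory_article = total_incidents / articles_with_theory
print(round(incidents_per_theory_article, 2))
```

The result rounds to the 2.73 incidents per article reported for articles employing theory.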

Journal ArticleDOI
TL;DR: A new type of passage is introduced, overlapping fragments of either fixed or variable length, and it is shown that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents.
Abstract: Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
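
The idea of ranking by overlapping fixed-length passages, as described in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the function names, the window length and step, and the scoring function (a plain count of query-term occurrences) are all hypothetical stand-ins for whatever similarity measure a real system would use.

```python
from typing import List, Set


def overlapping_passages(words: List[str], length: int, step: int) -> List[List[str]]:
    """Split a word list into fixed-length windows that start every `step` words."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(words) <= length:
        return [words]
    starts = list(range(0, len(words) - length + 1, step))
    passages = [words[s:s + length] for s in starts]
    if starts[-1] + length < len(words):
        passages.append(words[-length:])   # cover the tail of the document
    return passages


def passage_score(passage: List[str], query_terms: Set[str]) -> int:
    """Toy score: number of words in the passage that are query terms."""
    return sum(1 for w in passage if w in query_terms)


def document_score(text: str, query: str, length: int = 50, step: int = 25) -> int:
    """Score a document by its best-scoring passage rather than by the whole text."""
    words = text.lower().split()
    query_terms = set(query.lower().split())
    return max(passage_score(p, query_terms)
               for p in overlapping_passages(words, length, step))


long_doc = "filler " * 200 + "passage ranking retrieval " + "filler " * 200
best = document_score(long_doc, "passage ranking retrieval", length=10, step=5)
```

Because the score comes from the best window, a short relevant block inside a long, mostly irrelevant document is not diluted by the surrounding text, which is the benefit the abstract attributes to passage ranking.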

Journal ArticleDOI
TL;DR: A linguistic model for an Information Retrieval System (IRS) defined using an ordinal fuzzy linguistic approach is proposed, and its use for modeling the imprecision and subjectivity that appear in the user‐IRS interaction is studied.
Abstract: A linguistic model for an Information Retrieval System (IRS) defined using an ordinal fuzzy linguistic approach is proposed. The ordinal fuzzy linguistic approach is presented, and its use for modeling the imprecision and subjectivity that appear in the user-IRS interaction is studied. The user queries and IRS responses are modeled linguistically using the concept of fuzzy linguistic variables. The system accepts Boolean queries whose terms can be weighted simultaneously by means of ordinal linguistic values according to three possible semantics: a symmetrical threshold semantic, a quantitative semantic, and an importance semantic. The first one identifies a new threshold semantic used to express qualitative restrictions on the documents retrieved for a given term. It is monotone increasing in index term weight for the threshold values that are on the right of the mid-value, and decreasing for the threshold values that are on the left of the mid-value. The second one is a new semantic proposal introduced to express quantitative restrictions on the documents retrieved for a term, i.e., restrictions on the number of documents that must be retrieved containing that term. The last one is the usual semantic of relative importance that has an effect when the term is in a Boolean expression. A bottom-up evaluation mechanism of queries is presented that coherently integrates the use of the three semantics and satisfies the separability property. The advantage of this IRS with respect to others is that users can express linguistically different semantic restrictions on the desired documents simultaneously, incorporating more flexibility in the user-IRS interaction.

Journal ArticleDOI
TL;DR: A novel literature‐based approach was developed to identify the user community and its characteristics and the text mining alone identified the intradiscipline applications and extradiscipline impacts and applications.
Abstract: Identifying the users and impact of research is important for research performers, managers, evaluators, and sponsors. It is important to know whether the audience reached is the audience desired. It is useful to understand the technical characteristics of the other research/development/applications impacted by the originating research, and to understand other characteristics (names, organizations, countries) of the users impacted by the research. Because of the many indirect pathways through which fundamental research can impact applications, identifying the user audience and the research impacts can be very complex and time consuming. The purpose of this article is to describe a novel approach for identifying the pathways through which research can impact other research, technology development, and applications, and to identify the technical and infrastructure characteristics of the user population. A novel literature-based approach was developed to identify the user community and its characteristics. The research performed is characterized by one or more articles accessed by the Science Citation Index (SCI) database, because the SCI's citation-based structure enables the capability to perform citation studies easily. The user community is characterized by the articles in the SCI that cite the original research articles, and that cite the succeeding generations of these articles as well. Text mining is performed on the citing articles to identify the technical areas impacted by the research, the relationships among these technical areas, and relationships among the technical areas and the infrastructure (authors, journals, organizations). A key component of text mining, concept clustering, was used to provide both a taxonomy of the citing articles' technical themes and further technical insights based on theme relationships arising from the grouping process. Bibliometrics is performed on the citing articles to profile the user characteristics. 
Citation Mining, this integration of citation bibliometrics and text mining, is applied to the 307 first generation citing articles of a fundamental physics article on the dynamics of vibrating sand-piles. Most of the 307 citing articles were basic research whose main themes were aligned with those of the cited article. However, about 20% of the citing articles were research or development in other disciplines, or development within the same discipline. The text mining alone identified the intradiscipline applications and extradiscipline impacts and applications; this was confirmed by detailed reading of the 307 abstracts. The combination of citation bibliometrics and text mining provides a synergy unavailable with each approach taken independently. Furthermore, text mining is a REQUIREMENT for a feasible comprehensive research impact determination. The integrated multigeneration citation analysis required for broad research impact determination of highly cited articles will produce thousands or tens or hundreds of thousands of citing article Abstracts. Text mining allows the impacts of research on advanced development categories and/or extradiscipline categories to be obtained without having to read all these citing article Abstracts. The multifield bibliometrics provide multiple documented perspectives on the users of the research, and indicate whether the documented audience reached is the desired target audience.

Journal ArticleDOI
TL;DR: The four concepts that make up the theory of normative behavior (social norms, worldview, social types, and information behavior) are defined, and the theory is proposed as a way to describe the role of information in the lives of communities that share a common cultural space.
Abstract: The theory of normative behavior has significant potential for describing the role of information in the lives of communities that share a common cultural space. It determines how people approach information in order to make sense of their social reality. The four concepts that make up the theory (social norms, worldview, social types, and information behavior) provide a framework for analyzing the complex relationships between information and the social world in which individuals find their place. This article gives an overview of the life of virtual communities and of feminist bookstores, but future studies using more detailed empirical analyses of these two worlds will be needed to delineate the theory and its concepts fully.

Journal ArticleDOI
TL;DR: A comparison of two studies, carried out in 1997 and 1999, of how 200,000 users of the Excite search engine queried the Web and of the subjects of their queries (leisure, health, education...).
Abstract: A comparison of two studies, carried out in 1997 and 1999, of how 200,000 users of the Excite search engine queried the Web and of the subjects of their queries (leisure, health, education...).

Journal ArticleDOI
TL;DR: A citation analysis of undergraduate term papers in microeconomics revealed a significant decrease in the frequency of scholarly resources cited between 1996 and 1999; when checked in 2000, only 18% of URLs cited in 1996 led to the correct Internet document.
Abstract: A citation analysis of undergraduate term papers in microeconomics revealed a significant decrease in the frequency of scholarly resources cited between 1996 and 1999. Book citations decreased from 30% to 19%, newspaper citations increased from 7% to 19%, and Web citations increased from 9% to 21%. Web citations checked in 2000 revealed that only 18% of URLs cited in 1996 led to the correct Internet document. For 1999 bibliographies, only 55% of URLs led to the correct document. The authors recommend (1) setting stricter guidelines for acceptable citations in course assignments; (2) creating and maintaining scholarly portals for authoritative Web sites with a commitment to long-term access; and (3) continuing to instruct students how to critically evaluate resources.

Journal ArticleDOI
TL;DR: A novel scheme to represent a user's interest categories, and an adaptive algorithm to learn the dynamics of the user's interests through positive and negative relevance feedback are described.
Abstract: Learning users' interest categories is challenging in a dynamic environment like the Web because they change over time. This article describes a novel scheme to represent a user's interest categories, and an adaptive algorithm to learn the dynamics of the user's interests through positive and negative relevance feedback. We propose a three-descriptor model to represent a user's interests. The proposed model maintains a long-term interest descriptor to capture the user's general interests and a short-term interest descriptor to keep track of the user's more recent, faster-changing interests. An algorithm based on the three-descriptor representation is developed to acquire high accuracy of recognition for long-term interests, and to adapt quickly to changing interests in the short-term. The model is also extended to multiple three-descriptor representations to capture a broader range of interests. Empirical studies confirm the effectiveness of this scheme to accurately model a user's interests and to adapt appropriately to various levels of changes in the user's interests.

Journal ArticleDOI
TL;DR: Theories of aboutness and theories of subject analysis and of related concepts such as topicality are often isolated from each other in the literature of information science and related disciplines.
Abstract: Theories of aboutness and theories of subject analysis and of related concepts such as topicality are often isolated from each other in the literature of information science (IS) and related disciplines. In IS it is important to consider the nature and meaning of these concepts, which is closely related to theoretical and metatheoretical issues in information retrieval (IR). A theory of IR must specify which concepts should be regarded as synonymous concepts and explain how the meaning of the nonsynonymous concepts should be defined.

Journal ArticleDOI
TL;DR: This article outlines a personal view of the changing framework for information retrieval suggested by the Web environment, and goes on to speculate about how some of these changes may manifest in upcoming generations of information retrieval systems.
Abstract: This article outlines a personal view of the changing framework for information retrieval suggested by the Web environment, and then goes on to speculate about how some of these changes may manifest in upcoming generations of information retrieval systems. It also sketches some ideas about the broader context of trust management infrastructure that will be critical during this decade. The pursuit of these agendas is going to call for new collaborations between information scientists and a wide range of other disciplines.

Journal ArticleDOI
TL;DR: Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements.
Abstract: It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.

Journal ArticleDOI
TL;DR: Using novel informatics techniques to process the output of Medline searches, a list of viruses that may have the potential for development as weapons is generated, supporting an inference that the new viruses on the list share certain important characteristics with viruses of known biological warfare interest.
Abstract: Using novel informatics techniques to process the output of Medline searches, we have generated a list of viruses that may have the potential for development as weapons. Our findings are intended as a guide to the virus literature to support further studies that might then lead to appropriate defense and public health measures. This article stresses methods that are more generally relevant to information science. Initial Medline searches identified two kinds of virus literatures---the first concerning the genetic aspects of virulence, and the second concerning the transmission of viral diseases. Both literatures taken together are of central importance in identifying research relevant to the development of biological weapons. Yet, the two literatures had very few articles in common. We downloaded the Medline records for each of the two literatures and used a computer to extract all virus terms common to both. The fact that the resulting virus list includes most of an earlier independently published list of viruses considered by military experts to have the highest threat as potential biological weapons served as a test of the method; the test outcome showed a high degree of statistical significance, thus supporting an inference that the new viruses on the list share certain important characteristics with viruses of known biological warfare interest.

Journal ArticleDOI
Yin Zhang
TL;DR: The longitudinal analysis of e-source citations shows that there has been a notable increase in the number and proportion of authors who cite e-sources in their research articles over the 8-year period, and suggests that a limited number of criteria can be implemented in practice for scholars to evaluate electronic sources and systems.
Abstract: This research examines the use of Internet-based electronic resources (e-sources) by a group of library and information science (LIS) scholars. It focuses particularly on how scholars use, cite, and evaluate e-sources during the research process. This research also explores the problems scholars encounter and concerns they have when using e-sources for research. The following approaches were used to collect data for the investigation: (a) a longitudinal analysis of e-source citations in eight LIS journals from 1991 through 1998; (b) a survey of editors of the eight journals; and (c) a survey of 201 authors with articles to be published in the eight journals. The longitudinal analysis of e-source citations shows that there has been a notable increase in the number and proportion of authors who cite e-sources in their research articles over the 8-year period, although at the time of this study, e-sources were still cited much less frequently than print sources. This result provides empirical evidence that e-sources are increasingly used among scholars. Complementing the citation data, the results from the author survey show that e-sources are becoming an important component in scholars' research, and are serving a wide range of purposes and functions. The number of access points and self-perceived overall ability to use the Internet are identified as the two significant variables affecting frequency of e-source use. The results of this study also suggest that a limited number of criteria can be implemented in practice for scholars to evaluate electronic sources and systems. When citing e-sources, scholars consider some factors that are unique to e-sources, in addition to the factors they consider for print sources. Although the advantages of e-sources promote citing, some drawbacks of e-sources at this stage serve as a barrier. The survey of editors also reveals a lack of clearly stated editorial policies regarding citing e-sources. The major problems and concerns reported by scholars regarding using e-sources are summarized, and both the theoretical implications and practical applications of the findings are discussed.

Journal ArticleDOI
TL;DR: An effective approach to and a prototype system for image retrieval from the Internet using Web mining are presented; data mining on the log of users' feedback improves image retrieval performance in three aspects, including the accuracy of the document space model of image representation obtained from the Web pages.
Abstract: The popularity of digital images is rapidly increasing due to improving digital imaging technologies and convenient availability facilitated by the Internet. However, finding user-intended images on the Internet is nontrivial. The main reason is that Web images are usually not annotated using semantic descriptors. In this article, we present an effective approach to and a prototype system for image retrieval from the Internet using Web mining. The system can also serve as a Web image search engine. One of the key ideas in the approach is to extract the text information on the Web pages to semantically describe the images. The text description is then combined with other low-level image features in the image similarity assessment. Another main contribution of this work is that we apply data mining on the log of users' feedback to improve image retrieval performance in three aspects. First, the accuracy of the document space model of image representation obtained from the Web pages is improved by removing clutter and irrelevant text information. Second, a user space model of users' representation of images is constructed and then combined with the document space model to eliminate mismatch between the page author's expression and the user's understanding and expectation. Third, the relationship between low-level and high-level features is discovered, which is extremely useful for assigning the low-level features' weights in similarity assessment.

Journal ArticleDOI
TL;DR: The goals of the CLEF (Cross-Language Evaluation Forum) series of evaluation campaigns for information retrieval systems operating on European languages are described, along with an analysis of the first results and proposals for possible future developments.
Abstract: The goals of the CLEF (Cross-Language Evaluation Forum) series of evaluation campaigns for information retrieval systems operating on European languages are described. The difficulties of organizing an activity which aims at an objective evaluation of systems running on and over a number of different languages are examined. The discussion includes an analysis of the first results and proposals for possible developments in the future.

Journal ArticleDOI
TL;DR: The analysis shows that each group of user sessions in the MELVYL system had distinct patterns of use of the system, which justifies the methodology employed in this study.
Abstract: Different users of a Web-based information system will have different goals and different ways of performing their work. This article explores the possibility that we can automatically detect usage patterns without demographic information about the individuals. First, a set of 47 variables was defined that can be used to characterize a user session. The values of these variables were computed for approximately 257,000 sessions. Second, principal component analysis was employed to reduce the dimensions of the original data set. Third, a two-stage, hybrid clustering method was proposed to categorize sessions into groups. Finally, an external criteria-based test of cluster validity was performed to verify the validity of the resulting usage groups (clusters). The proposed methodology was demonstrated and tested for validity using two independent samples of user sessions drawn from the transaction logs of the University of California's MELVYL® on-line library catalog system (www.melvyl.ucop.edu). The results indicate that there were six distinct categories of use in the MELVYL system: knowledgeable and sophisticated use, unsophisticated use, highly interactive use with good search performance, known-item searching, help-intensive searching, and relatively unsuccessful use. Their characteristics were interpreted and compared qualitatively. The analysis shows that each group had distinct patterns of use of the system, which justifies the methodology employed in this study.
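
The pipeline described in this abstract (per-session variables, dimension reduction by principal component analysis, then clustering) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the two-stage hybrid clustering method, so plain k-means with a deterministic farthest-point start is substituted here, and the synthetic "sessions" data, the function names, and the choice of 6 variables and 2 components are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each variable
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=20):
    """Plain k-means (a stand-in for the paper's unspecified hybrid method)."""
    # Deterministic farthest-point initialization: start from the first row,
    # then repeatedly add the row farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)             # assign to nearest center
        for j in range(k):
            if np.any(labels == j):              # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical data: 100 sessions described by 6 variables, drawn from
# two well-separated groups (the study itself used 47 variables).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(50, 6))
group_b = rng.normal(8.0, 1.0, size=(50, 6))
sessions = np.vstack([group_a, group_b])

reduced = pca_reduce(sessions, n_components=2)
labels = kmeans(reduced, k=2)
```

In practice the session variables would likely be standardized before the PCA step, since variables measured on different scales would otherwise dominate the leading components.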

Journal ArticleDOI
TL;DR: The study investigates the various regions across a distribution of users' relevance judgments, including how these regions may be categorized, measured, and evaluated, and suggests that the middle region of a distribution, also called “partial relevance,” represents a key avenue for ongoing study.
Abstract: The dichotomous bipolar approach to relevance has produced an abundance of information retrieval (IR) research. However, relevance studies that include consideration of users' partial relevance judgments are moving to a greater relevance clarity and congruity to impact the design of more effective IR systems. The study reported in this paper investigates the various regions across a distribution of users' relevance judgments, including how these regions may be categorized, measured, and evaluated. An instrument was designed using four scales for collecting, measuring, and describing end-user relevance judgments. The instrument was administered to 21 end-users who conducted searches on their own information problems and made relevance judgments on a total of 1059 retrieved items. Findings include: (1) overlapping regions of relevance were found to impact the usefulness of precision ratios as a measure of IR system effectiveness, (2) both positive and negative levels of relevance are important to users as they make relevance judgments, (3) topicality was used more to reject rather than accept items as highly relevant, (4) utility was used more to judge items highly relevant, and (5) the nature of relevance judgment distribution suggested a new IR evaluation measure: median effect. Findings suggest that the middle region of a distribution of relevance judgments, also called “partial relevance,” represents a key avenue for ongoing study. The findings provide implications for relevance theory, and the evaluation of IR systems.
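
Finding (1) can be illustrated with a small calculation: the precision ratio of one and the same result list changes depending on where partially relevant items fall relative to the cutoff. The sketch below is only an illustration. The 0 to 3 judgment scale, the sample judgments, and the function name are assumptions, and the abstract does not define the study's four scales or its "median effect" measure, so neither is reproduced here.

```python
def precision_at_threshold(judgments, threshold):
    """Fraction of retrieved items judged at or above `threshold`."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    relevant = sum(1 for j in judgments if j >= threshold)
    return relevant / len(judgments)

# Hypothetical judgments for 10 retrieved items on an assumed 0-3 scale
# (0 = not relevant, 1-2 = partially relevant, 3 = highly relevant).
judgments = [3, 2, 2, 1, 1, 1, 0, 0, 3, 1]

strict = precision_at_threshold(judgments, 3)   # only highly relevant items count
lenient = precision_at_threshold(judgments, 1)  # partially relevant items count too
```

With these sample values the strict reading counts 2 of 10 items as relevant and the lenient reading counts 8 of 10, so the middle region of the distribution determines most of the difference between the two ratios.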

Journal ArticleDOI
TL;DR: The study found that educational and professional status, academic background, and computer experience had significant effects in differentiating users on their factor scores, while first language had only a borderline effect.
Abstract: This article reports the results of a study that investigated effects of four user characteristics on users' mental models of information retrieval systems: educational and professional status, first language, academic background, and computer experience. The repertory grid technique was used in the study. Using this method, important components of information retrieval systems were represented by nine concepts, based on four IR experts' judgments. Users' mental models were represented by factor scores that were derived from users' matrices of concept ratings on different attributes of the concepts. The study found that educational and professional status, academic background, and computer experience had significant effects in differentiating users on their factor scores. First language had a borderline effect, but the effect was not significant at the α = 0.05 level. Specific different views regarding IR systems among different groups of users are described and discussed. Implications of the study for information science and IR system designs are suggested.

Journal ArticleDOI
TL;DR: This approach not only augments traditional domain analysis and the understanding of scientific disciplines, but also produces a persistent and shared knowledge space for researchers to keep track of the development of knowledge more effectively.
Abstract: Domain visualization is one of the new research fronts resulting from the proliferation of information visualization, aiming to reveal the essence of a knowledge domain. Information visualization plays an integral role in modeling and representing intellectual structures associated with scientific disciplines. In this article, the domain of computer graphics is visualized based on author cocitation patterns derived from an 18-year span of the prestigious IEEE Computer Graphics and Applications (1982-1999). This domain visualization utilizes a series of visualization and animation techniques, including author cocitation maps, citation time lines, animation of a high-dimensional specialty space, and institutional profiles. This approach not only augments traditional domain analysis and the understanding of scientific disciplines, but also produces a persistent and shared knowledge space for researchers to keep track of the development of knowledge more effectively. The results of the domain visualization are discussed and triangulated in a broader context of the computer graphics field.
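
The author cocitation patterns that underlie such maps start from simple pair counts: two authors are cocited once for every citing paper whose reference list includes both of them. A minimal sketch of that counting step is given below. The function name and the sample reference lists are hypothetical, and the article's actual data preparation and mapping techniques are not described in the abstract.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count, for each author pair, the citing papers that cite both authors."""
    counts = Counter()
    for cited_authors in reference_lists:
        # Each author counts once per citing paper; sorting gives every
        # pair a single canonical (alphabetical) key.
        unique_authors = sorted(set(cited_authors))
        for pair in combinations(unique_authors, 2):
            counts[pair] += 1
    return counts

# Hypothetical input: one list of cited authors per citing paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author B", "Author C", "Author C"],  # repeated citation counted once
]
counts = cocitation_counts(papers)
```

The resulting pair counts form a symmetric author-by-author matrix, which is the usual input to the scaling or network layout step that produces a cocitation map.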

Journal ArticleDOI
TL;DR: Because expert services are likely to continue to fill a niche for factual questions in the digital reference environment, implications for further research and the development of digital reference services may be appropriately turned to source questions.
Abstract: This article discusses the history and emergence of nonlibrary commercial and noncommercial information services on the World Wide Web. These services are referred to as "expert services", while the term "digital reference" is reserved for library-related on-line information services. Following suggestions in library and information literature regarding quality standards for digital reference, researchers make clear the importance of developing a practicable methodology for critical examination of expert services, and consideration of their relevance to library and other professional information services. A methodology for research in this area and initial data are described. Two hundred forty questions were asked of 20 expert service sites. Findings include performance measures such as response rate, response time, and verifiable answers. Sites responded to 70% of all questions, and gave verifiable answers to 69% of factual questions. Performance was generally highest for factual type questions. Because expert services are likely to continue to fill a niche for factual questions in the digital reference environment, implications for further research and the development of digital reference services may be appropriately turned to source questions. This is contrary to current practice and the emergence of digital reference services reported in related literature thus far.