
Showing papers on "Human–computer information retrieval published in 2003"


Journal ArticleDOI
28 Jul 2003
TL;DR: The design and evaluation of a system, called Stuff I've Seen (SIS), that facilitates information re-use and provides a unified index of information that a person has seen, whether it was seen as email, web page, document, appointment, etc.
Abstract: Most information retrieval technologies are designed to facilitate information discovery. However, much knowledge work involves finding and re-using previously seen information. We describe the design and evaluation of a system, called Stuff I've Seen (SIS), that facilitates information re-use. This is accomplished in two ways. First, the system provides a unified index of information that a person has seen, whether it was seen as email, web page, document, appointment, etc. Second, because the information has been seen before, rich contextual cues can be used in the search interface. The system has been used internally by more than 230 employees. We report on both qualitative and quantitative aspects of system use. Initial findings show that time and people are important retrieval cues. Users find information more easily using SIS, and use other search tools less frequently after installation.

887 citations



Journal ArticleDOI
TL;DR: The results show that individual computer experience, quality of search systems, motivation, and perceptions of technology acceptance are all key factors that affect individuals' willingness to use search engines as an information retrieval tool.

342 citations


Proceedings ArticleDOI
08 Sep 2003
TL;DR: Initial results suggest that the algorithms presented can retrieve a significantly higher percentage of the links than analysts can, even when the analysts use existing tools, and do so in much less time while achieving comparable signal-to-noise levels.
Abstract: We present an approach for improving requirements tracing based on framing it as an information retrieval (IR) problem. Specifically, we focus on improving recall and precision in order to reduce the number of missed traceability links as well as to reduce the number of irrelevant potential links that an analyst has to examine when performing requirements tracing. Several IR algorithms were adapted and implemented to address this problem. We evaluated our algorithms by comparing their results and performance to those of a senior analyst who traced manually as well as with an existing requirements tracing tool. Initial results suggest that we can retrieve a significantly higher percentage of the links than analysts can, even when the analysts use existing tools, and do so in much less time while achieving comparable signal-to-noise levels.
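The recall/precision framing described above can be made concrete with a minimal sketch. This is not the authors' system; the scoring function, data, and threshold are all invented for illustration:

```python
# Toy sketch: score requirement-to-design links by term overlap, then
# measure recall and precision against a known answer set (all data invented).

def link_score(req_terms, elem_terms):
    """Jaccard overlap between a requirement's terms and an element's terms."""
    req, elem = set(req_terms), set(elem_terms)
    return len(req & elem) / len(req | elem) if req | elem else 0.0

def trace(requirements, elements, threshold=0.2):
    """Return candidate (req_id, elem_id) links whose score clears the threshold."""
    return {(r, e)
            for r, rt in requirements.items()
            for e, et in elements.items()
            if link_score(rt, et) >= threshold}

requirements = {"R1": ["login", "password", "auth"],
                "R2": ["export", "report", "pdf"]}
elements = {"D1": ["auth", "login", "session"],
            "D2": ["pdf", "report", "layout"],
            "D3": ["cache", "memory"]}

candidates = trace(requirements, elements)
true_links = {("R1", "D1"), ("R2", "D2")}  # hypothetical ground truth
recall = len(candidates & true_links) / len(true_links)
precision = len(candidates & true_links) / len(candidates) if candidates else 0.0
```

Raising the threshold trades recall for precision; the paper's contribution is tuning that trade-off so the analyst sees fewer irrelevant candidates while missing fewer true links.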

296 citations


Journal ArticleDOI
01 Apr 2003
TL;DR: This report summarizes a discussion of IR research challenges that took place at a recent workshop; contextual retrieval and global information access were identified as particularly important long-term challenges.
Abstract: Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout:
• User and context sensitive retrieval
• Multi-lingual and multi-media issues
• Better target tasks
• Improved objective evaluations
• Substantially more labeled data
• Greater variety of data sources
• Improved formal models
Contextual retrieval and global information access were identified as particularly important long-term challenges.

240 citations


Book ChapterDOI
24 Jul 2003
TL;DR: The results are encouraging, indicating that pseudo-relevance feedback shows great promise for multimedia retrieval with very varied and errorful data.
Abstract: We present an algorithm for video retrieval that fuses the decisions of multiple retrieval agents in both text and image modalities. While the normalization and combination of evidence is novel, this paper emphasizes the successful use of negative pseudorelevance feedback to improve image retrieval performance. Although we have not solved all problems in video information retrieval, the results are encouraging, indicating that pseudo-relevance feedback shows great promise for multimedia retrieval with very varied and errorful data.

229 citations


Proceedings ArticleDOI
28 Jul 2003
TL;DR: This study explores the development and subsequent evaluation of a statistical word sense disambiguation system which demonstrates increased precision from a sense-based vector space retrieval model over traditional TF*IDF techniques.
Abstract: Word sense ambiguity is recognized as having a detrimental effect on the precision of information retrieval systems in general and web search systems in particular, due to the sparse nature of the queries involved. Despite continued research into the application of automated word sense disambiguation, the question remains as to whether less than 90% accurate automated word sense disambiguation can lead to improvements in retrieval effectiveness. In this study we explore the development and subsequent evaluation of a statistical word sense disambiguation system which demonstrates increased precision from a sense-based vector space retrieval model over traditional TF*IDF techniques.
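The TF*IDF baseline the paper compares against can be sketched in a few lines. The collection and query below are invented (note "bank" is the classic ambiguous term); this is an illustration of the baseline, not the authors' sense-based model:

```python
# Minimal TF*IDF ranking sketch (illustrative toy collection).
import math
from collections import Counter

docs = {"d1": "bank river water bank",
        "d2": "bank loan interest money",
        "d3": "river fishing water"}

tokenized = {d: t.split() for d, t in docs.items()}
N = len(tokenized)

def idf(term):
    """Inverse document frequency: log(N / document frequency)."""
    df = sum(1 for toks in tokenized.values() if term in toks)
    return math.log(N / df) if df else 0.0

def tfidf_score(query, doc_id):
    """Sum of tf * idf over the query terms."""
    tf = Counter(tokenized[doc_id])
    return sum(tf[t] * idf(t) for t in query.split())

ranked = sorted(tokenized, key=lambda d: tfidf_score("bank money", d), reverse=True)
```

Because TF*IDF treats "bank" (river) and "bank" (finance) as the same token, a sense-tagged index can in principle separate them, which is exactly the gain the study measures.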

211 citations


Patent
30 Jul 2003
TL;DR: In this article, an information retrieval system for automatically retrieving information related to the context of an active task being manipulated by a user is presented, where the system observes the operation of the active task and user interactions and utilizes predetermined criteria to generate a context representation.
Abstract: An information retrieval system for automatically retrieving information related to the context of an active task being manipulated by a user. The system observes the operation of the active task and user interactions, and applies predetermined criteria to generate a representation of those aspects of the active task that are relevant to its context. The information retrieval system then processes the context representation to generate queries or search terms for conducting an information search. The information retrieval system reorders the terms in a query so that they occur in a meaningful order, as they naturally occur in a document or active task being manipulated by the user. Furthermore, the information retrieval system may access a user profile to retrieve information related to the user, and then select information sources or transform search terms based on attributes related to the user, such as the user's occupation, position in a company, major in school, etc.

206 citations



Proceedings Article
01 Jan 2003
TL;DR: NLP needs to be optimized for IR in order to be effective and document retrieval is not an ideal application for NLP, at least given the current state-of-the-art in NLP.
Abstract: Many Natural Language Processing (NLP) techniques have been used in Information Retrieval. The results are not encouraging. Simple methods (stopwording, porter-style stemming, etc.) usually yield significant improvements, while higher-level processing (chunking, parsing, word sense disambiguation, etc.) only yield very small improvements or even a decrease in accuracy. At the same time, higher-level methods increase the processing and storage cost dramatically. This makes them hard to use on large collections. We review NLP techniques and come to the conclusion that (a) NLP needs to be optimized for IR in order to be effective and (b) document retrieval is not an ideal application for NLP, at least given the current state-of-the-art in NLP. Other IR-related tasks, e.g., question answering and information extraction, seem to be better suited.

156 citations


Proceedings ArticleDOI
03 Nov 2003
TL;DR: This work proposes a new method of obtaining expansion terms, based on selecting terms from past user queries that are associated with documents in the collection, that is effective for query expansion for web retrieval.
Abstract: Hundreds of millions of users each day use web search engines to meet their information needs. Advances in web search effectiveness are therefore perhaps the most significant public outcomes of IR research. Query expansion is one such method for improving the effectiveness of ranked retrieval by adding additional terms to a query. In previous approaches to query expansion, the additional terms are selected from highly ranked documents returned from an initial retrieval run. We propose a new method of obtaining expansion terms, based on selecting terms from past user queries that are associated with documents in the collection. Our scheme is effective for query expansion for web retrieval: our results show relative improvements over unexpanded full text retrieval of 26%--29%, and 18%--20% over an optimised, conventional expansion approach.
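The core idea — expand a query with terms from past queries that led to the same documents — can be sketched as follows. The query log, document associations, and selection rule are all invented for illustration, not taken from the paper:

```python
# Sketch: expand a new query using terms from past queries associated with
# the same documents (hypothetical log data).
from collections import Counter

# Past query log: query text -> documents users reached from it (illustrative).
query_log = {"cheap flights europe": {"d1", "d2"},
             "budget airline tickets": {"d1"},
             "europe rail passes": {"d3"}}

def expand(query, k=2):
    """Add the k most frequent new terms from past queries sharing documents."""
    q_terms = set(query.split())
    # Documents associated with past queries that share a term with the new query.
    docs = set()
    for past, assoc in query_log.items():
        if q_terms & set(past.split()):
            docs |= assoc
    # Count terms from all past queries touching those documents.
    counts = Counter(t for past, assoc in query_log.items() if assoc & docs
                     for t in past.split() if t not in q_terms)
    return query.split() + [t for t, _ in counts.most_common(k)]
```

Unlike conventional pseudo-relevance feedback, the expansion terms here come from other users' query vocabulary rather than from document text, which is the paper's key departure from prior work.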

Proceedings Article
01 Jun 2003
TL;DR: ODISSEA as discussed by the authors is a P2P-based search engine for massive document collections that uses a two-tier search engine architecture and a global index structure distributed over the nodes of the system.
Abstract: We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Architecture) that is currently under development in our group. ODISSEA provides a highly distributed global indexing and query execution service that can be used for content residing inside or outside of a P2P network. ODISSEA is different from many other approaches to P2P search in that it assumes a two-tier search engine architecture and a global index structure distributed over the nodes of the system. We give an overview of the proposed system and discuss the basic design choices. Our main focus is on efficient query execution, and we discuss how recent work on top-k queries in the database community can be applied in a highly distributed environment. We also give some preliminary simulation results on a real search engine log and a terabyte-size web page collection that indicate good scalability for our approach. Project homepage: http://cis.poly.edu/westlab/odissea/. A preliminary version of this paper appeared at the International Workshop on the Web and Databases, June 2003. Contact author. Email: suel@poly.edu

Proceedings ArticleDOI
09 Nov 2003
TL;DR: Field studies of information gathering in two design teams that had very different products, disciplinary backgrounds, and tools found striking similarities in the kinds of information they sought and the methods used to get it.
Abstract: Information retrieval is generally considered an individual activity, and information retrieval research and tools reflect this view. As digitally mediated communication and information sharing increase, collaborative information retrieval merits greater attention and support. We describe field studies of information gathering in two design teams that had very different products, disciplinary backgrounds, and tools. We found striking similarities in the kinds of information they sought and the methods used to get it. For example, each team sought information about design constraints from external sources. A common strategy was to propose ideas and request feedback, rather than to ask directly for recommendations. Some differences in information seeking and sharing reflected differences in work contexts. Our findings suggest some ways that existing team collaboration tools could support collaborative information retrieval more effectively.

Proceedings ArticleDOI
02 Nov 2003
TL;DR: This work presents a novel approach that uses pseudo-relevance feedback from retrieved items that are NOT similar to the query items, without requiring further user feedback, and suggests a score combination scheme via posterior probability estimation.
Abstract: Video information retrieval requires a system to find information relevant to a query which may be represented simultaneously in different ways through a text description, audio, still images and/or video sequences. We present a novel approach that uses pseudo-relevance feedback from retrieved items that are NOT similar to the query items, without requiring further user feedback. We provide insight into this approach using a statistical model and suggest a score combination scheme via posterior probability estimation. An evaluation on the 2002 TREC Video Track queries shows that this technique can improve video retrieval performance on a real collection. We believe that negative pseudo-relevance feedback shows great promise for very difficult multimedia retrieval tasks, especially when combined with other different retrieval algorithms.
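The mechanics of negative pseudo-relevance feedback can be sketched simply: treat the lowest-ranked results as confirmed non-relevant, then penalize anything similar to them. The scoring scheme, feature vectors, and penalty weight below are illustrative, not the authors' exact model:

```python
# Sketch of negative pseudo-relevance feedback (all numbers invented):
# penalize items similar to the lowest-ranked ("assumed non-relevant") results.

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def negative_prf(initial, vectors, n_neg=1, alpha=0.5):
    """Re-score each item: original score minus alpha * similarity to bottom-n items."""
    ranked = sorted(initial, key=initial.get, reverse=True)
    negatives = ranked[-n_neg:]  # pseudo non-relevant set
    return {item: score - alpha * max(cosine(vectors[item], vectors[n])
                                      for n in negatives)
            for item, score in initial.items()}

initial = {"a": 0.9, "b": 0.5, "c": 0.2}          # first-pass retrieval scores
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
rescored = negative_prf(initial, vectors)
```

Because no relevant examples are assumed, only the bottom of the ranking is trusted; this is what lets the method run without asking the user for feedback.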

Proceedings Article
01 Aug 2003
TL;DR: It is suggested that retrieval can be tightly bound to inference, which makes today's Web search engines useful to Semantic Web inference engines, and causes improvements in either retrieval or inference to lead directly to improvements in the other.
Abstract: The vision of the Semantic Web is that it will be much like the Web we know today, except that documents will be enriched by annotations in machine understandable markup. These annotations will provide metadata about the documents as well as machine interpretable statements capturing some of the meaning of document content. We discuss how the information retrieval paradigm might be recast in such an environment. We suggest that retrieval can be tightly bound to inference. Doing so makes today's Web search engines useful to Semantic Web inference engines, and causes improvements in either retrieval or inference to lead directly to improvements in the other.

Patent
Ching-Yung Lin1, Apostol Natsev, Milind Naphade1, John R. Smith1, Belle L. Tseng1 
30 Jun 2003
TL;DR: This article concerns the use of search fusion methods for querying multimedia databases, and more specifically a method and system for constructing a multi-modal query of a multimedia repository by forming multiple uni-modal searches and explicitly selecting fusion methods.
Abstract: The present invention relates to the use of search fusion methods for querying multimedia databases and more specifically to a method and system for constructing a multi-modal query of a multimedia repository by forming multiple uni-modal searches and explicitly selecting fusion methods for combining their results. The present invention also relates to the integration of search methods for content-based retrieval, model-based retrieval, text-based retrieval, and metadata search, and the use of graphical user interfaces allowing the user to form queries fusing these search methods.

01 Jan 2003
TL;DR: The approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval by assuming that regions in an image can be described using a small vocabulary of blobs.
Abstract: Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor intensive procedure and hence there has been great interest in coming up with automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence, and twice as good as a state of the art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.
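The central quantity — the probability of a word given an image's blobs — can be sketched with a toy estimator. The training pairs and weighting scheme below are invented to illustrate the idea, not the paper's exact relevance-model formulation:

```python
# Toy sketch of cross-media annotation: estimate P(word | blobs) from
# training images pairing blob tokens with annotation words (data invented).

# Training set: each image is (blob tokens, annotation words).
train = [({"b1", "b2"}, {"tiger", "grass"}),
         ({"b2", "b3"}, {"grass", "sky"}),
         ({"b4"}, {"car"})]

def p_word_given_blobs(word, blobs):
    """Average P(word | training image), weighted by how well that image's
    blobs match the query image's blobs."""
    num = den = 0.0
    for img_blobs, words in train:
        weight = len(blobs & img_blobs) / len(blobs) if blobs else 0.0
        num += weight * (1.0 if word in words else 0.0)
        den += weight
    return num / den if den else 0.0
```

Annotation then amounts to ranking the vocabulary by this probability for a new image's blobs, and retrieval ranks images by the probability of the query word.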

Proceedings Article
01 Jan 2003
TL;DR: A query interface exploiting the intuitiveness of natural language is presented for the largest Austrian web-based tourism platform, Tiscover, together with an analysis showing how users formulate queries when their imagination is not limited by conventional search interfaces with structured forms consisting of check boxes, radio buttons and special-purpose text fields.
Abstract: With the increasing amount of information available on the Internet one of the most challenging tasks is to provide search interfaces that are easy to use without having to learn a specific syntax. Hence, we present a query interface exploiting the intuitiveness of natural language for the largest Austrian web-based tourism platform Tiscover. Furthermore, we will describe the results and our insights from analyzing the natural language queries collected during a field trial in which the interface was promoted via the Tiscover homepage. This analysis shows how users formulate queries when their imagination is not limited by conventional search interfaces with structured forms consisting of check boxes, radio buttons and special-purpose text fields. The results of this field test are thus valuable indicators into which direction the web-based tourism information system should be extended to better serve the customers.


Proceedings ArticleDOI
13 Oct 2003
TL;DR: A re-ranking method to improve Web image retrieval by reordering the images retrieved from an image search engine based on a relevance model, which is a probabilistic model that evaluates the relevance of the HTML document linking to the image, and assigns a probability of relevance.
Abstract: Web image retrieval is a challenging task that requires efforts from image processing, link structure analysis, and Web text retrieval. Since content-based image retrieval is still considered very difficult, most current large-scale Web image search engines exploit text and link structure to "understand" the content of the Web images. However, local text information, such as captions, filenames and adjacent text, is not always reliable and informative. Therefore, global information should be taken into account when a Web image retrieval system makes relevance judgments. We propose a re-ranking method to improve Web image retrieval by reordering the images retrieved from an image search engine. The re-ranking process is based on a relevance model, which is a probabilistic model that evaluates the relevance of the HTML document linking to the image, and assigns a probability of relevance. The experimental results show that the re-ranked image retrieval achieves better performance than the original Web image retrieval, suggesting the effectiveness of the re-ranking method. The relevance model is learned from the Internet without preparing any training data and is independent of the underlying algorithm of the image search engines. The re-ranking process should therefore be applicable to any image search engine with little effort.
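The re-ranking step described above reduces to blending the engine's original ordering with a per-image relevance probability computed from the linking HTML page. The combination rule and weights below are a hypothetical sketch, not the paper's model:

```python
# Sketch: re-rank images by combining original rank with a relevance
# probability derived from each image's linking HTML page (weights invented).

def rerank(images, page_relevance, beta=0.7):
    """images: ids in original rank order; page_relevance: id -> P(relevant | page).
    Convert rank position to a score, then blend with the page score."""
    n = len(images)
    rank_score = {img: (n - i) / n for i, img in enumerate(images)}
    combined = {img: (1 - beta) * rank_score[img]
                     + beta * page_relevance.get(img, 0.0)
                for img in images}
    return sorted(images, key=lambda img: combined[img], reverse=True)

reranked = rerank(["i1", "i2", "i3"], {"i1": 0.1, "i2": 0.9, "i3": 0.4})
```

Because the blend only consumes the engine's output ordering, the scheme is agnostic to the engine's internal algorithm, which is what makes it portable across search engines.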

Book ChapterDOI
01 Jan 2003
TL;DR: A simple statistical model for capturing the notion of topical relevance in information retrieval, called a relevance model, is developed and extensive evaluations of the relevance model approach are described on the TREC ad-hoc retrieval and cross-language tasks.
Abstract: We develop a simple statistical model, called a relevance model, for capturing the notion of topical relevance in information retrieval. Estimating probabilities of relevance has been an important part of many previous retrieval models, but we show how this estimation can be done in a more principled way based on a generative or language model approach. In particular, we focus on estimating relevance models when training examples (examples of relevant documents) are not available. We describe extensive evaluations of the relevance model approach on the TREC ad-hoc retrieval and cross-language tasks. In both cases, rankings based on relevance models significantly outperform strong baseline approaches.
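The estimation step described above — building a relevance model when no relevant examples are available — can be sketched by averaging document language models weighted by query likelihood. The toy collection and smoothing constants are invented for the example:

```python
# Sketch of a relevance-model estimate without relevant training examples:
# P(w|R) is approximated by averaging P(w|D) over documents, weighted by
# the query likelihood P(Q|D) (toy collection, crude smoothing).
from collections import Counter

docs = {"d1": "apple fruit pie apple",
        "d2": "apple computer mac",
        "d3": "fruit banana smoothie"}
models = {d: Counter(t.split()) for d, t in docs.items()}

def p_w_given_d(w, d, mu=0.01):
    """Smoothed document language model (crude additive smoothing)."""
    c = models[d]
    return (c[w] + mu) / (sum(c.values()) + mu * 100)

def p_q_given_d(query, d):
    """Query likelihood under document d's language model."""
    p = 1.0
    for t in query.split():
        p *= p_w_given_d(t, d)
    return p

def relevance_model(query, w):
    """P(w|R) ~ sum over documents of P(Q|D)-normalized weight times P(w|D)."""
    weights = {d: p_q_given_d(query, d) for d in docs}
    z = sum(weights.values())
    return sum(weights[d] / z * p_w_given_d(w, d) for d in docs) if z else 0.0
```

Documents that explain the query well dominate the mixture, so words that co-occur with the query terms (here "pie" with "apple fruit") receive high probability even though no relevant document was ever labeled.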

Book ChapterDOI
16 Feb 2003
TL;DR: The time is ripe for the two to meet: NLP has grown out of prototypes and IR is having a hard time trying to improve precision; two examples of possible approaches are considered.
Abstract: It seems the time is ripe for the two to meet: NLP has grown out of prototypes and IR is having a hard time trying to improve precision. Two examples of possible approaches are considered below. Lexware is a lexicon-based system for text analysis of Swedish applied in an information retrieval task. NLIR is an information retrieval system using intensive natural language processing to provide index terms on a higher level of abstraction than stems.

Journal ArticleDOI
TL;DR: This work improves the current Web information retrieval approach by raising the efficiency of information retrieval, enhancing the precision and mobility of information services, and enabling intelligent information services.

01 Jan 2003
TL;DR: The results show that concept-based query enhancement in ARCH leads to significantly higher precision for ambiguous queries without sacrificing recall, as well as comparing enhanced and non-enhanced queries over a range of topics.
Abstract: The effectiveness of Internet search engines is often hampered by the ambiguity of user queries and the reluctance or inability of users to build less ambiguous multi-word queries. Our system, ARCH, is a client-side Web agent, which incorporates domain-specific concept hierarchies together with interactive query formulation in order to automatically produce a richer and therefore less ambiguous query. Unlike traditional relevance feedback methods, ARCH assists users in query modification prior to the search task. ARCH uses the domain knowledge inherent in Web-based classification hierarchies such as Yahoo, combined with a user’s profile information, to add just those terms likely to improve the match with the user’s intent. The goal of the system is therefore to meet the user’s information needs by closing the gap between the user’s stated query and the actual intent of the search. We present a detailed evaluation of the query enhancement in ARCH, comparing enhanced and non-enhanced queries over a range of topics. Our results show that concept-based query enhancement in ARCH leads to significantly higher precision for ambiguous queries without sacrificing recall.

Journal Article
TL;DR: The authors provide a comprehensive description of the specific problems arising in cross-language information retrieval, the solutions proposed in this area, as well as the remaining problems, and a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR.
Abstract: Search for information is no longer exclusively limited to the native language of the user, but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness using different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers can also find more technical details and discussions about the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR.

BookDOI
01 Jan 2003
TL;DR: Intelligent Web Agents that Learn to Retrieve and Extract Information and a Neural Net Approach to Data Mining: Classification of Users to Aid Information Management are presented.
Abstract: Table of contents:
- Creation and Representation of Web Resources
- Structure Analysis and Generation for Internet Documents
- A Fuzzy System for the Web Page Representation
- Flexible Representation and Retrieval of Web Documents
- Information Retrieval
- Intelligent Information Retrieval on the Web
- Internet as a Challenge to Fuzzy Querying
- Internet Search Based on Text Intuitionistic Fuzzy Similarity
- Content-Based Fuzzy Search in a Multimedia Web Database
- Self-Organizing Maps for Interactive Search in Document Databases
- Methods for Exploratory Cluster Analysis
- Textual Information Retrieval with User Profiles Using Fuzzy Clustering and Inferencing
- Intelligent Clustering as Source of Knowledge for Web Dialogue Manager in an Information Retrieval System
- Document Clustering Using Tolerance Rough Set Model and Its Application to Information Retrieval
- Improving Web Search by the Identification of Contextual Information
- Intelligent Internet-Based Multiagent Systems
- Neural Agent for Text Database Discovery
- Intelligent Web Agents that Learn to Retrieve and Extract Information
- Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach
- Web Browsing Using Machine Learning on Text Data
- Retrieval of Semistructured Web Data
- Intelligent Retrieval of Hypermedia Documents
- Bootstrapping an Ontology-Based Information Extraction System
- Web Data Mining and Use
- Intelligent Web Mining
- A Neural Net Approach to Data Mining: Classification of Users to Aid Information Management
- Web-Based Expert Systems: Information Clients versus Knowledge Servers

Proceedings ArticleDOI
28 Jul 2003
TL;DR: Experiments using the TREC data show that incorporating user query history, as context information, consistently improves the retrieval performance in both average precision and precision at 20 documents.
Abstract: In this poster, we incorporate user query history, as context information, to improve the retrieval performance in interactive retrieval. Experiments using the TREC data show that incorporating such context information indeed consistently improves the retrieval performance in both average precision and precision at 20 documents.
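One common way to use query history as context is to interpolate the current query's term distribution with the session's past queries. The mixing rule, weight, and session data below are an illustrative sketch, not the poster's exact method:

```python
# Sketch: build a context-sensitive query model by mixing the current query
# with terms from the session's query history (weight and data invented).
from collections import Counter

def contextual_query_model(current, history, lam=0.8):
    """Term distribution: lam * current-query model + (1 - lam) * history model."""
    cur = Counter(current.split())
    hist = Counter(t for q in history for t in q.split())
    vocab = set(cur) | set(hist)
    cn, hn = sum(cur.values()), sum(hist.values())
    return {t: lam * cur[t] / cn + (1 - lam) * (hist[t] / hn if hn else 0.0)
            for t in vocab}

model = contextual_query_model("java", ["python tutorial", "programming language"])
```

Here the ambiguous query "java" inherits a small amount of probability mass from programming-related history terms, which is the disambiguating effect that improves precision in interactive retrieval.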

Journal ArticleDOI
TL;DR: This paper explores the challenges of expanding information retrieval (IR) on the Web, in particular handling other types of data, Web mining, and issues related to crawling.

Journal ArticleDOI
01 Apr 2003
TL;DR: A query-categorization approach for categorizing Web query terms from the logs of on-line search services into a predefined subject taxonomy based on their supposed popular search interests is introduced.
Abstract: In this paper, we propose a query-categorization approach to facilitating the engineering process of constructing Web taxonomies. One primary step in taxonomy construction is to acquire the domain-specific terminology terms and the mapping between the subjects and these terms. We introduce a technique for categorizing Web query terms from the logs of on-line search services into a predefined subject taxonomy based on their supposed popular search interests. The obtained experimental results show our technique's effectiveness in reducing the workload of human indexers in constructing Web taxonomies and also show its usefulness in various Web information retrieval applications.

01 Jan 2003
TL;DR: Relevance of document titles to the processing task can be predicted with reasonable accuracy from only a few features, whereas prediction of relevance of specific words will require new features and methods.
Abstract: We investigate whether it is possible to infer from implicit feedback what is relevant for a user in an information retrieval task. Eye movement signals are measured; they are very noisy but potentially contain rich hints about the current state and focus of attention of the user. In the experimental setting relevance is controlled by giving the user a specific search task, and the modeling goal is to predict from eye movements which of the given titles are relevant. We extract a set of standard features from the signal, and explore the data with statistical information visualization methods including standard self-organizing maps (SOMs) and SOMs that learn metrics. Relevance of document titles to the processing task can be predicted with reasonable accuracy from only a few features, whereas prediction of relevance of specific words will require new features and methods.