
Showing papers on "Ranking (information retrieval) published in 2000"


Journal ArticleDOI
TL;DR: A novel approach to balance objective and penalty functions stochastically, i.e., stochastic ranking, is introduced, and a new view on penalty function methods in terms of the dominance of penalty and objective functions is presented.
Abstract: Penalty functions are often used in constrained optimization. However, it is very difficult to strike the right balance between objective and penalty functions. This paper introduces a novel approach to balance objective and penalty functions stochastically, i.e., stochastic ranking, and presents a new view on penalty function methods in terms of the dominance of penalty and objective functions. Some of the pitfalls of naive penalty methods are discussed in these terms. The new ranking method is tested using a (μ, λ) evolution strategy on 13 benchmark problems. Our results show that suitable ranking alone (i.e., selection), without the introduction of complicated and specialized variation operators, is capable of improving the search performance significantly.

1,571 citations
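The stochastic ranking idea above can be sketched as a bubble-sort-style pass in which adjacent individuals are compared on the objective function with some probability even when constraints are violated; the function names below are illustrative, and `pf` stands in for the paper's comparison probability (a value around 0.45 is discussed there):

```python
import random

def stochastic_rank(population, objective, violation, pf=0.45):
    """Bubble-sort-style stochastic ranking: adjacent individuals are
    compared by the objective when both are feasible or, with probability
    pf, regardless of feasibility; otherwise they are compared by the
    constraint violation. Both objective and violation are minimised."""
    ranked = list(population)
    n = len(ranked)
    for _ in range(n):
        swapped = False
        for i in range(n - 1):
            a, b = ranked[i], ranked[i + 1]
            both_feasible = violation(a) == 0 and violation(b) == 0
            use_objective = both_feasible or random.random() < pf
            key = objective if use_objective else violation
            if key(a) > key(b):
                ranked[i], ranked[i + 1] = b, a
                swapped = True
        if not swapped:
            break
    return ranked
```

With all individuals feasible this reduces to sorting by the objective; with `pf=0` infeasible individuals are ordered purely by violation, which is the penalty-dominated extreme the paper contrasts against.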


Journal ArticleDOI
01 Jul 2000
TL;DR: The novel evaluation methods and the case demonstrate that non-dichotomous relevance assessments are applicable in IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.
Abstract: This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) two novel measures computing the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. We then demonstrate the use of these evaluation methods in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best match retrieval system (InQuery) in a text database consisting of newspaper articles. The results indicate that the tested strong query structures are most effective in retrieving highly relevant documents. The differences between the query types are practically essential and statistically significant. More generally, the novel evaluation methods and the case demonstrate that non-dichotomous relevance assessments are applicable in IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.

1,461 citations
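The two cumulated-gain measures described above can be sketched directly; this is a minimal reading of the paper's definitions, in which a gain at a rank at or beyond the logarithm base is discounted by the log of its rank:

```python
import math

def cumulated_gain(gains):
    """CG: running sum of graded relevance gains down the ranked list."""
    out, total = [], 0
    for g in gains:
        total += g
        out.append(total)
    return out

def discounted_cumulated_gain(gains, base=2):
    """DCG: like CG, but a gain at rank i >= base is divided by
    log_base(i), so relevance found late in the ranking counts less."""
    out, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank < base else g / math.log(rank, base)
        out.append(total)
    return out
```

For the graded gains [3, 2, 3, 0], CG is [3, 5, 8, 8], while DCG discounts the gain of 3 found at rank 3 by log2(3).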


Journal ArticleDOI
TL;DR: A new technique is proposed, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents.
Abstract: Techniques for automatic query expansion have been extensively studied in information research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top-ranked documents retrieved for a query. While local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.

613 citations
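A much-simplified sketch of the selection step: score each candidate term by its co-occurrence with the query terms inside the top-ranked documents. The published local context analysis formula also applies idf weighting and a product over query terms, which this toy version omits, and the function name is illustrative:

```python
from collections import Counter

def lca_expansion_terms(query_terms, top_docs, k=5):
    """Rank candidate expansion terms by co-occurrence with query terms
    within the top-ranked documents (each doc a whitespace-split string)."""
    query = set(query_terms)
    scores = Counter()
    for doc in top_docs:
        words = set(doc.split())
        overlap = len(words & query)
        if overlap == 0:
            continue  # doc shares no query terms, contributes nothing
        for w in words - query:
            scores[w] += overlap  # credit co-occurrence with query terms
    return [w for w, _ in scores.most_common(k)]
```

Terms that consistently appear alongside the query terms in the top-ranked set float to the top of the candidate list.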


Patent
26 Dec 2000
TL;DR: In this article, a system allows a user to submit an ambiguous search query and to receive potentially disambiguated search results by translating a search engine's conventional alphanumeric index into a second index that is ambiguated in the same manner as the user's input; the ambiguous query is compared against this index, and the corresponding documents are provided to the user as search results.
Abstract: A system allows a user to submit an ambiguous search query and to receive potentially disambiguated search results. In one implementation, a search engine's conventional alphanumeric index is translated into a second index that is ambiguated in the same manner as the user's input. The user's ambiguous search query is compared to this ambiguated index, and the corresponding documents are provided to the user as search results.

300 citations


Journal ArticleDOI
TL;DR: A model for determining the weights of interacting criteria is presented. This is done on the basis of a partial ranking over a reference set of alternatives (prototypes), a partial ranking over the set of criteria, and a partial ranking over the set of interactions between pairs of criteria.

286 citations


Journal ArticleDOI
Wen-Syan Li, Divyakant Agrawal
01 Dec 2000
TL;DR: The notion of a multi-granularity information and processing structure is used to support efficient query expansion, which involves an indexing phase, a query processing phase, and a ranking phase.

Abstract: A method and apparatus for efficient query expansion using reduced-size indices and for progressive query processing. Queries are expanded conceptually, using words semantically similar and syntactically related to those specified by the user in the query, to reduce the chances of missing relevant documents. The notion of a multi-granularity information and processing structure is used to support efficient query expansion, which involves an indexing phase, a query processing phase, and a ranking phase. In the indexing phase, semantically similar words are grouped into a concept, which results in a substantial index size reduction due to the coarser granularity of semantic concepts. During query processing, the words in a query are mapped into their corresponding semantic concepts and syntactic extensions, resulting in a logical expansion of the original query while avoiding the processing overhead of expanding it literally. The initial query words can then be used to rank the documents in the answer set on the basis of exact, semantic, and syntactic matches, and also to perform progressive query processing.

260 citations


Proceedings ArticleDOI
01 Jul 2000
TL;DR: The results show that incorporating quality metrics can generally improve search effectiveness in both centralized and distributed search environments.
Abstract: Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms based solely on term frequency statistics; information quality is usually ignored. This leads to the problem that documents are retrieved without regard to their quality. We present an approach that combines similarity-based ranking with quality ranking in centralized and distributed search environments. Six quality metrics (currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness) were investigated. Search effectiveness was significantly improved when the currency, availability, information-to-noise ratio, and page cohesiveness metrics were incorporated in centralized search. The improvement seen when the availability, information-to-noise ratio, popularity, and cohesiveness metrics were incorporated in site selection was also significant. Finally, incorporating the popularity metric in information fusion resulted in a significant improvement. In summary, the results show that incorporating quality metrics can generally improve search effectiveness in both centralized and distributed search environments.

252 citations
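The fusion step can be as simple as a weighted linear combination of the similarity score with the normalised quality metrics; the metric names and weights below are illustrative, not values taken from the paper:

```python
def combined_score(similarity, quality_metrics, weights):
    """Linear fusion of a similarity score with quality metrics;
    all values are assumed to be normalised to [0, 1]. Metrics
    without a weight are ignored."""
    score = similarity
    for name, value in quality_metrics.items():
        score += weights.get(name, 0.0) * value
    return score
```

For example, `combined_score(0.5, {"currency": 1.0}, {"currency": 0.2})` boosts a maximally current page by 0.2 on top of its similarity score.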


Proceedings ArticleDOI
01 Jul 2000
TL;DR: An experimental evaluation of link analysis algorithms for their potential to identify high quality items using a dataset of web documents rated for quality by human topic experts found link-based metrics did a good job of picking out high-quality items.
Abstract: For many topics, the World Wide Web contains hundreds or thousands of relevant documents of widely varying quality. Users face a daunting challenge in identifying a small subset of documents worthy of their attention. Link analysis algorithms have received much interest recently, in large part for their potential to identify high quality items. We report here on an experimental evaluation of this potential. We evaluated a number of link- and content-based algorithms using a dataset of web documents rated for quality by human topic experts. Link-based metrics did a good job of picking out high-quality items. Precision at 5 is about 0.75, and precision at 10 is about 0.55; this is in a dataset where 0.32 of all documents were of high quality. Surprisingly, a simple content-based metric performed nearly as well: ranking documents by the total number of pages on their containing site.

244 citations
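The reported figures are plain precision-at-cutoff values; given a list of binary quality judgements down the ranking, precision at k can be sketched as:

```python
def precision_at_k(judgements, k):
    """Fraction of the top-k ranked items judged high quality (1 vs 0)."""
    if k <= 0 or k > len(judgements):
        raise ValueError("k must be within the judged ranking")
    return sum(judgements[:k]) / k
```

A precision at 5 of 0.75 against a base rate of 0.32 means the top ranks are more than twice as dense in high-quality items as the collection overall.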


Journal ArticleDOI
TL;DR: The paper shows that the new probabilistic interpretation of tf×idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking.
Abstract: This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf.idf term weighting. The paper shows that the new probabilistic interpretation of tf.idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the TREC collection shows that the linguistically motivated weighting algorithm outperforms the popular BM25 weighting algorithm.

209 citations
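For reference, the classic tf.idf weight the paper sets out to justify probabilistically can be sketched as follows; this is the standard formulation, not the paper's linguistically motivated variant:

```python
import math

def tf_idf(term, doc, collection):
    """Classic tf.idf: term frequency in the document times the log of
    the inverse document frequency across the collection (each document
    is a list of tokens)."""
    tf = doc.count(term)
    df = sum(1 for d in collection if term in d)
    if df == 0:
        return 0.0  # term absent from the collection
    return tf * math.log(len(collection) / df)
```

A term occurring in every document gets weight zero, which is the usual justification problem: the paper's contribution is deriving such weighting from a statistical language model rather than asserting it.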


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of pages indicating several potentially relevant documents.
Abstract: We investigate the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents. Traditional ranking methods for information retrieval, based on term and inverse document frequencies, have been found to work poorly in this context. Under the cover density measure, ranking is based on term proximity and cooccurrence. Experimental comparisons show performance that compares favorably with previous work.

203 citations
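One simple formulation of the cover density idea: enumerate the minimal spans (covers) of the document that contain every query term, then score short covers fully and long covers in inverse proportion to their length. The constant `k` and the exact scoring shape below are illustrative; the published method is similar in spirit:

```python
def minimal_covers(tokens, terms):
    """Enumerate minimal spans of `tokens` containing every query term;
    a span is minimal if no proper sub-span also contains them all."""
    covers, need = [], set(terms)
    for start in range(len(tokens)):
        if tokens[start] not in need:
            continue
        seen, end = set(), None
        for j in range(start, len(tokens)):
            if tokens[j] in need:
                seen.add(tokens[j])
                if seen == need:
                    end = j
                    break
        if end is None:
            break  # no later start can complete a cover either
        # minimal iff the start token does not recur before the end
        if tokens[start] not in tokens[start + 1:end + 1]:
            covers.append((start, end))
    return covers

def cover_density_score(tokens, terms, k=4):
    """Sum of per-cover scores: a cover of length <= k counts 1.0,
    a longer cover counts k / length (k is a tuning constant)."""
    return sum(min(1.0, k / (e - s + 1))
               for s, e in minimal_covers(tokens, terms))
```

Documents where the query terms appear close together, and often, accumulate a higher score, which is exactly the proximity-and-cooccurrence behaviour the abstract describes.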


Patent
10 Jul 2000
TL;DR: In this article, an automated method of creating or updating a database of resumes and related documents is proposed, in which documents are retrieved from a network in order of their relevance to a subject taxonomy, as tracked in a retrieval priority list.
Abstract: An automated method of creating or updating a database of resumes and related documents, the method comprising: a) entering at least one example document that is relevant to a subject taxonomy in a retrieval priority list and, if there is a plurality of example documents stored in the retrieval priority list, ranking the example documents according to their relevancy to the subject taxonomy; b) retrieving a document from a network of documents, where the document is the most relevant document to the subject taxonomy stored in the retrieval priority list; c) harvesting information from specified fields of the document; d) classifying the information into one or more classes according to specified categories of the subject taxonomy; e) storing the information into a database; f) determining whether the information contains links to other documents; g) ranking the links according to relevancy to the subject taxonomy, and storing the links in the retrieval priority list according to the relevancy; h) terminating the method, provided the method's stop criteria have been met; and i) repeating steps b) through h), provided the method's stop criteria have not been met.

Patent
31 May 2000
TL;DR: In this article, a query manager is used to monitor user choices and selections on a search result web page and provide alternative query expressions to further narrow and enhance the user's search.
Abstract: An invention for monitoring user choices and selections on a search result web page and providing alternative query expressions to further narrow and enhance the user's search. Monitoring and recording user choices and selections is achieved by a query manager. Query strings are then standardized. The search is performed on an Internet search engine, and each search result item in the result output set is then associated with a list of alternative standardized queries by an alternate query matching manager. Each search result item in the result output set that is associated with the alternate queries is then flagged. The resulting flagged list of alternative queries is displayed to the user by a page presentation manager using a graphical user interface for subsequent user selection. Upon selection of the graphical user interface for alternate query expressions, an alternate query selection manager retrieves and displays each alternate query to the user.

Patent
31 Jul 2000
TL;DR: An economic, scalable machine learning system and process perform document (concept) classification with high accuracy using large topic schemes, including large hierarchical topic schemes, as discussed by the authors.
Abstract: An economic, scalable machine learning system and process perform document (concept) classification (210) with high accuracy using large topic schemes, including large hierarchical topic schemes. One or more highly relevant classification topics is suggested for a given document (concept) to be classified (210). The invention includes training (200) and concept classification (210) processes. The invention also provides methods that may be used as part of the training and/or concept classification processes, including: a method of scoring (303) the relevance of features in training concepts, a method of ranking concepts based on relevance score, and a method of voting on topics associated with an input concept. In a preferred embodiment, the invention is applied to the legal (case law) domain, classifying legal concepts (rules of law) according to a proprietary legal topic classification scheme (a hierarchical scheme of areas of law).

Journal ArticleDOI
01 Jun 2000
TL;DR: It was discovered that the ability to maintain search context explicitly seems to affect the way people search, and an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot is described.
Abstract: Experienced users who query search engines have a complex behavior. They explore many topics in parallel, experiment with query variations, consult multiple search engines, and gather information over many sessions. In the process they need to keep track of search context — namely useful queries and promising result links, which can be hard. We present an extension to search engines called SearchPad that makes it possible to keep track of ‘search context’ explicitly. We describe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot. Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instant start requiring no code download or special actions on the part of the user; (iii) no server side storage; and (iv) no added client–server communication overhead. An added benefit is that it allows search services to collect valuable relevance information about the results shown to the user. In the context of each query SearchPad can log the actions taken by the user, and in particular record the links that were considered relevant by the user in the context of the query. The service was tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. We discovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPad users looked at more search results than is typical on the Web, suggesting that availability of search context may partially compensate for non-relevant pages in the ranking.

Patent
29 Jun 2000
TL;DR: In this article, meta-descriptors are generated for multimedia information in a repository by extracting the descriptors from the multimedia information and clustering the metadata information based on the descriptor.
Abstract: Multimedia information retrieval is performed using meta-descriptors in addition to descriptors. A 'descriptor' is a representation of a feature, a 'feature' being a distinctive characteristic of multimedia information, while a 'meta-descriptor' is information about the descriptor. Meta-descriptors are generated for multimedia information in a repository (10, 12, 14, 16, 18, 20, 22, 24) by extracting the descriptors from the multimedia information (111), clustering the multimedia information based on the descriptors (112), assigning meta-descriptors to each cluster (113), and attaching the meta-descriptors to the multimedia information in the repository (114). The multimedia repository is queried by formulating a query using query-by-example (131), acquiring the descriptor/s and meta-descriptor/s for a repository multimedia item (132), generating a query descriptor/s if none of the same type has been previously generated (133, 134), comparing the descriptors of the repository multimedia item and the query multimedia item (135), and ranking and displaying the results (136, 137).

Journal ArticleDOI
TL;DR: A novel approach that automatically retrieves keywords and then uses genetic algorithms to adapt the keyword weights; the approach is faster and uses less memory than the PAT-tree based approach.
Abstract: This paper proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. One of the contributions of the paper is to combine the Bigram model (Chen, A., He, J., Xu, L., Gey, F. C., & Meggs, J. 1997. Chinese text retrieval without using a dictionary, ACM SIGIR'97, Philadelphia, PA, USA, pp. 42–49; Yang, Y.-Y., Chang, J.-S., & Chen, K.-J. 1993. Document automatic classification and ranking, Master thesis, Department of Computer Science, National Tsing Hua University) and the PAT-tree structure (Chien, L.-F., Huang, T.-I., & Chien, M.-C. 1997. PAT-tree-based keyword extraction for Chinese information retrieval, ACM SIGIR'97, Philadelphia, PA, USA, pp. 50–59) to retrieve keywords. The approach extracts bigrams from documents and uses the bigrams to construct a PAT-tree to retrieve keywords. The proposed approach can retrieve any type of keyword, such as technical keywords and person names. The effectiveness of the proposed approach is demonstrated by comparing the keywords it finds with those found by the PAT-tree based approach. This comparison reveals that our keyword retrieval approach is as accurate as the PAT-tree based approach, yet our approach is faster and uses less memory. The study then applies genetic algorithms to tune the weights of the retrieved keywords. Moreover, several documents obtained from web sites are tested and the experimental results are compared with those of other approaches, indicating that the proposed approach is highly promising for applications.

Patent
Wacholder, Faye
26 Dec 2000
TL;DR: In this paper, a "domain-general" method for representing the sense of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the noun phrases by head, and ranking the noun phrases according to a significance measure.
Abstract: A "domain-general" method for representing the "sense" of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the simplex noun phrases by head, and ranking the simplex noun phrases according to a significance measure to indicate the relative importance of the simplex noun phrases as significant topics of the document. Furthermore, the output can be filtered in a variety of ways, both for automatic processing and for presentation to users.

Patent
22 Sep 2000
TL;DR: In this article, a method for associating search results is presented, where an original list of search results are provided to a user in response to a first query, and the search results selected by the first user are recorded and associated with the first query.
Abstract: A method for associating search results is provided. According to the method, an original list of search results is provided to a first user in response to a first query, and the search results selected by the first user are recorded and associated with the first query. Additionally, a second query that is the same as or similar to the first query is received from a second user, and an alternate list of search results is provided to the second user. The alternate list lists those search results from the original list that have been associated with the first query due to selection by a user. Also provided is a system for providing search results that includes a search engine, a query database, and a controller. The search engine provides original lists of search results in response to queries, and the query database stores the search results selected by users in response to each of the queries. The controller provides an alternate list of search results in response to another query that is the same as or similar to one of the queries, with the alternate list of search results listing those search results from the original list that have been recorded in the query database as having been previously selected in response to the one query.

Proceedings ArticleDOI
30 Apr 2000
TL;DR: This paper proposes using sentence-rank-based and content-based measures for evaluating extract summaries, and compares these with recall-based evaluation measures.
Abstract: Summary evaluation measures produce a ranking of all possible extract summaries of a document. Recall-based evaluation measures, which depend on costly human-generated ground truth summaries, produce uncorrelated rankings when ground truth is varied. This paper proposes using sentence-rank-based and content-based measures for evaluating extract summaries, and compares these with recall-based evaluation measures. Content-based measures increase the correlation of rankings induced by synonymous ground truths, and exhibit other desirable properties.

Journal ArticleDOI
TL;DR: A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented, providing evidence for the effectiveness of interactive query expansion and highlighting the need for more research on it.
Abstract: A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented in this article. Data were collected from 25 searches using the INSPEC database. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results discuss issues that relate to query expansion, retrieval effectiveness, the correspondence of the on-line-to-off-line relevance judgments, and the selection of terms for query expansion by users (interactive query expansion). The main conclusions drawn from the results of the study are that: (1) one-third of the terms presented to users in a list of candidate terms for query expansion was identified by the users as potentially useful for query expansion. (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationships identified between the five best terms selected by the users for query expansion and the initial query terms were that: (a) 34% of the query expansion terms have no relationship or other type of correspondence with a query term; (b) 66% of the remaining query expansion terms have a relationship to the query terms. These relationships were: narrower term (46%), broader term (3%), related term (17%). (4) The results provide evidence for the effectiveness of interactive query expansion. The initial search produced on average three highly relevant documents; the query expansion search produced on average nine further highly relevant documents. The conclusions highlight the need for more research on: interactive query expansion, the comparative evaluation of automatic vs. interactive query expansion, the study of weighted Web-based or Web-accessible retrieval systems in operational environments, and for user studies in searching ranked retrieval systems in general.

Journal ArticleDOI
01 Jun 2000
TL;DR: Q-Pilot is described, an automatic query routing system that attempts to dynamically route each user query to the appropriate specialized search engines, based on an off-line component that creates an approximate model of each specialized search engine's topic.
Abstract: General-purpose search engines such as AltaVista and Lycos are notorious for returning irrelevant results in response to user queries. Consequently, thousands of specialized, topic-specific search engines (from VacationSpot.com to KidsHealth.org) have proliferated on the Web. Typically, topic-specific engines return far better results for 'on topic' queries as compared with standard Web search engines. However, it is difficult for the casual user to identify the appropriate specialized engine for any given search. It is more natural for a user to issue queries at a particular Web site, and have these queries automatically routed to the appropriate search engine(s). This paper describes an automatic query routing system called Q-Pilot. Q-Pilot has an off-line component that creates an approximate model of each specialized search engine's topic. On line, Q-Pilot attempts to dynamically route each user query to the appropriate specialized search engines. In our experiments, Q-Pilot was able to identify the appropriate query category 70% of the time. In addition, Q-Pilot picked the best search engine for the query, as one of the top three picks out of its repository of 144 engines, about 40% of the time. This paper reports on Q-Pilot's architecture, the query expansion and clustering algorithms it relies on, and the results of our preliminary experiments.

Patent
28 Apr 2000
TL;DR: In this paper, a system for ranking search results obtained from an information retrieval system includes a search pre-processor (30), a search engine (20), and a search postprocessor (40).
Abstract: A system for ranking search results obtained from an information retrieval system includes a search pre-processor (30), a search engine (20) and a search post-processor (40). The search pre-processor (30) determines the context of the search query by comparing the terms in the search query with a predetermined user context profile. Preferably, the context profile is a user profile or a community profile, which includes a set of terms which have been rated by the user, community, or a recommender system. The search engine generates a search result comprising at least one item obtained from the information retrieval system. The search post-processor (40) ranks each item returned in the search result in accordance with the context of the search query.

Proceedings ArticleDOI
01 Jul 2000
TL;DR: Search effectiveness when using query-based Internet search, directory-based search and phrase-based query reformulation assisted search is compared by means of a controlled, user-based experimental study.
Abstract: This article compares search effectiveness when using query-based Internet search (via the Google search engine), directory-based search (via Yahoo) and phrase-based query reformulation assisted search (via the Hyperindex browser) by means of a controlled, user-based experimental study. The focus was to evaluate aspects of the search process. Cognitive load was measured using a secondary digit-monitoring task to quantify the effort of the user in various search states; independent relevance judgements were employed to gauge the quality of the documents accessed during the search process. Time was monitored in various search states. Results indicated that directory-based search does not offer increased relevance over query-based search (with or without query formulation assistance), and also takes longer. Query reformulation does significantly improve the relevance of the documents through which the user must trawl versus standard query-based Internet search. However, the improvement in document relevance comes at the cost of increased search time and increased cognitive load.

Journal ArticleDOI
TL;DR: An integrated visual thesaurus and results browser to support information retrieval was designed using a task model of information searching and found that while visual user interfaces for information searching might seem to be usable, they may not actually improve performance.
Abstract: An integrated visual thesaurus and results browser to support information retrieval was designed using a task model of information searching. The system provided a hierarchical thesaurus with a results cluster display representing similarity between retrieved documents and relevance ranking using a bullseye metaphor. Latent semantic indexing (LSI) was used as the retrieval engine and to calculate the similarity between documents. The design was tested with two information retrieval tasks. User behaviour, performance and attitude were recorded as well as usability problems. The system had few usability problems and users liked the visualizations, but recall performance was poor. The reasons for poor/good performance were investigated by examining user behaviour and search strategies. Better searchers used the visualizations more effectively and spent longer on the task, whereas poorer performances were attributable to poor motivation, difficulty in assessing article relevance and poor use of system visualizations. The bullseye browser display appeared to encourage limited evaluation of article relevance on titles, leading to poor performance. The bullseye display metaphor for article relevance was understood by users; however, they were confused by the concept of similarity searching expressed as visual clusters. The conclusions from the study are that while visual user interfaces for information searching might seem to be usable, they may not actually improve performance. Training and advisor facilities for effective search strategies need to be incorporated to enhance the effectiveness of visual user interfaces for information retrieval.

Proceedings ArticleDOI
01 Jul 2000
TL;DR: An information retrieval model developed to deal with hyperlinked environments that is based on belief networks and provides a framework for combining information extracted from the content of the documents with information derived from cross-references among the documents is presented.
Abstract: This work presents an information retrieval model developed to deal with hyperlinked environments. The model is based on belief networks and provides a framework for combining information extracted from the content of the documents with information derived from cross-references among the documents. The information extracted from the content of the documents is based on statistics regarding the keywords in the collection and is one of the bases for traditional information retrieval (IR) ranking algorithms. The information derived from cross-references among the documents is based on link references in a hyperlinked environment and has received increased attention lately due to the success of the Web. We discuss a set of strategies for combining these two sources of evidential information and experiment with them using a reference collection extracted from the Web. The results show that this type of combination can improve the retrieval performance without requiring any extra information from the users at query time. In our experiments, the improvements reach up to 59% in terms of average precision figures.
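The paper's actual model is a belief network; as a much simpler illustration of the idea of combining content evidence with link evidence, a linear mixture might look like this (the weights, scores, and link counts are hypothetical):

```python
def combined_score(content_score, in_links, max_in_links, alpha=0.7):
    """Blend a content-based score with normalized link evidence.

    alpha weighs content evidence; (1 - alpha) weighs link evidence.
    content_score is assumed already in [0, 1]; in_links is normalized
    by the collection maximum.
    """
    link_score = in_links / max_in_links if max_in_links else 0.0
    return alpha * content_score + (1 - alpha) * link_score

# Hypothetical documents: (content score, number of incoming links).
docs = {"d1": (0.9, 2), "d2": (0.6, 10), "d3": (0.2, 1)}
max_links = max(n for _, n in docs.values())
ranking = sorted(docs, key=lambda d: combined_score(*docs[d], max_links),
                 reverse=True)
```

Note how the heavily linked but less keyword-relevant document can overtake the best content match; a belief-network combination generalizes this kind of evidence pooling in a principled way.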

Book ChapterDOI
13 Sep 2000
TL;DR: The adjusted ratio of ratios ranking method takes into account not only accuracy but also the time performance of the candidate algorithms, and indicates that on average better results are obtained with zooming than without it.
Abstract: Given the wide variety of available classification algorithms and the volume of data today's organizations need to analyze, the selection of the right algorithm to use on a new problem is an important issue. In this paper we present a combination of techniques to address this problem. The first one, zooming, analyzes a given dataset and selects relevant (similar) datasets that were processed by the candidate algorithms in the past. This process is based on the concept of "distance", calculated on the basis of several dataset characteristics. The information about the performance of the candidate algorithms on the selected datasets is then processed by a second technique, a ranking method. Such a method uses performance information to generate advice in the form of a ranking, indicating which algorithms should be applied in which order. Here we propose the adjusted ratio of ratios ranking method. This method takes into account not only accuracy but also the time performance of the candidate algorithms. The generalization power of this ranking method is analyzed. For this purpose, an appropriate methodology is defined. The experimental results indicate that on average better results are obtained with zooming than without it.
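A hedged sketch of the adjusted ratio of ratios, following the general form of the method (the AccD constant, which sets how much accuracy the user will trade for a tenfold speedup, and the performance data are illustrative):

```python
import math

def arr(acc_p, time_p, acc_q, time_q, accd=0.1):
    """Adjusted ratio of ratios of algorithm p relative to algorithm q:
    the accuracy ratio discounted by the (log) time ratio."""
    return (acc_p / acc_q) / (1 + accd * math.log10(time_p / time_q))

def rank_algorithms(perf):
    """perf: {name: (accuracy, time)}. Rank by mean ARR against all rivals."""
    names = list(perf)
    score = {p: sum(arr(*perf[p], *perf[q]) for q in names if q != p)
                / (len(names) - 1)
             for p in names}
    return sorted(names, key=score.get, reverse=True)

# Hypothetical performance data: (accuracy, training time in seconds).
perf = {"c4.5": (0.85, 10.0), "nb": (0.80, 2.0), "knn": (0.84, 100.0)}
```

With these numbers the fast naive Bayes overtakes the slightly more accurate but far slower competitors; raising `accd` would penalize slow algorithms even more strongly.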

Patent
Reiner Kraft1, Joann Ruvolo1
30 Jun 2000
TL;DR: In this article, a question management system for an expert advice web site maintains a database of experts in different subject matter categories and ranking scores associated with each expert are continually updated based on the timeliness of answers provided by the experts and answer rating feedback received from the question poser.
Abstract: A question management system for an expert advice web site maintains a database of experts in different subject matter categories. Ranking scores associated with each expert are continually updated based on the timeliness of answers provided by the experts and answer rating feedback received from the question poser. According to another aspect of the invention, a method and a computer-readable medium are disclosed for carrying out the above method.
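The patent does not disclose a concrete formula; one plausible sketch of continually updating an expert's ranking score from answer timeliness and poser rating feedback (the weights, the deadline, and the blend are all hypothetical, not taken from the patent):

```python
def update_score(score, rating, hours_to_answer, weight=0.2, deadline=48.0):
    """Move an expert's ranking score toward a blend of answer quality
    and timeliness.

    rating: poser feedback in [0, 1]; answers later than `deadline`
    hours earn no timeliness credit. `weight` is an exponential-moving-
    average factor. All constants here are illustrative.
    """
    timeliness = max(0.0, 1.0 - hours_to_answer / deadline)
    observed = 0.7 * rating + 0.3 * timeliness
    return (1 - weight) * score + weight * observed

score = 0.5
score = update_score(score, rating=1.0, hours_to_answer=6.0)  # prompt, well-rated
```

A moving-average update like this lets the ranking react to recent behavior while damping the effect of any single answer.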

Patent
Sebastien Roy1
30 Aug 2000
TL;DR: In this paper, the authors present a method for computing the location and orientation of an object in 3D space, which comprises the steps of marking a plurality of feature points on a 3D model and corresponding feature points in a 2D query image; for all possible subsets of three two-dimensional feature points marked in step (a), computing the four possible three-dimensional rigid motion solutions of a set of three points in three-dimensional space such that after each of the four rigid motions, under a fixed perspective projection, the three three-dimensional points are mapped precisely to the three corresponding two-dimensional points.
Abstract: A method for computing the location and orientation of an object in three-dimensional space. The method comprises the steps of: (a) marking a plurality of feature points on a three-dimensional model and corresponding feature points on a two-dimensional query image; (b) for all possible subsets of three two-dimensional feature points marked in step (a), computing the four possible three-dimensional rigid motion solutions of a set of three points in three-dimensional space such that after each of the four rigid motions, under a fixed perspective projection, the three three-dimensional points are mapped precisely to the three corresponding two-dimensional points; (c) for each solution found in step (b), computing an error measure derived from the errors in the projections of all three-dimensional marked points in the three-dimensional model which were not among the three points used in the solution, but which did have corresponding marked points in the two-dimensional query image; (d) ranking the solutions from step (c) based on the computed error measure; and (e) selecting the best solution based on the ranking in step (d). Also provided is a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps of the present invention and a computer program product embodied in a computer-readable medium for carrying out the methods of the present invention.
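Steps (c)-(e) amount to ranking candidate rigid-motion solutions by the reprojection error of the held-out marked points; a sketch under a fixed pinhole perspective projection (all point data hypothetical):

```python
def project(point3d, focal=1.0):
    """Fixed perspective (pinhole) projection onto the image plane."""
    x, y, z = point3d
    return (focal * x / z, focal * y / z)

def reprojection_error(points3d, observed2d):
    """Sum of squared distances between the projections of the held-out
    3-D points under one candidate rigid motion and their marked 2-D points."""
    err = 0.0
    for p3, (u, v) in zip(points3d, observed2d):
        pu, pv = project(p3)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err

def best_solution(candidates, observed2d):
    """Steps (d)-(e): rank the candidate solutions by error, pick the best."""
    return min(candidates, key=lambda pts: reprojection_error(pts, observed2d))

# Hypothetical held-out 2-D marks and the held-out 3-D model points as
# moved by two candidate rigid motions.
observed = [(0.5, 0.5), (-0.25, 0.1)]
cand_a = [(1.0, 1.0, 2.0), (-0.5, 0.2, 2.0)]  # projects exactly onto observed
cand_b = [(1.0, 0.0, 2.0), (0.5, 0.5, 2.0)]
```

The key design point is that the three points used to generate a solution cannot discriminate among the four solutions; only the remaining correspondences can.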

Patent
Kobayashi Mei1, Kohichi Takeda1
11 Feb 2000
TL;DR: In this article, a method and a system are presented for sorting a specific collection of documents in various orderings and for defining new ranking metrics by composing multiple rankings, to provide a user with highly relevant search results.
Abstract: A method and a system are provided for sorting a specific collection of documents in various orderings and for defining new ranking metrics by composing multiple rankings, to provide a user with highly relevant search results. Collections of documents are sorted with multiple ranking metrics; a new collection of documents in the higher-ranking positions of each sorted collection is determined; and an arithmetical operation between these new collections is performed. A search result is determined by the documents in higher-ranking positions resulting from the arithmetical operation. Final search results are acquired by performing an arithmetical operation among specific (with fixed search results) collections of documents sorted in various orderings. The most suitable arrangement of search results can be specified by interactively combining such ranking metrics.
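The composition of rankings can be illustrated as set arithmetic over the higher-ranking positions of each sorted collection (the metrics, scores, and cutoff k are hypothetical):

```python
def top_k(docs, metric, k):
    """The documents in the k highest-ranking positions under one metric."""
    return set(sorted(docs, key=metric, reverse=True)[:k])

# Hypothetical collection: doc -> (keyword-match score, recency score).
docs = {"a": (0.9, 0.1), "b": (0.8, 0.9), "c": (0.3, 0.8), "d": (0.7, 0.2)}

by_keyword = top_k(docs, lambda d: docs[d][0], k=2)
by_recency = top_k(docs, lambda d: docs[d][1], k=2)

# One possible arithmetical operation: intersection keeps documents that
# occupy higher-ranking positions under both metrics.
result = by_keyword & by_recency
```

Other set operations (union, difference) give the other combinations the patent alludes to, and swapping metrics or k interactively re-shapes the final arrangement of results.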

Patent
11 Dec 2000
TL;DR: In this paper, a method and system for publishing a plurality of books for user access to information is presented, in which a user can remotely access the database, search desired content, and view an image of a portion of the book with the desired data.
Abstract: A method and system for publishing a plurality of books for user access to information includes selecting a plurality of books, converting each book from a publisher's digital form, e.g., by training a tool to detect characteristic features (such as layout, typeface, and hierarchical or organizational features such as chapter headings, captions, drawings and tables), and extracting text or data information of the book tagged with the features. This produces a searchable library database arranged, for example, as an xml database indexed by book structure such that a user may remotely, over the internet or other network, access the database, search desired content, and view an image of a portion of the book with the desired data. The system includes a user registration module to identify an authorized user, and may maintain a personal bookshelf for the user. A search engine may score search results based on their position in the hierarchy or other factors, determining degree of relevance of text or data information located by the search engine. The other factors may include position of located search data in the hierarchy, identification of search data in the user's personal library or in a prior search by the user, or degree of match of data identified in the search. An interface with a commercially available search engine may operate to adapt the search. When provided a search query by a user, it may search for an exact match and score hits for relevance, and in the event an exact match is not found, operate to expand the query and return hits in order of rank together with an indication of the expanded search. The user may thus ascertain a degree of likely relevance of returned text or data information. The relational database may include hyperlinks to section headings and related data passages, such that a user accessing a page of a book may immediately view related data and context of a page. 
The relational database is indexed by logical subunits of the book such that expanded searches for Boolean combinations or proximity of elements span page breaks of book text to identify all instances of the desired search data. The search engine may expand a search if all hits have low ranking, and may suppress hits of low ranking when the search produces hits of high ranking. In further embodiments, the search engine may search tables, drawings and formulae of the converted book file.
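The expand-when-all-hits-rank-low and suppress-low-when-high-exists behavior described above might be sketched as follows (the thresholds and the toy index are hypothetical, not from the patent):

```python
def search(query, engine, expand, low=0.3, high=0.7):
    """Run a query; expand it if every hit ranks low, and suppress
    low-ranking hits whenever high-ranking hits are present.

    engine(query) -> list of (doc, score); expand(query) -> broader query.
    Returns the ranked hits and a flag indicating whether the search
    was expanded.
    """
    hits = list(engine(query))
    expanded = False
    if hits and all(score < low for _, score in hits):
        hits = list(engine(expand(query)))
        expanded = True
    if any(score >= high for _, score in hits):
        hits = [(d, s) for d, s in hits if s >= low]
    return sorted(hits, key=lambda h: h[1], reverse=True), expanded

# Toy two-entry index; the expanded query simply drops the leading term.
index = {
    "quantum widget": [("p2", 0.2), ("p4", 0.1)],
    "widget": [("p3", 0.5), ("p5", 0.9), ("p6", 0.1)],
}
hits, expanded = search("quantum widget",
                        lambda q: index.get(q, []),
                        lambda q: q.split()[-1])
```

The returned `expanded` flag corresponds to the patent's indication to the user that an expanded search was performed, so the user can judge the likely relevance of the returned hits.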