Showing papers on "Ranking (information retrieval)" published in 2001

Proceedings Article•DOI•

Cynthia Dwork, Ravi Kumar¹, Moni Naor², Dandapani Sivakumar¹•Institutions (2)

01 Apr 2001

TL;DR: A set of techniques for the rank aggregation problem is developed and their performance is compared to that of well-known methods, with the goal of designing rank aggregation techniques that can combat spam in Web searches.

Abstract: We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective.

1,982 citations
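
As an illustration of what rank aggregation means in practice, the following minimal sketch merges several ranked lists with a Borda count, one classical positional method. The abstract does not say which well-known methods were compared, so Borda is chosen here purely as an example; the function name `borda_aggregate` and the toy engine lists are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate ranked lists (best item first) with a Borda count.

    An item at 0-based position i in a list of length n earns n - i points;
    an item missing from a list earns nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties broken by item id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from three search engines.
engines = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["b", "c", "a", "d"],
]
aggregated = borda_aggregate(engines)
```

Positional schemes like this are simple to compute, but a single list can move an item a long way by ranking it first, which is one reason spam resistance is a design concern for aggregation methods.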

Proceedings Article•DOI•

Support vector machine active learning for image retrieval

Simon Tong¹, Edward Y. Chang²•Institutions (2)

Stanford University¹, University of California, Santa Barbara²

01 Oct 2001

TL;DR: This work proposes the use of a support vector machine active learning algorithm for conducting effective relevance feedback for image retrieval and achieves significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback.

Abstract: Relevance feedback is often a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively determines a user's desired output or query concept by asking the user whether certain proposed images are relevant or not. For a relevance feedback algorithm to be effective, it must grasp a user's query concept accurately and quickly, while also only asking the user to label a small number of images. We propose the use of a support vector machine active learning algorithm for conducting effective relevance feedback for image retrieval. The algorithm selects the most informative images to query a user and quickly learns a boundary that separates the images that satisfy the user's query concept from the rest of the dataset. Experimental results show that our algorithm achieves significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback.

1,512 citations

Proceedings Article•DOI•

Model-based feedback in the language modeling approach to information retrieval

ChengXiang Zhai¹, John Lafferty¹•Institutions (1)

Carnegie Mellon University¹

05 Oct 2001

TL;DR: This paper proposes and evaluates two different approaches to updating a query language model based on feedback documents, one based on a generative probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents.

Abstract: The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding additional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra evidence carried by the feedback documents. Such a model-based feedback strategy easily fits into an extension of the language modeling approach. We propose and evaluate two different approaches to updating a query language model based on feedback documents, one based on a generative probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents. Experiment results show that both approaches are effective and outperform the Rocchio feedback approach.

852 citations

Journal Article•DOI•

Document Language Models, Query Models, and Risk Minimization for Information Retrieval

John Lafferty¹, ChengXiang Zhai¹•Institutions (1)

Carnegie Mellon University¹

01 Sep 2001

TL;DR: A framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory is presented and an operational retrieval model that extends recent developments in the language modeling approach to information retrieval is suggested.

Abstract: We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method using Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared to the basic language modeling approach and vector space models together with query expansion using Rocchio. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.

823 citations

Proceedings Article•

Pranking with Ranking

Koby Crammer¹, Yoram Singer¹•Institutions (1)

Hebrew University of Jerusalem¹

03 Jan 2001

TL;DR: A simple and efficient online algorithm is described, its performance in the mistake bound model is analyzed, its correctness is proved, and it outperforms online algorithms for regression and classification applied to ranking.

Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.

657 citations
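
To make the rank-prediction setting concrete, here is a minimal sketch of a perceptron-style online learner that predicts an integer rank from 1 to k using a weight vector and k - 1 ordered thresholds. The abstract does not spell out the update rule, so the threshold-based update below is an assumption in the spirit of the description, and the names `prank_predict`, `prank_update`, and `prank_train` are hypothetical.

```python
def prank_predict(w, b, x):
    """Return the smallest rank r whose threshold b[r-1] exceeds the score w.x."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    for r, threshold in enumerate(b, start=1):
        if score - threshold < 0:
            return r
    return len(b) + 1  # the implicit last threshold is +infinity


def prank_update(w, b, x, y):
    """Process one labelled instance in place; return True if it was a mistake."""
    if prank_predict(w, b, x) == y:
        return False
    score = sum(wi * xi for wi, xi in zip(w, x))
    taus = []
    for r, threshold in enumerate(b, start=1):
        y_r = -1 if y <= r else 1  # side of threshold r the score should fall on
        taus.append(y_r if (score - threshold) * y_r <= 0 else 0)
    step = sum(taus)
    for i in range(len(w)):
        w[i] += step * x[i]  # move the score toward the correct interval
    for i, tau in enumerate(taus):
        b[i] -= tau  # shift only the violated thresholds
    return True


def prank_train(data, dim, k, max_epochs=1000):
    """Cycle over the data until one full pass produces no mistakes."""
    w, b = [0.0] * dim, [0.0] * (k - 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if prank_update(w, b, x, y):
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


# Toy one-dimensional data: larger feature value means higher rank.
toy = [([0.0], 1), ([1.0], 2), ([2.0], 3)]
w, b = prank_train(toy, dim=1, k=3)
predictions = [prank_predict(w, b, x) for x, _ in toy]
```

Unlike treating ranks as regression targets or as unrelated classes, this formulation learns a single scoring direction plus interval boundaries, so the ordering of ranks is built into the model.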

Journal Article•DOI•

An information-theoretic approach to automatic query expansion

Claudio Carpineto¹, Renato De Mori², Giovanni Romano¹, Brigitte Bigi²•Institutions (2)

Fondazione Ugo Bordoni¹, University of Avignon²

01 Jan 2001-ACM Transactions on Information Systems

TL;DR: This work presents a computationally simple and theoretically justified method for assigning scores to candidate expansion terms within Rocchio's framework for query reweighting, and discusses the effect on retrieval effectiveness of the main parameters involved in automatic query expansion.

Abstract: Techniques for automatic query expansion from top retrieved documents have shown promise for improving retrieval effectiveness on large collections; however, they often rely on an empirical ground, and there is a shortage of cross-system comparisons. Using ideas from Information Theory, we present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms. Such scores are used to select and weight expansion terms within Rocchio's framework for query reweighting. We compare ranking with information-theoretic query expansion versus ranking with other query expansion techniques, showing that the former achieves better retrieval effectiveness on several performance measures. We also discuss the effect on retrieval effectiveness of the main parameters involved in automatic query expansion, such as data sparseness, query difficulty, number of selected documents, and number of selected terms, pointing out interesting relationships.

404 citations

Proceedings Article•DOI•

XIRQL: a query language for information retrieval in XML documents

Norbert Fuhr, Kai Großjohann

01 Sep 2001

TL;DR: XIRQL is a query language based on the document-centric view of XML that integrates IR-related features using ideas from logic-based probabilistic IR models, in combination with concepts from the database area.

Abstract: Based on the document-centric view of XML, we present the query language XIRQL. Current proposals for XML query languages lack most IR-related features, which are weighting and ranking, relevance-oriented search, datatypes with vague predicates, and semantic relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented, that also serves as a starting point for query optimization.

332 citations

Patent•

Ranking search results by reranking the results based on local inter-connectivity

Krishna Bharat¹•Institutions (1)

Google¹

30 Jan 2001

TL;DR: A re-ranking component in the search engine refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.

Abstract: A search engine for searching a corpus improves the relevancy of the results by refining a standard relevancy score based on the interconnectivity of the initially returned set of documents. The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.

330 citations
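
The re-ranking idea in this abstract can be sketched as follows: count, for each document in the initial result set, how many other documents in that same set cite it, then boost its score accordingly. The abstract does not give the actual scoring formula, so the multiplicative boost, the `weight` parameter, and the function name `rerank_by_local_links` are assumptions made only for illustration.

```python
def rerank_by_local_links(initial, links, weight=1.0):
    """Re-rank (doc_id, score) pairs using citations from within the initial set.

    `links` maps a doc id to the doc ids it cites. Citations to or from
    documents outside the initial set are ignored, as are self-citations.
    """
    in_set = {doc for doc, _ in initial}
    local_citations = {doc: 0 for doc in in_set}
    for source in in_set:
        for target in set(links.get(source, ())):
            if target in in_set and target != source:
                local_citations[target] += 1
    # Hypothetical combination: scale the base score by (1 + weight * citations).
    rescored = [
        (doc, score * (1.0 + weight * local_citations[doc]))
        for doc, score in initial
    ]
    return sorted(rescored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical initial results and link structure.
initial_results = [("p1", 1.0), ("p2", 0.9), ("p3", 0.8)]
link_graph = {"p1": ["p3"], "p2": ["p3", "x"], "p3": []}
reranked = rerank_by_local_links(initial_results, link_graph)
```

Here `p3` starts last but is cited by both other results, so it moves to the top; the link to `x` is ignored because `x` is not in the initial set.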

Patent•

Method and system of ranking and clustering for document indexing and retrieval

Maureen Caudill¹, Jason Chun-Ming Tseng¹, Lei Wang¹•Institutions (1)

Science Applications International Corporation¹

18 Jan 2001

TL;DR: The relevance of a document to a user's query is determined by calculating a similarity coefficient based on the structures of each pair of query predicates and document predicates.

Abstract: A relevancy ranking and clustering method and system that determines the relevance of a document relative to a user's query using a similarity comparison process. Input queries are parsed into one or more query predicate structures using an ontological parser. The ontological parser parses a set of known documents to generate one or more document predicate structures. A comparison of each query predicate structure with each document predicate structure is performed to determine a matching degree, represented by a real number. A multilevel modifier strategy is implemented to assign different relevance values to the different parts of each predicate structure match to calculate the predicate structure's matching degree. The relevance of a document to a user's query is determined by calculating a similarity coefficient, based on the structures of each pair of query predicates and document predicates. Documents are autonomously clustered using a self-organizing neural network that provides a coordinate system that makes judgments in a non-subjective fashion.

321 citations

Proceedings Article•DOI•

Effective site finding using link anchor information

Nick Craswell¹, David Hawking¹, Stephen Robertson²•Institutions (2)

Commonwealth Scientific and Industrial Research Organisation¹, Microsoft²

01 Sep 2001

TL;DR: In a different type of experiment, ranking based on link anchor text is twice as effective as ranking based on document content, even though both methods used the same BM25 formula.

Abstract: Link-based ranking methods have been described in the literature and applied in commercial Web search engines. However, according to recent TREC experiments, they are no better than traditional content-based methods. We conduct a different type of experiment, in which the task is to find the main entry point of a specific Web site. In our experiments, ranking based on link anchor text is twice as effective as ranking based on document content, even though both methods used the same BM25 formula. We obtained these results using two sets of 100 queries on a 18.5 million document set and another set of 100 on a 0.4 million document set. This site finding effectiveness begins to explain why many search engines have adopted link methods. It also opens a rich new area for effectiveness improvement, where traditional methods fail.

320 citations
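
Since the abstract stresses that the same BM25 formula was applied to two different document representations, a small sketch may help show what that means. The function below is a common textbook form of BM25 and not necessarily the exact variant used in the experiments; the non-negative IDF variant, the defaults `k1=1.2` and `b=0.75`, and the toy pages are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Two hypothetical pages, each represented once by its own content and once
# by the anchor text of links pointing at it.
content_docs = [["welcome", "to", "our", "site"], ["acme", "widgets", "review", "acme"]]
anchor_docs = [["acme", "home", "acme", "corp"], ["widget", "review"]]

content_scores = bm25_scores(["acme"], content_docs)
anchor_scores = bm25_scores(["acme"], anchor_docs)
```

In this toy case the home page never mentions its own name in its content, so only the anchor-text representation ranks it first for the query; the scoring function itself is unchanged between the two runs.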

Patent•

Domain specific knowledge-based metasearch system and methods of using

Robert Kincaid¹, Simon Handley¹, Aditya Vailaya¹, Parvathi Chundi¹•Institutions (1)

Agilent Technologies¹

19 Dec 2001

TL;DR: A system and method for performing domain-specific knowledge-based metasearches is presented, in which a metasearch engine accesses and searches text-based documents using generic search engines while also accessing publication-based databases, sequence databases, in-house proprietary databases, and any database that can be interfaced with a web interface so as to produce search results in text format.

Abstract: A system and method for performing domain-specific knowledge based metasearches. A metasearch engine is provided for accessing and searching text-based documents using generic search engines while simultaneously being able to access publication based databases and sequence databases as well as in-house proprietary databases and any database capable of being interfaced with a web interface so as to produce search results in text format. A data mining module is also provided for organizing raw data obtained by unsupervised clustering, simple relevance ranking, and categorization, all of which are done independently of one another. The system is capable of storing previous search data for use in query refinement or subsequent searches based upon the stored data. A search results collection browser may be provided for analyzing current browsing patterns of the user for developing weighting factors to be used in ordering the results of future searches.

Proceedings Article•DOI•

Ranking retrieval systems without relevance judgments

Ian Soboroff¹, Charles Nicholas¹, Patrick Cahan¹•Institutions (1)

University of Maryland, Baltimore County¹

01 Sep 2001

TL;DR: A new evaluation methodology is proposed, and its initial results described, which replaces human relevance judgments with a randomly selected mapping of documents to topics, referred to as pseudo-relevance judgments.

Abstract: The most prevalent experimental methodology for comparing the effectiveness of information retrieval systems requires a test collection, composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. It is well known that relevance judgments are not infallible, but recent retrospective investigation into results from the Text REtrieval Conference (TREC) has shown that differences in human judgments of relevance do not affect the relative measured performance of retrieval systems. Based on this result, we propose and describe the initial results of a new evaluation methodology which replaces human relevance judgments with a randomly selected mapping of documents to topics which we refer to as pseudo-relevance judgments. Rankings of systems with our methodology correlate positively with official TREC rankings, although the performance of the top systems is not predicted well. The correlations are stable over a variety of pool depths and sampling techniques. With improvements, such a methodology could be useful in evaluating systems such as World-Wide Web search engines, where the set of documents changes too often to make traditional collection construction techniques practical.

Book Chapter•DOI•

Synthesis of Linear Ranking Functions

Michael Colón¹, Henny B. Sipma¹•Institutions (1)

Stanford University¹

02 Apr 2001

TL;DR: An algorithm is presented to synthesize linear ranking functions that can establish termination of program cycles and the representation of systems of linear inequalities and sets of linear expressions as polyhedral cones allows this search to be reduced to the computation of polars, intersections and projections of polyhedral cones.

Abstract: Deductive verification of progress properties relies on finding ranking functions to prove termination of program cycles. We present an algorithm to synthesize linear ranking functions that can establish such termination. Fundamental to our approach is the representation of systems of linear inequalities and sets of linear expressions as polyhedral cones. This representation allows us to reduce the search for linear ranking functions to the computation of polars, intersections and projections of polyhedral cones, problems which have well-known solutions.

Proceedings Article•DOI•

Modeling score distributions for combining the outputs of search engines

R. Manmatha, Toni M. Rath, Fangfang Feng

01 Sep 2001

TL;DR: It is shown empirically that the score distributions of a number of text search engines on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents.

Abstract: In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for not only probabilistic search engines like INQUERY but also vector space search engines like SMART for English. We have also used this model to fit the output of other search engines like LSI search engines and search engines indexing other languages like Chinese. It is then shown that given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arises given certain assumptions about word distributions in documents. We hypothesize that all 'good' text search engines operating on any language have similar characteristics. This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques.
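
The score-to-probability mapping described in this abstract can be sketched with Bayes' rule: given a normal density for relevant documents, an exponential density for non-relevant ones, and a mixing weight, the posterior probability of relevance follows directly. The sketch assumes the mixture parameters have already been fitted (the fitting step is not shown), and all parameter values and function names below are hypothetical.

```python
import math


def relevance_probability(score, prior, mu, sigma, lam):
    """Posterior P(relevant | score) under a normal/exponential mixture.

    prior     : mixing weight of the relevant (normal) component
    mu, sigma : mean and standard deviation of the relevant component
    lam       : rate of the exponential non-relevant component
    """
    normal = math.exp(-0.5 * ((score - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    expo = lam * math.exp(-lam * score) if score >= 0 else 0.0
    numerator = prior * normal
    denominator = numerator + (1.0 - prior) * expo
    return numerator / denominator if denominator > 0 else 0.0


def combine_by_averaging(probabilities):
    """Combine per-engine relevance probabilities for one document by averaging."""
    return sum(probabilities) / len(probabilities)


# Hypothetical fitted parameters for two engines scoring the same document.
p_engine_a = relevance_probability(0.70, prior=0.1, mu=0.7, sigma=0.1, lam=5.0)
p_engine_b = relevance_probability(0.10, prior=0.1, mu=0.7, sigma=0.1, lam=5.0)
combined = combine_by_averaging([p_engine_a, p_engine_b])
```

Mapping raw scores to probabilities first puts engines with different score scales on a common footing, which is what makes a simple average meaningful.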

Proceedings Article•DOI•

Vector-space ranking with effective early termination

Vo Anh, Owen de Kretser, Alistair Moffat

01 Sep 2001

TL;DR: A new inverted file structure using quantized weights is described that provides superior retrieval effectiveness compared to conventional inverted file structures when early termination heuristics are employed, giving a better cost/performance compromise than previous inverted file organisations.

Abstract: Considerable research effort has been invested in improving the effectiveness of information retrieval systems. Techniques such as relevance feedback, thesaural expansion, and pivoting all provide better quality responses to queries when tested in standard evaluation frameworks. But such enhancements can add to the cost of evaluating queries. In this paper we consider the pragmatic issue of how to improve the cost-effectiveness of searching. We describe a new inverted file structure using quantized weights that provides superior retrieval effectiveness compared to conventional inverted file structures when early termination heuristics are employed. That is, we are able to reach similar effectiveness levels with less computational cost, and so provide a better cost/performance compromise than previous inverted file organisations.

Patent•

Method and apparatus for obtaining consumer product preferences through product selection and evaluation

Jonas Ulenas, Valdas C. Duoba

23 Aug 2001

TL;DR: In this article, a method and system for obtaining consumer preferences over a communication network from consumers is presented, where the system searches the product database for products or services based on consumer's search criteria.

Abstract: A method and system for obtaining consumer preferences over a communication network from consumers. The system searches the product database for products or services based on consumer's search criteria. The system displays the products or services and/or advertisements related to the consumer's search criteria in accordance with the ranking parameter(s) specified by the user. The consumer's preferences, i.e., the search criteria and the ranking parameter(s), are stored in the database for future reference, e.g., to determine consumer trends.

Book Chapter•DOI•

Geographical Information Retrieval with Ontologies of Place

Christopher B. Jones¹, Harith Alani², Douglas Tudhope³•Institutions (3)

Cardiff University¹, University of Southampton², University of South Wales³

19 Sep 2001

TL;DR: An ontology of place is presented that combines limited coordinate data with qualitative spatial relationships between places and has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology.

Abstract: Geographical context is required of many information retrieval tasks in which the target of the search may be documents, images or records which are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is a need therefore for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness as well as semantic closeness with respect to the topic of interest. Here we present an ontology of place that combines limited coordinate data with qualitative spatial relationships between places. This parsimonious model of place is intended to support information retrieval tasks that may be global in scope. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. An hierarchical distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This can be combined with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for a relevance ranking of retrieved objects.

Proceedings Article•DOI•

Applying summarization techniques for term selection in relevance feedback

Adenike M. Lam-Adesina¹, Gareth J. F. Jones¹•Institutions (1)

University of Exeter¹

01 Sep 2001

TL;DR: Experimental results show that query-expansion using document summaries can be considerably more effective than using full-document expansion and a novel approach to term-selection that separates the choice of relevant documents from the selection of a pool of potential expansion terms is presented.

Abstract: Query-expansion is an effective Relevance Feedback technique for improving performance in Information Retrieval. In general query-expansion methods select terms from the complete contents of relevant documents. One problem with this approach is that expansion terms unrelated to document relevance can be introduced into the modified query due to their presence in the relevant documents and distribution in the document collection. Motivated by the hypothesis that query-expansion terms should only be sought from the most relevant areas of a document, this investigation explores the use of document summaries in query-expansion. The investigation explores the use of both context-independent standard summaries and query-biased summaries. Experimental results using the Okapi BM25 probabilistic retrieval model with the TREC-8 ad hoc retrieval task show that query-expansion using document summaries can be considerably more effective than using full-document expansion. The paper also presents a novel approach to term-selection that separates the choice of relevant documents from the selection of a pool of potential expansion terms. Again, this technique is shown to be more effective than standard methods.

Patent•

Ontological concept-based, user-centric text summarization

Chung Hwang, Bradford W. Miller, Marek Rusinkiewicz

29 Jun 2001

TL;DR: In this article, a method and system for constructing a text summarization is presented, where a user profile indicative of a user's interests is defined in terms of the ontology concepts and a document's relevance to the user is determined based upon the user profile.

Abstract: A method and system for constructing a text summarization. At least one domain ontology that includes a set of concepts is selected. A user profile indicative of a user's interests is defined in terms of the ontology concepts. A document's relevance to the user is determined based upon the user profile. If the document is relevant, at least a portion of the ontology is used to extract concepts from the document. The degree of match between the extracted concepts and the user profile concepts is determined and the document text summary is generated if the degree of match exceeds a predetermined threshold. Generating the summary may include selecting sentences based on the concepts in the user profile, ranking the selected sentences by relevance to the user profile, selecting sentences for inclusion in the document text summary based upon the ranking, and merging the selected sentences into the document text summary.

Patent•

Weighted preference data search system and method

Fadi Victor Micaelian, Richard Sawey, Emil Mario Scoffone, David Brandon Criswell

13 Apr 2001

TL;DR: A weighted preference data search engine uses weighted preference information, including a plurality of search criteria and a corresponding plurality of weights indicating the relative importance of the search criteria, to search a data source and to provide an ordered result list.

Abstract: A search engine for databases, data streams, and other data sources allows user preferences as to the relative importance of search criteria to be used to rank the output of the search engine. A weighted preference generator generates weighted preference information including at least a plurality of weights corresponding to a plurality of search criteria. A weighted preference data search engine uses the weighted preference information to search a data source and to provide an ordered result list based upon the weighted preference information. A method for weighted preference data searching includes determining weighted preference information including a plurality of search criteria and a corresponding plurality of weights signifying the relative importance of the search criteria, and querying a data source and ranking the results based upon the weighted preference information. In addition to allowing client input of the relative importance of various search criteria, the system and method also preferably include the ability to provide a subjective ordering for at least some of the search criteria.

Journal Article•DOI•

Effective ranking with arbitrary passages

Marcin Kaszkiel¹, Justin Zobel¹•Institutions (1)

RMIT University¹

15 Feb 2001-Journal of the Association for Information Science and Technology

TL;DR: A new type of passage is introduced, overlapping fragments of either fixed or variable length, and it is shown that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents.

Abstract: Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.

Proceedings Article•DOI•

Rank-preserving two-level caching for scalable search engines

Patricia Correia Saraiva¹, Edleno Silva de Moura², Nivio Ziviani¹, Wagner Meira¹, Rodrigo Fonseca, Berthier Ribeiro-Neto•Institutions (2)

Universidade Federal de Minas Gerais¹, Akwan Information Technologies²

01 Sep 2001

TL;DR: Experimental results show that the two-level cache is superior, and that it allows increasing the maximum number of queries processed per second by a factor of three, while preserving the response time.

Abstract: We present an effective caching scheme that reduces the computing and I/O requirements of a Web search engine without altering its ranking characteristics. The novelty is a two-level caching scheme that simultaneously combines cached query results and cached inverted lists on a real case search engine. A set of log queries are used to measure and compare the performance and the scalability of the search engine with no cache, with the cache for query results, with the cache for inverted lists, and with the two-level cache. Experimental results show that the two-level cache is superior, and that it allows increasing the maximum number of queries processed per second by a factor of three, while preserving the response time. These results are new, have not been reported before, and demonstrate the importance of advanced caching schemes for real case search engines.
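
The two-level arrangement described in this abstract can be sketched roughly as follows: one cache holds full query results, and a second cache holds inverted lists that are consulted only when the first misses. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the LRU eviction policy, the capacities, the in-memory dictionary standing in for the on-disk index, and the term-count ranking are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (the eviction policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class TwoLevelCache:
    """Query-result cache in front of an inverted-list cache."""

    def __init__(self, index, result_capacity=2, list_capacity=4):
        self.index = index  # term -> list of doc ids; stands in for the disk index
        self.results = LRUCache(result_capacity)
        self.lists = LRUCache(list_capacity)
        self.disk_reads = 0  # counts simulated index accesses

    def _inverted_list(self, term):
        cached = self.lists.get(term)
        if cached is not None:
            return cached
        self.disk_reads += 1
        postings = self.index.get(term, [])
        self.lists.put(term, postings)
        return postings

    def search(self, query):
        key = " ".join(sorted(query.lower().split()))
        cached = self.results.get(key)
        if cached is not None:
            return cached  # level 1 hit: no list access at all
        scores = {}
        for term in key.split():
            for doc in self._inverted_list(term):  # level 2: cached lists
                scores[doc] = scores.get(doc, 0) + 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        self.results.put(key, ranked)
        return ranked
```

Because both caches store exactly what the uncached computation would produce, the ranking returned for a query is the same with or without the caches; only the number of index accesses changes.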

Book Chapter•DOI•

Similarity of Cardinal Directions

Roop K. Goyal¹, Max J. Egenhofer²•Institutions (2)

Esri¹, University of Maine²

12 Jul 2001

TL;DR: A computational model is developed to determine the directional similarity between extended spatial objects, which forms a foundation for meaningful spatial similarity operators and confirms the cognitive plausibility of the similarity model.

Abstract: Like people who casually assess similarity between spatial scenes in their routine activities, users of pictorial databases are often interested in retrieving scenes that are similar to a given scene, and ranking them according to degrees of their match. For example, a town architect would like to query a database for the towns that have a landscape similar to the landscape of the site of a planned town. In this paper, we develop a computational model to determine the directional similarity between extended spatial objects, which forms a foundation for meaningful spatial similarity operators. The model is based on the direction-relation matrix. We derive how the similarity assessment of two direction-relation matrices corresponds to determining the least cost for transforming one direction-relation matrix into another. Using the transportation algorithm, the cost can be determined efficiently for pairs of arbitrary direction-relation matrices. The similarity values are evaluated empirically with several types of movements that create increasingly less similar direction relations. The tests confirm the cognitive plausibility of the similarity model.

Proceedings Article•DOI•

Evaluating a probabilistic model for cross-lingual information retrieval

Jinxi Xu¹, Ralph Weischedel¹, Chanh Nguyen¹•Institutions (1)

BBN Technologies¹

01 Sep 2001

TL;DR: A probabilistic cross-lingual retrieval system that uses a generative model to estimate the probability that a document in one language is relevant, given a query in another language, which achieves better retrieval results but requires more computation than the structural query translation technique.

Abstract: This work proposes and evaluates a probabilistic cross-lingual retrieval system. The system uses a generative model to estimate the probability that a document in one language is relevant, given a query in another language. An important component of the model is translation probabilities from terms in documents to terms in a query. Our approach is evaluated when 1) the only resource is a manually generated bilingual word list, 2) the only resource is a parallel corpus, and 3) both resources are combined in a mixture model. The combined resources produce about 90% of monolingual performance in retrieving Chinese documents. For Spanish the system achieves 85% of monolingual performance using only a pseudo-parallel Spanish-English corpus. Retrieval results are comparable with those of the structural query translation technique (Pirkola, 1998) when bilingual lexicons are used for query translation. When parallel texts in addition to conventional lexicons are used, it achieves better retrieval results but requires more computation than the structural query translation technique. It also produces slightly better results than using a machine translation system for CLIR, but the improvement over the MT system is not significant.

Journal Article•DOI•

Case base querying for travel planning recommendation.

Francesco Ricci, Hannes Werthner

01 Mar 2001-Information Technology & Tourism

TL;DR: The general architecture and function of an intelligent recommendation system aimed at supporting a leisure traveller in the task of selecting a tourist destination, bundling a set of products and composing a plan for the travel is described.

Abstract: This paper describes the general architecture and function of an intelligent recommendation system aimed at supporting a leisure traveller in the task of selecting a tourist destination, bundling a set of products and composing a plan for the travel. The system enables the user to identify his own destination and to personalize the travel by aggregating elementary items (additional locations to visit, services and activities). Case-Based Reasoning techniques enable the user to browse a repository of past travels and make possible the ranking of the elementary items included in a recommendation when these are selected from a catalogue. The system integrates data and information originating from external, already existent, tourist portals exploiting an XML-based mediator architecture, data mapping techniques, similarity-based retrieval and online analytical processing.

Patent•

A system and method for analyzing a query and generating results and related questions

Mohammed S. Anwar¹•Institutions (1)

AmeriCorps VISTA¹

16 Mar 2001

TL;DR: A query information retrieval content enhancing system and method using the system disclosed that takes a user query and generates not only results corresponding to the exact query, but also results that relate to the same query as discussed by the authors.

Abstract: A query information retrieval content enhancing system and method using the system disclosed that takes a user query and generates not only results corresponding to the exact query, but also generates results that relate to the exact query. The related results are generated by identifying query keywords and connectors and determining related keywords and/or connectors. The original keywords and connectors and the related keywords and connectors are then submitted to data mining routines that generate the related results. The normal results and related results are then made available to the user through an interface so that the user can review, analyze and manipulate the results.

Journal Article•DOI•

LiveBench‐1: Continuous benchmarking of protein structure prediction servers

Janusz M. Bujnicki¹, Arne Elofsson², Daniel Fischer³, Leszek Rychlewski¹•Institutions (3)

International Institute of Minnesota¹, Stockholm University², Ben-Gurion University of the Negev³

01 Feb 2001-Protein Science

TL;DR: A novel, continuous approach aimed at the large‐scale assessment of the performance of available fold‐recognition servers, which found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries.

Abstract: We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets—where standard methods such as PSI-BLAST fail—the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An “ideally combined consensus” prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.

Patent•

Systems and methods for document searching and organizing

Bernard J. Burdick, William H. Schoendorf, Ryan Thomas, Stan Heckman, Theodore Hall, Scott Bradley

04 Oct 2001

TL;DR: In this article, a document organizer processor may analyze the content of documents such as web pages and text documents, downloaded from a computer network, such as the Internet or an intranet, in response to a user's search query.

Abstract: Systems and methods for interactive document search, retrieval, categorization, and summarization are provided. A document organizer processor may analyze the content of documents, such as web pages and text documents, downloaded from a computer network, such as the Internet or an intranet, in response to a user's search query. After receiving a search query from a user, the processor may locate documents related to the query, parse words in the documents into a word set, filter out unnecessary words, group the documents into categories, provide labels for the categories, construct summaries of the documents in each category, determine if any additional words or phrases are to be recommended, present the labels and summaries to the user, and enable the user to iteratively refine the search.

Patent•

Apparatus and method for adaptively ranking search results

Jianchang Mao, Mani Abrol¹, Rajat Mukherjee¹, Michel Tourn¹, Prabhakar Raghavan¹•Institutions (1)

Hewlett-Packard¹

08 May 2001

TL;DR: In this article, a similarity score is calculated for the query utilizing a feature vector that characterizes attributes and query words associated with the document, and a rank value is assigned to the document based upon the relevance score and the similarity score.

Abstract: A method of ranking search results includes producing a relevance score for a document in view of a query. A similarity score is calculated for the query utilizing a feature vector that characterizes attributes and query words associated with the document. A rank value is assigned to the document based upon the relevance score and the similarity score.

Patent•

System and method for search and recommendation based on usage mining

Omar Alonso¹, Atul Kumar¹•Institutions (1)

Business International Corporation¹

22 Aug 2001

TL;DR: In this paper, a method, system, and computer program product for performing searching that generates improved queries, retrieves meaningful and relevant information, and presents the retrieved information to the user in a useful and comprehensive manner is described.

Abstract: A method, system, and computer program product for performing searching that generates improved queries, retrieves meaningful and relevant information, and presents the retrieved information to the user in a useful and comprehensive manner is described. The method of searching comprises the steps of: receiving from a user a search query requesting information, retrieving at least one recommendation relating to the search query, generating an expanded query based on the received query, performing a search using the expanded query to retrieve documents, and generating themes relating to the retrieved documents. The at least one recommendation relating to the search query is retrieved from a recommendation database. The recommendation database is generated by performing the steps of: performing data mining using users search query logs, user search patterns, and user profile information to generate a plurality of recommendations relating to search query strings, generating a data structure including the recommendations relating to search query strings, and generating a text index based on information in the data structure.
