scispace - formally typeset

Topic

Query expansion

About: Query expansion is a(n) research topic. Over the lifetime, 17538 publication(s) have been published within this topic receiving 452719 citation(s).
Papers
More filters

Proceedings ArticleDOI
Brian Babcock1, Shivnath Babu1, Mayur Datar1, Rajeev Motwani1  +1 moreInstitutions (1)
03 Jun 2002
TL;DR: The need for and research issues arising from a new model of data processing, where data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams are motivated.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.

2,872 citations


Journal ArticleDOI
Jay Ponte1, W. Bruce Croft1Institutions (1)
01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.
Abstract: In today's world, there is no shortage of information. However, for a specific information need, only a small subset of all of the available information will be useful. The field of information retrieval (IR) is the study of methods to provide users with that small subset of information relevant to their needs and to do so in a timely fashion. Information sources can take many forms, but this thesis will focus on text based information systems and investigate problems germane to the retrieval of written natural language documents. Central to these problems is the notion of "topic." In other words, what are documents about? However, topics depend on the semantics of documents and retrieval systems are not endowed with knowledge of the semantics of natural language. The approach taken in this thesis will be to make use of probabilistic language models to investigate text based information retrieval and related problems. One such problem is the prediction of topic shifts in text, the topic segmentation problem. It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection. Two complementary sets of features are studied individually and then combined into a single language model. The language modeling approach allows this problem to be approached in a principled way without complex semantic modeling. Next, the problem of document retrieval in response to a user query will be investigated. Models of document indexing and document retrieval have been extensively studied over the past three decades. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. Much of the reason for this is that the indexing component requires inferences as to the semantics of documents. Instead, an approach to retrieval based on probabilistic language modeling will be presented. Models are estimated for each document individually. The approach to modeling is non-parametric and integrates the entire retrieval process into a single model. One advantage of this approach is that collection statistics, which are used heuristically for the assignment of concept probabilities in other probabilistic models, are used directly in the estimation of language model probabilities in this approach. The language modeling approach has been implemented and tested empirically and performs very well on standard test collections and query sets. In order to improve retrieval effectiveness, IR systems use additional techniques such as relevance feedback, unsupervised query expansion and structured queries. These and other techniques are discussed in terms of the language modeling approach and empirical results are given for several of the techniques developed. These results provide further proof of concept for the use of language models for retrieval tasks.

2,671 citations


Proceedings Article
01 Jan 1994
TL;DR: Much of the work involved investigating plausible methods of applying Okapi-style weighting to phrases, and expansion using terms from the top documents retrieved by a pilot search on topic terms was used.
Abstract: City submitted two runs each for the automatic ad hoc, very large collection track, automatic routing and Chinese track; and took part in the interactive and filtering tracks. The method used was : expansion using terms from the top documents retrieved by a pilot search on topic terms. Additional runs seem to show that we would have done better without expansion. Twor runs using the method of city96al were also submitted for the Very Large Collection track. The training database and its relevant documents were partitioned into three parts. Working on a pool of terms extracted from the relevant documents for one partition, an iterative procedure added or removed terms and/or varied their weights. After each change in query content or term weights a score was calculated by using the current query to search a second protion of the training database and evaluating the results against the corresponding set of relevant documents. Methods were compared by evaluating queries predictively against the third training partition. Queries from different methods were then merged and the results evaluated in the same way. Two runs were submitted, one based on character searching and the other on words or phrases. Much of the work involved investigating plausible methods of applying Okapi-style weighting to phrases

2,451 citations


Proceedings ArticleDOI
Carlton Wayne Niblack1, R. Barber1, Will Equitz1, Myron D. Flickner1  +5 moreInstitutions (1)
TL;DR: The main algorithms for color texture, shape and sketch query that are presented, show example query results, and discuss future directions are presented.
Abstract: In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.© (1993) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

2,113 citations


Proceedings ArticleDOI
John R. Smith1, Shih-Fu Chang1Institutions (1)
01 Feb 1997
TL;DR: The VisualSEEk system is novel in that the user forms the queries by diagramming spatial arrangements of color regions by utilizing color information, region sizes and absolute and relative spatial locations.
Abstract: We describe a highly functional prototype system for searching by visual features in an image database. The VisualSEEk system is novel in that the user forms the queries by diagramming spatial arrangements of color regions. The system nds the images that contain the most similar arrangements of similar regions. Prior to the queries, the system automatically extracts and indexes salient color regions from the images. By utilizing e cient indexing techniques for color information, region sizes and absolute and relative spatial locations, a wide variety of complex joint color/spatial queries may be computed.

2,053 citations


Network Information
Related Topics (5)
Web page

50.3K papers, 975.1K citations

89% related
Ontology (information science)

57K papers, 869.1K citations

88% related
Graph (abstract data type)

69.9K papers, 1.2M citations

86% related
Web service

57.6K papers, 989K citations

86% related
Supervised learning

20.8K papers, 710.5K citations

85% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20222
2021108
2020122
2019141
2018197
2017497