Relevant document distribution estimation method for resource selection
References
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
Distributed information retrieval
STARTS: Stanford proposal for Internet meta-searching
TREC and TIPSTER experiments with INQUERY
Frequently Asked Questions (13)
Q2. What future work is suggested in "Relevant document distribution estimation method for resource selection"?
It is likely that training data can be used to automatically determine testbed-specific parameter settings, improving both accuracy and generality, but this remains a topic for future research.
Q3. What is the method of acquiring resource descriptions in uncooperative environments?
In uncooperative environments perhaps the best method of acquiring resource descriptions is query-based sampling [1], in which a resource description is constructed by sampling database contents via the normal process of running queries and retrieving documents.
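A minimal sketch of the query-based sampling idea, assuming a hypothetical `run_query(term)` interface that returns ranked `(doc_id, text)` pairs from the remote engine (the interface and parameter values here are illustrative, not the paper's exact procedure):

```python
import random

def query_based_sampling(run_query, seed_terms, num_docs=300, docs_per_query=4):
    """Build a resource description by sampling an uncooperative database.

    run_query(term) is a hypothetical interface returning a ranked list of
    (doc_id, text) pairs from the remote search engine.
    """
    sampled = {}                      # doc_id -> text
    vocabulary = list(seed_terms)     # term pool for choosing later queries

    while len(sampled) < num_docs and vocabulary:
        term = random.choice(vocabulary)            # pick a query term
        for doc_id, text in run_query(term)[:docs_per_query]:
            if doc_id not in sampled:
                sampled[doc_id] = text
                # grow the term pool from retrieved text (duplicates bias
                # selection toward frequent terms, which is acceptable here)
                vocabulary.extend(text.lower().split())

    # The resource description is the term statistics of the sampled documents.
    description = {}
    for text in sampled.values():
        for t in text.lower().split():
            description[t] = description.get(t, 0) + 1
    return description
```

The returned term-frequency dictionary plays the role of the resource description that a resource-selection algorithm would then rank databases with.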
Q4. How many interactions are required to obtain one sample?
If the search engine requires that the list be scanned sequentially from the beginning, in pages containing 20 document ids each, then 25 interactions are required, on average, to obtain one sample.
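One plausible reading of this arithmetic, under the illustrative assumption (not stated in the source) that the sampled document sits at the average rank of a 1,000-document ranked list:

```python
import math

def pages_to_reach(rank, page_size=20):
    """Sequential result pages that must be fetched to reach a given rank."""
    return math.ceil(rank / page_size)

# A document at rank 500 (the average rank in a 1,000-document list) sits
# on page 25, so reaching it takes 25 sequential page fetches.
```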
Q5. How many document ids and scores were returned by each database?
100 document ids and scores were returned from each selected database, which the result-merging algorithm compiled into a final ranked list of documents.
Q6. What is the way to rank a database?
Although the capture-recapture algorithm can be better than the sample-resample algorithm when databases are small, its success depends on a stronger assumption, i.e., that the search engine supports direct access to specific segments of a ranked list.
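For contrast, the sample-resample size estimate can be sketched as follows, assuming the search engine reports a hit count for each query term (the dictionary inputs here are stand-ins for statistics the paper gathers by probing the engine):

```python
def sample_resample_size(sample_df, n_sample, db_hit_counts):
    """Estimate database size N from sampled term statistics.

    sample_df:      {term: document frequency of the term in the sample}
    n_sample:       number of sampled documents
    db_hit_counts:  {term: hit count the engine reports for that term}

    For each probe term t, df_sample(t) / n_sample estimates df_db(t) / N,
    so N is approximately df_db(t) * n_sample / df_sample(t); the per-term
    estimates are averaged.
    """
    estimates = [
        db_hit_counts[t] * n_sample / sample_df[t]
        for t in db_hit_counts
        if sample_df.get(t, 0) > 0
    ]
    return sum(estimates) / len(estimates) if estimates else None
```

Unlike capture-recapture, this needs only hit counts for probe queries, not access to specific segments of the ranked list.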
Q7. How many queries were sent to the database?
Each of the capture-recapture variants was allowed to send about 385 queries to the database; document ids obtained in the first half of the queries formed the first sample, and document ids obtained in the second half formed the second sample.
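The size estimate behind this two-sample procedure is the classic Lincoln-Petersen capture-recapture formula; a minimal sketch:

```python
def capture_recapture_size(sample1, sample2):
    """Lincoln-Petersen estimate of database size from two id samples.

    If a database of N documents is sampled twice, the overlap m between
    samples of sizes n1 and n2 satisfies m/n2 ~ n1/N, giving N ~ n1*n2/m.
    """
    n1, n2 = len(sample1), len(sample2)
    m = len(set(sample1) & set(sample2))
    if m == 0:
        return None  # no recaptures: too few samples to estimate N
    return n1 * n2 / m
```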
Q8. How many interactions are required to obtain a sample of 1,000 ids?
If the authors assume the search engine returns document ids in pages of 20, then 50 interactions are required to obtain a sample of 1,000 ids.
Q9. How many samples would be able to be captured?
If the search engine requires that ranked-list results be accessed sequentially in blocks of 20 ids, the capture-recapture algorithm would only be able to obtain about 15 samples, which is too few for an accurate estimate.
Q10. What is the skewed distribution of the databases?
A k-means clustering algorithm was used to organize the databases by topic [14], so the databases are homogeneous and the word distributions are very skewed.
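A plain k-means sketch of this kind of topical clustering (the features and distance used in [14] are not given here, so squared Euclidean distance over toy vectors is an assumption):

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: partition vectors into k clusters by nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)   # initialize from the data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # assign each vector to the nearest centroid (squared Euclidean)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(v, centroids[c])))
            clusters[i].append(v)
        for i, cl in enumerate(clusters):
            if cl:  # recompute each centroid as its cluster mean
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return clusters
```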
Q11. What is the common use of the Precision at Recall points metric?
Precision at specified document ranks is often used, particularly for interactive retrieval where someone may only look at the first several screens of results.
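Precision at a document rank is straightforward to compute; a minimal sketch:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Precision at rank k: fraction of the top-k results that are relevant."""
    top_k = ranked_doc_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k
```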
Q12. How many document ids does the capture-recapture algorithm acquire?
It acquires 20 document ids per query, so the authors examined the effects on accuracy of using just 1 or all 20 of the document ids returned by the search engine.
Q13. What is the use of a centralized sample database?
It also demonstrates another use for a centralized sample database, extending its use as a surrogate for the (unavailable) centralized complete database.