
Showing papers by "Gordon V. Cormack published in 2011"


Journal ArticleDOI
TL;DR: It is shown that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the ClueWeb09 dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision as well as rank measures of nearly all submitted runs.
Abstract: The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam--pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the "spamminess" of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of "honeypot" queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering--from among the worst to among the best.
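The abstract describes the classifier only at a high level; the sketch below illustrates one plausible shape for such a content-based spamminess scorer, using hashed byte 4-gram features and online logistic regression. The hash-table size, learning rate, and feature choice are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a simple content-based spamminess scorer: hashed byte
# 4-gram features fed to an online logistic-regression model. All constants
# below are illustrative assumptions.
import math

NUM_FEATURES = 1 << 20   # size of the hashed feature space (assumption)
LEARNING_RATE = 0.002    # online gradient-descent step size (assumption)

weights = [0.0] * NUM_FEATURES

def features(page_bytes: bytes):
    """Hash every overlapping byte 4-gram into the fixed-size feature space."""
    return {hash(page_bytes[i:i + 4]) % NUM_FEATURES
            for i in range(max(len(page_bytes) - 3, 0))}

def spamminess(page_bytes: bytes) -> float:
    """Logistic score in (0, 1); higher means more spam-like."""
    s = sum(weights[f] for f in features(page_bytes))
    s = max(-30.0, min(30.0, s))          # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-s))

def train(page_bytes: bytes, is_spam: bool) -> None:
    """One online gradient-descent update toward the (0/1) spam label."""
    error = (1.0 if is_spam else 0.0) - spamminess(page_bytes)
    for f in features(page_bytes):
        weights[f] += LEARNING_RATE * error
```

Ranking every page by spamminess() and filtering at a fixed cutoff would then give a spam filter that can be applied to any submitted run, which is the setting the abstract evaluates.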

324 citations


Journal Article
TL;DR: Maura R. Grossman is counsel at Wachtell, Lipton, Rosen & Katz and co-chair of the E-Discovery Working Group advising the New York State Unified Court System.
Abstract: Maura R. Grossman is counsel at Wachtell, Lipton, Rosen & Katz. She is co-chair of the E-Discovery Working Group advising the New York State Unified Court System, and a member of the Discovery Subcommittee of the Attorney Advisory Group to the Judicial Improvements Committee of the U.S. District Court for the Southern District of New York. Ms. Grossman is a coordinator of the Legal Track of the National Institute of Standards and Technology’s Text Retrieval Conference (“TREC”), and an adjunct faculty member at Rutgers School of Law–Newark and Pace Law School. Ms. Grossman holds a J.D. from Georgetown University Law Center, and an M.A. and Ph.D. in Clinical/School Psychology from Adelphi University. The views expressed herein are solely those of the Author and should not be attributed to her firm or its clients.

95 citations



Proceedings Article
01 Jan 2011
TL;DR: Given a topic with title, narrative, and description, this work builds a language model for the topic by applying the topic title as a query, and solves the tweet notification scenario as a multiple-choice secretary problem.
Abstract: For the first year of the Microblog Track, a real-time ad hoc search task was chosen as a suitable first task. The goal of the track is to return the most recent but also relevant tweets for a user's query. Participating runs are officially scored using precision at 30; other experimental scoring measures are evaluated in parallel with the official measure. As this was the first year of the Microblog Track, our primary goal was to create a baseline method and then attempt to improve upon it. Since the only task was real-time ad hoc search, we decided it would be best served by a traditional search methodology. To that end we used the Wumpus Search Engine, developed by Stefan Buttcher while at the University of Waterloo.
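As an illustration of the language-model approach sketched in the TL;DR, the code below scores tweets against a topic title with a Dirichlet-smoothed unigram query-likelihood model and computes the official precision-at-30 measure. The smoothing parameter, tokenization, and function names are assumptions; the actual runs were produced with the Wumpus Search Engine, not this code.

```python
# Hedged sketch: Dirichlet-smoothed query-likelihood scoring of tweets and the
# track's official precision-at-30 measure. Parameters are assumptions.
import math
from collections import Counter

MU = 100.0  # Dirichlet smoothing parameter (assumption)

def score(query: str, tweet: str, collection_lm: Counter, collection_len: int) -> float:
    """Log query-likelihood of the tweet under its smoothed language model."""
    tweet_terms = Counter(tweet.lower().split())
    tweet_len = sum(tweet_terms.values())
    log_p = 0.0
    for term in query.lower().split():
        p_coll = collection_lm.get(term, 0) / max(collection_len, 1)
        p = (tweet_terms.get(term, 0) + MU * p_coll) / (tweet_len + MU)
        if p <= 0.0:
            return float("-inf")   # query term never seen in the collection
        log_p += math.log(p)
    return log_p

def precision_at_30(ranked_ids, relevant_ids) -> float:
    """Fraction of the top 30 returned tweets that are relevant."""
    top = ranked_ids[:30]
    return sum(1 for tweet_id in top if tweet_id in relevant_ids) / 30.0
```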

26 citations


Proceedings Article
01 Jan 2011
TL;DR: The TREC 2011 Legal Track consisted of a single task: the learning task, which captured elements of both the TREC 2010 learning and interactive tasks, and required participants to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics.
Abstract: The TREC 2011 Legal Track consisted of a single task: the learning task, which captured elements of both the TREC 2010 learning and interactive tasks. Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability. Participants were permitted to request up to 1,000 responsiveness determinations from a Topic Authority for each topic. Participants elected either to use only these responsiveness determinations in preparing automatic submissions, or to augment these determinations with their own manual review in preparing technology-assisted submissions. We provide an overview of the task and a summary of the results. More detailed results are available in the Appendix to the TREC 2011 Proceedings.
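The abstract does not prescribe any particular method; the sketch below shows one plausible way an automatic learning-task run could be produced, by fitting a classifier to the Topic Authority determinations and ranking the full corpus by estimated probability of responsiveness. The TF-IDF and logistic-regression pipeline is an illustrative assumption, not any participant's actual system.

```python
# Hedged sketch of an automatic learning-task run: train on up to 1,000 Topic
# Authority determinations, then rank all 685,592 documents by the estimated
# probability of responsiveness. The modeling choices are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rank_corpus(corpus_texts, labeled_texts, labels):
    """Return (doc_index, probability) pairs sorted by estimated responsiveness."""
    vectorizer = TfidfVectorizer(max_features=200_000)
    X_corpus = vectorizer.fit_transform(corpus_texts)
    X_train = vectorizer.transform(labeled_texts)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, labels)               # labels: 1 = responsive, 0 = not

    probabilities = model.predict_proba(X_corpus)[:, 1]
    return sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
```

The returned probabilities also serve as the quantitative estimates the task requires, since a submission must provide both a ranking and a probability for each document.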

22 citations


01 Jan 2011
TL;DR: This study provides a qualitative analysis of the cases of disagreement on responsiveness determinations rendered during the course of constructing a gold standard; such disagreement could indicate that responsiveness is ill-defined, or that reviewers are sometimes mistaken in their assessments.
Abstract: In responding to a request for production in civil litigation, the goal is generally to produce, as nearly as practicable, all and only the non-privileged documents that are responsive to the request. Recall – the proportion of responsive documents that are produced – and precision – the proportion of produced documents that are responsive – quantify how nearly all of and only such responsive, non-privileged documents are produced [2, pp 67-68]. The traditional approach to measuring recall and precision consists of constructing a gold standard that identifies the set of documents that are responsive to the request. If the gold standard is complete and correct, it is a simple matter to compute recall and precision by comparing the production set to the gold standard. Construction of the gold standard typically relies on human assessment, where a reviewer or team of reviewers examines each document and codes it as responsive or not [2, pp 73-75]. It is well known that any two reviewers will often disagree as to the responsiveness of particular documents; that is, one will code a document as responsive, while the other will code the same document as non-responsive [1, 3, 5, 8, 9, 10]. Does such disagreement indicate that responsiveness is ill-defined, or does it indicate that reviewers are sometimes mistaken in their assessments? If responsiveness is ill-defined, can there be such a thing as an accurate gold standard, or accurate measurements of recall and precision? Answering this question in the negative might call into question the ability to measure, and thus certify, the accuracy of a response to a production request. If, on the other hand, responsiveness is well-defined, might there be ways to measure and thereby correct for reviewer error, yielding a better gold standard and, therefore, more accurate measurements of recall and precision? This study provides a qualitative analysis of the cases of disagreement on responsiveness determinations rendered during the course of constructing the gold standard.
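The recall and precision definitions quoted above translate directly into code; the small sketch below computes both against a gold standard, under the abstract's assumption that the gold standard is complete and correct.

```python
# Worked example of the recall and precision definitions, assuming a complete
# and correct gold standard of responsive documents.
def recall_precision(produced: set, gold_responsive: set):
    """recall = responsive documents produced / all responsive documents;
    precision = responsive documents produced / all documents produced."""
    true_positives = len(produced & gold_responsive)
    recall = true_positives / len(gold_responsive) if gold_responsive else 0.0
    precision = true_positives / len(produced) if produced else 0.0
    return recall, precision

# Example: producing 120 documents, 80 of which are among the 100 responsive
# ones, gives recall = 80/100 = 0.80 and precision = 80/120 ≈ 0.67.
```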

10 citations