Home
/
Authors
/
Gordon V. Cormack

Author

Gordon V. Cormack

Other affiliations: University of Manitoba, McGill University

Bio: Gordon V. Cormack is an academic researcher from University of Waterloo. The author has contributed to research in topics: Relevance (information retrieval) & Relevance feedback. The author has an hindex of 41, co-authored 152 publications receiving 7925 citations. Previous affiliations of Gordon V. Cormack include University of Manitoba & McGill University.

Papers published on a yearly basis

2022
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1992
1991
1990
1989
1988
1987
1985
1984
1983
1981
1980

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Novelty and diversity in information retrieval evaluation

[...]

Charles L. A. Clarke¹, Maheedhar Kolla¹, Gordon V. Cormack¹, Olga Vechtomova¹, Azin Ashkan¹, Stefan Büttcher¹, Ian MacKinnon¹ - Show less +3 more•Institutions (1)

University of Waterloo¹

20 Jul 2008

TL;DR: This paper develops a framework for evaluation that systematically rewards novelty and diversity into a specific evaluation measure, based on cumulative gain, and demonstrates the feasibility of this approach using a test collection based on the TREC question answering track.

...read moreread less

Abstract: Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper, we present a framework for evaluation that systematically rewards novelty and diversity. We develop this framework into a specific evaluation measure, based on cumulative gain. We demonstrate the feasibility of our approach using a test collection based on the TREC question answering track.

...read moreread less

988 citations

Book•

Information Retrieval: Implementing and Evaluating Search Engines

[...]

Stefan Büttcher, Charles L. A. Clarke¹, Gordon V. Cormack•Institutions (1)

Massachusetts Institute of Technology¹

23 Jul 2010

TL;DR: Information Retrieval offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation, and is a valuable reference for professionals in computer science, computer engineering, and software engineering.

...read moreread less

Abstract: Information retrieval is the foundation for modern search engines. This text offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpusa multiuser open-source information retrieval system developed by one of the authors and available onlineprovides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. After an introduction to the basics of information retrieval, the text covers three major topic areasindexing, retrieval, and evaluationin self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating such specific applications as parallel search engines, Web search, and XML retrieval. End-of-chapter references point to further reading; exercises range from pencil and paper problems to substantial programming projects. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

...read moreread less

523 citations

Proceedings Article•DOI•

Reciprocal rank fusion outperforms condorcet and individual rank learning methods

[...]

Gordon V. Cormack¹, Charles L. A. Clarke¹, Stefan Buettcher²•Institutions (2)

University of Waterloo¹, Google²

19 Jul 2009

TL;DR: Reciprocal Rank Fusion is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that ranks the LETOR 3 dataset better than any previously reported method.

...read moreread less

Abstract: Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that ranks the LETOR 3 dataset better than any previously reported method

...read moreread less

395 citations

Journal Article•DOI•

Efficient and effective spam filtering and re-ranking for large web datasets

[...]

Gordon V. Cormack¹, Mark D. Smucker¹, Charles L. A. Clarke¹•Institutions (1)

University of Waterloo¹

01 Oct 2011-Information Retrieval

TL;DR: It is shown that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the ClueWeb09 dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision as well as rank measures of nearly all submitted runs.

...read moreread less

Abstract: The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam--pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the "spamminess" of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of "honeypot" queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering--from among the worst to among the best.

...read moreread less

324 citations

Proceedings Article•DOI•

Exploiting redundancy in question answering

[...]

Charles L. A. Clarke¹, Gordon V. Cormack¹, Thomas R. Lynam¹•Institutions (1)

University of Waterloo¹

01 Sep 2001

TL;DR: A method for arbitrary passage retrieval is applied to answer brief factual questions of the form typically asked in trivia quizzes and television game shows and it is demonstrated that answer redundancy can be used to address the second half.

...read moreread less

Abstract: Our goal is to automatically answer brief factual questions of the form ``When was the Battle of Hastings?'' or ``Who wrote The Wind in the Willows?''. Since the answer to nearly any such question can now be found somewhere on the Web, the problem reduces to finding potential answers in large volumes of data and validating their accuracy. We apply a method for arbitrary passage retrieval to the first half of the problem and demonstrate that answer redundancy can be used to address the second half. The success of our approach depends on the idea that the volume of available Web data is large enough to supply the answer to most factual questions multiple times and in multiple contexts. A query is generated from a question and this query is used to select short passages that may contain the answer from a large collection of Web data. These passages are analyzed to identify candidate answers. The frequency of these candidates within the passages is used to ``vote'' for the most likely answer. The approach is experimentally tested on questions taken from the TREC-9 question-answering test collection. As an additional demonstration, the approach is extended to answer multiple choice trivia questions of the form typically asked in trivia quizzes and television game shows.

...read moreread less

262 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

物件導向軟體之架構(Object-Oriented Software Construction)探討

[...]

簡聰富

01 Dec 1989

4,898 citations

Journal Article•DOI•

Arithmetic coding for data compression

[...]

Ian H. Witten¹, Radford M. Neal¹, John G. Cleary¹•Institutions (1)

University of Calgary¹

01 Jun 1987-Communications of The ACM

TL;DR: The state of the art in data compression is arithmetic coding, not the better-known Huffman method, which gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

...read moreread less

Abstract: The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

...read moreread less

3,188 citations

Book•

Learning to Rank for Information Retrieval

[...]

Tie-Yan Liu¹•Institutions (1)

Microsoft¹

27 Jun 2009

TL;DR: Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches, the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures are analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.

...read moreread less

Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

...read moreread less

2,515 citations

Book•

Information Retrieval: Data Structures and Algorithms

[...]

William B. Frakes, Ricardo Baeza-Yates¹•Institutions (1)

University of Chile¹

12 Jun 1992

TL;DR: For programmers and students interested in parsing text, automated indexing, its the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

...read moreread less

Abstract: An edited volume containing data structures and algorithms for information retrieved including a disk with examples written in C. For programmers and students interested in parsing text, automated indexing, its the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

...read moreread less

2,359 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse