Home
/
Authors
/
Chris J.C. Burges

Author

Chris J.C. Burges

Bio: Chris J.C. Burges is an academic researcher from Microsoft. The author has contributed to research in topics: Learning to rank & Ranking (information retrieval). The author has an hindex of 21, co-authored 32 publications receiving 3803 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Learning to rank using gradient descent

[...]

Chris J.C. Burges¹, Tal Shaked¹, Erin L. Renshaw¹, Ari Lazier¹, Matt Deeds¹, Nicole A. Hamilton¹, Greg Hullender¹ - Show less +3 more•Institutions (1)

Microsoft¹

07 Aug 2005

TL;DR: RankNet is introduced, an implementation of these ideas using a neural network to model the underlying ranking function, and test results on toy data and on data from a commercial internet search engine are presented.

...read moreread less

Abstract: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.

...read moreread less

2,813 citations

Ranking, Boosting, and Model Adaptation

[...]

Chris J.C. Burges, Krysta M. Svore, Qiang Wu, Jianfeng Gao

01 Oct 2008

TL;DR: A new ranking algorithm is presented that combines the strengths of two previous methods: boosted tree classification, and LambdaR ank, which has been shown to be empirically optimal for a widely used information retrieval measure.

...read moreread less

Abstract: We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaR ank, which has been shown to be empirically optimal for a widely used information retrieval measure. The algorithm is based on boosted regression trees, although the ideas apply to any weak learners, and it is significantly fast er in both train and test phases than the state of the art, for comparable accuracy. We also show how to find the optimal linear combination for any two ran kers, and we use this method to solve the line search problem exactly during boosting. In addition, we show that starting with a previously tra ined model, and boosting using its residuals, furnishes an effective techn ique for model adaptation, and we give results for a particularly pressing prob lem in Web Search - training rankers for markets for which only small amounts of labeled data are available, given a ranker trained on much more data from a larger market.

...read moreread less

142 citations

Proceedings Article•DOI•

High accuracy retrieval with multiple nested ranker

[...]

Irina Matveeva¹, Chris J.C. Burges², Timo Burkard, Andy Laucius², Leon Wong² - Show less +1 more•Institutions (2)

University of Chicago¹, Microsoft²

06 Aug 2006

TL;DR: This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents by using the RankNet learning algorithm to re-rank a subset of the results.

...read moreread less

Abstract: High precision at the top ranks has become a new focus of research in information retrieval. This paper presents the multiple nested ranker approach that improves the accuracy at the top ranks by iteratively re-ranking the top scoring documents. At each iteration, this approach uses the RankNet learning algorithm to re-rank a subset of the results. This splits the problem into smaller and easier tasks and generates a new distribution of the results to be learned by the algorithm. We evaluate this approach using different settings on a data set labeled with several degrees of relevance. We use the normalized discounted cumulative gain (NDCG) to measure the performance because it depends not only on the position but also on the relevance score of the document in the ranked list. Our experiments show that making the learning algorithm concentrate on the top scoring results improves precision at the top ten documents in terms of the NDCG score.

...read moreread less

139 citations

Proceedings Article•DOI•

Optimisation methods for ranking functions with multiple parameters

[...]

Michael J. Taylor¹, Hugo Zaragoza², Nick Craswell¹, Stephen Robertson¹, Chris J.C. Burges¹ - Show less +1 more•Institutions (2)

Microsoft¹, Yahoo!²

06 Nov 2006

TL;DR: This work builds on recent advances in alternative differentiable pairwise cost functions, and shows that these techniques can be successfully applied to tuning the parameters of an existing family of IR scoring functions (BM25), in the sense that they cannot do better using sensible search heuristics that directly optimize the rank-based cost function NDCG.

...read moreread less

Abstract: Optimising the parameters of ranking functions with respect to standard IR rank-dependent cost functions has eluded satisfactory analytical treatment. We build on recent advances in alternative differentiable pairwise cost functions, and show that these techniques can be successfully applied to tuning the parameters of an existing family of IR scoring functions (BM25), in the sense that we cannot do better using sensible search heuristics that directly optimize the rank-based cost function NDCG. We also demonstrate how the size of training set affects the number of parameters we can hope to tune this way.

...read moreread less

125 citations

Patent•

System and method for inferring similarities between media objects

[...]

Chris J.C. Burges¹, Cormac Herley¹, John Platt¹•Institutions (1)

Microsoft¹

13 Oct 2004

TL;DR: In this paper, a combination of audio fingerprinting and repeat object detection is used for gathering statistics on broadcast media streams, which are then inferred based on the observation that objects appearing closer together in an authored stream are more likely to be similar.

...read moreread less

Abstract: A “similarity quantifier” automatically infers similarity between media objects which have no inherent measure of distance between them. For example, a human listener can easily determine that a song like Solsbury Hill by Peter Gabriel is more similar to Everybody Hurts by R.E.M. than it is to Highway to Hell by AC/DC. However, automatic determination of this similarity is typically a more difficult problem. This problem is addressed by using a combination of techniques for inferring similarities between media objects thereby facilitating media object filing, retrieval, classification, playlist construction, etc. Specifically, a combination of audio fingerprinting and repeat object detection is used for gathering statistics on broadcast media streams. These statistics include each media objects identity and positions within the media stream. Similarities between media objects are then inferred based on the observation that objects appearing closer together in an authored stream are more likely to be similar.

...read moreread less

89 citations

1
2
3
4
…
5
6
7

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Book•

Machine Learning : A Probabilistic Perspective

[...]

Kevin P. Murphy

24 Aug 2012

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

...read moreread less

Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

...read moreread less

8,059 citations

Proceedings Article•

BPR: Bayesian personalized ranking from implicit feedback

[...]

Steffen Rendle¹, Christoph Freudenthaler¹, Zeno Gantner¹, Lars Schmidt-Thieme¹•Institutions (1)

University of Hildesheim¹

18 Jun 2009

TL;DR: In this article, the authors proposed a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem, which is based on stochastic gradient descent with bootstrap sampling.

...read moreread less

Abstract: Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or adaptive k-nearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of personalized ranking our optimization method outperforms the standard learning techniques for MF and kNN. The results show the importance of optimizing models for the right criterion.

...read moreread less

3,429 citations

Book•

Learning to Rank for Information Retrieval

[...]

Tie-Yan Liu¹•Institutions (1)

Microsoft¹

27 Jun 2009

TL;DR: Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches, the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures are analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.

...read moreread less

Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

...read moreread less

2,515 citations

Book•

The Probabilistic Relevance Framework

[...]

Stephen Robertson¹, Hugo Zaragoza²•Institutions (2)

Microsoft¹, Yahoo!²

17 Dec 2009

TL;DR: This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.

...read moreread less

Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970—1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.

...read moreread less

2,037 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse