Term Weighting Approaches in Automatic Text Retrieval
Citations
11,357 citations
Cites methods from "Term Weighting Approaches in Automatic Text Retrieval"
...using the “term frequency inverse document frequency” (TF-IDF) weighting scheme developed in the information retrieval area [Salton and Buckley, 1988]....
[...]
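As a concrete reading of the TF-IDF scheme the excerpt above references: a term's weight rises with its frequency inside a document and falls with the number of documents containing it. Below is a minimal sketch, assuming Python, of one classic variant (raw term frequency times log inverse document frequency with cosine length normalization); the tiny tokenized corpus and the function name are illustrative, not from the cited paper:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weights for a list of tokenized documents:
    raw term frequency x log inverse document frequency,
    followed by cosine (unit-length) normalization."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Rare terms (low df) get boosted; ubiquitous terms go to zero.
        w = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        vectors.append({t: v / norm for t, v in w.items()})
    return vectors

# Toy corpus: "weighting" appears in every document, so its idf is log(1) = 0.
docs = [["term", "weighting", "weighting"],
        ["retrieval", "weighting"],
        ["term", "retrieval", "weighting"]]
for vec in tfidf_vectors(docs):
    print(vec)
```

The cosine normalization keeps long documents from dominating similarity scores, a normalization choice discussed in Salton and Buckley (1988).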
8,658 citations
7,539 citations
Cites background or result from "Term Weighting Approaches in Automatic Text Retrieval"
...[Fuhr 1989; Salton and Buckley 1988]) accounting for the “importance” of t_k for d_j play a key role in IR....
[...]
...[Salton and Buckley 1988]....
[...]
...As such, this would seem to contradict a well-known law of IR, according to which the terms with low-to-medium document frequency are the most informative ones [Salton and Buckley 1988]....
[...]
...1998; Lewis 1992a] it has been found that representations more sophisticated than this do not yield significantly better effectiveness, thereby confirming similar results from IR [Salton and Buckley 1988]....
[...]
7,244 citations
6,984 citations
Additional excerpts
...The informativeness of rare features has led practitioners to craft domain-specific feature weightings, such as TF-IDF (Salton and Buckley, 1988), which pre-emphasize infrequently occurring features. We use this old idea as a motivation for applying modern learning-theoretic techniques to the problem of online and stochastic learning, focusing specifically on (sub)gradient methods. Standard stochastic subgradient methods largely follow a predetermined procedural scheme that is oblivious to the characteristics of the data being observed. In contrast, our algorithms dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation facilitates finding and identifying very predictive but comparatively rare features.

The following toy example of classification with the hinge loss highlights a problem with standard subgradient methods. We receive a sequence of 3-dimensional vectors {z_t} and labels y_t ∈ {−1, +1}. Let u_t be independent ±1-valued random variables, each with probability 1/2. Each instance z_t is associated with a label y_t = −1 and is equal to (u_t, −u_t, 0) 99% of the time, while z_t = (u_t, −u_t, 1) in the remaining 1% of the cases, with label y_t = 1. We would like to find a vector x with ‖x‖_∞ ≤ 1 such that max{0, −y_t⟨x, z_t⟩} = 0 for all t. Clearly, any solution vector takes the form x = (a, a, 1) with |a| ≤ 1. Standard subgradient methods, such as the one proposed by Zinkevich (2003), iterate as follows: x_{t+1} = Π_{x : ‖x‖_∞ ≤ 1}(x_t − η_t y_t z_t), where Π_X denotes Euclidean projection onto the set X and η_t = 1/√t....
[...]
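The toy example in this excerpt is easy to reproduce. Below is a minimal sketch, assuming Python, of Zinkevich-style projected subgradient descent on the stream described above; the subgradient chosen at the hinge's kink, the random seed, and the horizon are illustrative assumptions, not details from the citing paper:

```python
import random

def hinge_subgradient(x, z, y):
    """A subgradient of f(x) = max(0, -y * <x, z>).
    At the kink (loss exactly 0) both 0 and -y*z are valid
    subgradients; we pick -y*z so the iterate can move."""
    if -y * sum(xi * zi for xi, zi in zip(x, z)) >= 0:
        return [-y * zi for zi in z]
    return [0.0] * len(x)

def project_linf(x, radius=1.0):
    """Euclidean projection onto {x : ||x||_inf <= radius}
    is coordinate-wise clipping."""
    return [max(-radius, min(radius, xi)) for xi in x]

random.seed(0)
x = [0.0, 0.0, 0.0]
for t in range(1, 100_001):
    u = random.choice([-1, 1])
    if random.random() < 0.99:          # frequent, uninformative rounds
        z, y = (u, -u, 0), -1
    else:                               # rare rounds exposing feature 3
        z, y = (u, -u, 1), +1
    eta = 1.0 / t ** 0.5                # global step size eta_t = 1/sqrt(t)
    g = hinge_subgradient(x, z, y)
    x = project_linf([xi - eta * gi for xi, gi in zip(x, g)])

# Solutions have the form (a, a, 1), but the rare third coordinate
# receives only a few, ever-shrinking updates and stays far below 1
# (the exact value depends on the seed).
print(x)
```

Because η_t = 1/√t decays for every coordinate at once, the third feature, which appears in only 1% of the rounds, receives few and ever-smaller updates; this is precisely the failure mode that the adaptive, geometry-aware methods motivated in the excerpt are meant to fix.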