Author

Boris Babenko

Other affiliations: University of California, San Diego; University of California

Bio: Boris Babenko is an academic researcher from Google. The author has contributed to research in the topics Boosting (machine learning) and Supervised learning. The author has an h-index of 16 and has co-authored 27 publications receiving 5,914 citations. Previous affiliations of Boris Babenko include the University of California, San Diego and the University of California.

Papers

Journal Article

Robust Object Tracking with Online Multiple Instance Learning

Boris Babenko¹, Ming-Hsuan Yang², Serge Belongie¹

University of California, San Diego¹, University of California, Merced²

01 Aug 2011 - IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is shown that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids the drift caused by incorrectly labeled training examples and can therefore lead to a more robust tracker with fewer parameter tweaks.

Abstract: In this paper, we address the problem of tracking an object in a video given its location in the first frame and no other information. Recently, a class of tracking techniques called “tracking by detection” has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause drift. In this paper, we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can therefore lead to a more robust tracker with fewer parameter tweaks. We propose a novel online MIL algorithm for object tracking that achieves superior results with real-time performance. We present thorough experimental results (both qualitative and quantitative) on a number of challenging video clips.

2,101 citations
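The abstract's central claim, that training on bags of patches tolerates the slight misalignments that corrupt ordinary positive/negative labels, can be illustrated with the Noisy-OR bag model commonly used in MIL boosting. This is a minimal sketch, not the paper's full online boosting algorithm: it assumes instance probabilities already come from some classifier, and the names `bag_probability`, `bag_log_likelihood`, and the `EPS` clamp are hypothetical.

```python
import math
from typing import List, Sequence

# Clamp that keeps log() finite; an illustrative choice, not from the paper.
EPS = 1e-12


def bag_probability(instance_probs: Sequence[float]) -> float:
    """Noisy-OR: P(bag is positive) = 1 - prod_j (1 - p_j).

    The bag is positive if at least one instance is positive, so the
    learner never has to know which cropped patch is the aligned one.
    """
    prob_all_negative = 1.0
    for p in instance_probs:
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


def bag_log_likelihood(bags: List[Sequence[float]], labels: List[int]) -> float:
    """Bag-level log-likelihood that a MIL learner tries to maximize.

    bags[i] holds the instance probabilities of bag i; labels[i] is 1 or 0.
    """
    total = 0.0
    for probs, y in zip(bags, labels):
        p = min(max(bag_probability(probs), EPS), 1.0 - EPS)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

For example, a positive bag of three patches cropped around the tracker's estimate, with instance probabilities 0.1, 0.9 and 0.2, gets a bag probability of 1 - 0.9 × 0.1 × 0.8 = 0.928, so one well-aligned patch is enough to explain the label. Patches sampled far from the object can each be treated as a single-instance negative bag.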

Proceedings Article

Visual tracking with online Multiple Instance Learning

Boris Babenko¹, Ming-Hsuan Yang², Serge Belongie¹

University of California, San Diego¹, University of California, Merced²

20 Jun 2009

TL;DR: It is shown that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids the drift caused by incorrectly labeled training examples and can therefore lead to a more robust tracker with fewer parameter tweaks.

Abstract: In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called “tracking by detection” has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause further drift. In this paper, we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can therefore lead to a more robust tracker with fewer parameter tweaks. We present a novel online MIL algorithm for object tracking that achieves superior results with real-time performance.

1,752 citations

Proceedings Article

End-to-end scene text recognition

Kai Wang¹, Boris Babenko¹, Serge Belongie¹

University of California, San Diego¹

06 Nov 2011

TL;DR: While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.

Abstract: This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two-stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real-world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.

1,074 citations

Book Chapter

Visual recognition with humans in the loop

Steve Branson¹, Catherine Wah¹, Florian Schroff¹, Boris Babenko¹, Peter Welinder², Pietro Perona², Serge Belongie¹

University of California, San Diego¹, California Institute of Technology²

05 Sep 2010

TL;DR: The results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.

Abstract: We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airplane model), but not (in general) by people without such expertise. It can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate our methods on Birds-200, a difficult dataset of 200 tightly-related bird species, and on the Animals With Attributes dataset. Our results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.

492 citations
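The interactive loop described in the abstract (pose the question that best narrows down the class, given the computer-vision estimate, while tolerating imperfect user answers) can be sketched as greedy expected-information-gain maximization over a class posterior. This is a minimal NumPy sketch under stated assumptions: the computer-vision output is taken as a prior vector over classes, each question's user-response model is assumed to be a table `answer_model[c, u]` of answer probabilities per class, and all function names are hypothetical.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits, ignoring zero-probability classes."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def posterior(prior: np.ndarray, answer_likelihood: np.ndarray) -> np.ndarray:
    """Bayes update: P(c | image, u) is proportional to P(u | c) * P(c | image)."""
    unnormalized = prior * answer_likelihood
    return unnormalized / unnormalized.sum()


def expected_information_gain(prior: np.ndarray, answer_model: np.ndarray) -> float:
    """answer_model[c, u] = P(user gives answer u | true class c) for one question."""
    h_prior = entropy(prior)
    gain = 0.0
    for u in range(answer_model.shape[1]):
        p_u = float(prior @ answer_model[:, u])  # marginal probability of answer u
        if p_u == 0.0:
            continue
        gain += p_u * (h_prior - entropy(posterior(prior, answer_model[:, u])))
    return gain


def pick_question(prior: np.ndarray, answer_models: list) -> tuple:
    """Greedy choice: the question whose answer is expected to reduce entropy most."""
    gains = [expected_information_gain(prior, m) for m in answer_models]
    return int(np.argmax(gains)), gains
```

With a uniform prior over four classes, a yes/no question that splits them two against two yields exactly 1 bit, while a question that singles out one class yields 2 - 0.75 · log2(3) ≈ 0.81 bits, so the first question is asked. Unreliable users can be modeled by giving `answer_model` rows that are not one-hot.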

Book Chapter

Multiple Component Learning for Object Detection

Piotr Dollár¹, Boris Babenko², Serge Belongie¹, Pietro Perona¹, Zhuowen Tu²

California Institute of Technology¹, University of California²

12 Oct 2008

TL;DR: The method, Multiple Component Learning (MCL), automatically learns individual component classifiers and combines these into an overall classifier; unlike methods that are not part-based, MCL is quite robust to occlusions.

Abstract: Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positive rates. The field has also seen a resurgence of part-based recognition methods, with impressive results on highly articulated, diverse object categories. In this paper we propose a discriminative learning approach for detection that is inspired by part-based recognition approaches. Our method, Multiple Component Learning (MCL), automatically learns individual component classifiers and combines these into an overall classifier. Unlike previous methods, which rely on either fairly restricted part models or labeled part data, MCL learns powerful component classifiers in a weakly supervised manner, where object labels are provided but part labels are not. The basis of MCL lies in learning a set classifier; we achieve this by combining boosting with weakly supervised learning, specifically the Multiple Instance Learning framework (MIL). MCL is general, and we demonstrate results on a range of data from computer audition and computer vision. In particular, MCL outperforms all existing methods on the challenging INRIA pedestrian detection dataset, and unlike methods that are not part-based, MCL is quite robust to occlusions.

162 citations

Cited by

Proceedings Article

Learning Deep Features for Discriminative Localization

Bolei Zhou¹, Aditya Khosla¹, Agata Lapedriza¹, Aude Oliva¹, Antonio Torralba¹

Massachusetts Institute of Technology¹

27 Jun 2016

TL;DR: This work revisits the global average pooling layer proposed in [13], and sheds light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels.

Abstract: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving a classification task.

5,978 citations
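The localization ability described in the abstract follows from linearity: with global average pooling (GAP) followed by a linear layer, the class score is the spatial average of a weighted sum of the last convolutional feature maps, so keeping that weighted sum unaveraged gives a class activation map, M_c(x, y) = sum_k w_k^c f_k(x, y). This is a minimal NumPy sketch, assuming feature maps of shape (K, H, W) and a bias-free weight matrix of shape (C, K); the helper `bbox_from_cam` and its 0.2 default threshold are assumptions, and it boxes all above-threshold cells rather than the largest connected component.

```python
import numpy as np


def global_average_pool(features: np.ndarray) -> np.ndarray:
    """(K, H, W) feature maps -> (K,) vector of spatial means."""
    return features.mean(axis=(1, 2))


def class_scores(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear layer after GAP (no bias): S_c = sum_k w[c, k] * mean_xy f_k(x, y)."""
    return weights @ global_average_pool(features)


def class_activation_map(features: np.ndarray, weights: np.ndarray, c: int) -> np.ndarray:
    """M_c(x, y) = sum_k w[c, k] * f_k(x, y), an (H, W) map for class c."""
    return np.tensordot(weights[c], features, axes=(0, 0))


def bbox_from_cam(cam: np.ndarray, frac: float = 0.2) -> tuple:
    """Box (row0, col0, row1, col1) around cells >= frac of the map's peak.

    The map is shifted to start at zero and scaled by its peak first.
    Simplification: every above-threshold cell is boxed, not just the
    largest connected component.
    """
    cam = cam - cam.min()
    peak = cam.max()
    if peak == 0:
        # Flat map: no evidence to localize, so return the whole map.
        return 0, 0, cam.shape[0] - 1, cam.shape[1] - 1
    rows, cols = np.nonzero(cam / peak >= frac)
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())
```

Because averaging and the linear layer commute, `class_activation_map(f, w, c).mean()` equals `class_scores(f, w)[c]`; in practice the map would be upsampled to the input resolution before a box is drawn.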

Book Chapter

Prospective Cohort Study

Victor R. Preedy, Ronald R. Watson

01 Jan 2010

5,842 citations

Reading Digits in Natural Images with Unsupervised Feature Learning

Yuval Netzer¹, Tao Wang¹, Adam Coates¹, Alessandro Bissacco², Bo Wu², Andrew Y. Ng²

Google¹, Stanford University²

01 Jan 2011

TL;DR: A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed, finding that they are convincingly superior on benchmarks.

Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations

Posted Content

Learning Deep Features for Discriminative Localization

Bolei Zhou¹, Aditya Khosla¹, Agata Lapedriza¹, Aude Oliva¹, Antonio Torralba¹

Massachusetts Institute of Technology¹

14 Dec 2015 - arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors revisited the global average pooling layer and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

Abstract: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.

5,065 citations

Journal Article

High-Speed Tracking with Kernelized Correlation Filters

João F. Henriques¹, Rui Caseiro¹, Pedro Martins¹, Jorge Batista¹

University of Coimbra¹

01 Mar 2015 - IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new kernelized correlation filter (KCF) is derived that, unlike other kernel algorithms, has exactly the same complexity as its linear counterpart; both the KCF and a fast multi-channel linear variant called the dual correlation filter (DCF) outperform top-ranking trackers such as Struck or TLD on a 50-video benchmark, despite being implemented in a few lines of code.

Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies—any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50-video benchmark, despite running at hundreds of frames per second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.

4,994 citations
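The key step in the KCF abstract is that the kernel matrix over all cyclic shifts of a base patch is circulant, so kernel ridge regression reduces to an element-wise division in the Fourier domain, alpha_hat = y_hat / (k_hat + lambda). This is a minimal 1-D NumPy sketch under stated assumptions: a single-channel signal stands in for a multi-channel 2-D image patch, the cosine window, feature extraction and model update are omitted, the Gaussian kernel distance is not normalized by the number of elements, and the names `gaussian_correlation`, `train` and `detect` are hypothetical.

```python
import numpy as np


def gaussian_correlation(x: np.ndarray, z: np.ndarray, sigma: float) -> np.ndarray:
    """k[m] = exp(-||roll(z, m) - x||^2 / sigma^2) for every cyclic shift m.

    All n inner products <roll(z, m), x> come from one FFT product, so this
    costs O(n log n) instead of O(n^2).
    """
    # c[m] = sum_t x[t] * z[t - m]  (circular cross-correlation)
    c = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(z))).real
    sq_dist = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma ** 2)


def train(x: np.ndarray, y: np.ndarray, sigma: float, lam: float) -> np.ndarray:
    """Kernel ridge regression on all cyclic shifts of x, solved in the Fourier domain.

    K[i, j] = k(roll(x, i), roll(x, j)) depends only on i - j, so K is
    circulant and (K + lam * I) alpha = y becomes alpha_hat = y_hat / (k_hat + lam).
    """
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft(y) / (np.fft.fft(k) + lam)


def detect(alpha_hat: np.ndarray, x: np.ndarray, z: np.ndarray, sigma: float) -> np.ndarray:
    """Regression response f(roll(z, i)) for every cyclic shift i of a new patch z."""
    k = gaussian_correlation(x, z, sigma)
    return np.fft.ifft(np.fft.fft(k) * alpha_hat).real
```

Training on a patch `x` with a Gaussian-shaped target `y` peaked at zero shift, then calling `detect` on a new patch `z`, yields the regression response for every cyclic shift of `z` at once; the index of the maximum gives the estimated translation modulo the patch length. Each step costs O(n log n) rather than solving an n × n linear system.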
