Author

Anoop Korattikara

Other affiliations: University of California, Irvine

Bio: Anoop Korattikara is an academic researcher at Google. The author has contributed to research on Bayesian probability and artificial neural networks, has an h-index of 14, and has co-authored 20 publications receiving 3,399 citations. Previous affiliations of Anoop Korattikara include the University of California, Irvine.

Papers

Proceedings Article•DOI•

Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors

Jonathan Huang¹, Vivek Rathod¹, Chen Sun², Menglong Zhu³, Anoop Korattikara⁴, Alireza Fathi², Ian Fischer², Zbigniew Wojna⁵, Yang Song⁶, Sergio Guadarrama⁷, Kevin Murphy⁸

Russian Academy of Sciences¹, Google², University of Pennsylvania³, University of California, Irvine⁴, University College London⁵, Chinese Center for Disease Control and Prevention⁶, University of California, Berkeley⁷, Cardiff University⁸

21 Jul 2017

TL;DR: A unified implementation of the Faster R-CNN, R-FCN and SSD systems is presented and the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures is traced out.

Abstract: The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [30], R-FCN [6] and SSD [25] systems, which we view as meta-architectures, and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real-time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

2,484 citations

Proceedings Article•DOI•

Im2Calories: Towards an Automated Mobile Vision Food Diary

Austin Myers¹, Nick Johnston², Vivek Rathod², Anoop Korattikara², Alexander Gorban², Nathan Silberman², Sergio Guadarrama³, George Papandreou², Jonathan Huang⁴, Kevin Murphy²

University of Maryland, College Park¹, Google², University of California, Berkeley³, Stanford University⁴

07 Dec 2015

TL;DR: A system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories, is presented, significantly outperforming previous work.

Abstract: We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we look up the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth/volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.

360 citations

Proceedings Article•

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

Anoop Korattikara¹, Yutian Chen², Max Welling³

University of California, Irvine¹, University of Cambridge², University of Amsterdam³

21 Jun 2014

TL;DR: In this paper, an approximate MH rule based on a sequential hypothesis test was proposed to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule.

Abstract: Can we make Bayesian posterior MCMC sampling more efficient when faced with very large datasets? We argue that computing the likelihood for N datapoints in the Metropolis-Hastings (MH) test to reach a single binary decision is computationally inefficient. We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule. While this method introduces an asymptotic bias, we show that this bias can be controlled and is more than offset by a decrease in variance due to our ability to draw more samples per unit of time.

241 citations
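
The sequential test described in the "Austerity in MCMC Land" abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the batch size, the error tolerance `eps`, and the use of a normal approximation for the test statistic are all assumptions made here for brevity, and the acceptance threshold `mu0` is taken as given. The sketch decides whether the mean per-datapoint log-likelihood difference exceeds `mu0`, reading more data only while the decision is still uncertain.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_mh_accept(log_lik_diff, n_total, mu0, eps=0.05, batch=50, rng=random):
    """Decide whether mean(l_i) > mu0 from a growing subsample.

    log_lik_diff(i) is a hypothetical callback returning
    l_i = log p(x_i | proposed) - log p(x_i | current).
    Returns (accept, number_of_datapoints_used). Assumes batch >= 2.
    """
    order = list(range(n_total))
    rng.shuffle(order)  # subsample without replacement
    total = 0.0
    total_sq = 0.0
    n = 0
    while True:
        for i in order[n:n + batch]:
            l = log_lik_diff(i)
            total += l
            total_sq += l * l
        n = min(n + batch, n_total)
        mean = total / n
        if n == n_total:
            # All data seen: this is the exact comparison.
            return mean > mu0, n
        # Unbiased sample variance of the l_i seen so far.
        var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
        # Standard error with a finite-population correction.
        se = math.sqrt(var / n) * math.sqrt(1.0 - (n - 1) / (n_total - 1))
        if se > 0.0:
            # Normal approximation (an assumption of this sketch).
            delta = 1.0 - normal_cdf(abs(mean - mu0) / se)
            if delta < eps:
                return mean > mu0, n
        elif mean != mu0:
            return mean > mu0, n


# Synthetic log-likelihood differences centred on 1.0 (made-up data).
diffs = lambda i: 1.0 + 0.1 * ((i % 7) - 3)
accepted, used = approx_mh_accept(diffs, 10000, mu0=0.0, rng=random.Random(0))
rejected, used_r = approx_mh_accept(diffs, 10000, mu0=2.0, rng=random.Random(1))
```

When the mean is far from the threshold, as in both calls above, the decision is reached after a single batch instead of a pass over all 10,000 points.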

Posted Content•

Speed/accuracy trade-offs for modern convolutional object detectors

30 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, the authors investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems, and present a unified implementation of the Faster R-CNN, R-FCN and SSD systems, which they view as "meta-architectures".

Abstract: The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

158 citations

Proceedings Article•

Bayesian dark knowledge

Anoop Korattikara¹, Vivek Rathod¹, Kevin Murphy¹, Max Welling²

Google¹, University of Amsterdam²

07 Dec 2015

TL;DR: This work describes a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network.

Abstract: We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities p(y|x, D), e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [HLA15] and an approach based on variational Bayes [BCKW15]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.

151 citations
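
The distillation idea in the "Bayesian dark knowledge" abstract can be sketched in a few lines. This is an illustrative toy, not the paper's training procedure: the helper names and the sample probabilities are made up here. The teacher is a Monte Carlo average of class probabilities from several posterior samples, and the student is scored by its cross-entropy against that average.

```python
import math


def posterior_predictive(sample_probs):
    """Monte Carlo estimate: average class probabilities over posterior samples."""
    n = len(sample_probs)
    k = len(sample_probs[0])
    return [sum(p[c] for p in sample_probs) / n for c in range(k)]


def distillation_loss(teacher, student):
    """Cross-entropy of the student's probabilities against the teacher's."""
    return -sum(t * math.log(s) for t, s in zip(teacher, student) if t > 0.0)


# Two hypothetical posterior samples predicting a two-class output.
teacher = posterior_predictive([[0.8, 0.2], [0.6, 0.4]])
loss_matched = distillation_loss(teacher, teacher)
loss_overconfident = distillation_loss(teacher, [0.9, 0.1])
```

The loss is smallest when the student reproduces the averaged prediction, so a single network trained on it can stand in for the stored samples at test time.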

Cited by

Posted Content•

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam

17 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.

Abstract: We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.

14,406 citations
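
To make the cost saving from depth-wise separable convolutions concrete, here is a small arithmetic sketch. The cost formulas and the two multipliers (`alpha` for width, `rho` for resolution) follow the MobileNets paper rather than the abstract above, and the layer shape used below is an arbitrary assumption chosen for illustration.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution.

    m input channels, n output channels, f x f output feature map.
    """
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise k x k convolution plus a 1 x 1 pointwise one.

    alpha scales the channel counts and rho scales the feature-map size.
    """
    m_a, n_a, f_r = alpha * m, alpha * n, rho * f
    depthwise = k * k * m_a * f_r * f_r
    pointwise = m_a * n_a * f_r * f_r
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 512 -> 512 channels, 14 x 14 feature map.
ratio = separable_conv_cost(3, 512, 512, 14) / standard_conv_cost(3, 512, 512, 14)
thin_cost = separable_conv_cost(3, 512, 512, 14, alpha=0.5)
```

With both multipliers at 1, the ratio reduces to 1/n + 1/k², so a 3 x 3 layer needs roughly one ninth of the computation; lowering `alpha` or `rho` shrinks the cost further.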

Proceedings Article•DOI•

Mask R-CNN

Kaiming He¹, Georgia Gkioxari¹, Piotr Dollár², Ross Girshick²

Facebook¹, École Centrale Paris²

20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

14,299 citations

Posted Content•

YOLOv3: An Incremental Improvement.

Joseph Redmon, Ali Farhadi

08 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors present some updates to YOLO!

Abstract: We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at this https URL

12,770 citations

Proceedings Article•DOI•

Focal Loss for Dense Object Detection

Tsung-Yi Lin¹, Priya Goyal², Ross Girshick², Kaiming He², Piotr Dollár²

Cornell University¹, Facebook²

07 Aug 2017

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.

12,161 citations
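
The reshaped cross-entropy described in the Focal Loss abstract can be sketched for a single binary prediction. The form -alpha_t * (1 - p_t)^gamma * log(p_t) and the defaults gamma = 2 and alpha = 0.25 are taken from the paper rather than from the abstract above; the function name and the example probabilities are assumptions for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # well-classified positive
hard = focal_loss(0.1, 1)  # badly misclassified positive
easy_scale = easy / -math.log(0.9)  # weight relative to plain cross-entropy
```

For the well-classified example, the modulating factor (1 - 0.9)² cuts the loss to a small fraction of its cross-entropy value, while the hard example keeps most of its weight. Setting `gamma=0` recovers alpha-weighted cross-entropy.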

Proceedings Article•

Mask R-CNN

Kaiming He¹, Georgia Gkioxari², Piotr Dollár³, Ross Girshick³

Microsoft¹, University of California², École Centrale Paris³

20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation that outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: this https URL

11,343 citations
