Showing papers by "Scott Reed published in 2014"

PDF

Open Access

Posted Content•

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

17 Sep 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: A deep convolutional neural network architecture codenamed Inception is proposed that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

2,567 citations

Proceedings Article•DOI•

Evaluation of Output Embeddings for Fine-Grained Image Classification

[...]

Zeynep Akata¹, Scott Reed², Daniel J. Walter², Honglak Lee², Bernt Schiele¹ - Show less +1 more•Institutions (2)

Max Planck Society¹, University of Michigan²

30 Sep 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, given image and class embeddings, they learn a compatibility function such that matching embedding are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score.

...read moreread less

Abstract: Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state-of-the-art. By combining different output embeddings, we further improve results.

...read moreread less

777 citations

Posted Content•

Training Deep Neural Networks on Noisy Labels with Bootstrapping

[...]

Scott Reed¹, Honglak Lee¹, Dragomir Anguelov², Christian Szegedy², Dumitru Erhan³, Andrew Rabinovich² - Show less +2 more•Institutions (3)

University of Michigan¹, Google², Microsoft³

20 Dec 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors proposed a generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency, where a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data.

...read moreread less

Abstract: Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumption often does not hold; e.g. in recognition, class labels may be missing; in detection, objects in the image may not be localized; and in general, the labeling may be subjective. In this work we propose a generic way to handle noisy and incomplete labeling by augmenting the prediction objective with a notion of consistency. We consider a prediction consistent if the same prediction is made given similar percepts, where the notion of similarity is between deep network features computed from the input data. In experiments we demonstrate that our approach yields substantial robustness to label noise on several datasets. On MNIST handwritten digits, we show that our model is robust to label corruption. On the Toronto Face Database, we show that our model handles well the case of subjective labels in emotion recognition, achieving state-of-the- art results, and can also benefit from unlabeled face images with no modification to our method. On the ILSVRC2014 detection challenge data, we show that our approach extends to very deep networks, high resolution images and structured outputs, and results in improved scalable detection.

...read moreread less

398 citations

Posted Content•

Scalable, high-quality object detection

[...]

Christian Szegedy, Scott Reed, Dumitru Erhan, Dragomir Anguelov

03 Dec 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is demonstrated that learning-based proposal methods can effectively match the performance of hand-engineered methods while allowing for very efficient runtime-quality trade-offs.

...read moreread less

Abstract: Current high-quality object detection approaches use the scheme of salience-based object proposal methods followed by post-classification using deep convolutional features. This spurred recent research in improving object proposal methods. However, domain agnostic proposal generation has the principal drawback that the proposals come unranked or with very weak ranking, making it hard to trade-off quality for running time. This raises the more fundamental question of whether high-quality proposal generation requires careful engineering or can be derived just from data alone. We demonstrate that learning-based proposal methods can effectively match the performance of hand-engineered methods while allowing for very efficient runtime-quality trade-offs. Using the multi-scale convolutional MultiBox (MSC-MultiBox) approach, we substantially advance the state-of-the-art on the ILSVRC 2014 detection challenge data set, with $0.5$ mAP for a single model and $0.52$ mAP for an ensemble of two models. MSC-Multibox significantly improves the proposal quality over its predecessor MultiBox~method: AP increases from $0.42$ to $0.53$ for the ILSVRC detection challenge. Finally, we demonstrate improved bounding-box recall compared to Multiscale Combinatorial Grouping with less proposals on the Microsoft-COCO data set.

...read moreread less

364 citations

Proceedings Article•

Learning to Disentangle Factors of Variation with Manifold Interaction

[...]

Scott Reed¹, Kihyuk Sohn¹, Yuting Zhang¹, Honglak Lee¹•Institutions (1)

University of Michigan¹

21 Jun 2014

TL;DR: A higher-order Boltzmann machine that incorporates multiplicative interactions among groups of hidden units that each learn to encode a distinct factor of variation is proposed and achieves state-of-the-art emotion recognition and face verification performance on the Toronto Face Database.

...read moreread less

Abstract: Many latent factors of variation interact to generate sensory data; for example, pose, morphology and expression in face images. In this work, we propose to learn manifold coordinates for the relevant factors of variation and to model their joint interaction. Many existing feature learning algorithms focus on a single task and extract features that are sensitive to the task-relevant factors and invariant to all others. However, models that just extract a single set of invariant features do not exploit the relationships among the latent factors. To address this, we propose a higher-order Boltzmann machine that incorporates multiplicative interactions among groups of hidden units that each learn to encode a distinct factor of variation. Furthermore, we propose correspondence-based training strategies that allow effective disentangling. Our model achieves state-of-the-art emotion recognition and face verification performance on the Toronto Face Database. We also demonstrate disentangled features learned on the CMU Multi-PIE dataset.

...read moreread less

275 citations