
Showing papers by "Christian Szegedy" published in 2016


Book ChapterDOI
08 Oct 2016
TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location; because all computation is encapsulated in a single network, SSD is easy to train and straightforward to integrate into systems that require a detection component.
Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300×300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X, and for 512×512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/tree/ssd.
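The default-box grid described in the abstract is straightforward to reproduce. Below is a minimal NumPy sketch of generating center-form default boxes for one feature map; the scale values, aspect ratios, and the extra intermediate-scale box follow the general scheme the paper describes, but the exact settings here are illustrative and not taken from the released Caffe code.

```python
import numpy as np

def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate center-form default boxes (cx, cy, w, h) for one feature map.

    fmap_size  -- spatial resolution of the feature map (e.g. 38 for 38x38)
    scale      -- box scale for this feature map, relative to the input image
    next_scale -- scale of the next feature map (used for the extra 1:1 box)
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append([cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)])
            # extra box with aspect ratio 1 at an intermediate scale
            s_prime = np.sqrt(scale * next_scale)
            boxes.append([cx, cy, s_prime, s_prime])
    return np.array(boxes)

# e.g. a 38x38 feature map with 4 boxes per location yields 38*38*4 boxes
print(default_boxes(38, 0.1, 0.2).shape)  # (5776, 4)
```

At prediction time the network would emit, for every such default box, per-class confidence scores and four offsets that adjust the box toward the object it covers.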

19,543 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors explore ways to scale up networks so as to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
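The factorized convolutions mentioned in the abstract can be illustrated with a small sketch. Below is a PyTorch example (not the authors' original implementation) that factorizes a 7×7 convolution into a 1×7 followed by a 7×1 convolution, keeping the 7×7 receptive field while cutting the weight count; the channel width and the placement of batch normalization are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# A 7x7 convolution factorized into a 1x7 followed by a 7x1 convolution.
# For C input and C output channels this replaces 49*C*C weights with
# 2*7*C*C, roughly a 3.5x reduction, while keeping a 7x7 receptive field.
class Factorized7x7(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_1x7 = nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3))
        self.conv_7x1 = nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv_1x7(x))
        return self.relu(self.bn(self.conv_7x1(x)))

x = torch.randn(1, 64, 17, 17)
print(Factorized7x7(64)(x).shape)  # torch.Size([1, 64, 17, 17])
```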

16,962 citations


Proceedings Article
23 Feb 2016
TL;DR: In this paper, the authors show that training with residual connections accelerates the training of Inception networks significantly, and they also present several new streamlined architectures for both residual and non-residual Inception Networks.
Abstract: Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4 networks, we achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge.
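The activation scaling mentioned at the end of the abstract is simple to sketch: the residual branch is multiplied by a small constant before being added to the shortcut. The PyTorch block below is a minimal illustration under that assumption; the branch itself is a plain two-layer convolution stack rather than one of the actual Inception-ResNet blocks, and the scale of 0.2 is only a representative small value, not a number taken from the paper.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Simplified residual block: the residual branch is scaled down before
    being added to the shortcut, the trick reported to stabilize training of
    very wide residual Inception networks."""
    def __init__(self, channels, scale=0.2):
        super().__init__()
        self.scale = scale
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # shortcut + scaled residual branch
        return self.relu(x + self.scale * self.branch(x))

x = torch.randn(2, 128, 35, 35)
print(ScaledResidualBlock(128)(x).shape)  # torch.Size([2, 128, 35, 35])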

6,761 citations


Proceedings Article
23 Feb 2016
TL;DR: In this article, the authors show that training with residual connections accelerates the training of Inception networks significantly, and they also present several new streamlined architectures for both residual and non-residual Inception Networks.
Abstract: Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question: Are there any benefits to combining Inception architectures with residual connections? Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4 networks, we achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge.

4,051 citations


Posted Content
TL;DR: A two-stage approach is proposed that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models.
Abstract: We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
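The ranking stage of such a premise-selection pipeline can be sketched as follows. This is a toy PyTorch example, not the architecture evaluated in the paper: a shared GRU encoder embeds the conjecture and each candidate premise as character sequences, a dot product scores each pair, and the top-scoring premises would then be handed to the automated prover in the second stage. All names and sizes here are hypothetical.

```python
import torch
import torch.nn as nn

class PremiseScorer(nn.Module):
    """Toy premise-selection scorer: embed the conjecture and a candidate
    premise with a shared GRU encoder, then score the pair by dot product."""
    def __init__(self, vocab_size=128, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def encode(self, tokens):                 # tokens: (batch, seq_len) int ids
        _, h = self.encoder(self.embed(tokens))
        return h.squeeze(0)                   # (batch, dim)

    def forward(self, conjecture, premise):
        return (self.encode(conjecture) * self.encode(premise)).sum(dim=-1)

scorer = PremiseScorer()
conjecture = torch.randint(0, 128, (1, 40))   # fake character ids
premises = torch.randint(0, 128, (5, 40))     # five candidate premises
scores = scorer(conjecture.expand(5, -1), premises)
print(scores.shape)  # torch.Size([5]) -- rank premises by these scores
```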

85 citations


Proceedings Article
14 Jun 2016
TL;DR: In this paper, neural sequence models were used for premise selection in automated theorem proving, which is a key bottleneck for progress in formalized mathematics, and they showed good results for the premise selection task on the Mizar corpus.
Abstract: We study the effectiveness of neural sequence models for premise selection in automated theorem proving, a key bottleneck for progress in formalized mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.

54 citations


Patent
30 Dec 2016
TL;DR: In this article, a neural network system is described that includes multiple subnetworks, including a first subnetwork with multiple first modules, each first module including: a pass-through convolutional layer configured to process the subnetwork input for the first subnetwork to generate a pass-through output, and an average pooling stack of neural network layers that collectively processes the subnetwork input to generate an average pooling output.
Abstract: A neural network system that includes: multiple subnetworks that include: a first subnetwork including multiple first modules, each first module including: a pass-through convolutional layer configured to process the subnetwork input for the first subnetwork to generate a pass-through output; an average pooling stack of neural network layers that collectively processes the subnetwork input for the first subnetwork to generate an average pooling output; a first stack of convolutional neural network layers configured to collectively process the subnetwork input for the first subnetwork to generate a first stack output; a second stack of convolutional neural network layers configured to collectively process the subnetwork input for the first subnetwork to generate a second stack output; and a concatenation layer configured to concatenate the pass-through output, the average pooling output, the first stack output, and the second stack output to generate a first module output for the first module.
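The module described in the claim maps naturally onto a four-branch block followed by channel-wise concatenation. The PyTorch sketch below illustrates that structure only; the channel widths, kernel sizes, and stack depths are placeholders, not values taken from the patent.

```python
import torch
import torch.nn as nn

class FirstModule(nn.Module):
    """Sketch of the claimed module: four parallel branches over the same
    subnetwork input, concatenated along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.pass_through = nn.Conv2d(in_ch, 64, kernel_size=1)   # pass-through conv layer
        self.avg_pool_stack = nn.Sequential(                      # average pooling stack
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 64, kernel_size=1),
        )
        self.stack1 = nn.Sequential(                              # first conv stack
            nn.Conv2d(in_ch, 64, kernel_size=1),
            nn.Conv2d(64, 96, kernel_size=3, padding=1),
        )
        self.stack2 = nn.Sequential(                              # second conv stack
            nn.Conv2d(in_ch, 64, kernel_size=1),
            nn.Conv2d(64, 96, kernel_size=3, padding=1),
            nn.Conv2d(96, 96, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.cat([self.pass_through(x), self.avg_pool_stack(x),
                          self.stack1(x), self.stack2(x)], dim=1)  # concatenation layer

x = torch.randn(1, 192, 35, 35)
print(FirstModule(192)(x).shape)  # torch.Size([1, 320, 35, 35])
```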

12 citations


Patent
04 Apr 2016
TL;DR: In this paper, one or more candidate entity profiles are determined from an entity directory based at least in part on the images that depict the entity, and the images and candidate profiles are provided as input to a machine learning model.
Abstract: Systems and methods of identifying entities are disclosed. In particular, one or more images that depict an entity can be identified from a plurality of images. One or more candidate entity profiles can be determined from an entity directory based at least in part on the one or more images that depict the entity. The one or more images that depict the entity and the one or more candidate entity profiles can be provided as input to a machine learning model. One or more outputs of the machine learning model can be generated. Each output can include a match score associated with an image that depicts the entity and at least one candidate entity profile. The entity directory can be updated based at least in part on the one or more generated outputs of the machine learning model.
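The claimed flow (images that depict an entity plus candidate profiles in, match scores out) can be illustrated with a toy scoring function. The NumPy sketch below simply compares fixed-length embeddings by cosine similarity; it is purely illustrative, since the patent's machine learning model and its exact inputs are not specified here.

```python
import numpy as np

def match_scores(image_embeddings, profile_embeddings):
    """Toy version of the claimed flow: given embeddings of images that depict
    an entity and embeddings of candidate entity profiles, return a cosine
    similarity match score for every (image, profile) pair. A real system
    would learn the embeddings and the scoring function jointly."""
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    prof = profile_embeddings / np.linalg.norm(profile_embeddings, axis=1, keepdims=True)
    return img @ prof.T  # shape: (num_images, num_profiles)

scores = match_scores(np.random.rand(3, 128), np.random.rand(5, 128))
best = scores.argmax(axis=1)  # best-matching candidate profile per image
print(scores.shape, best)
```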

8 citations


Patent
28 Sep 2016