Showing papers by "Sergio Guadarrama" published in 2015


Proceedings Article
07 Jun 2015
TL;DR: Proposes a novel recurrent convolutional architecture for large-scale visual learning that is end-to-end trainable, and shows that such models have distinct advantages over state-of-the-art recognition or generation models that are separately defined and/or optimized.
Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

4,206 citations


Proceedings Article
07 Dec 2015
TL;DR: Presents a system that recognizes the contents of a meal from a single image and then predicts its nutritional contents, such as calories; on a new dataset of images from 23 restaurants, its CNN-based classifier significantly outperforms previous work.
Abstract: We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we look up the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth / volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.

360 citations


Book Chapter
01 Jan 2015
TL;DR: Language is inextricably linked to knowledge communication and representation, and is viewed here as a complex reality to be mathematically represented step by step, in an incremental fashion.
Abstract: Language is basically the system used by humans for communication and covers a wide range of their activities. It is a social phenomenon resulting in an evolving system of great complexity. Language is inextricably linked to knowledge communication and representation, and is viewed here as a complex reality to be mathematically represented step by step, in an incremental fashion.

6 citations


Proceedings Article
25 Jan 2015
TL;DR: Proposes the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering, shows that it is related to pre-processing procedures such as Statistical Whitening, and demonstrates that it can make image classification with Sparse Filtering considerably faster and more accurate.
Abstract: Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering with spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests early stopping of Sparse Filtering. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to pre-processing procedures such as Statistical Whitening and demonstrate that it can make image classification with Sparse Filtering considerably faster and more accurate.

2 citations