Showing papers by "Sergio Guadarrama" published in 2015

Open Access

Proceedings Article

Long-term recurrent convolutional networks for visual recognition and description

Jeff Donahue¹, Lisa Anne Hendricks¹, Sergio Guadarrama¹, Marcus Rohrbach¹, Subhashini Venugopalan², Trevor Darrell¹, Kate Saenko³

University of California, Berkeley¹, University of Texas at Austin², University of Massachusetts Lowell³

07 Jun 2015

TL;DR: Proposes a novel, end-to-end trainable recurrent convolutional architecture suitable for large-scale visual learning, and shows that such models have distinct advantages over state-of-the-art recognition or generation models that are separately defined and/or optimized.

Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

4,206 citations
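
The "doubly deep" idea in the abstract above (per-frame visual features fed into a recurrent model that accepts variable-length sequences) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: random vectors stand in for convnet features, the names `init_params`, `lstm_step`, and `classify_clip` are hypothetical, the weights are untrained, and averaging per-frame predictions is one plausible way to obtain a clip-level label.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def init_params(feat_dim, hidden_dim, num_classes, seed=0):
    # Hypothetical, untrained weights; gates are stacked as [input, forget, output, candidate].
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden_dim, feat_dim)),
        "U": rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        "V": rng.normal(0.0, 0.1, (num_classes, hidden_dim)),
    }

def lstm_step(x, h, c, p):
    # One standard LSTM update: the gating nonlinearities control the state update.
    n = h.shape[0]
    z = p["W"] @ x + p["U"] @ h + p["b"]
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def classify_clip(frame_features, p):
    # frame_features: (num_frames, feat_dim); num_frames may differ from clip to clip.
    n = p["U"].shape[1]
    h, c = np.zeros(n), np.zeros(n)
    per_frame = []
    for x in frame_features:
        h, c = lstm_step(x, h, c, p)
        per_frame.append(softmax(p["V"] @ h))
    # Assumption: average the per-frame class distributions into one clip-level prediction.
    return np.mean(per_frame, axis=0)

params = init_params(feat_dim=8, hidden_dim=16, num_classes=3)
rng = np.random.default_rng(1)
short_clip = rng.normal(size=(5, 8))    # 5 frames of stand-in "convnet" features
long_clip = rng.normal(size=(12, 8))    # 12 frames, same feature size
p_short = classify_clip(short_clip, params)
p_long = classify_clip(long_clip, params)
```

Both clips yield a distribution over the same three classes even though their lengths differ, which is the variable-length property the abstract highlights. In the paper's setting the feature extractor and the recurrent part are trained jointly; this sketch omits training entirely.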

Proceedings Article

Im2Calories: Towards an Automated Mobile Vision Food Diary

Austin Myers¹, Nick Johnston², Vivek Rathod², Anoop Korattikara², Alexander Gorban², Nathan Silberman², Sergio Guadarrama³, George Papandreou², Jonathan Huang⁴, Kevin Murphy²

University of Maryland, College Park¹, Google², University of California, Berkeley³, Stanford University⁴

07 Dec 2015

TL;DR: Presents a system that recognizes the contents of a meal from a single image and then predicts its nutritional content, such as calories; on a new dataset of images from 23 restaurants, its CNN-based classifier significantly outperforms previous work.

Abstract: We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we look up the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth / volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.

360 citations
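
The restaurant setting in the Im2Calories abstract above (a multi-label classifier predicts which menu items are present, then nutritional facts are looked up) can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the menu, the calorie values, the classifier scores, the 0.5 threshold, and the function names `predict_items` and `estimate_calories` are all hypothetical and do not come from the paper.

```python
# Hypothetical menu for one restaurant: item name -> calories (made-up values).
MENU_CALORIES = {"burger": 550, "fries": 365, "salad": 150, "soda": 140}

def predict_items(scores, threshold=0.5):
    """Multi-label decision: keep every menu item whose score reaches the threshold."""
    return sorted(item for item, p in scores.items() if p >= threshold)

def estimate_calories(scores, menu=MENU_CALORIES, threshold=0.5):
    """Look up and sum the calories of every item predicted to be present."""
    items = predict_items(scores, threshold)
    return items, sum(menu[item] for item in items)

# Scores a classifier might emit for one photo (illustrative numbers only).
example_scores = {"burger": 0.92, "fries": 0.81, "salad": 0.07, "soda": 0.40}
items, total = estimate_calories(example_scores)
print(items, total)  # ['burger', 'fries'] 915
```

Because the labels are not mutually exclusive, each item is thresholded independently rather than picking a single top class. The harder out-of-restaurant setting described in the abstract also needs portion-size estimation, which this sketch does not attempt.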

Book Chapter

A First Inquiry on Semantic-Based Models of And

Sergio Guadarrama, Eloy Renedo, Enric Trillas¹

Complutense University of Madrid¹

01 Jan 2015

TL;DR: Language is inextricably linked to knowledge communication and representation, and is viewed here as a complex reality to be mathematically represented step by step, in an incremental fashion.

Abstract: Language is basically the system used by humans for communication and covers a wide range of their activities. It is a social phenomenon resulting in an evolving system of great complexity. Language is inextricably linked to knowledge communication and representation, and is viewed here as a complex reality to be mathematically represented step by step, in an incremental fashion.

6 citations

Proceedings Article

Compute less to get more: using ORC to improve sparse filtering

Johannes Lederer¹, Sergio Guadarrama²

Cornell University¹, University of California, Berkeley²

25 Jan 2015

TL;DR: Proposes the Optimal Roundness Criterion (ORC), a novel stopping criterion for sparse filtering, shows that it is related to pre-processing procedures such as statistical whitening, and demonstrates that it can make image classification with sparse filtering considerably faster and more accurate.

Abstract: Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering with spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests early stopping of Sparse Filtering. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to pre-processing procedures such as Statistical Whitening and demonstrate that it can make image classification with Sparse Filtering considerably faster and more accurate.

2 citations
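
For readers unfamiliar with Sparse Filtering, the sketch below computes its usual objective (soft absolute value, then L2 normalization per feature and per example, then an L1 sum) and the singular values of the resulting feature matrix. The abstract above does not define the ORC, so this is not the ORC itself; the singular values are shown only as an example of the kind of spectral quantity the abstract refers to. The data, the weights `W`, and the function names are assumptions for illustration.

```python
import numpy as np

def sparse_filtering_features(W, X, eps=1e-8):
    """Normalized feature matrix for weights W (features x dims) and data X (dims x examples)."""
    F = np.sqrt((W @ X) ** 2 + eps)                     # soft absolute value
    F = F / np.linalg.norm(F, axis=1, keepdims=True)    # normalize each feature across examples
    F = F / np.linalg.norm(F, axis=0, keepdims=True)    # normalize each example across features
    return F

def sparse_filtering_objective(W, X):
    """Sum of the (non-negative) normalized features, i.e. their L1 norm."""
    return float(sparse_filtering_features(W, X).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))   # 10 input dimensions, 50 examples (random stand-in data)
W = rng.normal(size=(6, 10))    # 6 features with untrained, random weights

F_hat = sparse_filtering_features(W, X)
objective = sparse_filtering_objective(W, X)
# Spectrum of the feature matrix; a stopping rule could monitor a quantity like this during training.
singular_values = np.linalg.svd(F_hat, compute_uv=False)
```

In a full pipeline, `W` would be updated by an optimizer that minimizes `objective`, and a stopping criterion would decide when to halt those updates.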