Antonio Torralba

Researcher at Massachusetts Institute of Technology

Publications - 437
Citations - 105,763

Antonio Torralba is an academic researcher at the Massachusetts Institute of Technology. He has contributed to research topics including computer science and object detection, has an h-index of 119, and has co-authored 388 publications receiving 84,607 citations. His previous affiliations include Vassar College and Nvidia.

Papers
Book Chapter

Where Should Saliency Models Look Next?

TL;DR: It is argued that, to keep approaching human-level performance, saliency models will need to discover higher-level concepts in images: text, objects of gaze and action, locations of motion, and expected locations of people.
Journal Article

Using the forest to see the trees: exploiting context for visual object detection and localization

TL;DR: This paper proposes a probabilistic framework for encoding the relationships between context and object properties. Context reduces the search space, since the detector looks only in places where the object is expected to be, and it also improves performance by rejecting patterns that look like the target but appear in unlikely places.
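
The contextual-gating idea can be sketched in a few lines. Below is a minimal sketch, assuming two hypothetical callables that stand in for the paper's learned components: context_prior(image), returning a per-location probability map of where the object is expected given the scene, and detector_score(image, xy), returning local appearance evidence at a location. Neither name comes from the paper's code.

import numpy as np

def detect_with_context(image, detector_score, context_prior, threshold=0.5):
    """Run a local detector only where scene context says the object is plausible."""
    prior = context_prior(image)          # (H, W) map: p(object at location | scene)
    scores = np.zeros_like(prior)
    # Search-space reduction: skip locations the context prior rules out.
    for xy in zip(*np.nonzero(prior > 0.1)):
        # Combine local evidence with the contextual prior (naive Bayes style),
        # which also downweights target-like patterns in unlikely places.
        scores[xy] = detector_score(image, xy) * prior[xy]
    return scores > threshold
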
Book Chapter

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

TL;DR: The authors explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant image regions they refer to. These audio-visual localizations emerge from network-internal representations learned as a byproduct of training on an image-audio retrieval task.
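
The retrieval objective that gives rise to these localizations can be sketched as a margin ranking loss over paired image and audio embeddings. This is a minimal PyTorch sketch, assuming precomputed embeddings from two hypothetical encoders; the margin value and batch-wise negative sampling are illustrative, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def retrieval_loss(image_feats, audio_feats, margin=1.0):
    """Matched image/audio pairs (the diagonal) should outscore mismatched
    pairs drawn from the same batch, in both retrieval directions."""
    sims = image_feats @ audio_feats.t()                          # (B, B) similarities
    pos = sims.diag().unsqueeze(1)                                # matched-pair scores
    cost_img = F.relu(margin + sims - pos).fill_diagonal_(0)      # image -> audio
    cost_aud = F.relu(margin + sims - pos.t()).fill_diagonal_(0)  # audio -> image
    return cost_img.mean() + cost_aud.mean()
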
Book Chapter

Semantic label sharing for learning with many categories

TL;DR: A simple method of label sharing between semantically similar categories is proposed: it leverages the WordNet hierarchy to define a semantic distance between any two categories and uses this distance to share labels.
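
The core of the method, a WordNet-based semantic distance that propagates labels to nearby categories, can be sketched with NLTK's WordNet interface. A minimal sketch, assuming category names that resolve to WordNet noun synsets; the specific sharing rule and threshold here are illustrative, not the paper's exact formulation.

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def semantic_similarity(cat_a, cat_b):
    """Path similarity between the first noun synsets of two category names."""
    sa = wn.synsets(cat_a, pos=wn.NOUN)[0]
    sb = wn.synsets(cat_b, pos=wn.NOUN)[0]
    return sa.path_similarity(sb)  # in (0, 1]; 1.0 means the same synset

def share_labels(labels, categories, threshold=0.3):
    """Copy each positive label to semantically close categories, weighted by similarity."""
    shared = dict(labels)
    for cat, value in labels.items():
        for other in categories:
            if other != cat and semantic_similarity(cat, other) >= threshold:
                # e.g. a "dog" label also weakly supports "wolf", but not "car"
                shared.setdefault(other, value * semantic_similarity(cat, other))
    return shared

With this rule, share_labels({"dog": 1.0}, ["dog", "wolf", "car"]) would propagate a weak positive label to "wolf" (a close WordNet sibling) while leaving "car" untouched.
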
Posted Content

The Sound of Pixels

TL;DR: Qualitative results suggest that the PixelPlayer model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of individual sound sources. Experimental results show that the proposed Mix-and-Separate framework outperforms several baselines on source separation.
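
The Mix-and-Separate training loop can be sketched compactly: mix the audio from two videos, then ask the network to recover each track using the matching video's pixels as the query. A minimal PyTorch sketch; visual_net and audio_unet are placeholder modules, and the spectrogram and masking details of the released model are omitted.

import torch
import torch.nn.functional as F

def mix_and_separate_step(frames_a, audio_a, frames_b, audio_b,
                          visual_net, audio_unet):
    """One training step: the mixture is synthetic, so each source is its own
    ground truth and no human separation labels are needed."""
    mixture = audio_a + audio_b                           # mix two sources
    loss = 0.0
    for frames, target in ((frames_a, audio_a), (frames_b, audio_b)):
        query = visual_net(frames)                        # visual features of one source
        mask = torch.sigmoid(audio_unet(mixture, query))  # predicted separation mask
        loss = loss + F.mse_loss(mask * mixture, target)  # recover that source
    return loss
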