Antonio Torralba

Researcher at Massachusetts Institute of Technology

Publications - 437
Citations - 105,763

Antonio Torralba is an academic researcher at the Massachusetts Institute of Technology. He has contributed to research topics including computer science and object detection, has an h-index of 119, and has co-authored 388 publications receiving 84,607 citations. His previous affiliations include Vassar College and Nvidia.

Papers
Book Chapter

Where Should Saliency Models Look Next?

TL;DR: It is argued that, to keep approaching human-level performance, saliency models will need to discover higher-level concepts in images: text, objects of gaze and action, locations of motion, and expected locations of people.
Journal Article

Using the forest to see the trees: exploiting context for visual object detection and localization

TL;DR: This paper proposes a probabilistic framework for encoding the relationships between context and object properties. Context reduces the search space, since the detector looks only in places where the object is expected to be, and it also improves performance by rejecting patterns that look like the target but appear in unlikely places.
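
The contextual-gating idea can be sketched in a few lines. Below is a minimal sketch, assuming two hypothetical callables that stand in for the paper's learned components: context_prior(image), returning a per-location probability map of where the object is expected given the scene, and detector_score(image, xy), returning local appearance evidence at a location. Neither name comes from the paper's code.

import numpy as np

def detect_with_context(image, detector_score, context_prior, threshold=0.5):
    """Run a local detector only where scene context says the object is plausible."""
    prior = context_prior(image)          # (H, W) map: p(object at location | scene)
    scores = np.zeros_like(prior)
    # Search-space reduction: skip locations the context prior rules out.
    for xy in zip(*np.nonzero(prior > 0.1)):
        # Combine local evidence with the contextual prior (naive Bayes style),
        # which also downweights target-like patterns in unlikely places.
        scores[xy] = detector_score(image, xy) * prior[xy]
    return scores > threshold
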
Book Chapter

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

TL;DR: The authors explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant image regions they refer to. These audio-visual localizations emerge from network-internal representations learned as a byproduct of training on an image-audio retrieval task.
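
The retrieval objective that gives rise to these localizations can be sketched as a margin ranking loss over paired image and audio embeddings. This is a minimal PyTorch sketch, assuming precomputed embeddings from two hypothetical encoders; the margin value and batch-wise negative sampling are illustrative, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def retrieval_loss(image_feats, audio_feats, margin=1.0):
    """Matched image/audio pairs (the diagonal) should outscore mismatched
    pairs drawn from the same batch, in both retrieval directions."""
    sims = image_feats @ audio_feats.t()                          # (B, B) similarities
    pos = sims.diag().unsqueeze(1)                                # matched-pair scores
    cost_img = F.relu(margin + sims - pos).fill_diagonal_(0)      # image -> audio
    cost_aud = F.relu(margin + sims - pos.t()).fill_diagonal_(0)  # audio -> image
    return cost_img.mean() + cost_aud.mean()
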
Book Chapter

Semantic label sharing for learning with many categories

TL;DR: A simple method of label sharing between semantically similar categories is proposed: it leverages the WordNet hierarchy to define a semantic distance between any two categories and uses this distance to share labels.
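
The core of the method, a WordNet-based semantic distance that propagates labels to nearby categories, can be sketched with NLTK's WordNet interface. A minimal sketch, assuming category names that resolve to WordNet noun synsets; the specific sharing rule and threshold here are illustrative, not the paper's exact formulation.

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def semantic_similarity(cat_a, cat_b):
    """Path similarity between the first noun synsets of two category names."""
    sa = wn.synsets(cat_a, pos=wn.NOUN)[0]
    sb = wn.synsets(cat_b, pos=wn.NOUN)[0]
    return sa.path_similarity(sb)  # in (0, 1]; 1.0 means the same synset

def share_labels(labels, categories, threshold=0.3):
    """Copy each positive label to semantically close categories, weighted by similarity."""
    shared = dict(labels)
    for cat, value in labels.items():
        for other in categories:
            if other != cat and semantic_similarity(cat, other) >= threshold:
                # e.g. a "dog" label also weakly supports "wolf", but not "car"
                shared.setdefault(other, value * semantic_similarity(cat, other))
    return shared

With this rule, share_labels({"dog": 1.0}, ["dog", "wolf", "car"]) would propagate a weak positive label to "wolf" (a close WordNet sibling) while leaving "car" untouched.
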
Posted Content

The Sound of Pixels

TL;DR: Qualitative results suggest that the PixelPlayer model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of individual sound sources. Experimental results show that the proposed Mix-and-Separate framework outperforms several baselines on source separation.
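
The Mix-and-Separate training loop can be sketched compactly: mix the audio from two videos, then ask the network to recover each track using the matching video's pixels as the query. A minimal PyTorch sketch; visual_net and audio_unet are placeholder modules, and the spectrogram and masking details of the released model are omitted.

import torch
import torch.nn.functional as F

def mix_and_separate_step(frames_a, audio_a, frames_b, audio_b,
                          visual_net, audio_unet):
    """One training step: the mixture is synthetic, so each source is its own
    ground truth and no human separation labels are needed."""
    mixture = audio_a + audio_b                           # mix two sources
    loss = 0.0
    for frames, target in ((frames_a, audio_a), (frames_b, audio_b)):
        query = visual_net(frames)                        # visual features of one source
        mask = torch.sigmoid(audio_unet(mixture, query))  # predicted separation mask
        loss = loss + F.mse_loss(mask * mixture, target)  # recover that source
    return loss
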