Top 8 papers published by Antonio Torralba from Massachusetts Institute of Technology in 2008

Journal Article

LabelMe: A Database and Web-Based Tool for Image Annotation

Bryan Russell¹, Antonio Torralba¹, Kevin Murphy², William T. Freeman¹

Massachusetts Institute of Technology¹, University of British Columbia²

01 May 2008-International Journal of Computer Vision

TL;DR: This article builds a large collection of images with ground-truth labels for object detection and recognition research; such data is useful for supervised learning and quantitative evaluation.

Abstract: We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.

3,501 citations

Proceedings Article

Spectral Hashing

Yair Weiss¹, Antonio Torralba², Rob Fergus³

Hebrew University of Jerusalem¹, Massachusetts Institute of Technology², Courant Institute of Mathematical Sciences³

08 Dec 2008

TL;DR: Finding the best code for a given dataset is closely related to graph partitioning and can be shown to be NP hard; relaxing the problem yields a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian.

Abstract: Semantic hashing[1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data-point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state of the art.
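
The relaxed step described in the abstract (thresholding graph-Laplacian eigenvectors) can be illustrated with a small sketch. This is not the authors' implementation: the dense Gaussian affinity, the `sigma` parameter, the unnormalized Laplacian, and the function name `spectral_codes` are all assumptions made for illustration, and the sketch only codes the training points (it does not cover the out-of-sample extension via Laplace-Beltrami eigenfunctions).

```python
import numpy as np

def spectral_codes(points, n_bits, sigma=1.0):
    """Binary codes from thresholded graph-Laplacian eigenvectors.

    Illustrative only: covers the training points, not novel data-points.
    """
    x = np.asarray(points, dtype=float)
    # Assumed affinity: Gaussian kernel on squared Euclidean distance.
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order; column 0 is the
    # trivial constant eigenvector, so it is skipped.
    _, vecs = np.linalg.eigh(laplacian)
    chosen = vecs[:, 1:n_bits + 1]
    # Threshold each selected eigenvector at zero: one bit per eigenvector.
    return (chosen > 0).astype(np.uint8)
```

For two well-separated groups of points, the first bit would be expected to split the groups, although which group receives 0 or 1 depends on the arbitrary sign of the eigenvector.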

2,641 citations

Journal Article

80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition

Antonio Torralba¹, Rob Fergus², William T. Freeman¹

Massachusetts Institute of Technology¹, New York University²

01 Nov 2008-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

Abstract: With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the Wordnet lexical database. Hence the image database gives a comprehensive coverage of all object categories and scenes. The semantic information from Wordnet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
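
To make the idea of nearest-neighbor classification "over a range of semantic levels" concrete, here is a minimal sketch under stated assumptions. The toy `hypernyms` mapping stands in for WordNet, the sum-of-squared-differences distance and the function names are hypothetical choices, and the scheme (each neighbor votes for its label and every ancestor of that label) is one plausible reading of the abstract rather than the paper's exact procedure.

```python
from collections import Counter

def ssd(a, b):
    """Sum of squared differences between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vote_with_hypernyms(query, images, labels, hypernyms, k=3):
    """k-NN vote where each neighbor also votes for every ancestor of its label.

    hypernyms maps a label to its parent label (a toy stand-in for WordNet).
    """
    nearest = sorted(range(len(images)), key=lambda i: ssd(query, images[i]))[:k]
    votes = Counter()
    for i in nearest:
        node = labels[i]
        # Walk up the hierarchy so that noisy leaf labels can still
        # agree at a coarser semantic level.
        while node is not None:
            votes[node] += 1
            node = hypernyms.get(node)
    return votes
```

With neighbors labeled "dog", "cat", and "dog", the leaf-level vote is split, but a shared parent such as "animal" collects a vote from every neighbor, which is how voting at a coarser level can reduce the effect of labeling noise.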

1,871 citations

80 million tiny images : a large dataset for non-parametric object and scene recognition

Antonio Torralba

01 Jan 2008

TL;DR: In this paper, a large dataset of 79,302,017 images collected from the Internet is used to explore the visual world with the aid of a variety of non-parametric methods.

Abstract: With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the Wordnet lexical database. Hence the image database gives a comprehensive coverage of all object categories and scenes. The semantic information from Wordnet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

1,607 citations

Proceedings Article

Small codes and large image databases for recognition

Antonio Torralba¹, Rob Fergus², Yair Weiss³

Massachusetts Institute of Technology¹, Courant Institute of Mathematical Sciences², Vassar College³

23 Jun 2008

TL;DR: The goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices.

Abstract: The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition, computer graphics, personal photo collections, and online image search tools. In this paper, our goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices. Our approach uses recently developed machine learning techniques to convert the Gist descriptor (a real valued vector that describes orientation energies at different scales and orientations within an image) to a compact binary code, with a few hundred bits per image. Using our scheme, it is possible to perform real-time searches with millions of images from the Internet using a single large PC and obtain recognition results comparable to the full descriptor. Using our codes on high quality labeled images from the LabelMe database gives surprisingly powerful recognition results using simple nearest neighbor techniques.
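
Why compact binary codes make search cheap can be seen in a short sketch. It assumes each image's code is already available as a Python integer (how the Gist descriptor is converted to bits is not shown), and the names `hamming_distance` and `nearest_codes` are hypothetical.

```python
from typing import List

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def nearest_codes(query: int, database: List[int], k: int = 3) -> List[int]:
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database)),
                   key=lambda i: hamming_distance(query, database[i]))
    return order[:k]
```

Each comparison is a single XOR followed by a bit count, and a code of a few hundred bits occupies only a few dozen bytes, which is consistent with the memory and speed goals stated in the abstract.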

839 citations

Book Chapter

SIFT Flow: Dense Correspondence across Different Scenes

Ce Liu¹, Jenny Yuen¹, Antonio Torralba¹, Josef Sivic², William T. Freeman¹

Massachusetts Institute of Technology¹, École Normale Supérieure²

12 Oct 2008

TL;DR: Proposes SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes, and applies it to two applications: motion field prediction from a single static image and motion synthesis via transfer of moving objects.

Abstract: While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes. For a query image, histogram intersection on a bag-of-visual-words representation is used to find the set of nearest neighbors in the database. The SIFT flow algorithm then consists of matching densely sampled SIFT features between the two images, while preserving spatial discontinuities. The use of SIFT features allows robust matching across different scene/object appearances and the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach is able to robustly align complicated scenes with large spatial distortions. We collect a large database of videos and apply the SIFT flow algorithm to two applications: (i) motion field prediction from a single static image and (ii) motion synthesis via transfer of moving objects.
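
The neighbor-retrieval step mentioned in the abstract (histogram intersection on a bag-of-visual-words representation) can be sketched as follows. The sketch assumes the visual-word histograms are already computed and normalized; the function names are hypothetical, and the dense SIFT matching stage itself is not shown.

```python
def histogram_intersection(h1, h2):
    """Similarity of two histograms: the sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def nearest_by_intersection(query_hist, database_hists, k=2):
    """Indices of the k database histograms most similar to the query."""
    scores = [histogram_intersection(query_hist, h) for h in database_hists]
    # Higher intersection means more similar, so sort in descending order.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```

For histograms normalized to sum to 1, the intersection lies between 0 (no shared visual words) and 1 (identical histograms).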

690 citations

Journal Article

Describing Visual Scenes Using Transformed Objects and Parts

Erik B. Sudderth¹, Antonio Torralba², William T. Freeman², Alan S. Willsky²

University of California, Berkeley¹, Massachusetts Institute of Technology²

01 May 2008-International Journal of Computer Vision

TL;DR: This work develops hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them and proposes nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene.

Abstract: We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.
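
How a Dirichlet process can leave the number of components open is often illustrated with the Chinese restaurant process, the sequential view of its clustering behavior. The sketch below draws cluster assignments from that prior only. It is not the transformed Dirichlet process of the paper, and the function name, the `alpha` concentration parameter, and the fixed `seed` are assumptions for illustration.

```python
import random

def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    Item n joins existing cluster k with probability counts[k] / (n + alpha)
    and opens a new cluster with probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items already in cluster k
    assignments = []
    for n in range(n_items):
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments
```

The number of distinct clusters is not fixed in advance; it grows slowly with the number of items, at a rate governed by `alpha`.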

195 citations

Proceedings Article

Creating and exploring a large photorealistic virtual space

Josef Sivic¹, Biliana K. Kaneva², Antonio Torralba², Shai Avidan³, William T. Freeman²

École Normale Supérieure¹, Massachusetts Institute of Technology², Adobe Systems³

23 Jun 2008

TL;DR: A system for generating “infinite” images from large collections of photos by means of transformed image retrieval, which represents images in the database as a graph where each node is an image and different types of edges correspond to different types of geometric transformations simulating different camera motions.

Abstract: We present a system for exploring large collections of photos in a virtual 3D space. Our system does not assume the photographs are of a single real 3D location, nor that they were taken at the same time. Instead, we organize the photos in themes, such as city streets or skylines, and let users navigate within each theme using intuitive 3D controls that include move left/right, zoom and rotate. Themes allow us to maintain a coherent semantic meaning of the tour, while visual similarity allows us to create a “being there” impression, as if the images were of a particular location. We present results on a collection of several million images downloaded from Flickr and broken into themes that consist of a few hundred thousand images each. A byproduct of our system is the ability to construct extremely long panoramas, as well as image taxi, a program that generates a virtual tour between a user supplied start and finish images. The system, and its underlying technology can be used in a variety of applications such as games, movies and online virtual 3D spaces like Second Life.
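
One plausible way to picture a tour between a start and a finish image is a path search over an image graph whose edges are labeled with camera motions. The sketch below uses breadth-first search on a hypothetical adjacency structure. The graph layout, the motion labels, and the function name are assumptions, and the source does not say which search method the system actually uses.

```python
from collections import deque

def shortest_tour(graph, start, finish):
    """Fewest-step tour from start to finish in an image graph.

    graph maps an image id to a list of (motion, neighbor_id) edges.
    Returns a list of (motion, image_id) steps, or None if unreachable.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == finish:
            break
        for motion, neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, motion)
                queue.append(neighbor)
    if finish not in previous:
        return None
    # Walk back from finish to start, then reverse into tour order.
    steps = []
    node = finish
    while previous[node] is not None:
        parent, motion = previous[node]
        steps.append((motion, node))
        node = parent
    return steps[::-1]
```

Each step in the returned tour names the simulated camera motion and the image it leads to, so the sequence can be played back as a walk through the collection.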

73 citations

Showing papers by "Antonio Torralba" published in 2008