
Showing papers by "Ali Farhadi" published in 2010


Book ChapterDOI
05 Sep 2010
TL;DR: A system that can compute a score linking an image to a sentence, which can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence.
Abstract: Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.

1,228 citations
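
The entry above describes scoring by comparing a meaning estimate derived from the image with one derived from the sentence (in the paper, meaning is a discrete <object, action, scene> triple). As a rough illustration of that agreement-based scoring, here is a minimal sketch; the linear predictors, feature sizes, and the cosine agreement measure are placeholders, not the authors' actual models.

```python
import numpy as np

# Candidate "meanings" -- stand-ins for the paper's <object, action, scene>
# triples. The discriminative predictors below are hypothetical.
MEANINGS = ["dog runs field", "boat floats lake", "person rides horse"]

def image_meaning_scores(image_feats, W_img):
    """Scores over candidate meanings from image features (linear, for illustration)."""
    return W_img @ image_feats

def sentence_meaning_scores(sentence_feats, W_txt):
    """Scores over candidate meanings from sentence features."""
    return W_txt @ sentence_feats

def link_score(img_scores, txt_scores):
    """Higher when both sides concentrate on the same meanings:
    cosine similarity of the two softmax distributions."""
    p = np.exp(img_scores - img_scores.max()); p /= p.sum()
    q = np.exp(txt_scores - txt_scores.max()); q /= q.sum()
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Attaching a sentence to an image = picking the sentence with the best score.
rng = np.random.default_rng(0)
W_img, W_txt = rng.normal(size=(3, 8)), rng.normal(size=(3, 6))
image = rng.normal(size=8)
sentences = {s: rng.normal(size=6) for s in ("s1", "s2", "s3")}
best = max(sentences, key=lambda s: link_score(
    image_meaning_scores(image, W_img),
    sentence_meaning_scores(sentences[s], W_txt)))
```

The same score ranks images for a given sentence by swapping the roles of the two sides.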


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work introduces a new dataset that provides annotation for sharing models of appearance and correlation across categories and uses it to learn part and category detectors that serve as the visual basis for an integrated model of objects.
Abstract: We propose an approach to find and describe objects within broad domains. We introduce a new dataset that provides annotation for sharing models of appearance and correlation across categories. We use it to learn part and category detectors. These serve as the visual basis for an integrated model of objects. We describe objects by the spatial arrangement of their attributes and the interactions between them. Using this model, our system can find animals and vehicles that it has not seen and infer attributes, such as function and pose. Our experiments demonstrate that we can more reliably locate and describe both familiar and unfamiliar objects, compared to a baseline that relies purely on basic category detectors.

225 citations
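
This model describes objects by their attributes and parts rather than only a category label, which is what lets it say something about categories it has never seen. A schematic of that idea, with invented detector names and attribute-to-function rules (nothing here reproduces the paper's actual detectors):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    name: str     # a part ("wheel") or attribute ("metallic")
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x, y, w, h) in image coordinates

# Hypothetical part/attribute -> function rules, purely for illustration.
FUNCTION_RULES = {"wheel": "can roll", "leg": "can walk", "wing": "can fly"}

def describe(detections, threshold=0.5):
    """Describe an object from confident part/attribute detections,
    even when no basic-category detector fires."""
    kept = sorted((d for d in detections if d.score >= threshold),
                  key=lambda d: -d.score)
    if not kept:
        return "no confident parts or attributes"
    parts = ", ".join(f"{d.name} ({d.score:.2f})" for d in kept)
    funcs = sorted({FUNCTION_RULES[d.name] for d in kept if d.name in FUNCTION_RULES})
    return f"object with {parts}" + ("; likely " + " and ".join(funcs) if funcs else "")

print(describe([Detection("wheel", 0.9, (10, 40, 30, 30)),
                Detection("metallic", 0.7, (0, 0, 120, 80)),
                Detection("wing", 0.2, (5, 5, 10, 10))]))
# -> object with wheel (0.90), metallic (0.70); likely can roll
```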


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work presents the challenges of constructing such a detailed dataset and discusses why the benefits of using this data outweigh the difficulties of collecting it.
Abstract: Several recent works have explored the benefits of providing more detailed annotations for object recognition. These annotations provide information beyond object names, and allow a detector to reason and describe individual instances in plain English. However, by demanding more specific details from annotators, new difficulties arise, such as stronger language dependencies and limited annotator attention. In this work, we present the challenges of constructing such a detailed dataset, and discuss why the benefits of using this data outweigh the difficulties of collecting it.

40 citations


Journal ArticleDOI
17 May 2010
TL;DR: Methods to prepare data, check quality, and set prices for work in this annotation process are described; contextual information surrounding visual data can also be used to build a large and challenging labelled face dataset with no manual intervention.
Abstract: Modern computer vision research consumes labelled data in quantity, and building datasets has become an important activity. The Internet has become a tremendous resource for computer vision researchers. By seeing the Internet as a vast, slightly disorganized collection of visual data, we can build datasets. The key point is that visual data are surrounded by contextual information like text and HTML tags, which is a strong, if noisy, cue to what the visual data means. In a series of case studies, we illustrate how useful this contextual information is. It can be used to build a large and challenging labelled face dataset with no manual intervention. With very small amounts of manual labor, contextual data can be used together with image data to identify pictures of animals. In fact, these contextual data are sufficiently reliable that a very large pool of noisily tagged images can be used as a resource to build image features, which reliably improve on conventional visual features. By seeing the Internet as a marketplace that can connect sellers of annotation services to researchers, we can obtain accurately annotated datasets quickly and cheaply. We describe methods to prepare data, check quality, and set prices for work for this annotation process. The problems posed by attempting to collect very big research datasets are fertile for researchers because collecting datasets requires us to focus on two important questions: What makes a good picture? What is the meaning of a picture?

35 citations
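
The abstract mentions checking quality and setting prices for marketplace annotation without detailing the procedure here. One common scheme on such marketplaces (not necessarily the paper's own) is to seed each batch of tasks with gold-standard "sentinel" items and pay only for batches that pass. A minimal sketch, with invented item names:

```python
def worker_accuracy(answers, gold):
    """Accuracy on gold-standard sentinel items, or None if none were seen."""
    checked = [k for k in answers if k in gold]
    if not checked:
        return None
    return sum(answers[k] == gold[k] for k in checked) / len(checked)

def accept_batch(answers, gold, min_accuracy=0.8):
    """Accept (and pay for) a batch only if sentinel accuracy is high enough."""
    acc = worker_accuracy(answers, gold)
    return acc is not None and acc >= min_accuracy

# Two sentinels hidden among real tasks:
gold = {"img_07": "cat", "img_21": "dog"}
batch = {"img_01": "car", "img_07": "cat", "img_21": "dog", "img_33": "boat"}
assert accept_batch(batch, gold)
```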

