
Showing papers by "Ali Farhadi published in 2011"


Proceedings ArticleDOI
20 Jun 2011
TL;DR: It is shown that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects.
Abstract: In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
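The decoding baseline the abstract refers to, greedy non-maximum suppression, can be sketched as follows. This is an illustrative version of the standard procedure, not the paper's context-aware decoder; the box format and overlap threshold are assumptions.

```python
# Greedy non-maximum suppression (NMS): keep the highest-scoring
# detection, discard detections that overlap it too much, repeat.
# Detections are (x1, y1, x2, y2, score) tuples.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedily keep the best box, suppress boxes overlapping it."""
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[:4], d[:4]) < iou_threshold]
    return kept
```

Because this step decides which raw detector outputs survive, replacing it with a decoder that reasons about local context (as the paper proposes) can change final results without retraining any detector.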

518 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents a method to analyze daily activities using video from an egocentric camera, and shows that joint modeling of activities, actions, and objects leads to superior performance in comparison to the case where they are considered independently.
Abstract: We present a method to analyze daily activities, such as meal preparation, using video from an egocentric camera. Our method performs inference about activities, actions, hands, and objects. Daily activities are a challenging domain for activity recognition and are well suited to an egocentric approach. In contrast to previous activity recognition methods, our approach does not require pre-trained detectors for objects and hands. Instead, we demonstrate the ability to learn a hierarchical model of an activity by exploiting the consistent appearance of objects, hands, and actions that results from the egocentric context. We show that jointly modeling activities, actions, and objects leads to superior performance compared to modeling them independently. We introduce a novel representation of actions based on object-hand interactions and experimentally demonstrate its superior performance compared to standard activity representations such as bag of words.
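The bag-of-words baseline mentioned above can be sketched minimally: quantize each local feature descriptor to its nearest codebook word and represent a clip by the normalized histogram of word counts. The codebook and descriptor values here are hypothetical, not the paper's features.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codebook center closest to the descriptor (Euclidean)."""
    def dist(center):
        return math.sqrt(sum((d - c) ** 2 for d, c in zip(descriptor, center)))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def bag_of_words(descriptors, codebook):
    """Normalized histogram of visual-word assignments for one clip."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

Such a histogram discards which object a hand is interacting with at each moment, which is the information the paper's object-hand interaction representation aims to keep.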

461 citations


Proceedings Article
01 Feb 2011
TL;DR: This paper proposes CUSP, a new technique based on machine learning that uses a trusted initial set of signal propagation data in a region as input to build a classifier using Support Vector Machines, subsequently used to detect integrity violations.
Abstract: The emerging paradigm for using the wireless spectrum more efficiently is based on enabling secondary users to exploit white-space frequencies that are not occupied by primary users. A key enabling technology for forming networks over white spaces is distributed spectrum measurement to identify and assess the quality of unused channels. This spectrum availability data is often aggregated at a central base station or database to govern the usage of spectrum. This process is vulnerable to integrity violations if devices are malicious and misreport spectrum sensing results. In this paper we propose CUSP, a new technique based on machine learning that uses a trusted initial set of signal propagation data in a region as input to build a classifier using Support Vector Machines; the classifier is subsequently used to detect integrity violations. Using classification eliminates the need for arbitrary assumptions about signal propagation models and parameters or thresholds in favor of direct training data. Extensive evaluations using TV transmitter data from the FCC, terrain data from NASA, and house density data from the US Census Bureau for areas in Illinois and Pennsylvania show that our technique is effective against attackers of varying sophistication, while accommodating regional terrain and shadowing diversity.
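The core classification step can be sketched as below. This is an illustrative toy, not the paper's implementation: the single feature used here (how far a device's reported signal strength deviates from what the trusted propagation data predicts at its location) and all numeric values are assumptions.

```python
from sklearn.svm import SVC

# Hypothetical feature: deviation (dB) between a device's reported
# signal strength and the value predicted from the trusted initial
# data set. Label 0 = consistent report, 1 = integrity violation.
X_train = [[0.5], [1.2], [-0.8], [2.0], [0.0],    # trusted, consistent
           [20.0], [25.0], [18.0], [30.0]]        # known misreports
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1]

# Train the SVM on the trusted data, then use it to screen new reports.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# A new report overstating the signal by 22 dB is flagged as a
# potential integrity violation; a 1 dB deviation is accepted.
print(clf.predict([[22.0], [1.0]]))  # → [1 0]
```

In this framing, the decision boundary is learned directly from measurements, which is the point the abstract makes about avoiding hand-chosen propagation models and thresholds.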

50 citations


Dissertation
01 Jan 2011
TL;DR: This thesis shows that including visual phrases in the vocabulary of recognition results in significant improvements in recognition, and introduces visual attributes for reasoning about unfamiliar objects as well as visual phrases, elements of recognition that correspond to a chunk of meaning bigger than objects and smaller than scenes.
Abstract: Recognition is a deep and fundamental question in computer vision. Approached correctly, object recognition provides insight into several interesting problems with crucial applications. In a typical setting, recognition is defined as the problem of learning about a fixed set of categories from training examples provided for those categories; at test time, the problem is to determine to which of those learned categories a test image belongs. This thesis questions these typical settings and shows remarkable gains that result from shifting our point of view on the fundamentals of recognition. In current settings, the final goal of recognition systems is to predict a list of category name tags for images. But there is more to recognition than a list of category names: images exhibit a great deal of information that cannot be conveyed with name tags alone. The main focus of this thesis is to produce richer descriptions of images. Inspired by how humans describe images, our goal is to describe images with sentences. This thesis introduces a non-parametric approach for describing images with sentences that produces promising results. Exploring the idea of describing images with sentences raises deep and interesting concerns in recognition: how to deal with unfamiliar objects, how to describe objects, and how to recognize complex composites of objects. This thesis introduces visual attributes and shows how attribute-based recognition can reason about unfamiliar objects. Attribute-based recognition also allows the description of objects, the reporting of unusual properties of familiar objects, and learning about novel categories with few or even no visual training examples (from pure textual descriptions of categories). Analogous to phrases in machine translation, this thesis also introduces visual phrases: elements of recognition that correspond to a chunk of meaning bigger than objects and smaller than scenes.
Visual phrases exhibit such characteristic appearance that detecting them as one entity is much simpler and significantly more accurate than detecting the participating objects. This thesis shows that including visual phrases in the vocabulary of recognition yields significant improvements in recognition. The work presented in this thesis aims to provide insight into deep yet basic questions in recognition: What should we recognize? At what level should we recognize entities? What does learning about some objects reveal about other objects? What should we say when an unfamiliar object is presented? How can we learn to predict deviations from typicality in categories? What should be the output of a recognition system? And what is the quantum of recognition?

1 citation