
Showing papers by "Aditya Khosla" published in 2011


Proceedings Article
28 Jun 2011
TL;DR: This work presents a series of multimodal learning tasks and shows how to train deep networks that learn features to address them; in particular, it demonstrates cross-modality feature learning, where better features for one modality can be learned when multiple modalities are present at feature-learning time.
Abstract: Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data and vice-versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating best published visual speech classification on AVLetters and effective shared representation learning.

2,830 citations
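The shared-representation idea in the abstract can be illustrated with a small bimodal autoencoder: both modalities are encoded into one joint code, and both are reconstructed from that code, so a classifier trained on codes computed from one modality can be tested on codes computed from the other. The sketch below is a minimal rendering under stated assumptions: the use of PyTorch, the layer sizes, the ReLU activations, and the single fusion layer are illustrative, not the paper's original architecture (which built deep autoencoders from sparse RBMs with greedy layer-wise pretraining).

```python
# A minimal sketch of a bimodal (shared-representation) autoencoder in the
# spirit of the paper. Layer sizes, activations, and module names are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, audio_dim=100, video_dim=300, hidden=256, shared=128):
        super().__init__()
        # Modality-specific encoders.
        self.enc_audio = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.enc_video = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # One fusion layer produces the shared representation.
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, shared), nn.ReLU())
        # Decoders reconstruct *both* modalities from the shared code, which
        # is what lets a classifier trained on audio codes be tested on video.
        self.dec_audio = nn.Linear(shared, audio_dim)
        self.dec_video = nn.Linear(shared, video_dim)

    def forward(self, audio, video):
        code = self.fuse(torch.cat([self.enc_audio(audio),
                                    self.enc_video(video)], dim=1))
        return self.dec_audio(code), self.dec_video(code)

model = BimodalAutoencoder()
audio = torch.randn(32, 100)   # a batch of audio features
video = torch.randn(32, 300)   # a batch of video features
# Cross-modality training device: occasionally zero out one input modality
# while still requiring reconstruction of both modalities at the output.
audio_hat, video_hat = model(torch.zeros_like(audio), video)
loss = nn.functional.mse_loss(audio_hat, audio) + \
       nn.functional.mse_loss(video_hat, video)
loss.backward()
```

Zeroing out one input modality while reconstructing both, as in the last lines, mirrors the cross-modality setup the abstract describes: the shared code must carry information useful for either modality.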


Proceedings Article · DOI
06 Nov 2011
TL;DR: This work proposes using attributes and parts to recognize human actions in still images by learning a set of sparse bases that are shown to carry much semantic meaning, and shows that this dual sparsity provides a theoretical guarantee for the basis-learning and feature-reconstruction approach.
Abstract: In this work, we propose to use attributes and parts for recognizing human actions in still images. We define action attributes as the verbs that describe the properties of human actions, while the parts of actions are objects and poselets that are closely related to the actions. We jointly model the attributes and parts by learning a set of sparse bases that are shown to carry much semantic meaning. Then, the attributes and parts of an action image can be reconstructed from sparse coefficients with respect to the learned bases. This dual sparsity provides theoretical guarantee of our bases learning and feature reconstruction approach. On the PASCAL action dataset and a new “Stanford 40 Actions” dataset, we show that our method extracts meaningful high-order interactions between attributes and parts in human actions while achieving state-of-the-art classification performance.

662 citations
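The reconstruction step in the abstract can be sketched with off-the-shelf dictionary learning: image-level attribute and part scores are approximated as sparse combinations of learned bases, and the sparse coefficients serve as features for classification. The snippet below uses scikit-learn's DictionaryLearning as a stand-in; the data, dimensions, and parameters are assumptions, and the paper's dual-sparsity formulation involves sparsity beyond the coefficients alone.

```python
# A minimal sketch of learning sparse bases over concatenated attribute/part
# scores and reconstructing them from sparse coefficients. Dimensions and
# the use of scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Each row stands in for one action image: confidence scores of attribute
# classifiers and part (object/poselet) detectors, concatenated.
X = rng.random((200, 60))

dico = DictionaryLearning(n_components=40, alpha=1.0, max_iter=100,
                          transform_algorithm="lasso_lars", random_state=0)
codes = dico.fit_transform(X)   # sparse coefficients, one row per image
bases = dico.components_        # learned attribute/part bases
X_hat = codes @ bases           # reconstruction from the sparse codes
print("relative reconstruction error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
# The sparse codes themselves would then feed an action classifier,
# e.g. a linear SVM.
```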


Proceedings Article · DOI
20 Jun 2011
TL;DR: Results show that the proposed random forest of discriminative decision trees identifies semantically meaningful visual information and outperforms state-of-the-art algorithms on both subordinate categorization and activity recognition datasets.
Abstract: In this paper, we study the problem of fine-grained image categorization. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining two ideas, discriminative feature mining and randomization. Discriminative feature mining allows us to model the detailed information that distinguishes different classes of images, while randomization allows us to handle the huge feature space and prevents over-fitting. We propose a random forest with discriminative decision trees algorithm, where every tree node is a discriminative classifier that is trained by combining the information in this node as well as all upstream nodes. Our method is tested on both subordinate categorization and activity recognition datasets. Experimental results show that our method identifies semantically meaningful visual information and outperforms state-of-the-art algorithms on various datasets.

297 citations
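The node-level idea can be sketched as a tree whose splits are discriminative classifiers over random feature subsets rather than single-feature thresholds. The toy below is a simplification under stated assumptions: the class name, parameters, and the use of a linear SVM per node are illustrative, and it omits the paper's discriminative patch mining and the reuse of information from upstream nodes.

```python
# A toy tree whose internal nodes are linear SVMs trained on random feature
# subsets (binary labels for simplicity). Names and parameters are
# illustrative, not the paper's implementation.
import numpy as np
from sklearn.svm import LinearSVC

class DiscriminativeTree:
    def __init__(self, max_depth=4, subset_frac=0.5, seed=0):
        self.max_depth, self.subset_frac = max_depth, subset_frac
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, depth=0):
        # Majority label is this node's prediction if it ends up a leaf.
        self.label = np.bincount(y).argmax()
        self.svm = None
        if depth >= self.max_depth or len(np.unique(y)) < 2:
            return self
        # Randomization: each node sees only a random slice of the features,
        # keeping the huge feature space tractable and limiting over-fitting.
        d = max(1, int(self.subset_frac * X.shape[1]))
        self.dims = self.rng.choice(X.shape[1], size=d, replace=False)
        svm = LinearSVC(dual=False).fit(X[:, self.dims], y)
        side = svm.decision_function(X[:, self.dims]) > 0
        if side.all() or not side.any():   # degenerate split: stay a leaf
            return self
        self.svm = svm
        seeds = self.rng.integers(1 << 30, size=2)
        self.left = DiscriminativeTree(self.max_depth, self.subset_frac,
                                       seeds[0]).fit(X[~side], y[~side], depth + 1)
        self.right = DiscriminativeTree(self.max_depth, self.subset_frac,
                                        seeds[1]).fit(X[side], y[side], depth + 1)
        return self

    def predict_one(self, x):
        node = self
        while node.svm is not None:
            s = node.svm.decision_function(x[node.dims].reshape(1, -1))[0]
            node = node.right if s > 0 else node.left
        return node.label

# Toy usage: a small forest of such trees votes on the class.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
forest = [DiscriminativeTree(seed=s).fit(X, y) for s in range(5)]
votes = np.array([[t.predict_one(x) for t in forest] for x in X])
print("training accuracy:",
      ((votes.mean(axis=1) > 0.5).astype(int) == y).mean())
```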