Showing papers by "Paul A. Viola" published in 2008

Open Access

Proceedings Article

Integrated feature selection and higher-order spatial feature extraction for object categorization

David Liu¹, Gang Hua², Paul A. Viola², Tsuhan Chen¹•Institutions (2)

Carnegie Mellon University¹, Microsoft²

23 Jun 2008

TL;DR: A novel method that simultaneously performs feature selection and feature extraction is proposed, which is computationally much more efficient than previous approaches, without sacrificing accuracy.

Abstract: In computer vision, the bag-of-visual-words image representation has been shown to yield good results. Recent work has shown that modeling the spatial relationship between visual words further improves performance. Previous work extracts higher-order spatial features exhaustively. However, these spatial features are expensive to compute. We propose a novel method that simultaneously performs feature selection and feature extraction. Higher-order spatial features are progressively extracted based on selected lower-order ones, thereby avoiding exhaustive computation. The method can be based on any additive feature selection algorithm such as boosting. Experimental results show that the method is computationally much more efficient than previous approaches, without sacrificing accuracy.
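
To make the efficiency argument concrete, here is a rough count of how many second-order features each strategy would have to extract. This is a sketch under stated assumptions, not the paper's own analysis: it assumes only second-order features are built, that each one comes from an unordered pair of distinct first-order features, and that the progressive strategy pairs each newly selected feature with the ones selected before it. The function names and the figures 1000 and 50 are illustrative.

```python
def exhaustive_pair_count(num_first_order: int) -> int:
    """Unordered pairs of distinct first-order features, all extracted up front."""
    return num_first_order * (num_first_order - 1) // 2


def progressive_pair_count(num_selected: int) -> int:
    """Pairs extracted when each newly selected feature is paired only with
    the features selected in earlier rounds (assumed pairing scheme)."""
    total = 0
    for already_selected in range(num_selected):
        # The feature chosen in this round is paired with every earlier choice.
        total += already_selected
    return total


# Illustrative sizes: 1000 visual words, 50 selected first-order features.
print(exhaustive_pair_count(1000))   # 499500
print(progressive_pair_count(50))    # 1225
```

Under these assumptions the progressive count depends on the number of selected features rather than on the vocabulary size, which is where the savings would come from.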

161 citations

Patent

Personal broadcast server system for providing a customized broadcast

Jeremy S. De Bonet¹, Paul A. Viola¹•Institutions (1)

Wilmington University¹

16 Sep 2008

TL;DR: A personal broadcast server system provides a customized broadcast to one or more users over a transmission media; a data storage device stores a plurality of broadcast elements, and a data management system stores a user profile and a user state for each user.

Abstract: A personal broadcast server system provides a customized broadcast to one or more users over a transmission media. A data storage device stores a plurality of broadcast elements. A data management system stores a user profile and a user state for each of the one or more users and also stores information associated with each of the plurality of broadcast elements. A broadcast element selector, having at least one broadcast element selector function, selects broadcast elements from the data storage device based on information contained in the data management system. A broadcast server receives the selected broadcast elements from the data storage device and provides the selected broadcast elements to a user over the transmission media. The personal broadcast server system may provide streaming audio, streaming video, or other forms of broadcast signals.
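
The components named in the abstract (stored broadcast elements, a per-user profile and state, and a pluggable selector function) could be wired together roughly as follows. This is only an illustrative sketch: the class names, the tag-weight profile format, the "already delivered" state, and the scoring rule are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastElement:
    element_id: str
    tags: frozenset  # hypothetical descriptive metadata for the element


@dataclass
class User:
    profile: dict                            # hypothetical: tag -> preference weight
    state: set = field(default_factory=set)  # hypothetical: ids already delivered


def preference_selector(user, elements):
    """One possible selector function: best-scoring element not yet delivered."""
    candidates = [e for e in elements if e.element_id not in user.state]
    if not candidates:
        return None
    return max(candidates,
               key=lambda e: sum(user.profile.get(t, 0.0) for t in e.tags))


def serve_next(user, elements, selector=preference_selector):
    """Select an element for this user and record it in the user state."""
    chosen = selector(user, elements)
    if chosen is not None:
        user.state.add(chosen.element_id)
    return chosen
```

Passing the selector as a parameter mirrors the abstract's "at least one broadcast element selector function": a different policy can be swapped in without touching the storage or serving code.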

52 citations

Journal Article

Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos

Cha Zhang¹, Pei Yin², Yong Rui¹, Ross Cutler¹, Paul A. Viola¹, Xinding Sun¹, N. Pinto¹, Zhengyou Zhang¹•Institutions (2)

Microsoft¹, Georgia Institute of Technology²

01 Dec 2008 - IEEE Transactions on Multimedia

TL;DR: The challenges met while designing a speaker detector for the Microsoft RoundTable distributed meeting device are presented, and a novel boosting-based multimodal speaker detection (BMSD) algorithm is proposed that reduces the error rate of the SSL-only approach by 24.6% and that of the SSL and MPD fusion approach by 20.9%.

Abstract: Identifying the active speaker in a video of a distributed meeting can be very helpful for remote participants to understand the dynamics of the meeting. A straightforward application of such analysis is to stream a high resolution video of the speaker to the remote participants. In this paper, we present the challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and propose a novel boosting-based multimodal speaker detection (BMSD) algorithm. Instead of separately performing sound source localization (SSL) and multiperson detection (MPD) and subsequently fusing their individual results, the proposed algorithm fuses audio and visual information at feature level by using boosting to select features from a combined pool of both audio and visual features simultaneously. The result is a very accurate speaker detector with extremely high efficiency. In experiments that include hundreds of real-world meetings, the proposed BMSD algorithm reduces the error rate of the SSL-only approach by 24.6%, and that of the SSL and MPD fusion approach by 20.9%. To the best of our knowledge, this is the first real-time multimodal speaker detection algorithm that is deployed in commercial products.
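
The idea of fusing at feature level, with boosting choosing from one combined audio-visual pool, can be sketched as below. This is not the BMSD algorithm itself: it substitutes generic discrete AdaBoost with threshold stumps, and the feature names, the `audio:`/`visual:` prefixes, and the data layout (one list of per-sample values per feature, labels in {+1, -1}) are assumptions made for illustration.

```python
import math


def best_stump(pool, labels, weights):
    """Return (weighted error, feature name, threshold, polarity) of the best stump."""
    best = None
    for name, values in pool.items():
        for thr in sorted(set(values)):
            for polarity in (1, -1):
                err = sum(w for v, y, w in zip(values, labels, weights)
                          if polarity * (1 if v >= thr else -1) != y)
                if best is None or err < best[0]:
                    best = (err, name, thr, polarity)
    return best


def boost_fused(audio, visual, labels, rounds):
    """Boost over a single pool holding both modalities (feature-level fusion)."""
    pool = {f"audio:{k}": v for k, v in audio.items()}
    pool.update({f"visual:{k}": v for k, v in visual.items()})
    n = len(labels)
    weights = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        err, name, thr, polarity = best_stump(pool, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((name, thr, polarity, alpha))
        # Standard AdaBoost reweighting: misclassified samples gain weight.
        weights = [w * math.exp(-alpha * y * polarity * (1 if v >= thr else -1))
                   for w, y, v in zip(weights, labels, pool[name])]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


def predict(chosen, sample):
    """Sign of the weighted stump vote; sample maps prefixed feature name -> value."""
    score = sum(alpha * polarity * (1 if sample[name] >= thr else -1)
                for name, thr, polarity, alpha in chosen)
    return 1 if score >= 0 else -1
```

Because every round searches the whole pool, the learner is free to pick an audio feature in one round and a visual feature in the next, instead of committing to per-modality detectors whose outputs are fused afterwards.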

46 citations

Patent

Feature selection and extraction

Gang Hua¹, Paul A. Viola¹, David Liu¹•Institutions (1)

Microsoft¹

25 Apr 2008

TL;DR: First-order image features are selected for image classification from an image feature pool that is initially populated with pre-extracted first-order image features; the selected features are paired with previously selected first-order classifying features to generate higher-order features.

Abstract: Image feature selection and extraction (e.g., for image classifier training) is accomplished in an integrated manner, such that higher-order features are merely developed from first-order features selected for image classification. That is, first-order image features are selected for image classification from an image feature pool, initially populated with pre-extracted first-order image features. The selected first-order classifying features are paired with previously selected first-order classifying features to generate higher-order features. The higher-order features are placed into the image feature pool as they are developed or “on-the-fly” (e.g., for use in image classifier training).
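
The loop described in this abstract (select from the pool, pair the new selection with earlier selections, add the resulting features back to the pool) might look roughly like the sketch below. It is a simplified stand-in, not the patented method: it uses discrete AdaBoost with a fixed "value > 0" stump as the selector, and it takes the element-wise product of two first-order features as a hypothetical placeholder for a real higher-order spatial feature. All names are illustrative.

```python
import math


def stump_error(values, labels, weights):
    """Weighted error of 'predict +1 when value > 0', flipped if that is better."""
    err = sum(w for v, y, w in zip(values, labels, weights)
              if (1 if v > 0 else -1) != y)
    return (err, 1) if err <= 0.5 else (1.0 - err, -1)


def boost_with_growing_pool(first_order, labels, rounds):
    """first_order: dict name -> per-image values; labels: list of +1 / -1."""
    n = len(labels)
    weights = [1.0 / n] * n
    pool = dict(first_order)     # starts with pre-extracted first-order features
    selected_first_order = []    # first-order features chosen so far
    model = []                   # (feature name, polarity, alpha)
    for _ in range(rounds):
        # Select the pool feature with the lowest weighted error.
        name, (err, polarity) = min(
            ((k, stump_error(v, labels, weights)) for k, v in pool.items()),
            key=lambda kv: kv[1][0])
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((name, polarity, alpha))
        preds = [polarity * (1 if v > 0 else -1) for v in pool[name]]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Grow the pool on the fly: pair the new first-order selection
        # with every previously selected first-order feature.
        if name in first_order and name not in selected_first_order:
            for prev in selected_first_order:
                pool[f"{prev}*{name}"] = [
                    a * b for a, b in zip(first_order[prev], first_order[name])]
            selected_first_order.append(name)
    return model, pool
```

Only pairs involving already-selected features are ever computed, so the pool grows with the number of boosting rounds rather than with the square of the number of first-order features.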

14 citations