Showing papers by "Paul A. Viola" published in 2008


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel method that simultaneously performs feature selection and feature extraction is proposed, which is computationally much more efficient than previous approaches, without sacrificing accuracy.
Abstract: In computer vision, the bag-of-visual-words image representation has been shown to yield good results. Recent work has shown that modeling the spatial relationship between visual words further improves performance. Previous work extracts higher-order spatial features exhaustively. However, these spatial features are expensive to compute. We propose a novel method that simultaneously performs feature selection and feature extraction. Higher-order spatial features are progressively extracted based on selected lower-order ones, thereby avoiding exhaustive computation. The method can be based on any additive feature selection algorithm such as boosting. Experimental results show that the method is computationally much more efficient than previous approaches, without sacrificing accuracy.

161 citations


Patent
16 Sep 2008
TL;DR: A personal broadcast server system provides a customized broadcast to one or more users over a transmission media: a data storage device stores a plurality of broadcast elements, and a data management system stores a user profile and a user state for each of the users.
Abstract: A personal broadcast server system provides a customized broadcast to one or more users over a transmission media. A data storage device stores a plurality of broadcast elements. A data management system stores a user profile and a user state for each of the one or more users and also stores information associated with each of the plurality of broadcast elements. A broadcast element selector, having at least one broadcast element selector function, selects broadcast elements from the data storage device based on information contained in the data management system. A broadcast server receives the selected broadcast elements from the data storage device and provides the selected broadcast elements to a user over the transmission media. The personal broadcast server system may provide streaming audio, streaming video, or other forms of broadcast signals.

52 citations
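
The selector described in the patent abstract above can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the `BroadcastElement` and `UserRecord` classes, the `genre` field, and the ranking rule in `select_elements` are hypothetical and are not taken from the patent, which only states that a selector function picks elements using the user profile, the user state, and per-element information.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastElement:
    element_id: str
    genre: str  # stands in for "information associated with each broadcast element"


@dataclass
class UserRecord:
    preferred_genres: set                              # hypothetical user profile
    already_played: set = field(default_factory=set)   # hypothetical user state


def select_elements(elements, user, count):
    """Hypothetical selector function: unplayed elements, preferred genres first."""
    fresh = [e for e in elements if e.element_id not in user.already_played]
    # sorted() is stable and False sorts before True, so elements in a
    # preferred genre come first and otherwise keep their storage order.
    ranked = sorted(fresh, key=lambda e: e.genre not in user.preferred_genres)
    chosen = ranked[:count]
    # Update the user state so the next call does not repeat these elements.
    user.already_played.update(e.element_id for e in chosen)
    return chosen


library = [
    BroadcastElement("n1", "news"),
    BroadcastElement("j1", "jazz"),
    BroadcastElement("j2", "jazz"),
    BroadcastElement("r1", "rock"),
]
listener = UserRecord(preferred_genres={"jazz"}, already_played={"j1"})
playlist = select_elements(library, listener, count=2)
print([e.element_id for e in playlist])
```

In this toy setup the selector reads both the profile and the state and writes back to the state, which is one plausible reading of how successive customized broadcasts could differ for the same user.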


Journal ArticleDOI
TL;DR: The challenges met while designing a speaker detector for the Microsoft RoundTable distributed meeting device are presented, and a novel boosting-based multimodal speaker detection (BMSD) algorithm is proposed that reduces the error rate of the SSL-only approach by 24.6%, and that of the SSL and MPD fusion approach by 20.9%.
Abstract: Identifying the active speaker in a video of a distributed meeting can be very helpful for remote participants to understand the dynamics of the meeting. A straightforward application of such analysis is to stream a high resolution video of the speaker to the remote participants. In this paper, we present the challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and propose a novel boosting-based multimodal speaker detection (BMSD) algorithm. Instead of separately performing sound source localization (SSL) and multiperson detection (MPD) and subsequently fusing their individual results, the proposed algorithm fuses audio and visual information at feature level by using boosting to select features from a combined pool of both audio and visual features simultaneously. The result is a very accurate speaker detector with extremely high efficiency. In experiments that include hundreds of real-world meetings, the proposed BMSD algorithm reduces the error rate of the SSL-only approach by 24.6%, and that of the SSL and MPD fusion approach by 20.9%. To the best of our knowledge, this is the first real-time multimodal speaker detection algorithm that is deployed in commercial products.

46 citations
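
Feature-level fusion, as the abstract above describes it, means that one boosting run chooses from a single pool holding both audio and visual features. The sketch below illustrates that idea with plain AdaBoost over threshold stumps. It is not the BMSD implementation: the feature names `audio_ssl` and `visual_motion`, the six toy samples, and the choice of stumps as weak learners are all assumptions made for illustration.

```python
import math


def stump_output(sample, name, threshold, sign):
    """Weak learner: vote `sign` when the feature reaches the threshold."""
    return sign if sample[name] >= threshold else -sign


def best_stump(samples, labels, weights):
    """Search every feature in the combined pool for the lowest weighted error."""
    best = None
    for name in samples[0]:
        for threshold in sorted({s[name] for s in samples}):
            for sign in (1, -1):
                err = sum(w for s, y, w in zip(samples, labels, weights)
                          if stump_output(s, name, threshold, sign) != y)
                if best is None or err < best[0]:
                    best = (err, name, threshold, sign)
    return best


def train(samples, labels, rounds):
    """Plain AdaBoost; returns a list of (alpha, feature, threshold, sign)."""
    weights = [1.0 / len(samples)] * len(samples)
    model = []
    for _ in range(rounds):
        err, name, threshold, sign = best_stump(samples, labels, weights)
        err = min(max(err, 1e-9), 1 - 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, name, threshold, sign))
        # Raise the weight of misclassified samples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_output(s, name, threshold, sign))
                   for s, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, sample):
    score = sum(alpha * stump_output(sample, name, threshold, sign)
                for alpha, name, threshold, sign in model)
    return 1 if score >= 0 else -1


# Toy data (invented): +1 means "active speaker", -1 means "not a speaker".
samples = [
    {"audio_ssl": 0.9, "visual_motion": 0.8},
    {"audio_ssl": 0.8, "visual_motion": 0.9},
    {"audio_ssl": 0.9, "visual_motion": 0.1},  # sound but nobody visible
    {"audio_ssl": 0.1, "visual_motion": 0.9},  # person visible but silent
    {"audio_ssl": 0.1, "visual_motion": 0.1},
    {"audio_ssl": 0.2, "visual_motion": 0.2},
]
labels = [1, 1, -1, -1, -1, -1]
model = train(samples, labels, rounds=3)
print([name for _, name, _, _ in model])
```

On this toy data neither modality separates the classes alone, so the booster ends up drawing stumps from both, which is the behavior the combined pool is meant to allow.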


Patent
Gang Hua, Paul A. Viola, David Liu
25 Apr 2008
TL;DR: First-order image features are selected for image classification from an image feature pool that is initially populated with pre-extracted first-order image features; each selected feature is paired with previously selected first-order classifying features to generate higher-order features.
Abstract: Image feature selection and extraction (e.g., for image classifier training) is accomplished in an integrated manner, such that higher-order features are merely developed from first-order features selected for image classification. That is, first-order image features are selected for image classification from an image feature pool, initially populated with pre-extracted first-order image features. The selected first-order classifying features are paired with previously selected first-order classifying features to generate higher-order features. The higher-order features are placed into the image feature pool as they are developed or “on-the-fly” (e.g., for use in image classifier training).

14 citations
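
The pool-growing procedure in the patent abstract above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the patented method: features are binary presence flags, selection uses AdaBoost-style reweighting as one example of an additive selection algorithm, and the pairing step is a plain logical AND (co-occurrence) standing in for whatever spatial relation a real system would compute. The names `select_and_extract`, `w1`, `w2`, and `w3` are hypothetical.

```python
import math


def weighted_error(values, labels, weights):
    """Weighted error of the rule 'predict 1 when the feature is present'."""
    return sum(w for v, y, w in zip(values, labels, weights) if v != y)


def select_and_extract(first_order, labels, rounds):
    """Select features greedily while growing the pool on the fly."""
    pool = dict(first_order)  # initially only pre-extracted first-order features
    weights = [1.0 / len(labels)] * len(labels)
    chosen, chosen_first_order = [], []

    for _ in range(rounds):
        candidates = [name for name in pool if name not in chosen]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda name: weighted_error(pool[name], labels, weights))
        err = weighted_error(pool[best], labels, weights)
        chosen.append(best)

        # Higher-order features are built only from selected first-order ones,
        # so unselected features never trigger any pairing work.
        if best in first_order:
            for prev in chosen_first_order:
                pool[f"{prev}&{best}"] = [a & b for a, b in zip(pool[prev], pool[best])]
            chosen_first_order.append(best)

        # AdaBoost-style reweighting: misclassified samples gain weight.
        err = min(max(err, 1e-9), 1 - 1e-9)
        alpha = 0.5 * math.log((1 - err) / err)
        weights = [w * math.exp(alpha if v != y else -alpha)
                   for v, y, w in zip(pool[best], labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]

    return chosen, pool


# Toy data (invented): the label is 1 exactly when w1 and w2 co-occur.
first_order = {
    "w1": [1, 1, 0, 0, 1, 0],
    "w2": [1, 0, 1, 0, 1, 0],
    "w3": [0, 0, 0, 1, 0, 1],
}
labels = [1, 0, 0, 0, 1, 0]
chosen, pool = select_and_extract(first_order, labels, rounds=3)
print(chosen)
print(sorted(pool))
```

Only one pair is ever materialized here, because `w3` is never selected; an exhaustive approach would have computed all three pairs up front.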