
Showing papers by "Paul A. Viola published in 2007"


Proceedings Article
Cha Zhang, Paul A. Viola
03 Dec 2007
TL;DR: The multiple instance pruning (MIP) algorithm for soft cascades is proposed; it computes a set of thresholds that aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset.
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages.
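A minimal sketch of how a soft cascade with intermediate rejection thresholds can be evaluated at detection time; the weak-classifier scores and thresholds below are illustrative placeholders, not values learned by MIP:

```python
import numpy as np

def evaluate_soft_cascade(window_features, weak_score, rejection_thresholds):
    """Evaluate a soft cascade on one detection window.

    weak_score: callable returning the t-th weak classifier's score
    rejection_thresholds: per-stage thresholds; computation stops as soon
    as the running score falls below the current threshold.
    """
    running_sum = 0.0
    for t, theta_t in enumerate(rejection_thresholds):
        running_sum += weak_score(window_features, t)
        if running_sum < theta_t:
            return False, t  # pruned early
    return True, len(rejection_thresholds)  # survived every stage

# Toy example with random stump-like scores (placeholders).
rng = np.random.default_rng(0)
scores = rng.normal(size=100)

def weak(feats, t):
    # Placeholder weak classifier: ignores the features in this toy example.
    return float(scores[t])

thresholds = np.cumsum(scores) - 2.0  # loose illustrative thresholds
accepted, stages_used = evaluate_soft_cascade(None, weak, thresholds)
print(accepted, stages_used)
```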

183 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel and effective technique is proposed to ensure that the rank one tensor projections are orthogonal to one another, which provides a strong inductive bias and results in better generalization on small training sets.
Abstract: We propose a method for face recognition based on a discriminative linear projection. In this formulation images are treated as tensors, rather than the more conventional vector of pixels. Projections are pursued sequentially and take the form of a rank one tensor, i.e., a tensor which is the outer product of a set of vectors. A novel and effective technique is proposed to ensure that the rank one tensor projections are orthogonal to one another. These constraints on the tensor projections provide a strong inductive bias and result in better generalization on small training sets. Our work is related to spectrum methods, which achieve orthogonal rank one projections by pursuing consecutive projections in the complement space of previous projections. Although this may be meaningful for applications such as reconstruction, it is less meaningful for pursuing discriminant projections. Our new scheme iteratively solves an eigenvalue problem with orthogonality constraints on one dimension, and solves unconstrained eigenvalue problems on the other dimensions. Experiments demonstrate that on small and medium sized face recognition datasets, this approach outperforms previous embedding methods. On large face datasets this approach achieves results comparable with the best, often using fewer discriminant projections.
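The basic projection operation described above, treating an image as a 2-D tensor and projecting it onto a rank one tensor (the outer product of two vectors), reduces to a bilinear form. A minimal NumPy sketch, with random vectors standing in for the learned discriminative projections:

```python
import numpy as np

def rank_one_project(image, u, v):
    """Project a 2-D image tensor onto the rank one tensor u (outer) v.

    The projection value is u^T X v, i.e. the inner product of the image
    with the outer product of the two mode vectors.
    """
    return float(u @ image @ v)

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 32))  # toy "face" image
u = rng.normal(size=32)        # mode-1 projection vector (placeholder)
v = rng.normal(size=32)        # mode-2 projection vector (placeholder)

# Equivalent formulations: bilinear form vs. inner product with the outer product.
p1 = rank_one_project(X, u, v)
p2 = float(np.sum(X * np.outer(u, v)))
assert np.isclose(p1, p2)
print(p1)
```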

57 citations


Patent
Cha Zhang, Paul A. Viola
13 Jul 2007
TL;DR: In this paper, a combination classifier and intermediate rejection thresholds are learned using a pruning process that ensures objects detected by the original combination classifier are also detected by the pruned classifier, thereby guaranteeing the same detection rate on the training set after pruning.
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system. In additional embodiments, combination classifiers are trained using various combinations of weight trimming, bootstrapping, and a weak classifier termed a “fat stump” classifier.
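A hedged sketch of the multiple instance pruning idea: because each positive training image only needs one of its retained detection windows to survive, each intermediate rejection threshold can be set from the best surviving partial score per image. The data and margin below are illustrative, not the patent's exact procedure:

```python
import numpy as np

def mip_thresholds(partial_scores, retained, image_ids, margin=1e-6):
    """Sketch of multiple instance pruning (MIP) for intermediate thresholds.

    partial_scores: (n_windows, n_stages) cumulative score after each stage
    retained:       boolean mask of windows accepted by the full classifier
    image_ids:      training image each window came from

    Each positive image only needs one of its retained windows to survive,
    so each stage's threshold is the min over images of the best partial
    score among that image's still-surviving retained windows.
    """
    n_windows, n_stages = partial_scores.shape
    alive = retained.copy()
    thresholds = np.empty(n_stages)
    for t in range(n_stages):
        best_per_image = [
            partial_scores[alive & (image_ids == img), t].max()
            for img in np.unique(image_ids[alive])
        ]
        thresholds[t] = min(best_per_image) - margin
        alive &= partial_scores[:, t] >= thresholds[t]
    return thresholds

# Toy data: 6 windows, 3 stages, 2 training images.
rng = np.random.default_rng(2)
scores = np.cumsum(rng.normal(size=(6, 3)), axis=1)
keep = np.array([True, False, True, True, True, False])
imgs = np.array([0, 0, 0, 1, 1, 1])
print(mip_thresholds(scores, keep, imgs))
```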

37 citations


Patent
Cha Zhang, Paul A. Viola, Yin Pei, Ross Cutler, Xinding Sun, Yong Rui
13 Feb 2007
TL;DR: In this paper, a pool of features including more than one type of input (like audio input and video input) was identified and used with a learning algorithm to generate a classifier that identifies people or speakers.
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
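A minimal sketch of the general approach, with synthetic audio and video features and a scikit-learn AdaBoost learner as stand-ins for the pooled features and learning algorithm described above:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(3)
n = 200

# Stand-in features: e.g. per-frame audio cues and per-frame video cues.
# Purely synthetic values for illustration.
audio_feats = rng.normal(size=(n, 8))
video_feats = rng.normal(size=(n, 16))
labels = (audio_feats[:, 0] + video_feats[:, 0] > 0).astype(int)  # toy labels

# Pool both input types into a single feature vector per example and let a
# boosting-style learner select informative features across modalities.
pooled = np.hstack([audio_feats, video_feats])
clf = AdaBoostClassifier(n_estimators=50).fit(pooled, labels)
print("training accuracy:", clf.score(pooled, labels))
```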

32 citations


Patent
25 Jun 2007
TL;DR: In this article, distributed computing devices comprising a system for sharing computing resources can provide shared computing resources to users having sufficient resource credits, where a user can earn resource credits by reliably offering a computing resource for sharing for a predetermined amount of time.
Abstract: Distributed computing devices comprising a system for sharing computing resources can provide shared computing resources to users having sufficient resource credits. A user can earn resource credits by reliably offering a computing resource for sharing for a predetermined amount of time. The conversion rate between the amount of credits awarded, and the computing resources provided by a user can be varied to maintain balance within the system, and to foster beneficial user behavior. Once earned, the credits can be used to fund the user's account, joint accounts which include the user and others, or others' accounts that do not provide any access to the user. Computing resources can be exchanged on a peer-to-peer basis, though a centralized mechanism can link relevant peers together. To verify integrity, and protect against maliciousness, offered resources can be periodically tested.
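A toy sketch of the credit bookkeeping such a system might use; the conversion-rate adjustment rule shown is an assumption for illustration, not the patented mechanism:

```python
from dataclasses import dataclass, field

@dataclass
class CreditLedger:
    """Toy ledger for a resource-sharing credit system (illustrative only)."""
    balances: dict = field(default_factory=dict)
    rate: float = 1.0  # credits awarded per resource-hour reliably shared

    def award(self, user: str, resource_hours: float) -> None:
        # Credits earned by reliably offering a resource for a period of time.
        self.balances[user] = self.balances.get(user, 0.0) + resource_hours * self.rate

    def spend(self, user: str, credits: float) -> bool:
        # Shared resources are only provided to users with sufficient credits.
        if self.balances.get(user, 0.0) < credits:
            return False
        self.balances[user] -= credits
        return True

    def adjust_rate(self, supply: float, demand: float) -> None:
        # Vary the conversion rate to keep the system in balance: reward
        # sharing more when demand outstrips supply (assumed rule).
        self.rate = max(0.1, demand / max(supply, 1e-9))

ledger = CreditLedger()
ledger.award("alice", resource_hours=10)  # alice shared a resource for 10 hours
print(ledger.spend("alice", 4), ledger.balances)
```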

25 citations


Patent
06 Feb 2007
TL;DR: In this article, a landmark detection technique is proposed that can quickly detect both objects of interest and landmarks within those objects in an image using regression methods; it reuses feature values already computed for object detection to find the landmarks in an object (e.g., the eyes and mouth of a face).
Abstract: A landmark detection technique that can quickly detect both objects of interest and landmarks within the objects in an image using regression methods. The present fast landmark detection scheme reuses existing feature values used for object detection (e.g., face detection) to find the landmarks in an object (e.g., the eyes and mouth of the face). Hence, the technique provides landmark detection functionality at almost no cost.
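A rough sketch of reusing detector feature values as inputs to landmark regressors; the synthetic features and the scikit-learn regressors below are stand-ins for the detector's actual feature responses and the technique's regression method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n_windows, n_features = 500, 40

# Stand-ins for feature values already computed by the face detector for
# each detected window (e.g. rectangle-filter responses).
detector_features = rng.normal(size=(n_windows, n_features))

# Synthetic landmark targets: offsets of, say, the left eye within the window.
true_offsets = (detector_features[:, :2] @ rng.normal(size=(2, 2))
                + 0.05 * rng.normal(size=(n_windows, 2)))

# One regressor per landmark coordinate, trained on the reused features, so
# landmark estimation adds almost no feature-computation cost at runtime.
regs = [GradientBoostingRegressor().fit(detector_features, true_offsets[:, d])
        for d in range(2)]
pred = np.column_stack([r.predict(detector_features) for r in regs])
print("mean abs. error:", np.abs(pred - true_offsets).mean())
```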

18 citations


Patent
Gang Hua, Steven M. Drucker, Michael Revow, Paul A. Viola, Richard Zemel
15 Mar 2007
TL;DR: In this paper, a comparison component computes similarity confidence data between items of extracted visual information (e.g., faces, scenes, etc.) and then generates a visual distribution based upon that similarity confidence data.
Abstract: A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component which computes similarity confidence data between the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data, with faces computed to have a greater probability of representing the same person placed in closer proximity. The visual distribution can then be utilized by a user to sort, organize and/or tag images.
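One plausible way to produce the described two-dimensional layout is to convert pairwise similarity confidences into dissimilarities and embed them with multidimensional scaling; the similarity values and the use of scikit-learn MDS below are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(5)
n_faces = 12

# Stand-in similarity confidences in [0, 1] between extracted face crops
# (1 = almost certainly the same person). Symmetric with a unit diagonal.
sim = rng.uniform(0.1, 0.9, size=(n_faces, n_faces))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 1.0)

# Faces with higher similarity confidence should land closer together, so
# embed the complementary dissimilarity into 2-D with metric MDS.
dissimilarity = 1.0 - sim
layout = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)
print(layout.shape)  # (12, 2) coordinates for the visual distribution
```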

16 citations


Patent
15 Jun 2007
TL;DR: In this paper, discriminatively trained orthogonal rank one tensor projections are used for face recognition, minimizing intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people.
Abstract: Systems and methods are described for face recognition using discriminatively trained orthogonal rank one tensor projections. In an exemplary system, images are treated as tensors, rather than as conventional vectors of pixels. During runtime, the system designs visual features—embodied as tensor projections—that minimize intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people. Tensor projections are pursued sequentially over a training set of images and take the form of a rank one tensor, i.e., the outer product of a set of vectors. An exemplary technique ensures that the tensor projections are orthogonal to one another, thereby increasing ability to generalize and discriminate image features over conventional techniques. Orthogonality among tensor projections is maintained by iteratively solving an ortho-constrained eigenvalue problem in one dimension of a tensor while solving unconstrained eigenvalue problems in additional dimensions of the tensor.

14 citations


Patent
Cha Zhang, Paul A. Viola
13 Jul 2007
TL;DR: In this paper, a combination classifier and intermediate rejection thresholds are learned using a pruning process that ensures objects detected by the original combination classifier are also detected by the pruned classifier, thereby guaranteeing the same detection rate on the training set after pruning.
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system. In additional embodiments, combination classifiers are trained using various combinations of weight trimming, bootstrapping, and a weak classifier termed a “fat stump” classifier.

13 citations


Proceedings ArticleDOI
Ming Ye, Paul A. Viola, Sashi Raghupathy, Herry Sutanto, Chengyang Li
23 Sep 2007
TL;DR: This paper proposes a machine learning approach to grouping problems in ink parsing, where hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features.
Abstract: This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features. This framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users. It holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
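A schematic sketch of the high-confidence-first loop described above: hypotheses sit in a priority queue ordered by classifier confidence, the most confident one is applied, and new hypotheses from the perturbed local configuration are pushed back. The confidence and perturbation functions below are placeholders for the AdaBoost decision-tree classifier and the ink-specific hypothesis generators:

```python
import heapq
import random

def confidence(hypothesis):
    """Placeholder for the data-driven AdaBoost decision-tree score."""
    random.seed(hash(hypothesis) & 0xFFFF)
    return random.random()

def perturb(segmentation, hypothesis):
    """Placeholder: apply a grouping hypothesis (e.g. merge two strokes into
    one text line) and return the new segmentation plus local follow-ups."""
    new_seg = segmentation | {hypothesis}
    followups = [f"{hypothesis}+alt{i}" for i in range(2)]
    return new_seg, followups

def high_confidence_first(initial_segmentation, initial_hypotheses, max_steps=20):
    seg = set(initial_segmentation)
    # Max-heap via negated confidence: most confident hypothesis first.
    heap = [(-confidence(h), h) for h in initial_hypotheses]
    heapq.heapify(heap)
    seen = set(initial_hypotheses)
    for _ in range(max_steps):
        if not heap:
            break
        neg_conf, hyp = heapq.heappop(heap)
        if -neg_conf < 0.5:          # stop once remaining hypotheses are weak
            break
        seg, followups = perturb(seg, hyp)
        for f in followups:          # re-generate hypotheses around the change
            if f not in seen:
                seen.add(f)
                heapq.heappush(heap, (-confidence(f), f))
    return seg

print(high_confidence_first({"line0"}, ["merge(s1,s2)", "split(line0)"]))
```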

11 citations


Patent
19 Apr 2007
TL;DR: In this article, a user may input strokes as digital ink to a processing device, which partitions the input strokes into multiple regions of strokes, scores grammar objects in those regions with two recognizers, and converts the scores to a converted score which may have at least a near standard normal distribution.
Abstract: In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
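A minimal sketch of the score-conversion step: each recognizer's raw scores are standardized so that scores from different recognizers are comparable on a common scale. Plain z-scoring is used here as an illustrative stand-in for the patent's conversion:

```python
import numpy as np

def to_converted_score(raw_scores):
    """Standardize a recognizer's raw scores so they are roughly standard
    normal, making scores from different recognizers directly comparable.
    (Plain z-scoring used as an illustrative stand-in.)"""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return (raw_scores - raw_scores.mean()) / raw_scores.std()

rng = np.random.default_rng(6)
ink_recognizer_scores = rng.gamma(2.0, 3.0, size=1000)      # arbitrary scale
shape_recognizer_scores = rng.normal(50.0, 5.0, size=1000)  # different scale

a = to_converted_score(ink_recognizer_scores)
b = to_converted_score(shape_recognizer_scores)
# Both converted score sets now have mean ~0 and std ~1, so chart entries
# scored by the two recognizers can be ranked and combined on a common scale.
print(a.mean().round(3), a.std().round(3), b.mean().round(3), b.std().round(3))
```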

Proceedings Article
11 Mar 2007
TL;DR: This technique resolves one of the biggest obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function, and results in an algorithm that is more than 3 times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
Abstract: We present a technique for speeding up inference of structured variables using a priority-driven search algorithm rather than the more conventional dynamic programming. A priority-driven search algorithm is guaranteed to return the optimal answer if the priority function is an underestimate of the true cost function. We introduce the notion of a probable approximate underestimate, and show that it can be used to compute a probable approximate solution to the inference problem when used as a priority function. We show that we can learn probable approximate underestimate functions which have the functional form of simpler, easy to decode models. These models can be learned from unlabeled data by solving a linear/quadratic optimization problem. As a result, we get a priority function that can be computed quickly, and results in solutions that are (provably) almost optimal most of the time. Using these ideas, discriminative classifiers such as semi-Markov CRFs and discriminative parsers can be sped up using a generalization of the A* algorithm. Further, this technique resolves one of the biggest obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function. Applying this technique results in an algorithm that is more than 3 times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
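A generic sketch of the priority-driven decoding described above: states are expanded best-first with priority equal to the cost so far plus a heuristic, and if the heuristic never overestimates the remaining cost, the first goal popped is optimal (a probable approximate underestimate trades that guarantee for speed with high probability). The lattice and costs below are toy stand-ins for a semi-Markov decoding trellis:

```python
import heapq

def a_star_decode(start, goal, successors, underestimate):
    """Best-first decoding with priority = cost so far + heuristic.

    If `underestimate` never exceeds the true remaining cost, the first time
    the goal is popped its path is optimal; a probable approximate
    underestimate keeps this property with high probability.
    """
    frontier = [(underestimate(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        for nxt, step_cost in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + underestimate(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return float("inf"), []

# Toy 4-state lattice standing in for a sequence-labeling trellis.
edges = {"s": [("a", 2.0), ("b", 1.0)], "a": [("g", 1.0)], "b": [("g", 3.0)], "g": []}
heuristic = {"s": 2.0, "a": 1.0, "b": 2.0, "g": 0.0}  # never overestimates here
cost, path = a_star_decode("s", "g", lambda s: edges[s], lambda s: heuristic[s])
print(cost, path)  # 3.0 ['s', 'a', 'g']
```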

Patent
10 Apr 2007
TL;DR: In this paper, a technique for increasing efficiency of inference of structure variables (e.g., an inference problem) using a priority-driven algorithm rather than conventional dynamic programming was proposed.
Abstract: A technique for increasing efficiency of inference of structure variables (e.g., an inference problem) using a priority-driven algorithm rather than conventional dynamic programming. The technique employs a probable approximate underestimate which can be used to compute a probable approximate solution to the inference problem when used as a priority function (“a probable approximate underestimate function”) for a more computationally complex classification function. The probable approximate underestimate function can have a functional form of a simpler, easier to decode model. The model can be learned from unlabeled data by solving a linear/quadratic optimization problem. The priority function can be computed quickly, and can result in solutions that are substantially optimal. Using the priority function, computation efficiency of a classification function (e.g., discriminative classifier) can be increased using a generalization of the A* algorithm.


Patent
13 Feb 2007
TL;DR: In this paper, the authors propose an approach to detect personnes or locuteurs in a mode automatique using a group of caracteristiques comprenant plus d'un type d'entree (comme une entree audio and a entree video).
Abstract: L'invention concerne des systemes et des procedes de detection de personnes ou de locuteurs en mode automatique. Un groupe de caracteristiques comprenant plus d'un type d'entree (comme une entree audio et une entree video) peut etre identifie et utilise avec un algorithme d'apprentissage pour generer un classificateur qui identifie des personnes ou des locuteurs. Le classificateur resultant peut etre evalue pour detecter des personnes ou des locuteurs.