
Showing papers by "Paul A. Viola published in 2007"


Proceedings Article
Cha Zhang, Paul A. Viola
03 Dec 2007
TL;DR: The multiple instance pruning (MIP) algorithm for soft cascades is proposed; it computes a set of thresholds that aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset.
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages.
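A minimal sketch of how a soft cascade with intermediate rejection thresholds can be evaluated at detection time; the weak-classifier scores and thresholds below are illustrative placeholders, not values learned by MIP:

```python
import numpy as np

def evaluate_soft_cascade(window_features, weak_score, rejection_thresholds):
    """Evaluate a soft cascade on one detection window.

    weak_score: callable returning the t-th weak classifier's score
    rejection_thresholds: per-stage thresholds; computation stops as soon
    as the running score falls below the current threshold.
    """
    running_sum = 0.0
    for t, theta_t in enumerate(rejection_thresholds):
        running_sum += weak_score(window_features, t)
        if running_sum < theta_t:
            return False, t  # pruned early
    return True, len(rejection_thresholds)  # survived every stage

# Toy example with random stump-like scores (placeholders).
rng = np.random.default_rng(0)
scores = rng.normal(size=100)

def weak(feats, t):
    # Placeholder weak classifier: ignores the features in this toy example.
    return float(scores[t])

thresholds = np.cumsum(scores) - 2.0  # loose illustrative thresholds
accepted, stages_used = evaluate_soft_cascade(None, weak, thresholds)
print(accepted, stages_used)
```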

183 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel and effective technique is proposed to ensure that the rank one tensor projections are orthogonal to one another, which provides a strong inductive bias and results in better generalization on small training sets.
Abstract: We propose a method for face recognition based on a discriminative linear projection. In this formulation images are treated as tensors, rather than the more conventional vector of pixels. Projections are pursued sequentially and take the form of a rank one tensor, i.e., a tensor which is the outer product of a set of vectors. A novel and effective technique is proposed to ensure that the rank one tensor projections are orthogonal to one another. These constraints on the tensor projections provide a strong inductive bias and result in better generalization on small training sets. Our work is related to spectrum methods, which achieve orthogonal rank one projections by pursuing consecutive projections in the complement space of previous projections. Although this may be meaningful for applications such as reconstruction, it is less meaningful for pursuing discriminant projections. Our new scheme iteratively solves an eigenvalue problem with orthogonality constraints on one dimension, and solves unconstrained eigenvalue problems on the other dimensions. Experiments demonstrate that on small and medium sized face recognition datasets, this approach outperforms previous embedding methods. On large face datasets this approach achieves results comparable with the best, often using fewer discriminant projections.
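The basic projection operation described above, treating an image as a 2-D tensor and projecting it onto a rank one tensor (the outer product of two vectors), reduces to a bilinear form. A minimal NumPy sketch, with random vectors standing in for the learned discriminative projections:

```python
import numpy as np

def rank_one_project(image, u, v):
    """Project a 2-D image tensor onto the rank one tensor u (outer) v.

    The projection value is u^T X v, i.e. the inner product of the image
    with the outer product of the two mode vectors.
    """
    return float(u @ image @ v)

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 32))  # toy "face" image
u = rng.normal(size=32)        # mode-1 projection vector (placeholder)
v = rng.normal(size=32)        # mode-2 projection vector (placeholder)

# Equivalent formulations: bilinear form vs. inner product with the outer product.
p1 = rank_one_project(X, u, v)
p2 = float(np.sum(X * np.outer(u, v)))
assert np.isclose(p1, p2)
print(p1)
```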

57 citations


Patent
Cha Zhang, Paul A. Viola
13 Jul 2007
TL;DR: In this paper, a combination classifier and intermediate rejection thresholds are learned using a pruning process that ensures objects detected by the original combination classifier are also detected by the pruned classifier, thereby guaranteeing the same detection rate on the training set after pruning.
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system. In additional embodiments, combination classifiers are trained using various combinations of weight trimming, bootstrapping, and a weak classifier termed a “fat stump” classifier.
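A hedged sketch of the multiple instance pruning idea: because each positive training image only needs one of its retained detection windows to survive, each intermediate rejection threshold can be set from the best surviving partial score per image. The data and margin below are illustrative, not the patent's exact procedure:

```python
import numpy as np

def mip_thresholds(partial_scores, retained, image_ids, margin=1e-6):
    """Sketch of multiple instance pruning (MIP) for intermediate thresholds.

    partial_scores: (n_windows, n_stages) cumulative score after each stage
    retained:       boolean mask of windows accepted by the full classifier
    image_ids:      training image each window came from

    Each positive image only needs one of its retained windows to survive,
    so each stage's threshold is the min over images of the best partial
    score among that image's still-surviving retained windows.
    """
    n_windows, n_stages = partial_scores.shape
    alive = retained.copy()
    thresholds = np.empty(n_stages)
    for t in range(n_stages):
        best_per_image = [
            partial_scores[alive & (image_ids == img), t].max()
            for img in np.unique(image_ids[alive])
        ]
        thresholds[t] = min(best_per_image) - margin
        alive &= partial_scores[:, t] >= thresholds[t]
    return thresholds

# Toy data: 6 windows, 3 stages, 2 training images.
rng = np.random.default_rng(2)
scores = np.cumsum(rng.normal(size=(6, 3)), axis=1)
keep = np.array([True, False, True, True, True, False])
imgs = np.array([0, 0, 0, 1, 1, 1])
print(mip_thresholds(scores, keep, imgs))
```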

37 citations


Patent
Cha Zhang, Paul A. Viola, Yin Pei, Ross Cutler, Xinding Sun, Yong Rui
13 Feb 2007
TL;DR: In this paper, a pool of features including more than one type of input (like audio input and video input) was identified and used with a learning algorithm to generate a classifier that identifies people or speakers.
Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
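A minimal sketch of the general approach, with synthetic audio and video features and a scikit-learn AdaBoost learner as stand-ins for the pooled features and learning algorithm described above:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(3)
n = 200

# Stand-in features: e.g. per-frame audio cues and per-frame video cues.
# Purely synthetic values for illustration.
audio_feats = rng.normal(size=(n, 8))
video_feats = rng.normal(size=(n, 16))
labels = (audio_feats[:, 0] + video_feats[:, 0] > 0).astype(int)  # toy labels

# Pool both input types into a single feature vector per example and let a
# boosting-style learner select informative features across modalities.
pooled = np.hstack([audio_feats, video_feats])
clf = AdaBoostClassifier(n_estimators=50).fit(pooled, labels)
print("training accuracy:", clf.score(pooled, labels))
```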

32 citations


Patent
25 Jun 2007
TL;DR: In this article, distributed computing devices comprising a system for sharing computing resources can provide shared computing resources to users having sufficient resource credits, where a user can earn resource credits by reliably offering a computing resource for sharing for a predetermined amount of time.
Abstract: Distributed computing devices comprising a system for sharing computing resources can provide shared computing resources to users having sufficient resource credits. A user can earn resource credits by reliably offering a computing resource for sharing for a predetermined amount of time. The conversion rate between the amount of credits awarded, and the computing resources provided by a user can be varied to maintain balance within the system, and to foster beneficial user behavior. Once earned, the credits can be used to fund the user's account, joint accounts which include the user and others, or others' accounts that do not provide any access to the user. Computing resources can be exchanged on a peer-to-peer basis, though a centralized mechanism can link relevant peers together. To verify integrity, and protect against maliciousness, offered resources can be periodically tested.
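A toy sketch of the credit bookkeeping such a system might use; the conversion-rate adjustment rule shown is an assumption for illustration, not the patented mechanism:

```python
from dataclasses import dataclass, field

@dataclass
class CreditLedger:
    """Toy ledger for a resource-sharing credit system (illustrative only)."""
    balances: dict = field(default_factory=dict)
    rate: float = 1.0  # credits awarded per resource-hour reliably shared

    def award(self, user: str, resource_hours: float) -> None:
        # Credits earned by reliably offering a resource for a period of time.
        self.balances[user] = self.balances.get(user, 0.0) + resource_hours * self.rate

    def spend(self, user: str, credits: float) -> bool:
        # Shared resources are only provided to users with sufficient credits.
        if self.balances.get(user, 0.0) < credits:
            return False
        self.balances[user] -= credits
        return True

    def adjust_rate(self, supply: float, demand: float) -> None:
        # Vary the conversion rate to keep the system in balance: reward
        # sharing more when demand outstrips supply (assumed rule).
        self.rate = max(0.1, demand / max(supply, 1e-9))

ledger = CreditLedger()
ledger.award("alice", resource_hours=10)  # alice shared a resource for 10 hours
print(ledger.spend("alice", 4), ledger.balances)
```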

25 citations


Patent
06 Feb 2007
TL;DR: In this article, a landmark detection technique is proposed that can quickly detect both objects of interest and landmarks within those objects in an image using regression methods; it reuses feature values already computed for object detection to find the landmarks in an object (e.g., the eyes and mouth of a face).
Abstract: A landmark detection technique that can quickly detect both objects of interest and landmarks within the objects in an image using regression methods. The present fast landmark detection scheme reuses existing feature values used for object detection (e.g., face detection) to find the landmarks in an object (e.g., the eyes and mouth of the face). Hence, the technique provides landmark detection functionality at almost no cost.
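A rough sketch of reusing detector feature values as inputs to landmark regressors; the synthetic features and the scikit-learn regressors below are stand-ins for the detector's actual feature responses and the technique's regression method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n_windows, n_features = 500, 40

# Stand-ins for feature values already computed by the face detector for
# each detected window (e.g. rectangle-filter responses).
detector_features = rng.normal(size=(n_windows, n_features))

# Synthetic landmark targets: offsets of, say, the left eye within the window.
true_offsets = (detector_features[:, :2] @ rng.normal(size=(2, 2))
                + 0.05 * rng.normal(size=(n_windows, 2)))

# One regressor per landmark coordinate, trained on the reused features, so
# landmark estimation adds almost no feature-computation cost at runtime.
regs = [GradientBoostingRegressor().fit(detector_features, true_offsets[:, d])
        for d in range(2)]
pred = np.column_stack([r.predict(detector_features) for r in regs])
print("mean abs. error:", np.abs(pred - true_offsets).mean())
```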

18 citations


Patent
Gang Hua, Steven M. Drucker, Michael Revow, Paul A. Viola, Richard Zemel
15 Mar 2007
TL;DR: In this paper, a comparison component computes similarity confidence data between items of extracted visual information (e.g., faces, scenes, etc.) and then generates a visual distribution based upon that similarity confidence data.
Abstract: A system for organizing images includes an extraction component that extracts visual information (e.g., faces, scenes, etc.) from the images. The extracted visual information is provided to a comparison component which computes similarity confidence data between the extracted visual information. The similarity confidence data is an indication of the likelihood that items of extracted visual information are similar. The comparison component then generates a visual distribution of the extracted visual information based upon the similarity confidence data. The visual distribution can include groupings of the extracted visual information based on computed similarity confidence data. For example, the visual distribution can be a two-dimensional layout of faces organized based on the computed similarity confidence data, with faces computed to have a greater probability of representing the same person placed in closer proximity. The visual distribution can then be utilized by a user to sort, organize and/or tag images.
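One plausible way to produce the described two-dimensional layout is to convert pairwise similarity confidences into dissimilarities and embed them with multidimensional scaling; the similarity values and the use of scikit-learn MDS below are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(5)
n_faces = 12

# Stand-in similarity confidences in [0, 1] between extracted face crops
# (1 = almost certainly the same person). Symmetric with a unit diagonal.
sim = rng.uniform(0.1, 0.9, size=(n_faces, n_faces))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 1.0)

# Faces with higher similarity confidence should land closer together, so
# embed the complementary dissimilarity into 2-D with metric MDS.
dissimilarity = 1.0 - sim
layout = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)
print(layout.shape)  # (12, 2) coordinates for the visual distribution
```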

16 citations


Patent
15 Jun 2007
TL;DR: In this paper, discriminatively trained orthogonal rank one tensor projections are used for face recognition, minimizing intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people.
Abstract: Systems and methods are described for face recognition using discriminatively trained orthogonal rank one tensor projections. In an exemplary system, images are treated as tensors, rather than as conventional vectors of pixels. During runtime, the system designs visual features—embodied as tensor projections—that minimize intraclass differences between instances of the same face while maximizing interclass differences between the face and faces of different people. Tensor projections are pursued sequentially over a training set of images and take the form of a rank one tensor, i.e., the outer product of a set of vectors. An exemplary technique ensures that the tensor projections are orthogonal to one another, thereby increasing ability to generalize and discriminate image features over conventional techniques. Orthogonality among tensor projections is maintained by iteratively solving an ortho-constrained eigenvalue problem in one dimension of a tensor while solving unconstrained eigenvalue problems in additional dimensions of the tensor.

14 citations


Patent
Cha Zhang, Paul A. Viola
13 Jul 2007
TL;DR: In this paper, a combination classifier and intermediate rejection thresholds are learned using a pruning process that ensures objects detected by the original combination classifier are also detected by the pruned classifier, thereby guaranteeing the same detection rate on the training set after pruning.
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the pruned combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system. In additional embodiments, combination classifiers are trained using various combinations of weight trimming, bootstrapping, and a weak classifier termed a “fat stump” classifier.

13 citations


Proceedings ArticleDOI
Ming Ye, Paul A. Viola, Sashi Raghupathy, Herry Sutanto, Chengyang Li
23 Sep 2007
TL;DR: This paper proposes a machine learning approach to grouping problems in ink parsing, where hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features.
Abstract: This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features. This framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users. It holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
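A schematic sketch of the high-confidence-first loop described above: hypotheses sit in a priority queue ordered by classifier confidence, the most confident one is applied, and new hypotheses from the perturbed local configuration are pushed back. The confidence and perturbation functions below are placeholders for the AdaBoost decision-tree classifier and the ink-specific hypothesis generators:

```python
import heapq
import random

def confidence(hypothesis):
    """Placeholder for the data-driven AdaBoost decision-tree score."""
    random.seed(hash(hypothesis) & 0xFFFF)
    return random.random()

def perturb(segmentation, hypothesis):
    """Placeholder: apply a grouping hypothesis (e.g. merge two strokes into
    one text line) and return the new segmentation plus local follow-ups."""
    new_seg = segmentation | {hypothesis}
    followups = [f"{hypothesis}+alt{i}" for i in range(2)]
    return new_seg, followups

def high_confidence_first(initial_segmentation, initial_hypotheses, max_steps=20):
    seg = set(initial_segmentation)
    # Max-heap via negated confidence: most confident hypothesis first.
    heap = [(-confidence(h), h) for h in initial_hypotheses]
    heapq.heapify(heap)
    seen = set(initial_hypotheses)
    for _ in range(max_steps):
        if not heap:
            break
        neg_conf, hyp = heapq.heappop(heap)
        if -neg_conf < 0.5:          # stop once remaining hypotheses are weak
            break
        seg, followups = perturb(seg, hyp)
        for f in followups:          # re-generate hypotheses around the change
            if f not in seen:
                seen.add(f)
                heapq.heappush(heap, (-confidence(f), f))
    return seg

print(high_confidence_first({"line0"}, ["merge(s1,s2)", "split(line0)"]))
```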

11 citations


Patent
19 Apr 2007
TL;DR: In this article, a user may input strokes as digital ink to a processing device, which partitions the input strokes into multiple regions of strokes, scores grammar objects in those regions with two recognizers, and converts the scores to a converted score which may have at least a near standard normal distribution.
Abstract: In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
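A minimal sketch of the score-conversion step: each recognizer's raw scores are standardized so that scores from different recognizers are comparable on a common scale. Plain z-scoring is used here as an illustrative stand-in for the patent's conversion:

```python
import numpy as np

def to_converted_score(raw_scores):
    """Standardize a recognizer's raw scores so they are roughly standard
    normal, making scores from different recognizers directly comparable.
    (Plain z-scoring used as an illustrative stand-in.)"""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return (raw_scores - raw_scores.mean()) / raw_scores.std()

rng = np.random.default_rng(6)
ink_recognizer_scores = rng.gamma(2.0, 3.0, size=1000)      # arbitrary scale
shape_recognizer_scores = rng.normal(50.0, 5.0, size=1000)  # different scale

a = to_converted_score(ink_recognizer_scores)
b = to_converted_score(shape_recognizer_scores)
# Both converted score sets now have mean ~0 and std ~1, so chart entries
# scored by the two recognizers can be ranked and combined on a common scale.
print(a.mean().round(3), a.std().round(3), b.mean().round(3), b.std().round(3))
```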

Proceedings Article
11 Mar 2007
TL;DR: This technique resolves one of the biggest obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function, and results in an algorithm that is more than 3 times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
Abstract: We present a technique for speeding up inference of structured variables using a priority-driven search algorithm rather than the more conventional dynamic programming. A priority-driven search algorithm is guaranteed to return the optimal answer if the priority function is an underestimate of the true cost function. We introduce the notion of a probable approximate underestimate, and show that it can be used to compute a probable approximate solution to the inference problem when used as a priority function. We show that we can learn probable approximate underestimate functions which have the functional form of simpler, easy to decode models. These models can be learned from unlabeled data by solving a linear/quadratic optimization problem. As a result, we get a priority function that can be computed quickly, and results in solutions that are (provably) almost optimal most of the time. Using these ideas, discriminative classifiers such as semi-Markov CRFs and discriminative parsers can be sped up using a generalization of the A* algorithm. Further, this technique resolves one of the biggest obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function. Applying this technique results in an algorithm that is more than 3 times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
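A generic sketch of the priority-driven decoding described above: states are expanded best-first with priority equal to the cost so far plus a heuristic, and if the heuristic never overestimates the remaining cost, the first goal popped is optimal (a probable approximate underestimate trades that guarantee for speed with high probability). The lattice and costs below are toy stand-ins for a semi-Markov decoding trellis:

```python
import heapq

def a_star_decode(start, goal, successors, underestimate):
    """Best-first decoding with priority = cost so far + heuristic.

    If `underestimate` never exceeds the true remaining cost, the first time
    the goal is popped its path is optimal; a probable approximate
    underestimate keeps this property with high probability.
    """
    frontier = [(underestimate(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        for nxt, step_cost in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + underestimate(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return float("inf"), []

# Toy 4-state lattice standing in for a sequence-labeling trellis.
edges = {"s": [("a", 2.0), ("b", 1.0)], "a": [("g", 1.0)], "b": [("g", 3.0)], "g": []}
heuristic = {"s": 2.0, "a": 1.0, "b": 2.0, "g": 0.0}  # never overestimates here
cost, path = a_star_decode("s", "g", lambda s: edges[s], lambda s: heuristic[s])
print(cost, path)  # 3.0 ['s', 'a', 'g']
```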

Patent
10 Apr 2007
TL;DR: In this paper, a technique for increasing efficiency of inference of structure variables (e.g., an inference problem) using a priority-driven algorithm rather than conventional dynamic programming was proposed.
Abstract: A technique for increasing efficiency of inference of structure variables (e.g., an inference problem) using a priority-driven algorithm rather than conventional dynamic programming. The technique employs a probable approximate underestimate which can be used to compute a probable approximate solution to the inference problem when used as a priority function (“a probable approximate underestimate function”) for a more computationally complex classification function. The probable approximate underestimate function can have a functional form of a simpler, easier to decode model. The model can be learned from unlabeled data by solving a linear/quadratic optimization problem. The priority function can be computed quickly, and can result in solutions that are substantially optimal. Using the priority function, computation efficiency of a classification function (e.g., discriminative classifier) can be increased using a generalization of the A* algorithm.


Patent
13 Feb 2007
TL;DR: In this paper, the authors propose an approach to detect personnes or locuteurs in a mode automatique using a group of caracteristiques comprenant plus d'un type d'entree (comme une entree audio and a entree video).
Abstract: L'invention concerne des systemes et des procedes de detection de personnes ou de locuteurs en mode automatique. Un groupe de caracteristiques comprenant plus d'un type d'entree (comme une entree audio et une entree video) peut etre identifie et utilise avec un algorithme d'apprentissage pour generer un classificateur qui identifie des personnes ou des locuteurs. Le classificateur resultant peut etre evalue pour detecter des personnes ou des locuteurs.