
Showing papers by "Paul A. Viola published in 2004"


Journal ArticleDOI
TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described. Implemented on a conventional desktop, detection proceeds at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

13,037 citations
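The "Integral Image" representation described in the abstract can be sketched in a few lines: each entry stores the sum of all pixels above and to the left, so any rectangular feature sum costs four array lookups. This is a minimal NumPy illustration with hypothetical helper names, not the paper's implementation:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x] (all pixels above and left)."""
    # Pad with a leading zero row/column so box sums need no bounds checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in four lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```

Once the integral image is built in one pass, every rectangle sum is constant-time, which is what makes the Haar-like features cheap enough for a real-time cascade.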


Journal ArticleDOI
TL;DR: This work proposes a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure and shows results on a wide variety of image queries.
Abstract: We present an approach for image retrieval using a very large number of highly selective features and efficient learning of queries. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes” and that images which are visually similar share causes. We propose a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure (in our implementation there are over 46,000 highly selective features). At query time a user selects a few example images, and the AdaBoost algorithm is used to learn a classification function which depends on a small number of the most appropriate features. This yields a highly efficient classification function. In addition we show that the AdaBoost framework provides a natural mechanism for the incorporation of relevance feedback. Finally we show results on a wide variety of image queries.

419 citations
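The feature-selection step described above can be sketched as a toy round of discrete AdaBoost over binary feature responses: each round picks the feature with the lowest weighted error and down-weights the examples it already classifies correctly. The function name and reweighting constants here are illustrative assumptions, not the paper's 46,000-feature system:

```python
import numpy as np

def adaboost_select(features, labels, rounds=3):
    """features: (n_samples, n_features) binary responses; labels in {0, 1}.
    Returns a list of (feature_index, alpha) pairs, one per boosting round."""
    n, m = features.shape
    w = np.full(n, 1.0 / n)
    chosen = []
    for _ in range(rounds):
        w /= w.sum()
        # Weighted error of predicting the label directly from each feature.
        errs = np.array([np.sum(w * (features[:, j] != labels)) for j in range(m)])
        j = int(np.argmin(errs))
        eps = errs[j]
        beta = eps / (1.0 - eps + 1e-12)
        alpha = np.log(1.0 / (beta + 1e-12))
        # Down-weight examples the chosen feature already gets right.
        correct = features[:, j] == labels
        w = w * np.where(correct, beta, 1.0)
        chosen.append((j, alpha))
    return chosen
```

Only the few selected features need to be evaluated at query time, which is what makes the learned classification function efficient.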


Patent
24 Mar 2004
TL;DR: In this paper, a computer-implemented method and apparatus are provided for populating an electronic form from an electronic image: they identify the size, orientation, and position of an object within the image and extract information elements from the pixels that correspond to the object.
Abstract: A computer-implemented method and apparatus are provided for populating an electronic form from an electronic image. The method and apparatus identify a size, orientation and position of an object within the electronic image, and identify information elements from pixels within the image that correspond to the object. Fields of the electronic form are displayed to a user along with the identified information elements through a graphical user interface. The information elements are parsed into tagged groups of different information types. At least some of the fields of the electronic form are populated with the tagged groups to produce a populated form. The user is allowed to edit the populated fields through the graphical user interface.

204 citations


Proceedings Article
25 Jul 2004
TL;DR: This work applies a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted.
Abstract: Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have achieved low error rates but invariably make a number of errors. The goal of an interactive information extraction system is to assist the user in filling in database fields while giving the user confidence in the integrity of the data. The user is presented with an interactive interface that allows both the rapid verification of automatic field assignments and the correction of errors. In cases where there are multiple errors, our system takes into account user corrections, and immediately propagates these constraints such that other fields are often corrected automatically. Linear-chain conditional random fields (CRFs) have been shown to perform well for information extraction and other language modelling tasks due to their ability to capture arbitrary, overlapping features of the input in a Markov model. We apply this framework with two extensions: a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted. Both of these mechanisms are incorporated in a novel user interface for form filling that is intuitive and speeds the entry of data, providing a 23% reduction in error due to automated corrections.

176 citations
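The constrained Viterbi decoding described above can be sketched by forcing the scores of all labels other than a user-pinned one to negative infinity before running the standard recurrence. This is a minimal NumPy sketch over assumed log-score matrices, not the paper's CRF implementation:

```python
import numpy as np

def constrained_viterbi(log_emit, log_trans, pinned):
    """log_emit: (T, S) per-position label scores; log_trans: (S, S);
    pinned: {position: label} fixed by the user. Returns the best label
    sequence consistent with the pins."""
    T, S = log_emit.shape
    emit = log_emit.copy()
    for t, s in pinned.items():
        mask = np.full(S, -np.inf)
        mask[s] = emit[t, s]          # only the pinned label stays feasible
        emit[t] = mask
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + emit[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Note that pinning a single position can flip the labels of neighbouring positions through the transition scores, which is exactly the correction-propagation effect the abstract describes.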


Patent
20 May 2004
TL;DR: This paper proposes a global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process.
Abstract: A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.

123 citations


Proceedings ArticleDOI
08 Aug 2004
TL;DR: This research presents a low-cost camera-based system that allows an untrained user to “drive” the dance of the couple interactively, and yields results for more complex behaviors and longer sequences than have been demonstrated in previous silhouette-based systems.
Abstract: We present a vision-based performance interface for controlling animated human characters. The system interactively combines information about the user's motion contained in silhouettes from three viewpoints with domain knowledge contained in a motion capture database to produce an animation of high quality. Such an interactive system might be useful for authoring, for teleconferencing, or as a control interface for a character in a game. In our implementation, the user performs in front of three video cameras; the resulting silhouettes are used to estimate his orientation and body configuration based on a set of discriminative local features. Those features are selected by a machine-learning algorithm during a preprocessing step. Sequences of motions that approximate the user's actions are extracted from the motion database and scaled in time to match the speed of the user's motion. We use swing dancing, a complex human motion, to demonstrate the effectiveness of our approach. We compare our results to those obtained with a set of global features, Hu moments, and ground truth measurements from a motion capture system.

109 citations


Proceedings ArticleDOI
30 Aug 2004
TL;DR: A framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams that can achieve 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers is presented.
Abstract: We present a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. The approach is completely spatial; that is, it does not require any ordering on the strokes. It also does not place any constraint on the relative placement of the shapes or symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative classifier is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This classifier combines the rendered image of the strokes with stroke features such as curvature and endpoints. A small subset of very efficient features is selected, yielding an extremely fast classifier. An A-star search algorithm over connected subsets of the proximity graph is used to simultaneously find the optimal segmentation and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers.

105 citations
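The first step described above, linking strokes into a proximity graph, might look like the sketch below; the A-star search over connected subgraphs is not shown. The point-based linking rule and the `radius` parameter are assumptions for illustration, not the paper's actual criterion:

```python
import itertools
import math

def proximity_graph(strokes, radius):
    """strokes: list of point lists [(x, y), ...]. Link two strokes
    whenever any pair of their points lies within `radius`."""
    edges = set()
    for i, j in itertools.combinations(range(len(strokes)), 2):
        if any(math.dist(p, q) <= radius
               for p in strokes[i] for q in strokes[j]):
            edges.add((i, j))
    return edges
```

Restricting the later search to connected subsets of this graph is what keeps the joint segmentation/recognition tractable: stroke combinations that are far apart on the page are never considered as candidate symbols.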


Patent
29 Apr 2004
TL;DR: In this article, features and/or properties of words are identified from a set of training documents to aid in extracting information from documents to be processed, and a classifier is developed to express these features and or properties.
Abstract: The present invention relates generally to automatically processing electronic documents. In one aspect, features and/or properties of words are identified from a set of training documents to aid in extracting information from documents to be processed. The features and/or properties relate to text of the words, position of the words and the relationship to other words. A classifier is developed to express these features and/or properties. During information extraction, documents are processed and analyzed based on the classifier and information is extracted based on correspondence of the documents and the features/properties expressed by the classifier.

101 citations


Proceedings ArticleDOI
26 Oct 2004
TL;DR: A framework for grouping and recognition of characters and symbols in online free-form ink expressions that can achieve 94% grouping/recognition accuracy on a test dataset containing symbols from 25 writers held out from the training process.
Abstract: We present a framework for grouping and recognition of characters and symbols in online free-form ink expressions. The approach is completely spatial; it does not require any ordering on the strokes. It also does not place any constraints on the layout of the symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative recognizer is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This recognizer operates on the rendered image of the strokes plus stroke features such as curvature and endpoints. A small subset of very efficient image features is selected, yielding an extremely fast recognizer. Dynamic programming over connected subsets of the proximity graph is used to simultaneously find the optimal grouping and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 94% grouping/recognition accuracy on a test dataset containing symbols from 25 writers held out from the training process.

42 citations


Proceedings ArticleDOI
Ming Ye1, Paul A. Viola1
26 Oct 2004
TL;DR: A system is presented that automatically recognizes lists and hierarchical outlines in handwritten notes and computes the correct structure; the inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.
Abstract: Handwritten notes are complex structures, which include blocks of text, drawings, and annotations. The main challenge for the newly emerging tablet computer is to provide high-level tools for editing and authoring handwritten documents using a natural interface. One frequent component of natural notes is lists and hierarchical outlines, which correspond directly to the bulleted lists and itemized structures in conventional text-editing tools. We present a system that automatically recognizes lists and hierarchical outlines in handwritten notes, and then computes the correct structure. This inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.

13 citations


Book ChapterDOI
08 Sep 2004
TL;DR: A system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias by combining the quality of the matches and the relevance of the words.
Abstract: We present a system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias. The system first performs optical character recognition to find words and in some cases parts of words (we have observed error rates as high as 10 to 20 percent). For all these “noisy” words, a set of features is computed which include internal text features, location features, and relationship features. These features are combined to estimate the relevance of the word in the context of the page and the recipient database. The parameters of the word relevance function are learned from training data using the AdaBoost learning algorithm. Words are then compared to the database of recipients to find likely matches. The recipients are finally ranked by combining the quality of the matches and the relevance of the words. Experiments are presented which demonstrate the effectiveness of this system on a large set of real data.

Patent
09 Sep 2004
TL;DR: In this article, an image is first partitioned into varisized patches by use of either an integral image or a Gaussian pyramid, and features in each patch are evaluated to determine a cumulative score.
Abstract: PROBLEM TO BE SOLVED: To provide a method for detecting an object, such as a human face, in an image. SOLUTION: An image is first partitioned into variously sized patches by use of either an integral image or a Gaussian pyramid. Features in each patch are evaluated to determine a cumulative score. The evaluation repeats as long as the cumulative score lies between a rejection threshold and an acceptance threshold; the image is rejected when the cumulative score falls below the rejection threshold, and accepted as including the object when the cumulative score exceeds the acceptance threshold.
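The two-threshold evaluation loop described in the abstract can be sketched as follows. The function name, feature interface, and the choice of rejecting a patch that exhausts all features without a decision are assumptions for illustration, not details from the patent:

```python
def evaluate_patch(feature_fns, patch, accept_thresh, reject_thresh):
    """Accumulate per-feature scores until the running total leaves the
    [reject_thresh, accept_thresh] band. Remaining undecided after all
    features counts as a rejection (a conservative assumed default)."""
    score = 0.0
    for f in feature_fns:
        score += f(patch)
        if score > accept_thresh:
            return True    # accepted as including the object
        if score < reject_thresh:
            return False   # rejected early, skipping remaining features
    return False
```

The early exits are the point: most patches fall below the rejection threshold after only a few features, so the full feature set is evaluated only on promising regions.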

Patent
23 Dec 2004
TL;DR: In this article, an orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation.
Abstract: A method detects a specific object in an image. An orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object with the selected orientation- and object-specific classifier.

Patent
Paul A. Viola1, Michael Jones1
31 May 2004
TL;DR: In this paper, a linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window, which are summed to determine a cumulative score.
Abstract: A method detects a moving object in a temporal sequence of images. Images are selected from the temporally ordered sequence of images. A set of functions is applied to the selected images to generate a set of combined images. A linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window. The motion and appearance features are summed to determine a cumulative score, which enables a classification of the detection window as including the moving object.
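One simple instance of "a set of functions applied to the selected images" is an absolute temporal difference for the motion channel alongside the raw frame for the appearance channel, with box filters scored over both. The filter encoding below is an assumption for illustration, not the patent's filter set:

```python
import numpy as np

def detect_window_score(window_a, window_b, filters):
    """window_a, window_b: the same detection window from two consecutive
    frames. Each filter is (channel, row_slice, col_slice, weight); the
    cumulative score is the weighted sum of box means over both channels."""
    channels = {
        "appearance": window_a.astype(float),
        "motion": np.abs(window_a.astype(float) - window_b.astype(float)),
    }
    score = 0.0
    for channel, rows, cols, weight in filters:
        score += weight * channels[channel][rows, cols].mean()
    return score
```

Thresholding this cumulative score then yields the classification of the window as containing the moving object or not.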

Patent
30 Apr 2004
TL;DR: In this article, a probe image is paired with each gallery image to generate pairs of images, and first filters are applied to the probe image of each pair to obtain a first feature value for each application of each filter.
Abstract: PROBLEM TO BE SOLVED: To provide a method for recognizing an object in an image. SOLUTION: Gallery images 121 include identified objects, and a probe image includes an unidentified object. The probe image is paired with each gallery image to generate pairs of images. First filters are applied to the probe image of each pair to obtain a first feature value for each application of each filter. Second filters are similarly applied to the gallery image of each pair to obtain a second feature value. The feature values are summed for each application of each filter; for each application, the score is set to an acceptance weight if the sum is greater than a predetermined threshold, and to a rejection weight otherwise. The scores are summed over all of the applications, and the probe image is identified with a gallery image when the score is greater than zero.

Patent
11 Jun 2004
TL;DR: In this article, an orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation.
Abstract: A method detects a specific object in an image. An orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object with the selected orientation- and object-specific classifier.

Patent
11 Jun 2004
TL;DR: In this paper, a method for detecting an object in an image is described: the orientation of an arbitrary object with respect to an image plane is determined, and one of several orientation- and object-specific classifiers is selected according to that orientation.
Abstract: The invention concerns a method for detecting an object in an image. The orientation of an arbitrary object with respect to an image plane is determined, and one classifier among several object- and orientation-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object using the selected object- and orientation-specific classifier.