
Showing papers by "Paul A. Viola published in 2004"


Journal ArticleDOI
TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described. Implemented on a conventional desktop, detection proceeds at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

13,037 citations
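The "Integral Image" representation described in the abstract can be sketched in a few lines: each entry stores the sum of all pixels above and to the left, so any rectangular feature sum costs four array lookups. This is a minimal NumPy illustration with hypothetical helper names, not the paper's implementation:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x] (all pixels above and left)."""
    # Pad with a leading zero row/column so box sums need no bounds checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in four lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]
```

Once the integral image is built in one pass, every rectangle sum is constant-time, which is what makes the Haar-like features cheap enough for a real-time cascade.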


Journal ArticleDOI
TL;DR: This work proposes a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure and shows results on a wide variety of image queries.
Abstract: We present an approach for image retrieval using a very large number of highly selective features and efficient learning of queries. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes” and that images which are visually similar share causes. We propose a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure (in our implementation there are over 46,000 highly selective features). At query time a user selects a few example images, and the AdaBoost algorithm is used to learn a classification function which depends on a small number of the most appropriate features. This yields a highly efficient classification function. In addition we show that the AdaBoost framework provides a natural mechanism for the incorporation of relevance feedback. Finally we show results on a wide variety of image queries.

419 citations
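The feature-selection step described above can be sketched as a toy round of discrete AdaBoost over binary feature responses: each round picks the feature with the lowest weighted error and down-weights the examples it already classifies correctly. The function name and reweighting constants here are illustrative assumptions, not the paper's 46,000-feature system:

```python
import numpy as np

def adaboost_select(features, labels, rounds=3):
    """features: (n_samples, n_features) binary responses; labels in {0, 1}.
    Returns a list of (feature_index, alpha) pairs, one per boosting round."""
    n, m = features.shape
    w = np.full(n, 1.0 / n)
    chosen = []
    for _ in range(rounds):
        w /= w.sum()
        # Weighted error of predicting the label directly from each feature.
        errs = np.array([np.sum(w * (features[:, j] != labels)) for j in range(m)])
        j = int(np.argmin(errs))
        eps = errs[j]
        beta = eps / (1.0 - eps + 1e-12)
        alpha = np.log(1.0 / (beta + 1e-12))
        # Down-weight examples the chosen feature already gets right.
        correct = features[:, j] == labels
        w = w * np.where(correct, beta, 1.0)
        chosen.append((j, alpha))
    return chosen
```

Only the few selected features need to be evaluated at query time, which is what makes the learned classification function efficient.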


Patent
24 Mar 2004
TL;DR: In this paper, a computer-implemented method and apparatus are provided for populating an electronic form from an electronic image: they identify the size, orientation, and position of an object within the image and extract information elements from the pixels that correspond to the object.
Abstract: A computer-implemented method and apparatus are provided for populating an electronic form from an electronic image. The method and apparatus identify a size, orientation and position of an object within the electronic image, and identify information elements from pixels within the image that correspond to the object. Fields of the electronic form are displayed to a user along with the identified information elements through a graphical user interface. The information elements are parsed into tagged groups of different information types. At least some of the fields of the electronic form are populated with the tagged groups to produce a populated form. The user is allowed to edit the populated fields through the graphical user interface.

204 citations


Proceedings Article
25 Jul 2004
TL;DR: This work applies a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted.
Abstract: Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have achieved low error rates but invariably make a number of errors. The goal of an interactive information extraction system is to assist the user in filling in database fields while giving the user confidence in the integrity of the data. The user is presented with an interactive interface that allows both the rapid verification of automatic field assignments and the correction of errors. In cases where there are multiple errors, our system takes into account user corrections, and immediately propagates these constraints such that other fields are often corrected automatically. Linear-chain conditional random fields (CRFs) have been shown to perform well for information extraction and other language modelling tasks due to their ability to capture arbitrary, overlapping features of the input in a Markov model. We apply this framework with two extensions: a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted. Both of these mechanisms are incorporated in a novel user interface for form filling that is intuitive and speeds the entry of data, providing a 23% reduction in error due to automated corrections.

176 citations
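The constrained Viterbi decoding described above can be sketched by forcing the scores of all labels other than a user-pinned one to negative infinity before running the standard recurrence. This is a minimal NumPy sketch over assumed log-score matrices, not the paper's CRF implementation:

```python
import numpy as np

def constrained_viterbi(log_emit, log_trans, pinned):
    """log_emit: (T, S) per-position label scores; log_trans: (S, S);
    pinned: {position: label} fixed by the user. Returns the best label
    sequence consistent with the pins."""
    T, S = log_emit.shape
    emit = log_emit.copy()
    for t, s in pinned.items():
        mask = np.full(S, -np.inf)
        mask[s] = emit[t, s]          # only the pinned label stays feasible
        emit[t] = mask
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + emit[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Note that pinning a single position can flip the labels of neighbouring positions through the transition scores, which is exactly the correction-propagation effect the abstract describes.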


Patent
20 May 2004
TL;DR: This paper proposes a global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process.
Abstract: A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.

123 citations


Proceedings ArticleDOI
08 Aug 2004
TL;DR: This research presents a low-cost camera-based system that allows an untrained user to “drive” the dance of the couple interactively, and yields results for more complex behaviors and longer sequences than have been demonstrated in previous silhouette-based systems.
Abstract: We present a vision-based performance interface for controlling animated human characters. The system interactively combines information about the user's motion contained in silhouettes from three viewpoints with domain knowledge contained in a motion capture database to produce an animation of high quality. Such an interactive system might be useful for authoring, for teleconferencing, or as a control interface for a character in a game. In our implementation, the user performs in front of three video cameras; the resulting silhouettes are used to estimate his orientation and body configuration based on a set of discriminative local features. Those features are selected by a machine-learning algorithm during a preprocessing step. Sequences of motions that approximate the user's actions are extracted from the motion database and scaled in time to match the speed of the user's motion. We use swing dancing, a complex human motion, to demonstrate the effectiveness of our approach. We compare our results to those obtained with a set of global features, Hu moments, and ground truth measurements from a motion capture system.

109 citations


Proceedings ArticleDOI
30 Aug 2004
TL;DR: A framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams that can achieve 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers is presented.
Abstract: We present a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. The approach is completely spatial; that is, it does not require any ordering on the strokes. It also does not place any constraint on the relative placement of the shapes or symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative classifier is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This classifier combines the rendered image of the strokes with stroke features such as curvature and endpoints. A small subset of very efficient features is selected, yielding an extremely fast classifier. An A-star search algorithm over connected subsets of the proximity graph is used to simultaneously find the optimal segmentation and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers.

105 citations
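The first step described above, linking strokes into a proximity graph, might look like the sketch below; the A-star search over connected subgraphs is not shown. The point-based linking rule and the `radius` parameter are assumptions for illustration, not the paper's actual criterion:

```python
import itertools
import math

def proximity_graph(strokes, radius):
    """strokes: list of point lists [(x, y), ...]. Link two strokes
    whenever any pair of their points lies within `radius`."""
    edges = set()
    for i, j in itertools.combinations(range(len(strokes)), 2):
        if any(math.dist(p, q) <= radius
               for p in strokes[i] for q in strokes[j]):
            edges.add((i, j))
    return edges
```

Restricting the later search to connected subsets of this graph is what keeps the joint segmentation/recognition tractable: stroke combinations that are far apart on the page are never considered as candidate symbols.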


Patent
29 Apr 2004
TL;DR: In this article, features and/or properties of words are identified from a set of training documents to aid in extracting information from documents to be processed, and a classifier is developed to express these features and or properties.
Abstract: The present invention relates generally to automatically processing electronic documents. In one aspect, features and/or properties of words are identified from a set of training documents to aid in extracting information from documents to be processed. The features and/or properties relate to text of the words, position of the words and the relationship to other words. A classifier is developed to express these features and/or properties. During information extraction, documents are processed and analyzed based on the classifier and information is extracted based on correspondence of the documents and the features/properties expressed by the classifier.

101 citations


Proceedings ArticleDOI
26 Oct 2004
TL;DR: A framework for grouping and recognition of characters and symbols in online free-form ink expressions that can achieve 94% grouping/recognition accuracy on a test dataset containing symbols from 25 writers held out from the training process.
Abstract: We present a framework for grouping and recognition of characters and symbols in online free-form ink expressions. The approach is completely spatial; it does not require any ordering on the strokes. It also does not place any constraints on the layout of the symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative recognizer is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This recognizer operates on the rendered image of the strokes plus stroke features such as curvature and endpoints. A small subset of very efficient image features is selected, yielding an extremely fast recognizer. Dynamic programming over connected subsets of the proximity graph is used to simultaneously find the optimal grouping and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 94% grouping/recognition accuracy on a test dataset containing symbols from 25 writers held out from the training process.

42 citations


Proceedings ArticleDOI
Ming Ye1, Paul A. Viola1
26 Oct 2004
TL;DR: A system is presented that automatically recognizes lists and hierarchical outlines in handwritten notes and computes the correct structure; the inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.
Abstract: Handwritten notes are complex structures, which include blocks of text, drawings, and annotations. The main challenge for the newly emerging tablet computer is to provide high-level tools for editing and authoring handwritten documents using a natural interface. One frequent component of natural notes is lists and hierarchical outlines, which correspond directly to the bulleted lists and itemized structures in conventional text-editing tools. We present a system that automatically recognizes lists and hierarchical outlines in handwritten notes, and then computes the correct structure. This inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.

13 citations


Book ChapterDOI
08 Sep 2004
TL;DR: A system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias by combining the quality of the matches and the relevance of the words.
Abstract: We present a system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias. The system first performs optical character recognition to find words and in some cases parts of words (we have observed error rates as high as 10 to 20 percent). For all these “noisy” words, a set of features is computed which include internal text features, location features, and relationship features. These features are combined to estimate the relevance of the word in the context of the page and the recipient database. The parameters of the word relevance function are learned from training data using the AdaBoost learning algorithm. Words are then compared to the database of recipients to find likely matches. The recipients are finally ranked by combining the quality of the matches and the relevance of the words. Experiments are presented which demonstrate the effectiveness of this system on a large set of real data.

Patent
09 Sep 2004
TL;DR: In this article, an image is first partitioned into varisized patches by use of either an integral image or a Gaussian pyramid, and features in each patch are evaluated to determine a cumulative score.
Abstract: PROBLEM TO BE SOLVED: To provide a method for detecting an object, such as a human face, in an image. SOLUTION: An image is first partitioned into variously sized patches by use of either an integral image or a Gaussian pyramid. Features in each patch are evaluated to determine a cumulative score. The evaluation repeats as long as the cumulative score lies between a rejection threshold and an acceptance threshold; the image is rejected when the cumulative score falls below the rejection threshold, and accepted as including the object when the cumulative score exceeds the acceptance threshold.
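The two-threshold evaluation loop described in the abstract can be sketched as follows. The function name, feature interface, and the choice of rejecting a patch that exhausts all features without a decision are assumptions for illustration, not details from the patent:

```python
def evaluate_patch(feature_fns, patch, accept_thresh, reject_thresh):
    """Accumulate per-feature scores until the running total leaves the
    [reject_thresh, accept_thresh] band. Remaining undecided after all
    features counts as a rejection (a conservative assumed default)."""
    score = 0.0
    for f in feature_fns:
        score += f(patch)
        if score > accept_thresh:
            return True    # accepted as including the object
        if score < reject_thresh:
            return False   # rejected early, skipping remaining features
    return False
```

The early exits are the point: most patches fall below the rejection threshold after only a few features, so the full feature set is evaluated only on promising regions.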

Patent
23 Dec 2004
TL;DR: In this article, an orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation.
Abstract: A method detects a specific object in an image. An orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object with the selected orientation- and object-specific classifier.

Patent
Paul A. Viola1, Michael Jones1
31 May 2004
TL;DR: In this paper, a linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window, which are summed to determine a cumulative score.
Abstract: A method detects a moving object in a temporal sequence of images. Images are selected from the temporally ordered sequence of images. A set of functions is applied to the selected images to generate a set of combined images. A linear combination of filters is applied to a detection window in the set of combined images to determine motion and appearance features of the detection window. The motion and appearance features are summed to determine a cumulative score, which enables a classification of the detection window as including the moving object.
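One simple instance of "a set of functions applied to the selected images" is an absolute temporal difference for the motion channel alongside the raw frame for the appearance channel, with box filters scored over both. The filter encoding below is an assumption for illustration, not the patent's filter set:

```python
import numpy as np

def detect_window_score(window_a, window_b, filters):
    """window_a, window_b: the same detection window from two consecutive
    frames. Each filter is (channel, row_slice, col_slice, weight); the
    cumulative score is the weighted sum of box means over both channels."""
    channels = {
        "appearance": window_a.astype(float),
        "motion": np.abs(window_a.astype(float) - window_b.astype(float)),
    }
    score = 0.0
    for channel, rows, cols, weight in filters:
        score += weight * channels[channel][rows, cols].mean()
    return score
```

Thresholding this cumulative score then yields the classification of the window as containing the moving object or not.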

Patent
30 Apr 2004
TL;DR: In this article, a probe image is paired with each gallery image to generate pairs of images, and first filters are applied to the probe image of each pair to obtain a first feature value for each application of each filter.
Abstract: PROBLEM TO BE SOLVED: To provide a method for recognizing an object in an image. SOLUTION: Gallery images 121 include identified objects, and a probe image includes an unidentified object. The probe image is paired with each gallery image to generate pairs of images. First filters are applied to the probe image of each pair to obtain a first feature value for each application of each filter. Second filters are similarly applied to the gallery image of each pair to obtain a second feature value. The feature values are summed for each application of each filter; for each application, the score is set to an acceptance weight if the sum is greater than a predetermined threshold, and to a rejection weight otherwise. The scores are summed over all of the applications, and the probe image is identified with a gallery image when the score is greater than zero.

Patent
11 Jun 2004
TL;DR: In this article, an orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation.
Abstract: A method detects a specific object in an image. An orientation of an arbitrary object with respect to an image plane is determined, and one of a plurality of orientation- and object-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object with the selected orientation- and object-specific classifier.

Patent
11 Jun 2004
TL;DR: In this paper, a method for detecting an object in an image is described: the orientation of an arbitrary object with respect to an image plane is determined, and one of several orientation- and object-specific classifiers is selected according to that orientation.
Abstract: The invention concerns a method for detecting an object in an image. The orientation of an arbitrary object with respect to an image plane is determined, and one classifier among several object- and orientation-specific classifiers is selected according to the orientation. The arbitrary object is classified as a specific object using the selected object- and orientation-specific classifier.