scispace - formally typeset
Author

Paul A. Viola

Bio: Paul A. Viola is an academic researcher from Microsoft. The author has contributed to research in topics: Parsing & Boosting (machine learning). The author has an h-index of 52 and has co-authored 115 publications receiving 59,853 citations. Previous affiliations of Paul A. Viola include IBM & Wilmington University.


Papers
Proceedings ArticleDOI
Ming Ye1, Paul A. Viola1, Sashi Raghupathy1, Herry Sutanto1, Chengyang Li1 
23 Sep 2007
TL;DR: This paper proposes a machine learning approach to grouping problems in ink parsing, where hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features.
Abstract: This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features. This framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users. It holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
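The high-confidence-first processing the abstract describes can be sketched with a priority queue. In the sketch below, the `confidence` callable and the 0.5 acceptance threshold are illustrative assumptions standing in for the paper's trained AdaBoost decision-tree classifier and its actual decision rule.

```python
import heapq

def process_hypotheses(hypotheses, confidence):
    """Process grouping hypotheses in high-confidence-first order.

    `confidence` stands in for a trained classifier: any callable
    mapping a hypothesis to a score in [0, 1].
    """
    # heapq is a min-heap, so negate scores to pop the most
    # confident hypothesis first; the index breaks ties stably.
    heap = [(-confidence(h), i, h) for i, h in enumerate(hypotheses)]
    heapq.heapify(heap)
    accepted = []
    while heap:
        neg_score, _, h = heapq.heappop(heap)
        if -neg_score >= 0.5:  # assumed acceptance threshold
            accepted.append(h)
    return accepted
```

Processing in confidence order lets early, reliable grouping decisions constrain later, more ambiguous ones.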

11 citations

Patent
20 Jun 2011
TL;DR: In this article, the visual structure is subjected to grammatical analysis by associating multiple grammatical rules with multiple types of symbols identified in the visual structure of the document, which makes it possible to recognise components of the document (for instance, columns, names of authors, headings, references, etc.).

Abstract: FIELD: information technologies. SUBSTANCE: A 2D representation of a document is used to identify a visual structure, which helps to recognise the document. The visual structure is subjected to grammatical analysis by associating multiple grammatical rules with the multiple types of symbols identified in the document's visual structure. This makes it possible to recognise components of the document (for instance, columns, names of authors, headings, references, etc.), so that the document's structural components can be interpreted accurately. The grammatical analysis is based on a grammatical cost function produced by a machine learning procedure, and it comprises representing a parse as an image and evaluating that image against the cost function to determine the optimal parse. To simplify document recognition, parsing procedures that use boosting and/or "quick recognition criteria" may be employed. EFFECT: improved accuracy of document recognition. 19 cl, 10 dwg, 5 tbl

10 citations

Patent
19 Apr 2007
TL;DR: In this article, a user may input strokes as digital ink to a processing device, which partitions them into multiple regions of strokes; recognizers score grammar objects in each region, and the scores are converted to a score with at least a near standard normal distribution.
Abstract: In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
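Converting scores from two different recognizers into a common scale with a near standard normal distribution can be sketched as plain z-score normalization. This is a plausible stand-in for illustration, not necessarily the exact conversion the patent claims.

```python
import math

def to_standard_normal(scores):
    """Convert raw recognizer scores to z-scores.

    After conversion the scores have mean 0 and variance 1, so
    scores produced by different recognizers become comparable.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = math.sqrt(var) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]
```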

10 citations

Patent
24 May 2005
TL;DR: In this article, a computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document, and an analysis component, communicatively coupled to the interface component, that analyzes the features vector and determines a viewing mode in which to display the electronic document.
Abstract: A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.

10 citations

Proceedings ArticleDOI
13 Aug 1999
TL;DR: In this article, the authors describe a feature set specifically motivated by the scattering aspect dependencies present in SAR images; these dependencies are learned with a nonparametric density estimator, allowing the full richness of the data to reveal itself.
Abstract: In conventional SAR image formation, idealizations are made about the underlying scattering phenomena in the target field. In particular, the reflected signal is modeled as a pure delay and scaling of the transmitted signal where the delay is determined by the distance to the scatterer. Inherent in this assumption is that the scatterers are isotropic, i.e. their reflectivity appears the same from all orientations, and frequency independent, i.e. the magnitude and phase of the reflectivity are constant with respect to the frequency of the transmitted signal. Frequently, these assumptions are relatively poor resulting in an image which is highly variable with respect to imaging aspect. This variability often poses a difficulty for subsequent processing such as ATR. However, this need not be the case if the nonideal scattering is taken into account. In fact, we believe that if utilized properly, these nonideal characteristics may actually be used to aid in the processing as they convey distinguishing information about the content of the scene under investigation. In this paper, we describe a feature set which is specifically motivated by scattering aspect dependencies present in SAR. These dependencies are learned with a nonparametric density estimator allowing the full richness of the data to reveal itself. These densities are then used to determine the classification of the image content.
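The classification-by-density idea can be sketched with a kernel density estimator: estimate a density per class from training samples, then assign a new observation to the class whose estimated density is highest. The 1-D Gaussian-kernel estimator below is an illustrative assumption, not the paper's actual feature set or estimator.

```python
import math

def kde_log_likelihood(x, samples, bandwidth=1.0):
    """Gaussian kernel density estimate at x (1-D toy stand-in
    for a nonparametric density estimator)."""
    k = sum(math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples)
    density = k / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return math.log(density + 1e-300)  # avoid log(0)

def classify(x, class_samples, bandwidth=1.0):
    """Assign x to the class whose estimated density is highest.

    `class_samples` maps class label -> list of training samples.
    """
    return max(class_samples,
               key=lambda c: kde_log_likelihood(x, class_samples[c], bandwidth))
```

Because the estimator is nonparametric, no functional form is imposed on the class densities, which is what lets "the full richness of the data reveal itself".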

9 citations


Cited by
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
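The core HOG binning step — accumulating gradient magnitudes into unsigned orientation bins within a cell — can be sketched as follows. This is a minimal sketch of one cell only; the full descriptor adds the block grouping and local contrast normalization the abstract identifies as important.

```python
import numpy as np

def cell_hog(patch, n_bins=9):
    """Orientation histogram for one HOG cell.

    Gradient magnitudes are accumulated into unsigned-orientation
    bins over [0, 180) degrees (9 bins of 20 degrees by default).
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    # np.add.at accumulates correctly even with repeated bin indices.
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist
```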

31,952 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
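YOLO's single-evaluation detection can be sketched as decoding an S x S x (B*5 + C) output tensor, with S=7, B=2, C=20 for PASCAL VOC as described in the paper. The exact tensor layout below (per-cell boxes first, then class probabilities) is an assumption for illustration, not the released implementation.

```python
import numpy as np

def decode_yolo_grid(pred, S=7, B=2, C=20, threshold=0.2):
    """Decode a YOLO-style S x S x (B*5 + C) prediction tensor.

    Each cell predicts B boxes (x, y, w, h, conf) plus C class
    probabilities; a detection's score is conf * class probability.
    Returns (cx, cy, w, h, class_index, score) tuples.
    """
    detections = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            class_probs = cell[B * 5:]
            cls = int(class_probs.argmax())
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[cls]
                if score >= threshold:
                    # x, y are offsets within cell (i, j); convert to
                    # coordinates relative to the whole image.
                    detections.append(((j + x) / S, (i + y) / S,
                                       w, h, cls, float(score)))
    return detections
```

A real pipeline would follow this with non-maximum suppression to merge overlapping detections.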

27,256 citations

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
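The integral image is simple to sketch: after one cumulative-sum pass over the input, the sum of any rectangle costs four array references regardless of the rectangle's size, which is what makes the detector's rectangle features so cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of all pixels above and to the left of
    (y, x), inclusive: one cumulative sum per axis."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in constant time
    using four references into the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

Differences of such rectangle sums give the Haar-like features that AdaBoost selects from in the paper's learning stage.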

18,620 citations

Journal ArticleDOI
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, analysing whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations

Proceedings ArticleDOI
Ross Girshick1
07 Dec 2015
TL;DR: Fast R-CNN, a Fast Region-based Convolutional Network method for object detection, employs several innovations to improve training and testing speed while also increasing detection accuracy, and achieves a higher mAP on PASCAL VOC 2012.
Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

14,824 citations