scispace - formally typeset
Author

Paul A. Viola

Bio: Paul A. Viola is an academic researcher from Microsoft. The author has contributed to research in topics: Parsing & Boosting (machine learning). The author has an h-index of 52 and has co-authored 115 publications receiving 59,853 citations. Previous affiliations of Paul A. Viola include IBM & Wilmington University.


Papers
Proceedings ArticleDOI
Ming Ye1, Paul A. Viola1, Sashi Raghupathy1, Herry Sutanto1, Chengyang Li1 
23 Sep 2007
TL;DR: This paper proposes a machine learning approach to grouping problems in ink parsing, where hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features.
Abstract: This paper proposes a machine learning approach to grouping problems in ink parsing. Starting from an initial segmentation, hypotheses are generated by perturbing local configurations and processed in a high-confidence-first fashion, where the confidence of each hypothesis is produced by a data-driven AdaBoost decision-tree classifier with a set of intuitive features. This framework has been successfully applied to grouping text lines and regions in complex freeform digital ink notes from real TabletPC users. It holds great potential for solving many other grouping problems in the ink parsing and document image analysis domains.
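The high-confidence-first processing the abstract describes can be sketched with a priority queue. In the sketch below, the `confidence` callable and the 0.5 acceptance threshold are illustrative assumptions standing in for the paper's trained AdaBoost decision-tree classifier and its actual decision rule.

```python
import heapq

def process_hypotheses(hypotheses, confidence):
    """Process grouping hypotheses in high-confidence-first order.

    `confidence` stands in for a trained classifier: any callable
    mapping a hypothesis to a score in [0, 1].
    """
    # heapq is a min-heap, so negate scores to pop the most
    # confident hypothesis first; the index breaks ties stably.
    heap = [(-confidence(h), i, h) for i, h in enumerate(hypotheses)]
    heapq.heapify(heap)
    accepted = []
    while heap:
        neg_score, _, h = heapq.heappop(heap)
        if -neg_score >= 0.5:  # assumed acceptance threshold
            accepted.append(h)
    return accepted
```

Processing in confidence order lets early, reliable grouping decisions constrain later, more ambiguous ones.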

11 citations

Patent
20 Jun 2011
TL;DR: In this article, the visual structure is subjected to grammatical analysis by associating multiple grammatical rules with multiple types of symbols identified in the visual structure of the document, which makes it possible to recognise components of the document (for instance, columns, names of authors, headings, references, etc.).

Abstract: FIELD: information technologies. SUBSTANCE: A 2D representation of a document is used to identify a visual structure, which helps to recognise the document. The visual structure is subjected to grammatical analysis by associating multiple grammatical rules with the multiple types of symbols identified in the document's visual structure. This makes it possible to recognise components of the document (for instance, columns, names of authors, headings, references, etc.), so that the document's structural components can be interpreted accurately. The grammatical analysis is based on a grammatical cost function produced by a machine learning procedure, and it comprises representing a parse as an image and evaluating that image against the cost function to determine the optimal parse. To simplify document recognition, parsing procedures that use boosting and/or "quick recognition criteria" may be employed. EFFECT: improved accuracy of document recognition. 19 cl, 10 dwg, 5 tbl

10 citations

Patent
19 Apr 2007
TL;DR: In this article, a user may input strokes as digital ink to a processing device, which partitions them into multiple regions of strokes; recognizers score grammar objects in each region, and the scores are converted to a score with at least a near standard normal distribution.
Abstract: In embodiments consistent with the subject matter of this disclosure, a user may input strokes as digital ink to a processing device. The processing device may partition the input strokes into multiple regions of strokes. A first recognizer and a second recognizer may score grammar objects included in regions and represented by chart entries. The scores may be converted to a converted score, which may have at least a near standard normal distribution. The processing device may present a recognition result based on highest converted scores according to a recurrence formula. The processing device may receive a correction hint with respect to misrecognized strokes and may add a penalty score with respect to chart entries representing grammar objects breaking the correction hint. Incremental recognition may be performed when a pause is detected during inputting of strokes.
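Converting scores from two different recognizers into a common scale with a near standard normal distribution can be sketched as plain z-score normalization. This is a plausible stand-in for illustration, not necessarily the exact conversion the patent claims.

```python
import math

def to_standard_normal(scores):
    """Convert raw recognizer scores to z-scores.

    After conversion the scores have mean 0 and variance 1, so
    scores produced by different recognizers become comparable.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = math.sqrt(var) or 1.0  # guard against zero variance
    return [(s - mean) / std for s in scores]
```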

10 citations

Patent
24 May 2005
TL;DR: In this article, a computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document, and an analysis component, communicatively coupled to the interface component, that analyzes the features vector and determines a viewing mode in which to display the electronic document.
Abstract: A computer-implemented word processing system comprises an interface component that receives a features vector associated with an electronic document. An analysis component communicatively coupled to the interface component analyzes the features vector and determines a viewing mode in which to display the electronic document. In accordance with one aspect of the subject invention, the viewing mode can be one of a conventional viewing mode and a viewing mode associated with enhanced readability.

10 citations

Proceedings ArticleDOI
13 Aug 1999
TL;DR: In this article, the authors describe a feature set specifically motivated by the scattering aspect dependencies present in SAR images; these dependencies are learned with a nonparametric density estimator, allowing the full richness of the data to reveal itself.
Abstract: In conventional SAR image formation, idealizations are made about the underlying scattering phenomena in the target field. In particular, the reflected signal is modeled as a pure delay and scaling of the transmitted signal where the delay is determined by the distance to the scatterer. Inherent in this assumption is that the scatterers are isotropic, i.e. their reflectivity appears the same from all orientations, and frequency independent, i.e. the magnitude and phase of the reflectivity are constant with respect to the frequency of the transmitted signal. Frequently, these assumptions are relatively poor resulting in an image which is highly variable with respect to imaging aspect. This variability often poses a difficulty for subsequent processing such as ATR. However, this need not be the case if the nonideal scattering is taken into account. In fact, we believe that if utilized properly, these nonideal characteristics may actually be used to aid in the processing as they convey distinguishing information about the content of the scene under investigation. In this paper, we describe a feature set which is specifically motivated by scattering aspect dependencies present in SAR. These dependencies are learned with a nonparametric density estimator allowing the full richness of the data to reveal itself. These densities are then used to determine the classification of the image content.
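The classification-by-density idea can be sketched with a kernel density estimator: estimate a density per class from training samples, then assign a new observation to the class whose estimated density is highest. The 1-D Gaussian-kernel estimator below is an illustrative assumption, not the paper's actual feature set or estimator.

```python
import math

def kde_log_likelihood(x, samples, bandwidth=1.0):
    """Gaussian kernel density estimate at x (1-D toy stand-in
    for a nonparametric density estimator)."""
    k = sum(math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples)
    density = k / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return math.log(density + 1e-300)  # avoid log(0)

def classify(x, class_samples, bandwidth=1.0):
    """Assign x to the class whose estimated density is highest.

    `class_samples` maps class label -> list of training samples.
    """
    return max(class_samples,
               key=lambda c: kde_log_likelihood(x, class_samples[c], bandwidth))
```

Because the estimator is nonparametric, no functional form is imposed on the class densities, which is what lets "the full richness of the data reveal itself".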

9 citations


Cited by
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
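The core HOG binning step — accumulating gradient magnitudes into unsigned orientation bins within a cell — can be sketched as follows. This is a minimal sketch of one cell only; the full descriptor adds the block grouping and local contrast normalization the abstract identifies as important.

```python
import numpy as np

def cell_hog(patch, n_bins=9):
    """Orientation histogram for one HOG cell.

    Gradient magnitudes are accumulated into unsigned-orientation
    bins over [0, 180) degrees (9 bins of 20 degrees by default).
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    # np.add.at accumulates correctly even with repeated bin indices.
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist
```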

31,952 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
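YOLO's single-evaluation detection can be sketched as decoding an S x S x (B*5 + C) output tensor, with S=7, B=2, C=20 for PASCAL VOC as described in the paper. The exact tensor layout below (per-cell boxes first, then class probabilities) is an assumption for illustration, not the released implementation.

```python
import numpy as np

def decode_yolo_grid(pred, S=7, B=2, C=20, threshold=0.2):
    """Decode a YOLO-style S x S x (B*5 + C) prediction tensor.

    Each cell predicts B boxes (x, y, w, h, conf) plus C class
    probabilities; a detection's score is conf * class probability.
    Returns (cx, cy, w, h, class_index, score) tuples.
    """
    detections = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            class_probs = cell[B * 5:]
            cls = int(class_probs.argmax())
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[cls]
                if score >= threshold:
                    # x, y are offsets within cell (i, j); convert to
                    # coordinates relative to the whole image.
                    detections.append(((j + x) / S, (i + y) / S,
                                       w, h, cls, float(score)))
    return detections
```

A real pipeline would follow this with non-maximum suppression to merge overlapping detections.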

27,256 citations

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
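The integral image is simple to sketch: after one cumulative-sum pass over the input, the sum of any rectangle costs four array references regardless of the rectangle's size, which is what makes the detector's rectangle features so cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of all pixels above and to the left of
    (y, x), inclusive: one cumulative sum per axis."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in constant time
    using four references into the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

Differences of such rectangle sums give the Haar-like features that AdaBoost selects from in the paper's learning stage.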

18,620 citations

Journal ArticleDOI
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, analysing whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations

Proceedings ArticleDOI
Ross Girshick1
07 Dec 2015
TL;DR: Fast R-CNN, a Fast Region-based Convolutional Network method for object detection, employs several innovations to improve training and testing speed while also increasing detection accuracy, and achieves a higher mAP on PASCAL VOC 2012.
Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

14,824 citations