Author

Paul A. Viola

Bio: Paul A. Viola is an academic researcher from Microsoft. The author has contributed to research in topics: Parsing & Boosting (machine learning). The author has an h-index of 52 and has co-authored 115 publications receiving 59,853 citations. Previous affiliations of Paul A. Viola include IBM & Wilmington University.


Papers
Patent
Cha Zhang, Paul A. Viola
13 Jul 2007
TL;DR: A combination classifier and intermediate rejection thresholds are learned using a pruning process that ensures objects detected by the original combination classifier are also detected by the pruned classifier, thereby guaranteeing the same detection rate on the training set after pruning.
Abstract: A “Classifier Trainer” trains a combination classifier for detecting specific objects in signals (e.g., faces in images, words in speech, patterns in signals, etc.). In one embodiment “multiple instance pruning” (MIP) is introduced for training weak classifiers or “features” of the combination classifier. Specifically, a trained combination classifier and associated final threshold for setting false positive/negative operating points are combined with learned intermediate rejection thresholds to construct the combination classifier. Rejection thresholds are learned using a pruning process which ensures that objects detected by the original combination classifier are also detected by the combination classifier, thereby guaranteeing the same detection rate on the training set after pruning. The only parameter required throughout training is a target detection rate for the final cascade system. In additional embodiments, combination classifiers are trained using various combinations of weight trimming, bootstrapping, and a weak classifier termed a “fat stump” classifier.

13 citations
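The pruning idea lends itself to a compact illustration. Below is a minimal sketch (not the patent's implementation) of how intermediate rejection thresholds can be set so that every positive window the full combination classifier accepts also survives early rejection; the weak classifiers, weights, and score arrays are assumed inputs.

```python
# A minimal sketch of multiple-instance-style pruning: each intermediate
# rejection threshold is the minimum partial score reached, at that stage,
# by the positive windows the full classifier accepts, so pruning cannot
# reject anything the original classifier detected on the training set.
import numpy as np

def learn_rejection_thresholds(partial_scores, final_scores, final_threshold):
    """partial_scores: (n_pos, T) cumulative scores of positive examples
    after each weak classifier; final_threshold: the trained classifier's
    operating point."""
    retained = final_scores >= final_threshold   # windows the full classifier detects
    return partial_scores[retained].min(axis=0)  # per-stage minima -> thetas

def classify(x, weak_classifiers, weights, thetas, final_threshold):
    score = 0.0
    for h, w, theta in zip(weak_classifiers, weights, thetas):
        score += w * h(x)
        if score < theta:                        # early rejection saves computation
            return False
    return score >= final_threshold
```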

Patent
29 Apr 2005
TL;DR: Grammatical parsing is utilized to parse structured layouts that are modeled as grammars; this provides an optimal parse tree for the structured layout based on a grammatical cost function associated with a global search, while machine learning techniques facilitate discriminatively selecting features and setting parameters in the grammatical parsing process.
Abstract: Grammatical parsing is utilized to parse structured layouts that are modeled as grammars. This type of parsing provides an optimal parse tree for the structured layout based on a grammatical cost function associated with a global search. Machine learning techniques facilitate discriminative feature selection and parameter setting in the grammatical parsing process. In one instance, labeled examples are parsed and a chart is generated. The chart is then converted into a subsequent set of labeled learning examples. Classifiers are then trained utilizing conventional machine learning and the subsequent example set. The classifiers are then employed to facilitate scoring of subsequent sub-parses. A global reference grammar can also be established to facilitate completing varying tasks without requiring additional grammar learning, substantially increasing the efficiency of the structured layout analysis techniques.

13 citations
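For intuition, here is a hypothetical, heavily simplified sketch of globally optimal parsing under a cost function, using the standard one-dimensional CYK recursion. The patent covers two-dimensional layout grammars and learned classifier scores; both are abstracted here into an assumed `cost` function and `rules` table.

```python
# A simplified sketch of chart parsing with a cost function: the minimum-cost
# derivation is found by a global dynamic-programming search over all spans.
import math
from functools import lru_cache

def best_parse_cost(tokens, rules, cost):
    """rules: {lhs: [(rhs1, rhs2), ...]} binary grammar rules;
    cost(symbol, span) -> float, e.g. the negative log-score of a
    trained classifier for labeling that span with that symbol."""
    @lru_cache(maxsize=None)
    def chart(lhs, i, j):                 # min cost to derive tokens[i:j] from lhs
        if j - i == 1:
            return cost(lhs, tokens[i:j])
        best = math.inf
        for r1, r2 in rules.get(lhs, []):
            for k in range(i + 1, j):     # try every split point
                best = min(best,
                           chart(r1, i, k) + chart(r2, k, j) + cost(lhs, tokens[i:j]))
        return best
    return chart('ROOT', 0, len(tokens))
```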

Proceedings ArticleDOI
Ming Ye, Paul A. Viola
26 Oct 2004
TL;DR: A system is presented that automatically recognizes lists and hierarchical outlines in handwritten notes and computes the correct structure, providing the foundation for new user interfaces and facilitating the importation of handwritten notes into conventional editing tools.
Abstract: Handwritten notes are complex structures that include blocks of text, drawings, and annotations. The main challenge for the newly emerging tablet computer is to provide high-level tools for editing and authoring handwritten documents using a natural interface. One frequent component of natural notes is the list or hierarchical outline, which corresponds directly to the bulleted and itemized structures in conventional text editing tools. We present a system that automatically recognizes lists and hierarchical outlines in handwritten notes and then computes the correct structure. This inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.

13 citations
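One ingredient of such a system can be sketched simply: grouping the left x-coordinates of handwritten lines into indentation levels. The function below is hypothetical and far cruder than the paper's learned models; the `tolerance` value and ink-space pixel units are assumptions.

```python
# A hypothetical sketch: infer outline nesting levels from line indentation
# by clustering left x-coordinates that fall within a tolerance of each other.
def outline_levels(line_x_starts, tolerance=20.0):
    anchors = []                          # distinct indentation positions seen so far
    for x in line_x_starts:
        if not any(abs(x - a) < tolerance for a in anchors):
            anchors.append(x)
    anchors.sort()                        # leftmost indentation = level 0
    return [min(range(len(anchors)), key=lambda i: abs(anchors[i] - x))
            for x in line_x_starts]

# Example: three indentation stops map to levels 0, 1, 1, 2, 0.
print(outline_levels([12.0, 55.0, 58.0, 101.0, 10.0]))
```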

Proceedings Article
01 Jan 1989
TL;DR: This Artificial-Eye (A-eye) combines the signals generated by two rate gyroscopes with motion information extracted from visual analysis to stabilize its camera and learns a system model that can be incrementally modified to adapt to changes in its structure, performance and environment.
Abstract: We have constructed a two axis camera positioning system which is roughly analogous to a single human eye. This Artificial-Eye (A-eye) combines the signals generated by two rate gyroscopes with motion information extracted from visual analysis to stabilize its camera. This stabilization process is similar to the vestibulo-ocular response (VOR); like the VOR, A-eye learns a system model that can be incrementally modified to adapt to changes in its structure, performance and environment. A-eye is an example of a robust sensory system that performs computations that can be of significant use to the designers of mobile robots.

12 citations
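The adaptive loop the abstract describes can be caricatured in a few lines: the gyroscope signal drives counter-rotation of the camera, and residual image motion ("retinal slip") adapts the gain online, analogous to VOR adaptation. The sketch below is hypothetical; the gain, learning rate, and signal conventions are assumptions, not the system's actual model.

```python
# A hypothetical sketch of gyro-driven stabilization with online gain
# adaptation: retinal slip acts as the error signal in an LMS-style update.
def stabilize(gyro_rates, retinal_slips, gain=1.0, lr=0.01):
    commands = []
    for rate, slip in zip(gyro_rates, retinal_slips):
        commands.append(-gain * rate)     # counter-rotate the camera
        gain += lr * slip * rate          # adapt gain to drive slip toward zero
    return commands, gain
```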

Book ChapterDOI
08 Sep 2004
TL;DR: A system for automatic FAX routing that processes incoming FAX images and forwards them to the correct email alias by combining the quality of recipient matches with the relevance of the words.
Abstract: We present a system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias. The system first performs optical character recognition to find words and in some cases parts of words (we have observed error rates as high as 10 to 20 percent). For all these “noisy” words, a set of features is computed which include internal text features, location features, and relationship features. These features are combined to estimate the relevance of the word in the context of the page and the recipient database. The parameters of the word relevance function are learned from training data using the AdaBoost learning algorithm. Words are then compared to the database of recipients to find likely matches. The recipients are finally ranked by combining the quality of the matches and the relevance of the words. Experiments are presented which demonstrate the effectiveness of this system on a large set of real data.

11 citations
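The final ranking step combines two quantities, which the following hypothetical sketch makes concrete: a learned per-word relevance (the paper trains this with AdaBoost) and a fuzzy match score against the recipient database. Here `difflib` and the 0.7 threshold stand in for the paper's actual matching method.

```python
# A hypothetical sketch of the ranking step: recipients are scored by
# match quality weighted by learned word relevance, then sorted.
import difflib
from collections import defaultdict

def rank_recipients(words, relevance, recipients):
    """words: OCR tokens from the FAX page; relevance: dict word -> learned
    relevance in [0, 1]; recipients: list of names from the database."""
    scores = defaultdict(float)
    for w in words:
        for r in recipients:
            match = max(difflib.SequenceMatcher(None, w.lower(), part.lower()).ratio()
                        for part in r.split())
            if match > 0.7:                       # assumed match threshold
                scores[r] += match * relevance.get(w, 0.0)
    return sorted(scores, key=scores.get, reverse=True)
```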


Cited by
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations
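The descriptor is easy to reproduce with scikit-image's `hog`, shown below in the parameter regime the paper found effective (9 orientation bins, 8x8-pixel cells, 2x2-cell blocks, L2-Hys contrast normalization); the random input window is just a placeholder.

```python
# A minimal example of computing a HOG descriptor with scikit-image,
# using the Dalal-Triggs parameter settings for human detection.
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)          # 64x128 detection window, as in the paper
descriptor = hog(window,
                 orientations=9,          # fine orientation binning
                 pixels_per_cell=(8, 8),  # relatively coarse spatial binning
                 cells_per_block=(2, 2),  # overlapping normalization blocks
                 block_norm='L2-Hys')     # high-quality local contrast normalization
# `descriptor` is the feature vector a linear SVM would classify.
print(descriptor.shape)
```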

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

27,256 citations
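The abstract's "regression problem" can be made concrete with a hypothetical decoding sketch: each cell of an SxS grid predicts B boxes with confidences plus C class probabilities, and a box's class-specific score is confidence times class probability. The tensor layout below is an assumption, not the paper's released code.

```python
# A hypothetical sketch of decoding a YOLO-style grid prediction into
# scored detections; non-maximum suppression would follow in practice.
import numpy as np

def decode(pred, S=7, B=2, C=20, thresh=0.2):
    """pred: (S, S, B*5 + C) network output; returns (score, box, class)."""
    detections = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            class_probs = cell[B * 5:]               # shared across the cell's boxes
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                scores = conf * class_probs          # class-specific confidence
                c = int(np.argmax(scores))
                if scores[c] > thresh:
                    cx, cy = (j + x) / S, (i + y) / S  # cell-relative -> image-relative
                    detections.append((float(scores[c]), (cx, cy, w, h), c))
    return detections
```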

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly while achieving high detection rates, built on a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

18,620 citations
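The integral image is simple enough to show in full: after one cumulative-sum pass over the image, the sum of any rectangle costs four array lookups, which is what makes the detector's rectangle features so cheap to evaluate. A minimal sketch:

```python
# A minimal sketch of the integral image and constant-time rectangle sums.
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] via four lookups in `ii`."""
    total = ii[bottom - 1, right - 1]
    if top > 0:
        total -= ii[top - 1, right - 1]
    if left > 0:
        total -= ii[bottom - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16.0).reshape(4, 4)
assert rect_sum(integral_image(img), 1, 1, 3, 3) == img[1:3, 1:3].sum()
```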

Journal ArticleDOI
TL;DR: The state-of-the-art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what they find easy or confuse.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to the present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension.

15,935 citations
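The core of the evaluation protocol is compact: a detection counts as correct when its intersection-over-union (IoU) with a ground-truth box exceeds 0.5, and methods are ranked by average precision. A minimal IoU sketch, with boxes assumed to be corner-coordinate tuples:

```python
# A minimal sketch of the IoU overlap criterion used to match detections
# to ground-truth boxes.
def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

assert iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1 / 7   # overlaps 1, union 7
```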

Proceedings ArticleDOI
Ross Girshick
07 Dec 2015
TL;DR: Fast R-CNN, a Fast Region-based Convolutional Network method for object detection, employs several innovations to improve training and testing speed while also increasing detection accuracy, and achieves a higher mAP on PASCAL VOC 2012.
Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

14,824 citations
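The RoI pooling operation at the heart of Fast R-CNN is available in `torchvision.ops`; the snippet below shows its use with illustrative shapes (the feature map size, image scale, and boxes are all assumptions, not values from the paper).

```python
# A minimal example of RoI pooling: variable-sized region proposals are
# pooled from a shared conv feature map into fixed 7x7 grids, so one
# forward pass over the image serves every proposal.
import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)           # conv feature map for one image
# RoIs as (batch_index, x1, y1, x2, y2) in input-image coordinates
rois = torch.tensor([[0, 0.0, 0.0, 400.0, 400.0],
                     [0, 100.0, 100.0, 300.0, 250.0]])
pooled = roi_pool(features, rois, output_size=(7, 7),
                  spatial_scale=50 / 800)        # assumed 800px image -> 50px features
print(pooled.shape)                              # torch.Size([2, 256, 7, 7])
```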