Author

Baoguang Shi

Bio: Baoguang Shi is an academic researcher from Huazhong University of Science and Technology. The author has contributed to research on topics including convolutional neural networks and noisy text analytics. The author has an h-index of 9 and has co-authored 10 publications receiving 1,340 citations.

Papers
Proceedings ArticleDOI
12 Mar 2016
TL;DR: This paper proposes RARE, a robust text recognizer with automatic rectification, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN).
Abstract: Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). At test time, an image is first rectified via a predicted Thin-Plate-Spline (TPS) transformation into a more "readable" image for the subsequent SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, which makes it convenient to train and deploy in practical systems. State-of-the-art or highly competitive performance on several benchmarks demonstrates the effectiveness of the proposed model.
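
For illustration, a minimal sketch of the rectify-then-recognize idea in PyTorch. This is not the authors' implementation: module names and layer sizes are assumptions, and an affine spatial transformer stands in for the paper's Thin-Plate-Spline rectification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RectifierSketch(nn.Module):
    # Predicts a 2x3 affine warp from the input image and resamples it.
    # (Assumption: an affine warp as a simplified stand-in for TPS.)
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, 6)
        # Initialise to the identity transform so early training leaves images unwarped.
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data = torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

    def forward(self, x):
        theta = self.fc(self.features(x)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

words = torch.randn(8, 1, 32, 100)      # a batch of grey-scale word images
rectified = RectifierSketch()(words)    # same shape, now "straightened"
# The rectified image would then be passed to a sequence recognition network (the SRN).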

606 citations

Journal ArticleDOI
TL;DR: This letter introduces a robust representation of 3-D shapes, named DeepPano, learned with deep convolutional neural networks (CNN), in which a row-wise max-pooling layer is inserted between the convolution and fully connected layers, making the learned representations invariant to rotation around a principal axis.
Abstract: This letter introduces a robust representation of 3-D shapes, named DeepPano, learned with deep convolutional neural networks (CNN). First, each 3-D shape is converted into a panoramic view, namely a cylinder projection around its principal axis. Then, a variant of CNN is specifically designed for learning the deep representations directly from such views. Different from a typical CNN, a row-wise max-pooling layer is inserted between the convolution and fully connected layers, making the learned representations invariant to rotation around the principal axis. Our approach achieves state-of-the-art retrieval and classification results on two large-scale 3-D model datasets (ModelNet-10 and ModelNet-40), outperforming typical methods by a large margin.
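
As a rough illustration of the row-wise max-pooling idea (not the authors' code; layer sizes and names are assumed), the sketch below pools convolutional features over the width of the panoramic view. A rotation of the shape around its principal axis circularly shifts the panorama's columns, so taking the maximum along that axis makes the pooled features independent of the shift.

import torch
import torch.nn as nn

class RowWiseMaxPoolCNN(nn.Module):
    def __init__(self, num_classes=40):
        super().__init__()
        # Circular padding keeps the width dimension truly periodic for this demo.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1, padding_mode='circular'), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1, padding_mode='circular'), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 16, 256),
                                nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, pano):                # pano: (N, 1, 64, W)
        f = self.conv(pano)                 # (N, 64, 16, W/4)
        f = f.max(dim=3).values             # row-wise max over the width axis
        return self.fc(f)                   # (N, num_classes)

net = RowWiseMaxPoolCNN()
pano = torch.randn(4, 1, 64, 160)                # panoramic (cylinder) projections
rotated = torch.roll(pano, shifts=40, dims=3)    # rotation around the principal axis
print(torch.allclose(net(pano), net(rotated), atol=1e-5))   # True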

404 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper proposes a novel multi-scale representation for scene text recognition that consists of a set of detectable primitives, termed strokelets, which capture the essential substructures of characters at different granularities.
Abstract: Driven by a wide range of applications, scene text detection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain extremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed strokelets, which capture the essential substructures of characters at different granularities. Strokelets possess four distinctive advantages: (1) Usability: automatically learned from bounding box labels; (2) Robustness: insensitive to interference factors; (3) Generality: applicable to different languages; and (4) Expressivity: effective at describing characters. Extensive experiments on standard benchmarks verify the advantages of strokelets and demonstrate the effectiveness of the proposed algorithm for text recognition.

303 citations

Journal ArticleDOI
TL;DR: The proposed DiscCNN achieves state-of-the-art performance on scene, video, and document scripts alike, without requiring any preprocessing such as binarization, segmentation, or hand-crafted features.

111 citations

Proceedings ArticleDOI
23 Aug 2015
TL;DR: A large-scale dataset containing a great quantity of natural images covering 10 widely used languages is constructed and released, and a deep learning based algorithm for script identification is proposed.
Abstract: With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations. In this paper, we are concerned with a relatively new problem: script identification at word or line level in natural scenes. A large-scale dataset with a great quantity of natural images and 10 types of widely used languages is constructed and released. To address the challenges of script identification in real-world scenarios, a deep learning based algorithm is proposed. The experiments on the proposed dataset demonstrate that our algorithm achieves superior performance compared with conventional image classification or script identification methods, including the original CNN architecture, LLC, and GLCM.

49 citations


Cited by
Journal ArticleDOI
TL;DR: This article presents a broad survey of the recent advances in convolutional neural networks, discussing improvements to CNNs in several aspects, namely layer design, activation function, loss function, regularization, optimization, and fast computation.

3,125 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling, and transcription into a unified framework, achieving remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.
Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling, and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences of arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text, and ICDAR datasets, demonstrate the superiority of the proposed algorithm over prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies its generality.
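
As an illustration of the pipeline described in the abstract (convolutional feature extraction, recurrent sequence modeling, CTC-style transcription), here is a minimal PyTorch sketch. Layer sizes, the character-set size, and the dummy labels are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    # Convolutional features -> bidirectional LSTM -> per-frame class scores.
    def __init__(self, num_classes=37):        # assumption: 36 characters + CTC blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(128, 256, num_layers=2, bidirectional=True)
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):                       # x: (N, 1, 32, W)
        f = self.cnn(x).mean(dim=2)             # collapse height: (N, C, W')
        f = f.permute(2, 0, 1)                  # (W', N, C), time-major for the LSTM
        out, _ = self.rnn(f)
        return self.head(out).log_softmax(2)    # (W', N, num_classes)

model = CRNNSketch()
images = torch.randn(4, 1, 32, 100)             # grey-scale word images
log_probs = model(images)                       # (50, 4, 37)
targets = torch.randint(1, 37, (4, 6))          # dummy label sequences, 0 = blank
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((4,), 50, dtype=torch.long),
                           target_lengths=torch.full((4,), 6, dtype=torch.long))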

2,184 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Two distinct network architectures of volumetric CNNs are introduced, multi-view CNNs are examined with multi-resolution filtering in 3D, and extensive experiments are provided to evaluate the underlying design choices.
Abstract: 3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-the-art methods rely on CNNs to address this problem. Recently, two types of CNNs have been developed: CNNs based upon volumetric representations and CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both volumetric CNNs and multi-view CNNs based on an extensive analysis of existing approaches. To this end, we introduce two distinct network architectures of volumetric CNNs. In addition, we examine multi-view CNNs, where we introduce multi-resolution filtering in 3D. Overall, we are able to outperform current state-of-the-art methods for both volumetric CNNs and multi-view CNNs. We provide extensive experiments designed to evaluate underlying design choices, thus providing a better understanding of the space of methods available for object classification on 3D data.
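
For context, a minimal sketch of what a volumetric CNN looks like: 3-D convolutions applied to a voxel occupancy grid. This is only an illustration of the family of models discussed, under assumed layer sizes; it is not either of the architectures introduced in the paper.

import torch
import torch.nn as nn

class VoxelCNNSketch(nn.Module):
    # 3-D convolutions over an occupancy grid, followed by a small classifier.
    def __init__(self, num_classes=40):        # e.g. the 40 ModelNet-40 categories
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 8 * 8 * 8, 256),
                                nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, voxels):                  # voxels: (N, 1, 32, 32, 32)
        return self.fc(self.conv(voxels))

voxels = (torch.rand(2, 1, 32, 32, 32) > 0.5).float()   # random occupancy grids
logits = VoxelCNNSketch()(voxels)                        # (2, 40)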

1,488 citations

Journal ArticleDOI
01 Nov 2018-Heliyon
TL;DR: The study found that neural-network models such as feedforward and feedback-propagation artificial neural networks perform better in their application to human problems, and it proposes feedforward and feedback-propagation ANN models as a research focus based on data-analysis factors such as accuracy, processing speed, latency, fault tolerance, volume, scalability, convergence, and performance.

1,471 citations

Posted Content
TL;DR: The main theorem characterizes permutation-invariant objective functions and provides a family of functions to which any permutation-invariant objective function must belong, which enables the design of a deep network architecture that can operate on sets and can be deployed in a variety of scenarios, including both unsupervised and supervised learning tasks.
Abstract: We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed-dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from the estimation of population statistics, to anomaly detection in piezometer data of embankment dams, to cosmology. Our main theorem characterizes the permutation-invariant functions and provides a family of functions to which any permutation-invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed in a variety of scenarios, including both unsupervised and supervised learning tasks. We also derive the necessary and sufficient conditions for permutation equivariance in deep models. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection.
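
The permutation-invariant structure described above can be illustrated with a short sketch: each set element is embedded by a shared network, the embeddings are summed (a permutation-invariant operation), and the pooled vector is mapped to the output. Names and sizes below are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class DeepSetSketch(nn.Module):
    # f(X) = rho(sum over x in X of phi(x)): summing the per-element embeddings
    # makes the output independent of the order in which elements are presented.
    def __init__(self, in_dim=3, hidden=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):                       # x: (batch, set_size, in_dim)
        return self.rho(self.phi(x).sum(dim=1))

net = DeepSetSketch()
points = torch.randn(2, 100, 3)                 # two sets of 100 3-D points
shuffled = points[:, torch.randperm(100), :]    # same sets, different element order
print(torch.allclose(net(points), net(shuffled), atol=1e-5))   # True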

1,329 citations