Author

Cong Yao

Huazhong University of Science and Technology

Other affiliations: Peking University, Nanjing University, University of Rochester

Bio: Cong Yao is an academic researcher at Huazhong University of Science and Technology. The author has contributed to research in the topics Object detection and Feature (computer vision). The author has an h-index of 40 and has co-authored 112 publications receiving 9,389 citations. Previous affiliations of Cong Yao include Peking University and Nanjing University.
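
For readers unfamiliar with the metric, an h-index of h means that h of the author's publications have at least h citations each. The following is a minimal sketch of that definition; the function name `h_index` is hypothetical, and the sample counts are only the five papers listed on this page, not the full record of 112 publications, so the result is not the author's actual h-index.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts of the five papers listed on this page (a partial sample).
listed_counts = [2184, 1161, 750, 606, 592]
print(h_index(listed_counts))
```

With only five papers in the sample, the value is capped at 5 regardless of how highly cited they are.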

Papers published on a yearly basis (chart, 2011 to 2023)

Papers

Journal Article•DOI•

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi¹, Xiang Bai¹, Cong Yao¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling, and transcription into a unified framework, and that achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.
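
The abstract does not name the transcription method. As an illustration of how a per-frame label sequence can be turned into text without character segmentation, the sketch below assumes a CTC-style greedy decoding (collapse repeated labels, then drop blanks). The function name `ctc_greedy_decode`, the blank index 0, and the sample labels are assumptions for illustration, not details taken from this page.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeated labels, then remove blank labels.

    `frame_labels` holds the highest-scoring label index for each frame.
    """
    decoded = []
    previous = blank
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical per-frame labels; the blank between the two 1s keeps them distinct.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

Because the output length is determined by the decoding rule rather than by the number of frames, the same procedure applies to inputs of any width.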

2,184 citations

Proceedings Article•DOI•

EAST: An Efficient and Accurate Scene Text Detector

Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, He Weiran, Jiajun Liang

01 Jul 2017

TL;DR: This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.

Abstract: Previous approaches for scene text detection have already achieved promising performances across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps at 720p resolution.

1,161 citations

Proceedings Article•DOI•

Detecting texts of arbitrary orientations in natural images

Cong Yao¹, Xiang Bai¹, Wenyu Liu¹, Yi Ma², Zhuowen Tu²•Institutions (2)

Huazhong University of Science and Technology¹, Microsoft²

16 Jun 2012

TL;DR: This paper proposes a system that detects texts of arbitrary orientations in natural images using a two-level classification scheme and two sets of specially designed features, and introduces a new dataset and an evaluation protocol for comparing the algorithm with competing methods.

Abstract: With the increasing popularity of practical vision systems and smart phones, text detection in natural scenes becomes a critical yet challenging task. Most existing methods have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a system which detects texts of arbitrary orientations in natural images. Our algorithm is equipped with a two-level classification scheme and two sets of features specially designed for capturing both the intrinsic characteristics of texts. To better evaluate our algorithm and compare it with other competing algorithms, we generate a new dataset, which includes various texts in diverse real-world scenarios; we also propose a protocol for performance evaluation. Experiments on benchmark datasets and the proposed dataset demonstrate that our algorithm compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of arbitrary orientations in complex natural scenes.

750 citations

Proceedings Article•DOI•

Robust Scene Text Recognition with Automatic Rectification

Baoguang Shi¹, Xinggang Wang¹, Pengyuan Lyu¹, Cong Yao¹, Xiang Bai¹•Institutions (1)

Huazhong University of Science and Technology¹

12 Mar 2016

TL;DR: This paper proposes RARE (Robust text recognizer with Automatic REctification), which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN).

Abstract: Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a specially designed deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.

606 citations

Journal Article•DOI•

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

Baoguang Shi¹, Mingkun Yang¹, Xinggang Wang¹, Pengyuan Lyu¹, Cong Yao, Xiang Bai¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Sep 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.

Abstract: A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

592 citations

Cited by

Journal Article•DOI•

Recent advances in convolutional neural networks

Jiuxiang Gu¹, Zhenhua Wang¹, Jason Kuen¹, Lianyang Ma¹, Amir Shahroudy¹, Bing Shuai¹, Ting Liu¹, Xingxing Wang¹, Gang Wang¹, Jianfei Cai¹, Tsuhan Chen¹•Institutions (1)

Nanyang Technological University¹

01 May 2018-Pattern Recognition

TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.

3,125 citations

The PASCAL Visual Object Classes Challenge

Jianguo Zhang

01 Jan 2006

3,012 citations

Proceedings Article•DOI•

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian¹, Chunhua Shen¹, Hao Chen¹, Tong He¹•Institutions (1)

University of Adelaide¹

02 Apr 2019

TL;DR: For the first time, a much simpler and flexible detection framework achieving improved detection accuracy is demonstrated, and it is hoped that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.

Abstract: We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: https://tinyurl.com/FCOSv1

2,244 citations

Journal Article•DOI•

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi¹, Xiang Bai¹, Cong Yao¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Nov 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

2,184 citations

Posted Content•

FCOS: Fully Convolutional One-Stage Object Detection

Zhi Tian¹, Chunhua Shen¹, Hao Chen¹, Tong He¹•Institutions (1)

University of Adelaide¹

02 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a fully convolutional one-stage object detector (FCOS) is proposed to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation.

Abstract: We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks. Code is available at: this https URL

2,160 citations
