(Open Access) Focusing Attention: Towards Accurate Text Recognition in Natural Images (2017) | Zhanzhan Cheng

Citations

Journal Article

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

Baoguang Shi¹, Mingkun Yang¹, Xinggang Wang¹, Pengyuan Lyu¹, Cong Yao, Xiang Bai¹

Huazhong University of Science and Technology¹

01 Sep 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.

Abstract: A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.

592 citations

Posted Content

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Pengyuan Lyu¹, Minghui Liao¹, Cong Yao, Wenhao Wu, Xiang Bai¹

Huazhong University of Science and Technology¹

06 Jul 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper investigates the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images, and proposes an end-to-end trainable neural network model, named as Mask TextSpotter, which is inspired by the newly published work Mask R-CNN.

Abstract: Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named as Mask TextSpotter, is inspired by the newly published work Mask R-CNN. Different from previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter takes advantage of simple and smooth end-to-end learning procedure, in which precise text detection and recognition are acquired via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks.

326 citations

Proceedings Article

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

Jeonghun Baek¹, Geewook Kim², Junyeop Lee¹, Sungrae Park¹, Dongyoon Han¹, Sangdoo Yun¹, Seong Joon Oh, Hwalsuk Lee¹

Naver Corporation¹, Kyoto University²

03 Apr 2019

TL;DR: This paper examines how inconsistent training and evaluation datasets distort comparisons between scene text recognition (STR) models, and introduces a unified four-stage STR framework for evaluating existing modules under one consistent set of datasets.

Abstract: Many new proposals for scene text recognition (STR) models have been introduced in recent years. While each claims to have pushed the boundary of the technology, a holistic and fair comparison has been largely missing in the field due to the inconsistent choices of training and evaluation datasets. This paper addresses this difficulty with three major contributions. First, we examine the inconsistencies of training and evaluation datasets, and the performance gap that results from these inconsistencies. Second, we introduce a unified four-stage STR framework that most existing STR models fit into. Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations. Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets. Such analyses clean up the hindrance on the current comparisons to understand the performance gain of the existing modules. Our code is publicly available.

280 citations

Proceedings Article

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

Fangneng Zhan¹, Shijian Lu¹

Nanyang Technological University¹

15 Jun 2019

TL;DR: This paper proposes an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by better text recognition performance.

Abstract: Automated recognition of texts in scenes has been a research challenge for years, largely due to the arbitrary text appearance variation in perspective distortion, text line curvature, text styles and different types of imaging artifacts. The recent deep networks are capable of learning robust representations with respect to imaging artifacts and text style changes, but still face various problems while dealing with scene texts with perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature as driven by better scene text recognition performance. An innovative rectification network is developed, where a line-fitting transformation is designed to estimate the pose of text lines in scenes. Additionally, an iterative rectification framework is developed which corrects scene text distortions iteratively towards a fronto-parallel view. The ESIR is also robust to parameter initialization and easy to train, where the training needs only scene text images and word-level annotations as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that the proposed ESIR is capable of rectifying scene text distortions accurately, achieving superior recognition performance for both normal scene text images and those suffering from perspective and curvature distortions.

262 citations

Proceedings Article

AON: Towards Arbitrarily-Oriented Text Recognition

Zhanzhan Cheng, Yangliu Xu¹, Fan Bai², Yi Niu², Shiliang Pu², Shuigeng Zhou²

Tongji University¹, Fudan University²

18 Jun 2018

TL;DR: The arbitrary orientation network (AON) is developed to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate the character sequence. The method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.

Abstract: Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite the enduring research of several decades on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequence. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance in irregular datasets, and is comparable to major existing methods in regular datasets.

252 citations
