ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

doi:10.1109/CVPR.2019.00216

Open Access, Proceedings Article

Fangneng Zhan, +1 more

- pp 2059-2068

TLDR

This paper proposes an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by better text recognition performance.

Abstract:

Automated recognition of text in scenes has been a research challenge for years, largely because text appearance varies arbitrarily in perspective distortion, text line curvature, text style and imaging artifacts. Recent deep networks can learn representations that are robust to imaging artifacts and text style changes, but they still struggle with scene text that suffers from perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by better scene text recognition performance. It introduces a rectification network in which a line-fitting transformation estimates the pose of text lines in scenes, together with an iterative rectification framework that corrects scene text distortions step by step towards a fronto-parallel view. ESIR is also robust to parameter initialization and easy to train: training needs only scene text images and word-level annotations, as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that ESIR rectifies scene text distortions accurately and achieves superior recognition performance both on normal scene text images and on those suffering from perspective and curvature distortions.
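
The idea of iterative rectification described in the abstract can be illustrated with a toy sketch. This is not the ESIR network: the abstract does not give the form of the line-fitting transformation, so the polynomial centerline, the `step` factor, and all function names below are assumptions made for illustration only. The sketch works on 2D centerline points rather than images, and each pass fits a curve to the text line and removes part of its deviation from a flat baseline.

```python
import numpy as np


def fit_center_line(points, degree=2):
    """Least-squares polynomial fit y = p(x) through centerline points.

    The polynomial model is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    return np.polyfit(points[:, 0], points[:, 1], degree)


def rectify_once(points, degree=2, step=0.5):
    """One rectification pass: remove a fraction `step` of the fitted
    curve's deviation from its mean height."""
    coeffs = fit_center_line(points, degree)
    curve = np.polyval(coeffs, points[:, 0])
    out = points.copy()  # leave the input untouched
    out[:, 1] -= step * (curve - curve.mean())
    return out


def iterative_rectify(points, iterations=5, degree=2, step=0.5):
    """Apply several partial corrections instead of one large one."""
    history = [points]
    for _ in range(iterations):
        points = rectify_once(points, degree, step)
        history.append(points)
    return points, history


def residual_height(points):
    """Vertical extent of the centerline; 0 means perfectly flat."""
    return float(points[:, 1].max() - points[:, 1].min())


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 21)
    curved = np.stack([x, 0.3 * x**2 + 0.1 * x], axis=1)
    _, history = iterative_rectify(curved, iterations=5)
    for i, pts in enumerate(history):
        print(i, residual_height(pts))
```

Each pass leaves a smaller residual for the next one to estimate, which is the intuition behind correcting distortion in several small steps rather than in a single large warp.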

Citations

Journal Article

Scene Text Detection and Recognition: The Deep Learning Era

Shangbang Long, +2 more

- 01 Jan 2021 -

International Journal of Computer Vision

TL;DR: This survey summarizes and analyzes the major changes and significant progress in scene text detection and recognition in the deep learning era, highlights recent techniques and benchmarks, and looks ahead to future trends.

Proceedings Article

Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Deli Yu, +6 more

TL;DR: This work proposes a novel end-to-end trainable framework, the semantic reasoning network (SRN), for accurate scene text recognition, in which a global semantic reasoning module (GSRM) captures global semantic context through multi-way parallel transmission.

Proceedings Article

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Zhi Qiao, +4 more

TL;DR: This work proposes a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts and integrates the state-of-the-art ASTER method into the proposed framework as an exemplar.

Posted Content

Scene Text Detection and Recognition: The Deep Learning Era

Shangbang Long, +2 more

- 10 Nov 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This survey summarizes and analyzes the major changes and significant progress in scene text detection and recognition in the deep learning era.

Proceedings Article

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Shancheng Fang, +4 more

TL;DR: Fang et al. propose ABINet, an autonomous, bidirectional and iterative model for scene text recognition that blocks gradient flow between the vision and language models to enforce explicit language modeling.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This work proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting networks won 1st place in the ILSVRC 2015 classification task.

Proceedings Article

Effective Approaches to Attention-based Neural Machine Translation

Minh-Thang Luong, +2 more

TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.

Proceedings Article

Spatial transformer networks

Max Jaderberg, +3 more

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.

Journal Article

Principal warps: thin-plate splines and the decomposition of deformations

Fred L. Bookstein

- 01 Jun 1989 -

IEEE Transactions on Pattern Analysis an...

TL;DR: The decomposition of deformations by principal warps is demonstrated and the method is extended to deal with curving edges between landmarks to aid the extraction of features for analysis, comparison, and diagnosis of biological and medical images.

Journal Article

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi, +2 more

- 01 Nov 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work proposes a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Related Papers (5)

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi, +2 more

- 01 Nov 2017 -

IEEE Transactions on Pattern Analysis an...

End-to-end scene text recognition

Synthetic Data for Text Localisation in Natural Images

ICDAR 2013 Robust Reading Competition

Deep Residual Learning for Image Recognition