Open Access Proceedings Article

Focusing Attention: Towards Accurate Text Recognition in Natural Images

TLDR
This paper proposes the Focusing Attention Network (FAN), which employs a focusing attention mechanism to automatically draw back drifted attention. Existing attention-based methods perform poorly on complicated and/or low-quality images because they cannot obtain accurate alignments between feature areas and targets; FAN is designed to correct this "attention drift".
Abstract
Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon “attention drift”. To tackle this problem, in this paper we propose the FAN (the abbreviation of Focusing Attention Network) method that employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN pays attention properly to the target areas in the images. Furthermore, different from the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
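The abstract describes a two-part architecture: an attention decoder (AN) that recognizes characters, plus a focusing network (FN) that supervises where the attention lands. The sketch below is a much-simplified, hypothetical PyTorch illustration of that idea, not the authors' implementation: all module and parameter names are assumptions, and the focusing network here simply produces extra character logits from the attention-weighted features (instead of cropping the attended region as in the paper) so that an additional loss can penalize drifted attention.

```python
# Minimal sketch of the FAN idea: an attention decoder (AN) predicts
# characters, while a focusing module (FN) re-scores the attended region
# so drifted attention can be penalized by an extra supervision signal.
# Module names, sizes and the pooling-based FN are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionDecoder(nn.Module):
    """AN: standard attention-based character decoder."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=37):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)   # alignment scores
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats, state):
        # feats: (B, T, feat_dim) encoder features, state: (B, hidden_dim)
        expanded = state.unsqueeze(1).expand(-1, feats.size(1), -1)
        alpha = torch.softmax(
            self.score(torch.cat([feats, expanded], dim=-1)).squeeze(-1), dim=1
        )                                                  # (B, T) attention weights
        glimpse = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # (B, feat_dim)
        state = self.rnn(glimpse, state)
        return self.classifier(state), alpha, state


class FocusingNetwork(nn.Module):
    """FN: predicts character logits from the attended features so that a
    focusing loss can check whether attention covered the right region."""

    def __init__(self, feat_dim=512, num_classes=37):
        super().__init__()
        self.probe = nn.Linear(feat_dim, num_classes)

    def forward(self, feats, alpha):
        attended = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # weighted pooling
        return self.probe(attended)                          # focusing logits
```

In training, the total objective would combine the recognition loss from AN with the focusing loss from FN, which is what pulls drifted attention back toward the correct character regions.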

Citations
Journal Article

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
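The rectify-then-recognize pipeline summarized above can be sketched as two composed modules. ASTER itself uses a Thin-Plate-Spline transformation; the hypothetical sketch below substitutes a simple affine transform (via PyTorch's `affine_grid`/`grid_sample`) to keep the example short, and all layer sizes and names are assumptions rather than the paper's configuration.

```python
# Simplified rectify-then-recognize sketch: a localization head predicts a
# transform, the image is resampled, and a recognizer reads the result.
# An affine transform stands in for ASTER's Thin-Plate-Spline warp.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffineRectifier(nn.Module):
    """Predicts a 2x3 affine transform from the image and resamples it."""

    def __init__(self):
        super().__init__()
        self.localizer = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 6),
        )
        # start from the identity transform so early training is stable
        nn.init.zeros_(self.localizer[-1].weight)
        with torch.no_grad():
            self.localizer[-1].bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, images):                     # images: (B, 3, H, W)
        theta = self.localizer(images).view(-1, 2, 3)
        grid = F.affine_grid(theta, images.size(), align_corners=False)
        return F.grid_sample(images, grid, align_corners=False)


def recognize(images, rectifier, recognizer):
    """Rectify the image first, then predict the character sequence."""
    return recognizer(rectifier(images))
```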
Posted Content

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

TL;DR: This paper investigates the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images, and proposes an end-to-end trainable neural network model, named as Mask TextSpotter, which is inspired by the newly published work Mask R-CNN.
Proceedings Article

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

TL;DR: This paper shows that much of the apparent performance gap between scene text recognition (STR) models results from inconsistencies in the training and evaluation datasets, and introduces a unified four-stage STR framework under which existing models can be compared fairly.
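The four-stage view (transformation, feature extraction, sequence modeling, prediction) is essentially a modular pipeline in which alternative components can be swapped and compared under identical data. The skeleton below is a hypothetical illustration of that composition; the class and stage names are assumptions for this sketch, not the paper's code.

```python
# Illustrative skeleton of a four-stage scene text recognizer:
# transformation -> feature extraction -> sequence modeling -> prediction.
# Any stage can be replaced (or set to None) to compare design choices.
import torch.nn as nn


class FourStageSTR(nn.Module):
    def __init__(self, transform, extractor, sequencer, predictor):
        super().__init__()
        self.transform = transform      # e.g. None or a rectification module
        self.extractor = extractor      # e.g. a VGG / ResNet backbone
        self.sequencer = sequencer      # e.g. None or a BiLSTM
        self.predictor = predictor      # e.g. a CTC or attention decoder

    def forward(self, images):
        x = self.transform(images) if self.transform else images
        feats = self.extractor(x)                        # (B, T, C) sequence
        feats = self.sequencer(feats) if self.sequencer else feats
        return self.predictor(feats)                     # per-step class scores
```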
Proceedings Article

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

TL;DR: This paper proposes ESIR, an end-to-end trainable scene text recognition system that iteratively removes perspective distortion and text line curvature, with the rectification driven by better text recognition performance.
Proceedings Article

AON: Towards Arbitrarily-Oriented Text Recognition

TL;DR: The arbitrary orientation network (AON) is developed to directly capture the deep features of irregular texts, which are combined with an attention-based decoder to generate character sequences; the method is also comparable to major existing methods on regular datasets.
References
Proceedings Article

Speech recognition with deep recurrent neural networks

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Posted Content

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler · 22 Dec 2012
TL;DR: Presents ADADELTA, a novel per-dimension learning rate method for gradient descent that dynamically adapts over time using only first-order information and has minimal computational overhead beyond vanilla stochastic gradient descent.
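The per-dimension update rule is compact enough to show directly. The NumPy sketch below follows the update described in the paper (running averages of squared gradients and squared updates in place of a global learning rate); the function name and the state-dictionary convention are assumptions of this sketch, while rho and eps are the commonly used defaults.

```python
# Minimal NumPy sketch of the ADADELTA per-dimension update rule.
import numpy as np


def adadelta_step(params, grads, state, rho=0.95, eps=1e-6):
    """Apply one ADADELTA update in place; `state` holds the running averages."""
    eg2 = state.setdefault("eg2", np.zeros_like(params))    # E[g^2]
    edx2 = state.setdefault("edx2", np.zeros_like(params))  # E[dx^2]

    eg2[:] = rho * eg2 + (1 - rho) * grads ** 2              # accumulate gradient
    delta = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grads  # per-dim step
    edx2[:] = rho * edx2 + (1 - rho) * delta ** 2            # accumulate update

    params += delta
    return params
```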
Journal Article

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, achieving remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.
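The unified pipeline referenced here is commonly realized as a convolutional feature extractor feeding a recurrent sequence model whose per-frame outputs are transcribed (for example with a CTC loss). The PyTorch sketch below is a rough, hypothetical illustration in that spirit; layer sizes and class names are assumptions, not the paper's exact configuration.

```python
# Rough sketch of a convolutional-recurrent recognizer: CNN features are
# collapsed into a horizontal sequence, modeled by a bidirectional LSTM,
# and scored frame by frame (suitable for CTC-style transcription).
import torch
import torch.nn as nn


class ConvRecurrentRecognizer(nn.Module):
    def __init__(self, num_classes=37, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, images):                 # images: (B, 1, H, W)
        feats = self.cnn(images)               # (B, 128, H/4, W/4)
        feats = feats.mean(dim=2)              # collapse height -> (B, 128, W/4)
        feats = feats.permute(0, 2, 1)         # (B, W/4, 128) feature sequence
        seq, _ = self.rnn(feats)
        return self.fc(seq)                    # per-frame class scores
```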
Proceedings Article

Attention-based models for speech recognition

TL;DR: The authors show that an attention model adapted from machine translation reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task but only on utterances roughly as long as those it was trained on, and propose a location-aware attention mechanism that removes this limitation.
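Location awareness means the alignment score also looks at where attention was on the previous step, typically by convolving the previous attention weights and adding the result to the score. The sketch below is a hypothetical illustration of that scoring scheme; dimensions, kernel size and module names are assumptions rather than the paper's settings.

```python
# Sketch of location-aware attention scoring: the previous attention weights
# are convolved and fed into the alignment score, letting the model track how
# far it has advanced along the input sequence.
import torch
import torch.nn as nn


class LocationAwareAttention(nn.Module):
    def __init__(self, enc_dim=256, dec_dim=256, attn_dim=128, k=32, r=7):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.w_loc = nn.Linear(k, attn_dim, bias=False)
        self.conv = nn.Conv1d(1, k, kernel_size=2 * r + 1, padding=r)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc, dec_state, prev_alpha):
        # enc: (B, T, enc_dim), dec_state: (B, dec_dim), prev_alpha: (B, T)
        loc = self.conv(prev_alpha.unsqueeze(1)).transpose(1, 2)   # (B, T, k)
        scores = self.v(torch.tanh(
            self.w_enc(enc) + self.w_dec(dec_state).unsqueeze(1) + self.w_loc(loc)
        )).squeeze(-1)                                             # (B, T)
        alpha = torch.softmax(scores, dim=1)
        context = (alpha.unsqueeze(-1) * enc).sum(dim=1)           # (B, enc_dim)
        return context, alpha
```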