Focusing Attention: Towards Accurate Text Recognition in Natural Images

doi:10.1109/ICCV.2017.543

Open Access Proceedings Article

Zhanzhan Cheng, +5 more

pp. 5086-5094

TL;DR

Cheng et al. propose the Focusing Attention Network (FAN), which uses a focusing attention mechanism to automatically draw back drifted attention. The method targets complicated and low-quality images, on which existing attention-based methods fail to align feature areas accurately with their targets.

Abstract:

Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework, which learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon “attention drift”. To tackle this problem, in this paper we propose the Focusing Attention Network (FAN), which employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN attends properly to the target areas in the images. Furthermore, unlike the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
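The notion of "attention drift" can be made concrete with a small sketch. Assume a decoder that, at each step, computes soft attention weights over encoder feature columns, and assume each column's receptive field is centered at a known x-coordinate in the image. The attended position is then the weighted average of those centers, and drift can be measured as the distance from that position to the target character's span. This is a hypothetical illustration in plain Python: the dot-product score, the helper names (`attend`, `attention_center`, `drift`) and the 1-D geometry are all assumptions, not FAN's implementation, whose focusing network is a learned component.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Soft attention weights over encoder feature columns.

    Dot-product scoring is an assumption made for brevity; attention-based
    recognizers often use a learned (additive) scoring function instead.
    """
    scores = [sum(f * q for f, q in zip(column, query)) for column in features]
    return softmax(scores)


def attention_center(weights: Sequence[float], centers: Sequence[float]) -> float:
    """Attended image position: sum_j alpha_j * x_j over receptive-field centers x_j."""
    return sum(a * x for a, x in zip(weights, centers))


def drift(center: float, target: Tuple[float, float]) -> float:
    """Distance from the attended position to the target character span (0 if inside)."""
    lo, hi = target
    if center < lo:
        return lo - center
    if center > hi:
        return center - hi
    return 0.0


if __name__ == "__main__":
    # Hypothetical toy setup: 4 feature columns with receptive fields centered at these x-positions.
    centers = [8.0, 24.0, 40.0, 56.0]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    query = [0.0, 2.0]  # hypothetical decoder state for the current character

    weights = attend(features, query)
    c = attention_center(weights, centers)
    print(f"attended x = {c:.1f}")
    print(f"drift vs [40, 56] = {drift(c, (40.0, 56.0)):.1f}")
    print(f"drift vs [16, 32] = {drift(c, (16.0, 32.0)):.1f}")
```

With these toy inputs the attention splits evenly between the second and third columns, so the attended position is x = 32. A character annotated at [40, 56] therefore yields a drift of 8 units, while one at [16, 32] yields none; in the spirit of the FN, a training signal could penalize the first case.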

Citations

Posted Content

Convolutional Character Networks

Linjie Xing, +3 more

17 Oct 2019

arXiv: Computer Vision and Pattern Recog...

TL;DR: CharNet directly outputs bounding boxes of words and characters, together with the corresponding character labels. By using the character as the basic element, it overcomes the main difficulty of existing approaches, which attempted to optimize text detection jointly with an RNN-based recognition branch.

Posted Content

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Liang Qiao, +6 more

08 Dec 2020

arXiv: Computer Vision and Pattern Recog...

TL;DR: The authors propose MANGO, a novel Mask AttentioN Guided One-stage text spotting framework in which character sequences are recognized directly without RoI operations. It achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks.

Posted Content

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Deli Yu, +5 more

27 Mar 2020

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.

Journal Article

A Robust License Plate Recognition Model Based on Bi-LSTM

Yongjie Zou, +6 more

24 Nov 2020

IEEE Access

TL;DR: The paper proposes a robust license plate recognition model comprising license plate feature extraction, license plate character localization, and character feature extraction; the authors report experiments supporting the model's effectiveness and robustness.

Posted Content

Attention, please! A survey of Neural Attention Models in Deep Learning.

Alana de Santana Correia, +1 more

31 Mar 2021

arXiv: Learning

TL;DR: This article gives a comprehensive overview and analysis of neural attention models: the authors systematically review hundreds of architectures in the area, identifying and discussing those in which attention has had a significant impact.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; an ensemble of residual networks won 1st place in the ILSVRC 2015 classification task.

Journal Article

Learning representations by back-propagating errors

David E. Rumelhart, +2 more

1986

Nature

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in a network so as to minimize a measure of the difference between the network's actual output vector and the desired output vector; as a result, internal hidden units come to represent important features of the task domain.

Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

TL;DR: The authors conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Proceedings Article

Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, +2 more

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

Proceedings Article

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia, +7 more

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Related Papers (5)

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi, +2 more

01 Nov 2017

IEEE Transactions on Pattern Analysis an...

arXiv: Computer Vision and Pattern Recog...

Synthetic Data for Text Localisation in Natural Images

End-to-end scene text recognition

ICDAR 2013 Robust Reading Competition

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition