Focusing Attention: Towards Accurate Text Recognition in Natural Images
Zhanzhan Cheng, Fan Bai, Yunlu Xu, Gang Zheng, Shiliang Pu, Shuigeng Zhou
- pp 5086-5094
TLDR
Cheng et al. propose the Focusing Attention Network (FAN), which employs a focusing attention mechanism to automatically draw back drifted attention. The motivation is that existing attention-based methods perform poorly on complicated and low-quality images because they cannot obtain accurate alignment between feature areas and targets for such images.
Abstract
Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon “attention drift”. To tackle this problem, in this paper we propose the FAN (the abbreviation of Focusing Attention Network) method that employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN attends properly to the target areas in the images. Furthermore, different from the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
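As an illustration of the mechanism the abstract describes, the following is a minimal sketch, not the authors' implementation: Bahdanau-style additive attention over encoder feature columns, plus a greatly simplified focusing-style check that flags "attention drift" when the attention's center of mass strays from an assumed target position. All names, shapes, the target position, and the tolerance are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(features, query, Wf, Wq, v):
    # features: (T, D) encoder feature columns; query: (H,) decoder state
    # additive (Bahdanau-style) scoring followed by normalization
    scores = np.tanh(features @ Wf + query @ Wq) @ v  # (T,)
    return softmax(scores)

def attention_center(alpha):
    # center of mass of the attention distribution over feature positions
    return float(np.sum(alpha * np.arange(len(alpha))))

rng = np.random.default_rng(0)
T, D, H, A = 8, 4, 3, 5  # positions, feature dim, state dim, attention dim
features = rng.normal(size=(T, D))
query = rng.normal(size=(H,))
Wf = rng.normal(size=(D, A))
Wq = rng.normal(size=(H, A))
v = rng.normal(size=(A,))

alpha = additive_attention(features, query, Wf, Wq, v)
center = attention_center(alpha)

# Focusing-style check (hypothetical simplification of FN): flag "drift"
# when the attended center strays too far from the expected target position.
target_pos, tol = 3.0, 2.0
drifted = abs(center - target_pos) > tol
```

In the actual FAN method the focusing network operates on localized attention regions and ground-truth character positions rather than a scalar center of mass; this sketch only conveys the idea of evaluating whether attention lands on the intended target area.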
Citations
Journal ArticleDOI
Convolutional Neural Networks with Gated Recurrent Connections.
Jianfeng Wang, Xiaolin Hu
TL;DR: Gated Recurrent Convolutional Neural Networks (GRCNNs), as discussed by the authors, modify the receptive fields (RFs) of neurons by introducing gates to the recurrent connections; the gates control the amount of context information input to the neurons, so the neurons' RFs become adaptive.
Proceedings ArticleDOI
Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista, Rowel Atienza
TL;DR: This method, PARSeq, learns an ensemble of internal autoregressive (AR) language models with shared weights via Permutation Language Modeling, unifying context-free non-AR inference, context-aware AR inference, and iterative refinement using bidirectional context.
Journal ArticleDOI
FREE: A Fast and Robust End-to-End Video Text Spotter
Zhanzhan Cheng, Jing Lu, Baorui Zou, Liang Qiao, Yunlu Xu, Shiliang Pu, Yi Niu, Fei Wu, Shuigeng Zhou
TL;DR: A fast and robust end-to-end video text spotting framework named FREE is proposed; it recognizes the localized text stream once rather than frame by frame, which greatly speeds up the text spotting process and achieves state-of-the-art performance.
Proceedings ArticleDOI
Text Recognition in Images Based on Transformer with Hierarchical Attention
TL;DR: A new Transformer-like structure for text recognition in images, referred to as the Hierarchical Attention Transformer Network (HATN), is proposed; it can be trained end-to-end using only images and sentence-level annotations.
Book ChapterDOI
Constrained Relation Network for Character Detection in Scene Images
TL;DR: A new module named the constrained relation module is proposed; it utilizes both geometric and contextual information to exploit the strong relationship between characters, and is the first work to use contextual information among texts for character detection in scene images.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
Journal ArticleDOI
Learning representations by back-propagating errors
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Proceedings ArticleDOI
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.