Open Access, Proceedings Article

Focusing Attention: Towards Accurate Text Recognition in Natural Images

TLDR
The authors propose the Focusing Attention Network (FAN), which employs a focusing attention mechanism to automatically draw back drifted attention. It targets complicated and low-quality images, on which existing attention-based methods fail to align feature areas accurately with their targets.
Abstract
Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework, which learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon “attention drift”. To tackle this problem, in this paper we propose the Focusing Attention Network (FAN), a method that employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN attends properly to the target areas in the images. Furthermore, unlike the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
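To make "attention drift" concrete, the following is a minimal sketch of one soft-attention decoding step over a sequence of encoder feature vectors. It is not the authors' implementation: the function names (`attend`, `attention_center`, `drift`), the use of the expected position as an "attention center", and the distance-based drift measure are all assumptions chosen for illustration. The abstract does not specify how the FN evaluates or corrects attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """One soft-attention step.

    features: list of equally sized feature vectors, one per image region.
    scores:   unnormalized alignment scores, one per region (assumed given).
    Returns the alignment weights and the weighted-sum "glimpse" vector.
    """
    weights = softmax(scores)
    dim = len(features[0])
    glimpse = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, glimpse


def attention_center(weights):
    """Expected region index under the alignment weights (illustrative)."""
    return sum(i * w for i, w in enumerate(weights))


def drift(weights, target_index):
    """Hypothetical drift measure: distance between the attention center
    and the region index where the target character actually lies."""
    return abs(attention_center(weights) - target_index)


# Toy usage: three regions, target character in region 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
focused, _ = attend(features, [0.0, 0.0, 50.0])   # mass on region 2
drifted, _ = attend(features, [50.0, 0.0, 0.0])   # mass on region 0
print(drift(focused, 2) < drift(drifted, 2))
```

In this toy setting, a decoder whose weights concentrate on the wrong region yields a large drift value, which is the kind of misalignment the abstract says the FN is meant to detect and correct.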


Citations
Proceedings Article

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

TL;DR: In this article, a self-attention-based neural network model with focal loss is proposed for scene text recognition; using focal loss instead of negative log-likelihood helps the model focus more on low-frequency samples during training.
Book Chapter

Text Detection by Jointly Learning Character and Word Regions

TL;DR: The authors propose a weakly supervised method that makes full use of the word-level labels of real datasets to generate character-level pseudo-labels for training.
Patent

Recognition method and device

TL;DR: A recognition method and device are presented that address the poor recognition performance of electronic devices on scene text. The method comprises obtaining a to-be-recognized text image that contains at least one character, extracting image features from it with a deep residual network model, and feeding the resulting feature matrix into a bidirectional long short-term memory (BLSTM) model, whose output indicates the characters contained in the …
Proceedings Article

Visual Recognition of Container Number with Arbitrary Orientations Based on Deep Convolutional Neural Network

TL;DR: Experimental results show that the pipeline achieves excellent detection and recognition of container numbers with arbitrary orientations and various arrangements; the total accuracy of the proposed approach currently exceeds 85%.
Book Chapter

Improving Machine Understanding of Human Intent in Charts

TL;DR: The authors focus on three key sub-tasks (chart image classification, text detection and recognition, and text role classification) and propose a set of effective methods for them.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; residual networks built on it won 1st place on the ILSVRC 2015 classification task.
Journal Article

Learning representations by back-propagating errors

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this architecture by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Proceedings Article

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.