Accurate Scene Text Recognition Based on Recurrent Neural Network

doi:10.1007/978-3-319-16865-4_3

Book Chapter

Bolan Su, +1 more

- pp 35-48

TLDR

This paper presents a novel approach to recognize text in scene images that outperforms the state-of-the-art techniques significantly and is able to recognize the whole word images without character-level segmentation and recognition.

Abstract:

Scene text recognition is a useful but very challenging task because text in natural scenes appears under uncontrolled conditions. This paper presents a novel approach to recognizing text in scene images. In the proposed technique, a word image is first converted into a sequence of column feature vectors based on Histogram of Oriented Gradients (HOG). A Recurrent Neural Network (RNN) is then adapted to classify the sequential feature vectors into the corresponding word. Unlike most existing methods, which follow a bottom-up approach and form words by grouping recognized characters, the proposed method recognizes whole word images without character-level segmentation and recognition. Experiments on a number of publicly available datasets show that the proposed method significantly outperforms state-of-the-art techniques. In addition, the recognition results on publicly available datasets provide a good benchmark for future research in this area.
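
The first step described in the abstract, turning a word image into a sequence of column feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `column_hog_sequence`, the window width, the step size, and the number of orientation bins are all assumptions chosen for illustration, and the descriptor is a simplified single-cell orientation histogram without the block normalization of full HOG.

```python
import numpy as np


def column_hog_sequence(image, win_width=8, step=4, n_bins=9):
    """Convert a grayscale word image (H x W) into a sequence of
    orientation histograms, one per sliding vertical window.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape
    if width < win_width:
        raise ValueError("image is narrower than the sliding window")

    # Per-pixel gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)

    # Unsigned gradient orientation folded into [0, pi).
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bin_index = (orientation / np.pi * n_bins).astype(int)
    bin_index = np.minimum(bin_index, n_bins - 1)  # guard the upper edge

    features = []
    for left in range(0, width - win_width + 1, step):
        window = slice(left, left + win_width)
        # Magnitude-weighted histogram of orientations in this window.
        hist = np.bincount(
            bin_index[:, window].ravel(),
            weights=magnitude[:, window].ravel(),
            minlength=n_bins,
        )
        norm = np.linalg.norm(hist)
        features.append(hist / norm if norm > 0 else hist)

    # Shape: (number of windows, n_bins), ordered left to right.
    return np.stack(features)
```

The resulting array is ordered left to right, so each row can be fed to a recurrent network as one time step. With the default settings assumed here, a 32 x 100 image yields 24 feature vectors of length 9.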

Citations

Journal Article

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

Baoguang Shi, +2 more

- 01 Nov 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work proposes a novel neural network architecture that integrates feature extraction, sequence modeling, and transcription into a unified framework, and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.

Proceedings Article

Robust Scene Text Recognition with Automatic Rectification

Baoguang Shi, +4 more

TL;DR: This article proposed a robust text recognizer with automatic rectification (RARE), which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN).

Journal Article

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

Baoguang Shi, +5 more

- 01 Sep 2019 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.

Proceedings Article

FOTS: Fast Oriented Text Spotting with a Unified Network

Xuebo Liu, +5 more

TL;DR: In this article, a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network is proposed for simultaneous detection and recognition, sharing computation and visual information among the two complementary tasks.

Journal Article

Scene text detection and recognition: recent advances and future trends

Yingying Zhu, +2 more

- 01 Feb 2016 -

Frontiers of Computer Science

TL;DR: This literature review identifies state-of-the-art algorithms, predicts potential research directions, and can serve as a good reference for researchers in the areas of scene text detection and recognition.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

Journal Article

Learning to Forget: Continual Prediction with LSTM

Felix A. Gers, +2 more

- 01 Oct 2000 -

Neural Computation

TL;DR: This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Proceedings Article

Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Alex Graves, +1 more

TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.

Journal Article

2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Alex Graves, +1 more

- 01 Jun 2005 -

Neural Networks

TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.
