Book Chapter
Accurate Scene Text Recognition Based on Recurrent Neural Network
Bolan Su, Shijian Lu
pp. 35-48
TL;DR: This paper presents a novel approach to recognizing text in scene images that significantly outperforms state-of-the-art techniques and can recognize whole word images without character-level segmentation and recognition.
Abstract:
Scene text recognition is a useful but very challenging task due to the uncontrolled conditions of text in natural scenes. This paper presents a novel approach to recognizing text in scene images. In the proposed technique, a word image is first converted into a sequence of column feature vectors based on the Histogram of Oriented Gradients (HOG). A Recurrent Neural Network (RNN) is then adapted to classify the sequential feature vectors into the corresponding word. Whereas most existing methods follow a bottom-up approach that forms words by grouping recognized characters, the proposed method recognizes whole word images without character-level segmentation and recognition. Experiments on a number of publicly available datasets show that the proposed method significantly outperforms state-of-the-art techniques. In addition, the recognition results on these datasets provide a good benchmark for future research in this area.
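The abstract does not give the window geometry or HOG parameters, so the following is only a minimal sketch of the general idea of turning a word image into a left-to-right sequence of HOG-style column feature vectors. Every parameter here (window width `window`, stride `step`, cells per column `n_cells`, and 9 unsigned orientation bins, borrowed from Dalal and Triggs' default) is an assumption, the helper names `orientation_histogram` and `column_feature_sequence` are hypothetical, and the per-cell L2 normalization is a simplification of HOG's overlapping block normalization.

```python
import math

def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)
    over a 2-D patch given as a list of rows; roughly one HOG cell."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the patch borders.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = int(angle / (180.0 / n_bins)) % n_bins        # % guards the 180.0 edge case
            hist[b] += mag
    # Simplified normalization: L2 per cell (real HOG normalizes overlapping blocks).
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def column_feature_sequence(image, window=4, step=2, n_cells=2, n_bins=9):
    """Slide a window of `window` columns across the image with stride `step`.
    Each window is split vertically into `n_cells` cells whose histograms are
    concatenated, giving one feature vector (frame) per window position."""
    h, w = len(image), len(image[0])
    cell_h = h // n_cells  # assumes the image has at least n_cells rows
    seq = []
    for x0 in range(0, w - window + 1, step):
        vec = []
        for c in range(n_cells):
            rows = image[c * cell_h:(c + 1) * cell_h]
            patch = [row[x0:x0 + window] for row in rows]
            vec.extend(orientation_histogram(patch, n_bins))
        seq.append(vec)
    return seq
```

With the default parameters, an 8×16 image yields 7 frames of 18 values each (2 cells × 9 bins); a recurrent network would then consume these frames in order, one per time step.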
Citations
Journal Article
An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi, Xiang Bai, Cong Yao
TL;DR: This paper proposes a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework, and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.
Proceedings Article
Robust Scene Text Recognition with Automatic Rectification
TL;DR: This article proposes a robust text recognizer with automatic rectification (RARE), which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN).
Journal Article
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
TL;DR: This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
Proceedings Article
FOTS: Fast Oriented Text Spotting with a Unified Network
TL;DR: In this article, a unified end-to-end trainable Fast Oriented Text Spotting (FOTS) network is proposed for simultaneous detection and recognition, sharing computation and visual information between the two complementary tasks.
Journal Article
Scene text detection and recognition: recent advances and future trends
Yingying Zhu, Cong Yao, Xiang Bai
TL;DR: This literature review identifies state-of-the-art algorithms for scene text detection and recognition and predicts potential future research directions; it can serve as a good reference for researchers in these areas.
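The CRNN entry above describes a transcription stage that turns per-frame predictions into a label sequence. In CRNN that stage is based on Connectionist Temporal Classification (CTC). The sketch below shows only the simplest lexicon-free decoding rule, best-path (greedy) decoding: take the most probable label in each frame, merge consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode` and the convention that index 0 is the blank are assumptions for illustration; the Su and Lu abstract does not state which decoding rule their method uses.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: one list of label probabilities per frame.
    alphabet:    label index -> character; index `blank` is the CTC blank.
    """
    # 1. Most probable label in each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out = []
    prev = None
    for label in best:
        # 2. Merge consecutive repeats, 3. drop blanks.
        if label != prev and label != blank:
            out.append(alphabet[label])
        prev = label
    return "".join(out)

alphabet = ["-", "a", "b"]  # "-" is the blank
probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.1, 0.8],    # b
    [0.2, 0.2, 0.6],    # b (repeat, merged)
]
print(ctc_greedy_decode(probs, alphabet))  # -> aab
```

The blank is what lets the decoder emit a genuinely doubled letter: without the blank frame in the middle, the first four frames would collapse to a single "a".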
References
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal Article
Learning to Forget: Continual Prediction with LSTM
TL;DR: This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
Proceedings Article
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
Alex Graves, Jürgen Schmidhuber
TL;DR: In this article, a modified, full-gradient version of the LSTM learning algorithm is applied to framewise phoneme classification on the TIMIT database; the results support the view that contextual information is crucial to speech processing and suggest that bidirectional networks outperform unidirectional ones.
Journal Article
2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures
Alex Graves, Jürgen Schmidhuber
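The LSTM, forget-gate, and bidirectional LSTM references above describe recurrent building blocks that sequence recognizers of this kind commonly rely on. The following pure-Python sketch shows one LSTM step with input, output, and forget gates, plus a bidirectional wrapper that runs one cell left-to-right and another right-to-left and concatenates their outputs frame by frame. The random small weights, the positive forget-gate bias, the absence of peephole connections, and the names `LSTMCell` and `bidirectional` are all assumptions; this illustrates the forward pass only and is not a trainable implementation or the configuration used in any cited paper. With the forget gate fixed at 1, the cell-state update reduces to adding gated input to the previous state, which is the constant error carousel of the original LSTM.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """LSTM cell with input (i), forget (f) and output (o) gates; no peepholes."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n = n_in + n_hidden

        def mat():  # hypothetical init: small uniform random weights
            return [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_hidden)]

        self.W_i, self.W_f, self.W_o, self.W_g = mat(), mat(), mat(), mat()
        self.b_i = [0.0] * n_hidden
        self.b_f = [1.0] * n_hidden  # positive bias: the cell starts out mostly remembering
        self.b_o = [0.0] * n_hidden
        self.b_g = [0.0] * n_hidden
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = list(x) + list(h)  # current input concatenated with previous output
        i = [sigmoid(a + b) for a, b in zip(matvec(self.W_i, z), self.b_i)]
        f = [sigmoid(a + b) for a, b in zip(matvec(self.W_f, z), self.b_f)]
        o = [sigmoid(a + b) for a, b in zip(matvec(self.W_o, z), self.b_o)]
        g = [math.tanh(a + b) for a, b in zip(matvec(self.W_g, z), self.b_g)]
        # Forget gate scales the old cell state; input gate admits new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, seq):
        """Process a whole sequence from a zero state; return the output per frame."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in seq:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs

def bidirectional(seq, fwd, bwd):
    """Concatenate a left-to-right pass and a right-to-left pass per frame."""
    out_f = fwd.run(seq)
    out_b = bwd.run(seq[::-1])[::-1]  # reverse back so both halves refer to the same frame
    return [hf + hb for hf, hb in zip(out_f, out_b)]
```

Because the backward cell reads the whole sequence before reaching the first frame, the output at frame 0 already reflects later frames, so each frame sees context from both sides.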