Open Access Posted Content

Scene Text Detection and Recognition: The Deep Learning Era

Abstract
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach, and performance. This survey aims to summarize and analyze the major changes and significant progress of scene text detection and recognition in the deep learning era. Through this article, we set out to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that remain. We expect this review to serve as a reference for researchers in this field. Related resources are also collected and compiled in our GitHub repository: this https URL.

