Open Access, Posted Content
Scene Text Detection and Recognition: The Deep Learning Era
Shangbang Long, Xin He, Cong Yao
TLDR
This survey summarizes and analyzes the major changes and significant progress in scene text detection and recognition in the deep learning era.
Abstract:
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, approach, and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. Through this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that still remain. We expect that this review paper will serve as a reference for researchers in this field. Related resources are also collected and compiled in our GitHub repository: this https URL.
Citations
Journal Article
MASTER: Multi-aspect non-local network for scene text recognition
TL;DR: The authors propose MASTER, a self-attention-based scene text recognizer that not only encodes input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder. It trains efficiently because of high training parallelization and infers quickly because of an efficient memory-cache mechanism.
Posted Content
Decoupled Attention Network for Text Recognition
Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai
TL;DR: This work proposes a decoupled attention network (DAN) that decouples the alignment operation from the use of historical decoding results and achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition.
Posted Content
Text Recognition in the Wild: A Survey
TL;DR: This literature review attempts to present the entire picture of the field of scene text recognition, providing a comprehensive reference for people entering the field and potentially inspiring future research.
Journal Article
Text Recognition in the Wild: A Survey
TL;DR: This literature review summarizes the fundamental problems and the state of the art in scene text recognition, introduces new insights and ideas, provides a comprehensive review of publicly available resources, and points out directions for future work.
Posted Content
Towards Unconstrained End-to-End Text Spotting
TL;DR: This article proposes an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading irregularly shaped scene text.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Book Chapter
U-Net: Convolutional Networks for Biomedical Image Segmentation
TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.