Open Access Proceedings Article (DOI)

AON: Towards Arbitrarily-Oriented Text Recognition

TLDR
The arbitrary orientation network (AON) is developed to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequences; the method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.
Abstract
Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing text from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequences. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets, and is comparable to major existing methods on regular datasets.



Citations
Posted Content

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

TL;DR: This paper investigates the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images, and proposes an end-to-end trainable neural network model, named as Mask TextSpotter, which is inspired by the newly published work Mask R-CNN.
Journal Article (DOI)

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

TL;DR: This work proposes an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations, and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks.
Journal Article (DOI)

MORAN: A Multi-Object Rectified Attention Network for scene text recognition

TL;DR: A multi-object rectified attention network (MORAN) for general scene text recognition that can read both regular and irregular scene text and achieves state-of-the-art performance.
Proceedings Article (DOI)

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

TL;DR: This paper observes that reported performance gaps between scene text recognition (STR) models largely result from inconsistencies in training and evaluation datasets, and introduces a unified four-stage STR framework to compare models consistently.
Proceedings Article (DOI)

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

TL;DR: An end-to-end trainable scene text recognition system (ESIR) is proposed that iteratively removes perspective distortion and text line curvature, driven by improved text recognition performance.
References
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Proceedings Article (DOI)

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.