Open Access Proceedings Article (DOI)

AON: Towards Arbitrarily-Oriented Text Recognition

TLDR
The arbitrary orientation network (AON) is developed to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequences; the method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.
Abstract
Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing text from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequences. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets, and is comparable to major existing methods on regular datasets.



Citations
Posted Content

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

TL;DR: This paper investigates the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images, and proposes an end-to-end trainable neural network model, named as Mask TextSpotter, which is inspired by the newly published work Mask R-CNN.
Journal Article (DOI)

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

TL;DR: This work proposes an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations, and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks.
Journal Article (DOI)

MORAN: A Multi-Object Rectified Attention Network for scene text recognition

TL;DR: A multi-object rectified attention network (MORAN) for general scene text recognition that can read both regular and irregular scene text and achieves state-of-the-art performance.
Proceedings Article (DOI)

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

TL;DR: This paper observes that reported performance gaps between scene text recognition (STR) models largely result from inconsistencies in training and evaluation datasets, and introduces a unified four-stage STR framework to compare models consistently.
Proceedings Article (DOI)

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

TL;DR: An end-to-end trainable scene text recognition system (ESIR) is proposed that iteratively removes perspective distortion and text line curvature, driven by improved text recognition performance.
References
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Proceedings Article (DOI)

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.