Open Access Posted Content

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

TLDR
This work presents a framework for the recognition of natural scene text that does not require any human-labelled data and performs word recognition on the whole image holistically, departing from the character-based recognition systems of the past.
Abstract
In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character-based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engine. This synthetic data is highly realistic and sufficient to replace real data, giving us effectively unlimited amounts of training data. This abundance of data opens up new possibilities for word recognition models, and here we consider three models, each one "reading" words in a different way: via 90k-way dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. In the scenarios of language-based and completely unconstrained text recognition we greatly improve upon state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.
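
The three ways of "reading" a word named in the abstract can be illustrated with a minimal sketch of the target encodings. This is not the authors' implementation: the alphabet, the padding class `NULL`, the maximum length `MAX_LEN`, the toy lexicon, and the toy N-gram vocabulary are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Set

# Assumed character set and padding class; not taken from the paper.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
NULL = len(ALPHABET)  # extra class index used to pad short words
MAX_LEN = 8           # assumed maximum word length for this toy example


def dictionary_encode(word: str, lexicon: List[str]) -> int:
    """Dictionary encoding: the word is a single class in a fixed lexicon."""
    return lexicon.index(word.lower())


def char_sequence_encode(word: str, max_len: int = MAX_LEN) -> List[int]:
    """Character sequence encoding: one class index per character position."""
    if len(word) > max_len:
        raise ValueError("word is longer than max_len")
    codes = [ALPHABET.index(c) for c in word.lower()]
    return codes + [NULL] * (max_len - len(codes))


def ngrams(word: str, max_n: int = 4) -> Set[str]:
    """All substrings of length 1..max_n that occur in the word."""
    w = word.lower()
    return {w[i:i + n] for n in range(1, max_n + 1)
            for i in range(len(w) - n + 1)}


def bag_of_ngrams_encode(word: str, vocab: List[str], max_n: int = 4) -> List[int]:
    """Bag-of-N-grams encoding: binary indicator per N-gram in a fixed vocabulary."""
    present = ngrams(word, max_n)
    return [1 if g in present else 0 for g in vocab]


# Toy usage with a hypothetical three-word lexicon and six-entry N-gram vocabulary.
toy_lexicon = ["cat", "spires", "word"]
toy_vocab = ["a", "at", "b", "ca", "cat", "ta"]
print(dictionary_encode("spires", toy_lexicon))
print(char_sequence_encode("cat"))
print(bag_of_ngrams_encode("cat", toy_vocab))
```

The dictionary encoding can only name words in the lexicon, whereas the character sequence and bag-of-N-grams encodings can describe words outside it, which is what makes unconstrained recognition possible in principle.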


Citations
Journal Article

AUTNT - A component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and D-CNN

TL;DR: A new multi-script dataset of text and non-text components is reported along with multi-purpose ground truth annotations, and a Deep Convolution Neural Network (D-CNN) based automated feature extraction and classification framework is developed for benchmarking purposes.
Journal Article

Semi-Supervised Pixel-Level Scene Text Segmentation by Mutually Guided Network

TL;DR: Zhang et al. propose a mutually guided network that produces a polygon-level mask in one branch and a pixel-level text mask in the other. The two masks serve as guidance for each other, and the whole network is trained via a semi-supervised learning strategy.
Book Chapter

AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks

Dmitrijs Kass, +1 more
TL;DR: The authors proposed an attention-based sequence-to-sequence model for handwritten word recognition and explored transfer learning for data-efficient training of HTR systems, which leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models.
Proceedings Article

A Probabilistic Retrieval Model for Word Spotting Based on Direct Attribute Prediction

TL;DR: This work presents a new approach for ranking retrieval lists originally proposed for zero-shot learning where attribute representations play an important role, and shows that this probabilistic ranking improves word spotting performance, especially in the query-by-string scenario.
Posted Content

KISS: Keeping It Simple for Scene Text Recognition.

TL;DR: A new model for scene text recognition that consists only of off-the-shelf building blocks for neural networks, which reaches state-of-the-art or competitive performance, although the model does not use methods like 2D-attention or image rectification.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Journal Article

Gradient-based learning applied to document recognition

TL;DR: This article shows that gradient-based learning can be used to synthesize a complex decision surface that classifies high-dimensional patterns, such as handwritten characters, and proposes graph transformer networks (GTNs) for training multi-module document recognition systems globally.
Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

TL;DR: The authors randomly omit half of the feature detectors on each training case, which prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Journal Article

DRC: a dual route cascaded model of visual word recognition and reading aloud.

TL;DR: The DRC model is a computational realization of the dual-route theory of reading, and is the only computational model of reading that can perform the 2 tasks most commonly used to study reading: lexical decision and reading aloud.
Proceedings Article

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

TL;DR: This article presents an integrated framework that applies a multiscale, sliding-window approach within a convolutional network and learns to predict object boundaries. Bounding boxes are accumulated rather than suppressed in order to increase detection confidence, and OverFeat won the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013.