Open Access | Posted Content
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
Abstract:
In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character-based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engine. This synthetic data is highly realistic and sufficient to replace real data, giving us infinite amounts of training data. This excess of data exposes new possibilities for word recognition models, and here we consider three models, each one "reading" words in a different way: via 90k-way dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. In both the language-based and the completely unconstrained text recognition scenarios, we greatly improve upon state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.
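To make the three "reading" schemes concrete, the following is a minimal sketch of how a single word label could be turned into a training target under each encoding. It is an illustration only, not the authors' implementation: the character set, the maximum word length, the maximum N-gram order, and all function names are assumptions introduced here, and the tiny lexicon stands in for the 90k-word dictionary mentioned in the abstract.

```python
from typing import List, Set

# Assumed character set (case-insensitive letters plus digits); not from the source.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
NULL = len(ALPHABET)   # extra class index used to pad short words
MAX_LEN = 23           # assumed maximum word length


def dictionary_target(word: str, lexicon: List[str]) -> int:
    """Dictionary encoding: the target is the word's index in a fixed lexicon."""
    return lexicon.index(word.lower())


def char_sequence_target(word: str, max_len: int = MAX_LEN) -> List[int]:
    """Character sequence encoding: one class index per position, null-padded."""
    if len(word) > max_len:
        raise ValueError("word is longer than max_len")
    ids = [ALPHABET.index(c) for c in word.lower()]
    return ids + [NULL] * (max_len - len(ids))


def ngrams(word: str, max_n: int = 4) -> Set[str]:
    """All contiguous substrings of length 1..max_n (order and counts discarded)."""
    w = word.lower()
    return {w[i:i + n] for n in range(1, max_n + 1) for i in range(len(w) - n + 1)}


def bag_of_ngrams_target(word: str, vocab: List[str], max_n: int = 4) -> List[int]:
    """Bag-of-N-grams encoding: binary vector marking which vocabulary N-grams occur."""
    present = ngrams(word, max_n)
    return [1 if g in present else 0 for g in vocab]


if __name__ == "__main__":
    lexicon = ["cat", "dog", "spires"]       # stand-in for a 90k-word dictionary
    vocab = ["a", "at", "do", "t"]           # stand-in for an N-gram vocabulary
    print(dictionary_target("dog", lexicon))
    print(char_sequence_target("ab", max_len=4))
    print(bag_of_ngrams_target("cat", vocab, max_n=2))
```

The dictionary target restricts predictions to words in the lexicon, while the character sequence and bag-of-N-grams targets can in principle describe any string over the assumed alphabet, which is what makes unconstrained recognition possible.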