Proceedings ArticleDOI

A Simple and Effective Solution for Script Identification in the Wild

TL;DR: This work presents an approach for automatically identifying the script of text localized in scene images using an off-the-shelf classifier; the method is efficient and requires very little labeled data.
Abstract: We present an approach for automatically identifying the script of text localized in scene images. Our approach is inspired by advancements in mid-level features. We represent text images using mid-level features pooled from densely computed local features. Once text images are represented using the proposed mid-level feature representation, we use an off-the-shelf classifier to identify the script of the text image. Our approach is efficient and requires very little labeled data. We evaluate the performance of our method on the recently introduced CVSI dataset, demonstrating that the proposed approach can correctly identify the script of 96.70% of the text images. In addition, we introduce and benchmark a more challenging Indian Language Scene Text (ILST) dataset for evaluating the performance of our method.
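The pipeline in the abstract (densely computed local features, pooled into a mid-level representation, fed to an off-the-shelf classifier) can be sketched roughly as follows. The raw-patch descriptors, tiny k-means codebook, and exponential soft assignment here are illustrative stand-ins, not the authors' exact features:

```python
import numpy as np

def dense_local_features(image, patch=8, stride=4):
    """Densely sampled, L2-normalized raw-patch descriptors (a stand-in for
    SIFT-like local features)."""
    H, W = image.shape
    feats = []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            p = image[y:y + patch, x:x + patch].ravel().astype(float)
            n = np.linalg.norm(p)
            feats.append(p / n if n > 0 else p)
    return np.array(feats)

def build_codebook(descriptors, k=16, iters=10, seed=0):
    """Tiny k-means codebook playing the role of the mid-level dictionary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def pooled_representation(image, centers):
    """Encode local features against the codebook and max-pool over the image,
    giving one fixed-length vector to hand to any off-the-shelf classifier."""
    feats = dense_local_features(image)
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d)        # soft assignment to codewords
    return sim.max(axis=0)  # max pooling
```

The resulting vector could then be fed to any standard linear classifier.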
Citations
Proceedings ArticleDOI
01 Nov 2017
TL;DR: This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge, which aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together.
Abstract: Text detection and recognition in a natural environment are key components of many applications, ranging from business card digitization to shop indexation in a street. This competition aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text (MLT) in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together. This competition is an extension of the Robust Reading Competition (RRC) which has been held since 2003 both in ICDAR and in an online context. The proposed competition is presented as a new challenge of the RRC. The dataset built for this challenge largely extends the previous RRC editions in many aspects: the multi-lingual text, the size of the dataset, the multi-oriented text, the wide variety of scenes. The dataset is comprised of 18,000 images which contain text belonging to 9 languages. The challenge is comprised of three tasks related to text detection and script classification. We have received a total of 16 participations from the research and industrial communities. This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge.

321 citations


Cites background from "A Simple and Effective Solution for..."

  • ...The previous editions of RRC competitions [1], [2] and other works [3], [4], [5], [6], [7], have provided useful datasets to help researchers tackle each of those problems in order to robustly read text in natural scene images....

  • ...Despite the available datasets related to scene text detection or to script identification [2], [3], [4], [5], [6], [7], our dataset offers interesting novel aspects....

Proceedings ArticleDOI
01 Sep 2019
TL;DR: The RRC-MLT-2019 challenge extends RRC-MLT-2017 for multi-lingual scene text (MLT) detection and recognition, aiming to systematically benchmark the field and push the state-of-the-art forward.
Abstract: With the growing cosmopolitan culture of modern cities, the need of robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.

175 citations

Journal ArticleDOI
TL;DR: A novel method is proposed that extracts local and global features with a CNN-LSTM framework and weights them dynamically for script identification, achieving superior results compared to conventional methods.

110 citations

Posted Content
TL;DR: This literature review attempts to present the entire picture of the field of scene text recognition, providing a comprehensive reference for people entering the field and inspiration for future research.
Abstract: The history of text can be traced back over thousands of years. Rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research field in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promise in terms of innovation, practicality, and efficiency. This paper aims to (1) summarize the fundamental problems and the state-of-the-art associated with scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; (4) point out directions for future work. In summary, this literature review attempts to present the entire picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field, and could be helpful to inspire future research. Related resources are available at our Github repository: this https URL.

72 citations


Cites background from "A Simple and Effective Solution for..."

  • ...Script identification can be interpreted as an image classification problem, where discriminative representations are usually designed, such as mid-level features [81], [82], convolutional features [83], [84], [85], and stroke-parts representations [86]....

Journal ArticleDOI
TL;DR: A novel framework integrating Local CNN and Global CNN both of which are based on ResNet-20 for script identification is presented, which fully exploits the local features of the image, effectively revealing subtle differences among the scripts that are difficult to distinguish.
Abstract: Script identification in natural scene images is a key pre-step for text recognition and is also an indispensable condition for automatic text understanding systems that are designed for multi-language environments. In this paper, we present a novel framework integrating Local CNN and Global CNN both of which are based on ResNet-20 for script identification. We first obtain a lot of patches and segmented images based on the aspect ratios of the images. Subsequently, these patches and segmented images are used as inputs to Local CNN and Global CNN for training, respectively. Finally, to get the final results, the Adaboost algorithm is used to combine the results of Local CNN and Global CNN for decision-level fusion. Benefiting from such a strategy, Local CNN fully exploits the local features of the image, effectively revealing subtle differences among the scripts that are difficult to distinguish such as English, Greek, and Russian. Moreover, Global CNN mines the global features of the image to improve the accuracy of script identification. The experimental results demonstrate that our approach has a good performance on four public datasets.
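The decision-level fusion step described above can be illustrated with the standard AdaBoost weighting formula, where each branch's weight grows as its training error shrinks. The score vectors and error values below are hypothetical; the actual branches in the paper are ResNet-20 networks:

```python
import numpy as np

def adaboost_weight(error):
    """AdaBoost-style weight for one base classifier from its training error:
    w = 0.5 * ln((1 - err) / err), clipped away from 0 and 1 for stability."""
    error = np.clip(error, 1e-6, 1 - 1e-6)
    return 0.5 * np.log((1 - error) / error)

def fuse(local_scores, global_scores, local_error, global_error):
    """Decision-level fusion: weighted sum of per-class score vectors from the
    local and global branches, then argmax for the predicted script class."""
    wl = adaboost_weight(local_error)
    wg = adaboost_weight(global_error)
    s = wl * np.asarray(local_scores) + wg * np.asarray(global_scores)
    return int(np.argmax(s))
```

With these weights, the more reliable branch dominates the final decision.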

43 citations

References
Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation-invariant operator is derived that detects "uniform" patterns for any quantization of the angular space and any spatial resolution, together with a method for combining multiple operators for multiresolution analysis.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.
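A minimal sketch of the basic 8-neighbour LBP with the "uniform" pattern histogram. The operator in the paper additionally supports arbitrary angular quantization, circular interpolation, and rotation invariance, which this toy version omits:

```python
import numpy as np

def lbp8(image):
    """Basic 8-neighbour LBP codes for interior pixels: each neighbour
    contributes one bit, set when it is >= the center pixel."""
    c = image[1:-1, 1:-1]
    neighbors = [image[0:-2, 0:-2], image[0:-2, 1:-1], image[0:-2, 2:],
                 image[1:-1, 2:],   image[2:, 2:],     image[2:, 1:-1],
                 image[2:, 0:-2],   image[1:-1, 0:-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbors):
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

def is_uniform(code):
    """True if the circular 8-bit pattern has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def lbp_histogram(image):
    """Normalized histogram over the 58 uniform patterns, plus one shared bin
    for all non-uniform codes (the usual 59-bin descriptor for P=8)."""
    codes = lbp8(image).ravel()
    uniform_codes = [c for c in range(256) if is_uniform(c)]
    index = {c: i for i, c in enumerate(uniform_codes)}
    hist = np.zeros(len(uniform_codes) + 1)
    for c in codes:
        hist[index.get(int(c), len(uniform_codes))] += 1
    return hist / hist.sum()
```

Since the codes depend only on orderings of pixel values, the descriptor is invariant to any monotonic gray-scale transformation, as the abstract notes.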

14,245 citations


"A Simple and Effective Solution for..." refers methods in this paper

  • ...We compare our methods with popular features used for script identifications in document images namely LBP [9], Gabor features [7]....

  • ...Texture based features such as Gabor filter [7], LBP [9] have been used for script identification....

  • ...67% which is significantly better than methods used in document image script identification domain such as [7, 9]....


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work seeks to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules and pooling schemes and shows how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding.
Abstract: Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
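The two-step decomposition in the abstract, a coding step followed by a pooling step, can be sketched with hard and soft vector quantization plus average/max pooling. This is a toy illustration of the decomposition, not the paper's evaluation code:

```python
import numpy as np

def hard_code(descriptors, centers):
    """Coding step, hard vector quantization: one-hot assignment of each
    descriptor to its nearest codeword."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    codes = np.zeros_like(d)
    codes[np.arange(len(d)), d.argmin(1)] = 1.0
    return codes

def soft_code(descriptors, centers, beta=1.0):
    """Coding step, soft vector quantization: assignment weights decay with
    squared distance to each codeword and are normalized per descriptor."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    e = np.exp(-beta * d)
    return e / e.sum(1, keepdims=True)

def pool(codes, mode="max"):
    """Pooling step: summarize per-descriptor codes over a neighborhood
    (here, the whole image) by the maximum or the average."""
    return codes.max(0) if mode == "max" else codes.mean(0)
```

Cross-evaluating the four combinations (hard/soft coding with average/max pooling) mirrors, in miniature, the comparison the paper carries out.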

1,177 citations


"A Simple and Effective Solution for..." refers background or methods in this paper

  • ...Mid-level features have achieved noticeable success in image classification and retrieval tasks [11, 12, 10]....

  • ...Our method is inspired by recent advancements made in mid-level features [10, 11, 12]....

Proceedings ArticleDOI
07 Sep 2009
TL;DR: A framework is presented that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary, and achieves significant improvement in word recognition accuracies without using a restricted word list.
Abstract: The problem of recognizing text in images taken in the wild has gained significant attention from the computer vision community in recent years. Contrary to recognition of printed documents, recognizing scene text is a challenging problem. We focus on the problem of recognizing text extracted from natural scene images and the web. Significant attempts have been made to address this problem in the recent past. However, many of these works benefit from the availability of strong context, which naturally limits their applicability. In this work we present a framework that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary. We show experimental results on publicly available datasets. Furthermore, we introduce a large challenging word dataset with five thousand words to evaluate various steps of our method exhaustively. The main contributions of this work are: (1) We present a framework, which incorporates higher order statistical language models to recognize words in an unconstrained manner (i.e. we overcome the need for restricted word lists, and instead use an English dictionary to compute the priors). (2) We achieve significant improvement (more than 20%) in word recognition accuracies without using a restricted word list. (3) We introduce a large word recognition dataset (at least 5 times larger than other public datasets) with character level annotation and benchmark it.
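As a toy illustration of computing a prior from a dictionary that still scores out-of-dictionary words, one can use a smoothed character-bigram model. The add-alpha smoothing, boundary markers, and vocabulary size below are simplifying assumptions, not the paper's higher-order model:

```python
import numpy as np

def bigram_counts(dictionary):
    """Character-bigram counts from a word list, with ^/$ boundary markers."""
    counts = {}
    for word in dictionary:
        w = "^" + word + "$"
        for a, b in zip(w, w[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def bigram_log_prior(word, counts, alpha=1.0, vocab=28):
    """Add-alpha smoothed log prior of a word under the bigram model, so even
    words absent from the dictionary receive a (low) nonzero probability."""
    w = "^" + word + "$"
    score = 0.0
    for a, b in zip(w, w[1:]):
        num = counts.get((a, b), 0) + alpha
        den = sum(c for (x, _), c in counts.items() if x == a) + alpha * vocab
        score += np.log(num / den)
    return score
```

A dictionary-like candidate thus outscores an implausible letter string without either being required to appear in a restricted word list.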

789 citations


"A Simple and Effective Solution for..." refers background in this paper

  • ...Scene text understanding has gained huge attention in last decade, and several benchmark datasets has been introduced [13, 14]....

Posted Content
TL;DR: A set of discriminative patches is discovered that can serve as a fully unsupervised mid-level visual representation; the patches may correspond to parts, objects, "visual phrases", etc., but are not restricted to any one of these.
Abstract: The goal of this paper is to discover a set of discriminative patches which can serve as a fully unsupervised mid-level visual representation. The desired patches need to satisfy two requirements: 1) to be representative, they need to occur frequently enough in the visual world; 2) to be discriminative, they need to be different enough from the rest of the visual world. The patches could correspond to parts, objects, "visual phrases", etc. but are not restricted to be any one of them. We pose this as an unsupervised discriminative clustering problem on a huge dataset of image patches. We use an iterative procedure which alternates between clustering and training discriminative classifiers, while applying careful cross-validation at each step to prevent overfitting. The paper experimentally demonstrates the effectiveness of discriminative patches as an unsupervised mid-level visual representation, suggesting that it could be used in place of visual words for many tasks. Furthermore, discriminative patches can also be used in a supervised regime, such as scene classification, where they demonstrate state-of-the-art performance on the MIT Indoor-67 dataset.
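The alternation between clustering and discriminative scoring can be caricatured as follows. This toy version replaces the paper's trained discriminative classifiers and cross-validation with a nearest-centroid separation score against a "rest of the visual world" sample, so it sketches the idea rather than the method:

```python
import numpy as np

def discover_patch_clusters(patches, world, k=3, iters=5, seed=0):
    """Toy discriminative-patch discovery: k-means finds clusters that occur
    often ('representative'); each cluster is then ranked by how much farther
    world samples sit from its center than its own members ('discriminative')."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), k, replace=False)].copy()
    assign = np.zeros(len(patches), dtype=int)
    for _ in range(iters):
        d = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = patches[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    scores = np.empty(k)
    for j in range(k):
        members = patches[assign == j]
        own = ((members - centers[j]) ** 2).sum(-1).mean() if len(members) else np.inf
        other = ((world - centers[j]) ** 2).sum(-1).mean()
        scores[j] = other - own  # large = tight cluster, far from the world
    order = np.argsort(-scores)
    return centers[order], scores[order]
```

The paper's iterative procedure instead trains a discriminative classifier per cluster and re-assigns patches by classifier score, with cross-validation at each step to prevent overfitting.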

539 citations

Proceedings Article
01 Feb 2009
TL;DR: It is demonstrated that the performance of the proposed method can be far superior to that of commercial OCR systems, and that the method can benefit from synthetically generated training data, obviating the need for expensive data collection and annotation.
Abstract: This paper tackles the problem of recognizing characters in images of natural scenes. In particular, we focus on recognizing characters in situations that would traditionally not be handled well by OCR techniques. We present an annotated database of images containing English and Kannada characters. The database comprises images of street scenes taken in Bangalore, India using a standard camera. The problem is addressed in an object categorization framework based on a bag-of-visual-words representation. We assess the performance of various features based on nearest neighbour and SVM classification. It is demonstrated that the performance of the proposed method, using as few as 15 training images, can be far superior to that of commercial OCR systems. Furthermore, the method can benefit from synthetically generated training data obviating the need for expensive data collection and annotation.
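A minimal bag-of-visual-words descriptor with 1-NN classification, matching the representation the abstract describes (the helper names, tiny vectors, and labels below are illustrative):

```python
import numpy as np

def bow_histogram(assignments, k):
    """Bag-of-visual-words descriptor: normalized histogram of codeword
    assignments for one image (assumes at least one descriptor)."""
    h = np.bincount(assignments, minlength=k).astype(float)
    return h / h.sum()

def nearest_neighbour_label(query, train_hists, labels):
    """1-NN classification of a query histogram by squared Euclidean distance
    (the abstract also reports results with an SVM on the same histograms)."""
    d = ((np.asarray(train_hists) - np.asarray(query)) ** 2).sum(1)
    return labels[int(d.argmin())]
```

Each character image becomes one histogram, and classification reduces to comparing histograms.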

520 citations


"A Simple and Effective Solution for..." refers background in this paper

  • Dataset statistics quoted from the paper (ILST):

        Language    # scene images   # word images   Mode of collection
        Hindi       76               514             Authors, Google Images
        Malayalam   121              515             Authors, Google Images
        Kannada     115              534             Char74K [16]
        Tamil       59               563             Authors
        Telugu      79               510             Authors
        English     128              850             Authors
        Total       578              3486            -
