Proceedings ArticleDOI

Text recognition using deep BLSTM networks

TL;DR: A Deep Bidirectional Long Short Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition that uses Connectionist Temporal Classification (CTC) for training to learn the labels of an unsegmented sequence with unknown alignment.
Abstract: This paper presents a Deep Bidirectional Long Short Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition. This architecture uses Connectionist Temporal Classification (CTC) for training to learn the labels of an unsegmented sequence with unknown alignment. This work is motivated by the results of Deep Neural Networks for isolated numeral recognition and improved speech recognition using Deep BLSTM based approaches. The Deep BLSTM architecture is chosen for its ability to access long-range context, learn sequence alignment, and work without segmented data. Because CTC and the forward-backward algorithm handle the alignment of output labels, there are no Unicode re-ordering issues, and hence no need for a lexicon or post-processing schemes. This is a script-independent and segmentation-free approach. The system has been implemented for the recognition of unsegmented words of printed Oriya text, achieving a 4.18% character-level error rate and a 12.11% word error rate.
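The decoding side of the CTC scheme described in the abstract can be illustrated with a best-path (greedy) decode: collapse runs of identical per-frame labels, then drop the blank symbol. This is a minimal sketch of the standard CTC collapse rule, not code from the paper; the frame labels and blank symbol here are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Best-path CTC decode: collapse repeated per-frame labels,
    then remove the blank symbol."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Frames "aa-ab-b": runs collapse, and a blank separates repeated labels,
# so the decoded string keeps both copies of the repeated symbols.
result = "".join(ctc_greedy_decode(list("aa-ab-b")))  # "aabb"
```

In training, CTC sums over all frame alignments that collapse to the target string via this same rule, which is why no pre-segmented data is needed.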
Citations
Proceedings ArticleDOI
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution for detecting botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.
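The tokenisation step described above (converting attack packets into integer sequences before they are fed to an embedding layer) can be sketched roughly as follows; the packet strings and the padding convention are hypothetical illustrations, not taken from the paper.

```python
def tokenise(packets):
    """Map each whitespace-separated token in the packet strings to a
    small integer id, building the vocabulary on the fly; id 0 is
    reserved for sequence padding."""
    vocab, encoded = {}, []
    for packet in packets:
        ids = []
        for token in packet.split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
            ids.append(vocab[token])
        encoded.append(ids)
    return encoded, vocab

# Two hypothetical request lines sharing the method and version tokens.
seqs, vocab = tokenise(["GET /dvr HTTP/1.1", "GET /cam HTTP/1.1"])
```

Shared tokens map to the same id, so the downstream embedding layer can learn one vector per token regardless of which packet it appears in.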

230 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...However, [18] demonstrated that a Deep Bidirectional Long Short Term Memory based RNN (BLSTM-RNN) can be used, which provides promising results for text recognition....


Journal Article
TL;DR: This paper addresses current topics in document image understanding from a technical point of view, as a survey of the methods/approaches proposed for the recognition of various kinds of documents.
Abstract: The subject of document image understanding is to extract and classify individual data meaningfully from paper-based documents. To date, many methods/approaches have been proposed with regard to the recognition of various kinds of documents, various technical problems in extending OCR, and requirements for practical usage. Although the technical research issues of the early stage can be regarded as complementary extensions of traditional OCR, which depends on character recognition techniques, the range of applications and related issues is being widely investigated and should be established progressively. This paper addresses current topics in document image understanding from a technical point of view, as a survey. Key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...[26, 27], etc., which have achieved higher accuracies than the presently proposed method....


Journal ArticleDOI
Xiaolei Ma, Jiyu Zhang, Bowen Du, Chuan Ding, Leilei Sun
TL;DR: A parallel architecture comprising convolutional neural network (CNN) and bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively, suitable for ridership prediction in large-scale metro networks is proposed.
Abstract: Accurate metro ridership prediction can guide passengers in efficiently selecting their departure time and transferring from station to station. An increasing number of deep learning algorithms are being utilized to forecast metro ridership due to the development of computational intelligence. However, limited efforts have been exerted to consider spatiotemporal features, which are important in forecasting ridership through deep learning methods, in large-scale metro networks. To fill this gap, this paper proposes a parallel architecture comprising convolutional neural network (CNN) and bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively. Metro ridership data are transformed into ridership images and time series. Spatial features can be learned from ridership image data by using CNN, which demonstrates favorable performance in video detection. Time series data are input into the BLSTM which considers the historical and future impacts of ridership in temporal feature extraction. The two networks are concatenated in parallel and prevented from interfering with each other. Joint spatiotemporal features are fed into a fully connected network for metro ridership prediction. The Beijing metro network is used to demonstrate the efficiency of the proposed algorithm. The proposed model outperforms traditional statistical models, deep learning architectures, and sequential structures, and is suitable for ridership prediction in large-scale metro networks. Metro authorities can thus effectively allocate limited resources to overcrowded areas for service improvement.
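The parallel design described above keeps the two branches separate until a final concatenation. A shape-level sketch of that joint feature, with hypothetical feature sizes not taken from the paper and NumPy arrays standing in for the CNN and BLSTM outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (not from the paper): the CNN branch yields a
# 128-dim spatial feature from the ridership image, and the BLSTM branch
# yields 32-dim final states in each direction from the time series.
cnn_spatial = rng.normal(size=(1, 128))
blstm_forward = rng.normal(size=(1, 32))   # past-to-future pass
blstm_backward = rng.normal(size=(1, 32))  # future-to-past pass

# The branches run in parallel and are concatenated rather than mixed, so
# neither interferes with the other before the fully connected layer.
joint = np.concatenate([cnn_spatial, blstm_forward, blstm_backward], axis=1)
assert joint.shape == (1, 192)
```

The fully connected prediction network then operates on this joint spatiotemporal vector, which is what lets each branch specialise in one feature type.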

117 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...This model consists of a forward and backward LSTM to extract temporal features in two directions; hence, it can effectively capture the periodicity and regularity of ridership data [33]....


Journal ArticleDOI
TL;DR: A multi-objective region-sampling methodology for isolated handwritten Bangla character and digit recognition is proposed, and AFS-theory-based fuzzy logic is utilized to develop a model for combining the Pareto-optimal solutions from two multi-objective heuristic algorithms.

99 citations

Journal ArticleDOI
TL;DR: In the present work, a non-explicit feature based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.

88 citations

References
Proceedings ArticleDOI
17 Dec 2007
TL;DR: In this paper, an in-depth study of major security threats is made, and a decision tree is built for evidence mining to help law enforcement agencies establish network crimes.
Abstract: Today, Internet technology has made it possible to form virtual communities in cyberspace, thereby facilitating different types of electronic transactions. There is also a negative side to this technology, which threatens its security aspects. To guard against such threats, one must adopt a well-planned approach and apply it systematically. In this paper, an in-depth study of major security threats is made, and a decision tree is built for evidence mining to help law enforcement agencies establish network crimes.

36 citations

Proceedings ArticleDOI
24 Aug 2013
TL;DR: A multi-font, low resolution, and open vocabulary OCR system based on a multidimensional recurrent neural network architecture that performs very well on the task of printed Arabic text recognition even for very low resolution and small font size images.
Abstract: OCR of multi-font Arabic text is difficult due to large variations in character shapes from one font to another. It becomes even more challenging if the text is rendered at very low resolution. This paper describes a multi-font, low resolution, and open vocabulary OCR system based on a multidimensional recurrent neural network architecture. For this work, we have developed various systems, trained for single-font/single-size, single-font/multi-size, and multi-font/multi-size data of the well known Arabic printed text image database (APTI). The evaluation tasks from the second Arabic text recognition competition, organized in conjunction with ICDAR 2013, have been adopted. Ten Arabic fonts in six font size categories are used for evaluation. Results show that the proposed method performs very well on the task of printed Arabic text recognition even for very low resolution and small font size images. Overall, the system yields above 99% recognition accuracy at character and word level for most of the printed Arabic fonts.

36 citations

Proceedings ArticleDOI
07 Apr 2014
TL;DR: A web based OCR system is proposed which follows a unified architecture for seven Indian languages, is robust against popular degradations, follows a segmentation free approach, addresses the UNICODE re-ordering issues, and can enable continuous learning with user inputs and feedback.
Abstract: The current Optical Character Recognition (OCR) systems for Indic scripts are not robust enough for recognizing arbitrary collections of printed documents. Reasons for this limitation include the lack of resources (e.g. not enough examples with natural variations, lack of documentation about the possible font/style variations) and an architecture which necessitates hard segmentation of word images followed by isolated symbol recognition. Variations among scripts, latent symbol-to-UNICODE conversion rules, non-standard fonts/styles and large degradations are some of the major reasons for the unavailability of robust solutions. In this paper, we propose a web based OCR system which (i) follows a unified architecture for seven Indian languages, (ii) is robust against popular degradations, (iii) follows a segmentation free approach, (iv) addresses the UNICODE re-ordering issues, and (v) can enable continuous learning with user inputs and feedback. Our system is designed to aid continuous learning while remaining usable, i.e., we capture user inputs (say, example images) to further improve the OCRs. We use the popular BLSTM based transcription scheme to achieve our target. This also enables incremental training and refinement in a seamless manner. We report superior accuracy rates in comparison with the available OCRs for the seven Indian languages.

24 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...Naveen et al. presented a direct implementation of a single layer LSTM network for the recognition of Devanagari scripts [25], [26] and further experimented on more Indic scripts [27]....


  • ...A single layer BLSTM network has been used for the recognition of some Indic scripts [27], obtaining around 3% label error and 5-13% word error after training on 180,000 words, which is comparable to our 4....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper proposes a formulation in which the expectations on these two modules are minimized and the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme, thus formulating recognition as a direct transcription problem.
Abstract: Optical Character Recognition (OCR) problems are often formulated, for most of the Indian scripts, as an isolated character (symbol) classification task followed by a post-classification stage (which contains modules like Unicode generation, error correction, etc.) to generate the textual representation. Such approaches are prone to failure due to (i) difficulties in designing a reliable word-to-symbol segmentation module that can work robustly in the presence of degraded (cut/fused) images and (ii) converting the outputs of the classifiers to a valid sequence of Unicodes. In this paper, we propose a formulation where the expectations on these two modules are minimized, and the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme. We thus formulate the recognition as a direct transcription problem. Given many examples of feature sequences and their corresponding Unicode representations, our objective is to learn a mapping which can convert a word directly into a Unicode sequence. This formulation has multiple practical advantages: (i) it reduces the number of classes significantly for the Indian scripts, (ii) it removes the need for a reliable word-to-symbol segmentation, (iii) it does not require strong annotation of symbols to design the classifiers, and (iv) it directly generates a valid sequence of Unicodes. We test our method on more than 6000 pages of printed Devanagari documents from multiple sources. Our method consistently outperforms other state-of-the-art implementations.

18 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...Naveen et al. presented a direct implementation of a single layer LSTM network for the recognition of Devanagari scripts [25], [26] and further experimented on more Indic scripts [27]....


Proceedings ArticleDOI
16 Dec 2012
TL;DR: On a test database of around 2000 words, it is found that bigram language models improve symbol and word recognition accuracies; while lexicon methods offer much greater improvements in terms of word recognition, there is a large dependency on choosing the right lexicon.
Abstract: N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracies of online and offline handwritten data. However, there are very few works that deal with the application of these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies; while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison with lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.
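The symbol-level bigram language model described above can be sketched as an add-one-smoothed scorer over character sequences; the corpus, candidate words, and smoothing choice here are hypothetical illustrations, not the paper's setup.

```python
import math
from collections import Counter

def bigram_scores(corpus_words, candidates):
    """Score candidate words with a symbol-level bigram model built from
    a text corpus, using add-one smoothing for unseen bigrams."""
    bigrams, unigrams = Counter(), Counter()
    for word in corpus_words:
        syms = ["<s>"] + list(word)
        for a, b in zip(syms, syms[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    # Vocabulary size for smoothing: every symbol seen in any position.
    vocab = len(set(unigrams) | {b for _, b in bigrams})

    def logprob(word):
        syms = ["<s>"] + list(word)
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(syms, syms[1:]))

    return {c: logprob(c) for c in candidates}

# A corpus-like word should outscore an implausible symbol string.
scores = bigram_scores(["the", "then", "they"], ["the", "qzx"])
```

A recogniser's n-best word hypotheses could then be re-ranked by combining these log-probabilities with the recognition scores, which is the usual way such a model is paired with a classifier.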

10 citations


"Text recognition using deep BLSTM n..." refers background in this paper

  • ...Some work has also been done in the last few years on online handwritten recognition of Telugu script using HMM [23], and online handwritten Tamil word recognition [24] has used segmentation based approaches....
