Proceedings ArticleDOI

Text recognition using deep BLSTM networks

TL;DR: A Deep Bidirectional Long Short-Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition, trained with Connectionist Temporal Classification (CTC) to learn the labels of an unsegmented sequence with unknown alignment.
Abstract: This paper presents a Deep Bidirectional Long Short-Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition. The architecture uses Connectionist Temporal Classification (CTC) during training to learn the labels of an unsegmented sequence with unknown alignment. The work is motivated by the results of Deep Neural Networks for isolated numeral recognition and of Deep BLSTM based approaches for improved speech recognition. The Deep BLSTM architecture is chosen for its ability to access long-range context, learn sequence alignment, and work without the need for segmented data. Because CTC and the forward-backward algorithm handle the alignment of output labels, there are no Unicode re-ordering issues and hence no need for a lexicon or post-processing schemes. The approach is script independent and segmentation free. The system has been implemented for the recognition of unsegmented words of printed Oriya text, achieving a 4.18% character-level error rate and a 12.11% word error rate.
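As a rough illustration of the kind of architecture the abstract describes (not the authors' exact configuration; the feature dimension, hidden size, depth, and alphabet size below are placeholders), a stacked bidirectional LSTM trained with CTC can be sketched in PyTorch as follows:

```python
import torch
import torch.nn as nn

class DeepBLSTMCTC(nn.Module):
    """Stacked bidirectional LSTM with a per-timestep softmax over
    characters plus a CTC 'blank', trained with CTC loss.
    Feature size, hidden size, depth and alphabet size are illustrative."""
    def __init__(self, n_features=48, n_hidden=128, n_layers=3, n_chars=60):
        super().__init__()
        self.blstm = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                             bidirectional=True, batch_first=False)
        # +1 output unit for the CTC blank label (index 0 by convention here)
        self.proj = nn.Linear(2 * n_hidden, n_chars + 1)

    def forward(self, x):                       # x: (T, batch, n_features)
        h, _ = self.blstm(x)
        return self.proj(h).log_softmax(dim=-1)  # (T, batch, n_chars + 1)

# Training-step sketch: targets are unsegmented label sequences; CTC
# marginalizes over alignments, so no per-character segmentation is needed.
model = DeepBLSTMCTC()
ctc = nn.CTCLoss(blank=0)
x = torch.randn(100, 4, 48)                  # 100 frames, batch of 4 word images
targets = torch.randint(1, 61, (4, 12))      # label indices 1..60 (0 = blank)
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)
loss = ctc(model(x), targets, input_lengths, target_lengths)
loss.backward()
```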
Citations
Proceedings ArticleDOI
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT-based DDoS attacks. This paper presents a solution for detecting botnet activity within consumer IoT devices and networks. A novel application of deep learning is used to develop a detection model based on a Bidirectional Long Short-Term Memory based Recurrent Neural Network (BLSTM-RNN). Word embedding is used for text recognition and for converting attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research and is available upon request.
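A minimal sketch of the kind of detector the abstract describes, assuming tokenised packet text fed through an embedding layer into a bidirectional LSTM (vocabulary size, sequence length, and layer sizes are illustrative, not the paper's values); setting bidirectional=False yields the unidirectional LSTM baseline it is compared against:

```python
import torch
import torch.nn as nn

class PacketBLSTM(nn.Module):
    """Token IDs -> embedding -> (bi)directional LSTM -> attack/benign score."""
    def __init__(self, vocab_size=1000, emb_dim=32, hidden=64, bidirectional=True):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True,
                           bidirectional=bidirectional)
        self.fc = nn.Linear(hidden * (2 if bidirectional else 1), 1)

    def forward(self, tokens):                     # tokens: (batch, seq_len) int IDs
        h, _ = self.rnn(self.emb(tokens))
        return torch.sigmoid(self.fc(h[:, -1]))    # probability of attack traffic

# Usage sketch: tokenised packets (integer-encoded text fields), binary labels.
model = PacketBLSTM()
x = torch.randint(0, 1000, (8, 50))                # batch of 8 sequences, 50 tokens each
y_hat = model(x)                                   # (8, 1) attack probabilities
```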

230 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...However [18] demonstrated that a Deep Bidirectional Long Short Term Memory based RNN (BLSTM-RNN) can be used which provides promising results for text recognition....


Journal Article
TL;DR: This paper surveys current topics in document image understanding from a technical point of view, covering the methods and approaches that have been proposed for the recognition of various kinds of documents.
Abstract: Document image understanding is concerned with meaningfully extracting and classifying individual data from paper-based documents. To date, many methods and approaches have been proposed for the recognition of various kinds of documents, for the technical problems arising in extensions of OCR, and for the requirements of practical use. Although the technical research issues of the early stage are regarded as complementary attacks on traditional OCR, which depends on character recognition techniques, the application range and related issues are now widely investigated and should be established progressively. This paper addresses current topics in document image understanding from a technical point of view, as a survey.
Key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...[26,27], etc, that have achieved higher accuracies than the presently proposed method....


Journal ArticleDOI
Xiaolei Ma, Jiyu Zhang, Bowen Du, Chuan Ding, Leilei Sun
TL;DR: A parallel architecture comprising convolutional neural network (CNN) and bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively, suitable for ridership prediction in large-scale metro networks is proposed.
Abstract: Accurate metro ridership prediction can guide passengers in efficiently selecting their departure time and transferring from station to station. An increasing number of deep learning algorithms are being utilized to forecast metro ridership due to the development of computational intelligence. However, limited efforts have been exerted to consider spatiotemporal features, which are important in forecasting ridership through deep learning methods, in large-scale metro networks. To fill this gap, this paper proposes a parallel architecture comprising convolutional neural network (CNN) and bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively. Metro ridership data are transformed into ridership images and time series. Spatial features can be learned from ridership image data by using CNN, which demonstrates favorable performance in video detection. Time series data are input into the BLSTM which considers the historical and future impacts of ridership in temporal feature extraction. The two networks are concatenated in parallel and prevented from interfering with each other. Joint spatiotemporal features are fed into a fully connected network for metro ridership prediction. The Beijing metro network is used to demonstrate the efficiency of the proposed algorithm. The proposed model outperforms traditional statistical models, deep learning architectures, and sequential structures, and is suitable for ridership prediction in large-scale metro networks. Metro authorities can thus effectively allocate limited resources to overcrowded areas for service improvement.
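A compact sketch of the parallel structure the abstract outlines: a CNN branch over ridership images and a BLSTM branch over ridership time series, concatenated and passed to a fully connected predictor. The input shapes, channel counts, and station count below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ParallelCNNBLSTM(nn.Module):
    """CNN branch learns spatial features from ridership images; BLSTM branch
    learns temporal features from ridership series; the two feature vectors
    are concatenated for the final prediction."""
    def __init__(self, n_stations=276, seq_len=12, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())       # -> 8*4*4 = 128 dims
        self.blstm = nn.LSTM(n_stations, hidden, batch_first=True,
                             bidirectional=True)
        self.head = nn.Linear(128 + 2 * hidden, n_stations)    # next-step ridership

    def forward(self, image, series):
        # image:  (batch, 1, H, W) ridership image
        # series: (batch, seq_len, n_stations) historical ridership
        spatial = self.cnn(image)
        temporal, _ = self.blstm(series)
        joint = torch.cat([spatial, temporal[:, -1]], dim=1)   # parallel branches, fused late
        return self.head(joint)

model = ParallelCNNBLSTM()
pred = model(torch.randn(2, 1, 20, 20), torch.randn(2, 12, 276))   # (2, 276)
```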

117 citations


Cites methods from "Text recognition using deep BLSTM n..."

  • ...This model consists of a forward and backward LSTM to extract temporal features in two directions; hence, it can effectively capture the periodicity and regularity of ridership data [33]....


Journal ArticleDOI
TL;DR: A multi-objective region sampling methodology for isolated handwritten Bangla character and digit recognition is proposed, and AFS-theory-based fuzzy logic is utilized to develop a model for combining the Pareto-optimal solutions from two multi-objective heuristic algorithms.

99 citations

Journal ArticleDOI
TL;DR: In the present work, a non-explicit feature-based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.

88 citations

References
Proceedings ArticleDOI
25 Aug 2013
TL;DR: An application of bidirectional LSTM networks to the problem of machine-printed Latin and Fraktur recognition; the reported recognition accuracies were obtained without any language modelling or other post-processing techniques.
Abstract: Long Short-Term Memory (LSTM) networks have yielded excellent results on handwriting recognition. This paper describes an application of bidirectional LSTM networks to the problem of machine-printed Latin and Fraktur recognition. Latin and Fraktur recognition differs significantly from handwriting recognition in both the statistical properties of the data, as well as in the required, much higher levels of accuracy. Applications of LSTM networks to handwriting recognition use two-dimensional recurrent networks, since the exact position and baseline of handwritten characters is variable. In contrast, for printed OCR, we used a one-dimensional recurrent network combined with a novel algorithm for baseline and x-height normalization. A number of databases were used for training and testing, including the UW3 database, artificially generated and degraded Fraktur text and scanned pages from a book digitization project. The LSTM architecture achieved 0.6% character-level test-set error on English text. When the artificially degraded Fraktur data set is divided into training and test sets, the system achieves an error rate of 1.64%. On specific books printed in Fraktur (not part of the training set), the system achieves error rates of 0.15% (Fontane) and 1.47% (Ersch-Gruber). These recognition accuracies were found without using any language modelling or any other post-processing techniques.
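The paper's baseline and x-height normalization algorithm is not reproduced here; purely as a loose illustration of how a printed text line becomes the 1-D sequence such a network consumes, a height-normalized line image can be sliced column by column into per-timestep feature vectors:

```python
import numpy as np

def line_to_sequence(line_img, target_height=32):
    """Rescale a grayscale text-line image to a fixed height and emit one
    feature vector per pixel column (a 1-D sequence for the recurrent net).
    This is a generic height normalization, not the paper's baseline/x-height
    scheme."""
    h, w = line_img.shape
    # nearest-neighbour row resampling to target_height rows
    rows = (np.arange(target_height) * h / target_height).astype(int)
    resized = line_img[rows, :]                      # (target_height, w)
    return resized.T.astype(np.float32) / 255.0      # (w, target_height): T x features
```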

241 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...[15] used 95,338 lines for training and 1,020 for test in case of printed English text to obtain an error of 0....


  • ...LSTM has been used for the recognition of printed Urdu Nastaleeq script [14] and printed English and Fraktur scripts [15]....


Proceedings Article
01 Jan 2007
TL;DR: A new connectionist approach to on-line handwriting recognition, addressing in particular the problem of recognizing handwritten whiteboard notes, using a recently introduced objective function known as Connectionist Temporal Classification (CTC) that directly trains the network to label unsegmented sequence data.
Abstract: In this paper we introduce a new connectionist approach to on-line handwriting recognition and address in particular the problem of recognizing handwritten whiteboard notes. The approach uses a bidirectional recurrent neural network with the long short-term memory architecture. We use a recently introduced objective function, known as Connectionist Temporal Classification (CTC), that directly trains the network to label unsegmented sequence data. Our new system achieves a word recognition rate of 74.0%, compared with 65.4% using a previously developed HMM-based recognition system.
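For reference, the CTC objective referred to above sums, over every frame-level path \pi that collapses (via the map \mathcal{B}, which removes repeats and blanks) to the target labelling \mathbf{l}, the product of the per-frame label probabilities:

```latex
p(\mathbf{l} \mid \mathbf{x}) \;=\; \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} \; \prod_{t=1}^{T} y^{t}_{\pi_t},
\qquad
\mathcal{L}_{\mathrm{CTC}} \;=\; -\ln p(\mathbf{l} \mid \mathbf{x})
```

where y^t_k is the network's softmax output for label k at frame t; the sum is computed efficiently with the forward-backward algorithm.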

204 citations


"Text recognition using deep BLSTM n..." refers background in this paper

  • ...Long Short Term Memory based Recurrent Neural network architecture has been widely used for speech recognition [7], [8], text recognition [9], social signal prediction [10], emotion recognition [11] and time series prediction problems since it has the ability of sequence learning....


Proceedings Article
01 Jan 2010
TL;DR: A context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues is applied, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database.
Abstract: In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long ShortTerm Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally prevails over standard classification techniques such as Hidden Markov Models or Support Vector Machines, and achieves F1-measures of the order of 72 %, 65 %, and 55 % for the discrimination of three clusters in emotional space and the distinction between three levels of valence and activation, respectively.
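A minimal sketch of feature-level fusion as described, assuming per-frame acoustic and visual feature vectors that are concatenated and fed to a BLSTM emitting a class score per frame (the feature dimensions and the three-class output are placeholders):

```python
import torch
import torch.nn as nn

class FusionBLSTM(nn.Module):
    """Frame-wise concatenation of acoustic and visual features -> BLSTM ->
    per-frame emotion class scores (e.g., three clusters in emotional space)."""
    def __init__(self, d_audio=39, d_video=20, hidden=64, n_classes=3):
        super().__init__()
        self.blstm = nn.LSTM(d_audio + d_video, hidden, batch_first=True,
                             bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, video):                 # both (batch, frames, dim)
        fused = torch.cat([audio, video], dim=-1)    # feature-level fusion
        h, _ = self.blstm(fused)
        return self.fc(h)                            # (batch, frames, n_classes) logits

model = FusionBLSTM()
scores = model(torch.randn(2, 120, 39), torch.randn(2, 120, 20))
```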

183 citations


"Text recognition using deep BLSTM n..." refers background in this paper

  • ...Long Short Term Memory based Recurrent Neural network architecture has been widely used for speech recognition [7], [8], text recognition [9], social signal prediction [10], emotion recognition [11] and time series prediction problems since it has the ability of sequence learning....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: This work presents the results of applying RNNs to printed Urdu text in Nastaleeq script, evaluating BLSTM networks for two cases: one ignoring character shape variations and the other considering them.
Abstract: Recurrent neural networks (RNN) have been successfully applied for recognition of cursive handwritten documents, both in English and Arabic scripts. The ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate for developing OCR systems for printed Nabataean scripts (including Nastaleeq, for which no OCR system is available to date). In this work, we present the results of applying RNNs to printed Urdu text in Nastaleeq script. A Bidirectional Long Short Term Memory (BLSTM) architecture with a Connectionist Temporal Classification (CTC) output layer was employed to recognize printed Urdu text. We evaluated BLSTM networks for two cases: one ignoring the character's shape variations and the other considering them. The character-level recognition error rate for the first case is 5.15% and for the second is 13.6%. These results were obtained on the synthetically generated UPTI dataset, which contains artificially degraded images reflecting some real-world scanning artifacts along with clean images. A comparison with a shape-matching based method is also presented.

112 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...LSTM has been used for the recognition of printed Urdu Nastaleeq script [14] and printed English and Fraktur scripts [15]....


Proceedings ArticleDOI
01 Dec 2008
TL;DR: A novel scheme for recognition of online handwritten basic characters of Bangla, an Indian script used by more than 200 million people, is described here, using a database of 24,500 online handwritten isolated character samples written by 70 persons.
Abstract: We describe here a novel scheme for recognition of online handwritten basic characters of Bangla, an Indian script used by more than 200 million people. There are 50 basic characters in Bangla and we have used a database of 24,500 online handwritten isolated character samples written by 70 persons. Samples in this database are composed of one or more strokes and we have collected all the strokes obtained from the training samples of the 50 character classes. These strokes are manually grouped into 54 classes based on the shape similarity of the graphemes that constitute the ideal character shapes. Strokes are recognized by using hidden Markov models (HMM). One HMM is constructed for each stroke class. A second stage of classification is used for recognition of characters using stroke classification results along with 50 look-up-tables (for 50 character classes).
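A rough sketch of the first classification stage described (one HMM per stroke class, assignment by maximum likelihood), using hmmlearn; the (x, y) point features and class labels are placeholders, and the second-stage look-up-table step is omitted:

```python
import numpy as np
from hmmlearn import hmm

def train_stroke_hmms(strokes_by_class, n_states=5):
    """Fit one Gaussian HMM per stroke class on its (x, y) point sequences."""
    models = {}
    for label, strokes in strokes_by_class.items():
        X = np.concatenate(strokes)              # stack all sequences of this class
        lengths = [len(s) for s in strokes]      # per-sequence lengths for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_stroke(models, stroke):
    """Assign the stroke to the class whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(stroke))
```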

108 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...Some work has also been done in the last few years on online handwritten recognition of Telugu script using HMM [23] and online handwritten Tamil word recognition [24] have used segmentation based approaches....


  • ...Traditionally, different handcrafted features have been used for text recognition of Oriya [20], Bangla[21] and classifiers like HMM [22], SVM and CRF has been widely used....


  • ...LSTM based approaches have outperformed HMM based ones for handwriting recognition proving that learnt features are better than handcrafted features [17]....


  • ...In segmentation free approaches sequential classifiers like Hidden Markov Model(HMM) and graphical models like Conditional Random Fields(CRF) have been used....
