Proceedings ArticleDOI

Text recognition using deep BLSTM networks

01 Jan 2015, pp. 1-6



Citations
Journal Article


TL;DR: This paper surveys current topics in document image understanding from a technical point of view, covering methods/approaches proposed for the recognition of various kinds of documents.
Abstract: The subject of document image understanding is to extract and classify individual data meaningfully from paper-based documents. To date, many methods and approaches have been proposed for the recognition of various kinds of documents, for the technical problems involved in extending OCR, and for the requirements of practical use. Although the research issues of the early stage were regarded as complementary to traditional OCR, which depends on character recognition techniques, the range of applications and related issues has since been investigated much more widely and continues to be established progressively. This paper surveys current topics in document image understanding from a technical point of view. Key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

221 citations


Cites methods from "Text recognition using deep BLSTM n..."


Proceedings ArticleDOI


08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word Embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.
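As an illustration of the tokenisation step described above, the sketch below converts packet text into the fixed-length integer sequences an embedding layer expects. It is an assumption-laden toy, not the paper's code; the names `build_vocab` and `tokenise` and the sample packets are hypothetical.

```python
# Hypothetical sketch of "conversion of attack packets into tokenised
# integer format" ahead of a word-embedding layer.
def build_vocab(packets):
    """Map each distinct token to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for pkt in packets:
        for tok in pkt.split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def tokenise(packets, vocab, max_len):
    """Encode packets as fixed-length integer sequences, padded or truncated
    to max_len; unseen tokens fall back to the padding id 0."""
    seqs = []
    for pkt in packets:
        ids = [vocab.get(tok, 0) for tok in pkt.split()][:max_len]
        ids += [0] * (max_len - len(ids))
        seqs.append(ids)
    return seqs

packets = ["GET /cdn-cgi/ HTTP", "POST /login HTTP"]
vocab = build_vocab(packets)
encoded = tokenise(packets, vocab, max_len=4)
```

The resulting integer matrix is what a framework's embedding layer would map to dense vectors before the BLSTM.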

108 citations


Cites methods from "Text recognition using deep BLSTM n..."


Journal ArticleDOI


TL;DR: A multi-objective region sampling methodology for isolated handwritten Bangla character and digit recognition is proposed, and an AFS theory based fuzzy logic is utilized to develop a model for combining the Pareto-optimal solutions from two multi-objective heuristic algorithms.
Abstract: Identifying the most informative local regions of a handwritten character image is necessary for a robust handwritten character recognition system. But identifying them from a character image is a difficult task. If this task were to be performed incurring minimum possible cost, it becomes more challenging due to having two independent, apparently contradicting objectives which need to be optimized simultaneously, i.e. maximizing the recognition accuracy and minimizing the associated recognition cost. To address the problem, a multi-objective approach is required. In the present task, two popular multi-objective optimization algorithms, (1) a Non-Dominated Sorting Harmony-Search Algorithm (NSHA) and (2) the Non-Dominated Sorting Genetic Algorithm-II (NSGA-II; Deb et al., 2002), are employed for region sampling separately. The method objectively selects the most informative set of local regions using the framework of Axiomatic Fuzzy Set (AFS) theory, from the sets of Pareto-optimal solutions provided by the multi-objective region sampling algorithms. The system has been evaluated on two isolated handwritten Bangla datasets, (1) a dataset of randomly mixed handwritten Bangla Basic and Compound characters and (2) a dataset of handwritten Bangla numerals separately, with an SVM based classifier, using a feature set containing convex-hull based features and CG based quad-tree partitioned longest-run based local features extracted from the selected local regions. The results have shown a significant increase in recognition accuracy and decrease in recognition cost for all the datasets. Thus the present system introduces a cost effective approach towards isolated handwritten character recognition systems.
Figure (omitted): schematic representation of the integrated system developed under the present work.
Highlights:
  • Developed a cost effective approach towards handwritten character recognition systems.
  • A multi-objective region sampling methodology for isolated handwritten Bangla characters and digits recognition has been proposed.
  • A non-dominated sorting harmony search algorithm based region sampling and a non-dominated sorting genetic algorithm based region sampling methodology have been developed.
  • An AFS theory based fuzzy logic is utilized to develop a model for combining the Pareto-optimal solutions from two multi-objective heuristic algorithms.
  • Maximum recognition accuracies of 86.6478% and 98.23% have been achieved with 0.234% and 12.60% decrease in recognition cost for handwritten Bangla characters and digits respectively.
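The selection step both NSHA and NSGA-II rely on is non-dominated sorting: keep only solutions for which no other solution is at least as accurate and at least as cheap, with a strict improvement in at least one objective. The sketch below is a minimal generic illustration of that Pareto-front extraction, not the authors' implementation, and the candidate (accuracy, cost) values are made up:

```python
def pareto_front(solutions):
    """Return the non-dominated subset of (accuracy, cost) pairs, where
    accuracy is maximized and cost is minimized.  A solution is dominated
    if some other solution is no worse in both objectives and strictly
    better in at least one."""
    front = []
    for acc, cost in solutions:
        dominated = any(
            (a >= acc and c <= cost) and (a > acc or c < cost)
            for a, c in solutions
        )
        if not dominated:
            front.append((acc, cost))
    return front

# Hypothetical candidates: (recognition accuracy, recognition cost)
candidates = [(0.86, 40.0), (0.84, 25.0), (0.80, 30.0), (0.86, 55.0)]
front = pareto_front(candidates)
```

In the paper's pipeline, a front like this from each heuristic is then fed to the AFS-based fuzzy model to pick a final region set; that combination step is not shown here.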

83 citations

Journal ArticleDOI


TL;DR: In the present work, a non-explicit feature based approach, more specifically, a multi-column multi-scale convolutional neural network (MMCNN) based architecture has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.
Abstract: Recognition of handwritten characters is a challenging task. Variations in writing styles from one person to another, as well as for a single individual from time to time, make this task harder. Hence, identifying the local invariant patterns of a handwritten character or digit is very difficult. These challenges can be overcome by exploiting various script specific characteristics and training the OCR system based on these special traits. Finding ubiquitous invariant patterns and peculiarities, applicable for handwritten characters or digits of multiple scripts, is much more difficult. In the present work, a non-explicit feature based approach, more specifically, a multi-column multi-scale convolutional neural network (MMCNN) based architecture has been proposed for this purpose. A deep quad-tree based staggered prediction model has been proposed for faster character recognition. These denote the most significant contributions of the present work. The proposed methodology has been tested on 9 publicly available datasets of isolated handwritten characters or digits of Indic scripts. Promising results have been achieved by the proposed system for all of the datasets. A comparative analysis has also been performed against some of the contemporary OCR systems to prove the superiority of the proposed system. We have also evaluated our system on the MNIST dataset and achieved a maximum recognition accuracy of 99.74%, without any data augmentation to the original dataset.

65 citations

Journal ArticleDOI


TL;DR: An implicit segmentation based recognition system for Urdu text lines in Nastaliq script that relies on sliding overlapped windows on lines of text and extracting a set of statistical features is presented.
Abstract: Optical Character Recognition of cursive scripts remains a challenging task due to a large number of character shapes, inter- and intra-word overlaps, context sensitivity and diagonality of text. This paper presents an implicit segmentation based recognition system for Urdu text lines in Nastaliq script. The proposed technique relies on sliding overlapped windows on lines of text and extracting a set of statistical features. The extracted features are fed to a multi-dimensional long short term memory recurrent neural network (MDLSTM RNN) with a connectionist temporal classification (CTC) output layer that labels the character sequences. Experimental study of the proposed technique is carried out on the standard Urdu Printed Text-line Images (UPTI) database which comprises 10,000 text lines in Nastaliq font. Evaluations under different experimental settings realize promising recognition rates with a highest character recognition rate of 96.40%.
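The windowing stage of the implicit-segmentation pipeline described above can be sketched as follows. This is a generic illustration under assumed parameters: the window width, the stride (smaller than the width, giving overlap), and the two toy features are hypothetical, not the authors' feature set:

```python
# Sketch: slide overlapped windows across a binarised text-line image and
# extract a small statistical feature vector per window (assumed features).
def window_features(line_img, width=4, stride=2):
    """line_img: list of rows (each a list of 0/1 pixels).
    Returns one feature vector [ink_ratio, vertical_centre] per window,
    left to right; stride < width makes consecutive windows overlap."""
    h, w = len(line_img), len(line_img[0])
    feats = []
    for x in range(0, w - width + 1, stride):
        ink = [(y, xx) for y in range(h)
               for xx in range(x, x + width) if line_img[y][xx]]
        n = len(ink)
        ink_ratio = n / (h * width)                       # fraction of ink pixels
        centre = sum(y for y, _ in ink) / n / (h - 1) if n else 0.5
        feats.append([ink_ratio, centre])                 # centre in [0, 1]
    return feats
```

In the paper the per-window feature sequence is what gets fed to the MDLSTM-CTC network; any real implementation would use a richer feature set than these two values.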

61 citations


References
Journal ArticleDOI


TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
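One forward step of an LSTM cell can be written out directly. The sketch below is a scalar toy version that includes the now-standard forget gate, which was added after the 1997 paper (Gers et al.), so it illustrates the gating idea rather than the original formulation exactly; the weight layout is an assumption of this example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step for scalar input/state.  W maps each gate name
    to a (w_x, w_h, b) triple.  The cell state c is the 'constant error
    carousel': it changes only through the gated additive update below,
    which is what lets error flow back over long time lags."""
    def gate(name, fn):
        w_x, w_h, b = W[name]
        return fn(w_x * x + w_h * h_prev + b)
    i = gate("input", sigmoid)    # input gate: admit new information
    f = gate("forget", sigmoid)   # forget gate (post-1997 addition)
    o = gate("output", sigmoid)   # output gate: expose the cell state
    g = gate("cand", math.tanh)   # candidate cell update
    c = f * c_prev + i * g        # additive carousel update
    h = o * math.tanh(c)          # gated hidden output
    return h, c
```

Because `c` is updated additively rather than squashed through a recurrent nonlinearity, its gradient does not decay the way a vanilla RNN's hidden-state gradient does.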

49,735 citations


"Text recognition using deep BLSTM n..." refers methods in this paper


Journal ArticleDOI


TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
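The greedy layer-at-a-time procedure trains each layer as a restricted Boltzmann machine, typically with contrastive divergence, then feeds its hidden activities to the next layer. The sketch below is a minimal CD-1 weight update for a binary RBM (bias terms omitted for brevity); it is a generic rendering of the standard recipe, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, lr=0.1):
    """One contrastive-divergence (CD-1) step for a binary RBM, the
    building block trained greedily one layer at a time.
    v0: batch of visible vectors, shape (n, n_vis); W: (n_vis, n_hid)."""
    h0_prob = sigmoid(v0 @ W)                       # up: infer hidden units
    h0 = (rng.random(h0_prob.shape) < h0_prob)      # sample binary hidden states
    v1_prob = sigmoid(h0 @ W.T)                     # down: reconstruct visibles
    h1_prob = sigmoid(v1_prob @ W)                  # up again on reconstruction
    # weight step: data correlations minus reconstruction correlations
    return W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
```

After this layer converges, its hidden probabilities become the "data" for the next RBM up the stack, which is exactly the one-layer-at-a-time scheme the abstract describes.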

13,005 citations


"Text recognition using deep BLSTM n..." refers background in this paper


Proceedings ArticleDOI


26 May 2013
TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
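The "deep" part of a deep RNN is simply stacking recurrent layers so that the hidden sequence produced by layer k becomes the input sequence of layer k+1, adding depth in space on top of recurrence in time. A toy scalar sketch, using plain tanh units rather than LSTM purely for illustration:

```python
import math

def deep_rnn(xs, layers):
    """Run a stack of simple tanh RNN layers over a scalar sequence xs.
    Each layer is a (w_in, w_rec, b) triple; its output sequence is fed
    as the input sequence of the next layer."""
    seq = xs
    for w_in, w_rec, b in layers:
        h, out = 0.0, []
        for x in seq:                       # recurrence in time
            h = math.tanh(w_in * x + w_rec * h + b)
            out.append(h)
        seq = out                           # depth: next layer reads this sequence
    return seq
```

A real deep LSTM replaces the tanh update with the full gated cell and vector-valued states, but the layer-stacking pattern is the same.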

5,938 citations


"Text recognition using deep BLSTM n..." refers methods in this paper



Proceedings ArticleDOI


25 Jun 2006
TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
Abstract: Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
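At the heart of CTC is the many-to-one collapsing map B that turns a frame-level path over labels plus a blank into a label sequence: merge consecutive repeats, then delete blanks. A minimal sketch (using '-' as the blank symbol, an assumption of this illustration):

```python
def ctc_collapse(path, blank="-"):
    """CTC's many-to-one map B: merge repeated labels, then drop blanks.
    A blank between two identical labels keeps them distinct, so
    'a-a' collapses to 'aa' while 'aa' collapses to 'a'."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

Training sums, via dynamic programming, the probabilities of every path that collapses to the target labelling, which is how CTC avoids needing pre-segmented data.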

3,570 citations


"Text recognition using deep BLSTM n..." refers background or methods in this paper
