Proceedings ArticleDOI

Text recognition using deep BLSTM networks

TL;DR: A Deep Bidirectional Long Short-Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition that uses Connectionist Temporal Classification (CTC) during training to learn the labels of an unsegmented sequence with unknown alignment.
Abstract: This paper presents a Deep Bidirectional Long Short-Term Memory (BLSTM) based Recurrent Neural Network architecture for text recognition. The architecture uses Connectionist Temporal Classification (CTC) during training to learn the labels of an unsegmented sequence with unknown alignment. The work is motivated by the results of Deep Neural Networks for isolated numeral recognition and by improved speech recognition using Deep BLSTM based approaches. The Deep BLSTM architecture is chosen for its ability to access long-range context, learn sequence alignment, and work without segmented data. Because CTC and the forward-backward algorithm handle the alignment of output labels, there are no Unicode re-ordering issues and hence no need for a lexicon or post-processing schemes. The approach is script-independent and segmentation-free. The system has been implemented for the recognition of unsegmented words of printed Oriya text, achieving a 4.18% character error rate and a 12.11% word error rate.
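
To make the training setup concrete, below is a minimal sketch of a stacked BLSTM feeding a CTC loss, written in PyTorch. It is not the authors' code: the layer count, hidden size, frame-feature dimension, and alphabet size are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal BLSTM + CTC sketch (PyTorch). All sizes are illustrative assumptions,
# not the configuration reported in the paper.
import torch
import torch.nn as nn

class DeepBLSTMRecognizer(nn.Module):
    def __init__(self, n_features=50, n_hidden=128, n_layers=3, n_classes=60):
        super().__init__()
        # Stacked bidirectional LSTM reads the frame sequence in both directions,
        # giving access to long-range context on either side of each frame.
        self.blstm = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                             bidirectional=True, batch_first=True)
        # Per-frame scores over the character set plus one extra CTC "blank" class.
        self.fc = nn.Linear(2 * n_hidden, n_classes + 1)

    def forward(self, x):                        # x: (batch, time, n_features)
        out, _ = self.blstm(x)
        return self.fc(out).log_softmax(-1)      # (batch, time, n_classes + 1)

model = DeepBLSTMRecognizer()
ctc_loss = nn.CTCLoss(blank=0)

# Dummy batch: 4 word images as 120-frame feature sequences, 8-character labels.
frames = torch.randn(4, 120, 50)
targets = torch.randint(1, 61, (4, 8))           # label ids 1..60 (0 is the blank)
log_probs = model(frames).permute(1, 0, 2)       # CTCLoss expects (time, batch, classes)
input_lengths = torch.full((4,), 120, dtype=torch.long)
target_lengths = torch.full((4,), 8, dtype=torch.long)

# CTC marginalises over all alignments via the forward-backward algorithm,
# so no pre-segmented or pre-aligned training data is needed.
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```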
Citations
Proceedings ArticleDOI
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT-based DDoS attacks. This paper presents a solution for detecting botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN). Word embedding is used for text recognition and for converting attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research and is available upon request.
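
A rough sketch of this kind of embedding-plus-BLSTM classifier is given below (PyTorch, not the authors' implementation); the vocabulary size, embedding dimension, and hidden width are assumptions.

```python
# Sketch of a BLSTM-RNN over tokenised packet text (PyTorch). Vocabulary size,
# embedding size, and layer widths are assumptions, not values from the paper.
import torch
import torch.nn as nn

class BLSTMBotnetDetector(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, hidden=64):
        super().__init__()
        # Word embedding: tokenised packet fields -> dense vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM reads each token sequence forwards and backwards.
        self.blstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        # Single logit: attack vs. benign traffic.
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, tokens):                   # tokens: (batch, seq_len) integer ids
        h, _ = self.blstm(self.embed(tokens))
        return self.fc(h[:, -1, :]).squeeze(-1)  # one logit per packet sequence

model = BLSTMBotnetDetector()
packets = torch.randint(0, 10000, (8, 32))       # 8 tokenised packets, 32 tokens each
labels = torch.randint(0, 2, (8,)).float()
loss = nn.BCEWithLogitsLoss()(model(packets), labels)
loss.backward()
```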

230 citations


Cites methods from "Text recognition using deep BLSTM networks"

  • ...However, [18] demonstrated that a Deep Bidirectional Long Short Term Memory based RNN (BLSTM-RNN) can be used, which provides promising results for text recognition....


Journal Article
TL;DR: This paper surveys current topics in document image understanding from a technical point of view, covering methods/approaches proposed for the recognition of various kinds of documents.
Abstract: Document image understanding aims to extract and classify individual items of data meaningfully from paper-based documents. To date, many methods/approaches have been proposed for the recognition of various kinds of documents, for various technical problems in extending OCR, and for the requirements of practical use. Although the early technical research issues were regarded as complementary approaches to traditional OCR, which depends on character recognition techniques, the application range and related issues have since been widely investigated and continue to be established progressively. This paper addresses current topics in document image understanding from a technical point of view as a survey.
Key words: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations


Cites methods from "Text recognition using deep BLSTM networks"

  • ...[26,27], etc., that have achieved higher accuracies than the presently proposed method....


Journal ArticleDOI
Xiaolei Ma, Jiyu Zhang, Bowen Du, Chuan Ding, Leilei Sun
TL;DR: A parallel architecture comprising convolutional neural network (CNN) and bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively, suitable for ridership prediction in large-scale metro networks is proposed.
Abstract: Accurate metro ridership prediction can guide passengers in efficiently selecting their departure time and transferring from station to station. An increasing number of deep learning algorithms are being utilized to forecast metro ridership due to the development of computational intelligence. However, limited effort has been devoted to spatiotemporal features, which are important when forecasting ridership with deep learning methods in large-scale metro networks. To fill this gap, this paper proposes a parallel architecture comprising a convolutional neural network (CNN) and a bi-directional long short-term memory network (BLSTM) to extract spatial and temporal features, respectively. Metro ridership data are transformed into ridership images and time series. Spatial features are learned from the ridership images by the CNN, which demonstrates favorable performance in video detection. Time series data are input into the BLSTM, which considers the historical and future impacts of ridership in temporal feature extraction. The two networks are concatenated in parallel and prevented from interfering with each other. The joint spatiotemporal features are fed into a fully connected network for metro ridership prediction. The Beijing metro network is used to demonstrate the efficiency of the proposed algorithm. The proposed model outperforms traditional statistical models, deep learning architectures, and sequential structures, and is suitable for ridership prediction in large-scale metro networks. Metro authorities can thus effectively allocate limited resources to overcrowded areas for service improvement.
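
The parallel structure described above can be sketched as follows (PyTorch); the ridership-image shape, sequence length, and layer widths are placeholder assumptions rather than the paper's configuration.

```python
# Parallel CNN + BLSTM sketch for spatiotemporal prediction (PyTorch).
# Shapes and layer sizes are placeholder assumptions.
import torch
import torch.nn as nn

class ParallelCNNBLSTM(nn.Module):
    def __init__(self, n_stations=32, hidden=64):
        super().__init__()
        # CNN branch: spatial features from a ridership "image" (stations x time grid).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden),
        )
        # BLSTM branch: temporal features from the ridership time series.
        self.blstm = nn.LSTM(n_stations, hidden, bidirectional=True, batch_first=True)
        # Joint features from both branches feed a fully connected predictor.
        self.head = nn.Linear(hidden + 2 * hidden, n_stations)

    def forward(self, image, series):
        spatial = self.cnn(image)                 # (batch, hidden)
        temporal, _ = self.blstm(series)          # (batch, seq_len, 2*hidden)
        joint = torch.cat([spatial, temporal[:, -1, :]], dim=1)
        return self.head(joint)                   # next-step ridership per station

model = ParallelCNNBLSTM()
image = torch.randn(2, 1, 32, 12)                 # 2 ridership images: 32 stations x 12 steps
series = torch.randn(2, 12, 32)                   # matching time series
pred = model(image, series)                       # (2, 32)
```

The two branches are computed independently and only concatenated at the end, mirroring the paper's point that the spatial and temporal paths are kept from interfering with each other.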

117 citations


Cites methods from "Text recognition using deep BLSTM networks"

  • ...This model consists of a forward and backward LSTM to extract temporal features in two directions; hence, it can effectively capture the periodicity and regularity of ridership data [33]....


Journal ArticleDOI
TL;DR: A multi-objective region sampling methodology for isolated handwritten Bangla character and digit recognition is proposed, and an AFS-theory-based fuzzy logic model is developed to combine the Pareto-optimal solutions from two multi-objective heuristic algorithms.

99 citations

Journal ArticleDOI
TL;DR: In the present work, a non-explicit feature based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.

88 citations

References
Proceedings ArticleDOI
25 Aug 2013
TL;DR: A novel script-independent CRF-based inferencing framework for character recognition that considers a word as a sequence of connected components and uses a multiple-hypothesis tree to form the correct sequence of characters.
Abstract: The paper presents a novel script-independent CRF-based inferencing framework for character recognition. In this framework, a word is considered as a sequence of connected components. The connected components are obtained using different binarization schemes, and the different possible sequences are organised in a tree structure. The CRF uses contextual information to learn correct primitive sequences and finds the most probable labelling of the sequence of primitives using the multiple-hypothesis tree, forming the correct sequence of characters. This approach is particularly suitable for degraded printed document images, as it considers multiple alternative hypotheses before making the final decision.
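
The final decoding step implied here, choosing the most probable labelling of a primitive sequence from per-primitive scores and pairwise contextual scores, can be illustrated with a toy Viterbi pass (NumPy). The scores below are random placeholders rather than learned CRF potentials, and the multiple-hypothesis tree over alternative binarizations is omitted.

```python
# Toy Viterbi decoding over a linear chain of primitives (NumPy).
# Unary and transition scores are random placeholders, not learned CRF potentials.
import numpy as np

def viterbi(unary, transition):
    """unary: (T, L) per-primitive label scores; transition: (L, L) pairwise scores."""
    T, L = unary.shape
    score = unary[0].copy()                 # best score ending in each label at t=0
    back = np.zeros((T, L), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + transition + unary[t][None, :]  # (prev label, next label)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow backpointers from the end
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]                     # most probable label per connected component

rng = np.random.default_rng(0)
unary = rng.normal(size=(6, 5))             # 6 connected components, 5 candidate labels
transition = rng.normal(size=(5, 5))        # contextual compatibility between labels
print(viterbi(unary, transition))
```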

5 citations


"Text recognition using deep BLSTM n..." refers methods in this paper

  • ...The same Oriya data as used in this paper was segmented into connected components and used to train a CRF, obtaining a 93% character error rate on the same test set [31]....
