Proceedings ArticleDOI

Scene Text Analysis using Deep Belief Networks

TL;DR: This is the first paper to report scene text recognition using deep belief networks; the proposed network achieves improved recognition results on the Chars74K English, Chars74K Kannada and SVT-CHAR datasets in comparison to state-of-the-art algorithms.
Abstract: This paper focuses on the recognition and analysis of text embedded in scene images using deep learning. The proposed approach uses deep learning architectures for automated higher-order feature extraction, thereby improving classification accuracies in comparison to the handcrafted features used traditionally. Exhaustive experiments have been performed with Deep Belief Networks and Convolutional Deep Neural Networks using varied training algorithms such as Contrastive Divergence and Denoising Score Matching, and supervised learning algorithms such as logistic regression and the multi-layer perceptron. These algorithms have been validated on four standard datasets: Chars74K English, Chars74K Kannada, ICDAR 2003 Robust OCR and SVT-CHAR. The proposed network achieves improved recognition results on the Chars74K English, Chars74K Kannada and SVT-CHAR datasets in comparison to state-of-the-art algorithms. On the ICDAR 2003 dataset, the proposed network is marginally worse than deep convolutional networks. Although deep belief networks have been used considerably for several applications, to the best of the authors' knowledge, this is the first paper to report scene text recognition using deep belief networks.
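The abstract names Contrastive Divergence as one of the DBN training algorithms. As a rough illustration (not the paper's code; function name, shapes, and hyperparameters are hypothetical), a single CD-1 update for one binary RBM layer can be sketched as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM on a batch of visible vectors v0."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling back to the visibles.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # Gradient approximation: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h
```

In a DBN, updates like this are applied layer by layer before the supervised classifier (logistic regression or MLP in the paper) is trained on top.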
Citations
Journal ArticleDOI
TL;DR: A novel technique is presented that uses an adapted maximally stable extremal region (MSER) detector and extracts scale-invariant features from the MSER-detected regions; an adapted MDLSTM network is also presented to tackle the complexities of cursive scene text.
Abstract: © 2019 IEEE. The recognition of text in natural scene images is a practical yet challenging task due to large variations in backgrounds, textures, fonts, and illumination. English as a secondary language is extensively used in Gulf countries along with Arabic script. This paper therefore introduces a 42K-image English-Arabic scene text recognition dataset. The dataset includes text images in both English and Arabic scripts, with the prime focus on Arabic script, and can be employed for the evaluation of text segmentation and recognition tasks. To provide insight to other researchers, experiments have been carried out on the segmentation and classification of Arabic as well as English text, with error rates of 5.99% and 2.48%, respectively. The paper presents a novel technique that uses an adapted maximally stable extremal region (MSER) detector and extracts scale-invariant features from the MSER-detected regions. To select discriminant and comprehensive features, the size of the invariant feature set is restricted to those features which exist in the extremal region. An adapted MDLSTM network is presented to tackle the complexities of cursive scene text. Research on Arabic scene text is in its infancy; thus this paper presents benchmark work in the field of text analysis.

20 citations

Patent
30 Dec 2016
TL;DR: In this article, a neural network system is described that includes multiple subnetworks, among them a first subnetwork of multiple first modules; each first module comprises a pass-through convolutional layer, an average pooling stack of neural network layers, two stacks of convolutional layers, and a concatenation layer that combines their outputs.
Abstract: A neural network system that includes: multiple subnetworks that includes: a first subnetwork including multiple first modules, each first module including: a pass-through convolutional layer configured to process the subnetwork input for the first subnetwork to generate a pass-through output; an average pooling stack of neural network layers that collectively processes the subnetwork input for the first subnetwork to generate an average pooling output; a first stack of convolutional neural network layers configured to collectively process the subnetwork input for the first subnetwork to generate a first stack output; a second stack of convolutional neural network layers that are configured to collectively process the subnetwork input for the first subnetwork to generate a second stack output; and a concatenation layer configured to concatenate the pass-through output, the average pooling output, the first stack output, and the second stack output to generate a first module output for the first module.
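The module described in the claim computes four parallel branch outputs and concatenates them along the channel axis. A toy sketch of that wiring (the branch functions below are stand-ins, not the patented layers; all names and shapes are hypothetical):

```python
import numpy as np

def module_output(x, branches):
    """Apply each parallel branch to the same input and concatenate
    the branch outputs along the channel (last) axis."""
    return np.concatenate([f(x) for f in branches], axis=-1)

# Toy stand-ins for the four branches described in the claim:
passthrough = lambda x: x                          # pass-through conv (identity here)
avg_pool = lambda x: np.full_like(x, x.mean())     # average-pooling stack (global mean here)
stack1 = lambda x: np.maximum(x, 0)                # first conv stack (ReLU stand-in)
stack2 = lambda x: x * 0.5                         # second conv stack (scaling stand-in)

x = np.ones((2, 4))  # batch of 2, 4 channels
y = module_output(x, [passthrough, avg_pool, stack1, stack2])  # shape (2, 16)
```

The key point is only the topology: every branch sees the same subnetwork input, and the module output has the branches' channels stacked side by side.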

12 citations

Journal ArticleDOI
TL;DR: A machine learning based automated legal model is proposed that enhances the efficiency of the legal support system with an accuracy of 94%, assisting victims with prompt delivery of justice and helping legal professionals reduce their workload.
Abstract: It is essential to provide a structured data feed to the computer to accomplish any task, so that it can process flawlessly and generate the desired output within minimal computational time. Generally, computer programmers should provide a structured data feed to the computer program for its successful execution. The hardcopy document should be scanned to generate its corresponding computer-readable softcopy version of the file. This process also proves to be a budget-friendly approach to disengage human resources from the entire process of record maintenance. Due to this automation, the workload of the existing manpower is reduced to a significant level. This concept may prove beneficial for the delivery of any type of service to the ultimate beneficiary (i.e., the citizen) in a minimal time frame. The administration has to deal with various issues of citizens due to the pressure of a huge population seeking legal help to resolve their issues, leading to large numbers of pending legal cases at several courts of the country. To assist victims with prompt delivery of justice and legal professionals in reducing their workload, this paper proposes a machine learning based automated legal model to enhance the efficiency of the legal support system with an accuracy of 94%.

7 citations

Proceedings ArticleDOI
01 Nov 2016
TL;DR: The proposed DNA sequence alignment technique uses a novel concept of a pointing matrix, where the directed path in the pointing matrix ensures faster and accurate discovery of the optimal alignment while matching the accuracy of the well-known Needleman-Wunsch algorithm.
Abstract: A memory-efficient approach for pair-wise DNA sequence alignment is presented in this paper. The proposed DNA sequence alignment technique uses a novel concept of a pointing matrix. The directed path in the pointing matrix ensures faster and accurate discovery of the optimal alignment while matching the accuracy of the well-known Needleman-Wunsch algorithm. The proposed technique is tested on DNA nucleotides but could be applied to RNA or protein sequences as well, and is also suitable for any formal language processing. The complete approach has been simulated for ten cases (DNA sequence sizes from 64 to 1024 nucleotides) with 100 pseudo-randomly generated DNA sequence pairs in each case. The proposed approach consumes slightly more time (≈ 9-11%) for matrix formation, but takes 34-42% less time to find the optimal alignment compared to the approach based on [2]. For longer DNA sequences, the space requirement of the proposed approach is significantly lower than that of the Needleman-Wunsch-based approach [2], even though it uses two matrices.
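For reference, the Needleman-Wunsch baseline the paper compares against fills a scoring matrix by dynamic programming. A minimal score-only sketch (textbook algorithm, not the paper's pointing-matrix variant; the scoring parameters here are illustrative):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Classic Needleman-Wunsch global alignment; returns the optimal score."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: all-gap prefixes.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Each cell takes the best of diagonal (match/mismatch) and two gap moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    return F[n][m]
```

The proposed pointing matrix accelerates the traceback step over this O(nm) table; the dynamic-programming fill itself is what costs the reported extra 9-11%.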

3 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: An improved DNA sequence alignment scheme using a matrix named the neighbouring matrix is presented to meet the requirement of high-speed processing; the proposed scheme can align 43-69% more sequences than [2] on the same DNA sequence pairs of variable lengths.
Abstract: An improved DNA sequence alignment scheme using a matrix named the neighbouring matrix is presented in this paper. This paper also presents a simple hardware design of the proposed global sequence alignment approach to meet the requirement of high-speed processing. Additionally, this architecture efficiently uses SRAM to boost DNA sequence alignment in real time, employing a few logic circuits for the fundamental operations of sequence alignment. The complete scheme is simulated for ten cases, with sequence sizes ranging from 64 to 1024 nucleotides and 100 pseudo-randomly generated DNA sequence pairs in each case. The proposed approach consumes almost half the time to align two DNA sequences compared to the approach based on [2]. Performance is measured in terms of Million Alignments Per Second (MAPS). The simulation results show that the proposed scheme can align 43-69% more sequences than [2] on the same DNA sequence pairs of variable lengths.

2 citations

References
Proceedings Article
03 Dec 2012
TL;DR: A large deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
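The dropout regularizer mentioned in the abstract zeroes random activations at training time. A minimal sketch of the common "inverted" formulation (illustrative only; this is not the exact variant used in the paper, and the function name and shapes are hypothetical):

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero each activation with probability p at
    train time, and scale survivors by 1/(1-p) so the expected
    activation matches test time, when the input passes through unchanged."""
    if not train or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```

Because the scaling happens during training, no rescaling is needed at inference, which is why the test-time path is a plain identity.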

73,978 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, gradient-based learning with convolutional neural networks is reviewed for handwritten character recognition, and a graph transformer network (GTN) paradigm is proposed that allows multimodule recognition systems to be trained globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Journal ArticleDOI
28 Jul 2006-Science
TL;DR: In this article, an effective way of initializing the weights is described that allows deep autoencoder networks to learn low-dimensional codes which work much better than principal components analysis as a tool to reduce the dimensionality of data.
Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

16,717 citations

Journal ArticleDOI
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
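The greedy layer-at-a-time recipe from this reference can be sketched as follows: train one RBM, then feed its hidden activations upward as the next layer's training data. This is an illustrative sketch only (per-layer training is compressed into a few CD-1 sweeps, and all function names, sizes, and hyperparameters are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_layer(x, n_hidden, lr=0.1, sweeps=3, seed=0):
    """Per-layer training stub: a few CD-1 sweeps for one binary RBM."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((x.shape[1], n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(x.shape[1])
    for _ in range(sweeps):
        ph0 = sigmoid(x @ W + b_h)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b_v)
        ph1 = sigmoid(pv1 @ W + b_h)
        W += lr * (x.T @ ph0 - pv1.T @ ph1) / len(x)
    return W, b_h

def greedy_pretrain(data, layer_sizes):
    """Train a DBN one layer at a time: each trained layer's hidden
    activations become the training data for the layer above it."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_layer(x, n_hidden)
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # propagate up to train the next layer
    return weights
```

The slower fine-tuning stage the abstract mentions (a contrastive wake-sleep procedure) would then adjust the whole stack jointly, starting from these layer-wise weights.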

15,055 citations


"Scene Text Analysis using Deep Belief Networks" refers to background or methods in this paper

  • ...architecture is from [11] where a 3 layer DBN architecture is used for the handwriting recognition problem on MNIST dataset....


  • ...The main motivation for a deep network was conceived from the comparison study of RBM and DBN [11], which emphasizes that using more layers of RBMs increases the representational capability of the network....


  • ...Deep belief networks are probabilistic generative models that have several layers and can be trained by both unsupervised and supervised techniques [11]....


Journal ArticleDOI
TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

11,201 citations


"Scene Text Analysis using Deep Belief Networks" refers to background in this paper

  • ...Representation learning [1] uses layers of unsupervised algorithms which learn abstract concepts from the data and thus is able to increase the accuracy of the classifier....
