Journal ArticleDOI

Handwritten recognition of Tamil vowels using deep learning

01 Nov 2017-Vol. 263, Iss: 5, pp 052035
TL;DR: This paper explores the performance of Deep Belief Networks in the classification of handwritten Tamil vowels and compares the results obtained; the proposed method shows satisfactory recognition accuracy despite the difficulties posed by regional languages.
Abstract: We come across a large volume of handwritten text in our daily lives, and handwritten character recognition has long been an important area of research in pattern recognition. The complexity of the task varies among languages, largely because of language-specific properties such as the similarity between characters, their distinct shapes, and the number of characters. There have been numerous works on character recognition of English alphabets, with laudable success, but regional languages have not been dealt with as frequently or with similar accuracies. In this paper, we explore the performance of Deep Belief Networks in the classification of handwritten Tamil vowels and compare the results obtained. The proposed method shows satisfactory recognition accuracy in light of the difficulties faced with regional languages, such as the similarity between characters and the minute nuances that differentiate them. This work can be further extended to all Tamil characters.
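Deep Belief Networks are typically built from stacked Restricted Boltzmann Machines pretrained layer by layer. As a generic illustration (not the paper's implementation — the class and parameter names here are hypothetical), one contrastive-divergence (CD-1) training step for a single RBM can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine layer of a Deep Belief Network."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def cd1_step(self, v0):
        """One CD-1 update on a batch of binary visible vectors v0."""
        # positive phase: hidden probabilities and a binary sample
        p_h0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # negative phase: one Gibbs step back to visible, then to hidden
        p_v1 = sigmoid(h0 @ self.W.T + self.b_v)
        p_h1 = sigmoid(p_v1 @ self.W + self.b_h)
        # contrastive-divergence parameter updates
        n = len(v0)
        self.W += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)
        return ((v0 - p_v1) ** 2).mean()  # reconstruction error
```

Stacking several such layers and fine-tuning the result with backpropagation is the standard recipe for a DBN classifier.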
Citations
Book ChapterDOI
01 Jan 2021
TL;DR: A recurrent neural network (RNN) is trained on features of characters extracted from palm leaf manuscripts; after a preprocessing step to eliminate noise, the method provides better recognition accuracy than other neural network-based character recognition approaches.
Abstract: Tamil is one of the ancient Indian languages and has a vast collection of literature in the form of palm leaves, stones, metal plates, and other materials. Palm leaf manuscripts were a widespread medium for recording medicine, literature, drama, and much more. Recognition of the characters written in palm leaf manuscripts is still an open task because of the need for digitization and transcription. In this paper, a recurrent neural network (RNN) is used to train on features of characters extracted from palm leaves. The method includes a preprocessing step to eliminate noise; each character is then segmented from the image and trained using a bidirectional long short-term memory (BLSTM) network. A feature vector with nine zones of character strokes is used to train and test the characters, drawing on a rich set of character samples. This method provides better recognition accuracy than other neural network-based character recognition approaches.
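The nine-zone stroke feature mentioned above can be illustrated with a generic zoning scheme (a sketch under the assumption that each zone contributes its ink density; the chapter's exact feature definition may differ):

```python
import numpy as np

def zone_features(img, grid=3):
    """Split a binary character image into grid x grid zones and
    return the ink density (mean pixel value) of each zone.
    grid=3 gives a nine-zone feature vector."""
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            zone = img[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid]
            feats.append(zone.mean())
    return np.array(feats)

# a fully inked 9x9 "character" yields nine zone densities of 1.0
print(zone_features(np.ones((9, 9))))
```

Each character image then maps to a fixed-length vector suitable as BLSTM input.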

8 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A novel generative approach for face recognition is proposed, in which sparse facial features are extracted from high-resolution color face images using predefined landmark topologies that mark discriminative locations on face images, unlike the appearance-based approach, which uses low-resolution grayscale face images to reduce computational complexity.
Abstract: We propose a novel generative approach for face recognition, in which sparse facial features are extracted from high resolution color face images using predefined landmark topologies which mark discriminative locations on face images, unlike the appearance-based approach, in which low resolution grayscale face images are used, reducing the computational complexity. By adopting a common landmark topology, the dissimilarity between distinct face images can be scored in terms of the dissimilarities between their corresponding landmarks, which are obtained by proposed geodesic distance approximations between multivariate normal distributions which represent the color intensities in the vicinities of each landmark location. The classification process of new face samples occurs by the determination of the face image sample present in the training set which minimizes the dissimilarity score. The proposed method was compared with representative current state-of-the-art methods using color or grayscale face images and presented higher recognition rates. Moreover, these results also support a trend in which color information is relevant in face recognition.

8 citations


Cites background from "Handwritten recognition of Tamil vo..."

  • ...to extract reliable features for several instrumentation-related applications which use texture information, such as face recognition [1] [2] [3] [4], brain image recognition [5] [6], texture recognition of material images [7], food image recognition [8] [9], character recognition [10] [11], yawning detection [12], etc....


Journal ArticleDOI
TL;DR: The proposed face recognition method was compared to methods representative of the state-of-the-art, using color or grayscale face images, and presented higher recognition rates; it is also efficient in general texture discrimination (e.g., texture recognition of material images), as the experiments suggest.
Abstract: Geodesic distance is a natural dissimilarity measure between probability distributions of a specific type, and can be used to discriminate texture in image-based measurements. Furthermore, since there is no known closed-form solution for the geodesic distance between general multivariate normal distributions, we propose two efficient approximations to be used as texture dissimilarity metrics in the context of face recognition. A novel face recognition approach based on texture discrimination in high-resolution color face images is proposed, unlike the typical appearance-based approach that relies on low-resolution grayscale face images. In our face recognition approach, sparse facial features are extracted using predefined landmark topologies that identify discriminative image locations on the face images. Given this landmark topology, the dissimilarity between distinct face images is scored in terms of the dissimilarities between their corresponding face landmarks, and the texture in each one of these landmarks is represented by multivariate normal distributions expressing the color distribution in the vicinity of each landmark location. The classification of new face image samples occurs by determining the face image sample in the training set which minimizes the dissimilarity score, using the nearest-neighbor rule. The proposed face recognition method was compared to methods representative of the state-of-the-art, using color or grayscale face images, and presented higher recognition rates. Moreover, the proposed texture dissimilarity metric is also efficient in general texture discrimination (e.g., texture recognition of material images), as our experiments suggest.
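The paper's two geodesic-distance approximations are not reproduced in this abstract. As an illustrative stand-in, a standard closed-form dissimilarity between multivariate normal distributions — the symmetrized Kullback-Leibler divergence — can be computed as follows (a generic sketch, not the authors' metric):

```python
import numpy as np

def kl_mvn(mu0, S0, mu1, S1):
    """KL divergence KL(N(mu0, S0) || N(mu1, S1)) between two
    multivariate normal distributions (closed form)."""
    k = len(mu0)
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def sym_kl(mu0, S0, mu1, S1):
    """Symmetrized KL: a proper dissimilarity score (zero iff equal)."""
    return 0.5 * (kl_mvn(mu0, S0, mu1, S1) + kl_mvn(mu1, S1, mu0, S0))
```

With per-landmark normals fitted to color intensities, such a score can be summed over corresponding landmarks and fed to the nearest-neighbor rule described above.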

6 citations


Cites methods from "Handwritten recognition of Tamil vo..."

  • ...Moreover, the image processing and computer vision fields may be used to help to extract reliable features for several instrumentation-related applications which use texture information, such as face recognition [1–4], brain image recognition [5, 6], texture recognition of material images [7], food image recognition [8, 9], character recognition [10, 11], yawning detection [12], etc....


18 Jan 2018
TL;DR: This work proposes a novel generative approach for face recognition based on texture discrimination using high-resolution color face images and proposes two efficient approximations to discriminate textures in the context of face recognition.
Abstract: Geodesic distances are a natural dissimilarity measure between probability distributions of a fixed type, and are used to discriminate texture in several image-based measurements. Furthermore, since there is no known closed-form solution for the geodesic distance between general multivariate normal distributions, we propose two efficient approximations to discriminate textures in the context of face recognition. Unlike the typical appearance-based approach that uses low-resolution grayscale face images, we propose a novel generative approach for face recognition based on texture discrimination. In the proposed approach, sparse facial features are extracted from high-resolution color face images using predefined landmark topologies, in which landmarks are in discriminative locations of face images. By adopting a common landmark topology, the dissimilarity between distinct face images can be scored in terms of the dissimilarities between the texture in their corresponding landmark vicinities. The proposed multivariate normal distributions represent the color intensities around each landmark location. The classification of new face samples occurs by determining the face image sample in the training set which minimizes the dissimilarity score. The proposed face recognition method was compared to methods representative of the state-of-the-art using color and grayscale face images, and presented higher recognition rates. Moreover, the proposed measures to discriminate textures tend to be efficient in face recognition and in general texture discrimination (e.g., texture recognition of material images), as our experiments suggest.

Cites background from "Handwritten recognition of Tamil vo..."

  • ...images [7], food image recognition [8] [9], character recognition [10] [11], yawning detection [12], etc....


Journal ArticleDOI
TL;DR: In this paper, a model is proposed to detect isolated text characters in photographic images of natural scenes, using a combination of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to recognize the text.
Abstract: Recognizing text from natural scene images and videos has been a challenging task for the computer vision and machine learning research community in recent years. These texts are difficult to recognize because of complex backgrounds and variations in color, shape, and size. However, text recognition is very useful in indexing, keyword-based image and video search, and information retrieval. In this research paper, a model is proposed to detect isolated text characters in photographic images of natural scenes. The proposed model uses a combination of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) for recognizing the text in natural images. The model uses two networks: the first network combines low-level and middle-level features to increase the feature size and passes the enriched information to the second network, where features are again widened by combining with high-level features, resulting in powerful and robust features. To evaluate the proposed model, the ICDAR2003 (IC03), ICDAR2013 (IC13), and SVT (Street View Text) datasets have been used, and an extensive Tamil news-ticker image dataset has been developed. The experimental results show that the combined feature fusion technique outperforms the other methods on the ICDAR2003, ICDAR2013, SVT, and Tamil news-ticker datasets.
References
Proceedings Article
03 Dec 2012
TL;DR: A deep convolutional neural network achieving state-of-the-art ImageNet classification performance is presented; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
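The dropout regularizer named above can be sketched generically; this is the common "inverted dropout" formulation (a NumPy illustration, not the paper's GPU implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p at
    training time and rescale survivors by 1/(1-p), so the expected
    activation is unchanged and no rescaling is needed at test time."""
    if not training or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask
```

Randomly removing units this way discourages co-adaptation of features, which is why it is effective against overfitting in large fully-connected layers.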

73,978 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: Gradient-based learning applied to handwritten character recognition is reviewed, and a graph transformer network (GTN) paradigm is proposed that allows multi-module recognition systems to be trained globally; given an appropriate architecture, gradient-based methods can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Journal ArticleDOI
28 Jul 2006-Science
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
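The core idea — a network with a small central layer trained to reconstruct its input — can be illustrated with a toy linear autoencoder trained by plain gradient descent (a minimal sketch; the paper uses deep nonlinear autoencoders with layer-wise pretraining, and all sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 200 samples in 10-D that actually lie on a 2-D subspace
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 10))

# linear autoencoder: 10 -> 2 (code layer) -> 10
W_enc = rng.normal(0.0, 0.1, (10, 2))
W_dec = rng.normal(0.0, 0.1, (2, 10))
lr = 0.01

def recon_error():
    """Mean squared reconstruction error over the dataset."""
    return ((X @ W_enc @ W_dec - X) ** 2).mean()

err_start = recon_error()
for _ in range(500):
    H = X @ W_enc                      # low-dimensional codes
    G = 2.0 * (H @ W_dec - X) / len(X) # gradient of MSE w.r.t. reconstruction
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
# reconstruction error drops well below its starting value
```

In the linear case this recovers a PCA-like subspace; the paper's point is that deep nonlinear codes can do much better, provided the weights are initialized sensibly.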

16,717 citations

Proceedings Article
31 Mar 2010
TL;DR: The objective is to understand why standard gradient descent from random initialization performs so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
Abstract: Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.

1 Deep Neural Networks

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features.
Much attention has recently been devoted to them (see (Bengio, 2009) for a review), because of their theoretical appeal, inspiration from biology and human cognition, and because of empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results, reviewed and discussed by Bengio (2009), suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pretraining (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a “better” basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization. But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results. So here instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multilayer neural networks. Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations.
We also evaluate the effects on these of choices of activation function (with the idea that it might affect saturation) and initialization procedure (since unsupervised pretraining is a particular form of initialization and it has a drastic impact).
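The initialization scheme this paper proposes is now commonly known as Glorot (Xavier) initialization. A minimal NumPy sketch of the uniform variant, showing that a signal passed through many tanh layers neither explodes nor collapses:

```python
import numpy as np

rng = np.random.default_rng(1)

def xavier_uniform(fan_in, fan_out):
    """Glorot/Xavier uniform initialization: W ~ U(-a, a) with
    a = sqrt(6 / (fan_in + fan_out)), chosen so that activation and
    gradient variances stay roughly constant across layers."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, (fan_in, fan_out))

# push a batch through 10 tanh layers and watch the signal's scale
x = rng.normal(size=(256, 100))
for _ in range(10):
    x = np.tanh(x @ xavier_uniform(100, 100))
print(x.std())
```

With a naive large-variance initialization the same stack saturates the tanh units; with a tiny one the signal shrinks toward zero layer by layer.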

9,500 citations