Journal Article

Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script

15 Apr 2006, Pattern Recognition Letters (Elsevier), Vol. 27, Iss. 6, pp. 696-705
TL;DR: A novel method that uses the lexicon together with hidden Markov models to improve the accuracy of the recognised script; the proposed optimisation algorithm improves word-level accuracy.
About: This article was published in Pattern Recognition Letters on 2006-04-15 and has received 14 citations to date. It focuses on the topics Optical character recognition and Pattern recognition (psychology).
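The TL;DR does not spell out the algorithm, so the following is only a hypothetical Python sketch of one common way to combine a lexicon with an HMM when post-processing OCR output; it is not the authors' method. Character bigram transition probabilities are estimated from a word list (add-one smoothing is an assumption), the recogniser's per-position candidate confidences act as emission scores, and Viterbi decoding selects the most probable character sequence. The names `bigram_model` and `viterbi_correct` and the Latin-letter toy lexicon (standing in for Sinhala characters) are illustrative.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # hypothetical word-boundary symbols


def bigram_model(lexicon):
    """Return logp(a, b): add-one-smoothed character bigram log-probabilities from a word list."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = set()
    for word in lexicon:
        chars = [START] + list(word) + [END]
        alphabet.update(word)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    symbols = alphabet | {END}

    def logp(a, b):
        total = sum(counts[a].values())
        return math.log((counts[a][b] + 1) / (total + len(symbols)))

    return logp


def viterbi_correct(candidates, logp):
    """candidates: one dict per character position mapping character -> confidence (> 0)."""
    # best[c] = (log score of the best partial path ending in c, that path as a string)
    best = {c: (logp(START, c) + math.log(p), c) for c, p in candidates[0].items()}
    for column in candidates[1:]:
        new_best = {}
        for c, p in column.items():
            score, path = max((s + logp(prev, c), path) for prev, (s, path) in best.items())
            new_best[c] = (score + math.log(p), path + c)
        best = new_best
    # close every path with the end-of-word transition and keep the best one
    score, path = max((s + logp(c, END), path) for c, (s, path) in best.items())
    return path, score
```

For example, with the toy lexicon `["cat", "car", "cart", "bat"]` and a last position where the recogniser slightly prefers `l` (0.55) over `t` (0.45), the lexicon-derived transitions tip the decoded word to `cat`.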
Citations
Posted Content
TL;DR: The objective of this paper is to fill the gap left by the lack of a comprehensive literature survey of publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers.
Abstract: Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language family. However, due to poverty in both linguistic and economic capital, Sinhala, from the perspective of Natural Language Processing tools and research, remains a resource-poor language that has neither the economic drive of its cousin English nor the sheer push of the law of numbers that a language such as Chinese has. A number of research groups in Sri Lanka have noticed this dearth and the resulting dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill the gap left by the lack of a comprehensive literature survey of publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers. As such, we shall upload this paper to arXiv and update it periodically to reflect advances in the field.

24 citations

Patent
14 Mar 2013
TL;DR: An electronic device and method identify a block of text in a portion of an image of the real world captured by a camera of a mobile device, slice sub-blocks from the block, identify characters in the sub-blocks that form a first sequence, and compare the first sequence to a predetermined set of sequences to identify a second sequence therein.
Abstract: An electronic device and method identify a block of text in a portion of an image of the real world captured by a camera of a mobile device, slice sub-blocks from the block, identify characters in the sub-blocks that form a first sequence, and compare the first sequence to a predetermined set of sequences to identify a second sequence therein. The second sequence may be identified as recognized (as a modifier-absent word) when it is not associated with additional information. When the second sequence is associated with additional information, a check is made on pixels in the image based on a test specified in the additional information. When the test is satisfied, a copy of the second sequence combined with the modifier is identified as recognized (as a modifier-present word). Storing and using modifier information in addition to a set of character sequences enables recognition of words with or without modifiers.

13 citations
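A minimal Python sketch of the lookup-then-test flow described in this abstract follows. It is a hypothetical illustration only: the exact-match lookup, the `ModifierInfo` record, the dark-pixel region test, the behaviour when the test fails (falling back to the modifier-absent word) and all names are assumptions that the patent abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Image = Sequence[Sequence[int]]  # 2-D grid of grey levels: 0 = ink, 255 = paper


@dataclass
class ModifierInfo:
    """Hypothetical 'additional information' stored with a dictionary sequence."""
    modifier: str                  # appended when the pixel test passes
    test: Callable[[Image], bool]  # check on image pixels, e.g. for a mark above the word


def dark_pixel_in_region(rows, cols, threshold=128):
    """Build a test that passes if any pixel in rows x cols is darker than threshold."""
    def test(image: Image) -> bool:
        return any(image[r][c] < threshold for r in rows for c in cols)
    return test


def recognise(first_sequence: str,
              dictionary: Dict[str, Optional[ModifierInfo]],
              image: Image) -> Optional[str]:
    """Look the recognised characters up, then apply any stored pixel test."""
    if first_sequence not in dictionary:
        return None                            # no matching second sequence
    info = dictionary[first_sequence]
    if info is None:
        return first_sequence                  # modifier-absent word
    if info.test(image):
        return first_sequence + info.modifier  # modifier-present word
    return first_sequence                      # assumption: fall back to the bare word
```

Storing the test alongside the sequence keeps the dictionary small: one entry covers both the plain word and its modified form, and the pixels are only examined when that entry asks for it.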

Patent
14 Mar 2013
TL;DR: A trellis-based word decoder analyses a set of OCR characters and probabilities using a forward pass across a forward trellis and a reverse pass across a reverse trellis.
Abstract: Systems, apparatuses, and methods to relate images of words to a list of words are provided. A trellis-based word decoder analyses a set of OCR characters and probabilities using a forward pass across a forward trellis and a reverse pass across a reverse trellis. Multiple paths may result; however, the most likely path from the trellises has the highest probability with valid links. A valid link is determined from the trellis by some dictionary word traversing the link. The most likely path is compared with a list of words to find the word closest to the most likely path.

10 citations
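The forward/reverse trellis idea can be sketched in Python as below. This is a hypothetical illustration under stated assumptions, not the patented decoder: each trellis column is a dict of OCR candidate characters and probabilities; a link between adjacent columns counts as valid only if some dictionary word has that character pair at those positions; the forward pass stores best prefix scores, the reverse pass stores best suffix scores with next-character pointers; and Levenshtein distance (our choice) stands in for the unspecified comparison with the word list.

```python
import math

NEG_INF = float("-inf")


def valid_links(words):
    """(position, char, next_char) links traversed by at least one dictionary word."""
    return {(i, w[i], w[i + 1]) for w in words for i in range(len(w) - 1)}


def decode(trellis, words):
    """trellis: list of dicts, one per character slot, mapping OCR character -> probability."""
    links = valid_links(words)
    n = len(trellis)
    logp = [{c: math.log(p) for c, p in slot.items()} for slot in trellis]
    # Forward pass: best log score of a valid path from slot 0 ending at (i, c).
    fwd = [dict(logp[0])]
    for i in range(1, n):
        fwd.append({
            c: max((fwd[i - 1][a] for a in logp[i - 1] if (i - 1, a, c) in links),
                   default=NEG_INF) + lp
            for c, lp in logp[i].items()
        })
    # Reverse pass: best log score of a valid continuation after (i, c), plus a pointer
    # to the next character on that continuation.
    bwd = [dict.fromkeys(slot, NEG_INF) for slot in logp]
    bwd[n - 1] = dict.fromkeys(logp[n - 1], 0.0)
    nxt = [{} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in logp[i]:
            for b, lp in logp[i + 1].items():
                if (i, a, b) in links and lp + bwd[i + 1][b] > bwd[i][a]:
                    bwd[i][a] = lp + bwd[i + 1][b]
                    nxt[i][a] = b
    # fwd + bwd at a node is the best complete path through it; start from the best node.
    start = max(logp[0], key=lambda c: fwd[0][c] + bwd[0][c])
    best = fwd[0][start] + bwd[0][start]
    if best == NEG_INF:
        return None, best  # no path made only of valid links
    path = [start]
    for i in range(n - 1):
        path.append(nxt[i][path[-1]])
    return "".join(path), best


def edit_distance(s, t):
    """Levenshtein distance with a single rolling row."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def closest_word(path, words):
    return min(words, key=lambda w: edit_distance(path, w))
```

With the word list `cat`, `cot`, `dog`, a path built only from valid links need not itself be a dictionary word (`c`-`o`-`g` uses links from `cot` and `dog`), which is why the final comparison against the word list is still needed.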

Journal Article
TL;DR: The classification results showed that the Fuzzy Discrete Hidden Markov Model (FDHMM) method is effective for classification of internal carotid artery Doppler signals.
Abstract: We developed a biomedical system based on a fuzzy discrete hidden Markov model (FDHMM) to classify internal carotid artery Doppler signals. The system consists of feature extraction and classification stages. In the feature extraction stage, the Burg autoregressive (AR) spectrum analysis technique was used to obtain medical information. In the classification stage, a fuzzy logic based approach was applied to avoid losing information due to vector quantization and to increase classification performance. Our proposed method reached 97.38% classification accuracy with 5-fold cross validation (CV). The classification results showed that the FDHMM is effective for classification of internal carotid artery Doppler signals.
Research highlights: We developed a biomedical system based on a Discrete Hidden Markov Model (DHMM). The aim of our system is to classify internal carotid artery Doppler signals. We applied a fuzzy approach to the DHMM, thus decreasing information loss and increasing classification performance. Our system reached 97.38% classification accuracy with 5-fold cross validation. These results showed that the Fuzzy Discrete Hidden Markov Model (FDHMM) method is effective for classification of internal carotid artery Doppler signals.

9 citations
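To make the fuzzy step concrete, here is a hypothetical Python sketch of one common fuzzy vector quantization formulation, not necessarily the one used in this paper. Instead of mapping each feature vector (for example, AR spectral coefficients) to its single nearest codeword, fuzzy c-means memberships u_k(x) over all codewords weight the discrete emission probabilities, b_j(x) = Σ_k u_k(x) b_j(k), inside the forward algorithm. The fuzziness exponent m = 2, the toy codebook and models, and all names are assumptions; the forward recursion is unscaled, so it suits only short sequences.

```python
import math


def fuzzy_memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy c-means style membership of feature vector x in each codeword (sums to 1)."""
    d = [math.dist(x, v) + eps for v in codebook]  # eps avoids division by zero
    return [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d]


def forward_likelihood(observations, codebook, pi, A, B, m=2.0):
    """Forward algorithm where discrete emissions B[j][k] are weighted by memberships."""
    n = len(pi)
    alpha = None
    for t, x in enumerate(observations):
        u = fuzzy_memberships(x, codebook, m)
        emit = [sum(uk * B[j][k] for k, uk in enumerate(u)) for j in range(n)]
        if t == 0:
            alpha = [pi[j] * emit[j] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * emit[j] for j in range(n)]
    return sum(alpha)


def classify(observations, codebook, models):
    """models: class name -> (pi, A, B); pick the class with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(observations, codebook, *models[name]))
```

Because every codeword contributes in proportion to its membership, a vector lying between two codewords no longer loses the information that hard quantization would discard, which is the motivation the abstract gives for the fuzzy approach.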

01 Jan 2009
TL;DR: Presents a search engine for font recognition and a modified approach to the statistical analysis of color emotions in images, involving transformations of ordinary RGB histograms, which is used for image classification and retrieval.
Abstract: Two novel contributions to Content Based Image Retrieval are presented and discussed. The first is a search engine for font recognition. The intended usage is searching in very large font databases. The input to the search engine is an image of a text line, and the output is the name of the font used to print the text. After pre-processing and segmentation of the input image, a local approach is used, where features are calculated for individual characters. The method is based on eigenimages calculated from edge-filtered character images, which enables compact feature vectors that can be computed rapidly. A system for visualizing the entire font database is also proposed. Applying geometry-preserving linear and non-linear manifold learning methods, the structure of the high-dimensional feature space is mapped to a two-dimensional representation, which can be reorganized into a grid-based display. The performance of the search engine and the visualization tool is illustrated with a large database containing more than 2700 fonts. The second contribution is the inclusion of color-based emotion-related properties in image retrieval. The color emotion metric used is derived from psychophysical experiments and uses three scales: activity, weight and heat. It was originally designed for single-color combinations and later extended to include pairs of colors. A modified approach for statistical analysis of color emotions in images, involving transformations of ordinary RGB histograms, is used for image classification and retrieval. The methods are very fast in feature extraction, and descriptor vectors are very short. This is essential in our application, where the intended use is searching huge image databases containing millions or billions of images. The proposed method is evaluated in psychophysical experiments, using both category scaling and interval scaling. The results show that people in general perceive color emotions for multi-colored images in similar ways, and that observer judgments correlate with derived values. Both the font search engine and the emotion-based retrieval system are implemented in publicly available search engines. User statistics gathered over periods of 20 and 14 months, respectively, are presented and discussed.

8 citations


Cites methods from "Lexicon and hidden Markov model-bas..."

  • ...For many years font recognition research was dominated by methods focusing on the English or Latin alphabet (with minor contributions focusing on other alphabets, for instance the Arabic alphabet [55], and the South Asian script Sinhala [65][66])....

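The eigenimage feature extraction and font lookup described in the abstract above can be sketched with NumPy as follows. This is a hypothetical illustration, not the author's implementation: finite-difference gradient magnitude stands in for the unspecified edge filter, principal components come from an SVD of the mean-centred, flattened images, and the nearest font is the one with the smallest Euclidean distance in eigenimage space. The function names and the choice of k are assumptions.

```python
import numpy as np


def edge_filter(img):
    """Gradient magnitude via finite differences (a stand-in for the paper's edge filter)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)


def fit_eigenimages(images, k):
    """Return the mean edge image (flattened) and the top-k eigenimages as rows."""
    X = np.stack([edge_filter(im).ravel() for im in images])
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def features(img, mean, eig):
    """Project an edge-filtered character image onto the eigenimages: a k-dim vector."""
    return eig @ (edge_filter(img).ravel() - mean)


def nearest_font(query, database):
    """database: font name -> feature vector; return the closest font name."""
    return min(database, key=lambda name: np.linalg.norm(database[name] - query))
```

In a real system the database vectors would be computed once per font and character, so a query would cost one projection plus a nearest-neighbour search, in line with the abstract's emphasis on compact, quickly computed features.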

References
Journal Article
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

4,546 citations
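As a concrete anchor for the tutorial's subject, the following Python sketch evaluates P(O | λ) for a discrete HMM λ = (A, B, π) with the forward algorithm, rescaling α at every step so that long observation sequences do not underflow. It is a generic textbook formulation rather than code from the tutorial, and any parameters passed to it are the caller's.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | A, B, pi) for a discrete HMM.

    pi[i]   initial probability of state i
    A[i][j] transition probability from state i to state j
    B[i][k] probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)          # scaling factor; the product of all c_t is P(obs)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik
```

Summing log c_t instead of multiplying raw probabilities is what keeps the computation stable; the same scaled α values are also what the Baum-Welch re-estimation procedure mentioned in the abstract builds on.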

Reference Entry
TL;DR: This unit introduces the concept of hidden Markov models in computational biology, describes them using simple biological examples that require as little mathematical knowledge as possible, and presents an overview of their current applications.
Abstract: This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit also presents a brief history of hidden Markov models and an overview of their current applications before concluding with a discussion of their limitations.

1,305 citations

Proceedings Article
05 Aug 1996
TL;DR: A new model for word alignment in statistical translation that uses a first-order hidden Markov model, of the kind used successfully for the time alignment problem in speech recognition.
Abstract: In this paper, we describe a new model for word alignment in statistical translation and present experimental results. The idea of the model is to make the alignment probabilities dependent on the differences in the alignment positions rather than on the absolute positions. To achieve this goal, the approach uses a first-order Hidden Markov model (HMM) for the word alignment problem, as such models are used successfully in speech recognition for the time alignment problem. The difference from the time alignment HMM is that there is no monotonicity constraint on the possible word orderings. We describe the details of the model and test it on several bilingual corpora.

976 citations
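The key idea, transition probabilities that depend only on the jump width i - i' rather than on absolute positions, can be sketched in Python as below, using the formulation p(i | i', I) = c(i - i') / Σ_{i''} c(i'' - i') together with a Viterbi search for the best alignment. The uniform initial distribution, the floor value for unseen jumps and word pairs, the toy lexicon probabilities and all names are assumptions of this sketch, and no EM training is shown.

```python
import math


def alignment_transitions(I, jump_counts, floor=1e-6):
    """Transitions over target positions 1..I that depend only on the jump width."""
    table = {}
    for prev in range(1, I + 1):
        # unseen jump widths get a small floor count so every transition stays positive
        weights = {i: jump_counts.get(i - prev, floor) for i in range(1, I + 1)}
        total = sum(weights.values())
        table[prev] = {i: w / total for i, w in weights.items()}
    return table


def viterbi_alignment(source, target, lex, jump_counts, floor=1e-6):
    """Best alignment a_1..a_J (1-based target positions) for the source words."""
    I = len(target)
    trans = alignment_transitions(I, jump_counts, floor)

    def emit(f, i):
        return math.log(lex.get((f, target[i - 1]), floor))  # log p(f | e_i)

    # Uniform initial distribution over target positions (an assumption).
    delta = {i: -math.log(I) + emit(source[0], i) for i in range(1, I + 1)}
    backptrs = []
    for f in source[1:]:
        ptr, new_delta = {}, {}
        for i in range(1, I + 1):
            best_prev = max(delta, key=lambda p: delta[p] + math.log(trans[p][i]))
            ptr[i] = best_prev
            new_delta[i] = delta[best_prev] + math.log(trans[best_prev][i]) + emit(f, i)
        backptrs.append(ptr)
        delta = new_delta
    alignment = [max(delta, key=delta.get)]
    for ptr in reversed(backptrs):
        alignment.append(ptr[alignment[-1]])
    return alignment[::-1]
```

Because only jump widths are counted, the same statistic for a forward jump of +1 serves every position in every sentence, and backward jumps are allowed, reflecting the absence of a monotonicity constraint.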

Journal Article
TL;DR: A novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided and linguistic knowledge beyond the lexicon level is incorporated in the recognition process.
Abstract: In this paper, a system for the reading of totally unconstrained handwritten text is presented. The kernel of the system is a hidden Markov model (HMM) for handwriting recognition. The HMM is enhanced by a statistical language model. Thus linguistic knowledge beyond the lexicon level is incorporated in the recognition process. Another novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided. A number of experiments with various language models and large vocabularies have been conducted. The language models used in the system were also analytically compared based on their perplexity.

463 citations


"Lexicon and hidden Markov model-bas..." refers background in this paper

  • ...Marti and Bunke (2001) and Khorsheed (2003) are cited among recent work on HMM-based handwriting recognition....


  • ...When the words are formed using the recognised individual characters and the substitutions are made for false rejections, the accuracy rate at word level is around 81.5%....

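The abstract compares language models by their perplexity. A hypothetical Python sketch of that measure for an add-one-smoothed word bigram model, PP = exp(-(1/N) Σ log p(w_i | w_{i-1})), is shown below; the smoothing, the sentence-boundary tokens and any training sentences are assumptions, and the paper's own language models may be estimated differently.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Return prob(prev, word): add-one-smoothed bigram model with <s> and </s> tokens."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])
        unigrams.update(tokens[:-1])            # counts of each token used as a history
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    return prob


def perplexity(prob, sentences):
    """PP = exp(-(1/N) * sum log p(w_i | w_{i-1})), N = number of predicted tokens."""
    log_sum, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(prev, word))
            n += 1
    return math.exp(-log_sum / n)
```

Lower perplexity means the model finds the test text less surprising; in a recogniser such as the one described, that usually translates into stronger guidance when choosing among competing word hypotheses.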

Journal Article
TL;DR: A hidden Markov model-based approach designed to recognize off-line unconstrained handwritten words for large vocabularies, which experiments show can be used successfully for handwritten word recognition.
Abstract: This paper describes a hidden Markov model-based approach designed to recognize off-line unconstrained handwritten words for large vocabularies. After preprocessing, a word image is segmented into letters or pseudoletters and represented by two feature sequences of equal length, each consisting of an alternating sequence of shape-symbols and segmentation-symbols, which are both explicitly modeled. The word model is made up of the concatenation of appropriate letter models consisting of elementary HMMs, and an HMM-based interpolation technique is used to optimally combine the two feature sets. Two rejection mechanisms are considered, depending on whether or not the word image is guaranteed to belong to the lexicon. Experiments carried out on real-life data show that the proposed approach can be successfully used for handwritten word recognition.

243 citations
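The idea of building a word model by concatenating letter models, then scoring every lexicon entry against the observed symbol sequence, can be sketched in Python as follows. This is a simplified, hypothetical illustration: each letter model is assumed to be a short left-to-right chain of states, each with a self-loop probability and a discrete emission table, and the paper's two feature streams, segmentation symbols, interpolation and rejection mechanisms are all omitted.

```python
import math


def concatenate(word, letter_models):
    """Chain the states of the letter models into one left-to-right word HMM.

    letter_models: letter -> list of states, each state (self_loop_prob, {symbol: prob}).
    """
    return [state for ch in word for state in letter_models[ch]]


def word_log_likelihood(obs, states, floor=1e-6):
    """Forward algorithm on the chain: start in the first state, end in the last one;
    each state either loops on itself or advances to the next state."""
    def emit(j, o):
        return states[j][1].get(o, floor)

    n = len(states)
    alpha = [0.0] * n
    alpha[0] = emit(0, obs[0])
    for o in obs[1:]:
        new = [0.0] * n
        for j in range(n):
            stay = alpha[j] * states[j][0]
            move = alpha[j - 1] * (1.0 - states[j - 1][0]) if j > 0 else 0.0
            new[j] = (stay + move) * emit(j, o)
        alpha = new
    return math.log(alpha[-1]) if alpha[-1] > 0 else float("-inf")


def recognise_word(obs, lexicon, letter_models):
    """Lexicon-driven recognition: the entry whose concatenated model scores highest."""
    return max(lexicon, key=lambda w: word_log_likelihood(obs, concatenate(w, letter_models)))
```

Because word models are assembled from shared letter models, only the letter models need training, and any word added to the lexicon can be scored without new training data; the recursion is unscaled, so it is only suitable for short sequences.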