Author
H. L. Premaratne
Bio: H. L. Premaratne is an academic researcher from University of Colombo. The author has contributed to research in topics: Optical character recognition & Pattern recognition (psychology). The author has an hindex of 2, co-authored 2 publications receiving 38 citations.
Papers
More filters
TL;DR: In this article, a novel approach for printed character recognition using linear symmetry is proposed, where orientation features are effectively used to recognize characters directly using a standard alphabet as the basis without the need for segmentation into basic components.
Abstract: In this paper, a novel approach for printed character recognition using linear symmetry is proposed. When the conventional character recognition methods such as the artificial neural network based techniques are used to recognise Brahmi Sinhala script, segmentation of modified characters into modifier symbols and basic characters is a necessity but a complex issue. The large size of the character set makes the whole recognition process even more complex. In contrast, in the proposed method, the orientation features are effectively used to recognise characters directly using a standard alphabet as the basis without the need for segmentation into basic components. The edge detection algorithm using linear symmetry recognises vertical modifiers. The linear symmetry principle is also used to determine the skew angle. Experiments with the aim for an optical character recognition system for the printed Sinhala script show favourable results.
24 citations
TL;DR: A novel method that explores the lexicon in association with the hidden Markov models to improve the rate of accuracy of the recognised script and the word-level accuracy is improved by the proposed optimisation algorithm.
Abstract: The Brahmi descended Sinhala script is used by 75% of the 18 million population in Sri Lanka. To the best of our knowledge, none of the Brahmi descended scripts used by hundreds of millions of people in South Asia, possess commercial OCR products. In the process of implementation of an OCR system for the printed Sinhala script which is easily adoptable to similar scripts [Premaratne, L., Assabie, Y., Bigun, J., 2004. Recognition of modification-based scripts using direction tensors. In: 4th Indian Conf. on Computer Vision, Graphics and Image Processing (ICVGIP2004), pp. 587-592]; a segmentation-free recognition method using orientation features has been proposed in [Premaratne, H.L., Bigun, J., 2004. A segmentation-free approach to recognise printed Sinhala script using linear symmetry. Pattern Recognition 37, 2081-2089]. Due to the limitations in image analysis techniques the character level accuracy of the results directly produced by the proposed character recognition algorithm saturates at 94%. The false rejections from the recognition algorithm are initially identified only as 'missing character positions' or 'blank characters'. It is necessary to identify suitable substitutes for such 'missing character positions' and optimise the accuracy of words to an acceptable level. This paper proposes a novel method that explores the lexicon in association with the hidden Markov models to improve the rate of accuracy of the recognised script. The proposed method could easily be extended with minor changes to other modification-based scripts consisting of confusing characters. The word-level accuracy which was at 81.5% is improved to 88.5% by the proposed optimisation algorithm.
14 citations
Cited by
More filters
Posted Content•
TL;DR: The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language tools and research so that the researchers working in this field can better utilize contributions of their peers.
Abstract: Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language tools and research so that the researchers working in this field can better utilize contributions of their peers. As such, we shall be uploading this paper to arXiv and perpetually update it periodically to reflect the advances made in the field.
24 citations
TL;DR: A novel method that explores the lexicon in association with the hidden Markov models to improve the rate of accuracy of the recognised script and the word-level accuracy is improved by the proposed optimisation algorithm.
Abstract: The Brahmi descended Sinhala script is used by 75% of the 18 million population in Sri Lanka. To the best of our knowledge, none of the Brahmi descended scripts used by hundreds of millions of people in South Asia, possess commercial OCR products. In the process of implementation of an OCR system for the printed Sinhala script which is easily adoptable to similar scripts [Premaratne, L., Assabie, Y., Bigun, J., 2004. Recognition of modification-based scripts using direction tensors. In: 4th Indian Conf. on Computer Vision, Graphics and Image Processing (ICVGIP2004), pp. 587-592]; a segmentation-free recognition method using orientation features has been proposed in [Premaratne, H.L., Bigun, J., 2004. A segmentation-free approach to recognise printed Sinhala script using linear symmetry. Pattern Recognition 37, 2081-2089]. Due to the limitations in image analysis techniques the character level accuracy of the results directly produced by the proposed character recognition algorithm saturates at 94%. The false rejections from the recognition algorithm are initially identified only as 'missing character positions' or 'blank characters'. It is necessary to identify suitable substitutes for such 'missing character positions' and optimise the accuracy of words to an acceptable level. This paper proposes a novel method that explores the lexicon in association with the hidden Markov models to improve the rate of accuracy of the recognised script. The proposed method could easily be extended with minor changes to other modification-based scripts consisting of confusing characters. The word-level accuracy which was at 81.5% is improved to 88.5% by the proposed optimisation algorithm.
14 citations
Patent•
14 Mar 2013
TL;DR: In this paper, an electronic device and method identify a block of text in a portion of an image of real world captured by a camera of a mobile device, slice subblocks from the block and identify characters in the sub-blocks that form a first sequence to a predetermined set of sequences to identify a second sequence therein.
Abstract: An electronic device and method identify a block of text in a portion of an image of real world captured by a camera of a mobile device, slice sub-blocks from the block and identify characters in the sub-blocks that form a first sequence to a predetermined set of sequences to identify a second sequence therein. The second sequence may be identified as recognized (as a modifier-absent word) when not associated with additional information. When the second sequence is associated with additional information, a check is made on pixels in the image, based on a test specified in the additional information. When the test is satisfied, a copy of the second sequence in combination with the modifier is identified as recognized (as a modifier-present word). Storage and use of modifier information in addition to a set of sequences of characters enables recognition of words with or without modifiers.
13 citations
Proceedings Article•
01 Jan 2004TL;DR: This thesis proposes structural and syntactic techniques for recognition of Ethiopic characters where the graphically comnplex characters are represented by less complex primitive structures and their spatial interrelationships and presents a general framework for multi-font, multi-size and multi-style Ethiopic character recognition system.
Abstract: In this thesis, we present a general framework for multi-font, multi-size and multi-style Ethiopic character recognition system. We propose structural and syntactic techniques for recognition of Ethiopic characters where the graphically comnplex characters are represented by less complex primitive structures and their spatial interrelationships. For each Ethiopic character, the primitive structures and their spatial interrelationships form a unique set of patterns.The interrelationships of primitives are represented by a special tree structure which resembles a binary search tree in the sense that it groups child nodes as left and right, and keeps the spatial position of primitives in orderly manner. For a better computational efficiency, the primitive tree is converted into string pattern using in-order traversal, which generates a base of the alphabet that stores possibly occuring string patterns for each character. The recognition of characters is then achieved by matching the generated patterns with each pattern in a stored knowledge base of characters.Structural features are extracted using direction field tensor, which is also used for character segmentation. In general, the recognition system does not need size normalization, thinning or other preprocessing procedures. The only parameter that needs to be adjusted during the recognition process is the size of Gaussian window which should be chosen optimally in relation to font sizes. We also constructed an Ethiopic Document Image Database (EDIDB) from real life documents and the recognition system is tested with respect to variations in font type, size, style, document skewness and document type. Experimental results are reported.
10 citations
Patent•
14 Mar 2013TL;DR: In this article, a trellis based word decoder analyses a set of OCR characters and probabilities using a forward pass across a forward tree and a reverse pass across the reverse tree.
Abstract: Systems, apparatuses, and methods to relate images of words to a list of words are provided. A trellis based word decoder analyses a set of OCR characters and probabilities using a forward pass across a forward trellis and a reverse pass across a reverse trellis. Multiple paths may result, however, the most likely path from the trellises has the highest probability with valid links. A valid link is determined from the trellis by some dictionary word traversing the link. The most likely path is compared with a list of words to find the word closest to the most.
10 citations