
Showing papers by "A. G. Ramakrishnan published in 2013"


Journal ArticleDOI
TL;DR: The proposed dynamic plosion index (DPI) algorithm, based on integrated linear prediction residual (ILPR) which resembles the voice source signal, is tested for its robustness in the presence of additive white and babble noise and on simulated telephone quality speech.
Abstract: Epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high performance epoch extraction algorithms require either dynamic programming techniques or a priori information of the average pitch period. An algorithm without such requirements is proposed based on the integrated linear prediction residual (ILPR), which resembles the voice source signal. The half-wave rectified and negated ILPR (or the Hilbert transform of the ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) is proposed for detecting ‘transients’ in the speech signal. An extension of PI, called the dynamic plosion index (DPI), is applied to the pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone quality speech. The performance of the DPI algorithm is found to be comparable to or better than five state-of-the-art techniques for the experiments considered.
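
As a rough illustration of the pipeline described above, the sketch below computes an ordinary LP residual as a stand-in for the ILPR, negates and half-wave rectifies it, and evaluates a plosion-index-style measure. The PI formula, the window parameters m1 and m2, and the LPC order are assumptions for illustration, not the paper's exact DPI algorithm.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_residual(x, order=10):
    """Linear-prediction residual via the autocorrelation method (stand-in for the ILPR)."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])           # LP coefficients
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)   # inverse filtering

def plosion_index(s, n, m1=10, m2=100):
    """Assumed PI form: |s[n]| over the mean of |s| in m2 samples ending m1 samples earlier."""
    past = np.abs(s[max(n - m1 - m2, 0):n - m1])
    return np.abs(s[n]) / (np.mean(past) + 1e-12)

# Pre-processing as described: negate the residual and half-wave rectify it, so the
# strong negative excursions near epochs turn into positive peaks.
x = np.random.randn(16000)                  # placeholder for a voiced-speech segment
pre = np.maximum(-lp_residual(x), 0.0)
pi_track = [plosion_index(pre, n) for n in range(200, len(pre))]
```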

117 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: Two algorithms already published by the authors, namely nonlinear enhancement and selection of plane, and midline analysis and propagation of segmentation, are evaluated, and suggestions are provided to improve the quality of the algorithms.
Abstract: A competition was organized by the authors to detect text from scene images. The motivation was to look for script-independent algorithms that detect and extract text from scene images, which may be applied directly to an unknown script. The competition had four distinct tasks: (i) text localization and (ii) text segmentation from scene images containing one or more of Kannada, Tamil, Hindi, Chinese and English words, and (iii) English and (iv) Kannada word recognition from scene word images. There were four submissions in total for the text localization and segmentation tasks. For the other two tasks, we evaluated two algorithms already published by us, namely nonlinear enhancement and selection of plane, and midline analysis and propagation of segmentation. A complete picture of where each algorithm stands is discussed, and suggestions are provided to improve the quality of the algorithms. Graphical depiction of the f-scores of individual images, in the form of benchmark values, is proposed to show the strength of an algorithm.

31 citations


Journal ArticleDOI
TL;DR: A lexicon-free, script-dependent approach to segment online handwritten isolated Tamil words into their constituent symbols, which achieves a symbol-level segmentation accuracy of 98.1%, improving to as high as 99.7% after the AFS strategy.
Abstract: In this article, we propose a lexicon-free, script-dependent approach to segment online handwritten isolated Tamil words into their constituent symbols. Our proposed segmentation strategy comprises two modules, namely the (1) Dominant Overlap Criterion Segmentation (DOCS) module and (2) Attention Feedback Segmentation (AFS) module. Based on a bounding box overlap criterion in the DOCS module, the input word is first segmented into stroke groups. A stroke group may at times correspond to a part of a valid symbol (over-segmentation) or a merger of valid symbols (under-segmentation). Attention on specific features in the AFS module serves to detect possibly over-segmented or under-segmented stroke groups. Thereafter, feedback from the SVM classifier likelihoods and stroke-group based features is considered in modifying the suspected stroke groups to form valid symbols. The proposed scheme is tested on a set of 10,000 isolated handwritten words (containing 53,246 Tamil symbols). The results show that the DOCS module achieves a symbol-level segmentation accuracy of 98.1%, which improves to as high as 99.7% after the AFS strategy. This in turn entails a symbol recognition rate of 83.9% (at the DOCS module) and 88.4% (after the AFS module). The resulting word recognition rates at the DOCS and AFS modules are found to be 50.9% and 64.9% respectively, without any postprocessing.
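
A minimal sketch of the kind of bounding-box overlap grouping the DOCS module describes is given below; the horizontal-overlap criterion, its threshold and the greedy left-to-right grouping are illustrative assumptions, not the published module.

```python
import numpy as np

def bbox(stroke):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a stroke, an (N, 2) array."""
    return (*stroke.min(axis=0), *stroke.max(axis=0))

def x_overlap_ratio(b1, b2):
    """Horizontal overlap of two boxes, normalised by the narrower box width."""
    inter = min(b1[2], b2[2]) - max(b1[0], b2[0])
    narrower = min(b1[2] - b1[0], b2[2] - b2[0]) + 1e-9
    return max(inter, 0.0) / narrower

def group_strokes(strokes, threshold=0.5):
    """Greedy left-to-right grouping: merge a stroke into the current group when
    its bounding box overlaps the group box beyond the threshold."""
    groups, current = [], [strokes[0]]
    for s in strokes[1:]:
        gbox = bbox(np.vstack(current))
        if x_overlap_ratio(gbox, bbox(s)) >= threshold:
            current.append(s)
        else:
            groups.append(current)
            current = [s]
    groups.append(current)
    return groups
```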

26 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: A breakthrough result is reported on the difficult task of segmentation and recognition of coloured text from the word image dataset of ICDAR robust reading competition challenge 2: reading text in scene images.
Abstract: In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured text from the word image dataset of ICDAR robust reading competition challenge 2: reading text in scene images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane that has the maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on the ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature. As a baseline, the images binarized by simple global and local thresholding techniques were also recognized. The word recognition rate obtained by our nonlinear enhancement and selection of plane method is 72.8% and 66.2% for the ICDAR 2011 and 2003 word datasets, respectively. We have created ground-truth for each image at the pixel level to benchmark these datasets, using a toolkit developed by us. The recognition rate on the benchmarked images is 86.7% and 83.9% for the ICDAR 2011 and 2003 datasets, respectively.
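
The following sketch shows the gist of the selection step: each plane is enhanced by a power-law transform and scored by the maximum between-class variance from Otsu's method, and the plane with the largest score is kept. The gamma value and the fixed 256-bin histogram are illustrative assumptions.

```python
import numpy as np

def max_between_class_variance(plane):
    """Otsu's criterion: the largest between-class variance over all 256 thresholds."""
    hist, _ = np.histogram(plane.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return np.nanmax(sigma_b2)

def select_plane(planes, gamma=2.0):
    """Power-law enhance each plane (values in [0, 255]), then keep the one with
    the largest Otsu discrimination factor."""
    scores = []
    for plane in planes:
        enhanced = 255.0 * (plane / 255.0) ** gamma       # power-law (gamma) transform
        scores.append(max_between_class_variance(enhanced))
    return int(np.argmax(scores))
```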

22 citations


Proceedings ArticleDOI
24 Aug 2013
TL;DR: This paper investigates the efficacy of using global features alone (DFT, DCT), local features alone (preprocessed (x, y) coordinates) and a combination of both global and local features, and obtains more than 95% accuracy.
Abstract: Feature extraction is a key step in the recognition of online handwritten data and is well investigated in the literature. In the case of Tamil online handwritten characters, global features such as those derived from the discrete Fourier transform (DFT), discrete cosine transform (DCT) and wavelet transform have been used to capture overall information about the data. On the other hand, local features such as (x, y) coordinates, nth derivatives, curvature and angular features have also been used. In this paper, we investigate the efficacy of using global features alone (DFT, DCT), local features alone (preprocessed (x, y) coordinates) and a combination of both global and local features. Our classifier, a support vector machine (SVM) with radial basis function (RBF) kernel, is trained and tested on the IWFHR 2006 Tamil handwritten character recognition competition dataset. We have obtained more than 95% accuracy on the test dataset, which is greater than the best score reported in the literature. Further, we have used a combination of global and local features on a publicly available database of Indo-Arabic numerals and obtained an accuracy of more than 98%.
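
A small sketch of how such global and local features might be combined, assuming arc-length resampling, normalised (x, y) samples as the local features and low-order DCT coefficients as the global features; the number of resampled points, the number of DCT coefficients and the SVM hyperparameters are placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.fft import dct

def resample(points, n=60):
    """Arc-length resampling of an online trace (an (N, 2) array) to n points."""
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
    t = np.linspace(0, d[-1], n)
    return np.column_stack([np.interp(t, d, points[:, 0]),
                            np.interp(t, d, points[:, 1])])

def features(points, n=60, n_dct=10):
    """Local features: normalised (x, y) samples; global features: low-order DCT
    coefficients of the x and y sequences. Returns their concatenation."""
    p = resample(points, n)
    p = (p - p.mean(axis=0)) / (p.std(axis=0) + 1e-9)     # size/position normalisation
    local = p.ravel()
    global_ = np.r_[dct(p[:, 0], norm='ortho')[:n_dct],
                    dct(p[:, 1], norm='ortho')[:n_dct]]
    return np.r_[local, global_]

# Such feature vectors can then be fed to an RBF-kernel SVM, e.g.
# sklearn.svm.SVC(kernel='rbf', C=10, gamma='scale').fit(X, y)
```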

21 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: In this morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images with uniform font sizes, using the word height information.
Abstract: A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as minor modifications of an existing printed form, it is desirable to reproduce the font size and characteristics such as bold and italics in the OCR-recognized document. In this morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images with uniform font sizes, using the word height information. The stroke widths of the characters in each sub-image are roughly estimated from the density. Each sub-image is then opened with a square structuring element of size determined by the respective stroke width. The union of all the opened sub-images is used to determine the locations of the bold words. Extracting all such words from the binarized image gives the final image. A minimum of 98% of bold words were detected from a total of 65 Tamil, Kannada and English pages, and the false alarm rate is less than 0.4%.
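
A minimal sketch of the opening step on one uniform-font-size sub-image; the stroke-width estimate (ink area over half the ink perimeter) and the structuring-element scaling are stand-ins, since the abstract only states that the width is estimated from the density.

```python
import numpy as np
from scipy.ndimage import binary_opening

def detect_bold_mask(binary_block, width_factor=1.2):
    """binary_block: boolean array, True = ink, assumed to contain one font size.
    Open with a square structuring element slightly larger than the estimated
    stroke width; only the thicker (bold) strokes survive the opening."""
    ink = binary_block.sum()
    # crude stroke-width estimate: ink area divided by half the ink perimeter
    edges = np.abs(np.diff(binary_block.astype(int), axis=1)).sum() + \
            np.abs(np.diff(binary_block.astype(int), axis=0)).sum()
    stroke_width = max(int(round(2.0 * ink / (edges + 1))), 1)
    se = int(round(width_factor * stroke_width))
    return binary_opening(binary_block, structure=np.ones((se, se)))
```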

4 citations


01 Jan 2013
TL;DR: A novel technique that extracts text lines of arbitrary curvature and aligns them horizontally by invoking the spatial regularity properties of text: a B-spline curve is fitted to the centroids of the constituent characters, and normal vectors are computed all along the resulting curve.
Abstract: Conventional optical character recognition systems, designed to recognize linearly aligned text, perform poorly on document images that contain multi-oriented text lines. This paper describes a novel technique that can extract text lines of arbitrary curvature and align them horizontally. By invoking the spatial regularity properties of text, adjacent components are grouped together to obtain the text lines present in the image. To align each identified text line, we fit a B-spline curve to the centroids of the constituent characters, and normal vectors are computed all along the resulting curve. Each character is then individually rotated such that the corresponding normal vector is aligned with the vertical axis. The method has been tested on images that contain text laid out in various forms, namely arc, wave, triangular and combinations of these with linearly skewed text lines. It yields 97.3% recognition accuracy on text strings on which state-of-the-art OCRs fail before alignment.
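
A compact sketch of the alignment step, assuming SciPy's parametric B-spline fit through the character centroids; the smoothing factor is arbitrary, and the default cubic spline needs at least four centroids.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def alignment_angles(centroids, smooth=5.0):
    """Fit a B-spline through character centroids ((N, 2) array) and return, per
    character, the rotation (radians) that brings the local normal to the vertical."""
    tck, u = splprep([centroids[:, 0], centroids[:, 1]], s=smooth)
    dx, dy = splev(u, tck, der=1)            # tangent vectors along the curve
    # the normal is the tangent rotated by 90 degrees, so aligning the normal with
    # the vertical axis amounts to undoing the tangent's inclination
    return -np.arctan2(dy, dx)

def rotate_points(points, angle, centre):
    """Rotate a character's point coordinates about its centroid."""
    c, s = np.cos(angle), np.sin(angle)
    return (points - centre) @ np.array([[c, -s], [s, c]]).T + centre
```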

2 citations


Journal ArticleDOI
TL;DR: This paper presents compressed sensing data acquisition from a different perspective, wherein a set of signals are reconstructed at a sampling rate which is a multiple of the sampling rate of the ADCs that are used to measure the signals.
Abstract: Major emphasis in compressed sensing (CS) research has been on the acquisition of a sub-Nyquist number of samples of a signal that has a sparse representation in some tight frame or orthogonal basis, and on the subsequent reconstruction of the original signal using a plethora of recovery algorithms. In this paper, we present compressed sensing data acquisition from a different perspective, wherein a set of signals is reconstructed at a sampling rate which is a multiple of the sampling rate of the ADCs that are used to measure the signals. We illustrate how this can facilitate the usage of anti-aliasing filters with relaxed frequency specifications and, consequently, of lower order.
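
A generic toy example of the underlying CS machinery, not the paper's multi-ADC acquisition scheme: a signal sparse in the DCT domain is measured with M < N random projections (standing in for lower-rate measurements) and reconstructed on the full N-sample grid with orthogonal matching pursuit.

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, K = 256, 64, 8                            # high-rate length, measurements, sparsity

# signal that is K-sparse in the DCT domain, defined on the high-rate grid
coeffs = np.zeros(N)
coeffs[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Psi = idct(np.eye(N), norm='ortho', axis=0)     # synthesis basis (inverse DCT)
x = Psi @ coeffs

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # random measurement operator
y = Phi @ x

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False).fit(Phi @ Psi, y)
x_hat = Psi @ omp.coef_                         # reconstruction on the N-sample grid
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```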

2 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: A metric to evaluate binarized document images using eigenvalue decomposition is proposed and evaluated on the DIBCO and H-DIBCO datasets.
Abstract: A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. Several algorithms for binarizing a scanned document can be found in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different types of documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate the selection of the best segmented document, we either need the ground-truth of the document or an evaluation metric. If ground-truth is available, then precision and recall can be used to choose the best binarized document. What about the case when ground-truth is not available? Can we come up with a metric that evaluates these binarized documents? Hence, we propose a metric to evaluate binarized document images using eigenvalue decomposition. We have evaluated this measure on the DIBCO and H-DIBCO datasets. The proposed method chooses the best binarized document, namely the one that is close to the ground-truth of the document.
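
The eigenvalue-based metric itself is not spelled out in the abstract, but the ground-truth-based selection it mentions is straightforward; the sketch below chooses the best candidate by pixel-level F-measure, purely as the reference selection that such a metric is compared against.

```python
import numpy as np

def f_measure(binarized, ground_truth):
    """Pixel-level F-measure between a candidate binarization and the ground truth
    (both boolean arrays, True = foreground/ink)."""
    tp = np.logical_and(binarized, ground_truth).sum()
    fp = np.logical_and(binarized, ~ground_truth).sum()
    fn = np.logical_and(~binarized, ground_truth).sum()
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    return 2 * precision * recall / (precision + recall + 1e-12)

def pick_best(candidates, ground_truth):
    """Choose the binarization closest to the ground truth."""
    return max(candidates, key=lambda b: f_measure(b, ground_truth))
```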

2 citations