
Book Chapter

Unlocking the Mechanism of Devanagari Letter Identification Using Eye Tracking

05 Dec 2017, pp. 219-226

TL;DR: Upon understanding the level of distortion acceptable for correct letter recognition and the processes involved in the identification of the letters, the OCR can be made more robust and the gap between human reading and machine reading can be narrowed down.

Abstract: Present-day computers can outperform humans in many complicated tasks, performing them precisely and efficiently. However, in scenarios such as pattern recognition and, more importantly, character recognition, a school-going child can outperform the sophisticated machines available today. Modern machines find handwritten and calligraphic text difficult to recognize because such text rarely contains regular straight lines, perfect loops, or circles. As a result, most optical character recognition (OCR) systems fail to recognize characters beyond certain levels of distortion and noise. The human brain, on the other hand, has a remarkable ability to recognize visual patterns and characters quickly under a variety of distortion conditions. The present work tries to understand how humans perceive, process, and recognize Devanagari characters under various distortion levels. To achieve this objective, an eye-tracking experiment was performed with 20 graduate participants, presenting stimuli in decreasing levels of distortion (from highly distorted to closer to normal). The eye fixation patterns, along with the time course of recognition, revealed the moment-to-moment processing involved in letter identification. By understanding the level of distortion acceptable for correct letter recognition and the processes involved in identifying the letters, OCR can be made more robust and the gap between human reading and machine reading can be narrowed.



Citations
Book Chapter
07 Dec 2018
TL;DR: When the same document is read in different font types, fixation behavior and comprehension differ significantly, so reading comprehension can be improved by changing the physical properties of a document without changing its content.
Abstract: In this era of digitization, screen reading has grown immensely due to the availability of affordable display devices. Most people prefer reading on display devices to reading print media. To make the reading experience pleasant and comfortable, font designers strive to choose suitable typographical properties for the text, such as font type and font size. Some researchers suggest that typography affects reading performance to some extent. However, research on the effect of typography on reading behavior is limited, and it has hardly been explored for Indian scripts. Therefore, this paper aims to determine the effect of Devanagari font type on reading performance, especially reading comprehension. In addition, a method to reduce the error in the eye tracker's gaze estimation is proposed. To understand reading behavior, an eye-tracking experiment was performed with 14 participants, who were asked to read 22 pages in 3 different font types presented on the eye tracker's screen. Reader performance is analyzed in terms of total reading time, comprehension score, number of fixations, fixation duration, and number of regressions. Our results show a significant difference in fixation duration, number of fixations, and comprehension score when the same document is read in different font types. Thus, there is scope for improving reading comprehension by changing the physical properties of the document without changing its content. These findings might be useful for understanding readers’ font preferences and for designing a suitable font type for online reading.

5 citations
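
The reading measures named in the abstract above (number of fixations, fixation duration, number of regressions) can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the function name `reading_metrics`, the `(x_position, duration_ms)` input format, and the rule that any leftward jump counts as a regression are all assumptions, and the rule only makes sense for a single line of left-to-right text such as Devanagari.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (x_position, duration_ms) pair per fixation,
# in temporal order, recorded on a single line of left-to-right text.
Fixation = Tuple[float, float]


def reading_metrics(fixations: List[Fixation]) -> Dict[str, float]:
    """Summarize a fixation sequence with simple reading measures."""
    count = len(fixations)
    total_duration = sum(duration for _, duration in fixations)
    mean_duration = total_duration / count if count else 0.0

    # Assumption: a regression is any fixation that lands to the left of the
    # previous one. Multi-line text would need return sweeps excluded first.
    regressions = sum(
        1
        for previous, current in zip(fixations, fixations[1:])
        if current[0] < previous[0]
    )

    return {
        "fixation_count": count,
        "total_fixation_ms": total_duration,
        "mean_fixation_ms": mean_duration,
        "regression_count": regressions,
    }
```

Total fixation time here excludes saccade time, so it is a lower bound on the total reading time reported in the study rather than the same quantity.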

Journal Article
TL;DR: The comparative analysis shows that the memorability-based compression outperforms the state-of-the-art compression techniques.
Abstract: This study is concerned with achieving image compression using the concept of memorability. The authors use the memorability of an image as a perceptual measure during image coding. The proposed approach introduces a region-of-interest-based, memorability-preserving image compression algorithm, which is accomplished via two sub-processes: memorability prediction and image compression. The memorability of images is predicted using convolutional neural network and restricted Boltzmann machine features. Based on these features, the memorability score of individual patches in an image is calculated, and these scores are used to generate the memorability map. The memorability map values are then used for optimised image compression. To validate the results, an eye-tracking experiment with human participants is performed. The comparative analysis shows that the memorability-based compression outperforms the state-of-the-art compression techniques.

4 citations

Proceedings Article
01 Sep 2019
TL;DR: It is found that the convolutional neural network performs better when trained with the assistance of fixation information compared to the network trained without eye fixations.
Abstract: This paper is concerned with the development of techniques for recognizing ornamental characters, motivated by the perceptual processes involved in human recognition. To understand these perceptual processes, we performed an eye-tracking experiment on the recognition of a special set of characters with artistic variations in structure and form. The novelty of this paper is the use of human visual fixations to supervise the intermediate layers of the convolutional neural network. From the results obtained, we found that the network performs better when trained with the assistance of fixation information than when trained without eye fixations.

4 citations


Cites methods from "Unlocking the Mechanism of Devanaga..."

  • ...This was used in our earlier work also [11] to understand the recognition of distorted Devanagari characters....


Posted Content
Abstract: This paper proposes an efficient video summarization framework that gives the gist of an entire video in a few key-frames or video skims. Existing video summarization frameworks are based on algorithms that use low-level computer vision feature extraction or high-level domain-level extraction. However, although humans are the ultimate users of the summarized video, they remain the most neglected aspect. Therefore, this paper considers the human's role in summarization and introduces human visual attention-based summarization techniques. To understand human attention behavior, we designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology. The EEG and eye-tracking data obtained from the experiments are processed simultaneously and used to segment frames containing useful information from a considerable video volume. Thus, the frame segmentation relies primarily on the cognitive judgments of human beings. Using our approach, a video is summarized by 96.5% while maintaining high precision and recall. The comparison with state-of-the-art techniques demonstrates that the proposed approach yields ceiling-level performance with reduced computational cost in summarising the videos.

1 citation

Proceedings Article
10 Jan 2021
Abstract: Deep learning models that include attention mechanisms have been shown to enhance the performance and efficiency of various computer vision tasks such as pattern recognition, object detection, and face recognition. Although the visual attention mechanism is the source of inspiration for these models, recent attention models treat ‘attention’ as a pure machine vision optimization problem, and visual attention remains the most neglected aspect. Therefore, this paper presents a collaborative human and machine attention module that considers both visual attention and the network’s attention. The proposed module is inspired by the dorsal (‘where’) pathways of visual processing and can be integrated with any convolutional neural network (CNN) model. First, the module computes a spatial attention map from the input feature maps, which is then combined with the visual attention maps. The visual attention maps are created using eye fixations obtained from an eye-tracking experiment with human participants. The visual attention map covers the highly salient and discriminating image regions, since humans tend to focus on such regions, whereas the other relevant image regions are handled by the spatial attention map. Combining these two maps yields a finer refinement of the feature maps and, in turn, improved performance. The comparative analysis reveals that our model not only shows significant improvement over the baseline model but also outperforms the other models. We hope that our findings using a collaborative human-machine attention module will be helpful in other computer vision tasks as well.

1 citation


References
Journal Article
TL;DR: The basic theme of the review is that eye movement data reflect moment-to-moment cognitive processes in the various tasks examined.
Abstract: Recent studies of eye movements in reading and other information processing tasks, such as music reading, typing, visual search, and scene perception, are reviewed. The major emphasis of the review is on reading as a specific example of cognitive processing. Basic topics discussed with respect to reading are (a) the characteristics of eye movements, (b) the perceptual span, (c) integration of information across saccades, (d) eye movement control, and (e) individual differences (including dyslexia). Similar topics are discussed with respect to the other tasks examined. The basic theme of the review is that eye movement data reflect moment-to-moment cognitive processes in the various tasks examined. Theoretical and practical considerations concerning the use of eye movement data are also discussed.

6,131 citations

Book
01 Jan 1969

2,269 citations

Proceedings Article
08 Nov 2000
TL;DR: A taxonomy of fixation identification algorithms is proposed that classifies algorithms in terms of how they utilize spatial and temporal information in eye-tracking protocols in order to evaluate and compare these algorithms with respect to a number of qualitative characteristics.
Abstract: The process of fixation identification—separating and labeling fixations and saccades in eye-tracking protocols—is an essential part of eye-movement data analysis and can have a dramatic impact on higher-level analyses. However, algorithms for performing fixation identification are often described informally and rarely compared in a meaningful way. In this paper we propose a taxonomy of fixation identification algorithms that classifies algorithms in terms of how they utilize spatial and temporal information in eye-tracking protocols. Using this taxonomy, we describe five algorithms that are representative of different classes in the taxonomy and are based on commonly employed techniques. We then evaluate and compare these algorithms with respect to a number of qualitative characteristics. The results of these comparisons offer interesting implications for the use of the various algorithms in future work.

1,559 citations
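
As an illustration of the fixation identification step described in the abstract above, the following Python sketch implements a dispersion-threshold approach, one common way of using spatial and temporal information together. It is a simplified sketch, not a reproduction of any algorithm from the cited paper: the function names, the `(t_ms, x, y)` sample format, and the default thresholds (25 pixels, 100 ms) are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp_ms, x, y) in screen pixels.
Sample = Tuple[float, float, float]
# Output format: (start_ms, end_ms, centroid_x, centroid_y).
Fixation = Tuple[float, float, float, float]


def dispersion(window: List[Sample]) -> float:
    """Spatial spread of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def identify_fixations(
    samples: List[Sample],
    max_dispersion: float = 25.0,    # assumed threshold, in pixels
    min_duration_ms: float = 100.0,  # assumed threshold, in milliseconds
) -> List[Fixation]:
    """Label runs of tightly clustered gaze samples as fixations."""
    fixations: List[Fixation] = []
    n = len(samples)
    i = 0
    while i < n:
        # Temporal criterion: grow the window until it spans min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for a full-length window

        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Spatial criterion holds: extend while the spread stays small.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            # Window too spread out (likely a saccade): drop its first sample.
            i += 1
    return fixations
```

Samples that never fall inside a qualifying window are implicitly treated as saccades. Both thresholds depend on the tracker's sampling rate and the viewing geometry, so the defaults here should be tuned rather than reused as given.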

Journal Article
TL;DR: A review of the OCR work done on Indian language scripts is presented, along with the scope of future work and the further steps needed for Indian script OCR development.
Abstract: Intensive research has been done on optical character recognition (OCR), and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available on the market, but most of these systems work for Roman, Chinese, Japanese and Arabic characters. There has not been a sufficient amount of work on Indian language character recognition, although there are 12 major scripts in India. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover the introduction and the properties of Indian scripts. In Section 3, we discuss different methodologies in OCR development as well as research work done on Indian script recognition. In Section 4, we discuss the scope of future work and the further steps needed for Indian script OCR development. In Section 5, we conclude the paper.

565 citations

Proceedings Article
08 Sep 2013
TL;DR: This work investigates whether different document types can be automatically detected from visual behaviour recorded using a mobile eye tracker, and presents an initial recognition approach that uses special purpose eye movement features as well as machine learning for document type detection.
Abstract: Reading is a ubiquitous activity that many people perform even in transit, such as while on the bus or while walking. Tracking reading enables us to gain more insight into the expertise level and potential knowledge of users, as a step towards reading-log tracking and improved knowledge acquisition. As a first step towards this vision, in this work we investigate whether different document types can be automatically detected from visual behaviour recorded using a mobile eye tracker. We present an initial recognition approach that combines special-purpose eye movement features with machine learning for document type detection. We evaluate our approach in a user study with eight participants and five Japanese document types and achieve a recognition performance of 74% using user-independent training.

62 citations