Book Chapter

Extraction of Doodles and Drawings from Manuscripts

TL;DR: An approach for separating non-text content, mainly doodles and drawings by exceptional thinkers and writers, from the text of a manuscript, together with a computational method for recovering struck-out text to reduce human effort.
Abstract: In this paper we propose an approach to separate non-text from text in a manuscript. The non-text is mainly in the form of doodles and drawings by some exceptional thinkers and writers. These have enormous historical value for the study of those writers' subconscious as well as productive minds. We also propose a computational approach to recover struck-out text to reduce human effort. The proposed technique has a preprocessing stage, which removes noise using a median filter and segments the object region using fuzzy c-means clustering. Connected component analysis then finds the major portions of non-text, and window examination eliminates partially attached text. The struck-out text is extracted by eliminating straight lines, measuring the degree of continuity, and applying some morphological operations.
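
The preprocessing and component-analysis stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 3x3 window, the two-cluster setup, the dark-ink-on-light-paper assumption, the 0.5 membership cut-off, and the names `median_filter`, `fuzzy_cmeans_1d`, `connected_components`, `large_components`, and `min_size` are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import median


def median_filter(img):
    """3x3 median filter on a grayscale image (list of rows); borders are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def fuzzy_cmeans_1d(values, m=2.0, iters=50):
    """Two-cluster fuzzy c-means on gray levels; returns (centers, memberships)."""
    centers = [float(min(values)), float(max(values))]  # assumed initialisation
    u = []
    for _ in range(iters):
        u = []
        for v in values:
            # Small epsilon avoids division by zero when a value equals a center.
            d = [abs(v - c) + 1e-9 for c in centers]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([i / total for i in inv])
        centers = [
            sum((uk[k] ** m) * v for uk, v in zip(u, values))
            / sum(uk[k] ** m for uk in u)
            for k in range(2)
        ]
    return centers, u


def connected_components(mask):
    """8-connected components of a boolean mask; returns a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def large_components(img, min_size):
    """Denoise, segment ink with fuzzy c-means, and keep components of at least min_size pixels."""
    smooth = median_filter(img)
    w = len(img[0])
    flat = [v for row in smooth for v in row]
    _, u = fuzzy_cmeans_1d(flat)
    # Cluster 0 starts at the darkest gray level, so it is treated as ink (assumption).
    mask = [[u[y * w + x][0] > 0.5 for x in range(w)] for y in range(len(img))]
    return [c for c in connected_components(mask) if len(c) >= min_size]
```

In this sketch, large components stand in for candidate non-text regions; the window examination and the struck-out text recovery described in the abstract are not covered.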


Citations
Journal Article
TL;DR: In this article, the authors used various models mainly based on handcrafted features with support vector machine and features auto-derived by the convolutional network for writer identification/verification from highly intra-variable offline Bengali writing.
Abstract: The handwriting of a person may vary substantially with factors, such as mood, time, space, writing speed, writing medium/tool, writing a topic, and so on. It becomes challenging to perform automated writer verification/identification on a particular set of handwritten patterns (e.g., speedy handwriting) of an individual, especially when the system is trained using a different set of writing patterns (e.g., normal speed) of that same person. However, it would be interesting to experimentally analyze if there exists any implicit characteristic of individuality which is insensitive to high intra-variable handwriting. In this paper, we study some handcrafted features and auto-derived features extracted from intra-variable writing. Here, we work on writer identification/verification from highly intra-variable offline Bengali writing. To this end, we use various models mainly based on handcrafted features with support vector machine and features auto-derived by the convolutional network. For experimentation, we have generated two handwritten databases from two different sets of 100 writers and enlarged the dataset by a data-augmentation technique. We have obtained some interesting results.

32 citations

Proceedings Article
01 Aug 2018
TL;DR: A framework for annotating handwritten word images at large scale with ease and speed is proposed, and a new handwritten word dataset for Telugu, collected and annotated using the proposed framework, is released.
Abstract: Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties in the scripts, the cursive nature of the handwriting, and the similar shapes of the characters. The lack of publicly available handwriting datasets in Indic scripts has affected the development of handwritten word recognizers and made direct comparisons across different methods an impossible task in the field. In this paper, we propose a framework for annotating handwritten word images at large scale with ease and speed. We also release a new handwritten word dataset for Telugu, which is collected and annotated using the proposed framework. We also benchmark major Indic scripts such as Devanagari, Bangla and Telugu for the tasks of word spotting and handwriting recognition using state-of-the-art deep neural architectures. Finally, we evaluate the proposed pipeline on RoyDB, a public dataset, and achieve a significant reduction in error rates.

19 citations

Proceedings Article
01 Sep 2014
TL;DR: A graph-based model represents a textual connected component as a graph, in which the shortest path that is nearly as long as the width of the text component and maintains a reasonable degree of straightness is found.
Abstract: A handwritten document may contain strike-through texts. If such texts are fed into an OCR system, the output will be garbage. In this paper, we propose a scheme to detect such strike-through texts/words. Using a graph-based model, we represent a textual connected component as a graph. The start/end and intersection points of the ink-strokes of a component are marked as graph nodes. There exists an edge between two nodes if they are connected by object (ink) pixels. By eliminating parallel edges and self loops we obtain a simple, undirected, edge-weighted graph of the text-component. The edge-weight is found by adding horizontal/vertical moves weighted by 1 and diagonal moves weighted by √2. In this graph, we find the shortest path which is nearly as long as the width of the text component and maintains a reasonable degree of straightness. This path, if it exists, is identified as the strike-through line. Here we deal with handwritten documents in English, Bengali and Devanagari script. Our approach delivers fairly good results.
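
The weighting scheme in this abstract (1 for horizontal/vertical moves, √2 for diagonal moves) can be sketched in Python as follows. This is a simplification under stated assumptions, not the paper's method: it runs Dijkstra's algorithm directly over ink pixels rather than over a graph of stroke endpoints and intersections, and it approximates "straightness" by comparing the path length with the component width. The names `shortest_ink_path`, `looks_struck_through`, and the `tolerance` value of 1.15 are hypothetical.

```python
import heapq
from math import inf, sqrt


def shortest_ink_path(mask):
    """Return (length, width): the shortest 8-connected ink path from the
    leftmost ink column to the rightmost one, and the component width.
    Length is inf when no such path exists."""
    h, w = len(mask), len(mask[0])
    cols = [x for x in range(w) if any(mask[y][x] for y in range(h))]
    if not cols:
        return inf, 0
    left, right = cols[0], cols[-1]
    # Every ink pixel in the leftmost column is a zero-cost source.
    heap = [(0.0, y, left) for y in range(h) if mask[y][left]]
    dist = {(y, x): 0.0 for _, y, x in heap}
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist.get((y, x), inf):
            continue  # stale queue entry
        if x == right:
            return d, right - left
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    # Diagonal moves cost sqrt(2); horizontal/vertical moves cost 1.
                    nd = d + (sqrt(2) if dy and dx else 1.0)
                    if nd < dist.get((ny, nx), inf):
                        dist[(ny, nx)] = nd
                        heapq.heappush(heap, (nd, ny, nx))
    return inf, right - left


def looks_struck_through(mask, tolerance=1.15):
    """Flag a component when its shortest left-to-right ink path is nearly as
    short as the component is wide (tolerance is an assumed threshold)."""
    length, width = shortest_ink_path(mask)
    return width > 0 and length <= tolerance * width
```

A horizontal stroke spanning the component yields a path length equal to the width and is flagged, while a path made only of diagonal moves is about 1.41 times the width and is not.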

13 citations


Cites methods from "Extraction of Doodles and Drawings ..."

  • ...For doodle and drawing separation, Adak and Chaudhuri [7] proposed a model consisting of 5×5 window examination, with degree of continuity and connected component analysis....


Proceedings Article
01 Nov 2017
TL;DR: This study collects contemporary Bengali handwritings, on which subjective legibility and aesthetic scores are provided by human readers, and formulates both legibility and aesthetic analysis tasks as machine learning problems supervised by the human cognitive system.
Abstract: This paper deals with computer-based cognitive analysis towards legibility and aesthetics of a handwritten document. The legible text creates a human perception that the writing can be read effortlessly because of its orthographic clarity. The aesthetic property relates to the beautiful appearance of a handwritten document. In this study, we deal with these properties on offline Bengali handwriting. We formulate both legibility and aesthetic analysis tasks as machine learning problems supervised by the human cognitive system. We employ automatically derived feature-based recurrent neural networks to investigate writing legibility. For aesthetics evaluation, we employ hand-crafted feature-based support vector machines (SVMs). We have collected contemporary Bengali handwritings, on which the subjective legibility and aesthetic scores are provided by human readers. On this corpus containing legibility and aesthetic ground-truth information, we executed our experiments. The experimental results obtained on various handwritings are encouraging.

6 citations


Cites methods from "Extraction of Doodles and Drawings ..."

  • ...The text region is separated from the doodle/drawing-like non-text components, if any, by employing the technique in [21]....


Proceedings Article
05 Dec 2018
TL;DR: The approach is to find idiosyncratic handwritten text components and model the idiosyncrasy analysis task as a machine learning problem supervised by human cognition and employ the Inception network for this purpose.
Abstract: In this paper, we study handwriting idiosyncrasy in terms of its structural eccentricity. In this study, our approach is to find idiosyncratic handwritten text components and model the idiosyncrasy analysis task as a machine learning problem supervised by human cognition. We employ the Inception network for this purpose. The experiments are performed on two publicly available databases and an in-house database of Bengali offline handwritten samples. On these samples, subjective opinion scores of handwriting idiosyncrasy are collected from handwriting experts. We have analyzed the handwriting idiosyncrasy on this corpus which comprises the perceptive ground-truth opinion. We also investigate the effect of idiosyncratic text on writer identification by using the SqueezeNet. The performance of our system is promising.

5 citations


Cites methods from "Extraction of Doodles and Drawings ..."

  • ...Non-textual components such as drawings/doodles are removed using the technique of [13]....


References
Journal Article
TL;DR: The development and implementation of an algorithm for automated text string separation that is relatively independent of changes in text font style and size and of string orientation are described and showed superior performance compared to other techniques.
Abstract: The development and implementation of an algorithm for automated text string separation that is relatively independent of changes in text font style and size and of string orientation are described. It is intended for use in an automated system for document analysis. The principal parts of the algorithm are the generation of connected components and the application of the Hough transform in order to group components into logical character strings that can then be separated from the graphics. The algorithm outputs two images, one containing text strings and the other graphics. These images can then be processed by suitable character recognition and graphics recognition systems. The performance of the algorithm, both in terms of its effectiveness and computational efficiency, was evaluated using several test images and showed superior performance compared to other techniques.

664 citations


"Extraction of Doodles and Drawings ..." refers methods in this paper

  • ...The existing methods [2-10] deal with different logos, diagrams, maps, engineering drawings and photographic images....


Journal Article
TL;DR: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.
Abstract: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.

544 citations


"Extraction of Doodles and Drawings ..." refers background in this paper

  • ...In the field of document image analysis [1], the separation of texts and nontexts has gained interest since 1980....


Journal Article
TL;DR: An algorithm has been developed to locate and separate text strings of various font sizes, styles, and orientations by applying the Hough transform to the centroids of connected components in the image.
Abstract: A system for interpretation of images of paper-based line drawings is described. Since a typical drawing contains both text strings and graphics, an algorithm has been developed to locate and separate text strings of various font sizes, styles, and orientations. This is accomplished by applying the Hough transform to the centroids of connected components in the image. The graphics in the segmented image are processed to represent thin entities by their core-lines and thick objects by their boundaries. The core-lines and boundaries are segmented into straight line segments and curved lines. The line segments and their interconnections are analyzed to locate minimum redundancy loops which are adequate to generate a succinct description of the graphics. Such a description includes the location and attributes of simple polygonal shapes, circles, and interconnecting lines, and a description of the spatial relationships and occlusions among them. Hatching and filling patterns are also identified. The performance of the system is evaluated using several test images, and the results are presented. The superiority of these algorithms in generating meaningful interpretations of graphics, compared to conventional data compression schemes, is clear from these results.
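
The idea of applying the Hough transform to component centroids can be sketched in Python as below. This is only a rough illustration of voting in (θ, ρ) space, not the published algorithm, which also reasons about character size and spacing. The function name `hough_collinear_groups` and the parameters `n_theta`, `rho_step`, and `min_votes` are hypothetical.

```python
from collections import defaultdict
from math import cos, pi, sin


def hough_collinear_groups(centroids, n_theta=180, rho_step=2.0, min_votes=3):
    """Vote each centroid (x, y) into quantised (theta, rho) bins and return
    the distinct groups of centroid indices that share a well-populated bin,
    largest group first."""
    acc = defaultdict(list)
    for idx, (x, y) in enumerate(centroids):
        for t in range(n_theta):
            theta = pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = x * cos(theta) + y * sin(theta)
            acc[(t, round(rho / rho_step))].append(idx)
    groups = {tuple(members) for members in acc.values() if len(members) >= min_votes}
    return sorted(groups, key=len, reverse=True)
```

Centroids of characters in the same text string lie roughly on one line, so they fall into a common bin; isolated graphics components generally do not. For example, four centroids at `y = 10` plus one far-off centroid produce a four-member group that excludes the outlier.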

232 citations


"Extraction of Doodles and Drawings ..." refers methods in this paper

  • ...The existing methods [2-10] deal with different logos, diagrams, maps, engineering drawings and photographic images....


Journal Article
Zhaoyang Lu
TL;DR: An algorithm for text/graphics separation is presented that can be used to extract both Chinese and Western characters, dimensions, and symbols and has few limitations on the kind of engineering drawings and noise level.
Abstract: An algorithm for text/graphics separation is presented in this paper. The basic principle of the algorithm is to erase nontext regions from mixed text and graphics engineering drawings, rather than extract text regions directly. This algorithm can be used to extract both Chinese and Western characters, dimensions, and symbols and has few limitations on the kind of engineering drawings and noise level. It is robust to text-graphics touching, text fonts, and written orientations.

74 citations


"Extraction of Doodles and Drawings ..." refers methods in this paper

  • ...The existing methods [2-10] deal with different logos, diagrams, maps, engineering drawings and photographic images....
