Segmenting printed text and handwritten annotation by Spectral Partitioning

doi:10.1109/NCVPRIPG.2015.7490033

Citations

Proceedings Article•DOI•

Associating Field Components in Heterogeneous Handwritten form Images using Graph Autoencoder

Srivastava Divya¹, Harit Gaurav¹•Institutions (1)

Indian Institute of Technology, Jodhpur¹

01 Sep 2019

TL;DR: This work uses a Graph Autoencoder to perform the intended field label to field value association in a given form image, which is the first attempt to perform label-value associations in a handwritten form image using a machine learning approach.

Abstract: We propose a graph-based deep network for predicting the associations between field labels and field values in heterogeneous handwritten form images. We consider forms in which the field label comprises printed text and the field value can be handwritten text. Inspired by the relationship-predicting capability of graphical models, we use a Graph Autoencoder to perform the intended field label to field value association in a given form image. To the best of our knowledge, it is the first attempt to perform label-value association in a handwritten form image using a machine learning approach. We have prepared our handwritten form image dataset comprising 300 images from 30 different templates, with 10 images per template. Our framework has been evaluated with different network parameters and has shown promising results.

1 citation

Cites methods from "Segmenting printed text and handwri..."

...The features used to distinguish between them have been adopted from [11] as: field component patch size, foreground density, average stroke width, horizontal and vertical density difference, maximum horizontal and vertical runlength and standard deviation of horizontal and vertical projection of a patch....
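Several of the patch features named in this excerpt can be computed directly from a binarized patch. The following is a minimal sketch, not the cited authors' implementation: the function names, the dictionary keys, and the convention that nonzero pixels are foreground are all assumptions made for illustration. Average stroke width and the horizontal and vertical density difference are left out because the excerpt does not define them.

```python
import numpy as np

def max_run_length(binary, axis):
    """Longest run of consecutive foreground pixels.

    axis=1 scans along rows (horizontal runs),
    axis=0 scans along columns (vertical runs).
    """
    lines = binary if axis == 1 else binary.T
    best = 0
    for line in lines:
        run = 0
        for px in line:
            run = run + 1 if px else 0
            best = max(best, run)
    return best

def patch_features(patch):
    """Compute simple shape features of a 2-D patch.

    Assumption: nonzero pixels are foreground (ink).
    """
    fg = np.asarray(patch) > 0
    h, w = fg.shape
    row_proj = fg.sum(axis=1)  # horizontal projection: ink count per row
    col_proj = fg.sum(axis=0)  # vertical projection: ink count per column
    return {
        "patch_size": h * w,
        "foreground_density": float(fg.mean()),
        "max_horizontal_run": max_run_length(fg, axis=1),
        "max_vertical_run": max_run_length(fg, axis=0),
        "std_horizontal_projection": float(row_proj.std()),
        "std_vertical_projection": float(col_proj.std()),
    }
```

Calling `patch_features` on each field component yields one feature vector per patch, which could then be passed to a classifier or clustering step.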

References

Journal Article•DOI•

A tutorial on spectral clustering

Ulrike von Luxburg¹•Institutions (1)

Max Planck Society¹

01 Dec 2007-Statistics and Computing

TL;DR: In this article, the authors present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches, and discuss the advantages and disadvantages of these algorithms.

Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

9,141 citations

Proceedings Article•

On Spectral Clustering: Analysis and an algorithm

Andrew Y. Ng¹, Michael I. Jordan¹, Yair Weiss²•Institutions (2)

University of California, Berkeley¹, Hebrew University of Jerusalem²

03 Jan 2001

TL;DR: A simple spectral clustering algorithm that can be implemented using a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm, and give conditions under which it can be expected to do well.

Abstract: Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.

9,043 citations

"Segmenting printed text and handwri..." refers methods in this paper

...The Spectral approach relies on the Eigen structure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity [12]....
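The idea in this excerpt, partitioning points via the eigenstructure of a similarity matrix, can be illustrated with a short NumPy sketch. It follows the normalized-affinity recipe commonly attributed to this reference (Gaussian affinity, symmetric normalization, leading eigenvectors, row normalization, k-means); it is not code from either paper. The function name `spectral_partition`, the `sigma` default, and the deterministic farthest-point k-means initialization are assumptions made for illustration, and the sketch assumes no point is so isolated that its affinity row sums to zero.

```python
import numpy as np

def spectral_partition(points, k=2, sigma=1.0, n_iter=50):
    """Cluster points using leading eigenvectors of a normalized affinity matrix."""
    X = np.asarray(points, dtype=float)

    # 1. Gaussian similarity matrix with a zero diagonal.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # 2. Symmetric normalization D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # 3. k leading eigenvectors (eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, -k:]

    # 4. Normalize each row to unit length.
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)

    # 5. Plain k-means on the rows of Y, with farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

For a two-way split such as printed versus handwritten components, `points` would hold one feature vector per component and `k=2`; the choice of `sigma` controls how quickly similarity decays with distance and usually needs tuning.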

Proceedings Article•DOI•

Separating handwritten material from machine printed text using hidden Markov models

J.K. Guo¹, M.Y. Ma²•Institutions (2)

Princeton University¹, Panasonic²

10 Sep 2001

TL;DR: An algorithm that is based on the theory of hidden Markov models (HMMs) to distinguish between machine-printed and handwritten materials is presented, which has been shown to be promising in the authors' experiments.

Abstract: In this paper, we address the problem of separating handwritten annotations from machine-printed text within a document. We present an algorithm that is based on the theory of hidden Markov models (HMMs) to distinguish between machine-printed and handwritten materials. No OCR results are required prior to or during the process, and the classification is performed at the word level. Handwritten annotations are not limited to marginal areas, as the approach can deal with document images having handwritten annotations overlaid on machine-printed text and it has been shown to be promising in our experiments. Experimental results show that the proposed method can achieve 72.19% recall for fully extracted handwritten words and 90.37% for partially extracted words. The precision of extracting handwritten words has reached 92.86%.

107 citations

"Segmenting printed text and handwri..." refers background in this paper

...Only a few systems are able to handle multi-oriented handwritten annotations [1] [3] [7] in a real environment....

Proceedings Article•DOI•

A system for machine-written and hand-written character distinction

K. Kuhnke¹, L. Simoncini¹, Zs.M. Kovács-V¹•Institutions (1)

University of Bologna¹

14 Aug 1995

TL;DR: In this work a classification system is presented which reads a raster image of a character and outputs two confidence values, one for the "machine-written" and one for the "hand-written" character class.

Abstract: In applications of character recognition where machine-printed and hand-written characters are involved, it is important to know if the character image, or the whole word, is machine- or hand-written. This is due to the accuracy difference between the algorithms and systems oriented to machine- or handwritten characters. Obviously, this type of knowledge leads to the increase of the overall system quality. In this work a classification system is presented which reads a raster image of a character and outputs two confidence values, one for "machine-written" and one for "hand-written" character classes, respectively. The proposed system features a preprocessing step, which transforms a general uncentered character image into a normalized form, then the feature extraction phase extracts relevant information from the image, and at the end, a standard classifier based on a feedforward neural network creates the final response. At the end, some results on a proprietary image database are reported.

65 citations

"Segmenting printed text and handwri..." refers background or methods in this paper

...Most of these systems extract annotations in a controlled scenario [4] [5] [6] [2]....
...It commenced with the contribution of Kuhnke et al. [4] for printed and hand-written character segmentation using directional and symmetrical features fed into a neural network....

Proceedings Article•DOI•

Automatic separation of machine-printed and hand-written text lines

Umapada Pal, Bidyut B. Chaudhuri

20 Sep 1999

TL;DR: This paper presents a classification scheme for both Bangla and Devnagari characters based on the structural and statistical features of the machine-printed and hand-written text lines and has an accuracy of about 98.3%.

Abstract: There are many types of documents where machine-printed and hand-written texts appear intermixed. Since the optical character recognition (OCR) methodologies for machine-printed and hand-written texts are different, it is necessary to separate these two types of text before feeding them to the respective OCR systems. In this paper, we present such a scheme for both Bangla and Devnagari characters. The scheme is based on the structural and statistical features of the machine-printed and hand-written text lines. The classification scheme has an accuracy of about 98.3%.

48 citations

"Segmenting printed text and handwri..." refers background or methods in this paper

...Most of these systems extract annotations in a controlled scenario [4] [5] [6] [2]....
...Pal and Chaudhuri [5] segmented handwritten and printed text lines of Bangla and Devnagari using a tree-based classification approach....
