
Showing papers on "Document processing published in 2012"


Journal ArticleDOI
Li Deng1
TL;DR: “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research.
Abstract: In this issue, “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research.

1,626 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work presents a framework that exploits both bottom-up and top-down cues in the problem of recognizing text extracted from street images, and shows significant improvements in accuracies on two challenging public datasets, namely Street View Text and ICDAR 2003.
Abstract: Scene text recognition has gained significant attention from the computer vision community in recent years. Recognizing such text is a challenging problem, even more so than the recognition of scanned documents. In this work, we focus on the problem of recognizing text extracted from street images. We present a framework that exploits both bottom-up and top-down cues. The bottom-up cues are derived from individual character detections from the image. We build a Conditional Random Field model on these detections to jointly model the strength of the detections and the interactions between them. We impose top-down cues obtained from a lexicon-based prior, i.e. language statistics, on the model. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. We show significant improvements in accuracies on two challenging public datasets, namely Street View Text (over 15%) and ICDAR 2003 (nearly 10%).

349 citations
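The energy minimization described in this abstract can be illustrated with a toy sketch (this is not the authors' CRF implementation; the costs, lexicon, and equal-length matching below are invented for illustration): per-position character detection costs act as unary terms, lexicon bigram costs as pairwise terms, and the lexicon word with the lowest total energy wins.

```python
# Toy sketch: unary costs from per-position character detections
# (bottom-up cues), pairwise costs from lexicon bigram statistics
# (top-down cues); the lexicon word minimizing total energy is chosen.

def word_energy(word, unary_cost, bigram_cost):
    e = sum(unary_cost[i].get(c, 5.0) for i, c in enumerate(word))
    e += sum(bigram_cost.get(pair, 2.0) for pair in zip(word, word[1:]))
    return e

def recognize(unary_cost, lexicon, bigram_cost):
    candidates = [w for w in lexicon if len(w) == len(unary_cost)]
    return min(candidates, key=lambda w: word_energy(w, unary_cost, bigram_cost))

# The 'c'/'e' detection is ambiguous; the lexicon prior resolves it.
unary = [{"c": 0.2, "e": 0.3}, {"a": 0.1}, {"t": 0.1, "r": 0.5}]
bigrams = {("c", "a"): 0.1, ("a", "t"): 0.1, ("e", "a"): 0.4, ("a", "r"): 0.3}
best = recognize(unary, ["cat", "eat", "car"], bigrams)  # -> "cat"
```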


Patent
27 Aug 2012
TL;DR: In this article, a document processing system for accurately and efficiently analyzing documents, and methods for making and using the same, are presented. Each incoming document includes at least one section of textual content and is provided in an electronic form or as a paper-based document that is converted into an electronic form.
Abstract: A document processing system for accurately and efficiently analyzing documents and methods for making and using same. Each incoming document includes at least one section of textual content and is provided in an electronic form or as a paper-based document that is converted into an electronic form. Since many categories of documents, such as legal and accounting documents, often include one or more common text sections with similar textual content, the document processing system compares the documents to identify and classify the common text sections. The document comparison can be further enhanced by dividing the document into document segments and comparing the document segments; whereas, the conversion of paper-based documents likewise can be improved by comparing the resultant electronic document with a library of standard phrases, sentences, and paragraphs. The document processing system thereby enables an image of the document to be manipulated, as desired, to facilitate its review.

103 citations


Proceedings ArticleDOI
18 Sep 2012
TL;DR: The comprehensive Arabic offline Handwritten Text database (KHATT) is reported after completion of the collection of 1000 handwritten forms written by 1000 writers from different countries. It is composed of an image database containing images of the written text at 200, 300, and 600 dpi resolutions, and a manually verified ground truth database that contains meta-data describing the written text at the page, paragraph, and line levels.
Abstract: In this paper, we report our comprehensive Arabic offline Handwritten Text database (KHATT) after completing the collection of 1000 handwritten forms written by 1000 writers from different countries. It is composed of an image database containing images of the written text at 200, 300, and 600 dpi resolutions, and a manually verified ground truth database that contains meta-data describing the written text at the page, paragraph, and line levels. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph and line levels. Tools to extract paragraphs from pages and to segment paragraphs into lines are developed. Preliminary experiments on Arabic handwritten text recognition are conducted using sample data from the database and the results are reported. The database will be made freely available to researchers worldwide for research on various handwriting-related problems such as text recognition, writer identification and verification, etc.

88 citations


Proceedings ArticleDOI
11 Mar 2012
TL;DR: It is found that Shreddr can significantly decrease the effort and cost of data entry, while maintaining a high level of quality, within this case study.
Abstract: For low-resource organizations working in developing regions, infrastructure and capacity for data collection have not kept pace with the increasing demand for accurate and timely data. Despite continued emphasis and investment, many data collection efforts still suffer from delays, inefficiency and difficulties maintaining quality. Data is often still "stuck" on paper forms, making it unavailable for decision-makers and operational staff. We apply techniques from computer vision, database systems and machine learning, and leverage new infrastructure -- online workers and mobile connectivity -- to redesign data entry with high data quality. Shreddr delivers self-serve, low-cost and on-demand data entry service allowing low-resource organizations to quickly transform stacks of paper into structured electronic records through a novel combination of optimizations: batch processing and compression techniques from database systems, automatic document processing using computer vision, and value verification through crowd-sourcing. In this paper, we describe Shreddr's design and implementation, and measure system performance with a large-scale evaluation in Mali, where Shreddr was used to enter over a million values from 36,819 pages. Within this case study, we found that Shreddr can significantly decrease the effort and cost of data entry, while maintaining a high level of quality.

55 citations


Journal ArticleDOI
TL;DR: The paper presents a survey of applications of OCR in different fields and further presents experimentation with three important applications: Captcha, Institutional Repository and Optical Music Character Recognition.
Abstract: Optical Character Recognition (OCR) is the electronic translation of images of handwritten, typewritten or printed text into machine-editable text. It is widely used to recognize and search text in electronic documents or to publish text on a website. The paper presents a survey of applications of OCR in different fields and further presents experimentation with three important applications: Captcha, Institutional Repository and Optical Music Character Recognition. We make use of an enhanced image segmentation algorithm based on histogram equalization using genetic algorithms for optical character recognition. The paper will serve as a good literature survey for researchers starting to work in the field of optical character recognition.

52 citations


Proceedings ArticleDOI
27 Mar 2012
TL;DR: This paper presents a review of various state-of-the-art techniques proposed towards different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames.
Abstract: Extraction and recognition of text present in video has become a very popular research area in the last decade. Generally, text present in video frames is of different size, orientation, style, etc. with complex backgrounds, noise, low resolution and contrast. These factors make the automatic text extraction and recognition in video frames a challenging task. A large number of techniques have been proposed by various researchers in the recent past to address the problem. This paper presents a review of various state-of-the-art techniques proposed towards different stages (e.g. detection, localization, extraction, etc.) of text information processing in video frames. Looking at the growing popularity and the recent developments in the processing of text in video frames, this review imparts details of current trends and potential directions for further research activities to assist researchers.

49 citations


Patent
06 Jul 2012
TL;DR: In this paper, the authors present a financial document processing system that receives an image of a financial document, such as a check, and identifies at least a transaction amount from it, then determines account information associated with the document.
Abstract: Embodiments of the invention relate to systems, methods, and computer program products for providing a financial document processing system. The system receives an image of a financial document, such as a check, and identifies at least a transaction amount from the financial document. Then the system determines account information associated with the financial document. The system may identify a routing and account number on the financial document or identify the document based on a name on the document. Once the transaction amount and account information are known, the system determines a prospective balance for a financial account based on the account information and the transaction amount. If there are funds at least equal to the transaction amount in the financial account, the system validates the transaction. In some embodiments, the system also immediately updates the balance of the account and/or offers to complete the transaction.

48 citations


Journal ArticleDOI
TL;DR: Preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.

46 citations


Patent
02 Feb 2012
TL;DR: In this article, a set of word embedding transforms are applied to transform text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document.
Abstract: A set of word embedding transforms are applied to transform text words of a set of documents into K-dimensional word vectors in order to generate sets or sequences of word vectors representing the documents of the set of documents. A probabilistic topic model is learned using the sets or sequences of word vectors representing the documents of the set of documents. The set of word embedding transforms are applied to transform text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document. The learned probabilistic topic model is applied to assign probabilities for topics of the probabilistic topic model to the set or sequence of word vectors representing the input document. A document processing operation such as annotation, classification, or similar document retrieval may be performed using the assigned topic probabilities.

46 citations
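The pipeline in this patent abstract can be sketched as follows (a toy stand-in: the embedding table and topic centroids are invented values, and a softmax over negative distances to centroids replaces the learned probabilistic topic model):

```python
import math

# Toy stand-in for the patent's pipeline: an embedding table maps words to
# K-dimensional vectors; topic probabilities for a document come from a
# softmax over negative mean squared distance to each topic centroid.
# Table and centroids are invented values, not learned parameters.

def embed(words, table):
    return [table[w] for w in words if w in table]

def topic_probs(vectors, centroids):
    def dist2(v, c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    scores = [-sum(dist2(v, c) for v in vectors) / len(vectors)
              for c in centroids]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

table = {"stock": (1.0, 0.0), "market": (0.9, 0.1), "goal": (0.0, 1.0)}
centroids = [(1.0, 0.0), (0.0, 1.0)]  # e.g. "finance" and "sports" topics
probs = topic_probs(embed(["stock", "market"], table), centroids)
```

The assigned probabilities could then drive annotation, classification, or similar-document retrieval as the abstract describes.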


Proceedings ArticleDOI
27 Mar 2012
TL;DR: A novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images by exploiting the additional context present in the character n-gram images, which enables better disambiguation between confusing characters in the recognition phase.
Abstract: In this paper we present a novel recognition approach that results in a 15% decrease in word error rate on heavily degraded Indian language document images. OCRs have considerably good performance on good quality documents, but fail easily in presence of degradations. Also, classical OCR approaches perform poorly over complex scripts such as those for Indian languages. We address these issues by proposing to recognize character n-gram images, which are basically groupings of consecutive character/component segments. Our approach is unique, since we use the character n-grams as a primitive for recognition rather than for post processing. By exploiting the additional context present in the character n-gram images, we enable better disambiguation between confusing characters in the recognition phase. The labels obtained from recognizing the constituent n-grams are then fused to obtain a label for the word that emitted them. Our method is inherently robust to degradations such as cuts and merges which are common in digital libraries of scanned documents. We also present a reliable and scalable scheme for recognizing character n-gram images. Tests on English and Malayalam document images show considerable improvement in recognition in the case of heavily degraded documents.
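The idea of recognizing overlapping character n-grams and fusing their labels can be sketched like this (not the paper's implementation; the per-n-gram recognizer outputs are assumed given):

```python
from collections import Counter

# Sketch: every grouping of n consecutive segments forms an n-gram image;
# after each n-gram is recognized, per-position labels are fused by
# majority vote, so context from overlapping n-grams disambiguates.

def ngrams(segments, n):
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]

def fuse(ngram_labels, word_len):
    votes = [Counter() for _ in range(word_len)]
    for start, label in enumerate(ngram_labels):
        for offset, ch in enumerate(label):
            votes[start + offset][ch] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)

grams = ngrams(list("hello"), 2)          # 4 overlapping bigram groupings
word = fuse(["he", "el", "ll", "lo"], 5)  # assumed recognizer outputs
```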

Journal Article
TL;DR: A recognition model based on multiple Hidden Markov Models, together with a few novel feature extraction techniques for a single character to tackle its different writing formats, and a post-processing block at the final stage to enhance the recognition rate further, is proposed.
Abstract: The recognition rate for handwritten characters is still limited to around 90 percent due to the large variation of shape, scale and format in handwritten characters. A sophisticated handwritten character recognition system demands a better feature extraction technique that takes care of such variation in handwriting. In this paper, we propose a recognition model based on multiple Hidden Markov Models (HMMs), together with a few novel feature extraction techniques for a single character to tackle its different writing formats. We also propose a post-processing block at the final stage to enhance the recognition rate further. We have created a database of 13000 samples collected from 100 writers, who wrote each character five times. 2600 samples have been used to train the HMMs and the rest are used to test the recognition model. Using our proposed recognition system we have achieved a good average recognition rate of 98.26 percent.

Journal ArticleDOI
01 May 2012
TL;DR: The goal of this paper is to design a system that enables the learning and estimation of human perception of document IQ, a metric that can be used to compare existing document enhancement methods and guide automated document enhancement.
Abstract: Large degradations in document images impede their readability and deteriorate the performance of automated document processing systems. Document image quality (IQ) metrics have been defined through optical character recognition (OCR) accuracy. Such metrics, however, do not always correlate with human perception of IQ. When enhancing document images with the goal of improving readability, e.g., in historical documents where OCR performance is low and/or where it is necessary to preserve the original context, it is important to understand human perception of quality. The goal of this paper is to design a system that enables the learning and estimation of human perception of document IQ. Such a metric can be used to compare existing document enhancement methods and guide automated document enhancement. Moreover, the proposed methodology is designed as a general framework that can be applied in a wide range of applications.

Proceedings ArticleDOI
27 Mar 2012
TL;DR: A novel method to recognize scene text without the conventional character segmentation step is proposed: a robust recognition model relying on a neural classification approach is applied to every multi-scale window in order to recognize valid characters and identify invalid ones.
Abstract: Understanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate a significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method to recognize scene texts avoiding the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, relying on a neural classification approach, to every window in order to recognize valid characters and identify non valid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors due to recognition confusions. The designed method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches.

Journal ArticleDOI
TL;DR: A digital library system for managing heterogeneous music collections that offers a full-fledged, widely automated document processing chain: digitization, indexing, annotation, access, and presentation, implemented as a generic and modular music repository based on a service-oriented software architecture.
Abstract: In this paper, we present a digital library system for managing heterogeneous music collections. The heterogeneity refers to various document types and formats as well as to different modalities, e. g., CD-audio recordings, scanned sheet music, and lyrics. The system offers a full-fledged, widely automated document processing chain: digitization, indexing, annotation, access, and presentation. Our system is implemented as a generic and modular music repository based on a service-oriented software architecture. As a particular strength of our approach, the various documents representing aspects of a piece of music are jointly considered in all stages of the document processing chain. Our user interfaces allow for a multimodal and synchronized presentation of documents (WYSIWYH: what you see is what you hear), a score- or lyrics-based navigation in audio, as well as a cross- and multimodal retrieval. Hence, our music repository may be called a truly cross-modal library system. In our paper, we describe the system components, outline the techniques of the document processing chain, and illustrate the implemented functionalities for user interaction. We describe how the system is put into practice at the Bavarian State Library (BSB) Munich as a part of the German PROBADO Digital Library Initiative (PDLI).

Proceedings ArticleDOI
18 Sep 2012
TL;DR: This paper deals with recognition of online handwritten Bangla (Bengali) text with segmentation of text into strokes, and discovered some rules analyzing different joining patterns of Bangla characters.
Abstract: This paper deals with recognition of online handwritten Bangla (Bengali) text. Here, at first, we segment cursive words into strokes. A stroke may represent a character or a part of a character. We selected a set of Bangla words written by different groups of people such that they contain all basic characters, all vowel and consonant modifiers and almost all types of possible joining among them. For segmentation of text into strokes, we discovered some rules analyzing different joining patterns of Bangla characters. Combination of online and offline information was used for segmentation. We achieved correct segmentation rate of 97.89% on the dataset. We manually analyzed different strokes to create a ground truth set of distinct stroke classes for result verification and we obtained 85 stroke classes. Directional features were used in SVM for recognition and we achieved correct stroke recognition rate of 97.68%.

Proceedings ArticleDOI
10 Jun 2012
TL;DR: A Neural Network, specifically a backpropagation network, is used to generalize the relationship between the title and the content of articles in the archive using word features other than TF-IDF, such as the position of a word in the sentence, paragraph, or entire document, and formats such as headings and other attributes defined beforehand.
Abstract: Keyword extraction is vital for Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries, as well as for general browsing of the web. Keywords are often the basis of document processing methods such as clustering and retrieval, since processing all the words in a document can be slow. Common models for automating keyword extraction usually rely on statistics-based methods such as Bayesian classifiers, K-Nearest Neighbors, and Expectation-Maximization. These models are limited in the word-related features they can use, since adding more features makes them more complex and difficult to comprehend. In this research, a Neural Network, specifically a backpropagation network, is used to generalize the relationship between the title and the content of articles in the archive using word features other than TF-IDF, such as the position of a word in the sentence, paragraph, or entire document, and formats such as headings and other attributes defined beforehand. To explain how the backpropagation network works, a rule extraction method is used to extract symbolic data from the resulting network. The extracted rules can then be transformed into decision trees that perform almost as accurately as the network, with the added benefit of being in an easily comprehensible format.
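The kind of positional and formatting word features the abstract describes might look like this (the feature names and the document representation below are invented for illustration):

```python
# Invented illustration of word features beyond TF-IDF: term frequency,
# position of the word's first occurrence, and formatting attributes.
# The 'doc' structure and feature names are assumptions, not the paper's.

def word_features(word, doc):
    words = doc["words"]
    return {
        "tf": words.count(word) / len(words),
        "doc_position": words.index(word) / len(words),
        "in_heading": 1.0 if word in doc["headings"] else 0.0,
        "in_first_sentence": 1.0 if word in doc["sentences"][0] else 0.0,
    }

doc = {
    "words": ["neural", "networks", "learn", "neural", "features"],
    "headings": {"neural"},
    "sentences": [["neural", "networks", "learn"], ["neural", "features"]],
}
feats = word_features("neural", doc)
```

Vectors like these would be the inputs to the backpropagation network; the rule extraction step then operates on the trained network, not on these features directly.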

Proceedings ArticleDOI
18 Sep 2012
TL;DR: A new approach is proposed which strives towards identifying and separating handwritten from machine printed text using the Bag of Visual Words paradigm (BoVW), using a consistent evaluation methodology which couples meaningful measures along with a new dataset.
Abstract: In a number of types of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may be present in the same document image, giving rise to significant issues within a digitisation and recognition pipeline. It is therefore necessary to separate the two types of text before applying different recognition methodologies to each. In this paper, a new approach is proposed which strives towards identifying and separating handwritten from machine printed text using the Bag of Visual Words paradigm (BoVW). Initially, blocks of interest are detected in the document image. For each block, a descriptor is calculated based on the BoVW. The final characterization of the blocks as Handwritten, Machine Printed or Noise is made by a Support Vector Machine classifier. The promising performance of the proposed approach is shown by using a consistent evaluation methodology which couples meaningful measures along with a new dataset.
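The BoVW block descriptor can be sketched in a few lines (a toy vocabulary of two visual words stands in for one learned from SIFT-like local descriptors; the SVM classification stage is omitted):

```python
# Toy BoVW: quantize each local descriptor to its nearest visual word
# and build a normalized histogram; this histogram is the block
# descriptor that the SVM would classify (SVM stage omitted here).

def nearest_word(desc, vocabulary):
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: d2(desc, vocabulary[i]))

def bovw_histogram(descriptors, vocabulary):
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

vocab = [(0.0, 0.0), (1.0, 1.0)]  # two toy visual words
descs = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.9), (0.2, 0.1)]
hist = bovw_histogram(descs, vocab)  # -> [0.5, 0.5]
```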

Patent
Shenghua Bao1, Jie Cui1, Hui Su1, Zhong Su1, Li Zhang1 
10 Sep 2012
TL;DR: In this article, a method and system for expanding a document set as a search data source in the field of business-related search is presented. The method includes identifying one or more entity words of a seed document, identifying one or more topic words related to each entity word, forming an entity word-topic word pair from each identified topic word and the entity word on the basis of which it was identified, and obtaining expanded documents by web searching with each pair as keywords.
Abstract: A method and system for expanding a document set as a search data source in the field of business-related search. The present invention provides a method of expanding a seed document in a seed document set. The method includes identifying one or more entity words of the seed document; identifying one or more topic words related to each entity word in the seed document where the entity word is located; forming an entity word-topic word pair from each identified topic word and the entity word on the basis of which that topic word was identified; and obtaining one or more expanded documents by taking the entity word and topic word in each entity word-topic word pair together as key words for web searching. A system for executing the above method is also provided.

Proceedings ArticleDOI
26 Nov 2012
TL;DR: This paper presents and compares techniques that have been used to recognize the Arabic handwriting scripts in online recognition systems and attempts to recognize Arabic handwritten words, characters, digits or strokes.
Abstract: Online recognition of Arabic handwritten text has been an ongoing research problem for many years. Generally, the online text recognition field has been gaining more interest lately due to the increasing popularity of hand-held computers, digital notebooks and advanced cellular phones. Different techniques have been used to build several online handwritten recognition systems for Arabic text, such as Neural Networks, Hidden Markov Models, Template Matching and others. Most research on online text recognition divides the recognition system into three main phases: preprocessing, feature extraction and recognition, the last of which is considered the most important phase and the heart of the whole system. This paper presents and compares techniques that have been used to recognize Arabic handwriting in online recognition systems. These techniques attempt to recognize Arabic handwritten words, characters, digits or strokes. The structure and strategy of the reviewed techniques are explained in this article, and the strengths and weaknesses of using them are also discussed.

Journal ArticleDOI
TL;DR: The method trains a simple three-layer neural network using the backpropagation algorithm to convert handwritten Malayalam text into machine-readable, editable Unicode format.
Abstract: Handwritten character recognition is conversion of handwritten text to machine readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in clockwise direction and consists of loops and curves. The method aims at training a simple neural network with three layers using backpropagation algorithm. Freeman codes are used to represent each character as feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in the Unicode format.
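The Freeman chain code used here as the feature representation can be sketched as follows (eight directions, with 0 = east and numbering counter-clockwise; the point sequence is a toy pen trace):

```python
# Sketch of the Freeman chain code: each unit move between consecutive
# pen points maps to one of eight direction codes (0 = east, numbered
# counter-clockwise), giving a compact feature vector per character.

DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def freeman_code(points):
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

code = freeman_code([(0, 0), (1, 0), (1, 1), (0, 1)])  # -> [0, 2, 4]
```

Vectors like `code` would then be fed to the three-layer network during training and testing.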

Journal ArticleDOI
TL;DR: The application of neural networks in recognizing characters from a printed script is explored; in contrast to traditional methods that generalize over the character set, a highly specific character set is trained for each type.
Abstract: With the recent advances in computing technology, many recognition tasks have become automated. Character Recognition maps a matrix of pixels into characters and words. Recently, artificial neural network theories have shown good capabilities in performing character recognition. In this paper, the application of neural networks in recognizing characters from a printed script is explored. In contrast to traditional methods that generalize over the character set, a highly specific character set is trained for each type. This can be termed targeted character recognition.

Proceedings ArticleDOI
18 Sep 2012
TL;DR: This research investigates a feature extraction method for handwritten Arabic word recognition that incorporates many characteristics of handwritten characters based on structural information (loops, stems, legs, diacritics).
Abstract: Due to the nature of handwriting, with its high degree of variability and imprecision, obtaining features that represent words is a difficult task. In this research, a feature extraction method for handwritten Arabic word recognition is investigated. Its major goal is to maximize the recognition rate with the least number of elements. This method incorporates many characteristics of handwritten characters based on structural information (loops, stems, legs, diacritics). Experiments are performed on Arabic personal names extracted from registers of the national Tunisian archive and on some Tunisian city names from the IFN-ENIT database. The obtained results are encouraging and open further perspectives in feature and classifier selection for Arabic handwritten word recognition.

Proceedings ArticleDOI
18 Sep 2012
TL;DR: A novel text recognition algorithm based on usage of fuzzy logic rules relying on statistical data of the analyzed font is suggested, enabling the recognition of distorted letters that may not be retrieved otherwise.
Abstract: Text recognition and retrieval is a well known problem. Automated optical character recognition (OCR) tools do not supply a complete solution and in most cases human inspection is required. In this paper the authors suggest a novel text recognition algorithm based on usage of fuzzy logic rules relying on statistical data of the analyzed font. The new approach combines letter statistics and correlation coefficients in a set of fuzzy based rules, enabling the recognition of distorted letters that may not be retrieved otherwise. The authors focused on Rashi fonts associated with commentaries of the Bible that are actually handwritten calligraphy.

Patent
24 Apr 2012
TL;DR: In this paper, a document processing method and system divides a document into pages and encrypts them with a first key to obtain a plurality of encrypted pages; picks a part of the words from the document pages and encrypts them with a second key to obtain a Significant Word Set (SWS); and picks a subset of those words and encrypts them with a third key to obtain a Most Relevant Word Set (MRWS).
Abstract: A document processing method and system divides a document into document pages and encrypts the document pages with a first key to obtain a plurality of encrypted pages; picks a part of the words from the document pages and encrypts them with a second key to obtain a Significant Word Set (SWS); and picks a subset of those words and encrypts them with a third key to obtain a Most Relevant Word Set (MRWS). The encrypted pages, the SWS and the MRWS are transmitted to a remote server for storage. When a user searches for a keyword in the document, the keyword is encrypted with the second and third keys to perform two queries. The first query result is decrypted to obtain the search result. The second query result is decrypted and then checked as to whether it is a subset of the first decrypted query result, in order to detect unfaithful execution.
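The unfaithful-execution check in this abstract can be sketched as follows (a toy stand-in: deterministic keyed SHA-256 hashing replaces the patent's encryption scheme, and the word sets are invented):

```python
import hashlib
import hmac

# Toy stand-in: keyed SHA-256 hashing replaces the patent's encryption.
# The server stores the SWS (second key) and MRWS (third key); a search
# issues both queries, and any MRWS hit must also be an SWS hit,
# otherwise unfaithful execution is detected.

def enc(word, key):
    return hmac.new(key, word.encode(), hashlib.sha256).hexdigest()

def build_index(words, most_relevant, key2, key3):
    sws = {enc(w, key2) for w in words}
    mrws = {enc(w, key3) for w in most_relevant}
    return sws, mrws

def faithful(sws_hit, mrws_hit):
    return sws_hit or not mrws_hit  # MRWS hit without SWS hit = cheating

key2, key3 = b"key-two", b"key-three"
sws, mrws = build_index(["tax", "law", "form"], ["tax"], key2, key3)
ok = faithful(enc("tax", key2) in sws, enc("tax", key3) in mrws)
cheat = faithful(enc("cat", key2) in sws, True)  # server fakes an MRWS hit
```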

Patent
Jean-Luc Meunier1, Hervé Déjean1
10 Oct 2012
TL;DR: In this paper, a method and system for document processing allow a service provider to process a document without having access to the textual content of the document, without decoding the encoded tokens.
Abstract: A method and system for document processing allow a service provider to process a document without having access to the textual content of the document. The system includes memory which receives an encoded source document from an associated client system. The encoded source document includes structural information and encoded content information. The encoded content information includes a plurality of encoded tokens generated by individually encoding each of a plurality of text tokens of the source document. The structural information includes location information for each of the plurality of text tokens. A processing module processes the encoded document to generate a modified document, without decoding the encoded tokens. A transmission module transmits the modified document to an associated client system, whereby the client system is able to generate a transformed document based on the modified document and the plurality of text tokens.

Proceedings ArticleDOI
03 Dec 2012
TL;DR: This paper presents a method for recognizing legends in images of ancient coins that accounts for the special challenging conditions ofAncient coins and thus does not rely on character segmentation contrary to traditional Optical Character Recognition methods designed for text written on paper.
Abstract: This paper presents a method for recognizing legends in images of ancient coins. It accounts for the special challenging conditions of ancient coins and thus does not rely on character segmentation contrary to traditional Optical Character Recognition (OCR) methods designed for text written on paper. Instead, characters are detected by means of individual character classifiers applied to a dense grid of local SIFT features. Final word recognition is accomplished using a lexicon of known legend words. For this purpose, the Pictorial Structures approach is adopted to find the most likely word occurrences based on the previously detected characters. Experiments are conducted on a set of 180 coin images from the Roman period with 35 different legend words. Depending on the lexicon size used, the achieved word detection rate varies from 29% to 53%.

Patent
28 Nov 2012
TL;DR: In this paper, the authors proposed a document processing method and system for a document sharing platform, which comprises the following steps: receiving documents uploaded by users; judging the formats of the uploaded documents; when a document is judged to be in plain-text format, storing it as plain text; and when a document is judged to be in a non-plain-text format, converting it into a predetermined format and storing it in that format.
Abstract: The invention provides a document processing method and system for a document sharing platform. The method comprises the following steps: receiving documents uploaded by users; judging the formats of the uploaded documents; when a document is judged to be in plain-text format, storing it in plain-text format; and when a document is judged to be in a non-plain-text format, converting it into a predetermined format and storing it in that format. By using this method, the advantages of plain-text documents, such as small transmission size and clear fonts, can be fully utilized and the reading experience of users can be improved.
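The format-dispatch step the patent describes might look like the following sketch. The extension whitelist, the conversion placeholder, and every name here are hypothetical; a real platform would sniff content rather than trust extensions and would render to whatever fixed reading format it mandates.

```python
import os

PLAIN_TEXT_EXTS = {".txt", ".md", ".csv"}  # assumed plain-text formats

def convert_to_predetermined(data: bytes) -> bytes:
    """Placeholder for the conversion step; a real system would render
    the document into the platform's predetermined reading format."""
    return b"CONVERTED:" + data

def handle_upload(filename, data, store):
    """Judge the format and store accordingly: plain text as-is,
    everything else converted first."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in PLAIN_TEXT_EXTS:
        store[filename] = ("text/plain", data)
    else:
        store[filename] = ("application/x-predetermined",
                           convert_to_predetermined(data))

store = {}
handle_upload("notes.txt", b"hello", store)
handle_upload("report.doc", b"\xd0\xcf", store)
```

Storing plain text untouched preserves the small transmission size the abstract highlights, while non-plain-text uploads pay the conversion cost once at upload time rather than on every read.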

Journal ArticleDOI
TL;DR: The Back propagation Neural Network is used for efficient recognition where the errors were corrected through back propagation and rectified neuron values were transmitted by feed-forward method in the neural network of multiple layers.
Abstract: Advancement in Artificial Intelligence has led to the development of various "smart" devices. One of the biggest challenges in the field of image processing is to recognize documents in both printed and handwritten format. Character recognition is one of the most widely used biometric traits for authentication of persons as well as documents. Optical Character Recognition (OCR) is a type of document image analysis in which a scanned digital image containing either machine-printed or handwritten script is input into an OCR software engine and translated into an editable, machine-readable digital text format. A neural network is designed to model the way in which the brain performs a particular task or function of interest. Each character image comprises 30×20 pixels. We have applied a feature extraction technique to compute the features: the directions of pixels with respect to their neighboring pixels. These features are given as input to a back-propagation neural network with a hidden layer and an output layer. We have used the back-propagation neural network for efficient recognition, where errors are corrected through back propagation and rectified neuron values are transmitted by the feed-forward method through the multiple layers of the network.
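The neighbor-direction feature mentioned above can be approximated by a simple sketch: for each foreground pixel of the 30×20 character image, count which of its 8 neighbors are also foreground, and use the normalized 8-bin histogram as (part of) the input vector to a back-propagation network. The exact feature definition in the paper may differ; this formulation is an assumption.

```python
# 8-neighborhood offsets, clockwise from top-left:
# index 1 = up, index 3 = right, index 5 = down, index 7 = left
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
        (1, 1), (1, 0), (1, -1), (0, -1)]

def direction_features(img):
    """img: binary image as a list of rows (e.g. 30 rows x 20 columns).
    Returns an 8-bin histogram over the directions in which each
    foreground pixel has a foreground neighbor, normalized to sum to 1."""
    h, w = len(img), len(img[0])
    hist = [0.0] * 8
    total = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for d, (dy, dx) in enumerate(DIRS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                    hist[d] += 1
                    total += 1
    return [v / total for v in hist] if total else hist

# A vertical stroke: all direction mass should fall on up/down neighbors.
img = [[1 if x == 10 else 0 for x in range(20)] for y in range(30)]
feats = direction_features(img)
```

For a vertical stroke the up and down bins each receive half the mass, so the histogram is a compact, position-invariant summary that a small fully connected network can learn from.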

Proceedings ArticleDOI
27 Mar 2012
TL;DR: Results show that the proposed skew estimation is comparable with state-of-the-art methods and outperforms them on a real dataset consisting of 658 snippets.
Abstract: Document analysis is performed to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document for further processing. A pre-processing step of document analysis methods is skew estimation of scanned or photographed documents. Current skew estimation methods require the existence of large text areas, depend on the text type, and can be limited to a specific angle range. The proposed method is gradient based, combined with a Focused Nearest Neighbor Clustering of interest points, and has no limitations regarding the detectable angle range. The upside/down decision is based on a statistical analysis of ascenders and descenders. It can be applied to entire documents as well as to document fragments containing only a few words. Results show that the proposed skew estimation is comparable with state-of-the-art methods and outperforms them on a real dataset consisting of 658 snippets.
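As an illustration of the skew-estimation idea (not the paper's gradient + Focused Nearest Neighbor Clustering pipeline), a least-squares line fit over interest-point coordinates recovers the dominant angle of a single text line; the point format and this single-line simplification are assumptions.

```python
import math

def estimate_skew(points):
    """Fit a line y = a*x + b to (x, y) interest points by least squares
    and return the skew angle in degrees. A much-simplified stand-in for
    clustering interest points into text lines and measuring their slope."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)  # covariance term
    den = sum((x - mx) ** 2 for x, _ in points)        # variance in x
    return math.degrees(math.atan2(num, den))

# Interest points along a line with slope 0.1 (about 5.71 degrees of skew):
pts = [(i, 0.1 * i + 3) for i in range(50)]
angle = estimate_skew(pts)
```

Deskewing then rotates the page by `-angle`. Unlike this toy fit, the paper's method also resolves the 180-degree ambiguity via ascender/descender statistics, which no slope estimate alone can decide.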