Journal Article

Hindi Text Document Classification System Using SVM and Fuzzy: A Survey

Abstract
In recent years, many information retrieval, character recognition, and feature extraction methodologies have been proposed for Devanagari, and especially for Hindi, across different domain areas. Given the enormous volume of scanned data now available, and to advance existing Hindi automated systems beyond optical character recognition, a new idea is introduced: a Hindi printed and handwritten document classification system using a support vector machine (SVM) and fuzzy logic. The system first pre-processes textual imaged documents and then classifies them into predefined categories. Building on this concept, the article presents a feasibility study of such systems and their relevance to Hindi, a survey report of statistical measurements of Hindi keywords obtained from different sources, and the inherent challenges found in printed and handwritten documents. Technical reviews are provided and graphically represented to compare many parameters and to estimate the contents, forms, and classifiers used in various existing techniques.
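
As a rough illustration of the document-level step described in the abstract (assigning a document to a predefined category from its Hindi keywords), the sketch below scores a list of already-recognized words against per-category keyword lists using approximate string matching. It is not the authors' method: the keyword lists, the `threshold` value, and the function names are hypothetical, and `difflib.SequenceMatcher` is used here only as a simple stand-in for fuzzy matching. The earlier stages (image pre-processing and SVM-based character recognition) are assumed to have run already and are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical keyword lists per category (illustrative only, not from the article).
CATEGORY_KEYWORDS = {
    "sports": ["क्रिकेट", "खिलाड़ी", "मैच"],
    "health": ["डॉक्टर", "अस्पताल", "दवा"],
}


def similarity(a, b):
    """Approximate string similarity in [0, 1]; tolerates small recognition errors."""
    return SequenceMatcher(None, a, b).ratio()


def memberships(words, category_keywords=CATEGORY_KEYWORDS, threshold=0.8):
    """Return a degree of membership in [0, 1] for each category.

    For every keyword, take its best similarity to any word in the document;
    matches below `threshold` (an assumed cut-off) contribute nothing.
    """
    scores = {}
    for category, keywords in category_keywords.items():
        total = 0.0
        for keyword in keywords:
            best = max((similarity(word, keyword) for word in words), default=0.0)
            if best >= threshold:
                total += best
        scores[category] = total / len(keywords)
    return scores


def classify(words, category_keywords=CATEGORY_KEYWORDS, threshold=0.8):
    """Pick the category with the highest membership, or None if nothing matches."""
    scores = memberships(words, category_keywords, threshold)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify(["क्रिकट", "आज"])` still selects the hypothetical `"sports"` category even though the first word is missing a vowel sign, which is the kind of tolerance to recognition errors that motivates approximate matching at the document level.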
