Book Chapter

An Efficient Hindi Text Classification Model Using SVM

TLDR
A Hindi Text Classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents.
Abstract
Many digitized Hindi text documents are generated daily on government sites, news portals, and in the public and private sectors, and they need to be classified effectively into mutually exclusive, pre-defined categories. Many Hindi text-processing systems already exist for information retrieval, machine translation, text summarization, simplification, keyword extraction, and related parsing and linguistic tasks, but there is still wide scope for classifying the extracted text of Hindi documents into pre-defined categories using a classifier. This paper proposes a Hindi Text Classification model that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents. Text classification is challenging in Hindi because of its large set of conjuncts and letter combinations, its sentence structure, and its multisense words. The experiments were performed on a set of four Hindi documents from two categories, which the SVM classified with 100% accuracy.
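
The pipeline described in the abstract (preprocess, extract features, train an SVM, classify unknown documents) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the preprocessing rules or the features, so the sketch assumes sentence splitting at the danda, whitespace tokenization, a small hypothetical stop-word list, bag-of-words counts, and a linear SVM trained from scratch by subgradient descent on the hinge loss. The four training documents and the category names `sports` and `finance` are invented toy data.

```python
import re

# Hypothetical stop-word list; the abstract does not give the authors' list.
STOP_WORDS = {"ने", "में", "की", "का", "के", "को", "से", "पर", "और", "है"}

def preprocess(document):
    """Split a document into sentences, then words, dropping stop words."""
    sentences = [s for s in re.split(r"[।?!]", document) if s.strip()]
    return [w for s in sentences for w in s.split() if w not in STOP_WORDS]

def build_vocabulary(documents):
    """Map every word seen in the training documents to a feature index."""
    words = sorted({w for d in documents for w in preprocess(d)})
    return {w: i for i, w in enumerate(words)}

def vectorize(document, vocab):
    """Bag-of-words counts; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for w in preprocess(document):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularization shrinkage
            if margin < 1.0:  # sample violates the margin: step toward it
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

# Invented toy corpus: four known documents in two categories.
TRAIN = [
    ("भारत ने क्रिकेट मैच जीता। टीम ने अच्छा खेल दिखाया।", 1),
    ("खिलाड़ी ने मैच में शतक बनाया।", 1),
    ("सरकार ने नया बजट पेश किया। कर की दरें घटाई गईं।", -1),
    ("बैंक ने ब्याज की दरें बढ़ाईं।", -1),
]
LABELS = {1: "sports", -1: "finance"}  # hypothetical category names

vocab = build_vocabulary([doc for doc, _ in TRAIN])
X = [vectorize(doc, vocab) for doc, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train_linear_svm(X, y)

def classify(document):
    """Assign an unknown document to one of the two categories."""
    x = vectorize(document, vocab)
    score = sum(wj * xj for wj, xj in zip(weights, x)) + bias
    return LABELS[1 if score >= 0 else -1]

print(classify("टीम ने मैच जीता।"))
print(classify("बजट में कर बढ़ाया गया।"))
```

With a corpus this small and category vocabularies that do not overlap, the classes are trivially separable, so perfect accuracy on such a set says little about how a classifier generalizes to larger collections.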


Citations
Proceedings Article

Data Classification with k-fold Cross Validation and Holdout Accuracy Estimation Methods with 5 Different Machine Learning Techniques

TL;DR: The experiment shows that SVM, NB, and random forest perform better than DTT and K-NN on the data set used.
Proceedings Article

Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques

TL;DR: Experiments show that Naïve Bayes (64% accuracy) and Random Forest (56%) perform better than the other algorithms tested for Hindi poem classification.
Proceedings Article

A Comprehensive Study for the Hindi Language to Implement Supervised Text Classification Techniques

TL;DR: The authors use a Hindi-language resource of general news headlines from several news sources and apply machine learning (ML) classification methods such as Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR) for text categorization.
Journal Article

Multi-Class Document Classification: Effective and Systematized Method to Categorize Documents

TL;DR: This work combines Natural Language Processing and Machine Learning for content-based document classification, achieving more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
References
Proceedings Article

Evaluating effect of context window size, stemming and stop word removal on Hindi word sense disambiguation

TL;DR: The paper evaluates the effects of stemming, stop word removal, and context window size on Hindi word sense disambiguation; precision and recall improve by 9.24% and 12.68%, respectively, over the baseline.
Journal Article

A Journey from Indian Scripts Processing to Indian Language Processing

TL;DR: This overview examines the historical development of mechanizing Indian scripts and the computer processing of Indian languages and the challenges involved in their design and in exploiting their structural similarity that lead to a unified solution.
Proceedings Article

Mining Complex Predicates In Hindi Using A Parallel Hindi-English Corpus

TL;DR: A simple method for detecting CPs of all kinds using a Hindi-English parallel corpus with an average precision of 89% and a recall of 90% is presented.
Proceedings Article

Test model for summarizing hindi text using extraction method

TL;DR: The paper discusses summarizing Hindi text documents by sentence extraction, using Hindi WordNet to tag the part of speech of each word when checking the SOV structure of a sentence, and a genetic algorithm to optimize the generated summary so that its text feature terms cover the maximum theme with less redundancy.
Journal Article

Approaches to Temporal Expression Recognition in Hindi

TL;DR: In this work, different approaches for identification and classification of temporal expressions in Hindi are developed and analyzed, and a reusable gold standard dataset for temporal tagging in Hindi is developed.