An Efficient Hindi Text Classification Model Using SVM

doi:10.1007/978-981-13-7150-9_24

Book Chapter

Shalini Puri, +1 more

- pp 227-237

TLDR

A Hindi Text Classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents.

Abstract:

Many digitized Hindi text documents are generated daily on government sites, on news portals, and in the public and private sectors, and they need to be classified effectively into mutually exclusive, pre-defined categories. Many Hindi text processing systems already exist for information retrieval, machine translation, text summarization, simplification, keyword extraction, and related parsing and linguistic tasks, but there is still wide scope for classifying the extracted text of Hindi documents into pre-defined categories using a classifier. In this paper, a Hindi Text Classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents. Such text classification is challenging in Hindi because of its large set of conjuncts and letter combinations, its sentence structure, and its multisense words. The experiments were performed on a set of four Hindi documents in two categories, which the SVM classified with 100% accuracy.
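The pipeline outlined in the abstract (preprocess known documents, extract features, train an SVM, classify unknown documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop word list, the four toy training documents, the category names `sports` and `politics`, the raw term-count features, and the simple sub-gradient trainer without a bias term are all assumptions made for illustration, using only the Python standard library.

```python
import re

# Hypothetical, deliberately tiny stop word list (not taken from the paper).
STOP_WORDS = {"ने", "में", "है", "की", "का", "के", "पर", "और", "को", "से"}

def preprocess(document):
    """Document -> sentences (split on danda, ?, !) -> words, minus stop words."""
    sentences = [s for s in re.split(r"[।?!]+", document) if s.strip()]
    words = []
    for sentence in sentences:
        # Split on whitespace rather than \w+, which would break words at matras.
        for token in sentence.split():
            token = token.strip(",;:-\"'()")
            if token and token not in STOP_WORDS:
                words.append(token)
    return words

def build_vocabulary(token_lists):
    """Sorted list of every distinct term seen in the training documents."""
    return sorted({tok for toks in token_lists for tok in toks})

def vectorize(tokens, vocab):
    """Raw term-count feature vector; terms outside the vocabulary are ignored."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1.0
    return vec

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(len(w)):
                grad = lam * w[i]
                if margin < 1:  # the example violates the margin
                    grad -= label * x[i]
                w[i] -= lr * grad
    return w

def classify(document, vocab, w):
    """Label an unseen document by the sign of the linear score."""
    x = vectorize(preprocess(document), vocab)
    score = sum(wi * xi for wi, xi in zip(w, x))
    return "sports" if score > 0 else "politics"

# Made-up training set: two documents per category (+1 sports, -1 politics).
TRAINING = [
    ("भारत ने क्रिकेट मैच जीता। टीम ने अच्छा खेल दिखाया।", 1),
    ("कप्तान ने मैच में शतक बनाया। टीम खुश है।", 1),
    ("सरकार ने नया कानून पारित किया। संसद में बहस हुई।", -1),
    ("चुनाव में मंत्री ने भाषण दिया। सरकार की नीति पर बहस हुई।", -1),
]

token_lists = [preprocess(doc) for doc, _ in TRAINING]
vocab = build_vocabulary(token_lists)
X = [vectorize(toks, vocab) for toks in token_lists]
y = [label for _, label in TRAINING]
w = train_linear_svm(X, y)

print(classify("टीम ने मैच जीता।", vocab, w))            # sports
print(classify("संसद में सरकार की बहस हुई।", vocab, w))  # politics
```

With only four training documents and non-overlapping vocabularies, the classes are trivially separable, so perfect accuracy on such a toy set says little about how the approach generalizes; a larger labeled corpus and held-out evaluation would be needed for that.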

Citations

Proceedings Article

Data Classification with k-fold Cross Validation and Holdout Accuracy Estimation Methods with 5 Different Machine Learning Techniques

Kaushika Pal, +1 more

TL;DR: The experiment shows that SVM, NB, and random forest perform better than DTT and K-NN on the data set used.

Proceedings Article

Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques

Kaushika Pal, +1 more

TL;DR: Experiments show that Naïve Bayes, with 64% accuracy, and Random Forest, with 56%, perform better than the other algorithms for Hindi poem classification.

Proceedings Article

A Comprehensive Study for the Hindi Language to Implement Supervised Text Classification Techniques

Vijay Kumar Soni, +1 more

TL;DR: The authors used the Hindi language resource for general news headlines from several news sources and used machine learning (ML) classification methods such as Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR) for text categorization.

Journal Article

ARTC: feature selection using association rules for text classification

Mozamel M. Saeed, +1 more

- 07 Sep 2022 -

Neural Computing and Applications

Journal Article

Multi-Class Document Classification: Effective and Systematized Method to Categorize Documents

Kaushika Pal, +1 more

- 14 Feb 2020 -

International journal of scientific rese...

TL;DR: This work combines Natural Language Processing and Machine Learning for content-based document classification, achieving more than 70% accuracy for major Indian languages and more than 80% accuracy for English.

References

Proceedings Article

Evaluating effect of context window size, stemming and stop word removal on Hindi word sense disambiguation

Satyendr Singh, +1 more

TL;DR: This work evaluates the effects of stemming, stop word removal, and context window size on Hindi word sense disambiguation; the improvement in precision and recall over the baseline is 9.24% and 12.68%, respectively.

Journal Article

A Journey from Indian Scripts Processing to Indian Language Processing

R.M.K. Sinha

- 01 Jan 2009 -

IEEE Annals of the History of Computing

TL;DR: This overview examines the historical development of mechanizing Indian scripts and the computer processing of Indian languages and the challenges involved in their design and in exploiting their structural similarity that lead to a unified solution.

Proceedings Article

Mining Complex Predicates In Hindi Using A Parallel Hindi-English Corpus

R. Mahesh K. Sinha

TL;DR: A simple method for detecting CPs of all kinds using a Hindi-English parallel corpus with an average precision of 89% and a recall of 90% is presented.

Proceedings Article

Test model for summarizing Hindi text using extraction method

Chetana Thaokar, +1 more

TL;DR: The idea of summarizing Hindi text documents using a sentence extraction method is discussed; the method uses Hindi Wordnet to tag the appropriate POS of each word for checking the SOV structure of the sentence, and a genetic algorithm to optimize the generated summary based on text feature terms so that it covers the maximum theme with less redundancy.

Journal Article

Approaches to Temporal Expression Recognition in Hindi

Nitin Ramrakhiyani, +1 more

TL;DR: In this work, different approaches for identification and classification of temporal expressions in Hindi are developed and analyzed, and a reusable gold standard dataset for temporal tagging in Hindi is developed.

Related Papers (5)

The Effect of Stemming on Arabic Text Classification

The Effect of Stemming on Arabic Text Classification: An Empirical Study

A comprehensive study of text classification algorithms

Arabic Text Classification Using Maximum Entropy

Application of TF-IDF Feature for Categorizing Documents of Online Bangla Web Text Corpus