Book ChapterDOI
An Efficient Hindi Text Classification Model Using SVM
Shalini Puri,Satya Prakash Singh +1 more
- pp 227-237
TLDR
A Hindi text classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents.
Abstract:
Every day, large numbers of digitized Hindi text documents are generated on government sites, news portals, and in the public and private sectors, and they need to be classified effectively into mutually exclusive, pre-defined categories. Many Hindi text processing systems already exist for information retrieval, machine translation, text summarization, simplification, keyword extraction, and related parsing and linguistic tasks, but there is still wide scope for classifying the extracted text of Hindi documents into pre-defined categories using a classifier. In this paper, a Hindi text classification model is proposed that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents. Such classification is challenging in Hindi because of its large set of conjuncts and letter combinations, its sentence structure, and its multi-sense words. The experiments were performed on a set of four Hindi documents from two categories, which the SVM classified with 100% accuracy.
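The pipeline described in the abstract (preprocess, extract features, train an SVM, classify unknown documents) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the tiny stop-word list, the toy training sentences, the label names `sports` and `health`, the use of normalised term frequencies as features, and the training routine, which is a plain linear SVM without a bias term fitted by subgradient descent on the hinge loss.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"ने", "को", "में", "का", "की", "के", "है", "और", "से"}

def preprocess(document):
    """Split a document into sentences, then words, and drop stop words."""
    sentences = [s for s in re.split(r"[।?!\n]+", document) if s.strip()]
    words = []
    for sentence in sentences:
        # Devanagari letters and vowel signs (U+0900 to U+0963).
        tokens = re.findall(r"[\u0900-\u0963]+", sentence)
        words.extend(t for t in tokens if t not in STOP_WORDS)
    return words

def build_vocabulary(token_lists):
    """Map every distinct training term to a feature index."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {term: i for i, term in enumerate(terms)}

def vectorize(tokens, vocab):
    """L2-normalised term-frequency vector; unseen terms are ignored."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokens).items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * dot(w, xi)
            w = [wj * (1.0 - eta * lam) for wj in w]   # regularisation step
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

# Toy "known" documents (assumed for illustration only).
TRAINING_DOCS = [
    ("भारत ने क्रिकेट मैच जीता। टीम ने अच्छा खेल दिखाया।", "sports"),
    ("टीम ने फुटबॉल मैच खेला। कप्तान ने गोल किया।", "sports"),
    ("डॉक्टर ने मरीज को दवा दी। अस्पताल में इलाज हुआ।", "health"),
    ("मरीज ने दवा खाई। डॉक्टर ने बुखार का इलाज किया।", "health"),
]

token_lists = [preprocess(text) for text, _ in TRAINING_DOCS]
vocab = build_vocabulary(token_lists)
X = [vectorize(tokens, vocab) for tokens in token_lists]
y = [1.0 if label == "sports" else -1.0 for _, label in TRAINING_DOCS]
weights = train_linear_svm(X, y)

def classify(document):
    """Label an unknown document by the sign of the SVM decision score."""
    score = dot(weights, vectorize(preprocess(document), vocab))
    return "sports" if score >= 0 else "health"

# Two "unknown" documents; only the predicted labels are printed.
for text in ("टीम ने मैच जीता।", "डॉक्टर ने मरीज को दवा दी।"):
    print(classify(text))
```

With only four training documents, a sketch like this says nothing about real-world accuracy; it only shows how the stages connect. A practical system would more likely rely on an established SVM library and a proper Hindi tokenizer and stemmer.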
Citations
Proceedings ArticleDOI
Data Classification with k-fold Cross Validation and Holdout Accuracy Estimation Methods with 5 Different Machine Learning Techniques
Kaushika Pal,Biraj V. Patel +1 more
TL;DR: The experiment shows that SVM, NB, and random forest perform better than DTT and K-NN on the data set used.
Proceedings ArticleDOI
Automatic Multiclass Document Classification of Hindi Poems using Machine Learning Techniques
Kaushika Pal,Biraj V. Patel +1 more
TL;DR: Experiments show that Naïve Bayes, with 64% accuracy, and Random Forest, with 56%, perform better than the other algorithms for Hindi poem classification.
Proceedings ArticleDOI
A Comprehensive Study for the Hindi Language to Implement Supervised Text Classification Techniques
Vijay Kumar Soni,Smita Selot +1 more
TL;DR: The authors used the Hindi language resource for general news headlines from several news sources and used machine learning (ML) classification methods such as Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR) for text categorization.
Journal ArticleDOI
ARTC: feature selection using association rules for text classification
Journal ArticleDOI
Multi - Class Document Classification : Effective and Systematized Method to Categorize Documents
Kaushika Pal,Biraj V. Patel +1 more
TL;DR: This research work combines Natural Language Processing and Machine Learning for content-based classification of documents, classifying documents with more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
References
Proceedings ArticleDOI
Evaluating effect of context window size, stemming and stop word removal on Hindi word sense disambiguation
TL;DR: The effects of stemming, stop word removal, and context window size on Hindi word sense disambiguation are evaluated; precision and recall improve by 9.24% and 12.68%, respectively, over the baseline performance.
Journal ArticleDOI
A Journey from Indian Scripts Processing to Indian Language Processing
TL;DR: This overview examines the historical development of mechanizing Indian scripts and the computer processing of Indian languages and the challenges involved in their design and in exploiting their structural similarity that lead to a unified solution.
Proceedings ArticleDOI
Mining Complex Predicates In Hindi Using A Parallel Hindi-English Corpus
TL;DR: A simple method for detecting CPs of all kinds using a Hindi-English parallel corpus with an average precision of 89% and a recall of 90% is presented.
Proceedings ArticleDOI
Test model for summarizing hindi text using extraction method
Chetana Thaokar,Latesh Malik +1 more
TL;DR: The idea of summarizing Hindi text documents using a sentence extraction method is discussed; it uses Hindi Wordnet to tag the appropriate POS of each word for checking the SOV structure of the sentence, and a genetic algorithm to optimize the generated summary based on the text feature terms so that it covers the maximum theme with less redundancy.
Journal ArticleDOI
Approaches to Temporal Expression Recognition in Hindi
TL;DR: In this work, different approaches for identification and classification of temporal expressions in Hindi are developed and analyzed, and a reusable gold standard dataset for temporal tagging in Hindi is developed.