Author

Moqsadur Rahman

Bio: Moqsadur Rahman is an academic researcher. The author has contributed to research in topics: Deep learning & Bengali. The author has an h-index of 1 and has co-authored 1 publication receiving 3 citations.

Papers
Journal ArticleDOI
TL;DR: Bangla news articles collected from newspapers are assembled into a Bengali corpus, preprocessed, and classified using baseline machine learning and deep learning models.
Abstract: Today’s world is one where everyone strives to live a virtual life. From the perspective of the present time, the online news portal is a major door to that gradually increasing greedy life. Around the globe, various platforms have been developed to fulfill the requirements of mankind. A heavy load of work has been carried out to make these platforms autonomous in the English language, which is why machine learning for news classification is a well-developed field in English. The same cannot be said for the Bangla language, and this motivated research on the topic. Here, Bangla news has been collected from newspapers and assembled into a Bengali corpus. After preprocessing the news text, different procedures are applied to classify it using baseline machine learning and deep learning models.

8 citations


Cited by
Journal ArticleDOI
TL;DR: A modified stop-word set is developed and applied in the preprocessing stage which leads to significant improvement in the performance and shows that the Multi-layer Neural network, Naive Bayes and support vector machine provide better performance.
Abstract: Online and offline newspaper articles have become an integral part of our society. News articles have a significant impact on our personal and social activities, but picking an appropriate news article from the ocean of sources is a challenging task for users. Recommending the appropriate news category helps readers find desired articles, but categorizing news articles manually is laborious, sluggish and expensive. Moreover, it gets more difficult when considering a resource-insufficient language like Bengali, which is the fourth most spoken language of the world. However, very few approaches have been proposed for categorizing Bangla news articles, and those applied few machine learning algorithms with limited resources. In this paper, we accentuate multiple machine learning approaches including a neural network to categorize Bangla news articles for two different datasets. News articles have been collected from the popular Bengali newspaper Prothom Alo to build Dataset I, and Dataset II has been gathered from the famous machine learning competition platform Kaggle. We develop a modified stop-word set and apply it in the preprocessing stage, which leads to significant improvement in the performance. Our result shows that the Multi-layer Neural network, Naive Bayes and support vector machine provide better performance. Accuracy of 94.99%, 94.60%, 95.50% has been achieved for SVM, Logistic regression and Multi-layer dense Neural network, respectively.

4 citations

Book ChapterDOI
01 Jan 2008
TL;DR: The fundamentals of SVMs are discussed with emphasis on multiclass classification problems and applications in science, business and engineering.
Abstract: Support Vector Machine (SVM) methods have become a popular tool for predictive data mining problems and novelty detection. They show good generalization performance on many real-life datasets and they are motivated theoretically through convex programming formulations. There are relatively few free parameters to adjust using cross validation, and the architecture of the SVM learning machine does not need to be found by experimentation as in the case of Artificial Neural Networks (ANNs). We discuss the fundamentals of SVMs with emphasis on multiclass classification problems and applications in science, business and engineering.

3 citations

Proceedings ArticleDOI
27 Aug 2021
TL;DR: In this article, several supervised machine learning and deep learning approaches are proposed for classifying Bengali news documents, using an open dataset that contains more than three hundred thousand (376,211) Bengali text documents.
Abstract: News is newly received, remarkable facts about current phenomena. Miscellaneous events are constantly happening in this world, and mass media helps these facts reach the common folk widely. As we are pushed forward into the modern world and a more convenient environment, Bengali mass media are also leaning towards digital platforms. In this article, some supervised machine learning approaches and deep learning approaches have been proposed for classifying Bengali news documents. We have used an open dataset for our work which contains more than three hundred thousand (376,211) Bengali text documents. Removing stop-words, dropping duplicate data, tokenizing, stemming, etc. have been done as preprocessing steps. Bag-of-Words with TF-IDF and some word embedding approaches (average Word2Vec, GloVe and fastText) have been used for feature extraction. We have trained our text corpus using supervised machine learning methods and deep learning methods. Significantly, among these models, Support Vector Machine with average Word2Vec has achieved 97% accuracy and Bidirectional LSTM has achieved 96% accuracy.

3 citations

Proceedings ArticleDOI
26 Feb 2022
TL;DR: This research work takes data from open resources containing a Bangla newspaper article corpus with 12 categories and uses deep learning methods, namely Convolutional Neural Network (CNN), Artificial Neural Network (ANN), Long Short-Term Memory (LSTM) and Hybrid (CNN+Bi-LSTM), to classify the articles based on their contents.
Abstract: With the evolution of the internet and digital networks, the amount of textual data is increasing gradually. As a result, it becomes increasingly difficult to categorize vast amounts of data manually. But, with the benefit of the machine learning process, we can categorize these massive amounts of text data automatically according to their contents. Compared with other languages such as English, text categorization in Bangla is a challenging task because of a lack of resources. In this research work, we take data from open resources which contain a Bangla newspaper article corpus with 12 categories. For categorizing these article datasets, we have used word embedding methods to extract features from raw data and then used deep learning methods: Convolutional Neural Network (CNN), Artificial Neural Network (ANN), Long Short-Term Memory (LSTM) and Hybrid (CNN+Bi-LSTM) methods to classify them based on their contents. Next, we have analyzed the performance of these deep learning based classification models and looked into the reasons for it. Furthermore, the hybrid model has a greater accuracy of 88.56% in 10 categories and 84.93% in 12 categories. Lastly, the limitations of this study as well as future prospects have been discussed.

2 citations

Proceedings ArticleDOI
16 May 2022
TL;DR: Six types of machine learning algorithms were used to classify crime news in Bangladesh, fetched from online Bangla newspapers and TV channels using a web scraper, with two types of feature extractors: CountVectorizer and TfidfVectorizer.
Abstract: The methodical approach to crime detection, crime pattern classification and crime tendency guessing is called crime analysis and prediction. Crime is naturally unpredictable and socially disruptive. With the increase in the population of Bangladesh, the tendency of crime is also increasing, which is destroying our society in various ways. Therefore, crime data analysis has become essential in order to predict future crime types. In our research paper, six types of machine learning algorithms were used in order to classify the crime news. Crime news was fetched from online Bangla newspapers and TV channels using a web scraper. In order to extract the features (important words), two types of feature extractors have been used, CountVectorizer and TfidfVectorizer, where CountVectorizer was from a well-known pre-trained Python package named BnVec. Accuracies of 87.69% and 86.09% were found from the Logistic Regression and SVM models respectively. Besides, Logistic Regression provided fewer false negatives, with 86.65% recall and 86.58% F1-score. This research has the potential to be used to prevent crime and to apprehend, investigate and prosecute criminals.
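Note: The two feature extractors named in the abstract above differ in how they weight a word. The following is a minimal pure-Python sketch of that difference, not the authors' pipeline: it does not use BnVec or any real library, the function names `count_features` and `tfidf_features` are hypothetical, the smoothed IDF formula is one common variant chosen here as an assumption, and the headlines are English placeholders rather than data from the paper.

```python
import math
from collections import Counter


def count_features(docs):
    """Raw term counts per document (the idea behind a count vectorizer)."""
    return [Counter(doc.split()) for doc in docs]


def tfidf_features(docs):
    """Term counts scaled by a smoothed inverse document frequency."""
    counts = count_features(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    # Assumed IDF variant: ln((1 + n) / (1 + df)) + 1, with no normalization.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Hypothetical toy headlines, used only to illustrate the weighting.
docs = [
    "police arrest suspect in robbery",
    "police report robbery downtown",
    "court hears fraud case",
]
counts = count_features(docs)
weights = tfidf_features(docs)
```

With raw counts, "police" and "arrest" both score 1 in the first headline. With TF-IDF, "arrest" scores higher because it appears in only one document, whereas "police" appears in two.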