Author

Salma Mahgoub

Bio: Salma Mahgoub is an academic researcher. The author has an h-index of 1 and has co-authored 1 publication, which has received 5 citations.

Papers
Journal Article
TL;DR: A simple framework using different variables is proposed to help predict students' academic success with two different algorithms: Decision Trees and a Bayesian Network.
Abstract: In today's data-driven world, where users leave digital footprints and continuously generate huge amounts of unstructured data with each activity, data mining techniques help discover interesting patterns, establish relationships, and solve problems through analysis in many aspects of life. Educational data mining is a multidisciplinary research area in which data from various educational organizations is explored and put to use for many purposes concerning students: predicting academic performance, analyzing learning patterns, solving e-learning issues, predicting employability, identifying the critical courses that affect performance, and investigating the reasons for students' failure or dropout, so that institutions can make data-driven decisions to improve their standards. This paper provides a brief overview of data mining tools and techniques and their growing use in the educational domain. It also proposes a simple framework that uses different variables to predict students' academic success with two different algorithms: Decision Trees and a Bayesian Network. Finally, the accuracy of the two algorithms is compared. The results show that the Bayesian Network outperforms the Decision Trees and gives better accuracy.

5 citations
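The accuracy comparison described in this abstract can be illustrated with a small, self-contained sketch. The abstract does not specify the Bayesian Network structure, the decision-tree learner, or the variables used, so this example rests on assumptions: it uses categorical naive Bayes (the simplest Bayesian network, in which each feature depends only on the class) and a one-level decision tree (a decision stump) as stand-ins, and the student records and attribute meanings (attendance, study hours, prior grade) are hypothetical toy data, not the paper's.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (attendance, study_hours, prior_grade) -> outcome.
# Illustrative values only; not data from the paper.
TRAIN = [
    (("high", "high", "good"), "pass"),
    (("high", "low", "good"), "pass"),
    (("low", "high", "good"), "pass"),
    (("high", "high", "poor"), "pass"),
    (("low", "low", "poor"), "fail"),
    (("low", "low", "good"), "fail"),
    (("high", "low", "poor"), "fail"),
    (("low", "high", "poor"), "fail"),
]
TEST = [
    (("high", "high", "good"), "pass"),
    (("low", "low", "poor"), "fail"),
    (("high", "low", "poor"), "fail"),
    (("low", "high", "good"), "pass"),
]

def train_naive_bayes(rows):
    """Count class and (feature, value, class) frequencies for categorical naive Bayes."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = Counter()      # key: (feature_index, value, label)
    values = defaultdict(set)       # distinct values seen per feature
    for features, label in rows:
        for i, v in enumerate(features):
            feature_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, features):
    """Return the class with the highest Laplace-smoothed log-posterior."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, v in enumerate(features):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((feature_counts[(i, v, label)] + 1) / (count + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def train_stump(rows):
    """One-level decision tree: pick the feature whose majority-vote leaves fit best."""
    best = None
    for i in range(len(rows[0][0])):
        leaves = defaultdict(Counter)   # feature value -> label counts
        for features, label in rows:
            leaves[features[i]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in leaves.items()}
        correct = sum(leaves[v][rule[v]] for v in rule)
        if best is None or correct > best[0]:
            best = (correct, i, rule)
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return best[1], best[2], majority

def predict_stump(stump, features):
    i, rule, majority = stump
    # Fall back to the overall majority class for values unseen in training.
    return rule.get(features[i], majority)

def accuracy(predict, model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)

nb = train_naive_bayes(TRAIN)
stump = train_stump(TRAIN)
print("naive Bayes accuracy:", accuracy(predict_naive_bayes, nb, TEST))
print("decision stump accuracy:", accuracy(predict_stump, stump, TEST))
```

On data this small the two accuracies say nothing about the paper's findings; the sketch only shows how both classifiers are trained on one split and scored on a held-out set.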


Cited by
01 Jan 2002

9,314 citations

Journal Article
TL;DR: The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class, and confirm that undersampling and oversampling are both effective for balancing datasets, with oversampling performing better.
Abstract: In an imbalanced dataset, at least one class is typically far outnumbered by the others. A machine learning algorithm (classifier) trained on an imbalanced dataset predicts the majority (frequently occurring) class more often than the minority (rarely occurring) classes. Training on an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques to reduce class imbalance can improve classifier performance. In this study, we consider an imbalanced dataset from an educational context. First, we examine the shortcomings of classifying an imbalanced dataset. Then, we apply data-level algorithms for class balancing and compare the performance of the classifiers. Classifier performance is measured using the information in their confusion matrices, such as accuracy, precision, recall, and F-measure. The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that undersampling and oversampling are both effective for balancing datasets, but oversampling performs better.

24 citations
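A minimal sketch of the two points made in this abstract, assuming plain Python lists of labels: accuracy, precision, recall, and F-measure computed from confusion-matrix counts for the minority class, and random oversampling and undersampling as simple data-level balancing methods. The abstract does not name the specific balancing algorithms used, so random resampling is an assumption here, and the 95/5 label split and the always-majority classifier are hypothetical.

```python
import random
from collections import Counter

def binary_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1 for the given positive (minority) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def group_by_class(rows):
    """Map each label to the list of (features, label) rows carrying it."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    return by_class

def random_oversample(rows, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = group_by_class(rows)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def random_undersample(rows, seed=0):
    """Keep a random subset of each class so every class matches the smallest."""
    rng = random.Random(seed)
    by_class = group_by_class(rows)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Hypothetical imbalanced labels: 95 "pass", 5 "fail" (the minority class).
y_true = ["pass"] * 95 + ["fail"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["pass"] * 100
print(binary_metrics(y_true, y_pred, positive="fail"))  # → (0.95, 0.0, 0.0, 0.0)

rows = [((i,), label) for i, label in enumerate(y_true)]
print(Counter(label for _, label in random_oversample(rows)))
print(Counter(label for _, label in random_undersample(rows)))
```

The always-majority classifier reaches 0.95 accuracy yet 0.0 precision and recall for `fail`, which is the pattern the abstract warns about. Oversampling keeps every original row but repeats minority rows; undersampling discards majority rows, which is one plausible reason the former can perform better.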

Proceedings Article
19 Feb 2021
TL;DR: The proposed model uses student-related data, collected through questionnaires given to parents and students and from base registers containing teachers' input, which is consolidated into a dataset; the model yields a high accuracy rate.
Abstract: Educational data mining is of immense value in analyzing, tracking, and predicting student performance. A proper analysis and prediction model enables educational institutions to monitor each student's progress and to adopt suitable, timely corrective measures. The proposed model uses student-related data collected through questionnaires given to parents and students and from base registers containing teachers' input; this data is consolidated into the dataset. Feature selection is performed using Pearson's correlation to find the most relevant fields. The prediction model uses a linear regression algorithm to predict performance. The model uses five target attributes for performance prediction, and the best target is used for classification. This work uses highly relevant, related, direct, and indirect data. The new methodology introduced here is the Best Prediction Model (BPM): it selects the best model through prediction, and that model is then used for classification. Classification is done with a Support Vector Machine (SVM) classifier. The proposed model yields comparatively better accuracy. The major highlights of this work are that it uses a completely fresh dataset and yields a high accuracy rate. With this model, performance can be tracked each semester, enabling continuous and cumulative monitoring of students. The model benefits both the students and the institution.

2 citations
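The Pearson-correlation feature selection and linear regression steps in this abstract can be sketched as follows, assuming numeric features held in plain Python lists. The 0.5 threshold, the feature names, and the marks are hypothetical, and for brevity the regression fits a single predictor (the most correlated feature) rather than the paper's full model; the SVM classification stage and the Best Prediction Model selection are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, target, threshold=0.5):
    """Keep features with |r| >= threshold, ordered from strongest to weakest."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical student features and final marks (illustrative only).
columns = {
    "attendance_pct": [60, 70, 80, 90, 100],
    "study_hours": [2, 5, 6, 7, 10],
    "commute_minutes": [30, 10, 40, 20, 30],
}
marks = [50, 60, 70, 80, 90]

kept, scores = select_features(columns, marks)
a, b = fit_line(columns[kept[0]], marks)
print("selected:", kept)
print(f"marks ~ {a:.1f} + {b:.2f} * {kept[0]}")
```

Here `commute_minutes` is only weakly correlated with the marks and is dropped, while the two strongly correlated features are kept in order of |r|.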

23 Jun 2021
TL;DR: This paper introduces a hybrid principal component analysis (HPCA) algorithm used in conjunction with four machine learning (ML) algorithms, namely random forest (RF), support vector machine (SVM), naive Bayes (NB) from the Bayesian network family, and C5.0 from the decision tree (DT) family, with the aim of consistently improving classification performance.
Abstract: Data mining and its applications have been ubiquitous in business since its inception. Many fields use data mining techniques for knowledge discovery as well as for strategic decisions. In the present era, however, new and emerging areas such as education systems are also using data mining successfully to discover meaningful patterns in pools of data. The primary focus of all academic institutions is predicting students' academic performance, and educational data mining (EDM) is used to achieve this. All over the world, EDM is gaining popularity among researchers because of its need and importance for society. Various information technologies are used to handle the complexity of the large volume of data held by educational institutions. Many researchers use machine learning to mine knowledge from educational databases to improve the performance of students and instructors. The most challenging task in building prediction models is selecting an efficient technique that produces satisfactory results. This paper introduces a hybrid principal component analysis (HPCA) algorithm in conjunction with four machine learning (ML) algorithms, namely random forest (RF), support vector machine (SVM), naive Bayes (NB) from the Bayesian network family, and C5.0 from the decision tree (DT) family, to improve classification performance. We evaluated the proposed model on three datasets taken from Kaggle. The assessment metrics are classification accuracy, root mean square error (RMSE), precision, and recall. 10-fold cross-validation is also applied to these datasets to evaluate predictive performance. The proposed algorithm produced satisfactory prediction results, which shows that HPCA is the best choice of optimal prediction method for obtaining good results.
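The evaluation protocol in this abstract (10-fold cross-validation, with RMSE as one of the metrics) can be sketched in plain Python. The abstract does not describe how HPCA combines principal component analysis with the classifiers, so that step is omitted; the mean-predictor baseline, the toy data, and the helper names are hypothetical placeholders that any fit/predict pair could replace.

```python
import math
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices once, deal them into k folds, and yield (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validated_rmse(xs, ys, fit, predict, k=10):
    """Fit on k-1 folds, score RMSE on the held-out fold, and return the k fold scores."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in test]
        scores.append(rmse([ys[i] for i in test], preds))
    return scores

# Hypothetical baseline model: always predict the mean training target.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = list(range(30))
ys = [2 * x + 1 for x in xs]  # illustrative targets only
fold_scores = cross_validated_rmse(xs, ys, fit_mean, predict_mean, k=10)
print(f"mean RMSE over {len(fold_scores)} folds: {sum(fold_scores) / len(fold_scores):.2f}")
```

Every row lands in exactly one test fold, so each prediction is made by a model that never saw that row during fitting; averaging the fold scores gives the cross-validated estimate.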