Author

Yousif A. Alhaj

Bio: Yousif A. Alhaj is an academic researcher from Wuhan University of Technology. The author has contributed to research in topics: Support vector machine & Feature selection. The author has an h-index of 3 and has co-authored 6 publications receiving 57 citations.

Papers
Journal Article
29 Jul 2019-Sensors
TL;DR: A comprehensive survey of recent advances in the CSI-based sensing mechanism is presented and illustrates the drawbacks, discusses challenges, and presents some suggestions for the future of device-free sensing technology.
Abstract: Human motion detection and activity recognition are becoming vital for the applications in smart homes. Traditional Human Activity Recognition (HAR) mechanisms use special devices to track human motions, such as cameras (vision-based) and various types of sensors (sensor-based). These mechanisms are applied in different applications, such as home security, Human–Computer Interaction (HCI), gaming, and healthcare. However, traditional HAR methods require heavy installation, and can only work under strict conditions. Recently, wireless signals have been utilized to track human motion and HAR in indoor environments. The motion of an object in the test environment causes fluctuations and changes in the Wi-Fi signal reflections at the receiver, which result in variations in received signals. These fluctuations can be used to track object (i.e., a human) motion in indoor environments. This phenomenon can be improved and leveraged in the future to improve the internet of things (IoT) and smart home devices. The main Wi-Fi sensing methods can be broadly categorized as Received Signal Strength Indicator (RSSI), Wi-Fi radar (by using Software Defined Radio (SDR)) and Channel State Information (CSI). CSI and RSSI can be considered as device-free mechanisms because they do not require cumbersome installation, whereas the Wi-Fi radar mechanism requires special devices (i.e., Universal Software Radio Peripheral (USRP)). Recent studies demonstrate that CSI outperforms RSSI in sensing accuracy due to its stability and rich information. This paper presents a comprehensive survey of recent advances in the CSI-based sensing mechanism and illustrates the drawbacks, discusses challenges, and presents some suggestions for the future of device-free sensing technology.

49 citations

Journal Article
TL;DR: Findings of this paper indicate that ARLStem outperforms the ISRI and Tashaphyne stemmers and clearly show the effectiveness of the SVM over the KNN and NB classifiers.
Abstract: Stemming is an effective technique that has been adopted in many applications, such as machine learning, machine translation, document classification (DC), information retrieval, and natural language processing. Applied during document classification, stemming reduces the high dimensionality of the feature space, which in turn improves the performance of the classification system, particularly for a highly inflected language such as Arabic. This paper studies the impact of three stemming techniques, namely Information Science Research Institute (ISRI), Tashaphyne, and ARLStem, on Arabic DC. The classification algorithms used are Naive Bayes (NB), support vector machine (SVM), and K-nearest neighbors (KNN). In addition, chi-square feature selection is used to select the most relevant features. Experiments are conducted on the CNN Arabic corpus, which is collected from Arabic websites, to assess the performance of the classification system. The classifiers are evaluated using K-fold cross-validation and Micro-F1. Findings of this paper indicate that ARLStem outperforms the ISRI and Tashaphyne stemmers. The outcomes clearly show the effectiveness of the SVM, which achieved a Micro-F1 of 94.64% when using the ARLStem stemmer, over the KNN and NB classifiers.

47 citations
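
The evaluation protocol named in the abstract above (K-fold cross-validation scored with Micro-F1) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names `micro_f1` and `k_fold_indices` are hypothetical, and the folds are simple contiguous splits without shuffling or stratification.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

For single-label classification, every misclassified document counts once as a false positive and once as a false negative, so Micro-F1 coincides with accuracy.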

Journal Article
TL;DR: A novel text classification model, called Optimal Configuration Determination for Arabic text Classification (OCATC), which utilizes the Particle Swarm Optimization (PSO) algorithm to find the optimal configuration of feature selection method, number of features, and classifier.
Abstract: We propose a novel text classification model, which aims to improve the performance of Arabic text classification using machine learning techniques. One effective approach in Arabic text classification is to find a suitable feature selection method with an optimal number of features alongside the classifier. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method, called Optimal Configuration Determination for Arabic text Classification (OCATC), which utilizes the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method extracts the features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF-IDF) approach. The PSO then selects the best combination of classifier, feature selection method, and number of features. Extensive experiments were carried out to evaluate the performance of the OCATC method using six datasets, including five publicly available datasets and our proposed dataset. The results obtained demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.

10 citations
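
The TF-IDF step mentioned in the abstract above, which turns tokenized documents into numerical vectors, can be sketched as follows. This is only an illustration: the function name `tf_idf` is hypothetical, and the weighting used here (relative term frequency times `log(N / df)`) is one common textbook variant. The abstract does not say which variant the authors used, and libraries often apply smoothed forms.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(n_docs / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document receives weight 0 under this variant, which is why TF-IDF tends to suppress very common words.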

Proceedings Article
28 Nov 2018
TL;DR: Results show that the SVM classifier outperformed the KNN and NB classifiers when using the TF-IDF representation and that the NB classifier outperformed the KNN and SVM classifiers when using the BoW representation.
Abstract: This paper studies the influence of word frequency on the classification of Arabic documents and its effect on two feature representations, namely Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Three classification techniques are considered: Naive Bayes (NB), k-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Chi-square is used as the feature selection function to keep essential features and remove unnecessary ones. Experiments are conducted on public data collected from Arabic websites, namely the CNN Arabic corpus, to study classification performance. K-fold cross-validation is used to validate the classifiers and Micro-F1 to evaluate them. The results show that the SVM classifier outperformed the KNN and NB classifiers when using the TF-IDF representation and that the NB classifier outperformed the KNN and SVM classifiers when using the BoW representation. The SVM and NB classifiers attained 94.38% and 93.47% Micro-F1, respectively.

8 citations
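
Chi-square feature selection, used in the abstract above to keep the most relevant terms, scores each term by how strongly its presence depends on the class. The sketch below uses the standard 2x2 contingency-table form of the statistic. The function names `chi_square` and `select_top_k` and the example counts are hypothetical, and the abstract does not state how the authors implemented the selection.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or the class never varies
    return n * (a * d - b * c) ** 2 / denominator


def select_top_k(scores, k):
    """Return the k terms with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, a term found in all 10 documents of a class and in none of the 10 other documents scores `chi_square(10, 0, 0, 10) == 20.0`, while a term spread evenly across both groups scores 0.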

Book Chapter
01 Jan 2020
TL;DR: This chapter studies the effects of the light stemming technique on feature extraction, where Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are employed for Arabic document classification; the results show that SVM outperforms LR and KNN.
Abstract: This chapter aims to study the effects of the light stemming technique on feature extraction, where Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are employed for Arabic document classification. Moreover, feature selection methods such as Chi-square (Chi2), Information Gain (IG), and singular value decomposition (SVD) are used to select the most relevant features. K-nearest Neighbor (KNN), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers are used to build the classification model. Experiments are conducted using public data collected from Arabic websites, namely the BBC Arabic dataset. The experimental results show that SVM outperforms LR and KNN. Furthermore, BoW outperforms TF-IDF when no stemming technique is used. Using the Robust Arabic Light Stemmer (ARLStem) as the main light stemmer shows a positive effect over the baseline when combined with TF-IDF. In the experiment where Chi2 is used as the feature selection technique, SVM achieved an F1-micro of 0.9568 using BoW with 5000 selected features. In the experiment where IG is used as the feature selection method, SVM achieved an F1-micro of 0.9588 with BoW and 4000 selected features. Finally, in the experiment where SVD is used as the feature selection technique, SVM reached an F1-micro of 0.9569 using BoW with 5000 selected features. These experiments report the best results achieved when stemming is not employed.

4 citations


Cited by
01 Jan 2014
TL;DR: This article surveys the new trend of channel response in localization, investigates a large body of recent work, classifies it into three categories according to how CSI is used, and highlights the differences between CSI and RSSI.
Abstract: The spatial features of emitted wireless signals are the basis of location distinction and determination for wireless indoor localization. Available in mainstream wireless signal measurements, the Received Signal Strength Indicator (RSSI) has been adopted in vast indoor localization systems. However, it suffers from dramatic performance degradation in complex situations due to multipath fading and temporal dynamics. Break-through techniques resort to finer-grained wireless channel measurement than RSSI. Different from RSSI, the PHY layer power feature, channel response, is able to discriminate multipath characteristics, and thus holds the potential for the convergence of accurate and pervasive indoor localization. Channel State Information (CSI, reflecting channel response in 802.11 a/g/n) has attracted many research efforts and some pioneer works have demonstrated submeter or even centimeter-level accuracy. In this article, we survey this new trend of channel response in localization. The differences between CSI and RSSI are highlighted with respect to network layering, time resolution, frequency resolution, stability, and accessibility. Furthermore, we investigate a large body of recent works and classify them overall into three categories according to how to use CSI. For each category, we emphasize the basic principles and address future directions of research in this new and largely open area.

612 citations

Journal Article
03 Oct 2020-Sensors
TL;DR: This study outlines the advantages and disadvantages and a breakdown of the methods applied in the current state-of-the-art approaches to detect COVID-19 and highlights some future research directions, which need to be explored further to produce innovative technologies to control this pandemic.
Abstract: COVID-19, caused by SARS-CoV-2, has resulted in a global pandemic recently. With no approved vaccination or treatment, governments around the world have issued guidance to their citizens to remain at home in efforts to control the spread of the disease. The goal of controlling the spread of the virus is to prevent strain on hospitals. In this paper, we focus on how non-invasive methods are being used to detect COVID-19 and assist healthcare workers in caring for COVID-19 patients. Early detection of COVID-19 can allow for early isolation to prevent further spread. This study outlines the advantages and disadvantages and a breakdown of the methods applied in the current state-of-the-art approaches. In addition, the paper highlights some future research directions, which need to be explored further to produce innovative technologies to control this pandemic.

61 citations

Journal Article
TL;DR: Two complementary reviews of computational linguistics and organizational text mining research are conducted to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted, the research question under investigation, and the data set’s characteristics.
Abstract: Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often...

60 citations

Journal Article
TL;DR: This paper presents a new feature selection method using a modified slime mould algorithm (SMA) based on the firefly algorithm (FA), in which FA is adopted to improve the exploration of SMA because of its strong ability to discover the feasible regions that contain the optimal solution.
Abstract: Feature selection (FS) methods are necessary for developing intelligent analysis tools that require data preprocessing and for enhancing the performance of machine learning algorithms. FS aims to maximize the classification accuracy while minimizing the number of selected features. This paper presents a new FS method using a modified slime mould algorithm (SMA) based on the firefly algorithm (FA). In the developed SMAFA, FA is adopted to improve the exploration of SMA, since it has a strong ability to discover the feasible regions that contain the optimal solution. This enhances convergence by increasing the quality of the final output. SMAFA is evaluated using twenty UCI datasets and through comprehensive comparisons with a number of existing metaheuristic (MH) algorithms. To further assess the applicability of SMAFA, two high-dimensional datasets related to QSAR modeling are used. Experimental results verified the promising performance of SMAFA using different performance measures.

60 citations