Author

Mohammed Nasser

Bio: Mohammed Nasser is an academic researcher from the University of A Coruña. The author has contributed to research on topics including outliers and regression analysis. The author has an h-index of 12 and has co-authored 33 publications receiving 445 citations. Previous affiliations of Mohammed Nasser include the University of Rajshahi and the University of Malaya.

Papers
Journal ArticleDOI
TL;DR: This work builds two models for classification, one based on Support Vector Machines (SVM) and the other on Random Forests (RF); experimental results show that either classifier is effective.
Abstract: The success of any Intrusion Detection System (IDS) is a complicated problem due to its nonlinearity and the quantitative or qualitative network traffic data stream with many features. To get rid of this problem, several types of intrusion detection methods have been proposed, showing different levels of accuracy. This is why the choice of an effective and robust method for IDS is a very important topic in information security. In this work, we have built two models for the classification purpose. One is based on Support Vector Machines (SVM) and the other on Random Forests (RF). Experimental results show that either classifier is effective. SVM is slightly more accurate, but more expensive in terms of time. RF produces similar accuracy in a much faster manner, given suitable modeling parameters. These classifiers can contribute to an IDS as one source of analysis and increase its accuracy. In this paper, the KDD'99 dataset is used to find out which classifier is the better intrusion detector for this dataset. Statistical analysis of the KDD'99 dataset revealed important issues that strongly affect the performance of the evaluated systems and result in a very poor evaluation of anomaly detection approaches. The most important deficiency in the KDD'99 dataset is the huge number of redundant records. To solve these issues, we have developed new datasets, KDD99Train+ and KDD99Test+, which do not include any redundant records in the train set or the test set, so the classifiers will not be biased towards more frequent records. The numbers of records in the train and test sets are now reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. The findings of this paper will be useful for applying SVM and RF in a more meaningful way in order to maximize the performance rate and minimize the false negative rate.

131 citations
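
A minimal sketch of the SVM-versus-RF comparison described above, not the paper's code: both classifiers are trained on a deduplicated KDD'99-style dataset and compared on test accuracy. The file names and the "label" column are assumptions, and the categorical features are assumed to be already numerically encoded.

```python
# Sketch: train SVM and RF on a deduplicated KDD'99-style dataset
# and compare test accuracy. File names and column layout are
# assumptions, not the paper's.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

train = pd.read_csv("KDD99Train+.csv").drop_duplicates()  # no redundant records
test = pd.read_csv("KDD99Test+.csv").drop_duplicates()

X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

scaler = StandardScaler().fit(X_train)  # SVMs need scaling; forests do not

svm = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1).fit(X_train, y_train)

print("SVM:", accuracy_score(y_test, svm.predict(scaler.transform(X_test))))
print("RF: ", accuracy_score(y_test, rf.predict(X_test)))
```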

Journal ArticleDOI
TL;DR: Results show that the proposed Random Forest based approach can select the most important and relevant features for classification, which not only reduces the number of input features and the processing time but also increases the classification accuracy.
Abstract: An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, including threats from both outside and inside the organization. It deals with a large amount of data containing various irrelevant and redundant features, which results in increased processing time and a low detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve the overall system performance significantly while mining huge datasets. In this context, in this paper, we focus on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD'99 intrusion detection datasets, which are based on the DARPA 98 dataset and provide labeled data for researchers working in the field of intrusion detection. The important deficiency in the KDD'99 dataset is the huge number of redundant records, as observed earlier. Therefore, we have derived a dataset, RRE-KDD, by eliminating redundant records from the KDD'99 train and test datasets, so the classifiers and feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets for training and testing purposes, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features useful for classification, which, in turn, not only reduces the number of input features and the processing time but also increases the classification accuracy.

88 citations
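
A sketch of the two-step idea under assumptions: scikit-learn stands in for the paper's implementation, synthetic data stands in for RRE-KDD, and the number of retained features is illustrative. Step one ranks features by Random Forest variable importance; step two refits on the top-ranked subset.

```python
# Sketch: two-step Random Forest feature selection -- rank by variable
# importance, then refit on the top-ranked subset. All sizes are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=41, n_informative=8,
                           random_state=0)  # 41 features, as in KDD'99
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: score variable importance on the full feature set.
ranker = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
top = np.argsort(ranker.feature_importances_)[::-1][:10]  # keep the top 10

# Step 2: refit using only the selected features.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr[:, top], y_tr)
print("selected:", sorted(top),
      "accuracy:", accuracy_score(y_te, clf.predict(X_te[:, top])))
```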

Journal ArticleDOI
TL;DR: The NORMAN SCORE "SARS-CoV-2 in sewage" database provides a platform for rapid, open-access data sharing, validated by the uploading of 276 data sets from nine countries to date, and is a resource for the development of recommendations on minimum data requirements for wastewater pathogen surveillance.

43 citations

Posted ContentDOI
16 Nov 2020-medRxiv
TL;DR: In this paper, statistical regression models were developed from the viral load detected in the wastewater and the epidemiological data from the A Coruna health system, allowing the number of infected people, including symptomatic and asymptomatic individuals, to be estimated with reliability close to 90%.
Abstract: The quantification of the SARS-CoV-2 RNA load in wastewater has emerged as a useful tool to monitor COVID-19 outbreaks in the community. This approach was implemented in the metropolitan area of A Coruna (NW Spain), where wastewater from a treatment plant was analyzed to track the epidemic dynamics in a population of 369,098 inhabitants. Statistical regression models were developed from the viral load detected in the wastewater and the epidemiological data from the A Coruna health system; these allowed us to estimate the number of infected people, including symptomatic and asymptomatic individuals, with reliability close to 90%. These models can help to understand the real magnitude of the epidemic in a population at any given time and can be used as an effective early warning tool for predicting outbreaks. The methodology of the present work could be used to develop a similar wastewater-based epidemiological model to track the evolution of the COVID-19 epidemic anywhere in the world.

28 citations
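
A sketch under assumptions: the paper's exact model specification is not reproduced here; a log-log OLS regression of active cases on wastewater viral load illustrates the general wastewater-based approach. The data below are synthetic placeholders.

```python
# Sketch: regress case counts on wastewater SARS-CoV-2 viral load.
# The log-linear form and all data are assumptions for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
viral_load = 10 ** rng.uniform(3, 6, size=40)          # RNA copies/L (synthetic)
active_cases = 10 ** (0.8 * np.log10(viral_load) - 1.5
                      + rng.normal(0, 0.1, size=40))   # health-system counts

X = sm.add_constant(np.log10(viral_load))
fit = sm.OLS(np.log10(active_cases), X).fit()
print(fit.rsquared)  # the study reports reliability close to 90%

# Estimate the infected population from a new wastewater measurement.
b0, b1 = fit.params
print(10 ** (b0 + b1 * np.log10(2.5e5)))
```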

Journal ArticleDOI
TL;DR: The finite mixture of ARMA-GARCH model is applied instead of AR or ARMA models to compare with the standard BP and SVM in forecasting financial time series (daily stock market index returns and exchange rate returns), and the results show that only the SVM model exhibits the long memory property in forecasting financial returns.
Abstract: The use of GARCH-type models and computational-intelligence-based techniques for forecasting financial time series has proved extremely successful in recent times. In this article, we apply the finite mixture of ARMA-GARCH model instead of AR or ARMA models to compare with the standard BP and SVM in forecasting financial time series (daily stock market index returns and exchange rate returns). We do not apply the pure GARCH model, as the finite mixture of the ARMA-GARCH model outperforms the pure GARCH model. These models are evaluated on five performance metrics or criteria. Our experiment shows that the SVM model outperforms both the finite mixture of ARMA-GARCH and BP models on the deviation performance criteria. On the direction performance criteria, the finite mixture of ARMA-GARCH model performs better. The memory property of these forecasting techniques is also examined using the behavior of forecasted values vis-a-vis the original values. Only the SVM model shows the long memory property in forecasting financial returns.

26 citations
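
A sketch of the SVM leg of the comparison only (the finite mixture of ARMA-GARCH is out of scope here): one-step-ahead forecasting of returns from their own lags, scored on a deviation criterion (RMSE) and a direction criterion (sign hit rate), mirroring the two families of criteria in the abstract. The lag order, kernel, and data are assumptions.

```python
# Sketch: SVR one-step-ahead return forecasting, scored on deviation
# (RMSE) and direction (sign hit rate). Lag order, kernel, and the
# synthetic returns are illustrative assumptions.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

def lagged(returns, p=5):
    # Each row holds [r(t-p), ..., r(t-1)]; the target is r(t).
    X = np.column_stack([returns[i:len(returns) - p + i] for i in range(p)])
    return X, returns[p:]

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, size=500)   # stand-in for daily index returns

X, y = lagged(returns)
split = int(0.8 * len(y))
svm = SVR(kernel="rbf", C=1.0, epsilon=1e-4).fit(X[:split], y[:split])
pred = svm.predict(X[split:])

print("RMSE:", np.sqrt(mean_squared_error(y[split:], pred)))
print("direction hit rate:", np.mean(np.sign(pred) == np.sign(y[split:])))
```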


Cited by
Journal ArticleDOI
01 May 1981
TL;DR: This monograph discusses detecting influential observations and outliers, methods for detecting and assessing collinearity, and applications and remedies.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

4,948 citations
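
A minimal sketch of the diagnostics this monograph covers, using statsmodels (an assumption; the book predates any such library): Cook's distance flags influential observations, and the condition number of the design matrix signals collinearity. The data are synthetic.

```python
# Sketch: influence diagnostics and a collinearity check on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 3)))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=50)
y[0] += 10.0                                   # plant one influential outlier

fit = sm.OLS(y, X).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print("most influential observation:", int(np.argmax(cooks_d)))
print("condition number:", np.linalg.cond(X))  # large => collinearity
```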

01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.

1,815 citations
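
A sketch of the decomposition described above, using NumPy: SVD factors the genes × arrays matrix into eigengenes and eigenarrays, trailing components are treated as noise and filtered out, and the data are reconstructed. The cutoff k is a judgment call, assumed here for illustration, and the data are synthetic.

```python
# Sketch: SVD of a genes x arrays matrix into eigengenes/eigenarrays,
# with trailing components filtered out as noise. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
expression = rng.normal(size=(1000, 12))       # genes x arrays

U, s, Vt = np.linalg.svd(expression, full_matrices=False)
# Rows of Vt are the "eigengenes"; columns of U are the "eigenarrays".

k = 3                                           # keep the significant components
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # normalized expression data

# Fraction of overall expression captured by each retained eigengene.
print(s[:k] ** 2 / np.sum(s ** 2))
```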

Journal ArticleDOI
01 Apr 2017-Catena
TL;DR: In this article, the authors used three state-of-the-art data mining techniques, namely, logistic model tree (LMT), random forest (RF), and classification and regression tree (CART) models, to map landslide susceptibility.
Abstract: The main purpose of the present study is to use three state-of-the-art data mining techniques, namely, logistic model tree (LMT), random forest (RF), and classification and regression tree (CART) models, to map landslide susceptibility. Long County was selected as the study area. First, a landslide inventory map was constructed using history reports, interpretation of aerial photographs, and extensive field surveys. A total of 171 landslide locations were identified in the study area. Twelve landslide-related parameters were considered for landslide susceptibility mapping, including slope angle, slope aspect, plan curvature, profile curvature, altitude, NDVI, land use, distance to faults, distance to roads, distance to rivers, lithology, and rainfall. The 171 landslides were randomly separated into two groups with a 70/30 ratio for training and validation purposes, and different ratios of non-landslide to landslide grid cells were used to obtain the highest classification accuracy. The linear support vector machine algorithm (LSVM) was used to evaluate the predictive capability of the 12 landslide conditioning factors. Second, LMT, RF, and CART models were constructed using the training data. Finally, the applied models were validated and compared using receiver operating characteristic (ROC) curves and predictive accuracy (ACC) methods. Overall, all three models exhibit reasonably good performances; the RF model exhibits the highest predictive capability compared with the LMT and CART models. The RF model, with a success rate of 0.837 and a prediction rate of 0.781, is a promising technique for landslide susceptibility mapping. Therefore, these three models are useful tools for spatial prediction of landslide susceptibility.

591 citations
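
A sketch of the Random Forest leg of the comparison, assuming scikit-learn and a synthetic stand-in for the twelve conditioning factors: a 70/30 training/validation split and ROC-AUC validation, mirroring the paper's setup.

```python
# Sketch: RF landslide susceptibility classifier with a 70/30 split and
# ROC-AUC validation. Features are synthetic stand-ins for the twelve
# conditioning factors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(342, 12))       # 171 landslide + 171 non-landslide cells
y = np.repeat([1, 0], 171)           # 1 = landslide, 0 = non-landslide

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
```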

Journal ArticleDOI
TL;DR: Interval-valued HFSs and the corresponding correlation coefficient formulas are developed, and their application to clustering with interval-valued hesitant fuzzy information is demonstrated through a specific numerical example.

449 citations