Author
Mohammed Nasser
Other affiliations: University of Rajshahi, University of Malaya
Bio: Mohammed Nasser is an academic researcher from the University of A Coruña. The author has contributed to research in topics: Outlier & Regression analysis. The author has an h-index of 12, and has co-authored 33 publications receiving 445 citations. Previous affiliations of Mohammed Nasser include University of Rajshahi & University of Malaya.
Papers
TL;DR: This work builds two models for the classification purpose, one based on Support Vector Machines (SVM) and the other on Random Forests (RF); experimental results show that either classifier is effective.
Abstract: The success of any Intrusion Detection System (IDS) is a complicated problem due to its nonlinearity and the quantitative or qualitative network traffic data stream with many features. To address this problem, several types of intrusion detection methods have been proposed, showing different levels of accuracy. This is why the choice of an effective and robust method for IDS is a very important topic in information security. In this work, we have built two models for the classification purpose. One is based on Support Vector Machines (SVM) and the other on Random Forests (RF). Experimental results show that either classifier is effective. SVM is slightly more accurate, but more expensive in terms of time. RF produces similar accuracy in a much faster manner if given modeling parameters. These classifiers can contribute to an IDS as one source of analysis and increase its accuracy. In this paper, the KDD'99 dataset is used to find out which classifier is the better intrusion detector for this dataset. Statistical analysis of the KDD'99 dataset revealed important issues that highly affect the performance of evaluated systems and result in a very poor evaluation of anomaly detection approaches. The most important deficiency in the KDD'99 dataset is the huge number of redundant records. To solve these issues, we have developed a new dataset, consisting of KDD99Train+ and KDD99Test+, which does not include any redundant records in either the train set or the test set, so the classifiers will not be biased towards more frequent records. The numbers of records in the train and test sets are now reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. The findings of this paper will be very useful for applying SVM and RF in a more meaningful way in order to maximize the performance rate and minimize the false negative rate.
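The comparison described above (deduplicate the records, then train SVM and RF and compare accuracy and training time) can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic data, not the paper's actual KDD'99 pipeline; the feature set and parameters are placeholders.

```python
# Sketch of the SVM-vs-RF comparison on a deduplicated dataset.
# Synthetic data stands in for KDD'99; real file paths and feature
# names from the paper are not reproduced here.
import time
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for network-traffic records with many features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
df = pd.DataFrame(X)
df["label"] = y

# Deduplication step analogous to building KDD99Train+/KDD99Test+:
# drop exact duplicate records so frequent rows do not bias the classifiers.
df = df.drop_duplicates()

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], test_size=0.3, random_state=0
)

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    t0 = time.time()
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    print(f"{name}: accuracy={acc:.3f}, train_time={time.time() - t0:.2f}s")
```

On larger datasets the timing gap the abstract mentions (RF much faster than SVM at similar accuracy) typically becomes pronounced, since SVM training scales poorly with sample count.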
131 citations
TL;DR: Results show that the proposed Random Forest based approach can select the most important and relevant features useful for classification, which not only reduces the number of input features and the processing time but also increases the classification accuracy.
Abstract: An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, including threats from both outside and inside the organization. It deals with a large amount of data, which contains various irrelevant and redundant features and results in increased processing time and a low detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve the overall system performance significantly while mining huge datasets. In this context, in this paper, we focus on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD'99 intrusion detection dataset, which is based on the DARPA 98 dataset and provides labeled data for researchers working in the field of intrusion detection. An important deficiency in the KDD'99 dataset is the huge number of redundant records, as observed earlier. Therefore, we have derived a dataset, RRE-KDD, by eliminating redundant records from the KDD'99 train and test datasets, so the classifiers and feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets for training and testing purposes, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features useful for classification, which, in turn, not only reduces the number of input features and the processing time but also increases the classification accuracy.
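The two-step idea above (rank features by Random Forest variable importance, then search for a small high-accuracy subset) can be sketched as follows. This is a simplified stand-in on synthetic data: the greedy importance-ordered search here replaces the paper's actual guided search, and the dataset is not RRE-KDD.

```python
# Sketch of two-step feature selection: importance ranking, then a
# greedy search over importance-ordered feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=15, n_informative=5,
                           random_state=0)

# Step 1: rank features by Random Forest variable importance score.
rf = RandomForestClassifier(random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]

# Step 2 (simplified stand-in for the guided search): grow the candidate
# subset in importance order and keep the size with the best CV accuracy.
best_score, best_k = 0.0, 0
for k in range(1, X.shape[1] + 1):
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[:, order[:k]], y, cv=3).mean()
    if score > best_score:
        best_score, best_k = score, k

print(f"selected {best_k} of {X.shape[1]} features, CV accuracy {best_score:.3f}")
```

The payoff matches the abstract's claim: a smaller input set usually trains faster and, by discarding irrelevant features, can classify at least as accurately as the full set.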
88 citations
Luleå University of Technology1, University of Cyprus2, Slovak University of Technology in Bratislava3, Vienna University of Technology4, Mario Negri Institute for Pharmacological Research5, James I University6, Swedish University of Agricultural Sciences7, National and Kapodistrian University of Athens8, Norwegian Institute for Water Research9, University of A Coruña10, University of Antwerp11, University of South Australia12, Comenius University in Bratislava13, Cranfield University14, Lehigh University15, University of Bath16, King Abdullah University of Science and Technology17, University of Belgrade18, Aristotle University of Thessaloniki19, University of Alcalá20, Norwegian University of Life Sciences21, Innsbruck Medical University22, National Institute for Health and Welfare23, Uppsala University24, Slovak Academy of Sciences25
TL;DR: The NORMAN SCORE “SARS-CoV-2 in sewage” database provides a platform for rapid, open access data sharing, validated by the uploading of 276 data sets from nine countries to date, and is a resource for the development of recommendations on minimum data requirements for wastewater pathogen surveillance.
43 citations
TL;DR: In this paper, statistical regression models were developed from the viral load detected in the wastewater and the epidemiological data from the A Coruna health system, which allowed the number of infected people, including symptomatic and asymptomatic individuals, to be estimated with reliability close to 90%.
Abstract: The quantification of the SARS-CoV-2 RNA load in wastewater has emerged as a useful tool to monitor COVID-19 outbreaks in the community. This approach was implemented in the metropolitan area of A Coruna (NW Spain), where wastewater from a treatment plant was analyzed to track the epidemic dynamics in a population of 369,098 inhabitants. Statistical regression models were developed from the viral load detected in the wastewater and the epidemiological data from the A Coruna health system, which allowed us to estimate the number of infected people, including symptomatic and asymptomatic individuals, with reliability close to 90%. These models can help to understand the real magnitude of the epidemic in a population at any given time and can be used as an effective early warning tool for predicting outbreaks. The methodology of the present work could be used to develop a similar wastewater-based epidemiological model to track the evolution of the COVID-19 epidemic anywhere in the world.
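The modeling idea above (regress case counts on wastewater viral load) can be sketched as a simple log-log linear fit. All data and coefficients below are synthetic placeholders; the paper's actual variables, transforms, and fitted parameters are not reproduced here.

```python
# Hedged sketch: ordinary least squares relating wastewater viral load
# to case counts on a log-log scale. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
log_viral_load = rng.uniform(3, 7, size=60)                     # log10 RNA copies/L (synthetic)
log_cases = 1.2 * log_viral_load - 1.0 + rng.normal(0, 0.2, 60) # synthetic case counts

# Fit log_cases = slope * log_viral_load + intercept by least squares.
slope, intercept = np.polyfit(log_viral_load, log_cases, 1)
pred = slope * log_viral_load + intercept

# R^2 as a rough analogue of the paper's ~90% reliability figure.
ss_res = np.sum((log_cases - pred) ** 2)
ss_tot = np.sum((log_cases - log_cases.mean()) ** 2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={1 - ss_res / ss_tot:.3f}")
```

In practice such a model is refit as new sampling data arrive, and rising predicted case counts serve as the early-warning signal the abstract describes.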
28 citations
TL;DR: The finite mixture of ARMA-GARCH model is applied, instead of AR or ARMA models, to compare with the standard BP and SVM in forecasting financial time series (daily stock market index returns and exchange rate returns); only the SVM model shows the long-memory property in forecasting financial returns.
Abstract: The use of GARCH type models and computational-intelligence-based techniques for forecasting financial time series has proved extremely successful in recent times. In this article, we apply the finite mixture of ARMA-GARCH model instead of AR or ARMA models to compare with the standard BP and SVM in forecasting financial time series (daily stock market index returns and exchange rate returns). We do not apply the pure GARCH model as the finite mixture of the ARMA-GARCH model outperforms the pure GARCH model. These models are evaluated on five performance metrics or criteria. Our experiment shows that the SVM model outperforms both the finite mixture of ARMA-GARCH and BP models in deviation performance criteria. In direction performance criteria, the finite mixture of ARMA-GARCH model performs better. The memory property of these forecasting techniques is also examined using the behavior of forecasted values vis-a-vis the original values. Only the SVM model shows long memory property in forecasting financial returns.
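The two families of criteria the abstract distinguishes can be made concrete: deviation criteria measure how close forecasts are to actual values (e.g. RMSE, MAE), while direction criteria measure how often the forecast gets the sign of the return right. A toy computation on a synthetic return series (not the paper's data or full set of five metrics):

```python
# Deviation criteria (RMSE, MAE) vs a direction criterion, on a
# synthetic daily-return series and a noisy forecast of it.
import numpy as np

rng = np.random.default_rng(1)
actual = rng.normal(0, 0.01, size=250)              # synthetic daily returns
forecast = actual + rng.normal(0, 0.005, size=250)  # noisy forecast of them

rmse = np.sqrt(np.mean((actual - forecast) ** 2))   # deviation criterion
mae = np.mean(np.abs(actual - forecast))            # deviation criterion
# Direction criterion: fraction of days the forecast's sign matches.
direction = np.mean(np.sign(actual) == np.sign(forecast))

print(f"RMSE={rmse:.5f}, MAE={mae:.5f}, direction accuracy={direction:.2f}")
```

A model can win on one family and lose on the other, which is exactly the split the experiment reports: SVM best on deviation criteria, the ARMA-GARCH mixture best on direction criteria.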
26 citations
Cited by
01 May 1981
TL;DR: This chapter discusses Detecting Influential Observations and Outliers, a method for assessing Collinearity, and its applications in medicine and science.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.
4,948 citations
01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
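The decomposition described can be sketched with NumPy. This is an illustration of the SVD and noise-filtering steps only, on a synthetic matrix, not the authors' expression-analysis pipeline; "drop the weakest component" stands in for their inference of which eigengenes represent noise.

```python
# SVD of a genes-by-arrays matrix into eigenarrays (columns of U) and
# eigengenes (rows of Vt), then filtering out one component as "noise".
import numpy as np

rng = np.random.default_rng(0)
genes, arrays = 100, 8
X = rng.normal(size=(genes, arrays))        # synthetic expression matrix

# X = U @ diag(s) @ Vt: the rows of Vt are orthonormal patterns across
# arrays ("eigengenes"); the columns of U are patterns across genes.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# "Normalization" in the paper's sense: remove a component inferred to be
# noise (here, simply the weakest one) and reconstruct the data.
k = arrays - 1
X_filtered = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("rank after filtering:", np.linalg.matrix_rank(X_filtered))
```

Dropping the smallest singular component removes exactly `s[-1]` of Frobenius-norm energy, which is why filtering weak eigengenes changes the data least while suppressing low-variance artifacts.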
1,815 citations
TL;DR: In this article, the authors used three state-of-the-art data mining techniques, namely, logistic model tree (LMT), random forest (RF), and classification and regression tree (CART) models, to map landslide susceptibility.
Abstract: The main purpose of the present study is to use three state-of-the-art data mining techniques, namely, logistic model tree (LMT), random forest (RF), and classification and regression tree (CART) models, to map landslide susceptibility. Long County was selected as the study area. First, a landslide inventory map was constructed using history reports, interpretation of aerial photographs, and extensive field surveys. A total of 171 landslide locations were identified in the study area. Twelve landslide-related parameters were considered for landslide susceptibility mapping, including slope angle, slope aspect, plan curvature, profile curvature, altitude, NDVI, land use, distance to faults, distance to roads, distance to rivers, lithology, and rainfall. The 171 landslides were randomly separated into two groups with a 70/30 ratio for training and validation purposes, and different ratios of non-landslides to landslides grid cells were used to obtain the highest classification accuracy. The linear support vector machine algorithm (LSVM) was used to evaluate the predictive capability of the 12 landslide conditioning factors. Second, LMT, RF, and CART models were constructed using training data. Finally, the applied models were validated and compared using receiver operating characteristics (ROC), and predictive accuracy (ACC) methods. Overall, all three models exhibit reasonably good performances; the RF model exhibits the highest predictive capability compared with the LMT and CART models. The RF model, with a success rate of 0.837 and a prediction rate of 0.781, is a promising technique for landslide susceptibility mapping. Therefore, these three models are useful tools for spatial prediction of landslide susceptibility.
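The validation step described (train the models, then compare them by ROC on held-out data) can be sketched as follows. Synthetic features stand in for the twelve landslide conditioning factors, and a plain decision tree plays the role of CART; this is not the study's actual data or tuning.

```python
# Sketch of model comparison by ROC AUC on a 70/30 train/validation split,
# mirroring the study's evaluation of RF vs CART.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 12 features ~ the twelve conditioning factors.
X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "CART": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC AUC = {auc:.3f}")
```

ROC AUC is split-size-independent and threshold-free, which makes it a natural common yardstick when the candidate models output probabilities on different scales.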
591 citations
TL;DR: The interval-valued HFSs and the corresponding correlation coefficient formulas are developed, and their application in clustering with interval-valued hesitant fuzzy information is demonstrated through a specific numerical example.
449 citations