SciSpace (formerly Typeset)
Author

Firuz Kamalov

Bio: Firuz Kamalov is an academic researcher at the Canadian University of Dubai. His research focuses on computer science and feature selection. He has an h-index of 9 and has co-authored 54 publications receiving 355 citations.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: This paper demonstrates the effects of class imbalance on classification models and shows that the relationship between the class imbalance ratio and accuracy is convex.

292 citations
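
The paper's claim concerns how plain accuracy behaves as the class ratio changes. As a minimal sketch (not the paper's experimental protocol), one can vary the minority fraction in a two-Gaussian toy problem, classify with the Bayes rule under the empirical class priors, and record overall accuracy. The class means, variance, and sample sizes below are illustrative assumptions:

```python
import numpy as np

def accuracy_at_ratio(minority_frac, n_major=5000, rng=None):
    """Sample a majority class from N(0,1) and a minority class from N(2,1),
    classify with the Bayes rule under the empirical class priors, and
    return overall accuracy. Illustrative setup, not the paper's protocol."""
    rng = rng or np.random.default_rng(0)
    n_minor = max(2, int(n_major * minority_frac))
    X = np.concatenate([rng.normal(0, 1, n_major), rng.normal(2, 1, n_minor)])
    y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])
    # Bayes threshold for equal-variance Gaussians with priors (1-p1, p1):
    # x* = (mu0 + mu1) / 2 + sigma^2 * ln((1-p1)/p1) / (mu1 - mu0)
    p1 = y.mean()
    threshold = 1.0 + np.log((1.0 - p1) / p1) / 2.0
    return float(((X > threshold) == y).mean())

for frac in (0.5, 0.1, 0.01):
    print(f"minority fraction {frac:.2f}: accuracy {accuracy_at_ratio(frac):.3f}")
```

Note that overall accuracy rises as the imbalance grows, even while minority-class detection collapses, which is why accuracy alone can mislead on imbalanced data.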

Journal ArticleDOI
TL;DR: A computational intelligence method called Variable Analysis (Va) is proposed that considers feature-to-class correlations and reduces feature-to-feature correlations. It derived smaller feature sets from adult, adolescent, and child screening data while maintaining competitive predictive accuracy, sensitivity, and specificity rates.

120 citations
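
The Va algorithm itself is defined in the paper; the general trade-off it targets (high feature-to-class correlation, low feature-to-feature correlation) can be sketched with a greedy Pearson-correlation filter. The `penalty` weight and the synthetic data are illustrative assumptions, not details from the paper:

```python
import numpy as np

def select_features(X, y, k, penalty=1.0):
    """Greedy filter: rank features by |corr(feature, class)|, penalized by
    the mean |corr| with features already selected. The penalty weight is an
    illustrative choice, not a parameter of the Va method."""
    class_corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                           for j in range(X.shape[1])])
    selected = [int(np.argmax(class_corr))]
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = class_corr[j] - penalty * redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000).astype(float)
informative = y + rng.normal(0, 0.3, 1000)           # correlated with the class
redundant = informative + rng.normal(0, 0.05, 1000)  # near-copy of feature 0
noise = rng.normal(0, 1, 1000)                       # irrelevant
X = np.column_stack([informative, redundant, noise])
print(select_features(X, y, k=2))  # keeps one informative feature, skips its near-copy
```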

Journal ArticleDOI
TL;DR: This paper investigates the performance of the sampling method based on kernel density estimation (KDE) and concludes that the proposed method would be a valuable tool in problems involving imbalanced class distribution.

72 citations
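
The core idea (draw synthetic minority samples from a kernel density estimate fitted to the minority class) can be sketched in a few lines, since sampling from a Gaussian KDE amounts to picking a stored point at random and adding Gaussian noise at the bandwidth scale. The bandwidth constant and the data below are illustrative; the paper's bandwidth-selection procedure is not reproduced:

```python
import numpy as np

def kde_oversample(X_min, n_new, bandwidth=0.3, rng=None):
    """Draw n_new synthetic samples from a Gaussian KDE fitted on X_min:
    pick stored minority points at random, then add N(0, bandwidth^2) noise.
    The bandwidth here is an illustrative constant."""
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(X_min), n_new)
    return X_min[idx] + rng.normal(0.0, bandwidth, size=(n_new, X_min.shape[1]))

rng = np.random.default_rng(42)
X_min = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(20, 2))  # minority class
X_new = kde_oversample(X_min, n_new=80, rng=rng)
print(X_new.shape)  # (80, 2)
```

Unlike random oversampling, the synthetic points are not exact copies, which smooths the estimated minority distribution.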

Journal ArticleDOI
TL;DR: Three neural network models (a multilayer perceptron, a convolutional network, and a long short-term memory network) are constructed and tested, showing that significant changes in stock price can be predicted with a high degree of accuracy.
Abstract: Stock price prediction is a rich research topic that has attracted interest from various areas of science. The recent success of machine learning in speech and image recognition has prompted researchers to apply these methods to asset price prediction. The majority of literature has been devoted to predicting either the actual asset price or the direction of price movement. In this paper, we study a hitherto little explored question of predicting significant changes in stock price based on previous changes using machine learning algorithms. We are particularly interested in the performance of neural network classifiers in the given context. To this end, we construct and test three neural network models including multilayer perceptron, convolutional net, and long short-term memory net. As benchmark models, we use random forest and relative strength index methods. The models are tested using 10-year daily stock price data of four major US public companies. Test results show that predicting significant changes in stock price can be accomplished with a high degree of accuracy. In particular, we obtain substantially better results than similar studies that forecast the direction of price change.

48 citations
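
A key step the abstract describes is framing the task as classifying significant changes from previous changes. A minimal sketch of that data-preparation step might look as follows; the 2% threshold, 10-day window, and synthetic price path are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def make_dataset(prices, window=10, threshold=0.02):
    """Label day t as 1 when |return_t| exceeds the threshold (a "significant"
    move), with the previous `window` daily returns as features. The 2%
    threshold and 10-day window are illustrative choices."""
    returns = np.diff(prices) / prices[:-1]
    X, y = [], []
    for t in range(window, len(returns)):
        X.append(returns[t - window:t])
        y.append(int(abs(returns[t]) > threshold))
    return np.array(X), np.array(y)

rng = np.random.default_rng(7)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.015, 300))  # synthetic price path
X, y = make_dataset(prices)
print(X.shape, y.mean())  # feature matrix shape and share of significant days
```

The resulting `(X, y)` pair is what a classifier such as an MLP or LSTM would be trained on.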

Journal ArticleDOI
TL;DR: A new filtering method is proposed that combines and normalizes the scores of three major feature selection methods (information gain, the chi-squared statistic, and inter-correlation) and maximizes the stability of the variables' scores without losing the overall accuracy of the predictive model.
Abstract: One of the major aspects of any classification process is selecting the relevant set of features to be used in a classification algorithm. This initial step in data analysis is called the feature selection process. Disposing of the irrelevant features from the dataset will reduce the complexity of the classification task and will increase the robustness of the decision rules when applied on the test set. This paper proposes a new filtering method that combines and normalizes the scores of three major feature selection methods: information gain, chi-squared statistic and inter-correlation. Our method utilizes the strengths of each of the aforementioned methods to maximum advantage while avoiding their drawbacks—especially the disparity of the results produced by these methods. Our filtering method stabilizes each variable score and gives it the true rank among the input data’s available variables. Hence it maximizes the stability in the variables’ scores without losing the overall accuracy of the predictive model. A number of experiments on different datasets from various domains have shown that features chosen by the proposed method are highly predictive when compared with features selected by other existing filtering methods. The evaluation of the filtering phase was conducted via thorough experimentations using a number of predictive classification algorithms in addition to statistical analysis of the filtering methods’ scores.

42 citations
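
The combine-and-normalize idea can be sketched directly: min-max normalize each filter's scores so that no filter's raw scale dominates, then average per feature. The scores below are hypothetical placeholders; a real pipeline would compute them with information gain, chi-squared, and correlation estimators:

```python
import numpy as np

def combine_scores(score_lists):
    """Min-max normalize each filter's scores to [0, 1], then average per
    feature, so no single filter's raw scale dominates the final ranking."""
    combined = np.zeros(len(score_lists[0]))
    for s in score_lists:
        s = np.asarray(s, dtype=float)
        combined += (s - s.min()) / (s.max() - s.min())
    return combined / len(score_lists)

# Hypothetical raw scores for 4 features from three filters with very
# different scales (information gain, chi-squared, correlation).
info_gain = [0.10, 0.40, 0.05, 0.30]
chi_sq    = [12.0, 35.0,  3.0, 40.0]
corr      = [0.20, 0.90, 0.10, 0.60]
ranking = combine_scores([info_gain, chi_sq, corr])
print(np.argsort(ranking)[::-1])  # features ordered best to worst: [1 3 0 2]
```

Without normalization, the chi-squared scores (tens) would swamp the information-gain and correlation scores (fractions), which is the disparity the paper's method is designed to avoid.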


Cited by
More filters
Dissertation
01 Jul 2016
TL;DR: In this paper, a clustering-based undersampling strategy was proposed to rebalance the minority and majority classes, in which the number of clusters in the majority class is set equal to the number of data points in the minority class.
Abstract: Class imbalance is often a problem in various real-world data sets, where one class (i.e. the minority class) contains a small number of data points and the other (i.e. the majority class) contains a large number of data points. It is notably difficult to develop an effective model using current data mining and machine learning algorithms without considering data preprocessing to balance the imbalanced data sets. Random undersampling and oversampling have been used in numerous studies to ensure that the different classes contain the same number of data points. A classifier ensemble (i.e. a structure containing several classifiers) can be trained on several different balanced data sets for later classification purposes. In this paper, we introduce two undersampling strategies in which a clustering technique is used during the data preprocessing step. Specifically, the number of clusters in the majority class is set to be equal to the number of data points in the minority class. The first strategy uses the cluster centers to represent the majority class, whereas the second strategy uses the nearest neighbors of the cluster centers. A further study was conducted to examine the effect on performance of the addition or deletion of 5 to 10 cluster centers in the majority class. The experimental results obtained using 44 small-scale and 2 large-scale data sets revealed that the clustering-based undersampling approach with the second strategy outperformed five state-of-the-art approaches. Specifically, this approach combined with a single multilayer perceptron classifier and C4.5 decision tree classifier ensembles delivered optimal performance over both small- and large-scale data sets.

336 citations
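
The first strategy the abstract describes, replacing the majority class with k-means cluster centers where k equals the minority-class size, can be sketched with a plain Lloyd's k-means. The data and iteration count below are illustrative assumptions:

```python
import numpy as np

def cluster_centers(X, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def undersample(X_maj, X_min, rng=None):
    """Replace the majority class with k-means cluster centers, with k equal
    to the minority-class size, so the two classes end up the same size."""
    return cluster_centers(X_maj, k=len(X_min), rng=rng), X_min

rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))  # 200 majority points
X_min = rng.normal(3.0, 1.0, size=(20, 2))   # 20 minority points
X_maj_reduced, _ = undersample(X_maj, X_min, rng=rng)
print(X_maj_reduced.shape)  # (20, 2): classes are now balanced
```

The paper's second strategy would instead return the majority points nearest each center; only the first is sketched here.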


Journal ArticleDOI
TL;DR: This paper provides a critical analysis of the machine learning literature on sport results prediction, focusing on applications of Artificial Neural Networks (ANNs), and proposes a novel framework through which ML can be applied as a learning strategy in sport prediction.

166 citations