Author

Azuraliza Abu Bakar

Bio: Azuraliza Abu Bakar is an academic researcher at the National University of Malaysia. The author has contributed to research on topics including feature selection and rough set theory, has an h-index of 18, and has co-authored 207 publications receiving 1538 citations. Previous affiliations of Azuraliza Abu Bakar include Information Technology University.


Papers
Journal ArticleDOI
TL;DR: This paper proposes hybrid feature selection approaches based on the Genetic Algorithm, combining the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously.
Abstract: An enhanced genetic algorithm (EGA) is proposed to reduce text dimensionality. The proposed EGA outperformed the traditional genetic algorithm. The EGA is incorporated with six filter feature selection methods to create hybrid feature selection approaches. The proposed hybrid approaches outperformed the single filtering methods. This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). The approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose the EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and on feature importance. Thus, the crossover and mutation operations are driven by useful information instead of probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, ranked in decreasing order of importance, and dimension reduction is carried out; the EGA operations are applied to the most important features, i.e., those with the highest ranks. The effectiveness of the proposed approach is evaluated using naive Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of the EGA over the GA: the EGA achieved better results in terms of dimensionality reduction, time, and categorization performance. Furthermore, the six proposed hybrid FS approaches, each consisting of a filter method and the EGA, are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.
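
As a rough illustration of the "informed" operators this abstract describes, the sketch below shows one way crossover could partition features by term/document-frequency score and mutation could be guided by parent fitness and feature importance. It is a minimal sketch under assumed conventions (binary chromosomes, scores and fitness values normalized to [0, 1]), not the authors' implementation.

# Minimal sketch of informed crossover/mutation for a feature-selection GA.
# Assumptions (not from the paper's code): chromosomes are 0/1 lists over the
# vocabulary; feature_scores[i] is a normalized term/document-frequency score.
import random

def informed_crossover(parent_a, parent_b, feature_scores):
    """Build a child by keeping, from each parent, the half of its selected
    features with the higher term/document-frequency scores."""
    def top_half(parent):
        selected = [i for i, bit in enumerate(parent) if bit]
        selected.sort(key=lambda i: feature_scores[i], reverse=True)
        return set(selected[: max(1, len(selected) // 2)])
    keep = top_half(parent_a) | top_half(parent_b)
    return [1 if i in keep else 0 for i in range(len(parent_a))]

def informed_mutation(child, fitness_a, fitness_b, feature_scores, rate=0.02):
    """Mutate more aggressively when the parents scored poorly, and flip bits
    in the informative direction: drop low-importance features, add
    high-importance ones."""
    pressure = rate * (2.0 - fitness_a - fitness_b)  # weak parents -> more mutation
    for i in range(len(child)):
        if random.random() < pressure:
            if child[i] and feature_scores[i] < 0.5:       # drop unimportant feature
                child[i] = 0
            elif not child[i] and feature_scores[i] > 0.5:  # add important feature
                child[i] = 1
    return child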

182 citations

Journal ArticleDOI
TL;DR: The results show that MOPAR extracts reliable (confidence values close to 95%), comprehensible, and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
Abstract: In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases such objectives conflict. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach: confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method to the most popular evolutionary proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
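
As a rough sketch of the multi-objective machinery the abstract describes, the snippet below encodes a numerical rule as per-attribute [lower, upper] intervals and selects rules by Pareto dominance over an objective vector. The confidence measure follows its standard ARM definition; comprehensibility and interestingness have several formulations in the literature and are left abstract here. This is illustrative, not the MOPAR implementation.

# Sketch: interval-encoded numerical rules and Pareto-optimal selection.
# Rows are dicts {attribute: value}; a rule side is {attribute: (lo, up)}.

def supports(row, rule):
    """True if every attribute value falls inside the rule's interval."""
    return all(lo <= row[a] <= up for a, (lo, up) in rule.items())

def confidence(data, antecedent, consequent):
    """Standard ARM confidence: support(A and C) / support(A)."""
    both = sum(supports(r, {**antecedent, **consequent}) for r in data)
    ante = sum(supports(r, antecedent) for r in data)
    return both / ante if ante else 0.0

def dominates(obj_a, obj_b):
    """Pareto dominance: a is no worse in every objective, better in one."""
    return (all(x >= y for x, y in zip(obj_a, obj_b))
            and any(x > y for x, y in zip(obj_a, obj_b)))

def pareto_front(rules_with_objs):
    """Keep the rules whose (confidence, comprehensibility, interestingness)
    vectors are dominated by no other rule; these are the ARs extracted."""
    return [rule for rule, obj in rules_with_objs
            if not any(dominates(other, obj)
                       for _, other in rules_with_objs if other != obj)]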

69 citations

Journal ArticleDOI
TL;DR: This study analyses multiple classifiers, namely Naive Bayes, Support Vector Machine, Decision Tree, Neural Network, and Random Forest, for rainfall prediction using Malaysian data, and shows the strength of an ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
Abstract: Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate-related task in which specific features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques lead to different performances depending on the rainfall data representation, including representations of long-term (monthly) patterns and short-term (daily) patterns. Selecting an appropriate technique for a specific duration of rainfall is a challenging task. This study analyses multiple classifiers, namely Naive Bayes, Support Vector Machine, Decision Tree, Neural Network, and Random Forest, for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied to resolve missing values and eliminate noise. The experimental results show that with a small training set (10% of 1581 instances), Random Forest correctly classified 1043 instances. This illustrates the strength of an ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
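
A minimal sketch of the classifier comparison described above, using scikit-learn with a 10% training split. The file name and feature columns are hypothetical stand-ins, and the paper's exact pre-processing is not reproduced.

# Sketch: compare the five classifiers on a small (10%) training split.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("selangor_rainfall.csv")          # hypothetical file name
X = df[["humidity", "wind_speed", "temperature"]]  # assumed feature columns
y = df["rain"]                                     # assumed binary label

# 10% training data, mirroring the small-training-set experiment above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.10, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))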

59 citations


Cited by
Journal ArticleDOI

08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first encounter, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

01 Jan 2002

9,314 citations

01 Jan 1981
TL;DR: An introductory textbook on mathematical statistics and its applications, covering probability, random variables, special distributions, estimation, hypothesis testing, two-sample inferences, goodness-of-fit tests, regression, analysis of variance, randomized block designs, and nonparametric statistics, with MINITAB applications throughout.
Abstract: 1. Introduction 1.1 An Overview 1.2 Some Examples 1.3 A Brief History 1.4 A Chapter Summary 2. Probability 2.1 Introduction 2.2 Sample Spaces and the Algebra of Sets 2.3 The Probability Function 2.4 Conditional Probability 2.5 Independence 2.6 Combinatorics 2.7 Combinatorial Probability 2.8 Taking a Second Look at Statistics (Monte Carlo Techniques) 3. Random Variables 3.1 Introduction 3.2 Binomial and Hypergeometric Probabilities 3.3 Discrete Random Variables 3.4 Continuous Random Variables 3.5 Expected Values 3.6 The Variance 3.7 Joint Densities 3.8 Transforming and Combining Random Variables 3.9 Further Properties of the Mean and Variance 3.10 Order Statistics 3.11 Conditional Densities 3.12 Moment-Generating Functions 3.13 Taking a Second Look at Statistics (Interpreting Means) Appendix 3.A.1 MINITAB Applications 4. Special Distributions 4.1 Introduction 4.2 The Poisson Distribution 4.3 The Normal Distribution 4.4 The Geometric Distribution 4.5 The Negative Binomial Distribution 4.6 The Gamma Distribution 4.7 Taking a Second Look at Statistics (Monte Carlo Simulations) Appendix 4.A.1 MINITAB Applications Appendix 4.A.2 A Proof of the Central Limit Theorem 5. Estimation 5.1 Introduction 5.2 Estimating Parameters: The Method of Maximum Likelihood and the Method of Moments 5.3 Interval Estimation 5.4 Properties of Estimators 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound 5.6 Sufficient Estimators 5.7 Consistency 5.8 Bayesian Estimation 5.9 Taking a Second Look at Statistics (Beyond Classical Estimation) Appendix 5.A.1 MINITAB Applications 6. Hypothesis Testing 6.1 Introduction 6.2 The Decision Rule 6.3 Testing Binomial Data: H0: p = p0 6.4 Type I and Type II Errors 6.5 A Notion of Optimality: The Generalized Likelihood Ratio 6.6 Taking a Second Look at Statistics (Statistical Significance versus "Practical" Significance) 7. Inferences Based on the Normal Distribution 7.1 Introduction 7.2 Comparing (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n) 7.3 Deriving the Distribution of (Ȳ − μ)/(S/√n) 7.4 Drawing Inferences About μ 7.5 Drawing Inferences About σ² 7.6 Taking a Second Look at Statistics (Type II Error) Appendix 7.A.1 MINITAB Applications Appendix 7.A.2 Some Distribution Results for Ȳ and S² Appendix 7.A.3 A Proof that the One-Sample t Test is a GLRT Appendix 7.A.4 A Proof of Theorem 7.5.2 8. Types of Data: A Brief Overview 8.1 Introduction 8.2 Classifying Data 8.3 Taking a Second Look at Statistics (Samples Are Not "Valid"!) 9. Two-Sample Inferences 9.1 Introduction 9.2 Testing H0: μX = μY 9.3 Testing H0: σ²X = σ²Y (The F Test) 9.4 Binomial Data: Testing H0: pX = pY 9.5 Confidence Intervals for the Two-Sample Problem 9.6 Taking a Second Look at Statistics (Choosing Samples) Appendix 9.A.1 A Derivation of the Two-Sample t Test (A Proof of Theorem 9.2.2) Appendix 9.A.2 MINITAB Applications 10. Goodness-of-Fit Tests 10.1 Introduction 10.2 The Multinomial Distribution 10.3 Goodness-of-Fit Tests: All Parameters Known 10.4 Goodness-of-Fit Tests: Parameters Unknown 10.5 Contingency Tables 10.6 Taking a Second Look at Statistics (Outliers) Appendix 10.A.1 MINITAB Applications 11. Regression 11.1 Introduction 11.2 The Method of Least Squares 11.3 The Linear Model 11.4 Covariance and Correlation 11.5 The Bivariate Normal Distribution 11.6 Taking a Second Look at Statistics (How Not to Interpret the Sample Correlation Coefficient) Appendix 11.A.1 MINITAB Applications Appendix 11.A.2 A Proof of Theorem 11.3.3 12. The Analysis of Variance 12.1 Introduction 12.2 The F Test 12.3 Multiple Comparisons: Tukey's Method 12.4 Testing Subhypotheses with Contrasts 12.5 Data Transformations 12.6 Taking a Second Look at Statistics (Putting the Subject of Statistics Together: The Contributions of Ronald A. Fisher) Appendix 12.A.1 MINITAB Applications Appendix 12.A.2 A Proof of Theorem 12.2.2 Appendix 12.A.3 The Distribution of [SSTR/(k − 1)]/[SSE/(n − k)] When H1 Is True 13. Randomized Block Designs 13.1 Introduction 13.2 The F Test for a Randomized Block Design 13.3 The Paired t Test 13.4 Taking a Second Look at Statistics (Choosing between a Two-Sample t Test and a Paired t Test) Appendix 13.A.1 MINITAB Applications 14. Nonparametric Statistics 14.1 Introduction 14.2 The Sign Test 14.3 Wilcoxon Tests 14.4 The Kruskal-Wallis Test 14.5 The Friedman Test 14.6 Testing for Randomness 14.7 Taking a Second Look at Statistics (Comparing Parametric and Nonparametric Procedures) Appendix 14.A.1 MINITAB Applications Appendix: Statistical Tables Answers to Selected Odd-Numbered Questions Bibliography Index
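
The garbled Chapter 7 titles above reconstruct to the classic comparison of the z- and t-statistics; for reference, the standard definitions (textbook-standard formulas, not quoted from this book) are:

\[
  Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1),
  \qquad
  T = \frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1},
\]

where replacing the known standard deviation $\sigma$ with the sample estimate $S$ changes the reference distribution from the standard normal to Student's $t$ with $n - 1$ degrees of freedom.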

524 citations