Author

Azuraliza Abu Bakar

Bio: Azuraliza Abu Bakar is an academic researcher at the National University of Malaysia. The author has contributed to research on topics including feature selection and rough set theory, has an h-index of 18, and has co-authored 207 publications receiving 1538 citations. Previous affiliations of Azuraliza Abu Bakar include Information Technology University.


Papers
Journal ArticleDOI
TL;DR: This paper proposes hybrid feature selection approaches based on the Genetic Algorithm, combining the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously.
Abstract: An enhanced genetic algorithm (EGA) is proposed to reduce text dimensionality. The proposed EGA outperformed the traditional genetic algorithm. The EGA is incorporated with six filter feature selection methods to create hybrid feature selection approaches. The proposed hybrid approaches outperformed the single filtering methods. This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). The approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose the EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and on feature importance. Thus, the crossover and mutation operations are driven by useful information instead of probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, ranked in decreasing order of importance, and dimension reduction is carried out; the EGA operations are applied to the most important features, i.e., those with the highest ranks. The effectiveness of the proposed approach is evaluated using naive Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of the EGA over the GA: the EGA achieved better results in terms of dimensionality reduction, time, and categorization performance. Furthermore, the six proposed hybrid FS approaches, each consisting of a filter method and the EGA, are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.
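
As a rough illustration of the "informed" operators this abstract describes, the sketch below shows one way crossover could partition features by term/document-frequency score and mutation could be guided by parent fitness and feature importance. It is a minimal sketch under assumed conventions (binary chromosomes, scores and fitness values normalized to [0, 1]), not the authors' implementation.

# Minimal sketch of informed crossover/mutation for a feature-selection GA.
# Assumptions (not from the paper's code): chromosomes are 0/1 lists over the
# vocabulary; feature_scores[i] is a normalized term/document-frequency score.
import random

def informed_crossover(parent_a, parent_b, feature_scores):
    """Build a child by keeping, from each parent, the half of its selected
    features with the higher term/document-frequency scores."""
    def top_half(parent):
        selected = [i for i, bit in enumerate(parent) if bit]
        selected.sort(key=lambda i: feature_scores[i], reverse=True)
        return set(selected[: max(1, len(selected) // 2)])
    keep = top_half(parent_a) | top_half(parent_b)
    return [1 if i in keep else 0 for i in range(len(parent_a))]

def informed_mutation(child, fitness_a, fitness_b, feature_scores, rate=0.02):
    """Mutate more aggressively when the parents scored poorly, and flip bits
    in the informative direction: drop low-importance features, add
    high-importance ones."""
    pressure = rate * (2.0 - fitness_a - fitness_b)  # weak parents -> more mutation
    for i in range(len(child)):
        if random.random() < pressure:
            if child[i] and feature_scores[i] < 0.5:       # drop unimportant feature
                child[i] = 0
            elif not child[i] and feature_scores[i] > 0.5:  # add important feature
                child[i] = 1
    return child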

182 citations

Journal ArticleDOI
TL;DR: The results show that MOPAR extracts reliable (confidence values close to 95%), comprehensible, and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
Abstract: In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases such objectives conflict. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach: confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method to the most popular evolutionary proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
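
As a rough sketch of the multi-objective machinery the abstract describes, the snippet below encodes a numerical rule as per-attribute [lower, upper] intervals and selects rules by Pareto dominance over an objective vector. The confidence measure follows its standard ARM definition; comprehensibility and interestingness have several formulations in the literature and are left abstract here. This is illustrative, not the MOPAR implementation.

# Sketch: interval-encoded numerical rules and Pareto-optimal selection.
# Rows are dicts {attribute: value}; a rule side is {attribute: (lo, up)}.

def supports(row, rule):
    """True if every attribute value falls inside the rule's interval."""
    return all(lo <= row[a] <= up for a, (lo, up) in rule.items())

def confidence(data, antecedent, consequent):
    """Standard ARM confidence: support(A and C) / support(A)."""
    both = sum(supports(r, {**antecedent, **consequent}) for r in data)
    ante = sum(supports(r, antecedent) for r in data)
    return both / ante if ante else 0.0

def dominates(obj_a, obj_b):
    """Pareto dominance: a is no worse in every objective, better in one."""
    return (all(x >= y for x, y in zip(obj_a, obj_b))
            and any(x > y for x, y in zip(obj_a, obj_b)))

def pareto_front(rules_with_objs):
    """Keep the rules whose (confidence, comprehensibility, interestingness)
    vectors are dominated by no other rule; these are the ARs extracted."""
    return [rule for rule, obj in rules_with_objs
            if not any(dominates(other, obj)
                       for _, other in rules_with_objs if other != obj)]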

69 citations

Journal ArticleDOI
TL;DR: This study analyses multiple classifiers, namely Naive Bayes, Support Vector Machine, Decision Tree, Neural Network, and Random Forest, for rainfall prediction using Malaysian data, and shows the strength of an ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
Abstract: Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate-related task in which specific features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques lead to different performances depending on the rainfall data representation, including representations of long-term (monthly) patterns and short-term (daily) patterns. Selecting an appropriate technique for a specific duration of rainfall is a challenging task. This study analyses multiple classifiers, namely Naive Bayes, Support Vector Machine, Decision Tree, Neural Network, and Random Forest, for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied to resolve missing values and eliminate noise. The experimental results show that with a small training set (10% of 1581 instances), Random Forest correctly classified 1043 instances. This illustrates the strength of an ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
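
A minimal sketch of the classifier comparison described above, using scikit-learn with a 10% training split. The file name and feature columns are hypothetical stand-ins, and the paper's exact pre-processing is not reproduced.

# Sketch: compare the five classifiers on a small (10%) training split.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("selangor_rainfall.csv")          # hypothetical file name
X = df[["humidity", "wind_speed", "temperature"]]  # assumed feature columns
y = df["rain"]                                     # assumed binary label

# 10% training data, mirroring the small-training-set experiment above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.10, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))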

59 citations


Cited by
Journal ArticleDOI

08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first encounter, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

01 Jan 2002

9,314 citations

01 Jan 1981
TL;DR: An introductory textbook on mathematical statistics and its applications, covering probability, random variables, special distributions, estimation, hypothesis testing, two-sample inferences, goodness-of-fit tests, regression, analysis of variance, randomized block designs, and nonparametric statistics, with MINITAB applications throughout.
Abstract: 1. Introduction 1.1 An Overview 1.2 Some Examples 1.3 A Brief History 1.4 A Chapter Summary 2. Probability 2.1 Introduction 2.2 Sample Spaces and the Algebra of Sets 2.3 The Probability Function 2.4 Conditional Probability 2.5 Independence 2.6 Combinatorics 2.7 Combinatorial Probability 2.8 Taking a Second Look at Statistics (Monte Carlo Techniques) 3. Random Variables 3.1 Introduction 3.2 Binomial and Hypergeometric Probabilities 3.3 Discrete Random Variables 3.4 Continuous Random Variables 3.5 Expected Values 3.6 The Variance 3.7 Joint Densities 3.8 Transforming and Combining Random Variables 3.9 Further Properties of the Mean and Variance 3.10 Order Statistics 3.11 Conditional Densities 3.12 Moment-Generating Functions 3.13 Taking a Second Look at Statistics (Interpreting Means) Appendix 3.A.1 MINITAB Applications 4. Special Distributions 4.1 Introduction 4.2 The Poisson Distribution 4.3 The Normal Distribution 4.4 The Geometric Distribution 4.5 The Negative Binomial Distribution 4.6 The Gamma Distribution 4.7 Taking a Second Look at Statistics (Monte Carlo Simulations) Appendix 4.A.1 MINITAB Applications Appendix 4.A.2 A Proof of the Central Limit Theorem 5. Estimation 5.1 Introduction 5.2 Estimating Parameters: The Method of Maximum Likelihood and the Method of Moments 5.3 Interval Estimation 5.4 Properties of Estimators 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound 5.6 Sufficient Estimators 5.7 Consistency 5.8 Bayesian Estimation 5.9 Taking a Second Look at Statistics (Beyond Classical Estimation) Appendix 5.A.1 MINITAB Applications 6. Hypothesis Testing 6.1 Introduction 6.2 The Decision Rule 6.3 Testing Binomial Data: H0: p = p0 6.4 Type I and Type II Errors 6.5 A Notion of Optimality: The Generalized Likelihood Ratio 6.6 Taking a Second Look at Statistics (Statistical Significance versus "Practical" Significance) 7. Inferences Based on the Normal Distribution 7.1 Introduction 7.2 Comparing (Ȳ − μ)/(σ/√n) and (Ȳ − μ)/(S/√n) 7.3 Deriving the Distribution of (Ȳ − μ)/(S/√n) 7.4 Drawing Inferences About μ 7.5 Drawing Inferences About σ² 7.6 Taking a Second Look at Statistics (Type II Error) Appendix 7.A.1 MINITAB Applications Appendix 7.A.2 Some Distribution Results for Ȳ and S² Appendix 7.A.3 A Proof that the One-Sample t Test is a GLRT Appendix 7.A.4 A Proof of Theorem 7.5.2 8. Types of Data: A Brief Overview 8.1 Introduction 8.2 Classifying Data 8.3 Taking a Second Look at Statistics (Samples Are Not "Valid"!) 9. Two-Sample Inferences 9.1 Introduction 9.2 Testing H0: μX = μY 9.3 Testing H0: σ²X = σ²Y (The F Test) 9.4 Binomial Data: Testing H0: pX = pY 9.5 Confidence Intervals for the Two-Sample Problem 9.6 Taking a Second Look at Statistics (Choosing Samples) Appendix 9.A.1 A Derivation of the Two-Sample t Test (A Proof of Theorem 9.2.2) Appendix 9.A.2 MINITAB Applications 10. Goodness-of-Fit Tests 10.1 Introduction 10.2 The Multinomial Distribution 10.3 Goodness-of-Fit Tests: All Parameters Known 10.4 Goodness-of-Fit Tests: Parameters Unknown 10.5 Contingency Tables 10.6 Taking a Second Look at Statistics (Outliers) Appendix 10.A.1 MINITAB Applications 11. Regression 11.1 Introduction 11.2 The Method of Least Squares 11.3 The Linear Model 11.4 Covariance and Correlation 11.5 The Bivariate Normal Distribution 11.6 Taking a Second Look at Statistics (How Not to Interpret the Sample Correlation Coefficient) Appendix 11.A.1 MINITAB Applications Appendix 11.A.2 A Proof of Theorem 11.3.3 12. The Analysis of Variance 12.1 Introduction 12.2 The F Test 12.3 Multiple Comparisons: Tukey's Method 12.4 Testing Subhypotheses with Contrasts 12.5 Data Transformations 12.6 Taking a Second Look at Statistics (Putting the Subject of Statistics Together: The Contributions of Ronald A. Fisher) Appendix 12.A.1 MINITAB Applications Appendix 12.A.2 A Proof of Theorem 12.2.2 Appendix 12.A.3 The Distribution of [SSTR/(k − 1)]/[SSE/(n − k)] When H1 Is True 13. Randomized Block Designs 13.1 Introduction 13.2 The F Test for a Randomized Block Design 13.3 The Paired t Test 13.4 Taking a Second Look at Statistics (Choosing between a Two-Sample t Test and a Paired t Test) Appendix 13.A.1 MINITAB Applications 14. Nonparametric Statistics 14.1 Introduction 14.2 The Sign Test 14.3 Wilcoxon Tests 14.4 The Kruskal-Wallis Test 14.5 The Friedman Test 14.6 Testing for Randomness 14.7 Taking a Second Look at Statistics (Comparing Parametric and Nonparametric Procedures) Appendix 14.A.1 MINITAB Applications Appendix: Statistical Tables Answers to Selected Odd-Numbered Questions Bibliography Index
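
The garbled Chapter 7 titles above reconstruct to the classic comparison of the z- and t-statistics; for reference, the standard definitions (textbook-standard formulas, not quoted from this book) are:

\[
  Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1),
  \qquad
  T = \frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1},
\]

where replacing the known standard deviation $\sigma$ with the sample estimate $S$ changes the reference distribution from the standard normal to Student's $t$ with $n - 1$ degrees of freedom.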

524 citations