Author

Hussein Almuallim

Other affiliations: Oregon State University
Bio: Hussein Almuallim is an academic researcher from King Fahd University of Petroleum and Minerals. The author has contributed to research on topics including decision trees and incremental decision trees. The author has an h-index of 10 and has co-authored 26 publications receiving 1,743 citations. Previous affiliations of Hussein Almuallim include Oregon State University.

Papers
Proceedings Article
14 Jul 1991
TL;DR: It is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features, and it is suggested that training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
Abstract: In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
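The paper gives no code, but the MIN-FEATURES bias itself is easy to sketch: examine feature subsets in order of increasing size and return the first one over which the training sample is still consistent (no two examples agree on the selected features yet carry different labels). Below is a minimal Python sketch under that reading; the published FOCUS algorithm prunes this search far more cleverly, so treat the brute-force version as an illustration of the bias only, with all names chosen here for illustration.

```python
from itertools import combinations

def is_consistent(examples, labels, subset):
    """True if no two examples agree on all features in `subset`
    but carry different labels."""
    seen = {}
    for x, y in zip(examples, labels):
        key = tuple(x[i] for i in subset)
        if key in seen and seen[key] != y:
            return False
        seen[key] = y
    return True

def min_features(examples, labels):
    """Exhaustive MIN-FEATURES search: smallest consistent subset first."""
    n = len(examples[0])
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if is_consistent(examples, labels, subset):
                return list(subset)
    return list(range(n))  # fall back to all features

# Toy usage: the label is the XOR of features 0 and 2; feature 1 is irrelevant.
X = [(0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
y = [0, 1, 1, 0, 1, 0]
print(min_features(X, y))  # [0, 2]
```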

716 citations

Journal ArticleDOI
TL;DR: Five algorithms that identify a subset of features sufficient to construct a hypothesis consistent with the training examples are presented, and it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((ln(1/δ) + [2^p + p ln n])/ε) training examples to guarantee PAC-learning a concept having p relevant features out of n available features.

537 citations

Journal ArticleDOI
TL;DR: In this correspondence, a structural recognition method for cursively handwritten Arabic words is proposed, in which words are first segmented into strokes that are then classified using their geometrical and topological properties.
Abstract: In spite of the progress of machine recognition techniques for Latin, Kana, and Chinese characters over the past two decades, the machine recognition of Arabic characters has remained almost untouched. In this correspondence, a structural recognition method for cursively handwritten Arabic words is proposed. In this method, words are first segmented into strokes. Those strokes are then classified using their geometrical and topological properties. Finally, the relative positions of the classified strokes are examined, and the strokes are combined in several steps into a string of characters that represents the recognized word. Experimental results on texts handwritten by two persons showed high recognition accuracy.
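The abstract describes a three-stage pipeline (segment into strokes, classify strokes by geometric and topological properties, then combine classified strokes into characters using their relative positions). The Python sketch below only mirrors that flow; the gap-based segmentation, the stroke features, and the combination rule are stand-ins, not the authors' actual procedure.

```python
from dataclasses import dataclass

@dataclass
class Stroke:
    points: list            # (x, y) coordinates along the stroke
    has_loop: bool = False  # topological property (illustrative)

def segment_into_strokes(word_points):
    """Stage 1: split a word's point sequence into strokes.
    Here we naively split at large horizontal gaps (illustrative threshold)."""
    strokes, current, prev = [], [], None
    for p in word_points:
        if prev is not None and abs(p[0] - prev[0]) > 5 and current:
            strokes.append(Stroke(points=current))
            current = []
        current.append(p)
        prev = p
    if current:
        strokes.append(Stroke(points=current))
    return strokes

def classify_stroke(stroke):
    """Stage 2: assign a coarse class from geometric/topological properties."""
    xs = [x for x, _ in stroke.points]
    if stroke.has_loop:
        return "loop"
    return "long" if max(xs) - min(xs) > 10 else "short"

def combine_strokes(classified):
    """Stage 3: merge classified strokes into a character string.
    Real rules depend on relative stroke positions; this is a placeholder."""
    return "-".join(label for label, _ in classified)

# Toy usage on synthetic points
word = [(0, 0), (2, 1), (4, 0), (20, 0), (22, 2)]
strokes = segment_into_strokes(word)
classified = [(classify_stroke(s), s) for s in strokes]
print(combine_strokes(classified))
```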

209 citations

01 Jan 1992
TL;DR: Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to process the training data by eliminating the irrelevant features from ID3's consideration.
Abstract: This paper describes different methods for exact and approximate implementation of the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This bias is useful for learning domains where many irrelevant features are present in the training data. We first introduce FOCUS-2, a new algorithm that exactly implements the MIN-FEATURES bias. This algorithm is empirically shown to be substantially faster than the FOCUS algorithm previously given in [Almuallim and Dietterich 91]. We then introduce the Mutual-Information-Greedy, Simple-Greedy and Weighted-Greedy algorithms, which apply efficient heuristics for approximating the MIN-FEATURES bias. These algorithms employ greedy heuristics that trade optimality for computational efficiency. Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to process the training data by eliminating the irrelevant features from ID3's consideration. In particular, the Weighted-Greedy algorithm provides an excellent and efficient approximation of the MIN-FEATURES bias.
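As a rough illustration of the greedy approximations described above, here is a minimal Simple-Greedy-style sketch in Python: repeatedly add the feature that separates the most currently conflicting example pairs until the selected set is consistent. This is a hedged reconstruction of the general idea, not the paper's code, and it omits the Mutual-Information-Greedy and Weighted-Greedy variants.

```python
from itertools import combinations

def conflicting_pairs(examples, labels, subset):
    """Pairs of examples that agree on every selected feature but differ in label."""
    return [(i, j) for i, j in combinations(range(len(examples)), 2)
            if labels[i] != labels[j]
            and all(examples[i][f] == examples[j][f] for f in subset)]

def simple_greedy(examples, labels):
    """Greedily add the feature that resolves the most remaining conflicts."""
    n = len(examples[0])
    selected = []
    conflicts = conflicting_pairs(examples, labels, selected)
    while conflicts:
        remaining = [f for f in range(n) if f not in selected]
        # pick the feature that separates the most currently conflicting pairs
        best = max(remaining,
                   key=lambda f: sum(examples[i][f] != examples[j][f]
                                     for i, j in conflicts))
        selected.append(best)
        conflicts = conflicting_pairs(examples, labels, selected)
    return selected

# Same toy data as above: label is XOR of features 0 and 2, feature 1 irrelevant.
X = [(0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
y = [0, 1, 1, 0, 1, 0]
print(simple_greedy(X, y))  # [0, 2]
```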

129 citations

Journal ArticleDOI
TL;DR: A new algorithm called OPT-2 for optimal pruning of decision trees is introduced. Based on dynamic programming, it improves on the recently published OPT algorithm of Bohanec and Bratko, especially in the case of heavy pruning.
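No abstract is indexed here, but the dynamic-programming idea behind optimal pruning in general (the family to which OPT and OPT-2 belong) can be sketched: for each subtree and each leaf budget, compute the smallest training error achievable, merging children's tables knapsack-style. The Python sketch below is a rough, hypothetical illustration of that recurrence, not the OPT-2 algorithm itself.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    error_as_leaf: int              # training errors if this node is collapsed to a leaf
    children: list = field(default_factory=list)

def optimal_pruning_table(node):
    """Return a dict mapping leaf budget -> minimum training error for this subtree.
    Budget 1 always means 'prune here'; larger budgets keep the children."""
    table = {1: node.error_as_leaf}
    if not node.children:
        return table
    # knapsack-style merge: merged[s] = min error using s leaves across all children
    merged = {0: 0}
    for child in node.children:
        child_table = optimal_pruning_table(child)
        new = {}
        for s1, e1 in merged.items():
            for s2, e2 in child_table.items():
                s, e = s1 + s2, e1 + e2
                if s not in new or e < new[s]:
                    new[s] = e
        merged = new
    for s, e in merged.items():
        if s >= 1 and (s not in table or e < table[s]):
            table[s] = e
    return table

# Toy tree: pruning the root costs 10 errors; keeping both leaves costs 2 + 3 = 5.
tree = Node(error_as_leaf=10, children=[Node(2), Node(3)])
print(optimal_pruning_table(tree))  # {1: 10, 2: 5}
```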

56 citations


Cited by
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on data transformations, ensemble learning, massive data sets, and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both the tried-and-true techniques of today and methods at the leading edge of contemporary research. * Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects. * Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods. * Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.

20,196 citations

Journal ArticleDOI
TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and domain; the wrapper approach is compared to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
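The wrapper idea is simple enough to sketch: score each candidate feature subset by cross-validating the target learner on just those features, and search the subset space for the best score. The Python sketch below uses greedy forward selection with scikit-learn; it is a generic illustration of the wrapper evaluation, not Kohavi and John's exact search procedure, and the data and parameters are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def wrapper_forward_selection(X, y, make_learner, cv=5):
    """Greedy forward selection where each subset is scored by cross-validating
    the target learner on only those features (the 'wrapper' evaluation)."""
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            if f in selected:
                continue
            candidate = selected + [f]
            score = cross_val_score(make_learner(), X[:, candidate], y, cv=cv).mean()
            if score > best_score:
                best_score, best_candidate = score, candidate
                improved = True
        if improved:
            selected = best_candidate
    return selected, best_score

# Toy usage: two informative binary features plus four noise columns
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype(float)
y = ((X[:, 0] + X[:, 1]) >= 1).astype(int)
subset, score = wrapper_forward_selection(X, y, DecisionTreeClassifier)
print(subset, round(score, 3))
```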

8,610 citations

Book
30 Jun 2002
TL;DR: This book presents a meta-anatomy of the Multi-Criteria Decision Making process, which aims to provide a scaffolding for the future development of multi-criteria decision-making systems.
Abstract: List of Figures. List of Tables. Preface. Foreword. 1. Basic Concepts. 2. Evolutionary Algorithm MOP Approaches. 3. MOEA Test Suites. 4. MOEA Testing and Analysis. 5. MOEA Theory and Issues. 6. Applications. 7. MOEA Parallelization. 8. Multi-Criteria Decision Making. 9. Special Topics. 10. Epilog. Appendix A: MOEA Classification and Technique Analysis. Appendix B: MOPs in the Literature. Appendix C: Ptrue & PFtrue for Selected Numeric MOPs. Appendix D: Ptrue & PFtrue for Side-Constrained MOPs. Appendix E: MOEA Software Availability. Appendix F: MOEA-Related Information. Index. References.

5,994 citations

01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation-based approach with CFS (Correlation-based Feature Selection), an algorithm that couples a feature evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
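The "feature evaluation formula based on ideas from test theory" is commonly written as Merit_S = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where k is the subset size, r̄_cf the mean feature-class correlation, and r̄_ff the mean feature-feature correlation. The Python sketch below pairs that heuristic with a greedy forward search; it uses plain Pearson correlation where the thesis uses symmetrical uncertainty, so it is an illustrative approximation rather than CFS as published.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit_S = k * mean|corr(f, y)| / sqrt(k + k*(k-1) * mean|corr(f, f')|).
    Pearson correlation stands in for the thesis's symmetrical uncertainty."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_search(X, y):
    """Greedy forward search: add the feature that most increases the merit."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        merit, f = max((cfs_merit(X, y, selected + [f]), f) for f in candidates)
        if merit <= best:
            return selected, best
        selected, best = selected + [f], merit

# Toy usage: feature 2 is a redundant copy of feature 0; feature 3 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 2] = X[:, 0]
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=300)
print(cfs_forward_search(X, y))  # picks one of the correlated pair plus feature 1
```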

3,533 citations