Author

Angus S. K. Ng

Bio: Angus S. K. Ng is an academic researcher from the University of Queensland. The author has contributed to research in the topics Random effects model and Mixture model, has an h-index of 6, and has co-authored 7 publications receiving 4,432 citations. Previous affiliations of Angus S. K. Ng include Griffith University and the City University of Hong Kong.

Papers
Journal ArticleDOI
TL;DR: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.
Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

4,944 citations
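
As a concrete illustration of one of the ten algorithms surveyed above, the sketch below implements a minimal k-means (Lloyd's algorithm) in plain NumPy. The synthetic data, the choice of k = 3, and the random seeds are illustrative assumptions, not material from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Illustrative synthetic data: three Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3)
print(centroids)
```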

Journal ArticleDOI
TL;DR: The long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach and is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration.
Abstract: A mixture model incorporating long-term survivors has been adopted in the field of biostatistics, where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survival data are obtained from a multi-centre clinical trial, it is conceivable that the environmental conditions and facilities shared within a clinic affect the proportion cured as well as the failure risk for the uncured individuals. This necessitates a long-term survivor mixture model with random effects. In this paper, the long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach. The proposed model is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration. Some simulation experiments are performed to assess the applicability of the model based on the average biases of the resulting estimates.

58 citations
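
For orientation, a common textbook parameterisation of the long-term survivor (cure) mixture model described in the abstract is sketched below. The specific link function, baseline hazard form, and random-effect structure shown here are standard choices and are assumptions for illustration, not necessarily the exact specification used in the paper.

```latex
% Population survival function: an uncured (susceptible) fraction \pi(\mathbf{x})
% with proper survival function S_u, plus a cured fraction 1 - \pi(\mathbf{x}).
S(t \mid \mathbf{x}) = 1 - \pi(\mathbf{x}) + \pi(\mathbf{x})\, S_u(t \mid \mathbf{x})

% GLMM-style extension for clustered (multi-centre) data: clinic-level random
% effects u_i and v_i enter the linear predictors for the cure fraction and the
% failure risk of the uncured, respectively.
\operatorname{logit}\{\pi(\mathbf{x}_{ij})\} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i,
\qquad
h_u(t \mid \mathbf{x}_{ij}) = h_0(t)\,\exp\!\big(\mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + v_i\big)
```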

Journal ArticleDOI
TL;DR: It is shown that identification of pertinent factors that influence hospital length of stay (LOS) can provide important information for health care planning and resource allocation, motivating the development of the class of finite mixture GLMMs.

52 citations

Journal ArticleDOI
TL;DR: A Gamma mixture risk-adjusted model is proposed in order to analyze heterogeneity of maternity LOS within obstetrical Diagnosis Related Groups (DRGs) and to model variations in LOS.
Abstract: With obstetrical delivery being the most frequent cause of hospital admissions, it is important to determine health- and patient-related characteristics affecting maternity length of stay (LOS). Although the average inpatient LOS has decreased steadily over the years, the issue of the appropriate LOS after delivery is complex and hotly debated, especially since the introduction of mandatory minimum-stay legislation in the USA. The purpose of this paper is to identify factors associated with maternity LOS and to model variations in LOS. A Gamma mixture risk-adjusted model is proposed in order to analyze heterogeneity of maternity LOS within obstetrical Diagnosis Related Groups (DRGs). The determination of pertinent factors would help hospital administrators and clinicians manage LOS and expenditures efficiently.

34 citations
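
As a hedged illustration of the kind of gamma mixture referred to above, the sketch below fits a two-component gamma mixture to simulated length-of-stay data by direct maximum likelihood with SciPy. The simulated data, the two-component choice, and the parameterisation are assumptions for illustration only; the paper's model is additionally risk-adjusted for covariates, which is omitted here.

```python
import numpy as np
from scipy import stats, optimize
from scipy.special import expit

# Simulated "length of stay" data: a short-stay and a long-stay subgroup (illustrative only).
rng = np.random.default_rng(0)
los = np.concatenate([
    rng.gamma(shape=4.0, scale=0.5, size=700),   # short stays
    rng.gamma(shape=6.0, scale=1.5, size=300),   # long stays
])

def neg_log_lik(params, y):
    """Negative log-likelihood of a two-component gamma mixture.

    params = (logit of mixing weight, log shapes, log scales), so the
    optimisation is unconstrained.
    """
    w = expit(params[0])
    a1, a2 = np.exp(params[1]), np.exp(params[2])
    s1, s2 = np.exp(params[3]), np.exp(params[4])
    dens = w * stats.gamma.pdf(y, a=a1, scale=s1) + (1 - w) * stats.gamma.pdf(y, a=a2, scale=s2)
    return -np.sum(np.log(dens + 1e-300))

start = np.array([0.0, np.log(3.0), np.log(5.0), np.log(0.7), np.log(1.2)])
res = optimize.minimize(neg_log_lik, start, args=(los,), method="Nelder-Mead",
                        options={"maxiter": 5000})
print("mixing weight:", round(float(expit(res.x[0])), 3))
print("shapes:", np.exp(res.x[1:3]).round(2), "scales:", np.exp(res.x[3:5]).round(2))
```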

Journal ArticleDOI
TL;DR: In this article, a zero-augmented gamma random effects model for analysing longitudinal data with many zeros is proposed. The model is applied to evaluate the effectiveness of a workplace risk assessment teams program, trialled within the cleaning services of a Western Australian public hospital.
Abstract: In many occupational safety interventions, the objective is to reduce the injury incidence as well as the mean claims cost once injury has occurred. The claims cost data within a period typically contain a large proportion of zero observations (no claim). The distribution thus comprises a point mass at 0 mixed with a non-degenerate parametric component. Essentially, the likelihood function can be factorized into two orthogonal components. These two components relate respectively to the effect of covariates on the incidence of claims and the magnitude of claims, given that claims are made. Furthermore, the longitudinal nature of the intervention inherently imposes some correlation among the observations. This paper introduces a zero-augmented gamma random effects model for analysing longitudinal data with many zeros. Adopting the generalized linear mixed model (GLMM) approach reduces the original problem to the fitting of two independent GLMMs. The method is applied to evaluate the effectiveness of a workplace risk assessment teams program, trialled within the cleaning services of a Western Australian public hospital.

17 citations
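
A minimal sketch of the two-part factorisation described above, assuming purely hypothetical simulated claims data: a logistic model for whether any claim is made, and a gamma GLM with log link for the positive claim costs. The random effects for the longitudinal structure are omitted to keep the example self-contained, and the variable names and simulation are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical simulated claims data: many exact zeros, gamma-distributed positive costs.
rng = np.random.default_rng(42)
n = 1000
intervention = rng.integers(0, 2, size=n)                   # 0 = control, 1 = risk-assessment program
p_claim = 1 / (1 + np.exp(-(-0.5 - 0.6 * intervention)))    # program lowers claim incidence
has_claim = rng.binomial(1, p_claim)
cost = np.where(has_claim == 1,
                rng.gamma(shape=2.0, scale=np.exp(6.0 - 0.3 * intervention) / 2.0, size=n),
                0.0)

X = sm.add_constant(intervention.astype(float))

# Part 1: incidence of claims (logistic regression on the zero/non-zero indicator).
incidence = sm.GLM(has_claim, X, family=sm.families.Binomial()).fit()

# Part 2: magnitude of claims, given a claim was made (gamma GLM, log link).
pos = has_claim == 1
severity = sm.GLM(cost[pos], X[pos],
                  family=sm.families.Gamma(link=sm.families.links.Log())).fit()

print(incidence.params)   # effect of the program on claim incidence (log-odds scale)
print(severity.params)    # effect of the program on mean claim cost (log scale)
```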


Cited by
Journal ArticleDOI
TL;DR: The relationship between transfer learning and other related machine learning techniques, such as domain adaptation, multitask learning, sample selection bias, and covariate shift, is discussed.
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much of the expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

18,616 citations
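
The survey above groups covariate shift and sample selection bias with transfer learning. As a small, hedged illustration of that connection, the sketch below corrects covariate shift by estimating density-ratio importance weights with a logistic "domain classifier" and passing them as sample weights to the target learner. The synthetic source/target split and all parameter choices are assumptions for illustration, not a method from the survey itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Source and target inputs drawn from shifted distributions (covariate shift):
# the labelling function is the same, only p(x) differs.
X_src = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(500, 2))
true_w = np.array([1.5, -1.0])
y_src = (X_src @ true_w + rng.normal(scale=0.3, size=500) > 0).astype(int)

# Step 1: domain classifier distinguishes source (0) from target (1) inputs.
X_dom = np.vstack([X_src, X_tgt])
d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
dom_clf = LogisticRegression().fit(X_dom, d)

# Step 2: importance weights w(x) ~ p_tgt(x) / p_src(x) from the domain probabilities.
p_tgt = dom_clf.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)

# Step 3: fit the task model on labelled source data, reweighted towards the target domain.
clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
```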

Journal ArticleDOI
TL;DR: This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A, and then introduces and characterizes the six articles that comprise this special issue in terms of the proposed BI&A research framework.
Abstract: Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

4,610 citations

Journal ArticleDOI
TL;DR: The background and state-of-the-art of big data are reviewed, including related technologies and representative applications such as enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide a comprehensive overview and big picture of this exciting area for readers. This survey is concluded with a discussion of open problems and future directions.

2,303 citations

Journal ArticleDOI
01 Jul 2012
TL;DR: A taxonomy for ensemble-based methods to address class imbalance is proposed, in which each proposal can be categorized according to the inner ensemble methodology on which it is based, and a thorough empirical comparison of the most significant published approaches is developed to show whether any of them makes a difference.
Abstract: Classifier learning with data-sets that suffer from imbalanced class distributions is a challenging problem in the data mining community. This issue occurs when the number of examples that represent one class is much lower than that of the other classes. Its presence in many real-world applications has attracted growing attention from researchers. In machine learning, ensembles of classifiers are known to increase the accuracy of single classifiers by combining several of them, but neither of these learning techniques alone solves the class imbalance problem; to deal with this issue, ensemble learning algorithms have to be designed specifically. In this paper, our aim is to review the state of the art on ensemble techniques in the framework of imbalanced data-sets, with a focus on two-class problems. We propose a taxonomy for ensemble-based methods to address the class imbalance, in which each proposal can be categorized depending on the inner ensemble methodology on which it is based. In addition, we develop a thorough empirical comparison, considering the most significant published approaches within the families of the proposed taxonomy, to show whether any of them makes a difference. This comparison has shown the good behavior of the simplest approaches, which combine random undersampling techniques with bagging or boosting ensembles. In addition, the positive synergy between sampling techniques and bagging has stood out. Furthermore, our results show empirically that ensemble-based algorithms are worthwhile, since they outperform the mere use of preprocessing techniques before learning the classifier, thereby justifying the increase in complexity by means of a significant enhancement of the results.

2,228 citations
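
To make the "random undersampling plus bagging" finding above concrete, here is a minimal under-bagging sketch built only on scikit-learn and NumPy: each base tree is trained on a balanced subsample obtained by undersampling the majority class, and the trees' probabilities are averaged. The synthetic imbalanced data and the number of estimators are illustrative assumptions, not settings from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_under_bagging(X, y, n_estimators=25, seed=0):
    """Train an ensemble of trees, each on a balanced random undersample of the data."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    trees = []
    for _ in range(n_estimators):
        # Balanced subsample: keep all minority examples, undersample the majority to the same size.
        maj_sample = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sample])
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_proba_under_bagging(trees, X):
    """Average the positive-class probabilities of the base trees."""
    return np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)

# Illustrative imbalanced two-class problem (about 5% positives).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0)
trees = fit_under_bagging(X, y)
scores = predict_proba_under_bagging(trees, X)
```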

Journal ArticleDOI
01 Nov 2009
TL;DR: A data mining approach to predict human wine taste preferences, based on analytical tests that are easily available at the certification step, is proposed; the resulting model is useful to support the oenologist's wine-tasting evaluations and to improve wine production.
Abstract: We propose a data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step. A large dataset (when compared to other studies in this domain) is considered, with white and red vinho verde samples (from Portugal). Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful to support the oenologist's wine-tasting evaluations and to improve wine production. Furthermore, similar techniques can help in target marketing by modeling consumer tastes from niche markets.

1,121 citations
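
A hedged sketch of the modelling step described above, using scikit-learn's support vector regression on the publicly available UCI wine quality data. The download URL, train/test split, and hyperparameters are illustrative assumptions, and the paper's simultaneous variable and model selection procedure is not reproduced here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# UCI wine quality data (red vinho verde); semicolon-separated CSV with a 'quality' column.
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
df = pd.read_csv(URL, sep=";")

X = df.drop(columns="quality")   # eleven physicochemical test results
y = df["quality"]                # sensory score assigned by tasters

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardise inputs, then fit an RBF support vector regressor (illustrative hyperparameters).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.5))
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MAE:", round(mean_absolute_error(y_test, pred), 3))
```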