Author

Simon Fong

Bio: Simon Fong is an academic researcher from the University of Macau. He has contributed to research on topics including data stream mining and metaheuristics. He has an h-index of 37 and has co-authored 523 publications receiving 6,161 citations. Previous affiliations of Simon Fong include Nanyang Technological University and La Trobe University.


Papers
Proceedings ArticleDOI
15 May 2014
TL;DR: The different variations of the DBSCAN algorithm proposed so far are surveyed and critically evaluated, and their limitations are listed.
Abstract: Data mining is all about data analysis techniques. It is useful for extracting hidden and interesting patterns from large datasets. Clustering techniques are important when it comes to extracting knowledge from large amounts of spatial data collected from various applications, including GIS, satellite imagery, X-ray crystallography, remote sensing, and environmental assessment and planning. To extract useful patterns from these complex data sources, several popular spatial data clustering techniques have been proposed. DBSCAN (Density Based Spatial Clustering of Applications with Noise) is a pioneering density-based algorithm. It can discover clusters of arbitrary shape and size in databases that contain noise and outliers. DBSCAN, however, is known to have a number of problems: (a) it requires the user to specify parameter values for executing the algorithm; (b) it struggles to identify meaningful clusters in datasets with varying densities; and (c) it incurs considerable computational complexity. Many researchers have attempted to enhance the basic DBSCAN algorithm to overcome these drawbacks, producing variants such as VDBSCAN, FDBSCAN, DD_DBSCAN, and IDBSCAN. In this study, we survey the different variations of the DBSCAN algorithm that have been proposed so far. These variations are critically evaluated and their limitations are listed.
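As a point of reference only (not taken from the paper), the sketch below runs classical DBSCAN on a small synthetic dataset with scikit-learn; the eps and min_samples arguments are precisely the user-supplied parameters that the surveyed variants try to automate or relax. The dataset and parameter values are illustrative assumptions.

# Illustrative sketch: classical DBSCAN with user-chosen eps and min_samples.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters plus noise: a case where DBSCAN's ability to
# find arbitrarily shaped clusters matters.
X, _ = make_moons(n_samples=500, noise=0.08, random_state=0)

# eps (neighbourhood radius) and min_samples (density threshold) must be set
# by the user; poor choices merge clusters or flag most points as noise.
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters found: {n_clusters}, points flagged as noise: {n_noise}")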

244 citations

Journal ArticleDOI
TL;DR: The paper explores how AI assists cancer diagnosis and prognosis, specifically with regard to its unprecedented accuracy, which exceeds even that of general statistical applications in oncology.

235 citations

Book ChapterDOI
01 Jan 2014
TL;DR: Oversampling and undersampling are found to work well in improving classification of the imbalanced dataset with decision trees, while boosting and bagging did not improve decision tree performance.
Abstract: Most classifiers work well when the class distribution in the response variable of the dataset is well balanced. Problems arise when the dataset is imbalanced. This paper applies four methods for handling imbalanced datasets: oversampling, undersampling, bagging, and boosting. The cardiac surgery dataset has a binary response variable (1 = Died, 0 = Alive). The sample size is 4,976 cases, with 4.2% (Died) and 95.8% (Alive). CART, C5, and CHAID were chosen as the classifiers. In classification problems, the accuracy rate of the predictive model is not an appropriate measure when the data are imbalanced, because it is biased towards the majority class. Thus, the performance of the classifiers is measured using sensitivity and precision. Oversampling and undersampling are found to work well in improving classification of the imbalanced dataset with decision trees, while boosting and bagging did not improve decision tree performance.
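For illustration only (this is not the paper's pipeline, which used CART, C5, and CHAID on the cardiac surgery data), the sketch below applies simple random oversampling of the minority class before training a CART-style decision tree and reports sensitivity (recall) and precision instead of accuracy. The synthetic dataset and its roughly 4.2%/95.8% class split are assumptions chosen to mimic the setting described above.

# Illustrative sketch: random oversampling of the minority class + decision tree,
# evaluated with sensitivity and precision rather than raw accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample
from sklearn.metrics import recall_score, precision_score

X, y = make_classification(n_samples=5000, weights=[0.958, 0.042], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Random oversampling: replicate minority-class rows until the classes balance.
minority = np.where(y_tr == 1)[0]
majority = np.where(y_tr == 0)[0]
extra = resample(minority, replace=True,
                 n_samples=len(majority) - len(minority), random_state=1)
idx = np.concatenate([majority, minority, extra])

clf = DecisionTreeClassifier(random_state=1).fit(X_tr[idx], y_tr[idx])
pred = clf.predict(X_te)
print("sensitivity:", recall_score(y_te, pred))
print("precision:  ", precision_score(y_te, pred))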

215 citations

Journal ArticleDOI
TL;DR: The paper concludes that such mass-market health monitoring systems will only be prevalent when implemented together with home environmental monitoring and control systems.
Abstract: Wireless technology has developed rapidly due to its convenience and cost effectiveness compared to wired applications, particularly considering the advantages offered by Wireless Sensor Network (WSN) based applications. Such applications exist in several domains, including healthcare, medical, industrial, and home automation. In the present study, a home-based wireless ECG monitoring system using Zigbee technology is considered. Such systems can be useful for monitoring people in their own homes as well as for periodic monitoring by physicians for appropriate healthcare, allowing people to live in their own home for longer. Health monitoring systems can continuously monitor many physiological signals and offer further analysis and interpretation. The characteristics and drawbacks of these systems may affect the wearer's mobility while vital signs are being monitored. Real-time monitoring systems record, measure, and monitor the heart's electrical activity while maintaining the wearer's comfort. Zigbee devices can offer a low-power, small-size, and low-cost solution for monitoring the ECG signal in the home, but such systems are often designed in isolation, with no consideration of existing home control networks and smart home solutions. The present study offers a state-of-the-art review and then introduces the main concepts and contents of wireless ECG monitoring systems. In addition, models of the ECG signal and power consumption formulas are highlighted. Challenges and future perspectives are also reported. The paper concludes that such mass-market health monitoring systems will only become prevalent when implemented together with home environmental monitoring and control systems.

209 citations

Book ChapterDOI
11 Jul 2011
TL;DR: In this article, a combination of a recently developed Accelerated PSO and a nonlinear support vector machine (SVM) is used to solve business optimization problems; the proposed APSO-SVM is applied to production optimization and then used for income prediction and project scheduling.
Abstract: Business optimization is becoming increasingly important because all business activities aim to maximize the profit and performance of products and services under limited resources and appropriate constraints. Recent developments in support vector machines and metaheuristics show many advantages of these techniques. In particular, particle swarm optimization is now widely used in solving tough optimization problems. In this paper, we use a combination of a recently developed Accelerated PSO (APSO) and a nonlinear support vector machine to form a framework for solving business optimization problems. We first apply the proposed APSO-SVM to production optimization, and then use it for income prediction and project scheduling. We also carry out some parametric studies and discuss the advantages of the proposed metaheuristic SVM.
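A simplified sketch, assuming a standard accelerated PSO update (particles pulled toward the global best plus a decaying random perturbation, with no velocity term) used to tune the C and gamma hyperparameters of an RBF-kernel SVM by cross-validation. It does not reproduce the paper's APSO-SVM framework, objective functions, or business datasets; the search ranges, swarm size, and dataset below are illustrative assumptions.

# Illustrative sketch: accelerated-PSO-style search over SVM hyperparameters.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # placeholder dataset, not from the paper
rng = np.random.default_rng(0)

def fitness(p):
    # p = (log10 C, log10 gamma); maximise 3-fold cross-validated accuracy.
    C, gamma = 10.0 ** p
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

low, high = np.array([-2.0, -5.0]), np.array([3.0, 1.0])   # assumed search box
swarm = rng.uniform(low, high, size=(12, 2))
scores = np.array([fitness(p) for p in swarm])
best_score = scores.max()
g_best = swarm[scores.argmax()].copy()

beta, alpha0 = 0.5, 0.5
for t in range(15):
    alpha = alpha0 * 0.9 ** t   # decaying randomness, as in accelerated PSO
    swarm = ((1 - beta) * swarm + beta * g_best
             + alpha * rng.normal(size=swarm.shape) * (high - low))
    swarm = np.clip(swarm, low, high)
    scores = np.array([fitness(p) for p in swarm])
    if scores.max() > best_score:
        best_score = scores.max()
        g_best = swarm[scores.argmax()].copy()

print("best log10(C), log10(gamma):", g_best, "CV accuracy:", best_score)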

209 citations


Cited by
Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: The author reflects on i, the square root of minus one, which at first seemed an odd beast: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented in this book, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

01 Jan 2002

9,314 citations

Book ChapterDOI
01 Jan 2010

5,842 citations