Jimmy Xiangji Huang
Bio: Jimmy Xiangji Huang is an academic researcher from York University. The author has contributed to research in topics: Ranking (information retrieval) & Computer science. The author has an hindex of 23, co-authored 150 publications receiving 2199 citations. Previous affiliations of Jimmy Xiangji Huang include Keele University & Central China Normal University.
TL;DR: Experiments show that CSVAC (Combining Support Vectors with Ant Colony) outperforms SVM alone or CSOACN alone in terms of both classification rate and run-time efficiency.
Abstract: In this paper, we introduce a new machine-learning-based data classification algorithm that is applied to network intrusion detection. The basic task is to classify network activities (in the network log as connection records) as normal or abnormal while minimizing misclassification. Although different classification models have been developed for network intrusion detection, each of them has its strengths and weaknesses, including the most commonly applied Support Vector Machine (SVM) method and the Clustering based on Self-Organized Ant Colony Network (CSOACN). Our new approach combines the SVM method with CSOACNs to take the advantages of both while avoiding their weaknesses. Our algorithm is implemented and evaluated using a standard benchmark KDD99 data set. Experiments show that CSVAC (Combining Support Vectors with Ant Colony) outperforms SVM alone or CSOACN alone in terms of both classification rate and run-time efficiency.
01 Dec 2016
TL;DR: This study proposes a new and robust machine learning model based on a convolutional neural network (CNN) to automatically classify single cells in thin blood smears on standard microscope slides as either infected or uninfected.
Abstract: Malaria is a major global health threat. The standard way of diagnosing malaria is by visually examining blood smears for parasite-infected red blood cells under the microscope by qualified technicians. This method is inefficient and the diagnosis depends on the experience and the knowledge of the person doing the examination. Automatic image recognition technologies based on machine learning have been applied to malaria blood smears for diagnosis before. However, the practical performance has not been sufficient so far. This study proposes a new and robust machine learning model based on a convolutional neural network (CNN) to automatically classify single cells in thin blood smears on standard microscope slides as either infected or uninfected. In a ten-fold cross-validation based on 27,578 single cell images, the average accuracy of our new 16-layer CNN model is 97.37%. A transfer learning model only achieves 91.99% on the same images. The CNN model shows superiority over the transfer learning model in all performance indicators such as sensitivity (96.99% vs 89.00%), specificity (97.75% vs 94.98%), precision (97.73% vs 95.12%), F1 score (97.36% vs 90.24%), and Matthews correlation coefficient (94.75% vs 85.25%).
••01 Nov 2014
TL;DR: The experimental results indicate that the proposed deep model is able to reveal previously unknown concepts and performs much better than the conventional shallow models.
Abstract: Computer aid technology is widely applied in decision-making and outcome assessment of healthcare delivery, in which modeling knowledge and expert experience is technically important. However, the conventional rule-based models are incapable of capturing the underlying knowledge because they are incapable of simulating the complexity of human brains and highly rely on feature representation of problem domains. Thus we attempt to apply a deep model to overcome this weakness. The deep model can simulate the thinking procedure of human and combine feature representation and learning in a unified model. A modified version of convolutional deep belief networks is used as an effective training method for large-scale data sets. Then it is tested by two instances: a dataset on hypertension retrieved from a HIS system, and a dataset on Chinese medical diagnosis and treatment prescription from a manual converted electronic medical record (EMR) database. The experimental results indicate that the proposed deep model is able to reveal previously unknown concepts and performs much better than the conventional shallow models.
TL;DR: It is shown that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed, and an integrated sampling technique is proposed that outperforms individual SVMs as well as several other state-of-the-art classifiers.
Abstract: Learning from imbalanced datasets is difficult. The insufficient information that is associated with the minority class impedes making a clear understanding of the inherent structure of the dataset. Most existing classification methods tend not to perform well on minority class examples when the dataset is extremely imbalanced, because they aim to optimize the overall accuracy without considering the relative distribution of each class. In this paper, we study the performance of SVMs, which have gained great success in many real applications, in the imbalanced data context. Through empirical analysis, we show that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed. We propose to combine an integrated sampling technique, which incorporates both over-sampling and under-sampling, with an ensemble of SVMs to improve the prediction performance. Extensive experiments show that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.
TL;DR: A new Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN) model is proposed for aspect-level sentiment classification, which leverages the syntactic dependency tree and commonsense knowledge via GCN to enhance the representation of the sentence toward the given aspect.
Abstract: Aspect-level sentiment classification is a fundamental subtask of fine-grained sentiment analysis. The syntactic information and commonsense knowledge are important and useful for aspect-level sentiment classification, while only a limited number of studies have explored to incorporate them via flexible graph convolutional neural networks (GCN) for this task. In this paper, we propose a new Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN) model for aspect-level sentiment classification, which leverages the syntactic dependency tree and commonsense knowledge via GCN. In particular, to enhance the representation of the sentence toward the given aspect, we develop two strategies to model the syntactic dependency tree and commonsense knowledge graph, namely SK-GCN 1 and SK-GCN 2 respectively. SK-GCN 1 models the dependency tree and knowledge graph via Syntax-based GCN (S-GCN) and Knowledge-based GCN (K-GCN) independently, and SK-GCN 2 models them jointly. We also apply pre-trained BERT to this task and obtain new state-of-the-art results. Extensive experiments on five benchmark datasets demonstrate that our approach can effectively improve the performance of aspect-level sentiment classification compared with the state-of-the-art methods.
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of highto low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated muddominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debrisflow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These
TL;DR: It is suggested that deep learning approaches could be the vehicle for translating big biomedical data into improved human health and develop holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.
Abstract: Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. There are lots of challenges on both steps in a scenario of complicated data and lacking of sufficient domain knowledge. The latest advances in deep learning technologies provide new effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and needs for improved methods development and applications, especially in terms of ease-of-understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.
••01 Jan 2017
TL;DR: A comprehensive up-to-date review of research employing deep learning in health informatics is presented, providing a critical analysis of the relative merit, and potential pitfalls of the technique as well as its future outlook.
Abstract: With a massive influx of multimodality data, the role of data analytics in health informatics has grown rapidly in the last decade. This has also prompted increasing interests in the generation of analytical, data driven models based on machine learning in health informatics. Deep learning, a technique with its foundation in artificial neural networks, is emerging in recent years as a powerful tool for machine learning, promising to reshape the future of artificial intelligence. Rapid improvements in computational power, fast data storage, and parallelization have also contributed to the rapid uptake of the technology in addition to its predictive power and ability to generate automatically optimized high-level features and semantic interpretation from the input data. This article presents a comprehensive up-to-date review of research employing deep learning in health informatics, providing a critical analysis of the relative merit, and potential pitfalls of the technique as well as its future outlook. The paper mainly focuses on key applications of deep learning in the fields of translational bioinformatics, medical imaging, pervasive sensing, medical informatics, and public health.