Author
Jianguo Chen
Other affiliations: University of Illinois at Chicago, University of Toronto, Agency for Science, Technology and Research
Bio: Jianguo Chen is an academic researcher from Hunan University. The author has contributed to research in topics: Cluster analysis & Computer science. The author has an h-index of 10, co-authored 24 publications receiving 625 citations. Previous affiliations of Jianguo Chen include University of Illinois at Chicago & University of Toronto.
Papers
TL;DR: In this paper, a Parallel Random Forest (PRF) algorithm for big data on the Apache Spark platform is presented. The PRF algorithm is optimized based on a hybrid approach combining data-parallel and task-parallel optimization: a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process.
Abstract: With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random Forest (PRF) algorithm for big data on the Apache Spark platform. The PRF algorithm is optimized based on a hybrid approach combining data-parallel and task-parallel optimization. From the perspective of data-parallel optimization, a vertical data-partitioning method is performed to reduce the data communication cost effectively, and a data-multiplexing method is performed to allow the training dataset to be reused and diminish the volume of data. From the perspective of task-parallel optimization, a dual parallel approach is carried out in the training process of RF, and a task Directed Acyclic Graph (DAG) is created according to the parallel training process of PRF and the dependence of the Resilient Distributed Datasets (RDD) objects. Then, different task schedulers are invoked for the tasks in the DAG. Moreover, to improve the algorithm's accuracy for large, high-dimensional, and noisy data, we perform a dimension-reduction approach in the training process and a weighted voting approach in the prediction process prior to parallelization. Extensive experimental results indicate the superiority and notable advantages of the PRF algorithm over the relevant algorithms implemented by Spark MLlib and other studies in terms of the classification accuracy, performance, and scalability. With the expansion of the scale of the random forest model and the Spark cluster, the advantage of the PRF algorithm is more obvious.
308 citations
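The PRF abstract above mentions a weighted voting approach in the prediction process. As a rough illustration only, the sketch below shows generic weighted majority voting over per-tree predictions. The function name `weighted_vote` and the use of a per-tree accuracy estimate as the weight are assumptions made for this example; the abstract does not specify how the weights are derived.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees carry the largest total weight.

    predictions: one predicted label per tree.
    weights: one non-negative weight per tree (assumed here to be some
             per-tree accuracy estimate; this choice is hypothetical).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Select the label with the highest accumulated weight.
    return max(totals, key=totals.get)


# One accurate tree outvotes two weaker trees that agree with each other.
winner = weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4])
```

With equal weights this reduces to ordinary majority voting, which is why weighting only matters when the trees differ in reliability.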
TL;DR: Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
Abstract: In this paper, we propose a Distributed Intelligent Video Surveillance (DIVS) system using Deep Learning (DL) algorithms and deploy it in an edge computing environment. We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The DIVS system can migrate computing workloads from the network center to network edges to reduce huge network communication overhead and provide low-latency and accurate video analysis solutions. We implement the proposed DIVS system and address the problems of parallel training, model synchronization, and workload balancing. Task-level parallel and model-level parallel training methods are proposed to further accelerate the video analysis process. In addition, we propose a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Moreover, a dynamic data migration approach is proposed to address the imbalance of workload and computational power of edge nodes. Experimental results showed that the EC architecture can provide elastic and scalable computing power, and the proposed DIVS system can efficiently handle video surveillance and analysis tasks.
164 citations
TL;DR: To effectively identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering and the proposed Disease Diagnosis and Treatment Recommendation System (DDTRS) derives disease treatment recommendations intelligently and accurately.
Abstract: It is crucial to provide compatible treatment schemes for a disease according to various symptoms at different stages. However, most classification methods might be ineffective in accurately classifying a disease that holds the characteristics of multiple treatment stages, various symptoms, and multi-pathogenesis. Moreover, there are limited exchanges and cooperative actions in disease diagnoses and treatments between different departments and hospitals. Thus, when new diseases occur with atypical symptoms, inexperienced doctors might have difficulty in identifying them promptly and accurately. Therefore, to maximize the utilization of the advanced medical technology of developed hospitals and the rich medical knowledge of experienced doctors, a Disease Diagnosis and Treatment Recommendation System (DDTRS) is proposed in this paper. First, to effectively identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering. In addition, association analyses on Disease-Diagnosis (D-D) rules and Disease-Treatment (D-T) rules are conducted by the Apriori algorithm separately. The appropriate diagnosis and treatment schemes are recommended for patients and inexperienced doctors, even if they are in a limited therapeutic environment. Moreover, to reach the goals of high performance and low latency response, we implement a parallel solution for DDTRS using the Apache Spark cloud platform. Extensive experimental results demonstrate that the proposed DDTRS realizes disease-symptom clustering effectively and derives disease treatment recommendations intelligently and accurately.
107 citations
TL;DR: The proposed BPT-CNN effectively improves the training performance of CNNs while maintaining the accuracy and introduces task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths.
Abstract: Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, where large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize the synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where computation steps of convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining the accuracy.
101 citations
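The BPT-CNN abstract above describes allocating training data to computing nodes according to their computing power. The sketch below shows one plausible way to split a single batch proportionally, using largest-remainder rounding so that the counts add up exactly. It is not the paper's IDPA strategy; `allocate_by_power` and the notion of a scalar "power" score per node are assumptions for illustration.

```python
import math


def allocate_by_power(num_samples, powers):
    """Split num_samples across nodes in proportion to their power scores.

    powers: one positive number per node (a hypothetical benchmark score).
    Returns a list of integer sample counts that sums to num_samples.
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("total computing power must be positive")
    exact = [num_samples * p / total for p in powers]
    counts = [math.floor(x) for x in exact]
    leftover = num_samples - sum(counts)
    # Hand the remaining samples to the nodes with the largest fractional parts.
    order = sorted(range(len(powers)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# A node with twice the power receives twice the samples.
shares = allocate_by_power(100, [1, 1, 2])
```

An incremental scheme could call such a function once per batch and update the power scores between batches from measured training times, but that extension is also an assumption rather than something the abstract states.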
Posted Content
TL;DR: This survey presents medical and AI researchers with a comprehensive view of the existing and potential applications of AI technology in combating COVID-19 with the goal of inspiring researchers to continue to maximize the advantages of AI and big data to fight COVID-19.
Abstract: The COVID-19 pandemic caused by the SARS-CoV-2 virus has spread rapidly worldwide, leading to a global outbreak. Most governments, enterprises, and scientific research institutions are participating in the COVID-19 struggle to curb the spread of the pandemic. As a powerful tool against COVID-19, artificial intelligence (AI) technologies are widely used in combating this pandemic. In this survey, we investigate the main scope and contributions of AI in combating COVID-19 from the aspects of disease detection and diagnosis, virology and pathogenesis, drug and vaccine development, and epidemic and transmission prediction. In addition, we summarize the available data and resources that can be used for AI-based COVID-19 research. Finally, the main challenges and potential directions of AI in fighting against COVID-19 are discussed. Currently, AI mainly focuses on medical image inspection, genomics, drug development, and transmission prediction, and thus AI still has great potential in this field. This survey presents medical and AI researchers with a comprehensive view of the existing and potential applications of AI technology in combating COVID-19 with the goal of inspiring researchers to continue to maximize the advantages of AI and big data to fight COVID-19.
63 citations
Cited by
TL;DR: This survey investigates some of the work that has been done to enable the integrated blockchain and edge computing system and discusses the research challenges, identifying several vital aspects of the integration of blockchain and edge computing: motivations, frameworks, enabling functionalities, and challenges.
Abstract: Blockchain, as the underlying technology of crypto-currencies, has attracted significant attention. It has been adopted in numerous applications, such as smart grid and Internet-of-Things. However, there is a significant scalability barrier for blockchain, which limits its ability to support services with frequent transactions. On the other side, edge computing is introduced to extend the cloud resources and services to be distributed at the edge of the network, but currently faces challenges in its decentralized management and security. The integration of blockchain and edge computing into one system can enable reliable access and control of the network, storage, and computation distributed at the edges, hence providing a large scale of network servers, data storage, and validity computation near the end in a secure manner. Despite the prospect of integrated blockchain and edge computing systems, its scalability enhancement, self organization, functions integration, resource management, and new security issues remain to be addressed before widespread deployment. In this survey, we investigate some of the work that has been done to enable the integrated blockchain and edge computing system and discuss the research challenges. We identify several vital aspects of the integration of blockchain and edge computing: motivations, frameworks, enabling functionalities, and challenges. Finally, some broader perspectives are explored.
488 citations
11 Aug 2020
TL;DR: Fangcang shelter hospitals are a novel public health concept that served to isolate patients with mild to moderate COVID-19 from their families and communities, while providing medical care, disease monitoring, food, shelter, and social activities.
Abstract: Fangcang shelter hospitals are a novel public health concept. They were implemented for the first time in China in February, 2020, to tackle the coronavirus disease 2019 (COVID-19) outbreak. The Fangcang shelter hospitals in China were large-scale, temporary hospitals, rapidly built by converting existing public venues, such as stadiums and exhibition centres, into health-care facilities. They served to isolate patients with mild to moderate COVID-19 from their families and communities, while providing medical care, disease monitoring, food, shelter, and social activities. We document the development of Fangcang shelter hospitals during the COVID-19 outbreak in China and explain their three key characteristics (rapid construction, massive scale, and low cost) and five essential functions (isolation, triage, basic medical care, frequent monitoring and rapid referral, and essential living and social engagement). Fangcang shelter hospitals could be powerful components of national responses to the COVID-19 pandemic, as well as future epidemics and public health emergencies.
367 citations
Journal Article
TL;DR: An influenza epidemic simulation model was adapted to estimate the likelihood of human-to-human transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in a simulated Singaporean population and the combined intervention was the most effective, reducing the estimated median number of infections.
Abstract: Summary Background Since the coronavirus disease 2019 outbreak began in the Chinese city of Wuhan on Dec 31, 2019, 68 imported cases and 175 locally acquired infections have been reported in Singapore. We aimed to investigate options for early intervention in Singapore should local containment (eg, preventing disease spread through contact tracing efforts) be unsuccessful. Methods We adapted an influenza epidemic simulation model to estimate the likelihood of human-to-human transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in a simulated Singaporean population. Using this model, we estimated the cumulative number of SARS-CoV-2 infections at 80 days, after detection of 100 cases of community transmission, under three infectivity scenarios (basic reproduction number [R0] of 1·5, 2·0, or 2·5) and assuming 7·5% of infections are asymptomatic. We first ran the model assuming no intervention was in place (baseline scenario), and then assessed the effect of four intervention scenarios compared with a baseline scenario on the size and progression of the outbreak for each R0 value. These scenarios included isolation measures for infected individuals and quarantining of family members (hereafter referred to as quarantine); quarantine plus school closure; quarantine plus workplace distancing; and quarantine, school closure, and workplace distancing (hereafter referred to as the combined intervention). We also did sensitivity analyses by altering the asymptomatic fraction of infections (22·7%, 30·0%, 40·0%, and 50·0%) to compare outbreak sizes under the same control measures. Findings For the baseline scenario, when R0 was 1·5, the median cumulative number of infections at day 80 was 279 000 (IQR 245 000–320 000), corresponding to 7·4% (IQR 6·5–8·5) of the resident population of Singapore. The median number of infections increased with higher infectivity: 727 000 cases (670 000–776 000) when R0 was 2·0, corresponding to 19·3% (17·8–20·6) of the Singaporean population, and 1 207 000 cases (1 164 000–1 249 000) when R0 was 2·5, corresponding to 32% (30·9–33·1) of the Singaporean population. Compared with the baseline scenario, the combined intervention was the most effective, reducing the estimated median number of infections by 99·3% (IQR 92·6–99·9) when R0 was 1·5, by 93·0% (81·5–99·7) when R0 was 2·0, and by 78·2% (59·0–94·4) when R0 was 2·5. Assuming increasing asymptomatic fractions up to 50·0%, up to 277 000 infections were estimated to occur at day 80 with the combined intervention relative to 1800 for the baseline at R0 of 1·5. Interpretation Implementing the combined intervention of quarantining infected individuals and their family members, workplace distancing, and school closure once community transmission has been detected could substantially reduce the number of SARS-CoV-2 infections. We therefore recommend immediate deployment of this strategy if local secondary transmission is confirmed within Singapore. However, quarantine and workplace distancing should be prioritised over school closure because at this early stage, symptomatic children have higher withdrawal rates from school than do symptomatic adults from work. At higher asymptomatic proportions, intervention effectiveness might be substantially reduced requiring the need for effective case management and treatments, and preventive measures such as vaccines. Funding Singapore Ministry of Health, Singapore Population Health Improvement Centre.
317 citations
TL;DR: Compared to the other 7 models, decision tree (DT), random forest (RF) and deep cascade forest (DCF) trained by data sets of pH, DO, CODMn, and NH3-N had significantly better performance in prediction of all 6 Levels of water quality recommended by the Chinese government.
Abstract: The water quality prediction performance of machine learning models may be not only dependent on the models, but also dependent on the parameters in the data set chosen for training the learning models. Moreover, the key water parameters should also be identified by the learning models, in order to further reduce prediction costs and improve prediction efficiency. Here we endeavored for the first time to compare the water quality prediction performance of 10 learning models (7 traditional and 3 ensemble models) using big data (33,612 observations) from the major rivers and lakes in China from 2012 to 2018, based on the precision, recall, F1-score, and weighted F1-score, and to explore the potential key water parameters for future model prediction. Our results showed that larger data sets could improve the performance of learning models in prediction of water quality. Compared to the other 7 models, decision tree (DT), random forest (RF) and deep cascade forest (DCF) trained by data sets of pH, DO, CODMn, and NH3-N had significantly better performance in prediction of all 6 Levels of water quality recommended by the Chinese government. Moreover, two key water parameter sets (DO, CODMn, and NH3-N; CODMn, and NH3-N) were identified and validated by DT, RF and DCF to have high specificity for predicting water quality. Therefore, DT, RF and DCF with selected key water parameters could be prioritized for future water quality monitoring and providing timely water quality warning.
208 citations
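The water-quality abstract above compares models by precision, recall, F1-score, and weighted F1-score. As a reminder of how these standard metrics relate, the sketch below computes them from label lists in plain Python. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every observed label."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true count."""
    support = Counter(y_true)
    scores = per_class_scores(y_true, y_pred)
    return sum(support[label] * s[2] for label, s in scores.items()) / len(y_true)


# Toy example with two water-quality levels (labels are made up).
toy_true = [1, 1, 2, 2]
toy_pred = [1, 2, 2, 2]
toy_score = weighted_f1(toy_true, toy_pred)
```

Weighting by support keeps a rare quality level from dominating the average, which matters when the 6 levels are unevenly represented in the data.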
TL;DR: This survey covers the main steps of deep learning-based BTC methods, including preprocessing, feature extraction, and classification, along with their achievements and limitations, and investigates the state-of-the-art convolutional neural network models for BTC by performing extensive experiments using transfer learning with and without data augmentation.
Abstract: Brain tumor is one of the most dangerous cancers in people of all ages, and its grade recognition is a challenging problem for radiologists in health monitoring and automated diagnosis. Recently, numerous methods based on deep learning have been presented in the literature for brain tumor classification (BTC) in order to assist radiologists for a better diagnostic analysis. In this overview, we present an in-depth review of the surveys published so far and recent deep learning-based methods for BTC. Our survey covers the main steps of deep learning-based BTC methods, including preprocessing, feature extraction, and classification, along with their achievements and limitations. We also investigate the state-of-the-art convolutional neural network models for BTC by performing extensive experiments using transfer learning with and without data augmentation. Furthermore, this overview describes available benchmark data sets used for the evaluation of BTC. Finally, this survey not only reviews the past literature on the topic but also builds on it to look ahead, enumerating research directions that should be followed in the future, especially for personalized and smart healthcare.
188 citations