Author

Min Zhu

Bio: Min Zhu is an academic researcher from Zhejiang University. The author has contributed to research in topics: Random forest & Sample size determination. The author has an h-index of 3 and has co-authored 5 publications receiving 93 citations.

Papers
Journal ArticleDOI
TL;DR: Validation on UCI data sets demonstrates that, for imbalanced medical data, the proposed method enhanced the overall performance of the classifier while producing high accuracy in identifying both the majority and minority classes.
Abstract: Classification of class-imbalanced data has drawn significant interest in medical applications. Most existing methods tend to assign samples to the majority class, which biases the classifier and, in particular, leaves the minority class insufficiently identified. A novel approach, class weights random forest, is introduced to address the problem by assigning an individual weight to each class instead of a single weight. Validation on UCI data sets demonstrates that, for imbalanced medical data, the proposed method enhanced the overall performance of the classifier while producing high accuracy in identifying both the majority and minority classes.

128 citations
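
To illustrate the per-class weighting idea in the abstract above, here is a minimal sketch. The abstract does not say how the weights are computed or applied, so the inverse-frequency heuristic, the weighted vote, and the names `balanced_class_weights` and `weighted_vote` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def balanced_class_weights(labels):
    """One weight per class (assumed heuristic): n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_vote(tree_votes, weights):
    """Add each class's weight once per tree voting for it; return the top class."""
    totals = Counter()
    for vote in tree_votes:
        totals[vote] += weights[vote]
    return max(totals, key=totals.get)


# Hypothetical imbalanced training labels: 90 majority, 10 minority.
labels = ["healthy"] * 90 + ["disease"] * 10
weights = balanced_class_weights(labels)

# Six of ten trees vote for the majority class, yet the weighted vote
# favours the minority class because its per-class weight is larger.
prediction = weighted_vote(["healthy"] * 6 + ["disease"] * 4, weights)
```

With a single shared weight the six majority votes would win; per-class weights let the four minority votes outweigh them, which is the kind of bias correction the abstract describes.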

Journal ArticleDOI
Min Zhu, Jing Xia, Molei Yan, Guolong Cai, Jing Yan, Gangmin Ning
TL;DR: The results show that, by applying INGA, the feature dimensionality of the datasets was reduced from 77 to 10 and the model achieved an accuracy of 92% in predicting 28-day death in sepsis patients, which is significantly higher than that of other methods.
Abstract: With the development of medical technology, more and more parameters are produced to describe the human physiological condition, forming high-dimensional clinical datasets. In clinical analysis, data are commonly used to establish mathematical models and carry out classification. High-dimensional clinical data increase the complexity of the classification that these models perform and thus reduce efficiency. The Niche Genetic Algorithm (NGA) is an excellent algorithm for dimensionality reduction. However, in the conventional NGA, the niche distance parameter is set in advance, which prevents it from adjusting to the environment. In this paper, an Improved Niche Genetic Algorithm (INGA) is introduced. It employs a self-adaptive niche-culling operation in the construction of the niche environment to improve population diversity and avoid locally optimal solutions. The INGA was verified in a stratification model for sepsis patients. The results show that, by applying INGA, the feature dimensionality of the datasets was reduced from 77 to 10 and the model achieved an accuracy of 92% in predicting 28-day death in sepsis patients, which is significantly higher than that of other methods.

20 citations
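
The niche-culling step mentioned in the abstract above can be sketched roughly as follows. The abstract does not specify the adaptive rule, so deriving the niche distance from the current population (half the mean pairwise Hamming distance), the penalty value, and the name `niche_cull` are assumptions chosen only to show the general shape of the operation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def niche_cull(population, fitness, penalty=1e-6):
    """Penalise the weaker of any two individuals closer than the niche distance.

    Assumed adaptive rule: the niche distance is recomputed from the current
    population instead of being fixed in advance.
    """
    pairs = list(combinations(range(len(population)), 2))
    dists = {(i, j): hamming(population[i], population[j]) for i, j in pairs}
    niche_distance = 0.5 * sum(dists.values()) / len(pairs)
    adjusted = list(fitness)
    for (i, j), d in dists.items():
        if d < niche_distance:
            loser = i if fitness[i] < fitness[j] else j
            adjusted[loser] = penalty  # culled individual rarely survives selection
    return adjusted, niche_distance


# Hypothetical feature masks (1 = feature kept) and their fitness values.
population = [(1, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 1)]
fitness = [0.9, 0.8, 0.7]
adjusted, niche_distance = niche_cull(population, fitness)
```

Here the first two masks differ in only one position, so the weaker of the two is penalised while the distant third mask keeps its fitness, which preserves diversity in the population.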

Journal ArticleDOI
Jing Xia, Su Pan, Min Zhu, Guolong Cai, Molei Yan, Qun Su, Jing Yan, Gangmin Ning
TL;DR: The results demonstrate that the eLSTM is capable of dynamically predicting the mortality of patients in complex clinical situations.
Abstract: In the intensive care unit (ICU), it is essential to predict patient mortality, and mathematical models help improve prognostic accuracy. Recently, recurrent neural networks (RNNs), especially the long short-term memory (LSTM) network, have shown advantages in sequential modeling and promise for clinical prediction. However, ICU data are highly complex due to the diverse patterns of diseases; therefore, instead of a single LSTM model, an ensemble algorithm of LSTM (eLSTM) is proposed, utilizing the strengths of the ensemble framework to handle the diversity of clinical data. The eLSTM algorithm was evaluated on the widely used ICU admissions database Medical Information Mart for Intensive Care III (MIMIC-III). An investigation of 18415 cases in total shows that, compared with the clinical scoring systems SAPS II, SOFA, and APACHE II, the random forests classification algorithm, and the single LSTM classifier, the eLSTM model achieved superior performance, with the largest area under the receiver operating characteristic curve (AUROC) of 0.8451 and the largest area under the precision-recall curve (AUPRC) of 0.4862. Furthermore, it offered an early prognosis of ICU patients. The results demonstrate that the eLSTM is capable of dynamically predicting the mortality of patients in complex clinical situations.

18 citations
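
As a small illustration of the ensemble idea and the AUROC metric reported in the abstract above, the sketch below averages the risk scores of several base models and scores the result. The abstract does not say how the eLSTM combines its base learners, so plain averaging is an assumption; the function names and the toy scores are hypothetical, and no LSTM is trained here.

```python
def ensemble_scores(per_model_scores):
    """Average the predicted risk of each case across base models (assumed rule)."""
    n_models = len(per_model_scores)
    return [sum(case) / n_models for case in zip(*per_model_scores)]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died) and scores from two base models.
labels = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.3, 0.2, 0.65, 0.6]

single_auc = auroc(labels, model_a)
ensemble_auc = auroc(labels, ensemble_scores([model_a, model_b]))
```

In this toy case each model misranks a different pair, and averaging cancels the errors; real gains depend on how diverse the base learners actually are.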

Journal ArticleDOI
01 Nov 2014
TL;DR: Given the small sample size of the sepsis case data, this paper adopts an interval-division choice for the parameters used in random forest modeling: the feature interval is divided into high-correlation and uncertain-correlation intervals, and data are selected from each of the two intervals for modeling, which reduces the model's generalization error and improves prediction accuracy.
Abstract: The traditional random forest algorithm struggles to classify small-sample data sets well. Because few samples are available during repeated random selection, the resulting trees differ very little from one another, which drowns out correct decisions, increases the generalization error of the model, and lowers the prediction rate. Given the sample size of the sepsis case data, this paper adopts an interval-division choice for the parameters used in random forest modeling: the feature interval is divided into high-correlation and uncertain-correlation intervals, and data are selected from each of the two intervals for modeling. This reduces the model's generalization error and improves prediction accuracy.

2 citations

Book ChapterDOI
Jing Xia, Min Zhu, Shengyu Zhang, Molei Yan, Guolong Cai, Jing Yan, Gangmin Ning
01 Jan 2015
TL;DR: Preliminary results showed that the established model has the potential to help improve patient management by quickly stratifying sepsis severity and is superior to the conventional APACHE scoring method.
Abstract: Sepsis is a systemic inflammatory response syndrome caused by infection, and it seriously endangers patients' lives because of its rapid progression and high mortality rate. In clinical practice there is strong demand for quantitatively stratifying the severity of sepsis to support individual management. This work aimed to build a quantitative model for sepsis patients that stratifies disease severity into three levels. For this purpose, clinical data were collected and preprocessed through screening, normalization and data replenishing. Afterwards, sepsis-sensitive parameters were tested and selected as the input of the stratification model. The model was built with the Support Vector Machine algorithm. Finally, the model was tested on a total of 522 clinical cases and achieved a stratification accuracy of 67.5%. The performance of the established model is superior to that of the conventional APACHE scoring method. Preliminary results showed that the established model has the potential to help improve patient management by quickly stratifying sepsis severity.

Cited by
Journal ArticleDOI
10 Jul 2018-Sensors
TL;DR: A deep neural network model that integrates the CNN and LSTM architectures is developed; using historical data such as cumulated hours of rain, cumulated wind speed and PM2.5 concentration, the proposed CNN-LSTM model (APNet) is shown to have the highest forecasting accuracy among the methods compared in the paper.
Abstract: In modern society, air pollution is an important topic because it seriously harms human health and the environment. Among air pollutants, Particulate Matter (PM2.5) consists of suspended particles with a diameter equal to or less than 2.5 μm. Sources of PM2.5 include coal-fired power generation, smoke, and dust. These suspended particles in the air can damage the respiratory and cardiovascular systems of the human body, which may further lead to other diseases such as asthma, lung cancer, or cardiovascular diseases. To monitor and estimate the PM2.5 concentration, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are combined and applied to the PM2.5 forecasting system. To compare the overall performance of each algorithm, four measurement indexes, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson correlation coefficient and Index of Agreement (IA), are applied to the experiments in this paper. Compared with other machine learning methods, the experimental results showed that the proposed CNN-LSTM model (APNet) has the highest forecasting accuracy. The feasibility and practicability of the CNN-LSTM model for forecasting the PM2.5 concentration are also verified in this paper. The main contribution of this paper is a deep neural network model that integrates the CNN and LSTM architectures and uses historical data such as cumulated hours of rain, cumulated wind speed and PM2.5 concentration. In the future, this study can also be applied to the prevention and control of PM2.5.

426 citations
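
The four measurement indexes named in the abstract above can be computed as in the sketch below. The abstract gives no formulas, so this assumes the standard definitions, with Willmott's form of the Index of Agreement; the function names and sample values are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def index_of_agreement(obs, pred):
    """Willmott's IA: 1 - sum((o - p)^2) / sum((|p - mean_o| + |o - mean_o|)^2)."""
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


# Hypothetical observed and forecast PM2.5 values with a constant bias of +1.
observed = [1.0, 2.0, 3.0]
forecast = [2.0, 3.0, 4.0]
metrics = {
    "MAE": mae(observed, forecast),
    "RMSE": rmse(observed, forecast),
    "Pearson": pearson(observed, forecast),
    "IA": index_of_agreement(observed, forecast),
}
```

The biased forecast shows why several indexes are reported together: the correlation is perfect even though every prediction is off by one, while MAE, RMSE and IA all register the error.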

Journal Article
TL;DR: Vertebral, hip, distal radius, and proximal humerus fractures are the most common osteoporosis-related fractures.
Abstract: Vertebral, hip, distal radius, and proximal humerus fractures are the most common osteoporosis-related fractures. The incidence of these fractures increases with age; however, the pattern of increase differs between fracture sites. The prevalence of vertebral fracture among Japanese is similar to or slightly higher than that among Caucasians, and the incidence of osteoporosis-related limb fractures is lower. The secular trend in Japan is a decrease in the prevalence of vertebral fractures and an increase in the incidence of limb fractures. Previous fractures are a significant risk factor for both vertebral and hip fractures. Greater physical activity increases the risk of distal radius fractures and decreases the risk of proximal humerus fractures.

364 citations

Journal ArticleDOI
TL;DR: Experimental results show that, compared with other traditional machine learning methods, the prediction performance of the estimating model proposed in this paper is the best, and the feasibility and practicality of electricity price prediction are confirmed.
Abstract: Electricity price is a key influencer in the electricity market. Electricity market trades by each participant are based on electricity price. The electricity price, adjusted as the supply and demand relationship changes, can reflect the real value of electricity in the transaction process. However, for the power generating party, bidding strategy determines the level of profit, and accurate prediction of the electricity price makes it possible to determine a more accurate bidding price. This can not only reduce transaction risk but also help seize opportunities in the electricity market. To estimate electricity price effectively, this paper proposes an electricity price forecasting system based on the combination of two deep neural networks, the Convolutional Neural Network (CNN) and the Long Short Term Memory (LSTM). To compare the overall performance of each algorithm, the Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE) evaluation measures were applied in the experiments of this paper. Experimental results show that, compared with other traditional machine learning methods, the prediction performance of the estimating model proposed in this paper is the best. By combining the CNN and LSTM models, the feasibility and practicality of electricity price prediction are also confirmed in this paper.

131 citations

Journal ArticleDOI
TL;DR: This study found that, of the six medical tasks considered, diagnosis was the most frequently researched, and that the experiment-based empirical type and the evaluation-based research type were the dominant approaches adopted in the selected studies.

128 citations
