Open access · Journal Article · DOI: 10.1007/S13369-020-05212-Z

Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm

04 Mar 2021 · Arabian Journal for Science and Engineering (Springer Berlin Heidelberg) · Vol. 46, Iss. 9, pp. 1-12
Abstract: Great efforts are now underway to control coronavirus disease 2019 (COVID-19). Millions of people are being medically examined, and their data keep piling up awaiting classification. These data are typically both incomplete and heterogeneous, which hampers classical classification algorithms. Some researchers have recently modified the popular KNN algorithm as a solution, handling incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that provides better results, as demonstrated by thorough experimental work. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset containing people's medical records and identifies the cases with COVID-19. In the process, we use two popular distance metrics, Euclidean and Mahalanobis, to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can efficiently and accurately classify COVID-19 cases. It is also compared to three KNN derivatives, and the comparison results show that it greatly outperforms all of them in terms of four metrics: precision, recall, accuracy, and F-score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to general classification tasks outside the medical field.
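The abstract names two distance metrics but does not show how they plug into KNN. The sketch below is a plain KNN classifier with interchangeable Euclidean and Mahalanobis distances, assuming complete numeric records and a fixed K; it is not the KNNV algorithm itself (the rough-set handling of missing values and of K is not reproduced), and the toy records are hypothetical.

```python
import numpy as np
from collections import Counter


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two numeric feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def make_mahalanobis(X_train: np.ndarray):
    """Build a Mahalanobis distance from the training data's covariance.

    The pseudo-inverse is used so that a singular covariance matrix,
    e.g. from perfectly correlated features, does not raise an error.
    """
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))

    def mahalanobis(a: np.ndarray, b: np.ndarray) -> float:
        diff = a - b
        # Clamp tiny negative values caused by floating-point rounding.
        return float(np.sqrt(max(float(diff @ cov_inv @ diff), 0.0)))

    return mahalanobis


def knn_predict(X_train, y_train, x, k, dist):
    """Label x by majority vote among its k nearest training records under dist."""
    distances = np.array([dist(row, x) for row in X_train])
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, fully numeric toy records (two features each).
X = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2],
              [5.0, 6.0], [5.2, 5.8], [4.8, 6.1]])
y = np.array(["negative"] * 3 + ["positive"] * 3)
query = np.array([1.1, 2.0])

print(knn_predict(X, y, query, k=3, dist=euclidean))                # negative
print(knn_predict(X, y, query, k=3, dist=make_mahalanobis(X)))      # negative
```

Mahalanobis distance rescales differences by the inverse covariance, so a step against the dominant correlation between features counts for more than the same step along it; Euclidean distance treats every direction alike. An odd K avoids tied votes in a two-class problem.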

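The abstract contrasts its rough-set approach with earlier KNN variants that impute missing values and convert categories to numbers, but does not give the rough-set details. As a stand-in for illustration only, the sketch below uses a different, well-known technique, the Heterogeneous Euclidean-Overlap Metric (HEOM), which compares records directly without imputation: numeric gaps are range-normalised, categories either match or not, and a missing value counts as maximal difference. The field names and records are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Set, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def attribute_ranges(rows: Sequence[Sequence[Value]], categorical: Set[int]) -> List[Optional[float]]:
    """max - min per numeric column (ignoring missing values); None for categorical columns."""
    ranges: List[Optional[float]] = []
    for j in range(len(rows[0])):
        if j in categorical:
            ranges.append(None)
            continue
        observed = [row[j] for row in rows if row[j] is not None]
        ranges.append(max(observed) - min(observed) if observed else 0.0)
    return ranges


def heom(a: Sequence[Value], b: Sequence[Value], ranges: Sequence[Optional[float]]) -> float:
    """Heterogeneous Euclidean-Overlap Metric between two records."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            d = 1.0                                 # missing on either side: maximal difference
        elif r is None:
            d = 0.0 if x == y else 1.0              # categorical: overlap (match or mismatch)
        else:
            d = abs(x - y) / r if r > 0 else 0.0    # numeric: range-normalised gap
        total += d * d
    return math.sqrt(total)


# Hypothetical records: [age, body temperature, cough]; None = not recorded.
rows = [
    [45.0, 38.5, "yes"],
    [30.0, None, "no"],
    [60.0, 37.0, None],
]
ranges = attribute_ranges(rows, categorical={2})
print(ranges)                          # [30.0, 1.5, None]
print(heom(rows[0], rows[1], ranges))  # 1.5
```

Any KNN routine that accepts a distance function can use `heom` in place of a numeric metric, so incomplete, mixed-type records never have to be imputed or encoded first.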

Citations

9 results found


Open access · Journal Article · DOI: 10.1016/J.ASOC.2020.106906
Abstract: COVID-19, an infectious disease, has shocked the world and still threatens the lives of billions of people. Detecting coronavirus (COVID-19) has recently become a critical task for medical practitioners. Unfortunately, COVID-19 spreads very quickly between people and reached millions of people worldwide within a few months. It is therefore essential to identify infected people quickly and accurately so that measures to prevent the spread can be taken. Although several medical tests have been used to detect certain injuries, the desired detection efficiency has not yet been achieved. In this paper, a new Hybrid Diagnose Strategy (HDS) is introduced. HDS relies on a novel technique for ranking selected features by projecting them into a proposed Patient Space (PS). A Feature Connectivity Graph (FCG) is constructed that indicates both the weight of each feature and its binding degree to other features. The rank of a feature is determined by two factors: its weight and its binding degree to its neighbors in PS. The ranked features are then used to derive a classification model that decides whether new persons are infected or not. The classification model is a hybrid of two classifiers: a fuzzy inference engine and a Deep Neural Network (DNN). The proposed HDS has been compared against recent techniques. Experimental results show that HDS outperforms its competitors in terms of average accuracy, precision, recall, and F-measure, achieving about 97.658%, 96.756%, 96.55%, and 96.615%, respectively. Additionally, HDS yields the lowest error value, 2.342%. Further, the results were validated statistically using the Wilcoxon signed-rank test and the Friedman test.
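The reported figures can be cross-checked with two standard identities: error rate is 100% minus accuracy, and the F-measure is the harmonic mean of precision and recall. The snippet below uses only the numbers quoted in the abstract. The F-measure computed from the averaged precision and recall (about 96.65%) is slightly above the reported 96.615%; that gap would be expected if per-run F-measures were averaged, since the harmonic mean is concave, but the abstract does not say how the averages were formed.

```python
# Average scores reported in the HDS abstract, in percent.
accuracy, precision, recall, f_measure = 97.658, 96.756, 96.55, 96.615

# Error rate is the complement of accuracy.
error = 100.0 - accuracy

# F-measure is the harmonic mean of precision and recall.
f_from_means = 2 * precision * recall / (precision + recall)

print(f"error rate: {error:.3f}%")                        # error rate: 2.342%
print(f"F from averaged P and R: {f_from_means:.3f}%")    # F from averaged P and R: 96.653%
print(f"reported average F-measure: {f_measure:.3f}%")
```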


Topics: Feature selection (51%)

13 Citations


Open access · Journal Article · DOI: 10.1109/JSEN.2021.3061178
Yuzhen Chen, Menghan Hu, Chunjun Hua, Guangtao Zhai, and 3 more (3 institutions)
Abstract: Coronavirus Disease 2019 (COVID-19) has spread all over the world since its massive outbreak in December 2019, causing great losses worldwide. Both confirmed cases and deaths have reached alarming numbers. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19, can be transmitted by small respiratory droplets. To curb its spread at the source, wearing masks is a convenient and effective measure. In most cases, people use face masks frequently but for short periods. To address the problem of not knowing which service stage a mask is in, we propose a detection system based on the mobile phone. We first extract four features from the gray-level co-occurrence matrices (GLCMs) of micro-photos of the face mask. Next, a three-outcome detection system is built using the K-nearest neighbor (KNN) algorithm. Validation experiments show that our system reaches an accuracy of 82.87% (measured by macro-measures) on the testing dataset. The precision of type I, ‘normal use’, and the recall of type III, ‘not recommended’, reach 92.00% and 92.59%, respectively. In future work, we plan to extend detection to more mask types. This work demonstrates that the proposed mobile microscope system can serve as an assistant for monitoring face masks in use, which may play a positive role in fighting COVID-19.
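A GLCM counts how often pairs of grey levels occur next to each other at a given pixel offset; texture statistics are then read off the normalised matrix. The abstract does not name its four features, so the sketch below assumes four common choices (contrast, energy, homogeneity, correlation) and a single horizontal offset. The 4-level image is a hypothetical stand-in for a quantised mask micro-photo.

```python
import numpy as np


def glcm(img: np.ndarray, levels: int, dr: int = 0, dc: int = 1) -> np.ndarray:
    """Symmetric, normalised grey-level co-occurrence matrix for offset (dr, dc).

    img must hold integer grey levels in [0, levels); dr and dc are assumed >= 0.
    """
    counts = np.zeros((levels, levels), dtype=float)
    n_rows, n_cols = img.shape
    for r in range(n_rows - dr):
        for c in range(n_cols - dc):
            i, j = img[r, c], img[r + dr, c + dc]
            counts[i, j] += 1
            counts[j, i] += 1  # count each pair in both directions
    return counts / counts.sum()


def glcm_features(p: np.ndarray) -> dict:
    """Four common texture descriptors computed from a normalised GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    # Correlation is undefined for a constant image; report 1.0 in that case.
    corr = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0
    return {
        "contrast": float(((i - j) ** 2 * p).sum()),
        "energy": float(np.sqrt((p ** 2).sum())),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
        "correlation": float(corr),
    }


# Tiny 4-level test image (hypothetical).
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(img, levels=4)
features = glcm_features(p)
print(features)
```

Each photo then becomes a short feature vector, and a KNN classifier assigns it to one of the three service stages by voting among its nearest labelled photos.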


10 Citations


Open access · Posted Content
Yuzhen Chen, Menghan Hu, Chunjun Hua, Guangtao Zhai, and 3 more (3 institutions)
Abstract: Coronavirus Disease 2019 (COVID-19) has spread all over the world since its massive outbreak in December 2019, causing great losses worldwide. Both confirmed cases and deaths have reached alarming numbers. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19, can be transmitted by small respiratory droplets. To curb its spread at the source, wearing masks is a convenient and effective measure. In most cases, people use face masks frequently but for short periods. To address the problem of not knowing which service stage a mask is in, we propose a detection system based on the mobile phone. We first extract four features from the GLCMs of micro-photos of the face mask. Next, a three-outcome detection system is built using the KNN algorithm. Validation experiments show that our system reaches a precision of 82.87% (standard deviation = 8.5%) on the testing dataset. In future work, we plan to extend detection to more mask types. This work demonstrates that the proposed mobile microscope system can serve as an assistant for monitoring face masks in use, which may play a positive role in fighting COVID-19.
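One plausible reading of "precision of 82.87% (standard deviation = 8.5%)" for a three-outcome classifier is a macro average: precision is computed per class and then averaged, with the spread across classes reported alongside. The abstract does not confirm this, and the confusion matrix below is entirely hypothetical; it only shows the computation. The middle class is not named in the abstract, so "type II" is a placeholder.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
labels = ["type I (normal use)", "type II", "type III (not recommended)"]
cm = np.array([[23, 2, 0],
               [4, 18, 3],
               [1, 5, 25]])

per_class_precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predicted as that class
per_class_recall = np.diag(cm) / cm.sum(axis=1)     # correct / all truly in that class

macro_precision = per_class_precision.mean()
# Population SD (ddof=0); the abstract does not say which convention was used.
precision_sd = per_class_precision.std()

for name, p, r in zip(labels, per_class_precision, per_class_recall):
    print(f"{name}: precision {p:.2%}, recall {r:.2%}")
print(f"macro precision {macro_precision:.2%} (SD {precision_sd:.2%})")
```

Macro averaging weights every class equally regardless of its size, so a weak minority class pulls the average down as much as a weak majority class.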


2 Citations


Open access · Journal Article · DOI: 10.1007/S11356-021-15902-2
Shuwei Jia, Yao Li, Tianhui Fang (2 institutions)
Abstract: The COVID-19 pandemic now affects the entire world and has major effects on the global economy, environment, health, and society. Focusing on the harm COVID-19 poses to human health and society, this study used system dynamics to establish a prevention and control model that combines material supply, public opinion dissemination, public awareness, scientific and technological research, staggered work shifts, and the warning effect (of law/policy). Causal loop analysis was used to identify interactions between subsystems and explore the key factors affecting social benefit. Further, different scenarios were dynamically simulated to explore optimal combination modes. The main findings were as follows: (1) The low supervision mode produces lag and superimposed effects on material supply and impedes social benefit. (2) The strong supervision mode has multiple effects: it can reduce online public opinion dissemination and the rate of concealment and false declaration, and improve government credibility and social benefit. However, a fading effect appears in the middle and late periods: over time, the effect of strong supervision gradually weakens (but occasionally rebounds) and thus requires adjustment. These findings can provide a theoretical basis for improving epidemic prevention and control measures.
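System-dynamics models are stocks whose levels change through inflows and outflows, integrated step by step over time. The sketch below is a generic illustration of that mechanism only, not the authors' model: it assumes a single hypothetical "awareness" stock driven by a supervision effect that decays exponentially, with parameter values chosen arbitrarily. It reproduces qualitatively the fading pattern the abstract describes for strong supervision.

```python
import math


def simulate(strength: float, decay: float, forget: float,
             dt: float = 0.1, t_end: float = 60.0):
    """Euler integration of one hypothetical stock, 'awareness', kept between 0 and 1."""
    awareness = 0.0
    history = []
    for n in range(int(round(t_end / dt))):
        t = n * dt
        effect = strength * math.exp(-decay * t)  # supervision effect that fades over time
        inflow = effect * (1.0 - awareness)       # saturating growth toward 1
        outflow = forget * awareness              # awareness erodes without reinforcement
        awareness += dt * (inflow - outflow)
        history.append(((n + 1) * dt, awareness))
    return history


weak = simulate(strength=0.1, decay=0.05, forget=0.05)
strong = simulate(strength=0.5, decay=0.05, forget=0.05)
peak_weak = max(a for _, a in weak)
peak_strong = max(a for _, a in strong)
print(f"peak awareness: weak {peak_weak:.2f}, strong {peak_strong:.2f}")
print(f"strong supervision, final vs peak: {strong[-1][1]:.2f} vs {peak_strong:.2f}")
```

Scenario analysis in this style amounts to rerunning the same equations with different parameter sets and comparing the trajectories.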


Topics: Causal loop diagram (52%), Public opinion (50%)

1 Citation


Open access · DOI: 10.1007/978-3-030-90421-0_1
Elif Kartal (1 institution)
01 Jan 2022
Abstract: This study aims to improve Covid-19 predictions, specifically the distinction between Covid-19 and flu, by using several well-known ensemble learning methods, namely majority voting, bagging, boosting, and stacking. To this end, the performance of base machine learning models was compared with that of ensemble models (majority voting, C5.0, stochastic gradient boosting, bagged CART, random forest, and stacking) on a public Covid-19 dataset in which observations are labelled as Covid-19 or Flu. Since the task is a classification problem, supervised machine learning algorithms (logistic regression via a generalized linear model, classification and regression trees, artificial neural networks, and support vector machines) are used as base learners. The Cross-Industry Standard Process Model for Data Mining, which consists of six stages (business understanding, data understanding, data preparation, modeling, evaluation, and deployment), is used as the study method. In the model performance evaluation stage, an additional metric was proposed that considers both accuracy and its change interval (max-min). The performance of the models was discussed in terms of accuracy and the proposed metric. A Shiny application was developed using the best-performing model; it lets users predict Covid-19 status interactively through a web interface. Analyses were performed with R and RStudio.
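Hard majority voting, the simplest of the ensemble methods listed, labels each case with the class most base learners predicted. The sketch below shows voting and the accuracy change interval (max minus min across runs); the exact formula combining accuracy with that interval is not given in the abstract, so only the interval itself is computed. All labels, predictions, and run accuracies are hypothetical, and the study's own analyses were in R rather than Python.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Hard majority voting: one list of labels per base learner, same length each."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and base-learner outputs for five patients.
y_true = ["covid", "flu", "covid", "flu", "covid"]
m1 = ["covid", "flu", "flu", "flu", "covid"]
m2 = ["covid", "covid", "covid", "flu", "flu"]
m3 = ["flu", "flu", "covid", "flu", "covid"]

ensemble = majority_vote([m1, m2, m3])
for name, pred in [("m1", m1), ("m2", m2), ("m3", m3), ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))  # m1 0.8, m2 0.6, m3 0.8, vote 1.0

# Change interval of accuracy across hypothetical resampling runs (max - min).
run_accuracies = [0.80, 0.85, 0.78]
print("accuracy interval:", round(max(run_accuracies) - min(run_accuracies), 4))
```

Voting helps here because each base learner errs on a different patient; when base learners make correlated mistakes, the ensemble gains little.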


Topics: Ensemble learning (67%), Boosting (machine learning) (61%), Random forest (56.99%)

References

30 results found


Open access · Journal Article · DOI: 10.1148/RADIOL.2020200642
Tao Ai, Zhenlu Yang, Hongyan Hou, Chenao Zhan, and 5 more (2 institutions)
26 Feb 2020 · Radiology
Abstract: Chest CT had higher sensitivity for diagnosis of COVID-19 as compared with initial reverse-transcription polymerase chain reaction from swab samples in the epidemic area of China.


3,596 Citations


Book Chapter · DOI: 10.1007/978-3-540-31865-1_25
Cyril Goutte, Eric Gaussier (1 institution)
21 Mar 2005
Abstract: We address the problems of (1) assessing the confidence of the standard point estimates precision, recall, and F-score, and (2) comparing the results, in terms of precision, recall, and F-score, obtained using two different methods. To do so, we use a probabilistic setting that allows us to obtain posterior distributions on these performance indicators rather than point estimates. This framework is applied both to the case where different methods are run on different datasets from the same source and to the standard situation where competing results are obtained on the same data.
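A Monte Carlo sketch in the spirit of this probabilistic setting, assuming a symmetric Dirichlet prior over the confusion-matrix cells: independent Gamma draws for TP, FP, and FN (normalised, they are Dirichlet-distributed, and the normaliser cancels in each ratio) yield posterior samples of precision, recall, and F1 instead of single point estimates. The prior value and the Monte Carlo approach are assumptions of this sketch, not necessarily the chapter's exact derivation. The counts are taken from the COVNet test set in the Li et al. abstract listed among the references.

```python
import numpy as np


def posterior_metrics(tp: int, fp: int, fn: int, prior: float = 1.0,
                      n_draws: int = 100_000, seed: int = 0):
    """Monte Carlo posterior draws of precision, recall and F1.

    Assumes a symmetric Dirichlet(prior) over the confusion-matrix cells.
    Normalised independent Gamma(count + prior) variates are Dirichlet-distributed,
    and the normalising sum cancels in every ratio below, so it is skipped.
    """
    rng = np.random.default_rng(seed)
    u = rng.gamma(tp + prior, size=n_draws)  # true-positive share
    v = rng.gamma(fp + prior, size=n_draws)  # false-positive share
    w = rng.gamma(fn + prior, size=n_draws)  # false-negative share
    precision = u / (u + v)
    recall = u / (u + w)
    f1 = 2 * u / (2 * u + v + w)
    return precision, recall, f1


# COVNet test set: 114 of 127 COVID-19 scans detected (13 FN);
# 294 of 307 non-COVID scans correctly rejected (13 FP).
precision, recall, f1 = posterior_metrics(tp=114, fp=13, fn=13)
for name, draws in [("precision", precision), ("recall", recall), ("F1", f1)]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name}: mean {draws.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Comparing two methods then reduces to sampling both posteriors and estimating the probability that one method's F1 exceeds the other's.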


Topics: F1 score (62%), Precision and recall (59%), Probabilistic logic (52%)

969 Citations


Open access · Journal Article · DOI: 10.1148/RADIOL.2020200905
Lin Li, Lixin Qin, Zeguo Xu, Youbing Yin, and 14 more (3 institutions)
19 Mar 2020 · Radiology
Abstract: Background: Coronavirus disease 2019 (COVID-19) has spread widely all over the world since the beginning of 2020. It is desirable to develop automatic and accurate detection of COVID-19 using chest CT. Purpose: To develop a fully automatic framework to detect COVID-19 using chest CT and evaluate its performance. Materials and Methods: In this retrospective, multicenter study, a deep learning model, the COVID-19 detection neural network (COVNet), was developed to extract visual features from volumetric chest CT scans for the detection of COVID-19. CT scans of community-acquired pneumonia (CAP) and other non-pneumonia abnormalities were included to test the robustness of the model. The datasets were collected from six hospitals between August 2016 and February 2020. Diagnostic performance was assessed with the area under the receiver operating characteristic curve, sensitivity, and specificity. Results: The collected dataset consisted of 4352 chest CT scans from 3322 patients. The average patient age (± standard deviation) was 49 years ± 15, and there were slightly more men than women (1838 vs 1484, respectively; P = .29). The per-scan sensitivity and specificity for detecting COVID-19 in the independent test set were 90% (95% confidence interval [CI]: 83%, 94%; 114 of 127 scans) and 96% (95% CI: 93%, 98%; 294 of 307 scans), respectively, with an area under the receiver operating characteristic curve of 0.96 (P < .001). The per-scan sensitivity and specificity for detecting CAP in the independent test set were 87% (152 of 175 scans) and 92% (239 of 259 scans), respectively, with an area under the receiver operating characteristic curve of 0.95 (95% CI: 0.93, 0.97). Conclusion: A deep learning model can accurately detect COVID-19 and differentiate it from community-acquired pneumonia and other lung conditions. © RSNA, 2020. Online supplemental material is available for this article.
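Sensitivity and specificity here are simple proportions (114/127 and 294/307). The abstract does not state how its 95% confidence intervals were computed; a Wilson score interval, a standard choice for binomial proportions, reproduces the reported bounds after rounding, as the sketch below checks.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


for name, k, n in [("COVID-19 sensitivity", 114, 127), ("COVID-19 specificity", 294, 307)]:
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k / n:.1%} (95% CI {lo:.0%}, {hi:.0%})")
# COVID-19 sensitivity: 89.8% (95% CI 83%, 94%)
# COVID-19 specificity: 95.8% (95% CI 93%, 98%)

# CAP point estimates (no CIs are reported for these in the abstract).
for name, k, n in [("CAP sensitivity", 152, 175), ("CAP specificity", 239, 259)]:
    print(f"{name}: {k / n:.1%}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves well for proportions near 0 or 1, which matters for specificities in the high 90s.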


942 Citations


Open access · Posted Content · DOI: 10.1101/2020.02.14.20023028
Shuai Wang, Bo Kang, Jinlu Ma, Xianjun Zeng, and 7 more (4 institutions)
11 Mar 2020 · medRxiv
Abstract: Background: The outbreak of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused more than 2.5 million cases of coronavirus disease (COVID-19) worldwide so far, and the number continues to grow. To control the spread of the disease, screening large numbers of suspected cases for appropriate quarantine and treatment is a priority. Pathogenic laboratory testing is the gold standard but is time-consuming and yields a significant number of false-negative results. Therefore, alternative diagnostic methods are urgently needed to combat the disease. Based on COVID-19 radiographic changes in CT images, we hypothesized that artificial intelligence deep learning methods might be able to extract COVID-19-specific graphical features and provide a clinical diagnosis ahead of the pathogenic test, thus saving critical time for disease control. Methods and Findings: We collected 1,065 CT images of pathogen-confirmed COVID-19 cases (325 images) along with those previously diagnosed with typical viral pneumonia (740 images). We modified the Inception transfer-learning model to establish the algorithm, followed by internal and external validation. The internal validation achieved a total accuracy of 89.5%, with a specificity of 0.88 and a sensitivity of 0.87. The external testing dataset showed a total accuracy of 79.3%, with a specificity of 0.83 and a sensitivity of 0.67. In addition, among 54 COVID-19 images for which the first two nucleic acid tests were negative, 46 were predicted as COVID-19 positive by the algorithm, an accuracy of 85.2%. Conclusion: These results demonstrate proof of principle for using artificial intelligence to extract radiological features for timely and accurate COVID-19 diagnosis. Author summary: To control the spread of COVID-19, screening large numbers of suspected cases for appropriate quarantine and treatment measures is a priority.
Pathogenic laboratory testing is the gold standard but is time-consuming and yields a significant number of false-negative results. Therefore, alternative diagnostic methods are urgently needed to combat the disease. We hypothesized that artificial intelligence deep learning methods might be able to extract COVID-19-specific graphical features and provide a clinical diagnosis ahead of the pathogenic test, thus saving critical time. We collected 1,065 CT images of pathogen-confirmed COVID-19 cases along with those previously diagnosed with typical viral pneumonia. We modified the Inception transfer-learning model to establish the algorithm. The internal validation achieved a total accuracy of 89.5%, with a specificity of 0.88 and a sensitivity of 0.87. The external testing dataset showed a total accuracy of 79.3%, with a specificity of 0.83 and a sensitivity of 0.67. In addition, among 54 COVID-19 images for which the first two nucleic acid tests were negative, 46 were predicted as COVID-19 positive by the algorithm, an accuracy of 85.2%. Our study is the first to apply artificial intelligence to CT images to screen effectively for COVID-19.
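The 85.2% figure for the 54 PCR-negative images is worth reading carefully: every image in that subset is a confirmed COVID-19 case, so there are no true negatives or false positives, and "accuracy" on the subset is the same number as sensitivity. The arithmetic below uses only counts quoted in the abstract.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


# All 54 images are confirmed COVID-19, so the subset has no negatives (tn = fp = 0).
tp = 46
fn = 54 - tp
print(f"accuracy:    {accuracy(tp, 0, 0, fn):.1%}")  # accuracy:    85.2%
print(f"sensitivity: {sensitivity(tp, fn):.1%}")     # sensitivity: 85.2%

# Class counts quoted in the abstract add up to the reported total.
assert 325 + 740 == 1065
```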


Topics: Gold standard (test) (55%)

571 Citations


Open access · Posted Content
Abstract: Purpose: To develop AI-based automated CT image analysis tools for the detection, quantification, and tracking of coronavirus, and to demonstrate that they can differentiate coronavirus patients from non-patients. Materials and Methods: Multiple international datasets, including data from disease-infected areas in China, were included. We present a system that uses robust 2D and 3D deep learning models, modifying and adapting existing AI models and combining them with clinical understanding. We conducted multiple retrospective experiments to analyze the performance of the system in detecting suspected COVID-19 thoracic CT features and to evaluate the evolution of the disease in each patient over time using a 3D volume review, generating a Corona score. The study includes a testing set of 157 international patients (China and U.S.). Results: Classification of coronavirus vs. non-coronavirus cases per thoracic CT study achieved an AUC of 0.996 (95% CI: 0.989-1.00) on datasets of Chinese control and infected patients. A possible working point is 98.2% sensitivity and 92.2% specificity. For time analysis of coronavirus patients, the system output enables quantitative measurements for smaller opacities (volume, diameter) and visualization of the larger opacities in a slice-based heat map or a 3D volume display. Our suggested Corona score measures the progression of disease over time. Conclusion: This initial study, which is currently being expanded to a larger population, demonstrated that rapidly developed AI-based image analysis can achieve high accuracy in detecting coronavirus as well as in quantifying and tracking disease burden.
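An AUC summarises the whole ROC curve, whereas a "working point" is one threshold on the classifier score, which fixes a single sensitivity/specificity pair. The sketch below computes AUC as the probability that a random positive case outscores a random negative one (the Mann-Whitney formulation) and shows how moving the threshold trades sensitivity for specificity. The scores are hypothetical, not data from the paper.

```python
def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of ROC AUC: P(positive score > negative score), ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def operating_point(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


# Hypothetical classifier scores for four positive and four negative studies.
pos = [0.9, 0.8, 0.75, 0.6]
neg = [0.7, 0.3, 0.2, 0.1]
print(auc(pos, neg))  # 0.9375
for t in (0.5, 0.72):
    print(t, operating_point(pos, neg, t))
# 0.5 (1.0, 0.75)
# 0.72 (0.75, 1.0)
```

Choosing a working point is a clinical decision: a screening tool usually favours the high-sensitivity end of the curve, accepting more false positives.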


Topics: Population (51%)

476 Citations


Performance Metrics
No. of citations received by the paper in previous years:
Year    Citations
2022    2
2021    6
2020    1
Related Papers (5)
An Algorithm for Classifying Incomplete Data with Selective Bayes Classifiers (15 Dec 2007)

Jingnian Chen, Xiaoping Xue, and 2 more

72% related
An Efficient, Ensemble-Based Classification Framework for Big Medical Data (23 Sep 2021)

Firoz Khan, Balusupati Veera Venkata Siva Prasad, and 3 more

71% related
Classification Methods in Data Mining: A Detailed Survey (01 Jan 2014)

Department of CSE

71% related