Journal Article

Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study.

24 Apr 2020-PLOS ONE (Public Library of Science)-Vol. 15, Iss: 4
TL;DR: The method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, suggesting that this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
Abstract: The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such major viral outbreaks demand early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 virus genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020. Our results support a hypothesis of a bat origin and classify the COVID-19 virus as Sarbecovirus, within Betacoronavirus. Our method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
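The alignment-free idea in the abstract (raw sequence, then a numeric signal, then a spectrum, then a pairwise distance, with a rank correlation as a cross-check) can be illustrated with a minimal sketch. This is not the authors' MLDSP implementation: the purine/pyrimidine encoding, the fixed signal length, the naive DFT, and the function names below are all assumptions chosen for illustration, and the supervised classifier that would consume the distances is left out.

```python
import cmath
import math

# Assumed numeric encoding (purine = -1, pyrimidine = +1); illustrative only.
ENCODING = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(seq, length):
    """Map a DNA string to numbers, then truncate or zero-pad to `length`."""
    signal = [ENCODING.get(base, 0.0) for base in seq.upper()[:length]]
    return signal + [0.0] * (length - len(signal))


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(u, v):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))


def spectral_distance(seq_a, seq_b, length=64):
    """Distance in [0, 1] derived from the correlation of magnitude spectra."""
    spec_a = magnitude_spectrum(to_signal(seq_a, length))
    spec_b = magnitude_spectrum(to_signal(seq_b, length))
    return (1.0 - pearson(spec_a, spec_b)) / 2.0
```

Under these assumptions, `spectral_distance(s, s)` is 0 up to rounding for any sequence `s` whose spectrum is not constant, and a matrix of such pairwise distances is the kind of feature a supervised classifier could be trained on. A real pipeline would use an FFT library and whole genomes rather than short toy strings.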


Citations
Journal Article

415 citations


Cites methods from "Machine learning using intrinsic ge..."

  • ...Inspirationally, within a short period of time since COVID-19 outbreak, advanced machine learning techniques have been used in taxonomic classification of COVID-19 genomes (8), CRISPR-based COVID-19 detection assay (6), survival prediction of severe COVID-19 patients (11), and discovering potential drug candidates against COVID-19 (4)....

    [...]

Journal Article
TL;DR: In this paper, the authors present an overview of recent studies using Machine Learning and Artificial Intelligence to tackle many aspects of the COVID-19 crisis and highlight the need for international cooperation to maximize the potential of AI in this and future pandemics.
Abstract: COVID-19, the disease caused by the SARS-CoV-2 virus, has been declared a pandemic by the World Health Organization, which has reported over 18 million confirmed cases as of August 5, 2020. In this review, we present an overview of recent studies using Machine Learning and, more broadly, Artificial Intelligence, to tackle many aspects of the COVID-19 crisis. We have identified applications that address challenges posed by COVID-19 at different scales, including: molecular, by identifying new or existing drugs for treatment; clinical, by supporting diagnosis and evaluating prognosis based on medical imaging and non-invasive measures; and societal, by tracking both the epidemic and the accompanying infodemic using multiple data sources. We also review datasets, tools, and resources needed to facilitate Artificial Intelligence research, and discuss strategic considerations related to the operational implementation of multidisciplinary partnerships and open science. We highlight the need for international cooperation to maximize the potential of AI in this and future pandemics. ©2020 AI Access Foundation. All rights reserved.

315 citations

Journal Article
TL;DR: It is reported that digital solutions and innovative technologies have mainly been proposed for the diagnosis of COVID-19, and that digital solutions that integrate with traditional methods, such as AI-based diagnostic algorithms based on imaging and/or clinical data, seem promising.
Abstract: Background: The COVID-19 pandemic is favoring digital transitions in many industries and in society as a whole. Health care organizations have responded to the first phase of the pandemic by rapidly adopting digital solutions and advanced technology tools. Objective: The aim of this review is to describe the digital solutions that have been reported in the early scientific literature to mitigate the impact of COVID-19 on individuals and health systems. Methods: We conducted a systematic review of early COVID-19–related literature (from January 1 to April 30, 2020) by searching MEDLINE and medRxiv with appropriate terms to find relevant literature on the use of digital technologies in response to the pandemic. We extracted study characteristics such as the paper title, journal, and publication date, and we categorized the retrieved papers by the type of technology and patient needs addressed. We built a scoring rubric by cross-classifying the patient needs with the type of technology. We also extracted information and classified each technology reported by the selected articles according to health care system target, grade of innovation, and scalability to other geographical areas. Results: The search identified 269 articles, of which 124 full-text articles were assessed and included in the review after screening. Most of the selected articles addressed the use of digital technologies for diagnosis, surveillance, and prevention. We report that most of these digital solutions and innovative technologies have been proposed for the diagnosis of COVID-19. In particular, within the reviewed articles, we identified numerous suggestions on the use of artificial intelligence (AI)–powered tools for the diagnosis and screening of COVID-19. Digital technologies are also useful for prevention and surveillance measures, such as contact-tracing apps and monitoring of internet searches and social media usage. 
Fewer scientific contributions address the use of digital technologies for lifestyle empowerment or patient engagement. Conclusions: In the field of diagnosis, digital solutions that integrate with traditional methods, such as AI-based diagnostic algorithms based both on imaging and clinical data, appear to be promising. For surveillance, digital apps have already proven their effectiveness; however, problems related to privacy and usability remain. For other patient needs, several solutions have been proposed, such as telemedicine or telehealth tools. These tools have long been available, but this historical moment may actually be favoring their definitive large-scale adoption. It is worth taking advantage of the impetus provided by the crisis; it is also important to keep track of the digital solutions currently being proposed to implement best practices and models of care in future and to adopt at least some of the solutions proposed in the scientific literature, especially in national health systems, which have proved to be particularly resistant to the digital transition in recent years.

239 citations

Journal Article
01 Jan 2020
TL;DR: The hybrid machine learning methods of adaptive network-based fuzzy inference system and multi-layered perceptron-imperialist competitive algorithm are proposed to predict time series of infected individuals and mortality rate; the models predict that by late May, the outbreak and the total mortality will drop substantially.
Abstract: Several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the COVID-19 outbreak. Advancing accurate prediction models is of utmost importance to take proper actions. Due to the lack of essential data and uncertainty, the epidemiological models have been challenged regarding the delivery of higher accuracy for long-term prediction. As an alternative to the susceptible-infected-resistant (SIR)-based models, this study proposes a hybrid machine learning approach to predict the COVID-19 outbreak, and we exemplify its potential using data from Hungary. The hybrid machine learning methods of adaptive network-based fuzzy inference system (ANFIS) and multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) are proposed to predict time series of infected individuals and mortality rate. The models predict that by late May, the outbreak and the total mortality will drop substantially. The validation is performed for 9 days with promising results, which confirms the model accuracy. It is expected that the model maintains its accuracy as long as no significant interruption occurs. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research.

172 citations


Cites methods from "Machine learning using intrinsic ge..."

  • ..., case identifications [51], classification of novel pathogens [52], modification of SIR-based models [53], diagnosis [54,55], survival prediction [56], and ICU demand prediction [57]....

    [...]

Journal Article
TL;DR: Using Artificial Neural Networks (ANNs) experiments, the concentration of PM2.5 and PM10 linked to COVID-19-related deaths is determined and new threshold values identified are higher than the limits imposed by the European Parliament.

163 citations


Cites methods from "Machine learning using intrinsic ge..."

  • ...We refer to the emerging literature on the relationship among air pollution and virus epidemic diffusion (Alimadadi et al., 2020; Barstugan et al., 2020; Punn et al., 2020; Randhawa et al., 2020; and Tuli et al., 2020) to rely on a ML methodology....

    [...]

  • ...Second, following the empirical strategy employed by an emerging air pollution-virus epidemic literature (Alimadadi et al., 2020; Barstugan et al., 2020; Punn et al., 2020; Randhawa et al., 2020; Tuli et al., 2020), this study applies Artificial Neural Networks (ANNs) experiments and used a Machine Learning (ML) approach....

    [...]

References
Journal Article
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.

57,055 citations
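The pair-selection step described in the neighbor-joining abstract above can be sketched in a few lines. The sketch uses the Q-matrix form of the criterion found in most textbook descriptions of the method, which is assumed here rather than taken from the abstract, and the function names are hypothetical. It covers a single clustering stage only, without the matrix update that a full tree construction would repeat.

```python
def nj_pick_pair(dist):
    """Return (i, j, q) for the pair with the smallest Q value.

    `dist` is a symmetric matrix (list of lists) of pairwise distances
    with a zero diagonal and at least three taxa. The Q value is
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k),
    an assumed textbook form of the pair-selection criterion.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


def nj_branch_lengths(dist, i, j):
    """Branch lengths from taxa i and j to the new internal node joining them."""
    n = len(dist)
    sum_i, sum_j = sum(dist[i]), sum(dist[j])
    length_i = dist[i][j] / 2 + (sum_i - sum_j) / (2 * (n - 2))
    return length_i, dist[i][j] - length_i
```

A full implementation would then replace the joined pair with the new node, recompute distances to it, and repeat until three nodes remain.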

Journal Article
TL;DR: Human airway epithelial cells were used to isolate a novel coronavirus, named 2019-nCoV, which formed a clade within the subgenus sarbecovirus, Orthocoronavirinae subfamily, and is the seventh member of the family of coronaviruses that infect humans.
Abstract: In December 2019, a cluster of patients with pneumonia of unknown cause was linked to a seafood wholesale market in Wuhan, China. A previously unknown betacoronavirus was discovered through the use of unbiased sequencing in samples from patients with pneumonia. Human airway epithelial cells were used to isolate a novel coronavirus, named 2019-nCoV, which formed a clade within the subgenus sarbecovirus, Orthocoronavirinae subfamily. Different from both MERS-CoV and SARS-CoV, 2019-nCoV is the seventh member of the family of coronaviruses that infect humans. Enhanced surveillance and further investigation are ongoing. (Funded by the National Key Research and Development Program of China and the National Major Project for Control and Prevention of Infectious Disease in China.).

21,455 citations

Journal Article
03 Feb 2020-Nature
TL;DR: Identification and characterization of a new coronavirus (2019-nCoV), which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China, shows that this virus belongs to the species SARSr-CoV and indicates that it is related to a bat coronavirus.
Abstract: Since the outbreak of severe acute respiratory syndrome (SARS) 18 years ago, a large number of SARS-related coronaviruses (SARSr-CoVs) have been discovered in their natural reservoir host, bats [1–4]. Previous studies have shown that some bat SARSr-CoVs have the potential to infect humans [5–7]. Here we report the identification and characterization of a new coronavirus (2019-nCoV), which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China. The epidemic, which started on 12 December 2019, had caused 2,794 laboratory-confirmed infections including 80 deaths by 26 January 2020. Full-length genome sequences were obtained from five patients at an early stage of the outbreak. The sequences are almost identical and share 79.6% sequence identity to SARS-CoV. Furthermore, we show that 2019-nCoV is 96% identical at the whole-genome level to a bat coronavirus. Pairwise protein sequence analysis of seven conserved non-structural protein domains shows that this virus belongs to the species of SARSr-CoV. In addition, 2019-nCoV virus isolated from the bronchoalveolar lavage fluid of a critically ill patient could be neutralized by sera from several patients. Notably, we confirmed that 2019-nCoV uses the same cell entry receptor, angiotensin converting enzyme II (ACE2), as SARS-CoV. Characterization of full-length genome sequences from patients infected with a new coronavirus (2019-nCoV) shows that the sequences are nearly identical and indicates that the virus is related to a bat coronavirus.

16,857 citations

Journal Article
TL;DR: The phylogenetic analysis suggests that bats might be the original host of this virus, and that an animal sold at the seafood market in Wuhan might represent an intermediate host facilitating the emergence of the virus in humans.

9,474 citations

Journal Article
03 Feb 2020-Nature
TL;DR: Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.
Abstract: Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health [1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing [4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China [5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans. Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

9,231 citations
