Showing papers by "Derrick W. Crook" published in 2019


Journal ArticleDOI
30 Aug 2019
TL;DR: In this article, the authors compared hybrid assembly for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or from SMRT Pacific Biosciences (PacBio).
Abstract: Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods impact on assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or from SMRT Pacific Biosciences (PacBio) sequencing platforms. We chose isolates from the Enterobacteriaceae family, as these frequently have highly plastic, repetitive genetic structures and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies, as well as comparing to long-read only assembly with Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

148 citations


Journal ArticleDOI
TL;DR: While further optimization is required to improve sensitivity, this approach shows promise for the Nanopore platform to be used in the diagnosis and genetic analysis of influenza virus and other respiratory viruses.
Abstract: Influenza is a major global public health threat as a result of its highly pathogenic variants, large zoonotic reservoir, and pandemic potential. Metagenomic viral sequencing offers the potential for a diagnostic test for influenza virus which also provides insights on transmission, evolution, and drug resistance and simultaneously detects other viruses. We therefore set out to apply the Oxford Nanopore Technologies sequencing method to metagenomic sequencing of respiratory samples. We generated influenza virus reads down to a limit of detection of 10^2 to 10^3 genome copies/ml in pooled samples, observing a strong relationship between the viral titer and the proportion of influenza virus reads (P = 4.7 × 10^−5). Applying our methods to clinical throat swabs, we generated influenza virus reads for 27/27 samples with mid-to-high viral titers (cycle threshold [CT] values, [...] 99% complete sequences for all eight gene segments. We also detected a human coronavirus coinfection in one clinical sample. While further optimization is required to improve sensitivity, this approach shows promise for the Nanopore platform to be used in the diagnosis and genetic analysis of influenza virus and other respiratory viruses.

109 citations


Journal ArticleDOI
02 Dec 2019
TL;DR: The ability of Mykrobe-based DST to guide personalized therapeutic regimen design in the context of complex drug susceptibility profiles is measured, showing 94% concordance of implied regimen with that driven by phenotypic DST, higher than all other benchmarked tools.
Abstract: Two billion people are infected with Mycobacterium tuberculosis, leading to 10 million new cases of active tuberculosis and 1.5 million deaths annually. Universal access to drug susceptibility testing (DST) has become a World Health Organization priority. We previously developed a software tool, Mykrobe predictor, which provided offline species identification and drug resistance predictions for M. tuberculosis from whole genome sequencing (WGS) data. Performance was insufficient to support the use of WGS as an alternative to conventional phenotype-based DST, due to mutation catalogue limitations. Here we present a new tool, Mykrobe, which provides the same functionality based on a new software implementation. Improvements include i) an updated mutation catalogue giving greater sensitivity to detect pyrazinamide resistance, ii) support for user-defined resistance catalogues, iii) improved identification of non-tuberculous mycobacterial species, and iv) an updated statistical model for Oxford Nanopore Technologies sequencing data. Mykrobe is released under MIT license at https://github.com/mykrobe-tools/mykrobe. We incorporate mutation catalogues from the CRyPTIC consortium et al. (2018) and from Walker et al. (2015), and make improvements based on performance on an initial set of 3206 and an independent set of 5845 M. tuberculosis Illumina sequences. To give estimates of error rates, we use a prospectively collected dataset of 4362 M. tuberculosis isolates. Using culture based DST as the reference, we estimate Mykrobe to be 100%, 95%, 82%, 99% sensitive and 99%, 100%, 99%, 99% specific for rifampicin, isoniazid, pyrazinamide and ethambutol resistance prediction respectively. We benchmark against four other tools on 10207 (=5845+4362) samples, and also show that Mykrobe gives concordant results with nanopore data. 
We measure the ability of Mykrobe-based DST to guide personalized therapeutic regimen design in the context of complex drug susceptibility profiles, showing 94% concordance of implied regimen with that driven by phenotypic DST, higher than all other benchmarked tools.

88 citations


Journal ArticleDOI
12 Mar 2019-Mbio
TL;DR: The work indicates that the use of antimicrobials outside the health care environment has selected for resistant organisms, and in the case of RT078, has contributed to the emergence of a human pathogen.
Abstract: The increasing clinical importance of human infections (frequently severe) caused by Clostridium difficile PCR ribotype 078 (RT078) was first reported in 2008. The severity of symptoms (mortality of ≤30%) and the higher proportion of infections among community and younger patients raised concerns. Farm animals, especially pigs, have been identified as RT078 reservoirs. We aimed to understand the recent changes in RT078 epidemiology by investigating a possible role for antimicrobial selection in its recent evolutionary history. Phylogenetic analysis of international RT078 genomes (isolates from 2006 to 2014, n = 400), using time-scaled, recombination-corrected, maximum likelihood phylogenies, revealed several recent clonal expansions. A common ancestor of each expansion had independently acquired a different allele of the tetracycline resistance gene tetM. Consequently, an unusually high proportion (76.5%) of RT078 genomes were tetM positive. Multiple additional tetracycline resistance determinants were also identified (including efflux pump tet40), frequently sharing a high level of nucleotide sequence identity (up to 100%) with sequences found in the pig pathogen Streptococcus suis and in other zoonotic pathogens such as Campylobacter jejuni and Campylobacter coli. Each RT078 tetM clonal expansion lacked geographic structure, indicating rapid, recent international spread. Resistance determinants for C. difficile infection-triggering antimicrobials, including fluoroquinolones and clindamycin, were comparatively rare in RT078. Tetracyclines are used intensively in agriculture; this selective pressure, plus rapid, international spread via the food chain, may explain the increased RT078 prevalence in humans. Our work indicates that the use of antimicrobials outside the health care environment has selected for resistant organisms, and in the case of RT078, has contributed to the emergence of a human pathogen. 
IMPORTANCE Clostridium difficile PCR ribotype 078 (RT078) has multiple reservoirs; many are agricultural. Since 2005, this genotype has been increasingly associated with human infections in both clinical settings and the community. Investigations of RT078 whole-genome sequences revealed that tetracycline resistance had been acquired on multiple independent occasions. Phylogenetic analysis revealed a rapid, recent increase in numbers of closely related tetracycline-resistant RT078 (clonal expansions), suggesting that tetracycline selection has strongly influenced its recent evolutionary history. We demonstrate recent international spread of emergent, tetracycline-resistant RT078. A similar tetracycline-positive clonal expansion was also identified in unrelated nontoxigenic C. difficile, suggesting that this process may be widespread and may be independent of disease-causing ability. Resistance to typical C. difficile infection-associated antimicrobials (e.g., fluoroquinolones, clindamycin) occurred only sporadically within RT078. Selective pressure from tetracycline appears to be a key factor in the emergence of this human pathogen and the rapid international dissemination that followed, plausibly via the food chain.

76 citations


Journal ArticleDOI
TL;DR: Results provided a comprehensive comparison of various techniques and confirmed the application of machine learning for better prediction of the large diverse tuberculosis data, and mutation ranking showed the possibility of finding new resistance/susceptible markers.
Abstract: MOTIVATION Timely identification of Mycobacterium tuberculosis (MTB) resistance to existing drugs is vital to decrease mortality and prevent the amplification of existing antibiotic resistance. Machine learning methods have been widely applied for timely prediction of MTB resistance to a specific drug and for identifying resistance markers. However, they have not been validated on a large cohort of MTB samples from multiple centers across the world in terms of resistance prediction and resistance marker identification. Several machine learning classifiers and linear dimension reduction techniques were developed and compared for a cohort of 13 402 isolates collected from 16 countries across 6 continents and tested against 11 drugs. RESULTS Compared to a conventional molecular diagnostic test, area under curve of the best machine learning classifier increased for all drugs, especially by 23.11%, 15.22% and 10.14% for pyrazinamide, ciprofloxacin and ofloxacin, respectively (P < 0.01). Logistic regression and gradient tree boosting were found to perform better than other techniques. Moreover, logistic regression/gradient tree boosting with a sparse principal component analysis/non-negative matrix factorization step, compared with the classifier alone, enhanced the best performance in terms of F1-score by 12.54%, 4.61%, 7.45% and 9.58% for amikacin, moxifloxacin, ofloxacin and capreomycin, respectively, as well as increasing area under curve for amikacin and capreomycin. Results provided a comprehensive comparison of various techniques and confirmed the application of machine learning for better prediction of the large diverse tuberculosis data. Furthermore, mutation ranking showed the possibility of finding new resistance/susceptible markers. AVAILABILITY AND IMPLEMENTATION The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

59 citations


Posted ContentDOI
26 Jan 2019-bioRxiv
TL;DR: It is shown that automated hybrid assembly can automatically fully reconstruct complex bacterial genomes of Enterobacteriaceae isolates in the majority of cases and represents a low-cost, high-quality approach for reconstructing bacterial genomes using publicly available software.
Abstract: Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) [...] IMPACT STATEMENT Illumina short-read sequencing is frequently used for tasks in bacterial genomics, such as assessing which species are present within samples, checking if specific genes of interest are present within individual isolates, and reconstructing the evolutionary relationships between strains. However, while short-read sequencing can reveal significant detail about the genomic content of bacterial isolates, it is often insufficient for assessing genomic structure: how different genes are arranged within genomes, and particularly which genes are on plasmids – potentially highly mobile components of the genome frequently carrying antimicrobial resistance elements. This is because Illumina short reads are typically too short to span repetitive structures in the genome, making it impossible to accurately reconstruct these repetitive regions. One solution is to complement Illumina short reads with long reads generated with SMRT Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT) sequencing platforms. Using this approach, called ‘hybrid assembly’, we show that we can automatically fully reconstruct complex bacterial genomes of Enterobacteriaceae isolates in the majority of cases (best-performing method: 17/20 isolates). In particular, by comparing different methods we find that using the assembler Unicycler with Illumina and ONT reads represents a low-cost, high-quality approach for reconstructing bacterial genomes using publicly available software. DATA SUMMARY Raw sequencing data and assemblies have been deposited in NCBI under BioProject Accession PRJNA422511 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA422511). We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

55 citations


Journal ArticleDOI
22 Feb 2019-eLife
TL;DR: Staphylococcal pyomyositis, like tetanus and diphtheria, is established as critically dependent on a single toxin and the potential for association studies to identify specific bacterial genes promoting severe human disease is demonstrated.
Abstract: Pyomyositis is a severe bacterial infection of skeletal muscle, commonly affecting children in tropical regions, predominantly caused by Staphylococcus aureus. To understand the contribution of bacterial genomic factors to pyomyositis, we conducted a genome-wide association study of S. aureus cultured from 101 children with pyomyositis and 417 children with asymptomatic nasal carriage attending the Angkor Hospital for Children, Cambodia. We found a strong relationship between bacterial genetic variation and pyomyositis, with estimated heritability 63.8% (95% CI 49.2-78.4%). The presence of the Panton-Valentine leucocidin (PVL) locus increased the odds of pyomyositis 130-fold (p = 10^−17.9). The signal of association mapped both to the PVL-coding sequence and to the sequence immediately upstream. Together these regions explained over 99.9% of heritability (95% CI 93.5-100%). Our results establish staphylococcal pyomyositis, like tetanus and diphtheria, as critically dependent on a single toxin and demonstrate the potential for association studies to identify specific bacterial genes promoting severe human disease.

52 citations


Journal ArticleDOI
TL;DR: SNP-IT is presented, a single-nucleotide polymorphism–based tool to identify all members of MTBC, including animal clades, and an unexpectedly high number of M. orygis isolates is detected in the United Kingdom.
Abstract: The clinical phenotype of zoonotic tuberculosis and its contribution to the global burden of disease are poorly understood and probably underestimated. This shortcoming is partly because of the inability of currently available laboratory and in silico tools to accurately identify all subspecies of the Mycobacterium tuberculosis complex (MTBC). We present SNPs to Identify TB (SNP-IT), a single-nucleotide polymorphism-based tool to identify all members of MTBC, including animal clades. By applying SNP-IT to a collection of clinical genomes from a UK reference laboratory, we detected an unexpectedly high number of M. orygis isolates. M. orygis is seen at a similar rate to M. bovis, yet M. orygis cases have not been previously described in the United Kingdom. From an international perspective, it is possible that M. orygis is an underestimated zoonosis. Accurate identification will enable study of the clinical phenotype, host range, and transmission mechanisms of all subspecies of MTBC in greater detail.

49 citations


Journal ArticleDOI
24 Oct 2019
TL;DR: Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified using the other methods, and particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database.
Abstract: Shotgun metagenomics is increasingly used to characterise microbial communities, particularly for the investigation of antimicrobial resistance (AMR) in different animal and environmental contexts. There are many different approaches for inferring the taxonomic composition and AMR gene content of complex community samples from shotgun metagenomic data, but there has been little work establishing the optimum sequencing depth, data processing and analysis methods for these samples. In this study we used shotgun metagenomics and sequencing of cultured isolates from the same samples to address these issues. We sampled three potential environmental AMR gene reservoirs (pig caeca, river sediment, effluent) and sequenced samples with shotgun metagenomics at high depth (~ 200 million reads per sample). Alongside this, we cultured single-colony isolates of Enterobacteriaceae from the same samples and used hybrid sequencing (short- and long-reads) to create high-quality assemblies for comparison to the metagenomic data. To automate data processing, we developed an open-source software pipeline, ‘ResPipe’. Taxonomic profiling was much more stable to sequencing depth than AMR gene content. 1 million reads per sample was sufficient to achieve < 1% dissimilarity to the full taxonomic composition. However, at least 80 million reads per sample were required to recover the full richness of different AMR gene families present in the sample, and additional allelic diversity of AMR genes was still being discovered in effluent at 200 million reads per sample. Normalising the number of reads mapping to AMR genes using gene length and an exogenous spike of Thermus thermophilus DNA substantially changed the estimated gene abundance distributions. While the majority of genomic content from cultured isolates from effluent was recoverable using shotgun metagenomics, this was not the case for pig caeca or river sediment. 
Sequencing depth and profiling method can critically affect the profiling of polymicrobial animal and environmental samples with shotgun metagenomics. Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified using the other methods. Particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database. ResPipe, the open-source software pipeline we have developed, is freely available ( https://gitlab.com/hsgweon/ResPipe ).
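The normalisation step mentioned in this abstract (by gene length and by an exogenous Thermus thermophilus spike) can be illustrated with a minimal sketch. The formula used here, reads per kilobase of gene divided by the number of spike-in reads, is an assumption chosen for illustration and is not taken from ResPipe; the function name, gene names and all counts are hypothetical.

```python
def normalise_amr_counts(read_counts, gene_lengths_bp, spike_reads):
    """Return a length- and spike-normalised abundance for each gene.

    Assumed (illustrative) formula, not ResPipe's implementation:
        abundance = (reads per kb of gene) / (reads mapping to the spike-in)
    """
    if spike_reads <= 0:
        raise ValueError("spike_reads must be positive")
    normalised = {}
    for gene, reads in read_counts.items():
        # Longer genes attract more reads, so scale to reads per kilobase.
        reads_per_kb = reads * 1000 / gene_lengths_bp[gene]
        # Dividing by spike-in reads makes samples of different depth comparable.
        normalised[gene] = reads_per_kb / spike_reads
    return normalised


# Hypothetical sample: two genes with equal raw counts but different lengths.
counts = {"geneA": 1200, "geneB": 1200}
lengths_bp = {"geneA": 1200, "geneB": 600}
abundance = normalise_amr_counts(counts, lengths_bp, spike_reads=1000)
print(abundance)
```

With equal raw counts, the shorter gene ends up with twice the normalised abundance, which is the kind of shift in estimated abundance distributions the abstract reports.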

48 citations


Journal ArticleDOI
TL;DR: In this article, isolates from patients with Clostridium difficile infection (CDI) and colonization, identified in a study conducted during 2006-2007 at 6 Canadian hospitals, underwent typing by pulsed-field gel electrophoresis, multilocus sequence typing, and WGS to assess the role of asymptomatically colonized patients in transmission.
Abstract: Background: Whole genome sequencing (WGS) studies can enhance our understanding of the role of patients with asymptomatic Clostridium difficile colonization in transmission. Methods: Isolates obtained from patients with Clostridium difficile infection (CDI) and colonization identified in a study conducted during 2006–2007 at 6 Canadian hospitals underwent typing by pulsed-field gel electrophoresis, multilocus sequence typing, and WGS. Isolates from incident CDI cases not in the initial study were also sequenced where possible. Ward movement and typing data were combined to identify plausible donors for each CDI case, as defined by shared time and space within predefined limits. Proportions of plausible donors for CDI cases that were colonized, infected, or both were examined. Results: Five hundred fifty-four isolates were sequenced successfully, 353 from colonized patients and 201 from CDI cases. The NAP1/027/ST1 strain was the most common strain, found in 124 (62%) of infected and 92 (26%) of colonized patients. A donor with a plausible ward link was found for 81 CDI cases (40%) using WGS with a threshold of ≤2 single nucleotide polymorphisms to determine relatedness. Sixty-five (32%) CDI cases could be linked to both infected and colonized donors. Exclusive linkages to infected and colonized donors were found for 28 (14%) and 12 (6%) CDI cases, respectively. Conclusions: Colonized patients contribute to transmission, but CDI cases are more likely linked to other infected patients than colonized patients in this cohort with high rates of the NAP1/027/ST1 strain, highlighting the importance of local prevalence of virulent strains in determining transmission dynamics.
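The donor-identification logic described in the Methods and Results can be sketched as follows. This is a simplified illustration, not the study's code: it assumes "shared time and space" means overlapping stays on the same ward (the study's actual predefined limits are not given here), it ignores the direction of transmission, and all patient identifiers, dates and SNP distances are hypothetical. Only the threshold of ≤2 SNPs comes from the abstract.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WardStay:
    patient: str
    ward: str
    start: date
    end: date


def stays_overlap(a, b):
    """True if two stays are on the same ward and their date ranges intersect."""
    return a.ward == b.ward and a.start <= b.end and b.start <= a.end


def plausible_donors(case, stays, snp_distance, max_snps=2):
    """Patients within max_snps of the case who shared a ward with it."""
    case_stays = [s for s in stays if s.patient == case]
    donors = set()
    for stay in stays:
        if stay.patient == case:
            continue
        dist = snp_distance.get(frozenset((case, stay.patient)))
        if dist is None or dist > max_snps:
            continue  # not sequenced, or genetically too distant
        if any(stays_overlap(stay, c) for c in case_stays):
            donors.add(stay.patient)
    return sorted(donors)


# Hypothetical ward movements and pairwise SNP distances.
stays = [
    WardStay("P1", "A", date(2007, 1, 10), date(2007, 1, 20)),  # the CDI case
    WardStay("P2", "A", date(2007, 1, 5), date(2007, 1, 12)),   # close, same ward
    WardStay("P3", "A", date(2007, 1, 15), date(2007, 1, 18)),  # same ward, distant
    WardStay("P4", "B", date(2007, 1, 10), date(2007, 1, 20)),  # close, other ward
]
snp_distance = {
    frozenset(("P1", "P2")): 1,
    frozenset(("P1", "P3")): 5,
    frozenset(("P1", "P4")): 0,
}
donors = plausible_donors("P1", stays, snp_distance)
print(donors)
```

Only P2 satisfies both the genetic and the ward criteria in this toy data; P3 fails the SNP threshold and P4 never shares a ward with the case.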

44 citations


Journal ArticleDOI
TL;DR: The authors' analysis highlights the apparent molecular propensity of K. quasipneumoniae to persist in the environment as well as acquire carbapenemase plasmids from other species and enabled an assessment of the genetic rearrangements which may facilitate horizontal transmission of carbapenemases.
Abstract: Several emerging pathogens have arisen as a result of selection pressures exerted by modern health care. Klebsiella quasipneumoniae was recently defined as a new species, yet its prevalence, niche, and propensity to acquire antimicrobial resistance genes are not fully described. We have been tracking inter- and intraspecies transmission of the Klebsiella pneumoniae carbapenemase (KPC) gene, blaKPC, between bacteria isolated from a single institution. We applied a combination of Illumina and PacBio whole-genome sequencing to identify and compare K. quasipneumoniae from patients and the hospital environment over 10- and 5-year periods, respectively. There were 32 blaKPC-positive K. quasipneumoniae isolates, all of which were identified as K. pneumoniae in the clinical microbiology laboratory, from 8 patients and 11 sink drains, with evidence for seven separate blaKPC plasmid acquisitions. Analysis of a single subclade of K. quasipneumoniae subsp. quasipneumoniae (n = 23 isolates) from three patients and six rooms demonstrated seeding of a sink by a patient, subsequent persistence of the strain in the hospital environment, and then possible transmission to another patient. Longitudinal analysis of this strain demonstrated the acquisition of two unique blaKPC plasmids and then subsequent within-strain genetic rearrangement through transposition and homologous recombination. Our analysis highlights the apparent molecular propensity of K. quasipneumoniae to persist in the environment as well as acquire carbapenemase plasmids from other species and enabled an assessment of the genetic rearrangements which may facilitate horizontal transmission of carbapenemases.

Journal ArticleDOI
TL;DR: Eight potential new resistance-conferring single nucleotide polymorphisms (SNPs) were identified, potentially clinically important, as they all occurred in samples that were predicted to be inducibly resistant and for which a macrolide would therefore currently be indicated.
Abstract: Mycobacterium abscessus is emerging as an important pathogen in chronic lung diseases, with concern regarding patient-to-patient transmission. The recent introduction of routine whole-genome sequencing (WGS) as a replacement for existing reference techniques in England provides an opportunity to characterize the genetic determinants of resistance. We conducted a systematic review to catalogue all known resistance-determining mutations. This knowledge was used to construct a predictive algorithm based on mutations in the erm(41) and rrl genes which was tested on a collection of 203 sequentially acquired clinical isolates for which there were paired genotype/phenotype data. A search for novel resistance-determining mutations was conducted using a heuristic algorithm. The sensitivity of existing knowledge for predicting resistance in clarithromycin was 95% (95% confidence interval [CI], 89 to 98%), and the specificity was 66% (95% CI, 54 to 76%). The subspecies alone was a poor predictor of resistance to clarithromycin. Eight potential new resistance-conferring single nucleotide polymorphisms (SNPs) were identified. WGS demonstrated probable resistance-determining SNPs in regions that the NTM-DR line probe cannot detect. These mutations are potentially clinically important, as they all occurred in samples that were predicted to be inducibly resistant and for which a macrolide would therefore currently be indicated. We were unable to explain all resistance, raising the possibility of the involvement of other as yet unidentified genes.

Journal ArticleDOI
01 May 2019-Plasmid
TL;DR: From the evaluation of transconjugants created from the mating of three chromosomally isogenic Klebsiella pneumoniae carbapenemase positive Citrobacter freundii isolates, there is still much to learn about SCPs, and the high rate of co-transfer of multiple plasmids from real-world carbapenemase-producing Enterobacteriales.

Journal ArticleDOI
TL;DR: Trehalose supplementation did not increase ribotype-027 virulence in a clinically-validated gut model, and increases in total dietary trehalose during the early-mid 2000s C. difficile epidemic were likely relatively minimal.

Journal ArticleDOI
TL;DR: Even though the use of carbapenems in companion animals is restricted, the concurrent presence of blaCMY-42 and other antimicrobial resistance genes could lead to co-selection of carbapenemase genes in this population.
Abstract: Background/Objectives: Carbapenemase-producing Enterobacteriaceae (CPE) are a public health threat, and have been found in humans, animals and the environment. Carbapenems are not authorized for use in EU or UK companion animals, and the prevalence of carbapenem-resistant Gram-negative bacilli (CRGNB) in this population is unknown. Methods: We investigated CRGNB isolated from animal specimens received by one diagnostic laboratory from 34 UK veterinary practices (September 2015-December 2016). Any Gram-negative isolates from clinical specimens showing reduced susceptibility to fluoroquinolones and/or aminoglycosides and/or cephalosporins were investigated phenotypically and genotypically for carbapenemases. A complete genome assembly (Illumina/Nanopore) was generated for the single isolate identified to investigate the genetic context for carbapenem resistance. Results: One ST410 Escherichia coli isolate [(CARB35); 1/191, 0.5%], cultured from a wound in a springer spaniel, harboured a known carbapenem resistance gene (blaNDM-5). The gene was located in the chromosome on an integrated 100 kb IncF plasmid, also harbouring other drug resistance genes (mrx, sul1, ant1 and dfrA). The isolate also contained blaCMY-42 and blaTEM-190 on two separate plasmids (IncI1 and IncFII, respectively) that showed homology with other publicly available plasmid sequences from Italy and Myanmar. Conclusions: Even though the use of carbapenems in companion animals is restricted, the concurrent presence of blaCMY-42 and other antimicrobial resistance genes could lead to co-selection of carbapenemase genes in this population. Further studies investigating the selection and flow of plasmids carrying important resistance genes amongst humans and companion animals are needed.

Journal ArticleDOI
TL;DR: An end-to-end multi-task model with a deep denoising auto-encoder (DeepAMR) is proposed for multiple drug classification, together with DeepAMR_cluster, a clustering variant based on DeepAMR that captures lineage-related clusters in the latent space of the data.
Abstract: Motivation Resistance co-occurrence within first-line anti-tuberculosis (TB) drugs is a common phenomenon. Existing methods based on genetic data analysis of Mycobacterium tuberculosis (MTB) have been able to predict resistance of MTB to individual drugs, but have not considered the resistance co-occurrence and cannot capture latent structure of genomic data that corresponds to lineages. Results We used a large cohort of TB patients from 16 countries across six continents where whole-genome sequences for each isolate and associated phenotype to anti-TB drugs were obtained using drug susceptibility testing recommended by the World Health Organization. We then proposed an end-to-end multi-task model with deep denoising auto-encoder (DeepAMR) for multiple drug classification and developed DeepAMR_cluster, a clustering variant based on DeepAMR, for learning clusters in latent space of the data. The results showed that DeepAMR outperformed baseline model and four machine learning models with mean AUROC from 94.4% to 98.7% for predicting resistance to four first-line drugs [i.e. isoniazid (INH), ethambutol (EMB), rifampicin (RIF), pyrazinamide (PZA)], multi-drug resistant TB (MDR-TB) and pan-susceptible TB (PANS-TB: MTB that is susceptible to all four first-line anti-TB drugs). In the case of INH, EMB, PZA and MDR-TB, DeepAMR achieved its best mean sensitivity of 94.3%, 91.5%, 87.3% and 96.3%, respectively. While in the case of RIF and PANS-TB, it generated 94.2% and 92.2% sensitivity, which were lower than baseline model by 0.7% and 1.9%, respectively. t-SNE visualization shows that DeepAMR_cluster captures lineage-related clusters in the latent space. Availability and implementation The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. Supplementary information Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: Endocarditis was used as a clinically relevant case study to investigate the relationship between clinical cases and diagnostic codes, to understand discrepancies and to improve design of future studies, and commonly used diagnostic codes in studies of endocarditis had good predictive ability.
Abstract: Diagnostic codes from electronic health records are widely used to assess patterns of disease. Infective endocarditis is an uncommon but serious infection, with objective diagnostic criteria. Electronic health records have been used to explore the impact of changing guidance on antibiotic prophylaxis for dental procedures on incidence, but limited data on the accuracy of the diagnostic codes exists. Endocarditis was used as a clinically relevant case study to investigate the relationship between clinical cases and diagnostic codes, to understand discrepancies and to improve design of future studies. Electronic health record data from two UK tertiary care centres were linked with data from a prospectively collected clinical endocarditis service database (Leeds Teaching Hospital) or retrospective clinical audit and microbiology laboratory blood culture results (Oxford University Hospitals Trust). The relationship between diagnostic codes for endocarditis and confirmed clinical cases according to the objective Duke criteria was assessed, and impact on estimations of disease incidence and trends. In Leeds 2006–2016, 738/1681(44%) admissions containing any endocarditis code represented a definite/possible case, whilst 263/1001(24%) definite/possible endocarditis cases had no endocarditis code assigned. In Oxford 2010–2016, 307/552(56%) reviewed endocarditis-coded admissions represented a clinical case. Diagnostic codes used by most endocarditis studies had good positive predictive value (PPV) but low sensitivity (e.g. I33-primary 82% and 43% respectively); one (I38-secondary) had PPV under 6%. Estimating endocarditis incidence using raw admission data overestimated incidence trends twofold. Removing records with non-specific codes, very short stays and readmissions improved predictive ability. Estimating incidence of streptococcal endocarditis using secondary codes also overestimated increases in incidence over time. 
Reasons for discrepancies included changes in coding behaviour over time, and coding guidance allowing assignment of a code mentioning ‘endocarditis’ where endocarditis was never mentioned in the clinical notes. Commonly used diagnostic codes in studies of endocarditis had good predictive ability. Other apparently plausible codes were poorly predictive. Use of diagnostic codes without examining sensitivity and predictive ability can give inaccurate estimations of incidence and trends. Similar considerations may apply to other diseases. Health record studies require validation of diagnostic codes and careful data curation to minimise risk of serious errors.
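
The two measures this abstract relies on can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the PPV inputs are the admission counts quoted in the abstract, and the sensitivity inputs are made-up counts used only to show the calculation.

```python
def positive_predictive_value(confirmed_coded: int, all_coded: int) -> float:
    """Share of coded admissions that were confirmed clinical cases."""
    if all_coded <= 0:
        raise ValueError("all_coded must be positive")
    return confirmed_coded / all_coded


def sensitivity(confirmed_coded: int, all_confirmed: int) -> float:
    """Share of confirmed clinical cases that carried the code."""
    if all_confirmed <= 0:
        raise ValueError("all_confirmed must be positive")
    return confirmed_coded / all_confirmed


# Counts quoted in the abstract (any endocarditis code).
leeds_ppv = positive_predictive_value(738, 1681)
oxford_ppv = positive_predictive_value(307, 552)
print(f"Leeds PPV:  {leeds_ppv:.0%}")
print(f"Oxford PPV: {oxford_ppv:.0%}")

# Hypothetical counts, for illustration only: 43 of 100 confirmed cases coded.
example_sensitivity = sensitivity(43, 100)
print(f"Example sensitivity: {example_sensitivity:.0%}")
```

A code can score well on one measure and poorly on the other, which is why the abstract reports both for I33-primary.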

Journal ArticleDOI
TL;DR: FosA influences the inaccuracy of susceptibility testing by methods readily available in a clinical laboratory compared to agar dilution, and further research is needed to determine the impact of fosA on clinical outcomes.
Abstract: With multidrug-resistant (MDR) Enterobacterales on the rise, a nontoxic antimicrobial agent with a unique mechanism of action such as fosfomycin seems attractive. However, establishing accurate fosfomycin susceptibility testing for non-Escherichia coli isolates in a clinical microbiology laboratory remains problematic. We evaluated fosfomycin susceptibility by multiple methods with 96 KPC-producing clinical isolates of multiple strains and species collected at a single center between 2008 and 2016. In addition, we assessed the presence of fosfomycin resistance genes from whole-genome sequencing (WGS) data using NCBI's AMRFinder and custom HMM search. Susceptibility testing was performed using a glucose-6-phosphate-supplemented fosfomycin Etest and Kirby-Bauer disk diffusion (DD) assays, and the results were compared to those obtained by agar dilution. Clinical and Laboratory Standards Institute (CLSI) breakpoints for E. coli were applied for interpretation. Overall, 63% (60/96) of isolates were susceptible by Etest, 70% (67/96) by DD, and 88% (84/96) by agar dilution. fosA was detected in 80% (70/88) of previously sequenced isolates, with species-specific associations and alleles, and fosA-positive isolates were associated with higher MIC distributions. Disk potentiation testing was performed using sodium phosphonoformate (PPF) to inhibit fosA and showed significant increases in the zone diameter of DD testing for isolates that were fosA positive compared to those that were fosA negative. The addition of PPF corrected 10/14 (71%) major errors in categorical agreement with agar dilution. Our results indicate that fosA influences the inaccuracy of susceptibility testing by methods readily available in a clinical laboratory compared to agar dilution. Further research is needed to determine the impact of fosA on clinical outcomes.

Journal ArticleDOI
TL;DR: Hash-cgMLST as mentioned in this paper is a refinement to core genome multilocus sequence typing in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters.
Abstract: Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 ( 2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
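
The core idea, replacing centrally numbered alleles with a hash computed from the allele sequence itself, can be illustrated with a short Python sketch. This is an assumption-laden toy, not the published implementation: the hash function (SHA-256), the 8-character truncation, the function names and the example sequences are all hypothetical choices made for illustration.

```python
import hashlib


def allele_hash(sequence: str, length: int = 8) -> str:
    """Turn an allele sequence into a short, reproducible string.

    SHA-256 and the truncation length are illustrative assumptions.
    """
    normalised = sequence.strip().upper()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()[:length]


def hash_profile(alleles: dict[str, str]) -> dict[str, str]:
    """Map each gene name to the hash of its allele sequence."""
    return {gene: allele_hash(seq) for gene, seq in alleles.items()}


def gene_differences(profile_a: dict[str, str], profile_b: dict[str, str]) -> int:
    """Count genes present in both profiles whose hashes differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(profile_a[gene] != profile_b[gene] for gene in shared)


# Toy sequences (hypothetical); real cgMLST schemes cover thousands of genes.
isolate_a = hash_profile({"gene1": "ATGAAA", "gene2": "ATGCCC", "gene3": "ATGTTT"})
isolate_b = hash_profile({"gene1": "ATGAAA", "gene2": "ATGCCG", "gene3": "ATGTTT"})
differences = gene_differences(isolate_a, isolate_b)
print(differences)
```

Because the hash depends only on the sequence, two laboratories that hash the same allele get the same string without consulting a shared allele database.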

Posted ContentDOI
11 Jan 2019-bioRxiv
TL;DR: An approach to improve the sensitivity/specificity of pyrazinamide resistance prediction in genetics-based clinical microbiology workflows, highlights novel mutations for future biochemical investigation, and is a proof of concept for using this approach in other drugs such as bedaquiline.
Abstract: Pyrazinamide is one of four first-line antibiotics currently used to treat tuberculosis and has been included in newer treatment regimens undergoing clinical trials due to its unique sterilizing effects and synergy with newer drugs. However, phenotypic antibiotic susceptibility testing for pyrazinamide is problematic. Resistance to pyrazinamide is primarily driven by genetic variation in pncA, which encodes PncA, an enzyme that converts pyrazinamide into its active form. We curated a derivation dataset of 291 non-redundant, missense amino acid mutations in PncA with associated high-confidence phenotypes from studies of clinical isolates and in vitro/in vivo screening studies and then trained machine learning models to predict pyrazinamide resistance based on sequence- and structure-based features of each missense mutation. The clinical relevance of the models was tested by predicting the binary resistance phenotype of 2,292 clinical isolates harboring missense mutations in PncA to pyrazinamide. The probabilities of resistance predicted by the model were also compared with in vitro pyrazinamide minimum inhibitory concentrations of 27 isolates to determine whether the machine learning model could predict the degree of resistance. Finally, we predicted the effect on pyrazinamide resistance of the remaining 814 possible missense mutations caused by single nucleotide polymorphisms in PncA that have not yet been observed in public databases. Overall, this work offers an approach to improve the sensitivity and specificity of pyrazinamide resistance prediction in genetics-based clinical microbiology workflows for tuberculosis, highlights novel mutations for future biochemical investigation, and is a proof of concept for using this approach in other drugs such as bedaquiline.

Posted ContentDOI
31 May 2019-bioRxiv
TL;DR: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity, and when reads were aligned to the same genome from which they were sequenced, among the highest performing pipelines was Novoalign/GATK.
Abstract: Background: Accurately identifying SNPs from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 41 SNP calling pipelines using simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally-sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia and Klebsiella. Results: We evaluated the performance of 41 SNP calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic, bacteria such as Escherichia coli, but less dominant for clonal species such as Mycobacterium tuberculosis. Conclusions: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest performing pipelines was Novoalign/GATK. However, across the full range of (divergent) genomes, among the consistently highest-performing pipelines was Snippy.
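
Sensitivity and precision, the two pipeline metrics named in this abstract, can be computed from a truth set and a call set as sketched below. This is a minimal illustration under stated assumptions: the function name, the representation of a SNP as a (position, reference base, alternate base) tuple, and the example data are hypothetical and not taken from the study.

```python
Snp = tuple[int, str, str]  # (position, reference base, alternate base)


def evaluate_calls(true_snps: set[Snp], called_snps: set[Snp]) -> tuple[float, float]:
    """Return (sensitivity, precision) of a call set against a truth set."""
    true_positives = len(true_snps & called_snps)
    sensitivity = true_positives / len(true_snps) if true_snps else 0.0
    precision = true_positives / len(called_snps) if called_snps else 0.0
    return sensitivity, precision


# Hypothetical truth set (e.g. from simulation) and pipeline output.
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "A"), (900, "T", "C")}
calls = {(100, "A", "G"), (250, "C", "T"), (777, "A", "C")}

sens, prec = evaluate_calls(truth, calls)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Here two of four true SNPs are recovered and one of three calls is spurious, so the two metrics move independently of each other.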

Journal ArticleDOI
TL;DR: CPE exposure can lead to colonization, clonal expansion and resistance gene transfer within intact human colonic microbiota, emphasizing the need to control exposure to antimicrobials.
Abstract: Background: Carbapenemase-producing Enterobacteriaceae (CPE) pose a major global health risk. Mobile genetic elements account for much of the increasing CPE burden. Objectives: To investigate CPE colonization and the impact of antibiotic exposure on subsequent resistance gene dissemination within the gut microbiota using a model to simulate the human colon. Methods: Gut models seeded with CPE-negative human faeces [screened with BioMerieux chromID® CARBA-SMART (Carba-Smart), Cepheid Xpert® Carba-R assay (XCR)] were inoculated with distinct carbapenemase-producing Klebsiella pneumoniae strains (KPC, NDM) and challenged with imipenem or piperacillin/tazobactam then meropenem. Resistant populations were enumerated daily on selective agars (Carba-Smart); CPE genes were confirmed by PCR (XCR, Check-Direct CPE Screen for BD MAX™). CPE gene dissemination was tracked using PacBio long-read sequencing. Results: CPE populations increased during inoculation, plateauing at ∼105 log10 cfu/mL in both models and persisting throughout the experiments (>65 days), with no evidence of CPE ‘washout’. After antibiotic administration, there was evidence of interspecies plasmid transfer of blaKPC-2 (111 742 bp IncFII/IncR plasmid, 99% identity to pKpQIL-D2) and blaNDM-1 (∼170 kb IncFIB/IncFII plasmid), and CPE populations rose from 45% of the total lactose-fermenting populations in the KPC model. Isolation of a blaNDM-1 K. pneumoniae with one chromosomal single-nucleotide variant compared with the inoculated strain indicated clonal expansion within the model. Antibiotic administration exposed a previously undetected K. pneumoniae encoding blaOXA-232 (KPC model). Conclusions: CPE exposure can lead to colonization, clonal expansion and resistance gene transfer within intact human colonic microbiota. Furthermore, under antibiotic selective pressure, new resistant populations emerge, emphasizing the need to control exposure to antimicrobials.

Journal ArticleDOI
TL;DR: Gastrointestinal ESC-R-EC/KP colonisation is widespread in Cambodian children/adolescents; hospital admission and intestinal parasites are independent risk factors.
Abstract: Extended-spectrum cephalosporin resistance (ESC-R) in Escherichia coli and Klebsiella pneumoniae is a healthcare threat; high gastrointestinal carriage rates are reported from South-east Asia. Colonisation prevalence data in Cambodia are lacking. The aim of this study was to determine gastrointestinal colonisation prevalence of ESC-resistant E. coli (ESC-R-EC) and K. pneumoniae (ESC-R-KP) in Cambodian children/adolescents and associated socio-demographic risk factors; and to characterise relevant resistance genes, their genetic contexts, and the genetic relatedness of ESC-R strains using whole genome sequencing (WGS). Faeces and questionnaire data were obtained from individuals < 16 years in north-western Cambodia, 2012. WGS of cultured ESC-R-EC/KP was performed (Illumina). Maximum likelihood phylogenies were used to characterise relatedness of isolates; ESC-R-associated resistance genes and their genetic contexts were identified from de novo assemblies using BLASTn and automated/manual annotation. 82/148 (55%) of children/adolescents were ESC-R-EC/KP colonised; 12/148 (8%) were co-colonised with both species. Independent risk factors for colonisation were hospitalisation (OR: 3.12, 95% CI [1.52–6.38]) and intestinal parasites (OR: 3.11 [1.29–7.51]); school attendance conferred decreased risk (OR: 0.44 [0.21–0.92]). ESC-R strains were diverse; the commonest ESC-R mechanisms were blaCTX-M 1 and 9 sub-family variants. Structures flanking these genes were highly variable, and for blaCTX-M-15, -55 and -27 frequently involved IS26. Chromosomal blaCTX-M integration was common in E. coli. Gastrointestinal ESC-R-EC/KP colonisation is widespread in Cambodian children/adolescents; hospital admission and intestinal parasites are independent risk factors. The genetic contexts of blaCTX-M are highly mosaic, consistent with rapid horizontal exchange. Chromosomal integration of blaCTX-M may result in stable propagation in these community-associated pathogens.

Posted ContentDOI
19 Jun 2019-bioRxiv
TL;DR: While further optimisation is required to improve sensitivity, this approach shows promise for the Nanopore platform to be used in the diagnosis and genetic analysis of influenza and other respiratory viruses.
Abstract: Influenza is a major global public health threat as a result of its highly pathogenic variants, large zoonotic reservoir, and pandemic potential. Metagenomic viral sequencing offers the potential of a diagnostic test for influenza which also provides insights on transmission, evolution and drug resistance, and simultaneously detects other viruses. We therefore set out to apply Oxford Nanopore Technology to metagenomic sequencing of respiratory samples. We generated influenza reads down to a limit of detection of 10²-10³ genome copies/ml in pooled samples, observing a strong relationship between the viral titre and the proportion of influenza reads (p = 4.7×10⁻⁵). Applying our methods to clinical throat swabs, we generated influenza reads for 27/27 samples with high-to-mid viral titres (Cycle threshold (Ct) values 99% complete sequence for all eight gene segments. We also detected Human Coronavirus and generated a near complete Human Metapneumovirus genome from clinical samples. While further optimisation is required to improve sensitivity, this approach shows promise for the Nanopore platform to be used in the diagnosis and genetic analysis of influenza and other respiratory viruses.

Journal ArticleDOI
11 Jul 2019-Trials
TL;DR: ARK-Hospital aims to provide a feasible, sustainable and generalisable mechanism for increasing antibiotic stopping in patients who no longer need to receive them at ‘review and revise’.
Abstract: To ensure patients continue to get early access to antibiotics at admission, while also safely reducing antibiotic use in hospitals, one needs to target the continued need for antibiotics as more diagnostic information becomes available. UK Department of Health guidance promotes an initiative called ‘Start Smart then Focus’: early effective antibiotics followed by active ‘review and revision’ 24–72 h later. However in 2017, < 10% of antibiotic prescriptions were discontinued at review, despite studies suggesting that 20–30% of prescriptions could be stopped safely. Antibiotic Review Kit for Hospitals (ARK-Hospital) is a complex ‘review and revise’ behavioural intervention targeting healthcare professionals involved in antibiotic prescribing or administration in inpatients admitted to acute/general medicine (the largest consumers of non-prophylactic antibiotics in hospitals). The primary study objective is to evaluate whether ARK-Hospital can safely reduce the total antibiotic burden in acute/general medical inpatients by at least 15%. The primary hypotheses are therefore that the introduction of the behavioural intervention will be non-inferior in terms of 30-day mortality post-admission (relative margin 5%) for an acute/general medical inpatient, and superior in terms of defined daily doses of antibiotics per acute/general medical admission (co-primary outcomes). The unit of observation is a hospital organisation, a single hospital or group of hospitals organised with one executive board and governance framework (National Health Service trusts in England; health boards in Northern Ireland, Wales and Scotland). The study comprises a feasibility study in one organisation (phase I), an internal pilot trial in three organisations (phase II) and a cluster (organisation)-randomised stepped-wedge trial (phase III) targeting a minimum of 36 organisations in total. 
Randomisation will occur over 18 months from November 2017 with a further 12 months follow-up to assess sustainability. The behavioural intervention will be delivered to healthcare professionals involved in antibiotic prescribing or administration in adult inpatients admitted to acute/general medicine. Outcomes will be assessed in adult inpatients admitted to acute/general medicine, collected through routine electronic health records in all patients. ARK-Hospital aims to provide a feasible, sustainable and generalisable mechanism for increasing antibiotic stopping in patients who no longer need to receive them at ‘review and revise’. ISRCTN Current Controlled Trials, ISRCTN12674243. Registered on 10 April 2017.

Journal ArticleDOI
TL;DR: High sensitivity but poor specificity of mutations identified in a literature search is demonstrated in Mycobacterium abscessus isolates evaluating the ability of whole genome sequencing to predict clarithromycin resistance.
Abstract: In our recent study of 203 sequential isolates evaluating the ability of whole-genome sequencing (WGS) to predict clarithromycin resistance in Mycobacterium abscessus (1), we demonstrated high sensitivity but poor specificity of mutations identified in a literature search. Most of the

Posted ContentDOI
31 Mar 2019-bioRxiv
TL;DR: Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified using the other methods, and particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database.
Abstract: Background: Shotgun metagenomics is increasingly used to characterise microbial communities, particularly for the investigation of antimicrobial resistance (AMR) in different animal and environmental contexts. There are many different approaches for inferring the taxonomic composition and AMR gene content of complex community samples from shotgun metagenomic data, but there has been little work establishing the optimum sequencing depth, data processing and analysis methods for these samples. In this study we used shotgun metagenomics and sequencing of cultured isolates from the same samples to address these issues. We sampled three potential environmental AMR gene reservoirs (pig caeca, river sediment, effluent) and sequenced samples with shotgun metagenomics at high depth (~200 million reads per sample). Alongside this, we cultured single-colony isolates of Enterobacteriaceae from the same samples and used hybrid sequencing (short- and long-reads) to create high-quality assemblies for comparison to the metagenomic data. To automate data processing, we developed an open-source software pipeline, ResPipe. Results: Taxonomic profiling was much more stable to sequencing depth than AMR gene content. 1 million reads per sample was sufficient to achieve <1% dissimilarity to the full taxonomic composition. However, at least 80 million reads per sample were required to recover the full richness of different AMR gene families present in the sample, and additional allelic diversity of AMR genes was still being discovered in effluent at 200 million reads per sample. Normalising the number of reads mapping to AMR genes using gene length and an exogenous spike of Thermus thermophilus DNA substantially changed the estimated gene abundance distributions. 
While the majority of genomic content from cultured isolates from effluent was recoverable using shotgun metagenomics, this was not the case for pig caeca or river sediment. Conclusions: Sequencing depth and profiling method can critically affect the profiling of polymicrobial animal and environmental samples with shotgun metagenomics. Both sequencing of cultured isolates and shotgun metagenomics can recover substantial diversity that is not identified using the other methods. Particular consideration is required when inferring AMR gene content or presence by mapping metagenomic reads to a database. ResPipe, the open-source software pipeline we have developed, is freely available (https://gitlab.com/hsgweon/ResPipe).
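
To see why normalising by gene length and a spike-in can change abundance rankings, consider the sketch below. The formula is one plausible scheme (reads per kilobase of gene, divided by reads assigned to the spike), assumed here for illustration only; it is not taken from ResPipe, and the function name, gene labels and read counts are hypothetical.

```python
def normalised_abundance(gene_reads: int, gene_length_bp: int, spike_reads: int) -> float:
    """Reads per kilobase of gene, scaled by reads mapping to the spike-in.

    Illustrative formula only; the published pipeline may differ.
    """
    if gene_length_bp <= 0 or spike_reads <= 0:
        raise ValueError("gene_length_bp and spike_reads must be positive")
    reads_per_kb = gene_reads / (gene_length_bp / 1000)
    return reads_per_kb / spike_reads


# Hypothetical counts from one sample with 10,000 spike-in reads.
spike = 10_000
gene_a = normalised_abundance(gene_reads=1000, gene_length_bp=2000, spike_reads=spike)
gene_b = normalised_abundance(gene_reads=800, gene_length_bp=800, spike_reads=spike)
print(gene_a, gene_b)
```

In this toy case gene A has more raw reads than gene B, yet gene B is the more abundant of the two once its shorter length is taken into account.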

Journal ArticleDOI
08 Nov 2019-PLOS ONE
TL;DR: A hybrid protocol to improve detection of resistance genes in Enterobacteriaceae by using a short period of culture enrichment prior to sequencing of DNA extracted directly from the enriched sample is developed.
Abstract: Metagenomic sequencing of fecal DNA can usefully characterise an individual’s intestinal resistome but is limited by its inability to detect important pathogens that may be present at low abundance, such as carbapenemase or extended-spectrum beta-lactamase producing Enterobacteriaceae. Here we aimed to develop a hybrid protocol to improve detection of resistance genes in Enterobacteriaceae by using a short period of culture enrichment prior to sequencing of DNA extracted directly from the enriched sample. Volunteer feces were spiked with carbapenemase-producing Enterobacteriaceae and incubated in selective broth culture for 6 hours before sequencing. Different DNA extraction methods were compared, including a plasmid extraction protocol to increase the detection of plasmid-associated resistance genes. Although enrichment prior to sequencing increased the detection of carbapenemase genes, the differing growth characteristics of the spike organisms precluded accurate quantification of their concentration prior to culture. Plasmid extraction increased detection of resistance genes present on plasmids, but the effects were heterogeneous and dependent on plasmid size. Our results demonstrate methods of improving the limit of detection of selected resistance mechanisms in a fecal resistome assay, but they also highlight the difficulties in using these techniques for accurate quantification and should inform future efforts to achieve this goal.

Posted ContentDOI
03 Oct 2019-bioRxiv
TL;DR: Large-scale whole genome sequencing over a five-year period in the UK is used to highlight the complexity of genetic structures facilitating the spread of an important carbapenem resistance gene (blaKPC) amongst a number of bacterial species that cause disease in humans.
Abstract: Carbapenem resistance in Enterobacterales is a public health threat. Klebsiella pneumoniae carbapenemase (encoded by alleles of the blaKPC family) is one of the commonest transmissible carbapenem resistance mechanisms worldwide. The dissemination of blaKPC has historically been associated with distinct K. pneumoniae lineages (clonal group 258 [CG258]), a particular plasmid family (pKpQIL), and a composite transposon (Tn4401). In the UK, blaKPC has caused a large-scale, persistent outbreak focused on hospitals in North-West England. This outbreak has evolved to be polyclonal and poly-species, but the genetic mechanisms underpinning this evolution have not been elucidated in detail; this study used short-read whole genome sequencing of 604 blaKPC-positive isolates (Illumina) and long-read assembly (PacBio)/polishing (Illumina) of 21 isolates for characterisation. We observed the dissemination of blaKPC (predominantly blaKPC-2; 573/604 [95%] isolates) across eight species and more than 100 known sequence types. Although there was some variation at the transposon level (mostly Tn4401a, 584/604 (97%) isolates; predominantly with ATTGA-ATTGA target site duplications, 465/604 [77%] isolates), blaKPC spread appears to have been supported by highly fluid, modular exchange of larger genetic segments amongst plasmid populations dominated by IncFIB (580/604 isolates), IncFII (545/604 isolates) and IncR replicons (252/604 isolates). The subset of reconstructed plasmid sequences also highlighted modular exchange amongst non-blaKPC and blaKPC plasmids, and the common presence of multiple replicons within blaKPC plasmid structures (>60%). The substantial genomic plasticity observed has important implications for our understanding of the epidemiology of transmissible carbapenem resistance in Enterobacterales, for the implementation of adequate surveillance approaches, and for control. 
IMPORTANCE Antimicrobial resistance is a major threat to the management of infections, and resistance to carbapenems, one of the “last line” antibiotics available for managing drug-resistant infections, is a significant problem. This study used large-scale whole genome sequencing over a five-year period in the UK to highlight the complexity of genetic structures facilitating the spread of an important carbapenem resistance gene (blaKPC) amongst a number of bacterial species that cause disease in humans. In contrast to a recent pan-European study from 2012-2013(1), which demonstrated the major role of spread of clonal blaKPC-Klebsiella pneumoniae lineages in continental Europe, our study highlights the substantial plasticity in genetic mechanisms underpinning the dissemination of blaKPC. This genetic flux has important implications for: the surveillance of drug resistance (i.e. making surveillance more difficult); detection of outbreaks and tracking hospital transmission; generalizability of surveillance findings over time and for different regions; and for the implementation and evaluation of control interventions.

Journal ArticleDOI
TL;DR: Levels of transmission detected by WGS were comparable to previously described rates in endemic settings; other explanations, such as variations in antimicrobial use, are required to explain the high levels of CDI.
Abstract: OBJECTIVES Rates of Clostridioides (Clostridium) difficile infection (CDI) are higher in North Wales than elsewhere in the UK. We used WGS to investigate if this is due to increased healthcare-associated transmission from other cases. METHODS Healthcare and community C. difficile isolates from patients across North Wales (February-July 2015) from glutamate dehydrogenase (GDH)-positive faecal samples underwent WGS. Data from patient records, hospital management systems and national antimicrobial use surveillance were used. RESULTS Of the 499 GDH-positive samples, 338 (68%) were sequenced and 299 distinct infections/colonizations were identified, 229/299 (77%) with toxin genes. Only 39/229 (17%) toxigenic isolates were related within ≤2 SNPs to ≥1 infections/colonizations from a previously sampled patient, i.e. demonstrated evidence of possible transmission. Independent predictors of possible transmission included healthcare exposure in the last 12 weeks (P = 0.002, with rates varying by hospital), infection with MLST types ST-1 (ribotype 027) and ST-11 (predominantly ribotype 078) compared with all other toxigenic STs (P < 0.001), and cephalosporin exposure in the potential transmission recipient (P = 0.02). Adjusting for all these factors, there was no additional effect of ward workload (P = 0.54) or failure to meet cleaning targets (P = 0.25). Use of antimicrobials is higher in North Wales compared with England and the rest of Wales. CONCLUSIONS Levels of transmission detected by WGS were comparable to previously described rates in endemic settings; other explanations, such as variations in antimicrobial use, are required to explain the high levels of CDI. Cephalosporins are a risk factor for infection with C. difficile from another infected or colonized case.
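
The definition of possible transmission used in this abstract (an isolate within ≤2 SNPs of at least one isolate from a previously sampled patient) can be expressed as a short Python sketch. The data structures, function name, patient labels, dates and distances below are hypothetical; only the ≤2 SNP threshold and the "previously sampled" condition come from the abstract.

```python
from datetime import date


def possible_transmissions(
    sample_dates: dict[str, date],
    snp_distance: dict[frozenset, int],
    threshold: int = 2,
) -> set[str]:
    """Patients whose isolate is within `threshold` SNPs of an earlier-sampled patient."""
    linked = set()
    for recipient, recipient_date in sample_dates.items():
        for donor, donor_date in sample_dates.items():
            if donor == recipient or donor_date >= recipient_date:
                continue  # only consider patients sampled earlier
            distance = snp_distance.get(frozenset((donor, recipient)))
            if distance is not None and distance <= threshold:
                linked.add(recipient)
                break
    return linked


# Hypothetical patients and pairwise SNP distances.
dates = {"P1": date(2015, 2, 10), "P2": date(2015, 3, 1), "P3": date(2015, 4, 15)}
distances = {
    frozenset(("P1", "P2")): 1,
    frozenset(("P1", "P3")): 30,
    frozenset(("P2", "P3")): 12,
}
linked_patients = possible_transmissions(dates, distances)
print(linked_patients)
```

Only P2 is flagged: its isolate is 1 SNP from P1, who was sampled earlier, whereas P3 is more than 2 SNPs from both earlier patients.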