scispace - formally typeset
Author

Szymon T. Calus

Other affiliations: University of Birmingham
Bio: Szymon T. Calus is an academic researcher at the University of Glasgow. The author has contributed to research in the topics of nanopore sequencing and deep sequencing. The author has an h-index of 10 and has co-authored 15 publications receiving 2,690 citations. Previous affiliations of Szymon T. Calus include the University of Birmingham.

Papers
Journal Article
TL;DR: It is demonstrated that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass.
Abstract: The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture-independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. These results suggest that caution should be exercised when applying sequence-based techniques to the study of microbiota present in low-biomass environments. Concurrent sequencing of negative control samples is strongly advised.
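One way to put the advice about concurrent negative controls into practice is a simple prevalence comparison. A genus gets flagged when it appears in at least as large a fraction of negative controls as of biological samples. The sketch below is a hypothetical heuristic for illustration only. It is not the procedure or the contaminant list from the paper, and the function names, genus names and read counts are all invented assumptions.

```python
from typing import Dict, List, Set

# Each sample is a mapping genus -> read count (hypothetical data layout).
Sample = Dict[str, int]


def prevalence(samples: List[Sample], genus: str) -> float:
    """Fraction of samples in which `genus` has a non-zero read count."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s.get(genus, 0) > 0) / len(samples)


def flag_contaminants(biological: List[Sample], negatives: List[Sample]) -> Set[str]:
    """Flag genera seen in negative controls at least as often as in real samples.

    Illustrative heuristic only (an assumption, not the paper's method).
    """
    genera = {g for s in biological + negatives for g in s}
    return {
        g
        for g in genera
        if prevalence(negatives, g) > 0
        and prevalence(negatives, g) >= prevalence(biological, g)
    }


# Hypothetical counts: low-biomass samples plus two extraction blanks.
biological = [
    {"Bacteroides": 500, "Ralstonia": 3},
    {"Bacteroides": 420},
    {"Bacteroides": 610, "Bradyrhizobium": 2},
]
negatives = [
    {"Ralstonia": 40, "Bradyrhizobium": 12},
    {"Ralstonia": 35},
]

print(sorted(flag_contaminants(biological, negatives)))  # ['Bradyrhizobium', 'Ralstonia']
```

A flagged genus is a candidate for closer inspection rather than automatic removal. In a genuinely low-biomass sample, true community members and reagent contaminants can overlap.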

2,459 citations

Journal Article
TL;DR: This work investigated the potential of a newly released sequencing technology, the MinION from Oxford Nanopore Technologies, in the management of a hospital outbreak of Salmonella, and demonstrated that rapid MiSeq sequencing can reduce the time to answer compared to the standard sequencing protocol with no impact on the results.
Abstract: Foodborne outbreaks of Salmonella remain a pressing public health concern. We recently detected a large outbreak of Salmonella enterica serovar Enteritidis phage type 14b affecting more than 30 patients in our hospital. This outbreak was linked to community, national and European-wide cases. Hospital patients with Salmonella are at high risk, and require a rapid response. We initially investigated this outbreak by whole-genome sequencing using a novel rapid protocol on the Illumina MiSeq; we then integrated these data with whole-genome data from surveillance sequencing, thereby placing the outbreak in a national context. Additionally, we investigated the potential of a newly released sequencing technology, the MinION from Oxford Nanopore Technologies, in the management of a hospital outbreak of Salmonella. We demonstrate that rapid MiSeq sequencing can reduce the time to answer compared to the standard sequencing protocol with no impact on the results. We show, for the first time, that the MinION can acquire clinically relevant information in real time and within minutes of a DNA library being loaded. MinION sequencing permits confident assignment to species level within 20 min. Using a novel streaming phylogenetic placement method, samples can be assigned to a serotype in 40 min and determined to be part of the outbreak in less than 2 h. Both approaches yielded reliable and actionable clinical information on the Salmonella outbreak in less than half a day. The rapid availability of such information may facilitate more informed epidemiological investigations and influence infection control practices.

278 citations

Journal Article
TL;DR: Disease improvement following treatment with exclusive enteral nutrition (EEN) is associated with extensive modulation of the gut microbiome, and the microbiota of Crohn's disease (CD) patients had a broader functional capacity than that of healthy controls, but diversity decreased with EEN.

209 citations

Journal Article
TL;DR: Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware base calling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
Abstract: Background: Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences sequencing platforms overcome this limitation, their application has been limited due to higher error rates or lower data output. Results: In this study, we introduce an amplicon sequencing workflow, NanoAmpli-Seq, that builds on the intramolecular-ligated nanopore consensus sequencing (INC-Seq) approach, and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the INC-Seq protocol that reduce sample processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain accurate full-length 16S rRNA gene sequences. Conclusions: NanoAmpli-Seq accurately estimates the diversity of tested mock communities with an average consensus sequence accuracy of 99.5% for 2D and 1D² sequencing on the nanopore sequencing platform. Nearly all residual errors in NanoAmpli-Seq sequences originate from deletions in homopolymer regions, indicating that homopolymer-aware base calling or error correction may allow for sequencing accuracy comparable to short-read sequencing platforms.
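Why consensus calling raises accuracy, and why homopolymer deletions are the errors that survive, can be seen in a deliberately simplified sketch. It takes a column-wise majority vote over reads that are assumed to be already aligned to equal length, with "-" marking a deleted base. This is not the nanoClust or chopSeq implementation. The function names, the gap character, the minimum run length and the toy reads are assumptions. The helper `homopolymer_runs` locates runs of a single base, which are the regions where the residual errors were reported to concentrate.

```python
import re
from collections import Counter
from typing import List, Tuple


def majority_consensus(aligned_reads: List[str], gap: str = "-") -> str:
    """Column-wise majority vote over pre-aligned, equal-length reads.

    Columns whose most common symbol is the gap are dropped from the output.
    Ties resolve to the symbol seen first in that column.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (start, run) for every single-base run of at least `min_len`."""
    return [
        (m.start(), m.group())
        for m in re.finditer(r"A+|C+|G+|T+", seq)
        if len(m.group()) >= min_len
    ]


# Toy reads: two carry a deletion inside the GGGG homopolymer, one a substitution.
reads = [
    "ACGGGGTA",
    "ACGGG-TA",
    "ACGGGGTA",
    "AC-GGGTA",
    "ACGGGGTT",
]

print(majority_consensus(reads))        # ACGGGGTA
print(homopolymer_runs("ACGGGGTA"))     # [(2, 'GGGG')]
```

With independent errors, the vote outvotes scattered mistakes. A systematic bias, such as the same homopolymer being shortened in most reads, would survive the vote, which is consistent with the abstract's suggestion that homopolymer-aware base calling is the remaining lever.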

79 citations

Posted Content
16 Jul 2014, bioRxiv
TL;DR: It is demonstrated that contaminating DNA is ubiquitous in commonly used DNA extraction kits, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass.
Abstract: The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture-independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. These results suggest that caution should be exercised when applying sequence-based techniques to the study of microbiota present in low-biomass environments. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. Concurrent sequencing of negative control samples is strongly advised.

69 citations


Cited by
Journal Article
TL;DR: These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
Abstract: Since the completion of the Human Genome Project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, driving these sequencing technologies toward even greater advances. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

3,096 citations

Journal Article
01 Nov 2017, Nature
TL;DR: A meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project is presented, creating both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth’s microbial diversity.
Abstract: Our growing awareness of the microbial world’s importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth’s microbial diversity.

1,676 citations

Journal Article
TL;DR: Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
Abstract: We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
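The headline numbers can be reproduced with a little arithmetic. Theoretical coverage is the total number of sequenced bases divided by the genome size. Assuming a haploid human genome of about 3.1 Gb (an assumption, since the abstract does not state the value used), 91.2 Gb gives roughly 29.4×, consistent with the reported ∼30×. N50 is the length L such that contigs of length L or longer contain at least half of the assembly's bases. NG50 uses half of the genome size instead of half of the assembly length as the target. The sketch below uses hypothetical function names and toy contig lengths.

```python
from typing import Iterable, Optional


def theoretical_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth if all sequenced bases were spread evenly over the genome."""
    return total_bases / genome_size


def nx50(lengths: Iterable[int], reference_size: Optional[int] = None) -> int:
    """N50 when reference_size is None, otherwise NG50 against that genome size.

    Returns 0 when the contigs cannot reach half of the target
    (NG50 is undefined for an assembly shorter than half the genome).
    """
    ordered = sorted(lengths, reverse=True)
    total = reference_size if reference_size is not None else sum(ordered)
    target = total / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0


# 91.2 Gb of reads over an assumed 3.1 Gb haploid genome.
print(round(theoretical_coverage(91.2e9, 3.1e9), 1))  # 29.4

contigs = [100, 80, 50, 30, 20]                # toy contig lengths, 280 bases in total
print(nx50(contigs))                           # 80: 100 + 80 = 180 >= 140
print(nx50(contigs, reference_size=500))       # 30: 100 + 80 + 50 + 30 = 260 >= 250
```

The difference between the two metrics matters when an assembly is incomplete. N50 rewards a small assembly of a few long contigs, while NG50 is tied to the genome size and so can be compared across assemblies of different total lengths.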

1,425 citations

Journal Article
TL;DR: In this paper, the authors review features of microbiome-immunity crosstalk and their roles in health and disease, while providing examples of molecular mechanisms orchestrating these interactions in the intestine and extra-intestinal organs.
Abstract: The interplay between the commensal microbiota and the mammalian immune system development and function includes multifold interactions in homeostasis and disease. The microbiome plays critical roles in the training and development of major components of the host’s innate and adaptive immune system, while the immune system orchestrates the maintenance of key features of host-microbe symbiosis. In a genetically susceptible host, imbalances in microbiota-immunity interactions under defined environmental contexts are believed to contribute to the pathogenesis of a multitude of immune-mediated disorders. Here, we review features of microbiome-immunity crosstalk and their roles in health and disease, while providing examples of molecular mechanisms orchestrating these interactions in the intestine and extra-intestinal organs. We highlight aspects of the current knowledge, challenges and limitations in achieving causal understanding of host immune-microbiome interactions, as well as their impact on immune-mediated diseases, and discuss how these insights may translate towards future development of microbiome-targeted therapeutic interventions.

1,328 citations

Journal Article
TL;DR: These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
Abstract: Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. Although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure taxon relative abundance in specimens obtained from those ecosystems. Because comparing taxon relative abundance in the specimen is not equivalent to comparing taxon relative abundance in the ecosystem, this presents a special challenge. Second, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (they sum to 1) rather than lying unconstrained in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses. Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. For ordination metrics based on presence or absence, rarefying clusters samples according to biological origin more clearly than other normalization techniques do. Alternative normalization measures are potentially vulnerable to artifacts due to library size. Effects on differential abundance testing: We build on previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that rarefying itself does not increase the false discovery rates of many differential abundance-testing methods, although rarefying does reduce sensitivity because a portion of the available data is discarded. For groups with large (~10×) differences in average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but also critically the only method tested that has a good control of false discovery rate. These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
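Rarefying means drawing reads without replacement from each sample until every sample has the same depth. Samples below that depth cannot be rarefied and are typically dropped, which is the data loss behind the reduced sensitivity. Dividing counts by their total closes them onto the simplex described above. The centred log-ratio (CLR) transform is one standard way to move compositional data out of the simplex. It needs a pseudocount for zeros, the "constant" the abstract mentions for DESeq2. CLR is shown here only as a common illustration and is not one of the methods the abstract evaluates. The function names, taxon names, seed and pseudocount of 0.5 are all assumptions for this sketch.

```python
import math
import random
from collections import Counter
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int, seed: Optional[int] = None) -> Dict[str, int]:
    """Subsample `depth` reads without replacement; keeps zero-count taxa as 0."""
    total = sum(counts.values())
    if depth > total:
        # A sample shallower than the target depth is usually discarded.
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = Counter(random.Random(seed).sample(pool, depth))
    return {taxon: drawn[taxon] for taxon in counts}


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Close counts to proportions that sum to 1 (a point on the simplex)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise an empty sample")
    return {taxon: n / total for taxon, n in counts.items()}


def clr(counts: Dict[str, int], pseudocount: float = 0.5) -> Dict[str, float]:
    """Centred log-ratio: log(x + c) minus the mean of those logs."""
    logs = {taxon: math.log(n + pseudocount) for taxon, n in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}


sample = {"TaxonA": 500, "TaxonB": 300, "TaxonC": 200, "TaxonD": 0}
rarefied = rarefy(sample, depth=100, seed=1)
print(rarefied)                                   # 100 reads, drawn at random
print(sum(relative_abundance(rarefied).values())) # 1.0, up to float rounding
print(sum(clr(rarefied).values()))                # ~0 by construction
```

Fixing the seed makes a single rarefaction reproducible, but any one draw is still random. Repeating it with different seeds shows how much of an observed difference could come from subsampling alone.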

1,292 citations