Author

Osvaldo Zagordi

Bio: Osvaldo Zagordi is an academic researcher from the University of Zurich. The author has contributed to research in topics: Population & Deep sequencing. The author has an h-index of 20 and has co-authored 34 publications receiving 1736 citations. Previous affiliations of Osvaldo Zagordi include ETH Zurich & Swiss Institute of Bioinformatics.

Papers
Journal Article
TL;DR: ShoRAH is a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population while accounting for sequencing errors.
Abstract: Background: With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated. Results: We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability. Conclusions: ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License at http://www.cbg.ethz.ch/software/shorah.

301 citations

Journal Article
TL;DR: It is concluded that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated and probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall.
Abstract: Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.

229 citations

Journal Article
TL;DR: Analysis of ultra-deep sequencing data obtained from diverse virus populations is challenging because of PCR and sequencing errors and short read lengths, such that the experiment provides only indirect evidence of the underlying viral population structure.

179 citations

Journal Article
TL;DR: A jumping hidden Markov model is presented that describes the generation of viral quasispecies, together with a method to infer its parameters from next-generation sequencing data; the model introduces position-specific probability tables over the sequence alphabet to explain the diversity found in the population at each site.
Abstract: RNA viruses exist in their hosts as populations of different but related strains. The virus population, often called quasispecies, is shaped by a combination of genetic change and natural selection. Genetic change is due to both point mutations and recombination events. We present a jumping hidden Markov model that describes the generation of viral quasispecies and a method to infer its parameters from next-generation sequencing data. The model introduces position-specific probability tables over the sequence alphabet to explain the diversity that can be found in the population at each site. Recombination events are indicated by a change of state, allowing a single observed read to originate from multiple sequences. We present a specific implementation of the expectation maximization (EM) algorithm to find maximum a posteriori estimates of the model parameters and a method to estimate the distribution of viral strains in the quasispecies. The model is validated on simulated data, showing the advantage of explicitly taking the recombination process into account, and applied to reads obtained from a clinical HIV sample.

142 citations
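
The recombination mechanism in the jumping HMM abstract above (a change of state lets one read originate from several sequences) can be illustrated with a toy forward algorithm. This is a minimal sketch under assumptions the abstract does not state: each of K hidden strains carries one emission table per site, a read jumps to a different strain between adjacent sites with a fixed probability `rho` spread uniformly over the other strains, and alignment gaps are ignored. The names `read_likelihood` and `deterministic_strain` are hypothetical, the EM parameter updates are omitted, and this is not the paper's implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical types: an emission table maps each base to its probability at
# one site; a strain is a list of such tables, one per site.
EmissionTable = Dict[str, float]
Strain = List[EmissionTable]


def deterministic_strain(seq: str, eps: float = 0.0) -> Strain:
    """Hypothetical helper: tables that emit `seq` with prob 1 - eps per site."""
    alphabet = "ACGT"
    return [{b: (1 - eps) if b == c else eps / 3 for b in alphabet} for c in seq]


def read_likelihood(read: str, strains: Sequence[Strain],
                    start_probs: Sequence[float], rho: float,
                    offset: int = 0) -> float:
    """Forward algorithm for a toy jumping HMM (assumed model, see text).

    The hidden state at each site is the index of the strain the read is
    currently copying. Between consecutive sites the read stays on the same
    strain with probability 1 - rho, or jumps to one of the other K - 1
    strains uniformly (a recombination event).
    """
    k = len(strains)
    if k < 1:
        raise ValueError("need at least one strain")
    # Initialisation: pick a starting strain, then emit the first base.
    alpha = [start_probs[s] * strains[s][offset].get(read[0], 0.0)
             for s in range(k)]
    for i in range(1, len(read)):
        pos = offset + i
        total = sum(alpha)
        new_alpha = []
        for s in range(k):
            if k == 1:
                trans = alpha[s]
            else:
                # Stay with prob 1 - rho; arrive from each other strain with rho/(k-1).
                trans = (1 - rho) * alpha[s] + rho / (k - 1) * (total - alpha[s])
            new_alpha.append(trans * strains[s][pos].get(read[i], 0.0))
        alpha = new_alpha
    return sum(alpha)


strains = [deterministic_strain("AAAA"), deterministic_strain("CCCC")]
# "AACC" can only be explained by one jump from strain 0 to strain 1,
# so the likelihood equals 0.5 * 0.9 * 0.1 * 0.9 up to rounding.
print(read_likelihood("AACC", strains, [0.5, 0.5], rho=0.1))
```

With `rho=0` the same chimeric read gets likelihood zero, which is the intuition behind the abstract's point that modelling recombination explicitly helps explain reads spanning several strains.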

Journal Article
TL;DR: This article found that the polyclonal antibody response elicited by wild-type rpH1N1 HA was likely directed against an immunodominant region, which could be shielded by glycosylation at position 144.
Abstract: With the global spread of the 2009 pandemic H1N1 (pH1N1) influenza virus, there are increasing worries about evolution through antigenic drift. One way previous seasonal H1N1 and H3N2 influenza strains have evolved over time is by acquiring additional glycosylations in the globular head of their hemagglutinin (HA) proteins; these glycosylations have been believed to shield antigenically relevant regions from antibody immune responses. We added additional HA glycosylation sites to influenza A/Netherlands/602/2009 recombinant (rpH1N1) viruses, reflecting their temporal appearance in previous seasonal H1N1 viruses. Additional glycosylations resulted in substantially attenuated infection in mice and ferrets, whereas deleting HA glycosylation sites from a pre-pandemic virus resulted in increased pathogenicity in mice. We then more directly investigated the interactions of HA glycosylations and antibody responses through mutational analysis. We found that the polyclonal antibody response elicited by wild-type rpH1N1 HA was likely directed against an immunodominant region, which could be shielded by glycosylation at position 144. However, rpH1N1 HA glycosylated at position 144 elicited a broader polyclonal response able to cross-neutralize all wild-type and glycosylation mutant pH1N1 viruses. Moreover, mice infected with a recent seasonal virus in which glycosylation sites were removed elicited antibodies that protected against challenge with the antigenically distant pH1N1 virus. Thus, acquisition of glycosylation sites in the HA of H1N1 human influenza viruses affected not only their pathogenicity and ability to escape from polyclonal antibodies elicited by previous influenza virus strains but also their ability to induce cross-reactive antibodies against drifted antigenic variants.

110 citations


Cited by
Journal Article
TL;DR: The open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors is presented; applied to vaginal samples, it revealed a diversity of previously undetected Lactobacillus crispatus variants.
Abstract: We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

14,505 citations

Journal Article
TL;DR: It is shown that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics.
Abstract: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.

1,018 citations
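
The LoFreq abstract above describes separating true low-frequency variants from sequencing errors by modelling error rates. One way to frame that test, shown here as a sketch under stated assumptions rather than LoFreq's own code, is to treat each base at a site as an independent error with its own probability (assumed here to come from its Phred quality) and ask how likely the observed number of mismatches would be from errors alone. The count of errors then follows a Poisson-binomial distribution, whose upper tail the hypothetical function `poisson_binomial_tail` computes with a dynamic program.

```python
from typing import Sequence


def phred_to_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs: Sequence[float], k: int) -> float:
    """P(at least k errors) when base i is wrong independently with error_probs[i].

    dist[j] holds the probability of exactly j errors among the bases
    processed so far; each new base either stays correct or adds one error.
    """
    if k <= 0:
        return 1.0
    dist = [1.0]  # before any base is processed: zero errors with probability 1
    for p in error_probs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - p)  # this base is read correctly
            new[j + 1] += pj * p    # this base is a sequencing error
        dist = new
    return sum(dist[k:])  # empty (0.0) if k exceeds the number of bases


# Hypothetical pileup: 1000 bases at Q30, 8 of which disagree with the reference.
probs = [phred_to_prob(30) for _ in range(1000)]
p_value = poisson_binomial_tail(probs, 8)
print(p_value)  # small values suggest 8 mismatches are unlikely from errors alone
```

A practical caller would need more than this, for example counting mismatches to one specific alternative base and correcting for the many sites tested; those steps are left out of the sketch.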

Journal Article
TL;DR: It is determined that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced and that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage.
Abstract: Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of ∼1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when “deep sequencing” genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.

944 citations
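
The Duplex Sequencing rule in the abstract above (a true mutation appears at the same position in both strands, while a PCR or sequencing error appears in only one) can be sketched in a few lines. The sketch assumes reads have already been grouped by their tag and strand, aligned to equal length, and that the second strand is already reverse-complemented into the same orientation. The function names and the 0.7 majority threshold are illustrative assumptions, not the published pipeline.

```python
from collections import Counter
from typing import List


def single_strand_consensus(reads: List[str], min_fraction: float = 0.7) -> str:
    """Majority-vote consensus of aligned, equal-length reads sharing one strand tag.

    A position is called only if its most common base reaches min_fraction
    of the reads (an assumed threshold); otherwise it is masked as 'N'.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned and of equal length")
    out = []
    for i in range(length):
        base, count = Counter(r[i] for r in reads).most_common(1)[0]
        out.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(strand_a: str, strand_b: str) -> str:
    """Keep a base only where both strand consensuses agree; mask the rest as 'N'."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strand consensuses must be of equal length")
    return "".join(a if a == b and a != "N" else "N"
                   for a, b in zip(strand_a, strand_b))


# One read carries a PCR error at position 3; majority voting removes it.
a = single_strand_consensus(["ACGTA", "ACGTA", "ACGTA", "ACGAA"])
# A change seen on only one strand is discounted in the duplex consensus.
print(duplex_consensus(a, "ACGTC"))
```

Here the last position differs between the two strands, so it is masked rather than reported, mirroring the abstract's point that single-strand differences can be discounted as technical error (or, in the paper's extension, flagged as possible DNA damage).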

Journal Article
TL;DR: The understanding of viruses as quasispecies has led to new antiviral designs, such as lethal mutagenesis, whose aim is to drive viruses toward low fitness values with limited chances of fitness recovery.
Abstract: Summary: Evolution of RNA viruses occurs through disequilibria of collections of closely related mutant spectra or mutant clouds termed viral quasispecies. Here we review the origin of the quasispecies concept and some biological implications of quasispecies dynamics. Two main aspects are addressed: (i) mutant clouds as reservoirs of phenotypic variants for virus adaptability and (ii) the internal interactions that are established within mutant spectra that render a virus ensemble the unit of selection. The understanding of viruses as quasispecies has led to new antiviral designs, such as lethal mutagenesis, whose aim is to drive viruses toward low fitness values with limited chances of fitness recovery. The impact of quasispecies for three salient human pathogens, human immunodeficiency virus and the hepatitis B and C viruses, is reviewed, with emphasis on antiviral treatment strategies. Finally, extensions of quasispecies to nonviral systems are briefly mentioned to emphasize the broad applicability of quasispecies theory.

852 citations

Journal Article
TL;DR: This Review demonstrates the breadth of questions that are being addressed by Pool-seq but also discusses its limitations and provides guidelines for users.
Abstract: The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.

642 citations