Journal ArticleDOI

Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.

TL;DR: This study evaluated subtyping tools for classifying HIV-1 subtypes and circulating recombinant forms (CRFs) using pol, the most frequently sequenced region in clinical practice. The best-performing tools were COMET, jpHMM, REGAv3, and SCUEAL for pure subtypes in the pol region, and COMET and REGAv3 for most of the CRFs.
About: This article is published in Infection, Genetics and Evolution. The article was published on 2013-10-01 and is currently open access. It has received 297 citations to date. The article focuses on the topic of subtyping.
Citations
Journal ArticleDOI
TL;DR: An ultrafast alignment-free subtyping tool for human immunodeficiency virus type one (HIV-1) adapted from Prediction by Partial Matching compression is presented, which was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets.
Abstract: Viral sequence classification has wide applications in clinical, epidemiological, structural and functional categorization studies. Most existing approaches rely on an initial alignment step followed by classification based on phylogenetic or statistical algorithms. Here we present an ultrafast alignment-free subtyping tool for human immunodeficiency virus type one (HIV-1) adapted from Prediction by Partial Matching compression. This tool, named COMET, was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets (1 090 698 and 10 625 sequences, respectively). COMET’s sensitivity and specificity were comparable to or higher than the two other subtyping tools on both data sets for known subtypes. COMET also excelled in detecting and identifying new recombinant forms, a frequent feature of the HIV epidemic. Runtime comparisons showed that COMET was almost as fast as USEARCH. This study demonstrates the advantages of alignment-free classification of viral sequences, which feature high rates of variation, recombination and insertions/deletions. COMET is free to use via an online interface.
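
The alignment-free idea in this abstract can be illustrated with a small sketch. This is not COMET's implementation: it assumes a fixed-order context model with add-one smoothing as a simplified stand-in for Prediction by Partial Matching, and the function names (`train_model`, `code_length`, `classify`) and the toy reference sequences are hypothetical. A query is assigned to the reference set whose model would encode it in the fewest bits.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_model(sequences, order=3):
    """Count how often each nucleotide follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            context, symbol = seq[i - order:i], seq[i]
            if symbol in ALPHABET:
                counts[context][symbol] += 1
    return counts


def code_length(seq, counts, order=3):
    """Bits needed to encode `seq` under the model, using add-one smoothing."""
    bits = 0.0
    for i in range(order, len(seq)):
        context, symbol = seq[i - order:i], seq[i]
        if symbol not in ALPHABET:
            continue  # skip ambiguity codes and gaps
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + len(ALPHABET)
        bits -= math.log2((ctx.get(symbol, 0) + 1) / total)
    return bits


def classify(seq, models, order=3):
    """Return the label whose model compresses `seq` best (fewest bits)."""
    return min(models, key=lambda label: code_length(seq, models[label], order))


# Toy reference sets (hypothetical, not real HIV-1 subtype references).
models = {
    "X": train_model(["ACGTACGTACGTACGT"]),
    "Y": train_model(["AAAACCCCGGGGTTTTAAAACCCC"]),
}
label_for_repeat = classify("ACGTACGTACGT", models)
label_for_runs = classify("AAAACCCCGGGG", models)
```

No alignment step is involved: each query is scored directly against per-label statistics, which is why approaches of this kind can tolerate insertions and deletions and run quickly.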

272 citations


Cites methods or result from "Automated subtyping of HIV-1 geneti..."

  • ...These results agree with a recent comparison of eight subtyping algorithms conducted by an unrelated research group that concluded that COMET is one of the best performing subtyping tools (34)....

  • ...COMET’s performance was then compared with the REGAv2 (18) and SCUEAL (19) tools, commonly recognized as the current ‘best of breed’ of published subtyping tools (34)....

Journal ArticleDOI
TL;DR: Most TDR strains in SSA and SSEA arose independently, suggesting that ARV regimens with a high genetic barrier to resistance combined with improved patient adherence may mitigate TDR increases by reducing the generation of new ARV-resistant strains.
Abstract: Regional and subtype-specific mutational patterns of HIV-1 transmitted drug resistance (TDR) are essential for informing first-line antiretroviral (ARV) therapy guidelines and designing diagnostic assays for use in regions where standard genotypic resistance testing is not affordable. We sought to understand the molecular epidemiology of TDR and to identify the HIV-1 drug-resistance mutations responsible for TDR in different regions and virus subtypes. We reviewed all GenBank submissions of HIV-1 reverse transcriptase sequences with or without protease and identified 287 studies published between March 1, 2000, and December 31, 2013, with more than 25 recently or chronically infected ARV-naive individuals. These studies comprised 50,870 individuals from 111 countries. Each set of study sequences was analyzed for phylogenetic clustering and the presence of 93 surveillance drug-resistance mutations (SDRMs). The median overall TDR prevalence in sub-Saharan Africa (SSA), south/southeast Asia (SSEA), upper-income Asian countries, Latin America/Caribbean, Europe, and North America was 2.8%, 2.9%, 5.6%, 7.6%, 9.4%, and 11.5%, respectively. In SSA, there was a yearly 1.09-fold (95% CI: 1.05–1.14) increase in odds of TDR since national ARV scale-up attributable to an increase in non-nucleoside reverse transcriptase inhibitor (NNRTI) resistance. The odds of NNRTI-associated TDR also increased in Latin America/Caribbean (odds ratio [OR] = 1.16; 95% CI: 1.06–1.25), North America (OR = 1.19; 95% CI: 1.12–1.26), Europe (OR = 1.07; 95% CI: 1.01–1.13), and upper-income Asian countries (OR = 1.33; 95% CI: 1.12–1.55). In SSEA, there was no significant change in the odds of TDR since national ARV scale-up (OR = 0.97; 95% CI: 0.92–1.02). An analysis limited to sequences with mixtures at less than 0.5% of their nucleotide positions (a proxy for recent infection) yielded trends comparable to those obtained using the complete dataset.
Four NNRTI SDRMs (K101E, K103N, Y181C, and G190A) accounted for >80% of NNRTI-associated TDR in all regions and subtypes. Sixteen nucleoside reverse transcriptase inhibitor (NRTI) SDRMs accounted for >69% of NRTI-associated TDR in all regions and subtypes. In SSA and SSEA, 89% of NNRTI SDRMs were associated with high-level resistance to nevirapine or efavirenz, whereas only 27% of NRTI SDRMs were associated with high-level resistance to zidovudine, lamivudine, tenofovir, or abacavir. Of 763 viruses with TDR in SSA and SSEA, 725 (95%) were genetically dissimilar; 38 (5%) formed 19 sequence pairs. Inherent limitations of this study are that some cohorts may not represent the broader regional population and that studies were heterogeneous with respect to duration of infection prior to sampling. Most TDR strains in SSA and SSEA arose independently, suggesting that ARV regimens with a high genetic barrier to resistance combined with improved patient adherence may mitigate TDR increases by reducing the generation of new ARV-resistant strains. A small number of NNRTI-resistance mutations were responsible for most cases of high-level resistance, suggesting that inexpensive point-mutation assays to detect these mutations may be useful for pre-therapy screening in regions with high levels of TDR. In the context of a public health approach to ARV therapy, a reliable point-of-care genotypic resistance test could identify which patients should receive standard first-line therapy and which should receive a protease-inhibitor-containing regimen.
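
To make the "yearly 1.09-fold increase in odds" concrete, the sketch below compounds a per-year odds ratio over several years and converts the result back to a prevalence. It assumes the yearly odds ratio stays constant over the period, and the 3% starting prevalence is a hypothetical value chosen for illustration, not a figure from the study; the function names are likewise illustrative.

```python
def compounded_odds_ratio(yearly_odds_ratio, years):
    """Cumulative odds ratio after `years`, assuming a constant yearly ratio."""
    return yearly_odds_ratio ** years


def prevalence_after(baseline_prevalence, yearly_odds_ratio, years):
    """Convert prevalence to odds, apply the compounded ratio, convert back."""
    baseline_odds = baseline_prevalence / (1.0 - baseline_prevalence)
    odds = baseline_odds * compounded_odds_ratio(yearly_odds_ratio, years)
    return odds / (1.0 + odds)


# Hypothetical baseline of 3% prevalence, yearly odds ratio 1.09, five years.
five_year_ratio = compounded_odds_ratio(1.09, 5)
five_year_prevalence = prevalence_after(0.03, 1.09, 5)
print(f"cumulative OR: {five_year_ratio:.2f}, prevalence: {five_year_prevalence:.3f}")
```

Under these assumptions the odds rise by a factor of about 1.54 over five years, which moves a 3% prevalence to roughly 4.5%. Because prevalence is low, the change in odds and the change in prevalence are close but not identical.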

206 citations

Journal ArticleDOI
TL;DR: It is suggested that, in addition to the impact of protein multimerization and immune selective pressure on HIV-1 diversity, HIV-human protein interactions are facilitated by high variability within intrinsically disordered structures.
Abstract: The HIV pandemic is characterized by extensive genetic variability, which has challenged the development of HIV drugs and vaccines. Although HIV genomes have been classified into different types, groups, subtypes and recombinants, a comprehensive study that maps HIV genome-wide diversity at the population level is still lacking to date. This study aims to characterize HIV genomic diversity in large-scale sequence populations, and to identify driving factors that shape HIV genome diversity. A total of 2996 full-length genomic sequences from 1705 patients infected with 16 major HIV groups, subtypes and circulating recombinant forms (CRFs) were analyzed along with structural, immunological and peptide inhibitor information. Average nucleotide diversity of HIV genomes was almost 50% between HIV-1 and HIV-2 types, 37.5% between HIV-1 groups, 14.7% between HIV-1 subtypes, 8.2% within individual HIV-1 subtypes and less than 1% within single patients. Along the HIV genome, diversity patterns and compositions of nucleotides and amino acids were highly similar across different groups, subtypes and CRFs. Current HIV-derived peptide inhibitors were predominantly derived from conserved, solvent accessible and intrinsically ordered structures in the HIV-1 subtype B genome. We identified these conserved regions in Capsid, Nucleocapsid, Protease, Integrase, Reverse transcriptase, Vpr and the GP41 N terminus as potential drug targets. In the analysis of factors that impact HIV-1 genomic diversity, we focused on protein multimerization, immunological constraints and HIV-human protein interactions. We found that amino acid diversity in monomeric proteins was higher than in multimeric proteins, and diversified positions were preferably located within human CD4 T cell and antibody epitopes. Moreover, intrinsic disorder regions in HIV-1 proteins coincided with high levels of amino acid diversity, facilitating a large number of interactions between HIV-1 and human proteins. 
This first large-scale analysis provided a detailed mapping of HIV genomic diversity and highlighted drug-target regions conserved across different groups, subtypes and CRFs. Our findings suggest that, in addition to the impact of protein multimerization and immune selective pressure on HIV-1 diversity, HIV-human protein interactions are facilitated by high variability within intrinsically disordered structures.
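
The abstract does not say how average nucleotide diversity was computed. As a minimal sketch, assuming diversity is taken as the mean proportion of differing sites over all pairs of aligned sequences (the uncorrected p-distance), it could be calculated as follows. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def mean_pairwise_diversity(aligned_sequences):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned_sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical): one of three sequences differs at one of four sites.
toy_diversity = mean_pairwise_diversity(["ACGT", "ACGA", "ACGT"])
```

The same function applies at any level of the hierarchy described above (within a patient, within a subtype, between subtypes); only the set of sequences passed in changes.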

99 citations

Journal ArticleDOI
TL;DR: A literature review of computational classification workflows for virus metagenomics provides two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Abstract: Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.

96 citations

Journal ArticleDOI
TL;DR: Improved VL monitoring to prevent accumulation of mutations, and new drug classes to construct fully active regimens, are required.
Abstract: Objectives: Limited availability of viral load (VL) monitoring in HIV treatment programmes in sub-Saharan Africa can delay switching to second-line ART, leading to the accumulation of drug resistance mutations (DRMs). The objective of this study was to evaluate the accumulation of resistance to reverse transcriptase inhibitors after continued virological failure on first-line ART, among adults and children in sub-Saharan Africa. Methods: HIV-1-positive adults and children on an NNRTI-based first-line ART were included. Retrospective VL and, if VL ≥1000 copies/mL, pol genotypic testing was performed. Among participants with continued virological failure (≥2 VL ≥1000 copies/mL), drug resistance was evaluated. Results: At first virological failure, DRM(s) were detected in 87% of participants: K103N (38.7%), G190A (21.8%), Y181C (20.2%), V106M (8.4%), K101E (8.4%), any E138 (7.6%) and V108I (7.6%) associated with NNRTIs, and M184V (69.7%), any thymidine analogue mutation (9.2%), K65R (5.9%) and K70R (5.0%) associated with NRTIs. New DRMs accumulated with an average rate of 1.45 (SD 2.07) DRM per year; 0.62 (SD 1.11) NNRTI DRMs and 0.84 (SD 1.38) NRTI DRMs per year, respectively. The predicted susceptibility declined significantly after continued virological failure for all reverse transcriptase inhibitors (all P < 0.001). Acquired drug resistance patterns were similar in adults and children. Conclusions: Patterns of drug resistance after virological failure on first-line ART are similar in adults and children in sub-Saharan Africa. Improved VL monitoring to prevent accumulation of mutations, and new drug classes to construct fully active regimens, are required.

81 citations

References
Journal ArticleDOI
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

63,427 citations


"Automated subtyping of HIV-1 geneti..." refers methods in this paper

  • ...The MPhy of concordant sequences was performed by using the 2008 Los Alamos curated subtypes and CRFs reference dataset (available at http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html), the sequences were aligned with ClustalW (Thompson et al., 1994) and, if needed, the alignment was minimally edited with BioEdit (Hall, 1999)....


Journal ArticleDOI
TL;DR: The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models, inferring ancestral states and sequences, and estimating evolutionary rates site-by-site.
Abstract: Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

39,110 citations


"Automated subtyping of HIV-1 geneti..." refers methods in this paper

  • ...…of the discordant sequences (Kuhner and Felsenstein, 1994; Leitner et al., 1996), we used as gold standard a slow MPhy using Maximum Likelihood trees with 1000 bootstrap replicates and the best-fitting nucleotide substitution model (in this case GTR + I + Γ) (Posada, 2008; Tamura et al., 2011)....

Journal ArticleDOI
TL;DR: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML) that has been used to compute ML trees on two of the largest alignments to date.
Abstract: Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2–3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25 057 (1463 bp) and 2182 (51 089 bp) taxa, respectively. Availability: icwww.epfl.ch/~stamatak Contact: Alexandros.Stamatakis@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online.

14,847 citations


"Automated subtyping of HIV-1 geneti..." refers methods in this paper

  • ...Finally, to verify these assignments, all sequences thus assigned subtype G or CRF14_BG were pooled with all full genome CRF14_BG and full genome subtype G sequences from LANL, and a single unrooted tree was constructed using RAxML (Stamatakis, 2006) (supplementary Fig....

Journal ArticleDOI
TL;DR: A new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves and a new test to assess the support of the data for internal branches of a phylogeny are introduced.
Abstract: PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

14,385 citations


"Automated subtyping of HIV-1 geneti..." refers methods in this paper

  • ...Therefore, in addition to fast MPhy for concordant sequences, all sequences that were assigned by any of the subtyping tools as either these CRFs or the parent pure subtype (even when concordant) were also analyzed with slow MPhy (Guindon et al., 2010), which included all complete genomes of the CRF and parent pure subtype as reference sequences....

