SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences
Citations
Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin
Bacterial metabolism of bile acids promotes generation of peripheral regulatory T cells
Kombucha Beverage from Green, Black and Rooibos Teas: A Comparative Study Looking at Microbiology, Chemistry and Antioxidant Activity.
A few Ascomycota taxa dominate soil fungal communities worldwide
IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences
References
QIIME allows analysis of high-throughput community sequencing data.
Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities
Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy
UPARSE: highly accurate OTU sequences from microbial amplicon reads
Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB
Frequently Asked Questions (10)
Q2. What is the phylum rank of the accRDP algorithm?
At phylum rank, EPQ is given as the measure of error rate since almost all phyla are known, so OC cannot be measured reliably and MC ≈ EPQ.
Q3. How does QIIME use a subset of Greengenes?
By default, QIIME uses a subset of Greengenes clustered at 97% identity (GGQ, containing 99k sequences in v13.8), and mothur recommends a subset of SILVA (SILVAM, containing 172k sequences in v123).
Q4. What is the way to determine the bootstrap confidence of a given taxonomy?
If Uall1 ≫ Uall2 then Usubset1 will be greater than Usubset2 in most or all iterations and C1 will therefore have high bootstrap confidence.
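The bootstrap idea in this answer can be sketched in Python. This is a minimal illustration, not the published implementation: the function names, the `(taxon, sequence)` reference layout, the word length `k`, the subset size, the number of iterations, and sampling with replacement are all assumptions chosen for the example. In each iteration a random subset of the query's k-mers is drawn, the reference sharing the most words with that subset (the top hit by Usubset) is found, and the confidence of a taxon is the fraction of iterations in which it supplies the top hit.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers (words) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bootstrap_confidence(query, references, k=8, subset_size=32,
                         iterations=100, seed=0):
    """Estimate per-taxon bootstrap confidence for a query sequence.

    references maps a reference name to (taxon, sequence).
    k, subset_size and iterations are illustrative assumptions.
    """
    rng = random.Random(seed)
    words = sorted(kmers(query, k))
    if not words:
        return {}  # query shorter than k: nothing to sample
    ref_words = {name: (taxon, kmers(seq, k))
                 for name, (taxon, seq) in references.items()}
    wins = Counter()
    for _ in range(iterations):
        # Random subset of query words, drawn with replacement (assumption).
        subset = set(rng.choices(words, k=subset_size))
        # Top hit for this subset; ties go to the alphabetically first name.
        best = max(sorted(ref_words),
                   key=lambda name: len(subset & ref_words[name][1]))
        wins[ref_words[best][0]] += 1
    return {taxon: count / iterations for taxon, count in wins.items()}
```

When one reference shares far more words with the query than any other, it wins nearly every iteration and its taxon receives a confidence close to 1, which matches the reasoning in the answer above.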
Q5. What is the average number of non-singleton training sequences in RTS?
The average number of non-singleton training sequences is 9 per genus in RTS and 14 per species in Warcup, which suggests that correct classification should be relatively easy for most queries. In practice, however, many genera will be novel, and taxa that are rare in the database may be common in the query set and vice versa.
Q6. What is the way to determine the bootstrap confidence of a given sequence?
For a given query sequence, consider reference sequences ranked using all k-mers, i.e. in order of decreasing Uall(r) = |W(Q) ∩ W(r)|.
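As a rough illustration of this ranking, the Python sketch below computes W(s) as the set of distinct k-mers in a sequence and sorts references by the number of words shared with the query. The function names, the dictionary layout of the references, and the default word length are assumptions made for the example and are not taken from the text.

```python
def kmers(seq, k):
    """W(seq): the set of distinct k-mers (words) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query, references, k=8):
    """Rank references by Uall(r) = |W(Q) ∩ W(r)|, highest first.

    references maps a reference name to its sequence.
    Returns a list of (shared_word_count, name) pairs; ties are
    broken by name so the order is deterministic.
    """
    query_words = kmers(query, k)
    scored = [(len(query_words & kmers(seq, k)), name)
              for name, seq in references.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


# Toy usage with short sequences and k=4 (illustrative values only).
references = {"r1": "ACGTACGTGG", "r2": "TTTTTTTTTT"}
ranking = rank_by_shared_kmers("ACGTACGTGG", references, k=4)
```

Using sets means a word that occurs several times in a sequence is counted once, which is what the set-intersection form of Uall(r) implies.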
Q7. What are the available databases for predicting the taxonomy of sequences?
Available databases include the RDP training sets, the full RDP database (RDPDB) (Maidak et al., 2001), SILVA (Pruesse et al., 2007), Greengenes (DeSantis et al., 2006) and UNITE (Kõljalg et al., 2013).
Q8. What is the genus classification in RDPDB?
This result suggests that many of the genus annotations in RDPDB, most of which were predicted by RDP at 80% bootstrap, may be false positives, as 47% of the 3.2M RDPDB sequences have top-hit identity <95% with RDPTS, implying that roughly half belong to novel genera.
Q9. What is the default method for classify.seqs?
The mothur classify.seqs command was run with method=wang (Mrdp, the default, a re-implementation of the RDP algorithm) and method=knn (Mknn).
Q10. What is the difference between a singleton and an ITS?
A singleton cannot be classified correctly in a leave-one-out test because no training sequences are left for its clade, so that the maximum achievable AccRDP by an ideal algorithm is the fraction of non-singleton taxa, i.e. 92% for 16S genus and 87% for ITS species, rather than 100% as would usually be expected for an accuracy measure.