Shotgun metagenomics, from sampling to analysis
Citations
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
Metagenomic biomarker discovery and explanation
Best practices for analysing microbiomes.
Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle.
MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis.
References
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Trimmomatic: a flexible trimmer for Illumina sequence data
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Related Papers (5)
Structure, function and diversity of the healthy human microbiome
Fast and sensitive protein alignment using DIAMOND
QIIME allows analysis of high-throughput community sequencing data.
Frequently Asked Questions (17)
Q2. How can bead-beating affect the extracted DNA?
Vigorous extraction techniques such as bead-beating can result in shortened DNA fragments, which can contribute to DNA loss during library preparation methods that use fragment size selection.
Q3. What are the main limitations facing researchers now?
The main limitations facing researchers now are the costs of training computational scientists for analyzing the complex metagenomic datasets, and of sequencing enough samples for properly powered study designs.
Q4. What are the main advantages of assembly-free methods?
Taxonomic profiling by selecting representative or discriminative genes (markers) from available reference sequences is another fast and accurate assembly-free approach that has been implemented with several variations.
Q5. What are the main factors in choosing a library preparation and sequencing method?
Choosing a library preparation and sequencing method hinges on availability of materials and services, cost, ease of automation, and DNA sample quantification.
Q6. What are the ways to assess the impact of such issues?
To help evaluate the extent of such issues, randomly chosen control wells containing known spiked-in organisms (positive controls), together with template-negative controls, should be included.
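As a rough illustration of "randomly chosen control wells", the sketch below assigns positive and negative controls to random positions on a plate. The 96-well layout (8 rows by 12 columns), the function name `assign_control_wells`, the number of controls, and the fixed seed are all assumptions made for this example; none of them come from the paper.

```python
import random
from string import ascii_uppercase


def assign_control_wells(n_positive=2, n_negative=2, rows=8, cols=12, seed=0):
    """Return a dict mapping each well (e.g. 'A1') to its role.

    Assumes a rows x cols plate (96 wells by default); hypothetical helper.
    """
    wells = [f"{ascii_uppercase[r]}{c}"
             for r in range(rows) for c in range(1, cols + 1)]
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    chosen = rng.sample(wells, n_positive + n_negative)  # no well is picked twice

    layout = {well: "sample" for well in wells}
    for well in chosen[:n_positive]:
        layout[well] = "positive_control"  # known spiked-in organisms
    for well in chosen[n_positive:]:
        layout[well] = "negative_control"  # template-negative control
    return layout


layout = assign_control_wells()
controls = {w: role for w, role in layout.items() if role != "sample"}
print(controls)
```

Recording the seed alongside the plate layout makes it possible to regenerate the same assignment later.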
Q7. What are the main advantages of assembly-free taxonomic profilers?
Assembly-free taxonomic profilers with species-level resolution utilize information available in reference genomes 74 and in environment-specific assemblies 75, and have been used in the largest human-associated metagenomics investigations performed so far 2,5,75-80.
Q8. How much data can long-read sequencing technologies generate per run?
Long-read sequencing technologies such as the Oxford Nanopore MinION and Pacific Biosciences Sequel have scaled up output and can reliably generate up to 10 gigabases per run, and may therefore soon see adoption for metagenomics studies.
Q9. How did the MetaHIT consortium characterize organisms in the human gut?
By looking at co-abundant markers from pre-assembled environment specific gene catalogs 84,85, for example, the MetaHIT consortium was able to characterize known and novel organisms in the human gut 5,75.
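The idea of co-abundance can be illustrated with a toy example: genes that belong to the same organism should rise and fall together across samples, so their abundance profiles correlate. The sketch below groups genes by Pearson correlation against a seed gene. It is not the consortium's actual method; the gene names, abundance values, threshold, and greedy grouping strategy are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def co_abundance_groups(profiles, threshold=0.9):
    """Greedily group genes whose profile correlates with a group's seed gene."""
    groups = []
    for gene, profile in profiles.items():
        for group in groups:
            seed_profile = profiles[group[0]]
            if pearson(seed_profile, profile) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])  # no match: this gene seeds a new group
    return groups


# Hypothetical abundances of three genes across four samples.
profiles = {
    "geneA": [10, 20, 30, 40],
    "geneB": [11, 19, 31, 42],
    "geneC": [40, 30, 20, 10],
}
print(co_abundance_groups(profiles))
```

In this toy input, geneA and geneB track each other closely and end up in one group, while geneC follows the opposite trend and forms its own.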
Q10. What are the main reasons for using marker-based methods?
For large datasets with hundreds of samples, on which performing or interpreting metagenomics assembly is impractical, marker-based approaches are currently the method of choice, especially for environments in which a substantial fraction of microbial diversity is covered by well-characterized sequenced species.
Q11. How did the first algorithms perform the clustering?
The first algorithms, e.g. extended self-organising maps 64, required human input to perform the clustering, which is based on coverage information and composition that could be visualized in 2D 65.
Q12. Which k-mer length is considered most informative for binning?
Metrics based on k-mer frequencies can be used to bin contigs, with tetramers considered the most informative for binning of metagenomics data 58.
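A minimal sketch of a tetramer-based composition metric follows. It computes a normalized tetranucleotide frequency vector per contig and compares two contigs by Euclidean distance. The function names and the choice of distance are assumptions for illustration, and the sketch omits refinements that binning tools commonly apply, such as merging reverse-complement k-mers and combining composition with coverage.

```python
from collections import Counter
from itertools import product


def tetramer_frequencies(seq, k=4):
    """Return the normalized k-mer frequency vector of a sequence.

    The vector has 4**k entries (256 for tetramers), ordered
    lexicographically over the alphabet ACGT.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of A, C, G, T; windows containing N are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def euclidean(a, b):
    """Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical contigs: the smaller the distance, the more similar the composition.
contig_1 = "ACGTACGTACGTTTGACCA"
contig_2 = "ACGTACGAACGTTTGACCA"
distance = euclidean(tetramer_frequencies(contig_1), tetramer_frequencies(contig_2))
print(distance)
```

Contigs with small pairwise distances are candidates for the same bin; real contigs would be far longer than these toy strings, since short sequences give noisy frequency estimates.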
Q13. How many MAGs were recovered from acetate enriched and filtered groundwater?
The recovery of nearly a thousand MAGs from candidate phyla, with no cultured representatives, from acetate enriched and filtered groundwater samples showcased the potential of this approach 8.
Q14. What is the main limiting factor in profiling the metabolic potential of a community?
Regardless of whether an assembly-free or assembly-based approach is adopted, the main limiting factor in profiling the metabolic potential of a community is the lack of annotations for accessory genes in most microbial species (with the exception of selected model organisms, Box 1).
Q15. What is the consensus on how well different assemblers perform?
There is little community consensus on how well different assemblers perform with respect to key metrics such as completeness, continuity and propensity to generate chimeric contigs.
Q16. How can a study be conducted to prevent the spread of microbes?
It is usually possible to mitigate potential confounders in the study design by housing animals individually to prevent the spread of microbes between cage mates (although this may introduce behavioural changes, potentially resulting in different biases), by mixing animals derived from different experimental cohorts within the same cage, or by repeating experiments with mouse lines obtained from different vendors or with different genetic backgrounds 25.
Q17. Why should sample collection and storage conditions be recorded?
Since several studies have shown that factors such as length of time between sample collection and freezing 29 or the number of times samples go through freeze-thaw cycles can affect the microbial community profiles that are detected, both collection and storage protocols/conditions should be recorded (Supplementary Box 1).