TITLE
A comprehensive automated pipeline for human microbiome sampling, 16S rRNA gene
sequencing and bioinformatics processing
AUTHORS
Luisa W. Hugerth - Center for Translational Microbiome Research, Department of
Molecular, Tumour and Cell Biology, Karolinska Institutet, Science for Life Laboratory,
Stockholm, Sweden
Maike Seifert - Center for Translational Microbiome Research, Department of
Molecular, Tumour and Cell Biology, Karolinska Institutet, Science for Life Laboratory,
Stockholm, Sweden
Alexandra A. L. Pennhag - Center for Translational Microbiome Research,
Department of Molecular, Tumour and Cell Biology, Karolinska Institutet, Stockholm, Sweden
Juan Du - Center for Translational Microbiome Research, Department of Molecular,
Tumour and Cell Biology, Karolinska Institutet, Stockholm, Sweden
Marica C. Hamsten - Center for Translational Microbiome Research, Department of
Molecular, Tumour and Cell Biology, Karolinska Institutet, Science for Life Laboratory,
Stockholm, Sweden
Ina Schuppe-Koistinen - Center for Translational Microbiome Research, Department
of Molecular, Tumour and Cell Biology, Karolinska Institutet, Science for Life Laboratory,
Stockholm, Sweden
Lars Engstrand - Center for Translational Microbiome Research, Department of
Molecular, Tumour and Cell Biology, Karolinska Institutet, Science for Life Laboratory,
Stockholm, Sweden
POSTAL ADDRESS
Luisa W. Hugerth
Engstrand group
Institutionen för mikrobiologi, tumör- och cellbiologi (MTC)
Nobels väg 16
KI Solna Campus Karolinska Institutet
SE-171 77 Stockholm, Sweden
bioRxiv preprint doi: https://doi.org/10.1101/286526; this version posted March 21, 2018.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under a CC-BY 4.0 International license.

ABSTRACT
The advent of affordable high-throughput DNA sequencing has opened up a golden
age of studies in the human microbiome. In order to understand the role of the human
microbiota, standardized methods for large-scale, population-level studies are needed to
avoid underpowered or poorly designed studies. The biggest bottlenecks to population-level
microbiomics are sample collection, storage and DNA extraction. Here, we describe a
flexible automated approach to process intestinal biopsies, fecal samples and vaginal swabs
from sample collection to OTU table. We have evaluated storage conditions, DNA extraction
methods, PCR strategies and bioinformatic pipelines for these three sample types, and
present here a set of guidelines and best practices for each of these steps.
AUTHOR LIST
Hugerth LW, Seifert M, Pennhag AAL, Du J, Hamsten MC, Schuppe-Koistinen I,
Engstrand L
INTRODUCTION
The advent of affordable high-throughput DNA sequencing has opened up a golden
age for studies in the human microbiome. Sampling strategies covering hundreds of subjects
[1–3] or comprehensive spatial or temporal sampling of a few individuals are now possible
[4,5]. The explosion of studies in microbiomics combined with the rapid adoption of this
research field by researchers of various backgrounds has increased the risk of publishing
underpowered or otherwise ill-designed studies. Today, large-scale, population- or hospital-based
studies are often needed to increase our understanding of the role of the microbiome
in various diseases. With the time from sample preparation to sequencing results now
counted in days, the biggest bottlenecks to population-level microbiomics are now sample
collection, storage and DNA extraction. In addition to being expensive and time-consuming,
a sub-optimal DNA extraction can lead to severe biases in the study, and ultimately false
conclusions [6].
One of the most common source materials for human microbiome studies is the faecal
sample. The large intestine has the greatest concentration of bacteria in the human body
[7] and the faecal microbiome has been linked to a wide variety of gastrointestinal [8,9],
metabolic [10,11] and even neurological conditions [12,13]. Faecal samples can also be
collected at a moderate cost and non-invasively, making this a suitable and popular target for
studies of the human microbiome.
One problem with faecal samples is that they represent a large bulk volume which is
not in direct contact with the host’s mucosal lining. While it is reasonable to assume that
products of microbial metabolism in the luminal space, such as short chain fatty acids, can
affect host physiology [14], it is also true that bacteria living in intimate association with the
mucus layer in the gut lining likely have a stronger effect in modulating the host’s immune
response [15]. As these niches present quite different selection pressures, bacteria found
attached to the gut lining form a clearly separate community from those in the luminal space
[16] and can only be queried through the use of gut biopsies.
While biopsies obtained from the gastrointestinal tract allow the investigation of
bacteria in tighter attachment and deeper layers than a simple swab, they are a different
type of material compared to faecal samples. Firstly, contrary to a faecal sample, the vast
majority of the DNA in a typical biopsy is from the host, rather than bacterial. Furthermore,
the bacteria are living in a complex three-dimensional biofilm, which might be harder to
disrupt. Available commercial kits for selectively removing human DNA are inappropriate
when the study design includes host genotype or eukaryotic microbe profiling and might
inadvertently remove part of the microbial diversity. Consequently, the risk of introducing bias
is obvious. Therefore, a DNA library preparation method of broad applicability needs to be
robust to overwhelming proportions of host DNA.
Other host surfaces, while not particularly rich in host DNA, present a chemically
complex extracellular matrix, which can hinder DNA purification, and, in some cases, a
relatively low bacterial cell count. Sputum and saliva are good examples of this, as is vaginal
mucus. The latter is a particularly important target in gynecological screening [17] and might
be an important prognostic tool for obstetric and neonatal health [3]. A human microbiome
pipeline of general applicability should also apply to mucus-associated microbes.
Faecal and mucus samples are relatively easy to retrieve and can often be collected
by the research subject at home. This raises the issue of the correct storage procedure for
these materials. Left at room temperature, a bacterial community can present significant
shifts after only a few minutes, due to overgrowth of oxygen-tolerant microorganisms.
Therefore, it is crucial to assess bacteriostatic and preservation strategies. It is important to
consider cost, ease of use for the research subject, non-toxicity, quality of sample
preservation and compatibility with downstream applications.
Once a good procedure for sample collection and DNA extraction has been
established, the next challenge for an amplicon-based study (e.g. 16S rRNA gene surveys) is
an appropriate PCR strategy [6]. The most crucial choice is the selection of broad-taxonomic
range primers compatible with the target community [18]. The thermodynamic characteristics
of the primer pair will add to the bias, through preferential annealing or incomplete
melting of GC-rich sequences [19]. It is also crucial to work under appropriate molecular
biological conditions, considering that a single molecule of contaminating DNA can be
amplified to roughly 1000 copies after only ten PCR cycles. Reducing the number of PCR cycles can
thus ensure a less biased picture of the community. Avoiding intermediate cleaning steps
also reduces the risk of sample spillover and cross-contamination. Finally, before
sequencing, sample pooling is another sensitive step, where the depth of sequencing for
each sample is determined.
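The contamination arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: idealised PCR doubles the template every cycle, so a single molecule yields 2^10 = 1024 copies after ten cycles. The function name and the `efficiency` parameter are hypothetical and do not come from the protocol described here.

```python
def copies_after_cycles(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after a number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle;
    1.0 means perfect doubling, which real reactions only approximate.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One contaminating molecule under idealised doubling.
    for n_cycles in (10, 20, 30):
        print(n_cycles, copies_after_cycles(1, n_cycles))
```

Because growth is exponential, every cycle removed halves the amplified contaminant under this idealised model, which is the rationale for keeping cycle numbers low.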
Difficulties remain after DNA sequencing, since bioinformatic processing
presents its own set of challenges [6]. In the early days of metabarcoding, clustering was
necessary, partly to collapse erroneous sequences to true biological diversity and partly to
make sequence clusters (operational taxonomic units, OTUs) large enough for quantitative
statistical methods to apply. The latter is not an issue with current high-throughput
technologies, which typically provide sufficient data for much finer clustering. Sequencing
errors and minor biological variation, however, do artificially inflate the number of unique
OTUs, compared to the true number of sequences or strains [20]. Many modern
error correction strategies obviate the need for an a priori similarity cut-off [21–23]. This is
crucial for vaginal microbiome studies. While most vaginal communities are dominated by
lactobacilli, it has been shown that communities characterized by a dominance of L. iners
are less stable than those dominated by L. crispatus or L. gasseri. A species-level
identification is therefore required. This issue is even more extreme for skin microbiome
studies, where it is necessary to differentiate Staphylococcus epidermidis from S. aureus,
although they differ by only 14 bp over their whole 16S rRNA gene, and by only 2
bp in the commonly used 341-806 region.
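To see why a fixed similarity cut-off cannot separate these two species, the mismatch counts above can be converted to percent identity. The sketch below is illustrative only and rests on assumptions not stated in the text: a 97% cut-off (a conventional OTU threshold), a region length of about 465 bp (taken simply as 806 - 341) and a full gene length of about 1500 bp. The function names are hypothetical.

```python
def percent_identity(mismatches: int, aligned_length: int) -> float:
    """Percentage of identical positions in an ungapped alignment."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= mismatches <= aligned_length:
        raise ValueError("mismatches must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - mismatches) / aligned_length


def would_cluster_together(mismatches: int, aligned_length: int,
                           cutoff_percent: float = 97.0) -> bool:
    """True if two sequences fall into one OTU at the given identity cut-off."""
    return percent_identity(mismatches, aligned_length) >= cutoff_percent


if __name__ == "__main__":
    # Assumed lengths: ~465 bp amplicon, ~1500 bp full-length 16S rRNA gene.
    for label, mismatches, length in (("341-806 region", 2, 465),
                                      ("full gene", 14, 1500)):
        identity = percent_identity(mismatches, length)
        merged = would_cluster_together(mismatches, length)
        print(f"{label}: {identity:.2f}% identity, merged at 97%: {merged}")
```

Under these assumptions both comparisons exceed 99% identity, so the two species would be merged into one OTU at any conventional cut-off, whereas a method that resolves single-nucleotide differences could keep them apart.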
Using a high-resolution error correction method requires a taxonomic assignment
strategy of compatible sensitivity. While there are good tools available for general taxonomic
surveys [24,25], specific research questions sometimes require custom-made taxonomic
approaches [26,27].
Here, we describe a flexible automated approach to process intestinal biopsies,
faecal samples and vaginal swabs, from collection to OTU table. We have
evaluated sample storage, DNA extraction, PCR and bioinformatic pipelines for these three
sample types. We present a set of guidelines and best practices for each of these steps,
which were shown to also work for saliva samples and can likely be extended to other swabs
and bodily fluids.
RESULTS AND DISCUSSION
Sample collection and storage
Biopsies are necessarily taken in a hospital or clinic, and therefore present the best
conditions for sample preservation. Fresh biopsies are sometimes placed in filter paper after
collection. We note that, in this case, the paper should also be submitted to DNA
extraction, together with the tissue. The same applies to biopsies or swabs preserved
in liquid medium, where both the liquid and the solid fractions should be taken for extraction.
We have compared freezing fresh samples at -80°C with freezing them in one of three
preservation media: RNALater, Allprotect and DNA/RNA Shield (see Methods for details).
RNALater did not give as high DNA yields as the other two media (data not shown).
DNA/RNA Shield is compatible with all downstream steps and presented excellent storage
characteristics, as described below, and was therefore selected for further sampling.
Faecal samples are often collected at the patient's home, since it can be difficult to
produce the material at the time of clinical examination. This means that a -80°C freezer is
not available, although a -20°C freezer often is. Even then, there is a risk of thawing during
transportation, so a preservation medium might be required. We compared two faecal
samples from the same healthy volunteer: one sample was immediately frozen at -20°C,
while the other was divided into two fractions, of which one was placed in DNA/RNA Shield
and the other kept dry at -20°C, simulating patient self-collection. The next day, the sample
was briefly thawed and homogenized in DNA/RNA Shield. Then, one fraction of the
homogenate was immediately extracted while the others were kept for 8 days at -20°C or
-80°C. There was a clear difference in diversity between the sample kept in DNA/RNA Shield
and the one frozen dry (fig 1a). We hypothesize that this is due to overgrowth of
aerotolerant microbes prior to freezing and possibly during the intermediate thawing, leading
to a skewed community. The same did not happen in DNA/RNA Shield, which inactivates
bacteria in seconds to minutes. Further preservation of the sample for up to 8 days at either
-20°C or -80°C did not affect the inferred community, indicating that getting the sample quickly
from the patient’s home to the clinic is of minor concern, as long as the sample is efficiently
inactivated (fig 1b).
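A diversity comparison of this kind can be illustrated with a standard alpha-diversity measure. The text here does not state which metric underlies fig 1a, so the Shannon index below is an assumption, and the function name is hypothetical; it is a minimal sketch rather than the analysis actually performed.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity (natural log) from per-OTU read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    index = 0.0
    for count in counts:
        if count < 0:
            raise ValueError("counts must be non-negative")
        if count > 0:
            proportion = count / total
            index -= proportion * math.log(proportion)
    return index


if __name__ == "__main__":
    # Invented counts, for illustration only: an even community and one
    # dominated by a single overgrown taxon.
    even_community = [250, 250, 250, 250]
    skewed_community = [970, 10, 10, 10]
    print(shannon_index(even_community), shannon_index(skewed_community))
```

In this sketch, a community in which a few taxa have overgrown the rest gives a lower index than an even one. That is the kind of shift the overgrowth hypothesis above would predict.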
Vaginal swabs can also be collected at home. We observed no difference between
samples collected by the patients themselves or samples collected by midwives in a clinical
setting (fig 1c). Swabs were kept at room temperature in DNA/RNA Shield for up to three
days before being frozen at -80°C. Since bacteria might remain attached to the swab or
come loose in the medium, both the swab and the medium were included in the DNA extraction.
DNA extraction
A good DNA extraction strategy should be effective, have minimal hands-on time, be
non-toxic, automatable and highly reproducible. Two commercial solutions were assessed,
MoBio’s PowerMag and Zymo Research’s ZR-96 Genomic DNA MagPrep. No conclusive
difference in data quality was found between the MoBio and the Zymo approaches.
However, since the latter does not require centrifugation, which makes it more suitable for
automation, it was selected for further optimization. The MoBio kit also uses
β-mercaptoethanol, a strong-smelling reagent which may require the use of a chemical hood.
Bead-beating is a crucial step for homogenizing samples, destroying extracellular
matrix and opening up cells with tough walls, such as Gram-positive bacteria. We therefore
paid special attention to this issue, considering the size of the beads used, the duration of
bead-beating and the amount of starting material. All samples were homogenised in an initial
bead-beating procedure. After digestion with lysozyme and proteinase K, we assessed
whether an extra bead-beating step, with finer beads, could yield increased recovery of
Gram-positive bacteria. We found that the extra bead-beating slightly increases the DNA
yield for all samples, but does not make a large difference in their overall composition (fig 2).
Given considerations of time, cost and contamination risk, this additional bead-beating step
was not performed in subsequent experiments.
In addition to the physical steps of heating and bead-beating, a chemical digestion of
bacterial cell walls is needed for DNA extraction. Besides the proteinase K step, we have
assessed the efficiency of pure lysozyme compared to Molzym’s BugLysis kit. No difference
was observed, and lysozyme was selected for further optimization. The time and
temperature of incubation in lysozyme were then optimized, showing the reaction to be fairly
insensitive to both temperature and time, with an incubation of 30-60 minutes
functioning equally well (fig 3), after which the quality of the extracted DNA might fall.
The final step of DNA extraction is to release the pure DNA molecules into solution for
storage and downstream applications. For this step, three possibilities were considered:
milliQ water, Tris-Cl/EDTA (TE) and Tris-Cl (EB). Since water does not preserve DNA quality
as well as the buffers, and EDTA is incompatible with certain molecular applications, such as
the use of restriction enzymes, EB was selected as the elution buffer.
Finally, for shotgun metagenomics or eukaryotic marker gene amplification, it is tempting
to deplete the sample of host DNA, especially in biopsies. However, we have found that
the Molzym treatment for human DNA removal, developed for blood samples, also removes
bacterial DNA and shows preferential removal of specific clades, most notably Clostridiales,
when applied to intestinal biopsies (fig. 4).
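One way to screen for such clade-specific loss is to compare each clade's relative abundance with and without the depletion treatment. The sketch below is a hedged illustration, not the analysis behind fig. 4: the function names, the pseudocount and the read counts are all hypothetical, and the counts are invented solely to show the calculation.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-clade counts to fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {clade: value / total for clade, value in counts.items()}


def log2_fold_change(untreated: Dict[str, float], treated: Dict[str, float],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(treated / untreated) relative abundance; negative means depleted."""
    clades = sorted(set(untreated) | set(treated))
    # The pseudocount (an assumption) avoids division by zero for absent clades.
    before = relative_abundance({c: untreated.get(c, 0) + pseudocount for c in clades})
    after = relative_abundance({c: treated.get(c, 0) + pseudocount for c in clades})
    return {c: math.log2(after[c] / before[c]) for c in clades}


if __name__ == "__main__":
    # Invented read counts, for illustration only.
    untreated_counts = {"Clostridiales": 500, "Bacteroidales": 500}
    treated_counts = {"Clostridiales": 100, "Bacteroidales": 900}
    for clade, change in log2_fold_change(untreated_counts, treated_counts).items():
        print(f"{clade}: {change:+.2f}")
```

Relative abundances are compositional, so a loss in one clade appears as a gain in the others. Absolute quantification would be needed to confirm that bacterial DNA was removed overall.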
During the preparation of this manuscript, Zymo Research phased out the ZR-96
Genomic DNA MagPrep kit and replaced it with the Quick-DNA MagBead Plus kit. The DNA
yield for this kit is generally higher, especially for vaginal samples, but it does not present a
higher level of background DNA (suppl. fig. 1a). This difference in DNA yield is likely due to
a better rupture of Gram-positive cells, such as lactobacilli.