Journal ArticleDOI

Sequencing technologies-the next generation

01 Jan 2010-Nature Reviews Genetics (Nature Publishing Group)-Vol. 11, Iss: 1, pp 31-46
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.


Summary

  • DNA sequencing is one of the most important platforms for study in biological systems today.
  • High-throughput next generation sequencing technologies deliver fast, inexpensive, and accurate genome information.
  • Next generation sequencing can produce over 100 times more data than methods based on Sanger Sequencing.
  • The next generation sequencing technologies offered by Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos have provided unprecedented opportunities for high-throughput functional genomic research.
  • Next generation sequencing technologies offer novel and rapid ways for genome-wide characterization and profiling of mRNAs, transcription factor regions, and DNA patterns.

ABSTRACT
Conclusion and Future Work
Next Generation Sequencing
CONTACT INFO
Data Analysis Comparisons
Downstream Analysis
REFERENCES
DNA sequencing is one of the most important platforms for
study in biological systems today. High-throughput next
generation sequencing technologies deliver fast,
inexpensive, and accurate genome information. Next
generation sequencing can produce over 100 times more data
than methods based on Sanger sequencing. The next
generation sequencing technologies offered by Illumina/
Solexa, ABI/SOLiD, 454/Roche, and Helicos have provided
unprecedented opportunities for high-throughput functional
genomic research. Next generation sequencing technologies
offer novel and rapid ways for genome-wide characterization
and profiling of mRNAs, transcription factor regions, and DNA
patterns.
Fig. 7) This is a plot of the frequency of each percentage covered for all nodes.
BLAST is in blue, MUMmer is in green.
Sequencing Technologies – the Next Generation,
Michael L. Metzker
Next Generation Sequencing Pipeline Development and Data Analysis
Fig. 9) This is a plot of the coverage of each Node. BLAST points are blue,
MUMmer points are red.
Fig. 6) This is a plot of the frequency of each percentage covered for all contigs.
BLAST is in blue, MUMmer is in green.
454/Roche – 454 Life Sciences is a biotechnology company
that is part of Roche and is based in Branford, Connecticut.
The company develops ultra-fast, high-throughput DNA
sequencing methods and tools.
Illumina/Solexa– Illumina is a company that develops and
manufactures integrated systems for the analysis of gene
variation. Solexa was founded to develop genome
sequencing technology.
ABI/SOLiD - (Sequencing by Oligonucleotide Ligation and
Detection) is a next-generation DNA sequencing technology
developed by Life Technologies and has been commercially
available since 2006. This next generation technology
generates hundreds of millions to billions of small sequence
reads at one time.
Helicos - Helicos's technology images the extension of
individual DNA molecules using a defined primer and
individual fluorescently labeled nucleotides, which contain a
"virtual terminator" preventing incorporation of multiple
nucleotides per cycle.
Julian Pierre 1, Jordan Taylor 2, Amit Upadhyay 3, Bhanu Rekepalli 3
Fig. 8) This is a plot of the coverage of each Contig. BLAST points are blue,
MUMmer points are red.
Using the coverage of each individual contig ID, the results for both
BLAST and MUMmer were plotted. While BLAST hit more contigs, there are
more contigs with a higher coverage that were hit by MUMmer.
Using the data gathered from both BLAST and MUMmer, the frequency of
the amount covered for each contig was plotted. From Fig 6), it can be
inferred that MUMmer hit more accurately for contigs.
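The frequency plot described here can be approximated by binning per-contig coverage percentages. The following is a minimal sketch, not the authors' program: the function name `coverage_histogram`, the 10% bin width, and the sample values are all assumptions made for illustration.

```python
def coverage_histogram(percentages, bin_width=10):
    """Count how many sequences fall into each coverage bin.

    percentages: iterable of coverage values between 0 and 100.
    bin_width:   width of each bin in percentage points (must divide 100).
    Returns a list of counts; the last bin also includes exactly 100%.
    """
    if 100 % bin_width:
        raise ValueError("bin_width must divide 100 evenly")
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for p in percentages:
        if not 0 <= p <= 100:
            raise ValueError(f"coverage out of range: {p}")
        # Clamp so that 100% lands in the final bin instead of overflowing.
        index = min(int(p // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical coverage values, for illustration only.
sample = [5.0, 12.5, 95.0, 99.9, 100.0]
histogram = coverage_histogram(sample)
```

Running the same function on the BLAST and MUMmer coverage lists separately would give two count vectors that can be plotted side by side, as in Fig. 6 and Fig. 7.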
Fig 4) from main.g2.bx.psu.edu
Once the results were found using both the BLAST and
MUMmer search tools, we created a program to see which
search tool had the most hits per contig. The total
number of contigs in the database file is 160,749 and the
total number of nodes in the query file is 552,305. BLAST
returned a total of 123,070 hits and MUMmer returned a
total of 121,829 hits. From the results, MUMmer hit more
accurately than BLAST while BLAST hit more contigs than
MUMmer.
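A comparison of hits per contig between two tools can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the program used on the poster: the function names and the small ID lists are hypothetical, and each list is assumed to hold one contig ID per reported hit.

```python
from collections import Counter


def hits_per_target(hit_ids):
    """Return a Counter mapping each target ID to its number of hits."""
    return Counter(hit_ids)


def compare_tools(blast_ids, mummer_ids):
    """Summarize which targets each tool hit and how many hits it reported."""
    blast = hits_per_target(blast_ids)
    mummer = hits_per_target(mummer_ids)
    return {
        "blast_only": sorted(set(blast) - set(mummer)),
        "mummer_only": sorted(set(mummer) - set(blast)),
        "both": sorted(set(blast) & set(mummer)),
        "blast_total_hits": sum(blast.values()),
        "mummer_total_hits": sum(mummer.values()),
    }


# Hypothetical hit lists, for illustration only.
blast_ids = ["c1", "c1", "c2", "c3"]
mummer_ids = ["c1", "c3", "c3", "c4"]
summary = compare_tools(blast_ids, mummer_ids)
```

Keeping the per-target counts in a `Counter` makes it straightforward to ask further questions, such as which contigs received the most hits from each tool.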
In Next-Generation Sequencing, data analysis is one of the
most expensive processes. While the cost of genome
sequencing goes down, the cost of analyzing data is still
expensive. In the future, the “$1,000 genome will come with
a $20,000 analysis price tag.”
The same process was done with the Nodes. From Fig 7), it can be
inferred that BLAST hit more accurately with nodes. However, there
are more BLAST results with lower coverage.
The future of next generation sequencing can be broken
down into a variety of categories such as personalized
medicine, bio fuels, climate change, and other life science
fields.
Personalized Medicine is a medical model that proposes
the customization of medical decision to tailor an
individual
Bio Fuels present a source of alternative energy.
Microalgal biofuels use algae to synthesize the fuel. In
order to optimize the process, an understanding of the
gene-function relationship of algae would prove helpful.
Climate change research is the active study of past and future
theoretical models that use past climate data to
make future projections.
In conclusion, we hope to apply the knowledge we
have gained to fields such as these.
The same process was done with the Nodes. While BLAST hit more Nodes,
there are more Nodes that hit with a lower coverage using BLAST.
1 Texas Southern University, 2 Austin Peay State University, 3 University of Tennessee
Next generation sequencing analysis uses a wide array of tools to obtain
results based on the genome sequence. The most widely used tools are BLAST,
HMMER, and MUMmer.
BLAST (Basic Local Alignment Search Tool) is a sequence
alignment tool developed at the NIH (National Institutes of Health). It is
used to find similar regions in different sequences and then compare
their similarities.
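BLAST+ can write its hits as a tab-separated table (the `-outfmt 6` option), whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The sketch below parses that layout; it assumes the default column order, and the `BlastHit` type, the function name, and the example row are hypothetical. The poster does not state which output format was actually used.

```python
import csv
import io
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST+ tabular output with the default 12 columns."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_blast_tabular(handle):
    """Parse tab-separated BLAST hits from a file-like object."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the comment lines written by -outfmt 7.
        if not row or row[0].startswith("#"):
            continue
        hits.append(BlastHit(
            row[0], row[1], float(row[2]),
            int(row[3]), int(row[4]), int(row[5]),
            int(row[6]), int(row[7]), int(row[8]), int(row[9]),
            float(row[10]), float(row[11]),
        ))
    return hits


# A made-up example row, for illustration only.
example = "node_1\tcontig_7\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t360\n"
hits = parse_blast_tabular(io.StringIO(example))
```

For a real results file, the same function can be called with an open file handle in place of the `io.StringIO` object.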
MUMmer (Maximal Unique Matches) is a system for rapidly
aligning entire genomes. It can also align incomplete
genomes and can easily handle thousands of contigs from a shotgun
sequencing project.
HMMER (Hidden Markov Modeler) is used for searching sequence
databases for homologs of protein sequences, and for making protein
sequence alignments. It implements methods using probabilistic
models called profile hidden Markov models (HMMs).
Genome Assembly
Sequence Analysis refers to the process of subjecting a DNA, RNA or
peptide sequence to a wide range of analytical methods to:
  • Compare sequences to find similarities and infer if they are homologous
  • Identify the features of the sequence such as gene structure, distribution, introns and exons, and regulation of gene expression
  • Identify sequence differences and variations such as mutations
Fig. 1) This figure shows three different Next Generation Sequencing methods. [2]
Fig. 2) Taken from A Hitchhiker’s Guide to Next-Generation Sequencing, by Gabe Rudy
Fig. 3) Taken from bio.davidson.edu/courses. Shows alignment results for yeast.
Fig 5) from jcvi.org shows the mapping of chr6 of a Human Genome
Julian Pierre – julz_pierre@yahoo.com
Jordan Taylor – jtaylor74@my.apsu.edu
Amit Upadhyay – aupadhy1@utk.edu
Bhanu Rekepalli – brekapal@utk.edu
http://www.roche.com/research_and_development/r_d_overview/r_d_sites.htm?id=18
http://www.pnas.org/content/99/6/3712/F1.expansion.html
http://www.yerkes.emory.edu/nhp_genomics_core/Services/Sequencing.html
http://www.illumina.com/technology/solexa_technology.ilmn
http://blast.ncbi.nlm.nih.gov/Blast.cgi
https://main.g2.bx.psu.edu/u/dan/p/fastq
http://ori.dhhs.gov/education/products/n_illinois_u/datamanagement/datopic.htmll
http://www.jcvi.org/medicago/include/images/chr6.BamHI.maps.jpg
Gabe Rudy, (2010) A Hitchhiker’s Guide to Next-Generation
Sequencing, :1-9, Golden Helix
[1] John D. McPherson, (2009) Next-Generation Gap, 6:1-4, Nature
Methods Supplement
[2] Michael L. Metzker, (2010) Sequencing technologies - the next
generation, 11:31-46, Nature Reviews Genetics
Md. Fakruddin, Khanjada Shahnewaj Bin Mannan, (2012) Next
Generation sequencing technologies – Principles and prospects,
6:1-9, Research and Reviews in Biosciences
Misra N., Panda P. K., Parida B. K., Mishra B. K., (2012)
Phylogenomic Study of Lipid Genes Involved in Microalgal Biofuel
Production – Candidate Gene Mining and Metabolic Pathway
Analyses, Evolutionary Bioinformatics 8:545-564, doi: 10.4137/
EBO.S10159
Galaxy is an open, web-based platform for data intensive biomedical
research. It can be used on its own free public server where you can
perform, reproduce, and share complete analyses.
An example of how Galaxy reflects its data is shown in Fig 5.
Two FASTA files related to the same nucleotide sequence
were input into both BLAST and MUMmer and the results
were parsed into tables. Then, the coverage of all hit contigs
and nodes from both programs was found.
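One way to compute the coverage of each hit contig or node is to merge the aligned intervals reported for it and divide the covered length by the sequence length. The sketch below assumes hits are available as (sequence ID, start, end) tuples with 1-based inclusive coordinates and that sequence lengths are known; the function names and sample data are hypothetical, and the original program may have defined coverage differently.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of double counting.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def percent_coverage(hits, lengths):
    """Return {seq_id: percent of the sequence covered by at least one hit}.

    hits:    iterable of (seq_id, start, end); reversed coordinates
             (start > end, as reported for reverse-strand hits) are accepted.
    lengths: dict mapping seq_id to sequence length in bases.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in hits:
        by_seq[seq_id].append((min(start, end), max(start, end)))
    coverage = {}
    for seq_id, intervals in by_seq.items():
        covered = sum(e - s + 1 for s, e in merge_intervals(intervals))
        coverage[seq_id] = 100.0 * covered / lengths[seq_id]
    return coverage


# Hypothetical hits and lengths, for illustration only.
hits = [("contig_1", 1, 50), ("contig_1", 40, 80), ("contig_2", 30, 11)]
lengths = {"contig_1": 100, "contig_2": 40}
coverage = percent_coverage(hits, lengths)
```

Merging before summing matters because overlapping hits on the same contig would otherwise push coverage above 100%.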
Citations
Journal ArticleDOI
TL;DR: FeatureCounts as discussed by the authors is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments, which implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations

Journal ArticleDOI
TL;DR: The main innovations of the new version of the Arlequin program include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans.
Abstract: We present here a new version of the Arlequin program available under three different forms: a Windows graphical version (Winarl35), a console version of Arlequin (arlecore), and a specific console version to compute summary statistics (arlsumstat). The command-line versions run under both Linux and Windows. The main innovations of the new version include enhanced outputs in XML format, the possibility to embed graphics displaying computation results directly into output files, and the implementation of a new method to detect loci under selection from genome scans. Command-line versions are designed to handle large series of files, and arlsumstat can be used to generate summary statistics from simulated data sets within an Approximate Bayesian Computation framework.

13,581 citations


Cites methods from "Sequencing technologies-the next ge..."

  • ...Some software packages (e.g. plink Purcell et al. 2007) have been specifically developed to both handle such huge data sets and to directly perform statistical analyses on the data....


Journal ArticleDOI
22 Apr 2013-PLOS ONE
TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract: Background The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

11,272 citations


Cites background from "Sequencing technologies-the next ge..."

  • ...High-throughput (HT) DNA sequencing [1] is allowing major...


Journal ArticleDOI
04 May 2011-PLOS ONE
TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.
Abstract: Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

5,163 citations


Cites methods from "Sequencing technologies-the next ge..."

  • ...Next-generation sequencing (NGS) technologies have been recently used for whole genome sequencing and for re-sequencing projects where the genomes of several specimens are sequenced to discover large numbers of single nucleotide polymorphisms (SNPs) for exploring within-species diversity, constructing haplotype maps and performing genome-wide association studies (GWAS) [13]....


  • ...Read quantity and quality Because we are interested in enabling genome wide association studies (GWAS) in maize, a species where linkage disequilibrium decays within two to three kbp [30], we need to identify markers that cover around one million genomic locations....


Journal ArticleDOI
TL;DR: This work has examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites, and over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas.
Abstract: The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

4,281 citations

References
Journal ArticleDOI
TL;DR: VAAL detected ∼98% of differences (including large insertion-deletions) between pairs of strains from three species while calling no false positives, identifying an antibiotic's site of action by identifying sequence differences between drug-sensitive strains and drug-resistant derivatives.
Abstract: This variant ascertainment algorithm, or VAAL, uses short sequence reads of haploid bacterial genomes to first locally assemble the reads and then compare these assemblies to the reference genome. This allows VAAL to detect all types of variants ranging from single-nucleotide polymorphisms to large insertions or deletions. Our variant ascertainment algorithm, VAAL, uses massively parallel DNA sequence data to identify differences between bacterial genomes with high sensitivity and specificity. VAAL detected ∼98% of differences (including large insertion-deletions) between pairs of strains from three species while calling no false positives. VAAL also pinpointed a single mutation between Vibrio cholerae genomes, identifying an antibiotic's site of action by identifying sequence differences between drug-sensitive strains and drug-resistant derivatives.

71 citations

Journal ArticleDOI
TL;DR: A novel paradigm in RT chemistry is discovered, the attachment of a photocleavable, 2-nitrobenzyl group to the N6-position of 2′-deoxyadenosine triphosphate (dATP), which, upon incorporation, terminates DNA synthesis.
Abstract: The Human Genome Project has facilitated the sequencing of many species, yet the current Sanger method is too expensive, labor intensive and time consuming to accomplish medical resequencing of human genomes en masse. Of the ‘next-generation’ technologies, cyclic reversible termination (CRT) is a promising method with the goal of producing accurate sequence information at a fraction of the cost and effort. The foundation of this approach is the reversible terminator (RT), its chemical and biological properties of which directly impact the performance of the sequencing technology. Here, we have discovered a novel paradigm in RT chemistry, the attachment of a photocleavable, 2-nitrobenzyl group to the N 6 -position of 2’-deoxyadenosine triphosphate (dATP), which, upon incorporation, terminates DNA synthesis. The 3’-OH group of the N 6 -(2-nitrobenzyl)-dATP remains unblocked, providing favorable incorporation and termination properties for several commercially available DNA polymerases while maintaining good discrimination against mismatch incorporations. Upon removal of the 2-nitrobenzyl group with UV light, the natural nucleotide is restored without molecular scarring. A five-base experiment, illustrating the exquisite, stepwise addition through a homopolymer repeat, demonstrates the applicability of the N 6 -(2-nitrobenzyl)-dATP as an ideal RT for CRT sequencing.

57 citations

Journal ArticleDOI
TL;DR: Functional validation of candidate genes and the use of genome-wide techniques to gain mechanistic insights will be emphasized for the establishment of biological plausibility and as essential follow-up steps after the identification of candidate genes.
Abstract: Pharmacogenetics is the study of the role of inheritance in variation in drug response phenotypes. Those phenotypes can range from life-threatening adverse drugs reactions at one end of the spectrum to equally serious lack of therapeutic efficacy at the other. Over the past half century, pharmacogenetics has—like all of medical genetics—evolved from a discipline with a focus on monogenetic traits to become pharmacogenomics, with a genome-wide perspective. This article will briefly review recent examples of the application of genome-wide techniques to clinical pharmacogenomic studies and to pharmacogenomic model systems that vary from cell line-based model systems to yeast gene deletion libraries. Functional validation of candidate genes and the use of genome-wide techniques to gain mechanistic insights will be emphasized for the establishment of biological plausibility and as essential follow-up steps after the identification of candidate genes.

56 citations

Journal ArticleDOI
TL;DR: A massively scalable biochemistry, Cyclical Ligation and Cleavage (CycLiC) for contiguous base sequencing and its application directly to a template captured on a microarray format is described.
Abstract: Next generation sequencing methods that can be applied to both the resequencing of whole genomes and to the selective resequencing of specific parts of genomes are needed. We describe (i) a massively scalable biochemistry, Cyclical Ligation and Cleavage (CycLiC) for contiguous base sequencing and (ii) apply it directly to a template captured on a microarray. CycLiC uses four color-coded DNA/RNA chimeric oligonucleotide libraries (OL) to extend a primer, a base at a time, along a template. The cycles comprise the steps: (i) ligation of OLs, (ii) identification of extended base by label detection, and (iii) cleavage to remove label/terminator and undetermined bases. For proof-of-principle, we show that the method conforms to design and that we can read contiguous bases of sequence correctly from a template captured by hybridization from solution to a microarray probe. The method is amenable to massive scale-up, miniaturization and automation. Implementation on a microarray format offers the potential for both selection and sequencing of a large number of genomic regions on a single platform. Because the method uses commonly available reagents it can be developed further by a community of users.

55 citations

Journal ArticleDOI
TL;DR: Methods relying on dense arrays of synthetic oligodeoxynucleotides to target specific subsets of the human genome may enable routine resequencing of all human exons or multi-megabase-pair chromosomal regions.
Abstract: Methods relying on dense arrays of synthetic oligodeoxynucleotides to target specific subsets of the human genome may enable routine resequencing of all human exons or multi-megabase-pair chromosomal regions.

47 citations


"Sequencing technologies-the next ge..." refers background in this paper

  • ...Initial reports raised concerns as to the readiness of targeted capture for routine us...
