Journal ArticleDOI

Sequencing technologies-the next generation

01 Jan 2010-Nature Reviews Genetics (Nature Publishing Group)-Vol. 11, Iss: 1, pp 31-46
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

Summary

  • DNA sequencing is one of the most important platforms for studying biological systems today.
  • High-throughput next-generation sequencing technologies deliver fast, inexpensive, and accurate genome information.
  • Next-generation sequencing can produce over 100 times more data than methods based on Sanger sequencing.
  • The next-generation sequencing technologies offered by Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos have provided unprecedented opportunities for high-throughput functional genomic research.
  • Next-generation sequencing technologies offer novel and rapid ways for genome-wide characterization and profiling of mRNAs, transcription factor regions, and DNA patterns.

ABSTRACT
Conclusion and Future Work
Next Generation Sequencing
CONTACT INFO
Data Analysis Comparisons
Downstream Analysis
REFERENCES
DNA sequencing is one of the most important platforms for studying biological systems today. High-throughput next-generation sequencing technologies deliver fast, inexpensive, and accurate genome information. Next-generation sequencing can produce over 100 times more data than methods based on Sanger sequencing. The next-generation sequencing technologies offered by Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos have provided unprecedented opportunities for high-throughput functional genomic research. Next-generation sequencing technologies offer novel and rapid ways for genome-wide characterization and profiling of mRNAs, transcription factor regions, and DNA patterns.
Fig. 7) This is a plot of the frequency of each percentage covered for all nodes.
BLAST is in blue, MUMmer is in green.
Sequencing Technologies – the Next Generation, Michael L. Metzker
Next Generation Sequencing Pipeline Development and Data Analysis
Fig. 9) This is a plot of the coverage of each Node. BLAST points are blue,
MUMmer points are red.
Fig. 6) This is a plot of the frequency of each percentage covered for all contigs.
BLAST is in blue, MUMmer is in green.
454/Roche – 454 Life Sciences is a biotechnology company that is part of Roche and is based in Branford, Connecticut. The company develops ultra-fast, high-throughput DNA sequencing methods and tools.
Illumina/Solexa – Illumina is a company that develops and manufactures integrated systems for the analysis of gene variation. Solexa was founded to develop genome sequencing technology.
ABI/SOLiD – SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies that has been commercially available since 2006. It generates hundreds of millions to billions of short sequence reads at one time.
Helicos – Helicos's technology images the extension of individual DNA molecules using a defined primer and individual fluorescently labeled nucleotides, which contain a "virtual terminator" that prevents incorporation of multiple nucleotides per cycle.
Julian Pierre¹, Jordan Taylor², Amit Upadhyay³, Bhanu Rekepalli³
Fig. 8) This is a plot of the coverage of each Contig. BLAST points are blue,
MUMmer points are red.
Using the coverage of each individual contig ID, the results for both BLAST and MUMmer were plotted. While BLAST hit more contigs, more of the contigs hit by MUMmer have a higher coverage.
Using the data gathered from both BLAST and MUMmer, the frequency of each coverage percentage was plotted for the contigs. From Fig. 6, it can be inferred that MUMmer hit contigs more accurately.
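The frequency plots described here can be approximated by binning per-sequence coverage percentages. The following is a minimal sketch, not the authors' program: the function name `coverage_histogram`, the 10-point bin width, and the sample values are all assumptions made for illustration.

```python
from collections import Counter

def coverage_histogram(coverages, bin_width=10):
    """Count how many sequences fall into each coverage bin (percent).

    Bins are [0, 10), [10, 20), ..., [90, 100]; a coverage of exactly
    100% is clamped into the top bin.
    """
    counts = Counter()
    for pct in coverages:
        bin_start = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical coverage values, for illustration only.
blast_cov = [12.0, 35.5, 99.0, 100.0, 8.0]
mummer_cov = [95.0, 100.0, 88.0]

print(coverage_histogram(blast_cov))   # {0: 1, 10: 1, 30: 1, 90: 2}
print(coverage_histogram(mummer_cov))  # {80: 1, 90: 2}
```

Plotting the two dictionaries side by side would give a frequency comparison of the kind shown in Fig. 6 and Fig. 7.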
Fig 4) from main.g2.bx.psu.edu
Once the results were obtained using both the BLAST and MUMmer search tools, we created a program to see which search tool had the most hits per contig. The database file contains a total of 160,749 contigs, and the query file contains a total of 552,305 nodes. BLAST returned a total of 123,070 hits and MUMmer returned a total of 121,829 hits. From the results, MUMmer hit more accurately than BLAST, while BLAST hit more contigs than MUMmer.
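A hits-per-contig count of the kind described above could be sketched as follows. The poster does not say which output format was parsed, so this example assumes BLAST's tab-separated tabular output (`-outfmt 6`), in which the second column is the subject (database) sequence ID. The function name and the sample rows are hypothetical.

```python
from collections import Counter

def hits_per_subject(tabular_lines):
    """Count hits per subject ID in BLAST tabular (outfmt 6) lines.

    Assumed column order: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    counts = Counter()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        counts[fields[1]] += 1
    return counts

# Hypothetical hits: query nodes searched against database contigs.
sample = [
    "node_1\tcontig_7\t98.50\t120\t2\t0\t1\t120\t300\t419\t1e-50\t210",
    "node_2\tcontig_7\t95.00\t80\t4\t0\t1\t80\t500\t579\t1e-30\t140",
    "node_3\tcontig_9\t99.00\t100\t1\t0\t1\t100\t1\t100\t1e-45\t190",
]

counts = hits_per_subject(sample)
print(counts.most_common(1))  # [('contig_7', 2)]
print(len(counts))            # number of distinct contigs hit: 2
```

The same counting logic would apply to MUMmer output once it has been parsed into rows with a contig ID column.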
In next-generation sequencing, data analysis is one of the most expensive processes. While the cost of genome sequencing goes down, analyzing the data is still expensive. In the future, the “$1,000 genome will come with a $20,000 analysis price tag.”
The same process was done with the nodes. From Fig. 7, it can be inferred that BLAST hit nodes more accurately. However, there are more BLAST results with lower coverage.
The future of next-generation sequencing can be broken down into a variety of categories such as personalized medicine, biofuels, climate change, and other life science fields.
Personalized medicine is a medical model that proposes customizing medical decisions to the individual patient.
Biofuels present a source of alternative energy. Microalgal biofuels use algae to synthesize the fuel. In order to optimize the process, an understanding of the gene-function relationship of algae would prove helpful.
Climate change research is the active study of theoretical models that use past climate data to make future projections.
In conclusion, we hope to apply the knowledge we have gained to fields such as these.
The same process was done with the nodes. While BLAST hit more nodes, more of the nodes hit by BLAST have a lower coverage.
1 Texas Southern University, 2 Austin Peay State University, 3 University of Tennessee
Next-generation sequencing uses a wide array of tools to obtain results based on the genome sequence. The most widely used tools are BLAST, HMMER, and MUMmer.
BLAST (Basic Local Alignment Search Tool) is a sequence similarity search tool developed at the NIH (National Institutes of Health). It is used to find similar regions in different sequences and then compare their similarities.
MUMmer (Maximal Unique Matches) is a system for rapidly aligning entire genomes. It can also align incomplete genomes and can easily handle thousands of contigs from a shotgun sequencing project.
HMMER (Hidden Markov Modeler) is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (HMMs).
Genome Assembly
Sequence analysis refers to the process of subjecting a DNA, RNA or peptide sequence to a wide range of analytical methods to:
  • Compare sequences to find similarities and infer whether they are homologous
  • Identify the features of the sequence, such as gene structure, distribution, introns and exons, and regulation of gene expression
  • Identify sequence differences and variations, such as mutations
Fig. 1) This figure shows three different Next Generation Sequencing methods. [2]
Fig. 2) Taken from A Hitchhiker’s Guide to Next-Generation Sequencing, by Gabe Rudy
Fig. 3) Taken from bio.davidson.edu/courses. Shows alignment results for yeast.
Fig 5) from jcvi.org shows the mapping of chr6 of a Human Genome
Julian Pierre – julz_pierre@yahoo.com
Jordan Taylor – jtaylor74@my.apsu.edu
Amit Upadhyay – aupadhy1@utk.edu
Bhanu Rekepalli – brekapal@utk.edu
http://www.roche.com/research_and_development/r_d_overview/r_d_sites.htm?id=18
http://www.pnas.org/content/99/6/3712/F1.expansion.html
http://www.yerkes.emory.edu/nhp_genomics_core/Services/Sequencing.html
http://www.illumina.com/technology/solexa_technology.ilmn
http://blast.ncbi.nlm.nih.gov/Blast.cgi
https://main.g2.bx.psu.edu/u/dan/p/fastq
http://ori.dhhs.gov/education/products/n_illinois_u/datamanagement/datopic.html
http://www.jcvi.org/medicago/include/images/chr6.BamHI.maps.jpg
Gabe Rudy, (2010) A Hitchhiker’s Guide to Next-Generation Sequencing, :1-9, Golden Helix
[1] John D. McPherson, (2009) Next-Generation Gap, 6:1-4, Nature Methods Supplement
[2] Michael L. Metzker, (2010) Sequencing Technologies – the Next Generation, 11:31-46, Nature Reviews Genetics
Md. Fakruddin, Khanjada Shahnewaj Bin Mannan, (2012) Next Generation Sequencing Technologies – Principles and Prospects, 6:1-9, Research and Reviews in Biosciences
Misra N., Panda P. K., Parida B. K., Mishra B. K., (2012) Phylogenomic Study of Lipid Genes Involved in Microalgal Biofuel Production – Candidate Gene Mining and Metabolic Pathway Analyses, Evolutionary Bioinformatics 8:545-564, doi: 10.4137/EBO.S10159
Galaxy is an open, web-based platform for data-intensive biomedical research. It can be used on its own free public server, where you can perform, reproduce, and share complete analyses.
An example of how Galaxy displays its data is shown in Fig 5.
Two FASTA files related to the same nucleotide sequence
were input into both BLAST and MUMmer and the results
were parsed into tables. Then, the coverage of all hit contigs
and nodes from both programs was found.
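One way to compute per-sequence coverage from parsed hit tables is to merge the overlapping hit intervals on each sequence and divide the covered bases by the sequence length. This is a minimal sketch under assumptions: the poster does not define how coverage was calculated, and the function names, 1-based inclusive coordinates, and sample data below are hypothetical.

```python
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def coverage_percent(hits, lengths):
    """Return {seq_id: percent of bases covered by at least one hit}.

    hits    -- iterable of (seq_id, start, end) tuples
    lengths -- dict mapping seq_id to sequence length in bases
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in hits:
        # Reverse-strand hits may be reported with start > end.
        by_seq[seq_id].append((min(start, end), max(start, end)))
    result = {}
    for seq_id, intervals in by_seq.items():
        covered = sum(e - s + 1 for s, e in merge_intervals(intervals))
        result[seq_id] = 100.0 * covered / lengths[seq_id]
    return result

# Hypothetical hits and contig lengths, for illustration only.
hits = [("contig_1", 1, 50), ("contig_1", 40, 80), ("contig_2", 30, 11)]
lengths = {"contig_1": 100, "contig_2": 40}

print(coverage_percent(hits, lengths))  # {'contig_1': 80.0, 'contig_2': 50.0}
```

Merging first matters because two overlapping hits would otherwise be double-counted, which could push the reported coverage above 100%.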
Citations
Journal ArticleDOI
TL;DR: The results indicate that selection plays a major role in determining the population genomic structure of D. magna, an emerging model system in genomics and a renowned ecological model system.
Abstract: The combined analysis of neutral and adaptive genetic variation is crucial to reconstruct the processes driving population genetic structure in the wild. However, such combined analysis is challenging because of the complex interaction among neutral and selective processes in the landscape. Overcoming this level of complexity requires an unbiased search for the evidence of selection in the genomes of populations sampled from their natural habitats and the identification of demographic processes that lead to present-day populations' genetic structure. Ecological model species with a suite of genomic tools and well-understood ecologies are best suited to resolve this complexity and elucidate the role of selective and demographic processes in the landscape genomic structure of natural populations. Here we investigate the water flea Daphnia magna, an emerging model system in genomics and a renowned ecological model system. We infer past and recent demographic processes by contrasting patterns of local and regional neutral genetic diversity at markers with different mutation rates. We assess the role of the environment in driving genetic variation in our study system by identifying correlates between biotic and abiotic variables naturally occurring in the landscape and patterns of neutral and adaptive genetic variation. Our results indicate that selection plays a major role in determining the population genomic structure of D. magna. First, environmental selection directly impacts genetic variation at loci hitchhiking with genes under selection. Second, priority effects enhanced by local genetic adaptation (cf. monopolization) affect neutral genetic variation by reducing gene flow among populations and genetic diversity within populations.

76 citations


Cites background from "Sequencing technologies-the next ge..."

  • ...higher accessibility of next-generation sequencing technologies (Wang et al. 2009; Metzker 2010; Davey et al. 2011), in recent years nonmodel taxa have been mostly investigated in ‘forward genetics’ genome scan studies (Nosil et…...


Journal ArticleDOI
TL;DR: This article describes a procedure for the dissociation of zebrafish embryos to produce a suspension of single cells that is suitable for fluorescence-activated cell sorting (FACS), and RNA can be extracted from the sorted cells and used for subsequent quantitative real-time PCR, microarrays, or next-generation sequencing (NGS) experiments.
Abstract: This article describes a procedure for the dissociation of zebrafish (Danio rerio) embryos to produce a suspension of single cells that is suitable for fluorescence-activated cell sorting (FACS). The method has been applied to embryos at stages from 14 h post fertilization (hpf) to larvae at 5 d post fertilization (dpf), and it has also been successfully used for isolating fluorescently tagged neurons from whole dissociated embryos and early larvae. The cell collection procedures described in this protocol may also be adapted for older embryos and juvenile zebrafish. RNA can be extracted from the sorted cells and used for subsequent quantitative real-time PCR (qRT-PCR), microarrays, or next-generation sequencing (NGS) experiments.

76 citations

Journal ArticleDOI
TL;DR: Recommendations are derived from the most recent studies identifying phenotype-genotype correlations following the discovery of causative RET gene mutations in MEN 2 eighteen years ago, which revolutionized the diagnostic and therapeutic strategies available for the management of these patients.
Abstract: Twenty-five percent of medullary thyroid cancers (MTC) are familial and inherited as an autosomal dominant trait. Three different phenotypes can be distinguished: multiple endocrine neoplasia (MEN) types 2A and 2B, in which the MTC is associated with other endocrine neoplasias, and familial MTC (FMTC), which occurs in isolation. The discovery that germline RET oncogene activating mutations are associated with 95–98% of MEN 2/FMTC syndromes and the availability of genotyping to identify mutations in affected patients and their relatives has revolutionized the diagnostic and therapeutic strategies available for the management of these patients. All patients with MTC, both those with a positive familial history and those apparently sporadic, should be submitted to RET genetic screening. Once an RET mutation has been confirmed in an index patient, first-degree relatives should be screened rapidly to identify the 50% who inherited the mutation and are therefore at risk for development of MTC. Relatives in whom no RET mutation is identified can be reassured and discharged from further follow-up, whereas RET-positive subjects (i.e. gene carriers) must be investigated and a therapeutic strategy initiated. These guideline recommendations are derived from the most recent studies identifying phenotype-genotype correlations following the discovery of causative RET gene mutations in MEN 2 eighteen years ago. Three major points will be discussed: (a) identification of patients and relatives who should have genetic screening for RET mutations, (b) management of asymptomatic gene carriers, and (c) ethics.

76 citations

Journal ArticleDOI
TL;DR: Three major challenges facing comparative physiology in the 21st century are expanded upon: vertical integration of physiological processes across organizational levels within organisms, horizontal integration of physiological processes across organisms within ecosystems, and temporal integration of physiological processes during evolutionary change.
Abstract: Schwenk et al. (2009) provided an overview of five major challenges in organismal biology: (1) understanding the organism’s role in organism–environment linkages; (2) utilizing the functional diversity of organisms; (3) integrating living and physical systems analysis; (4) understanding how genomes produce organisms; and (5) understanding how organisms walk the tightrope between stability and change. Subsequent “Grand Challenges” papers have expanded on these topics from different viewpoints, including ecomechanics (Denny and Helmuth 2009), endocrinology (Denver et al. 2009), development of additional model organisms (Satterlie et al. 2009), and development of theoretical and financial resources (Halanych and Goertzen 2009). This is the sixth paper in the “Grand Challenges” series, which offers the view from comparative physiology. In this article, we expand upon three major challenges facing comparative physiology in the 21st century: vertical integration of physiological processes across organizational levels within organisms, horizontal integration of physiological processes across organisms within ecosystems, and temporal integration of physiological processes during evolutionary change. “Integration” is a key. It defines the scope of the challenges and must be considered in any solution. Reductive and inductive approaches both have been used with great success in biology. The reductive approach employs a simplified system to study a complex process. There is no question that such an approach has yielded a greater understanding of the molecular mechanisms of cellular processes. The inductive approach depends on observation to develop universal principles. Charles Darwin, after all, used this approach to develop the theory of natural selection. All too often these approaches are viewed as mutually exclusive, when, in fact, they are complementary and are used, to varying extents, by most biologists working today. Yet, we have fallen short of full integration across disciplines and levels of biological organization. A major impediment for further advancement has been the limitations in tools and resources. However, recent technological advances (e.g., systems biology) give us an opportunity to combine reductive and inductive approaches to study emergent properties (Boogerd et al. 2007) and now allow us to entertain

75 citations


Cites background from "Sequencing technologies-the next ge..."

  • ...Innovations in “next generation” sequencing technology have reduced costs and increased efficiencies in obtaining and cataloging genomic sequences (Metzker 2010)....


  • ...The estimated cost for the first complete human genomic sequence that was published in 2004 is $300M; in contrast, current estimates for a ‘‘personal genome’’ are as low as $5K (Metzker 2010)....


Journal ArticleDOI
TL;DR: A deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing and shows that the signals generated by this context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model.
Abstract: Motivation Oxford Nanopore sequencing is a rapidly developed sequencing technology in recent years. To keep pace with the explosion of the downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83 to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection. Availability and implementation The software can be accessed freely at: https://github.com/lykaust15/DeepSimulator. Supplementary information Supplementary data are available at Bioinformatics online.

75 citations


Cites background from "Sequencing technologies-the next ge..."

  • ...Next-generation sequencing (NGS) technologies allow researchers to sequence DNA and RNA in a high-throughput manner, which have facilitated numerous breakthroughs in genomics, transcriptomics, and epigenomics (Metzker, 2010; MacLean et al., 2009; Wu et al., 2017)....


References
Journal ArticleDOI
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

11,528 citations


"Sequencing technologies-the next ge..." refers background in this paper

  • ...For example, in gene-expression studies microarrays are now being replaced by seq-based methods, which can identify and quantify rare transcripts without prior knowledge of a particular gene and can provide information regarding alternative splicing and sequence variation in identified gene...


Journal ArticleDOI
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies and is in close agreement with simulated results without read-pair information.
Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

9,389 citations

Journal ArticleDOI
15 Sep 2005-Nature
TL;DR: A scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments with 96% coverage at 99.96% accuracy in one run of the machine is described.
Abstract: The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

8,434 citations

Journal ArticleDOI
20 Feb 2009-Cell
TL;DR: This work has revealed unexpected diversity in their biogenesis pathways and the regulatory mechanisms that they access, which has direct implications for fundamental biology as well as disease etiology and treatment.

4,490 citations


"Sequencing technologies-the next ge..." refers background in this paper

  • ...and to elucidate the role of non-coding RNAs in health and diseas...


Journal ArticleDOI
20 Feb 2009-Cell
TL;DR: The evolution of long noncoding RNAs and their roles in transcriptional regulation, epigenetic gene regulation, and disease are reviewed.

4,277 citations


"Sequencing technologies-the next ge..." refers background in this paper

  • ...and to elucidate the role of non-coding RNAs in health and diseas...
