Sequencing technologies-the next generation

doi:10.1038/NRG2626


ABSTRACT

Conclusion and Future Work

Next Generation Sequencing

CONTACT INFO

Data Analysis Comparisons

Downstream Analysis

REFERENCES

DNA sequencing is one of the most important platforms for the study of biological systems today. High-throughput next generation sequencing technologies deliver fast, inexpensive, and accurate genome information. Next generation sequencing can produce over 100 times more data than methods based on Sanger sequencing. The next generation sequencing technologies offered by Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos have provided an unprecedented opportunity for high-throughput functional genomic research. These technologies offer novel and rapid ways for genome-wide characterization and profiling of mRNAs, transcription factor regions, and DNA patterns.

Fig. 7) Frequency of each percentage covered for all nodes. BLAST is in blue, MUMmer is in green.

Sequencing Technologies – the Next Generation, Michael L. Metzker

Next Generation Sequencing Pipeline Development and Data Analysis

Fig. 9) Coverage of each node. BLAST points are blue, MUMmer points are red.

Fig. 6) Frequency of each percentage covered for all contigs. BLAST is in blue, MUMmer is in green.

•  454/Roche – 454 Life Sciences is a biotechnology company that is part of Roche and is based in Branford, Connecticut. It develops ultra-fast, high-throughput DNA sequencing methods and tools.

•  Illumina/Solexa – Illumina is a company that develops and manufactures integrated systems for the analysis of genetic variation. Solexa was founded to develop genome sequencing technology.

•  ABI/SOLiD – SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies that has been commercially available since 2006. It generates hundreds of millions to billions of short sequence reads at one time.

•  Helicos – Helicos's technology images the extension of individual DNA molecules using a defined primer and individual fluorescently labeled nucleotides, which contain a "virtual terminator" that prevents incorporation of multiple nucleotides per cycle.

Julian Pierre 1, Jordan Taylor 2, Amit Upadhyay 3, Bhanu Rekepalli 3

Fig. 8) Coverage of each contig. BLAST points are blue, MUMmer points are red.

Using the coverage of each individual contig ID, the results for both BLAST and MUMmer were plotted. While BLAST hit more contigs, MUMmer hit more contigs with higher coverage.

Using the data gathered from both BLAST and MUMmer, the frequency of each coverage percentage was plotted for the contigs. From Fig. 6, it can be inferred that MUMmer hit the contigs more accurately, in the sense that more of its hits have high coverage.
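A frequency plot of this kind can be built by binning the per-contig coverage percentages. The following is a minimal sketch, assuming 10% bins; the function name and the input values are made up for illustration and are not the poster's data.

```python
from collections import Counter

def coverage_histogram(percentages, bin_width=10):
    """Count how many sequences fall into each coverage bin.

    Keys of the result are the lower edge of each bin (0, 10, ..., 90).
    """
    bins = Counter()
    for pct in percentages:
        # Put exactly 100% into the top bin instead of a bin of its own.
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        bins[lower] += 1
    return dict(sorted(bins.items()))

# Hypothetical coverage percentages, for illustration only.
blast_pct = [12.0, 55.5, 58.0, 100.0]
mummer_pct = [91.0, 97.5, 100.0]

blast_hist = coverage_histogram(blast_pct)
mummer_hist = coverage_histogram(mummer_pct)
```

Plotting the two dictionaries side by side gives one bar series per tool, as in the frequency figures.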

Fig 4) from main.g2.bx.psu.edu

Once the results were obtained with both the BLAST and MUMmer search tools, we wrote a program to determine which alignment tool had the most hits per contig. The database file contains 160,749 contigs and the query file contains 552,305 nodes. BLAST returned a total of 123,070 hits and MUMmer returned a total of 121,829 hits. From the results, MUMmer hit more accurately than BLAST, while BLAST hit more contigs than MUMmer.
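Counting hits per contig from a parsed results table could look like the sketch below. It assumes a tab-separated table with the query (node) ID in the first column and the subject (contig) ID in the second, as in BLAST+ tabular output; the function name and the three example rows are hypothetical, not the program or data used here.

```python
import csv
import io
from collections import Counter

def hits_per_subject(handle, subject_column=1):
    """Count hits for each subject ID in a tab-separated hit table."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        counts[row[subject_column]] += 1
    return counts

# Hypothetical hit table, for illustration only.
example_table = io.StringIO(
    "node_1\tcontig_A\t98.5\n"
    "node_2\tcontig_A\t91.0\n"
    "node_3\tcontig_B\t100.0\n"
)

counts = hits_per_subject(example_table)
distinct_contigs_hit = len(counts)
```

The total number of hits and the number of distinct contigs hit are different quantities, which is why a tool can return fewer hits overall yet still cover individual contigs better.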

In next-generation sequencing, data analysis is one of the most expensive steps. While the cost of genome sequencing is falling, the cost of analyzing the data remains high. In the future, the “$1,000 genome will come with a $20,000 analysis price tag.”

The same process was carried out for the nodes. From Fig. 7, it can be inferred that BLAST hit the nodes more accurately. However, there are more BLAST results with lower coverage.

The future of next generation sequencing spans a variety of areas, such as personalized medicine, biofuels, climate change, and other life science fields.

•  Personalized medicine is a medical model that proposes customizing medical decisions to the individual patient.

•  Biofuels are a source of alternative energy. Microalgal biofuels use algae to synthesize the fuel. To optimize the process, an understanding of the gene-function relationships of algae would prove helpful.

•  Climate change research is the active study of theoretical climate models that use past climate data to make future projections.

In conclusion, we hope to apply the knowledge we have gained to fields such as these.

The same process was carried out for the nodes. While BLAST hit more nodes, more of the nodes hit by BLAST have lower coverage.

1 Texas Southern University, 2 Austin Peay State University, 3 University of Tennessee

Next generation sequencing uses a wide array of tools to obtain results based on the genome sequence. Widely used tools include BLAST, HMMER, and MUMmer.

•  BLAST (Basic Local Alignment Search Tool) is a sequence similarity search tool developed at the NIH (National Institutes of Health). It is used to find similar regions in different sequences and then compare their similarities.

•  MUMmer (named for maximal unique matches) is a system for rapidly aligning entire genomes. It can also align incomplete genomes and can easily handle thousands of contigs from a shotgun sequencing project.

•  HMMER (Hidden Markov Modeler) is used for searching sequence databases for homologs of protein sequences and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

Genome Assembly

Sequence analysis refers to the process of subjecting a DNA, RNA, or peptide sequence to a wide range of analytical methods to:

•  Compare sequences to find similarities and infer whether they are homologous

•  Identify features of the sequence such as gene structure, distribution, introns and exons, and regulation of gene expression

•  Identify sequence differences and variations such as mutations
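As a small illustration of comparing sequences for similarity, the sketch below computes percent identity between two sequences that have already been aligned. It assumes gaps are written as "-" and that columns where both sequences have a gap are ignored; the function name and the two example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned columns in which both sequences agree.

    Columns where both sequences have a gap are skipped; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Two made-up aligned sequences: 7 of 9 columns agree.
identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
```

High identity alone does not prove homology, but it is a common first signal when inferring it.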

Fig. 1) This figure shows three different Next Generation Sequencing methods. [2]

Fig. 2) Taken from A Hitchhiker’s Guide to Next-Generation Sequencing, by Gabe Rudy

Fig. 3) Taken from bio.davidson.edu/courses. Shows alignment results for yeast.

Fig 5) from jcvi.org shows the mapping of chr6 of a Human Genome

Julian Pierre – julz_pierre@yahoo.com

Jordan Taylor – jtaylor74@my.apsu.edu

Amit Upadhyay – aupadhy1@utk.edu

Bhanu Rekepalli – brekapal@utk.edu

•  http://www.roche.com/research_and_development/r_d_overview/r_d_sites.htm?id=18

•  http://www.pnas.org/content/99/6/3712/F1.expansion.html

•  http://www.yerkes.emory.edu/nhp_genomics_core/Services/Sequencing.html

•  http://www.illumina.com/technology/solexa_technology.ilmn

•  http://blast.ncbi.nlm.nih.gov/Blast.cgi

•  https://main.g2.bx.psu.edu/u/dan/p/fastq

•  http://ori.dhhs.gov/education/products/n_illinois_u/datamanagement/datopic.htmll

•  http://www.jcvi.org/medicago/include/images/chr6.BamHI.maps.jpg

•  Gabe Rudy, (2010) A Hitchhiker’s Guide to Next-Generation Sequencing, :1-9, Golden Helix

•  [1] John D. McPherson, (2009) Next-Generation Gap, 6:1-4, Nature Methods Supplement

•  [2] Michael L. Metzker, (2010) Sequencing Technologies – the next generation, 11:1-5, Nature Reviews Genetics

•  Md. Fakruddin, Khanjada Shahnewaj Bin Mannan, (2012) Next Generation sequencing technologies – Principles and prospects, 6:1-9, Research and Reviews in Biosciences

•  Misra N., Panda P. K., Parida B. K., Mishra B. K., (2012) Phylogenomic Study of Lipid Genes Involved in Microalgal Biofuel Production – Candidate Gene Mining and Metabolic Pathway Analyses, Evolutionary Bioinformatics 8:545-564, doi: 10.4137/EBO.S10159

Galaxy is an open, web-based platform for data-intensive biomedical research. It can be used on its free public server, where you can perform, reproduce, and share complete analyses.

An example of how Galaxy presents its data is shown in Fig 5.

Two FASTA files related to the same nucleotide sequence were given as input to both BLAST and MUMmer, and the results were parsed into tables. Then the coverage of every contig and node that was hit was computed for both programs.
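One way to compute such a coverage figure from a parsed hit table is sketched below. It assumes coverage means the percentage of a sequence's bases that fall inside at least one hit, with hit coordinates 1-based and inclusive; the function names, the coordinates, and the sequence lengths are hypothetical and may differ from the definition used for the figures.

```python
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def percent_covered(hits, lengths):
    """Return {seq_id: percent of bases covered by at least one hit}.

    hits: iterable of (seq_id, start, end) rows from the parsed table.
    lengths: dict mapping seq_id to sequence length.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in hits:
        # Reverse-strand hits may be reported with start > end.
        by_seq[seq_id].append((min(start, end), max(start, end)))
    coverage = {}
    for seq_id, intervals in by_seq.items():
        covered = sum(e - s + 1 for s, e in merge_intervals(intervals))
        coverage[seq_id] = 100.0 * covered / lengths[seq_id]
    return coverage

# Hypothetical hits and lengths, for illustration only.
example_hits = [("contig_1", 1, 50), ("contig_1", 40, 80), ("contig_2", 30, 11)]
example_lengths = {"contig_1": 100, "contig_2": 40}
coverage = percent_covered(example_hits, example_lengths)
```

Merging the intervals first keeps overlapping hits from being counted twice, so the result never exceeds 100% when the coordinates lie within the sequence.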
