
Showing papers on "Sequence assembly" published in 1997


Journal ArticleDOI
TL;DR: A new computer program, PolyPhred, is described that automatically detects heterozygous single nucleotide substitutions in fluorescence-based sequencing of PCR products; together with Phred, Phrap and Consed, it forms a high-throughput system for detecting DNA polymorphisms and mutations by large-scale fluorescence-based resequencing.
Abstract: Fluorescence-based sequencing is playing an increasingly important role in efforts to identify DNA polymorphisms and mutations of biological and medical interest. The application of this technology in generating the reference sequence of simple and complex genomes is also driving the development of new computer programs to automate base calling (Phred), sequence assembly (Phrap) and sequence assembly editing (Consed) in high throughput settings. In this report we describe a new computer program known as PolyPhred that automatically detects the presence of heterozygous single nucleotide substitutions by fluorescence-based sequencing of PCR products. Its operations are integrated with the use of the Phred, Phrap and Consed programs and together these tools generate a high throughput system for detecting DNA polymorphisms and mutations by large scale fluorescence-based resequencing. Analysis of sequences containing known DNA variants demonstrates that the accuracy of PolyPhred with single pass data is >99% when the sequences are generated with fluorescent dye-labeled primers and approximately 90% for those prepared with dye-labeled terminators.

1,009 citations


Journal ArticleDOI
TL;DR: Methods for whole genome sequencing and analysis are reviewed and how this information can be exploited to better understand microbial physiology and evolution are examined.
Abstract: The introduction of methods for automated DNA sequence analysis nearly a decade ago, together with more recent advances in the field of bioinformatics, have revolutionized biology and medicine and have ushered in a new era of genomic science, the study of genes and genomes. These new technologies have had an impact on many areas of research, including the association between genes and disease, in DNA-based diagnostics, and in the sequencing of genomes from human and other model organisms. The demonstration in 1995, that automated DNA sequencing methods could be used to decipher the entire genome sequence of a free-living organism, Haemophilus influenzae, was a milestone in both the genomics and microbial fields [1]. Since the first report of the complete sequence of H. influenzae, these methodologies have been adopted by laboratories around the world. The complete genomic sequence of five eubacterial species [1–5], one archaea [6], and the eukaryote, Saccharomyces cerevisiae [7], have been reported in the last 18 months. At the beginning of 1997 more than a dozen microbial genome projects are at or near completion, with many others in progress. It is likely that in the next few years we will see the complete sequence of perhaps as many as 30–40 microbial genomes. In this article, we will review methods for whole genome sequencing and analysis and examine how this information can be exploited to better understand microbial physiology and evolution.

86 citations


Journal ArticleDOI
TL;DR: A strategy is developed to efficiently and accurately sequence repetitive DNA in the nematode Caenorhabditis elegans using integrated artificial transposons and automated fluorescent sequencing, and a website database is established to track and cross-reference transposon-like repetitive elements.
Abstract: Repetitive DNA is a significant component of eukaryotic genomes. We have developed a strategy to efficiently and accurately sequence repetitive DNA in the nematode Caenorhabditis elegans using integrated artificial transposons and automated fluorescent sequencing. Mapping and assembly tools represent important components of this strategy and facilitate sequence assembly in complex regions. We have applied the strategy to several cosmid assembly gaps resulting from repetitive DNA and have accurately recovered the sequences of these regions. Analysis of these regions revealed six novel transposon-like repetitive elements, IR-1, IR-2, IR-3, IR-4, IR-5, and TR-1. Each of these elements represents a middle-repetitive DNA family in C. elegans containing at least 3-140 copies per genome. Copies of IR-1, IR-2, IR-4, and IR-5 are located on all (or most) of the six nematode chromosomes, whereas IR-3 is predominantly located on chromosome X. These elements are almost exclusively interspersed between predicted genes or within the predicted introns of these genes, with the exception of a single IR-5 element, which is located within a predicted exon. IR-1, IR-2, and IR-3 are flanked by short sequence duplications resembling the target site duplications of transposons. We have established a website database (http://www.welch.jhu.edu/~devine/RepDNAdb.html) to track and cross-reference these transposon-like repetitive elements that contains detailed information on individual element copies and provides links to appropriate GenBank records. This set of tools may be used to sequence, track, and study repetitive DNA in model organisms and humans.

50 citations


Proceedings ArticleDOI
19 Jan 1997
TL;DR: Algorithms are described for computing optimal alignments under several definitions of DNA-protein alignment, sufficient conditions for the equivalence of certain definitions are verified, and techniques for efficient implementation are presented along with experience from a new release of the FASTA suite of database-searching programs.
Abstract: We develop several algorithms for the problem of aligning a DNA sequence with a protein sequence. Our methods account for frameshift errors, but not for introns in the DNA sequence. Thus, they are particularly appropriate for comparing a cDNA sequence that suffers from sequencing errors with an amino acid sequence or a protein sequence database. We describe algorithms for computing optimal alignments for several definitions of DNA-protein alignment, verify sufficient conditions for equivalence of certain definitions, describe techniques for efficient implementation, and discuss experience with these ideas in a new release of the FASTA suite of database-searching programs.
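
To make the idea of frameshift-tolerant DNA-protein alignment concrete, here is a minimal dynamic-programming sketch. It is not the paper's algorithm or the FASTA implementation: the function name `align_score`, the set of allowed moves, and all scoring values (match, mismatch, gap, frameshift) are assumptions chosen for illustration. It scores a global alignment in which a residue may pair with a full codon, a two-nucleotide codon (a base was lost), or a codon preceded by one skipped nucleotide (a base was inserted). Introns are not modeled.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def align_score(dna, protein, match=2, mismatch=-1, gap=-2, frameshift=-3):
    """Best global score for aligning dna to protein, allowing frameshifts.

    All scoring parameters are illustrative assumptions.
    """
    dna, protein = dna.upper(), protein.upper()
    n, m = len(dna), len(protein)
    neg_inf = float("-inf")
    # S[i][j]: best score for the first i nucleotides vs the first j residues.
    S = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i >= 3 and j >= 1:
                # Full codon translated and compared with one residue.
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                sub = match if aa == protein[j - 1] else mismatch
                best = max(best, S[i - 3][j - 1] + sub)
            if i >= 2 and j >= 1:
                # Missing base: two nucleotides stand in for a codon.
                best = max(best, S[i - 2][j - 1] + frameshift)
            if i >= 1:
                # Extra base: skip a single nucleotide.
                best = max(best, S[i - 1][j] + frameshift)
            if i >= 3:
                # Codon aligned to no residue.
                best = max(best, S[i - 3][j] + gap)
            if j >= 1:
                # Residue aligned to no codon.
                best = max(best, S[i][j - 1] + gap)
            S[i][j] = best
    return S[n][m]
```

For instance, `align_score("ATGGGCCAAA", "MAK")` can absorb the extra G as a single skipped nucleotide instead of forcing mismatches across the rest of the read, which is the behavior that makes this kind of alignment useful for error-prone cDNA sequence.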

41 citations


Book
15 Jan 1997
TL;DR: This book covers DNA sequencing methods, including non-radioactive methods, strategies for new sequence determination, and sequence alignment and database searches.
Abstract: What is DNA Sequencing? Chemical degradation (Maxam and Gilbert) method. Chain termination (Sanger dideoxy) method. Instrumentation and reagents. Template preparation. Gel electrophoresis. Non-radioactive methods. Troubleshooting. Confirmatory sequencing. Sequencing PCR products. Strategies for new sequence determination. Introduction to Bioinformatics and the Internet. Sequence databases. Sequence alignment and database searches. Sequencing projects and Contig analysis. Protein function prediction. Protein structure prediction. Appendices.

40 citations


Journal ArticleDOI
TL;DR: Fine tuning of a 10K size test problem leads to a considerably improved solution to a 35K problem of sequence assembly that is of significant biological interest.
Abstract: Experimental design and response surface methodology are applied to tuning the parameters of an optimization program employing genetic algorithms. Attention is directed to the combinatorially challenging DNA sequence assembly problem. Fine tuning of a 10K size test problem leads to a considerably improved solution to a 35K problem of sequence assembly that is of significant biological interest.

33 citations


Journal ArticleDOI
TL;DR: Algorithms are described for computing optimal alignments under several definitions of DNA-protein alignment, sufficient conditions for the equivalence of certain definitions are verified, and techniques for efficient implementation are presented along with experience from a new release of the FASTA suite of database-searching programs.
Abstract: We develop several algorithms for the problem of aligning a DNA sequence with a protein sequence. Our methods account for frameshift errors, but not for introns in the DNA sequence. Thus, they are particularly appropriate for comparing a cDNA sequence that suffers from sequencing errors with an amino acid sequence or a protein sequence database. We describe algorithms for computing optimal alignments for several definitions of DNA-protein alignment, verify sufficient conditions for equivalence of certain definitions, describe techniques for efficient implementation, and discuss experience with these ideas in a new release of the FASTA suite of database-searching programs.

27 citations


Journal ArticleDOI
TL;DR: The results show that the efficiency of the procedure is comparable to that of the shotgun sequencing of large fragments, and that the method compares favorably to alternative strategies, such as primer walking.
Abstract: This study describes an efficient method for the rapid sequencing of DNA fragments in the size range 1–5 kb. Individual fragments, here cDNA inserts, are purified by restriction enzyme digestion and gel purification, pooled and concatenated by ligation. The concatamers are sheared and cloned randomly into M13, followed by random sequencing. The sequences of the individual cDNA inserts are obtained at the assembly stage using restriction enzyme sites as ‘tags’ for the ends of each fragment. In this study the sequencing of two libraries containing 7 and 16 cDNA inserts with an average length of 2.5 kb is described. The results show that the efficiency of the procedure is comparable to that of the shotgun sequencing of large fragments, and that the method compares favorably to alternative strategies, such as primer walking.
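
The step of recovering individual inserts at the assembly stage can be pictured with a short sketch. It assumes, hypothetically, that every junction in the assembled concatamer carries one intact copy of a known restriction site (EcoRI, GAATTC, is used here only as an example) and that no insert contains that site internally. The function name `split_concatamer` and its parameters are illustrative and not taken from the study.

```python
import re


def split_concatamer(contig, site="GAATTC", min_len=1):
    """Split an assembled concatamer at each occurrence of a restriction site.

    The site acts as the 'tag' marking insert ends. Fragments shorter than
    min_len (for example, empty pieces from adjacent sites) are dropped.
    The default site is an assumed example, not the enzyme used in the study.
    """
    pieces = re.split(re.escape(site.upper()), contig.upper())
    return [piece for piece in pieces if len(piece) >= min_len]


# Toy concatamer made of three short "inserts" joined by the tag site.
inserts = split_concatamer("AAACGAATTCCCGGTGAATTCTTTAG")
```

In practice the split would be applied to an assembled consensus rather than to raw reads, so that sequencing errors inside a tag are less likely to hide a junction.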

26 citations


Journal ArticleDOI
TL;DR: DEXAS permits direct sequence determination from whole genomic DNA and thus eliminates the need for template amplification and preparation and can be applied to single as well as multi-copy genomic sequences, and can easily be automated.
Abstract: In order to supply a sufficient amount of template molecules for DNA sequence determination, cloning into plasmids and subsequent plasmid purification, or amplification via the PCR, are generally used. Here, we present a method, 'direct exponential amplification and sequencing' or 'DEXAS', that permits direct sequence determination from whole genomic DNA and thus eliminates the need for template amplification and preparation. It relies on the simultaneous amplification of a target sequence and the determination of its sequence using dideoxyterminators in a two-step cycling reaction. DEXAS can be applied to single as well as multi-copy genomic sequences, and can easily be automated.

13 citations


Journal ArticleDOI
15 Sep 1997-Gene
TL;DR: This is the first report of successful sequencing of long genomic fragments by the use of overlapping deletions, and the nucleotide sequences of two cosmid inserts from chromosome IV of Drosophila that were determined by this method are presented.

9 citations



Journal ArticleDOI
10 Oct 1997
TL;DR: A novel approach for efficiently reconstructing an original DNA sequence from erroneous copies is suggested and shown to be efficient and scalable.
Abstract: We suggest a novel approach for efficiently reconstructing an original DNA sequence from erroneous copies.
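
The abstract does not say how the reconstruction is done, so the following is only a baseline illustration of the problem, not the authors' approach. It assumes the erroneous copies all have the same length and differ from the original by substitutions only, and it takes a column-wise majority vote. The function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(copies):
    """Return the column-wise majority base across equal-length copies.

    Assumes substitution errors only (no insertions or deletions), which is
    a simplification made for this sketch.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")
    # zip(*copies) yields one tuple of bases per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


consensus = majority_consensus(["ACGTAC", "ACGTTC", "ACCTAC"])
```

Handling insertions and deletions would require aligning the copies first, which this sketch deliberately leaves out.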


Journal ArticleDOI
TL;DR: A chemical method is developed that yields a one-reaction one-lane sequence determination, thus allowing fast, multiple, simultaneous sequencing and direct sequence comparisons in the same electrophoretic lane.