scispace - formally typeset

Showing papers on "Hybrid genome assembly published in 1997"


01 Jan 1997
TL;DR: The thesis introduces a heuristic to speed up fragment assembly and implements it using a suffix array, which improves the speed of overlap detection by up to 1,000 times while maintaining high accuracy; it also shows that data structures are powerful in many pattern matching applications.
Abstract: This thesis is concerned with computational approaches to genome analysis. We discuss three biological applications (genomic rearrangements, gene recognition, and genome sequencing), all of whose practical solutions involve interesting algorithmic problems. In studying genomic rearrangements, we seek to reconstruct the evolutionary history of the genome. We study the distance between genomes under fixed-length inversions and give a complete theoretical characterization for both linear and circular genomes. We also prove upper and lower bounds on the minimum distance. Pattern recognition is central to many gene recognition systems. We apply linear discriminant analysis in a program called Pombe to identify protein-coding regions in the Schizosaccharomyces pombe genome. Under cross-validation, the predicted gene structures reach a correlation coefficient of 97.2% at the nucleotide level. In a large-scale genome sequencing project, we show that data structures are powerful in many pattern matching applications. We introduce a heuristic to speed up fragment assembly and implement it using a data structure called a suffix array, which improves the speed of overlap detection by up to 1,000 times while maintaining high accuracy. Finally, we report recent progress on this sequencing project and on the assembly program STROLL. Compared with other widely used assemblers, STROLL is significantly faster and handles repeat regions more reliably. In the last chapter, we point to open problems for future research that are of great interest to both computer scientists and biologists.

9 citations
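The abstract does not spell out the thesis's heuristic or how STROLL uses the suffix array, so the following is only a generic sketch of suffix-array-based overlap detection, not the author's method. It assumes error-free reads on a single strand and uses a simple seed-and-verify scheme: every suffix of every read is sorted, each read's first `min_overlap` bases are binary-searched in that sorted list, and each candidate is checked to confirm that the whole suffix of one read equals a prefix of the other. The names `build_suffix_array`, `find_overlaps`, and `min_overlap` are hypothetical, and the naive construction is far too slow for real data.

```python
from bisect import bisect_left


def build_suffix_array(reads):
    """Sort every suffix of every read; return parallel lists of suffixes and (read_id, offset)."""
    # Naive construction (O(n^2 log n) in total read length): fine for a sketch,
    # far too slow for a real sequencing project.
    entries = sorted(
        (read[i:], rid, i)
        for rid, read in enumerate(reads)
        for i in range(len(read))
    )
    suffixes = [s for s, _, _ in entries]
    positions = [(rid, i) for _, rid, i in entries]
    return suffixes, positions


def find_overlaps(reads, min_overlap):
    """Return (a, b, length) triples where a suffix of reads[a] equals a prefix of reads[b].

    Only the longest such overlap is reported for each ordered pair (a, b).
    """
    suffixes, positions = build_suffix_array(reads)
    overlaps = []
    for b, read_b in enumerate(reads):
        if len(read_b) < min_overlap:
            continue
        seed = read_b[:min_overlap]
        longest = {}
        # Suffixes beginning with `seed` form one contiguous block of the sorted list,
        # so a binary search finds the block start and a linear scan walks it.
        k = bisect_left(suffixes, seed)
        while k < len(suffixes) and suffixes[k].startswith(seed):
            a, _offset = positions[k]
            suffix = suffixes[k]
            # Verify: the entire suffix of read a must be a prefix of read b.
            if a != b and read_b.startswith(suffix):
                longest[a] = max(longest.get(a, 0), len(suffix))
            k += 1
        overlaps.extend((a, b, n) for a, n in sorted(longest.items()))
    return overlaps


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(find_overlaps(reads, 3))  # [(0, 1, 3), (1, 2, 3)]
```

The sorted order is what saves work: instead of aligning every pair of reads, each read needs one binary search plus a scan over the suffixes sharing its seed. An overlap whose length equals the whole of read `a` (offset 0) means `a` is contained as a prefix of `b`; a real assembler would typically handle such containments, reverse complements, and sequencing errors separately.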


Patent
23 Sep 1997
TL;DR: This patent describes an efficient method for sequencing large fragments of DNA: a subclone path through the fragment is first identified, and the subclones that define this path are then sequenced using transposon-mediated direct sequencing to the extent needed to obtain the complete sequence of the fragment.
Abstract: An efficient method for sequencing large fragments of DNA is described. A subclone path through the fragment is first identified; the collection of subclones that define this path is then sequenced using transposon-mediated direct sequencing techniques to an extent sufficient to provide the complete sequence of the fragment.

1 citation
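The patent abstract does not say how the subclone path is identified. As one illustration only, assuming each subclone's start and end coordinates on the fragment are already known (for example from restriction mapping), a minimal tiling path can be chosen with the classic greedy minimum interval cover: repeatedly take the eligible subclone that reaches furthest to the right. This is a swapped-in standard technique, not the patented procedure, and `subclone_path` and `min_overlap` are hypothetical names.

```python
def subclone_path(fragment_length, subclones, min_overlap=0):
    """Greedy minimum tiling path over a fragment spanning [0, fragment_length).

    subclones: iterable of (name, start, end) with half-open coordinates on the fragment.
    Consecutive subclones in the path must overlap by at least `min_overlap` bases.
    Returns the chosen names in left-to-right order, or None if some gap cannot be bridged.
    """
    ordered = sorted(subclones, key=lambda clone: clone[1])
    path = []
    covered = 0  # [0, covered) is already spanned by the path
    i = 0
    while covered < fragment_length:
        # The first subclone must start at 0; later ones must start early enough to overlap.
        limit = 0 if not path else covered - min_overlap
        best = None
        # Among eligible subclones, keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # no eligible subclone extends the coverage: a gap
        path.append(best[0])
        covered = best[2]
    return path


if __name__ == "__main__":
    clones = [("A", 0, 4), ("B", 2, 7), ("C", 3, 6), ("E", 5, 9), ("D", 6, 10)]
    print(subclone_path(10, clones, min_overlap=1))  # ['A', 'B', 'D']
```

Picking the furthest-reaching eligible subclone at each step yields the fewest subclones in the path, which would keep the amount of downstream transposon-mediated sequencing small under these assumptions.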