scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Tandem repeats finder: a program to analyze DNA sequences

01 Jan 1999-Nucleic Acids Research (Oxford University Press)-Vol. 27, Iss: 2, pp 573-580
TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.
Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm’s speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.
Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations


Cites methods from "Tandem repeats finder: a program to..."

  • ...The human genome was run through RepeatMasker (Smit 1999; Jurka 2000) and Tandem Repeat Finder ( Benson 1999 ) before the alignments....

    [...]

  • ...The human genome was run through RepeatMasker (Smit 1999; Jurka 2000) and Tandem Repeat Finder (Benson 1999) before the alignments....

    [...]

Journal ArticleDOI
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations

Journal ArticleDOI
TL;DR: Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms that contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else.
Abstract: Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.

2,921 citations


Cites background from "Tandem repeats finder: a program to..."

  • ...They can virtually detect all possible arrays of tandem sequences and several independent programs were developed for this purpose, e.g. Tandem Repeat Finder (Benson, 1999); Satellites (Sagot and Myers, 1998); Etandem, Equicktandem (Rice et al., 2000); TROLL (Castelo et al., 2002)....

    [...]

  • ...Tandem Repeat Finder (Benson, 1999); Satellites (Sagot and Myers, 1998); Etandem, Equicktandem (Rice et al....

    [...]

Journal ArticleDOI
Shusei Sato, Satoshi Tabata, Hideki Hirakawa, Erika Asamizu  +320 moreInstitutions (51)
31 May 2012-Nature
TL;DR: A high-quality genome sequence of domesticated tomato is presented, a draft sequence of its closest wild relative, Solanum pimpinellifolium, is compared, and the two tomato genomes are compared to each other and to the potato genome.
Abstract: Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera1 and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium2, and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.

2,687 citations

Journal ArticleDOI
15 Jan 2015-Cell
TL;DR: The genetic findings provide evidence for immunoediting in tumors and uncover mechanisms of tumor-intrinsic resistance to cytolytic activity, suggesting immune-mediated elimination.

2,600 citations

References
More filters
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Abstract: We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.

12,432 citations

01 Jan 1950
TL;DR: A First Course in Probability (8th ed.) by S. Ross is a lively text that covers the basic ideas of probability theory including those needed in statistics.
Abstract: Office hours: MWF, immediately after class or early afternoon (time TBA). We will cover the mathematical foundations of probability theory. The basic terminology and concepts of probability theory include: random experiments, sample or outcome spaces (discrete and continuous case), events and their algebra, probability measures, conditional probability A First Course in Probability (8th ed.) by S. Ross. This is a lively text that covers the basic ideas of probability theory including those needed in statistics. Theoretical concepts are introduced via interesting concrete examples. In 394 I will begin my lectures with the basics of probability theory in Chapter 2. However, your first assignment is to review Chapter 1, which treats elementary counting methods. They are used in applications in Chapter 2. I expect to cover Chapters 2-5 plus portions of 6 and 7. You are encouraged to read ahead. In lectures I will not be able to cover every topic and example in Ross, and conversely, I may cover some topics/examples in lectures that are not treated in Ross. You will be responsible for all material in my lectures, assigned reading, and homework, including supplementary handouts if any.

10,221 citations


"Tandem repeats finder: a program to..." refers background in this paper

  • ...It can be shown ( 33 ) that 95% of the time, Wd,pI ranges between ±2.3 p Id . We set Δdmax = 2.3 p Id . For example if pI = 0.1 and d = 100, then Δdmax = 7....

    [...]

Journal ArticleDOI
26 Mar 1993-Cell
TL;DR: In this article, the authors used haplotype analysis of linkage disequilibrium to spotlight a small segment of 4p16.3 as the likely location of the defect, which is expanded and unstable on HD chromosomes.

7,224 citations