Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation
TLDR
The Spectral Repeat Finder program uses a discrete Fourier transformation to identify significant periodicities in a sequence, which sidesteps the difficulty that repeat units may be exact or imperfect, tandem or dispersed, and of unspecified length, and provides efficient and complete detection of repeats.
Abstract:
Motivation: Repetitive DNA sequences, besides having a variety of regulatory functions, are one of the principal causes of genomic instability. Understanding their origin and evolution is of fundamental importance for genome studies. The identification of repeats and their units helps in deducing the intra-genomic dynamics as an important feature of comparative genomics. A major difficulty in identification of repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length.
Results: The Spectral Repeat Finder program circumvents these problems by using a discrete Fourier transformation to identify significant periodicities present in a sequence. The specific regions of the sequence that contribute to a given periodicity are located through a sliding window analysis, and an exact search method is then used to find the repetitive units. The program provides efficient and complete detection of repeats, together with interactive and detailed visualization of the spectral analysis of the input sequence. We demonstrate the utility of our method with various examples that contain previously unannotated repeats. A Web server has been developed for convenient access to the automated program.
Availability: The Web server is available at http://www.imtech.res.in/raghava/srf and http://www2.imtech.res.in/raghava/srf
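The three stages described in the Results (spectral detection of a periodicity, sliding-window localization, and a search for the repeat unit) can be illustrated with a minimal sketch. This is not the SRF implementation. It assumes a four-channel indicator encoding of the bases (one 0/1 sequence per nucleotide), which the abstract does not specify, and it uses plain k-mer counting as a stand-in for the exact search method. All function names and the window and step parameters are hypothetical.

```python
import cmath
from collections import Counter

BASES = "ACGT"

def power_at_frequency(seq, freq):
    """Spectral power at `freq` (cycles per base), summed over the four
    base indicator sequences (an assumed encoding, not taken from SRF)."""
    total = 0.0
    for base in BASES:
        coeff = sum(cmath.exp(-2j * cmath.pi * freq * pos)
                    for pos, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total

def dominant_period(seq):
    """Period (in bases) of the strongest DFT component, skipping k = 0."""
    n = len(seq)
    best_k = max(range(1, n // 2 + 1),
                 key=lambda k: power_at_frequency(seq, k / n))
    return n / best_k

def window_powers(seq, period, window, step):
    """Power at the given period for each sliding window, as (start, power)."""
    return [(start, power_at_frequency(seq[start:start + window], 1.0 / period))
            for start in range(0, len(seq) - window + 1, step)]

def most_common_unit(region, period):
    """Most frequent substring of length `period` in the region
    (a simple stand-in for an exact repeat-unit search)."""
    kmers = Counter(region[i:i + period]
                    for i in range(len(region) - period + 1))
    return kmers.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy input: a period-5 tandem repeat flanked by homopolymer runs.
    seq = "A" * 20 + "ACGTT" * 6 + "A" * 20
    print(dominant_period("ACGTT" * 8))
    start, _ = max(window_powers(seq, period=5, window=20, step=5),
                   key=lambda sp: sp[1])
    print(start, most_common_unit(seq[start:start + 20], 5))
```

The direct summation costs O(N^2) per spectrum, which is fine for a toy sequence; a fast Fourier transform would be the natural replacement for longer inputs. Windows that lie entirely inside the repeat give the highest power at the period of interest, which is how the sliding window localizes the contributing region.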
Citations
Journal Article
Mining microsatellites in eukaryotic genomes
TL;DR: This review presents recent developments of in silico mining of microsatellites to reveal various facets of the distribution and dynamics of microsatellites in eukaryotic genomes.
Journal Article
Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs.
TL;DR: This review is intended to provide as comprehensive an overview as possible of the automated methods currently used to annotate and classify transposable elements (TEs) in sequenced genomes.
Journal Article
TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads
TL;DR: A novel computational pipeline circumvents the difficulty of characterizing hard-to-assemble satellite DNA by detecting satellite repeats directly from unassembled short reads, using graph-based sequence clustering to identify groups of reads that represent repetitive elements.
Book
Data Mining Techniques for the Life Sciences
Oliviero Carugo, Frank Eisenhaber, et al.
TL;DR: "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.
Journal Article
Understanding Long-range Correlations in DNA Sequences
TL;DR: A review of the literature on statistical long-range correlation in DNA sequences can be found in this paper, where the authors conclude that a mixture of many length scales (including some relatively long ones) is responsible for the observed 1/f-like spectral component.