Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm
Matko Glunčić, Vladimir Paar +1 more
TLDR
This work presents several case studies of GRM use and the use of a complete K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram.
Abstract:
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of a complete K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences, and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
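The idea of reading repeats off as peaks can be illustrated with a minimal sketch. It is not the GRM implementation: it assumes, for illustration only, that each K-string occurring in the sequence is indexed by position and that the distances between its consecutive occurrences are accumulated into a histogram over period length. The function names `kstring_distance_spectrum` and `dominant_period` and the default `k=8` are hypothetical choices, not taken from the paper.

```python
from collections import Counter, defaultdict


def kstring_distance_spectrum(seq, k=8, max_period=None):
    """Histogram of distances between consecutive occurrences of each K-string.

    Illustrative simplification: a repeat with period p contributes many
    counts at distance p, so it shows up as a peak in the histogram.
    """
    # Index every K-string of the sequence by its start positions.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Accumulate distances between neighbouring occurrences of the same K-string.
    spectrum = Counter()
    for occurrences in positions.values():
        for a, b in zip(occurrences, occurrences[1:]):
            d = b - a
            if max_period is None or d <= max_period:
                spectrum[d] += 1
    return spectrum


def dominant_period(spectrum):
    """Return the distance with the highest count, or None if the histogram is empty."""
    return max(spectrum, key=spectrum.get) if spectrum else None


if __name__ == "__main__":
    unit = "ACGTTGCAAGCT"           # hypothetical 12-bp repeat unit
    tandem = unit * 20              # perfect tandem array
    spectrum = kstring_distance_spectrum(tandem, k=8)
    print(dominant_period(spectrum))
```

In this simplified picture a single substitution only disturbs the K-strings that overlap the mutated base, so the peak at the repeat period survives. This is consistent with the robustness described in the abstract, but it is not a demonstration of the actual GRM procedure.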