Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm
Matko Glunčić,Vladimir Paar +1 more
TLDR
This work presents several case studies of GRM use and describes how a complete K-string ensemble enables direct mapping of a symbolic DNA sequence into the frequency domain, so that repeats are identified straightforwardly as peaks in the GRM diagram.
Abstract:
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of a complete K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes).
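The idea of reading repeat lengths off as peaks can be illustrated with a small sketch. This is not the GRM implementation and is not taken from the paper: it is a simplified stand-in that, for every k-letter substring of a sequence, records the distance to the previous occurrence of the same substring and histograms those distances. The function names, the choice of `k`, and the toy sequence are all assumptions made for illustration only.

```python
from collections import Counter


def kmer_distance_spectrum(seq, k=8, max_period=None):
    """Histogram of distances between consecutive occurrences of each k-mer.

    Simplified illustration only (hypothetical helper, not the GRM code):
    a tandem repeat with unit length p contributes many counts at distance p.
    """
    seq = seq.upper()
    last_seen = {}        # k-mer -> start index of its most recent occurrence
    spectrum = Counter()  # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        prev = last_seen.get(kmer)
        if prev is not None:
            distance = i - prev
            if max_period is None or distance <= max_period:
                spectrum[distance] += 1
        last_seen[kmer] = i
    return spectrum


def dominant_period(spectrum):
    """Return the distance with the highest count, or None if empty."""
    if not spectrum:
        return None
    return max(spectrum, key=spectrum.get)


if __name__ == "__main__":
    # Toy input: a 12-bp unit repeated in tandem (an assumption, not real data).
    unit = "ACGTTGCAAGCT"
    spectrum = kmer_distance_spectrum(unit * 20, k=8)
    print(dominant_period(spectrum))
```

On a perfect tandem array the histogram collapses onto the unit length; with substitutions or indels, some k-mers stop matching and the peak becomes lower but usually remains visible, which is the intuition behind the robustness claim above. How GRM itself weights and combines the K-string ensemble is described in the paper, not here.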
Citations
Journal ArticleDOI
TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads
TL;DR: A novel computational pipeline that circumvents the problem of difficult to assemble satellite DNA characterization by detecting satellite repeats directly from unassembled short reads by employing graph-based sequence clustering to identify groups of reads that represent repetitive elements.
BookDOI
Data Mining Techniques for the Life Sciences
Oliviero Carugo,Frank Eisenhaber +1 more
TL;DR: "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.
Journal ArticleDOI
Understanding Long-range Correlations in DNA Sequences
TL;DR: A review of the literature on statistical long-range correlation in DNA sequences can be found in this paper, where the authors conclude that a mixture of many length scales (including some relatively long ones) is responsible for the observed 1/f-like spectral component.
Journal ArticleDOI
Satellite DNA evolution: old ideas, new approaches.
TL;DR: Advances in computational tools and sequencing technologies now enable identification and quantification of satellite sequences genome-wide and how their applications are furthering knowledge of satellite evolution and function is described.
Journal ArticleDOI
The in vivo genetic program of murine primordial lung epithelial progenitors
Laertis Ikonomou,Michael J. Herriges,Sara L. Lewandowski,Robert Marsland,Carlos Villacorta-Martin,Ignacio S. Caballero,David B. Frank,Reeti M. Sanghrajka,Keri Dame,Maciej M. Kańduła,Julia Hicks-Berthet,Matthew L. Lawton,Constantina Christodoulou,Attila J. Fabian,Eric D. Kolaczyk,Xaralabos Varelas,Edward E. Morrisey,John M. Shannon,Pankaj Mehta,Darrell N. Kotton +26 more
TL;DR: Bulk RNA-sequencing is used to describe the unique genetic program of in vivo murine lung primordial progenitors and computationally identify signaling pathways that are involved in their cell-fate determination from pre-specified embryonic foregut.
References
Journal ArticleDOI
Human centromeric DNAs
TL;DR: An overview of currently identified human centromeres: their discoveries, molecular characterization, and organization with respect to other centromeric repetitive DNA families is presented.
Journal ArticleDOI
Mining microsatellites in eukaryotic genomes
TL;DR: This review presents recent developments in in silico mining of microsatellites to reveal various facets of the distribution and dynamics of microsatellites in eukaryotic genomes.
Journal ArticleDOI
Empirical comparison of ab initio repeat finding programs
TL;DR: Side-by-side evaluations of six of the most widely used ab initio repeat finding programs reveal profound differences in the utility with some identifying virtually their entire substrate as repetitive, others making reasonable estimates of repetition, and some missing almost all repeats.
Journal ArticleDOI
A measure of DNA periodicity
B.D. Silverman,R. Linsker +1 more
TL;DR: The transform is invariant to the labelling of the bases and can therefore be used as a measure of periodicity for segments of DNA with differing base content and can also be conveniently used to search for base periodicities within large DNA data bases.
Proceedings ArticleDOI
Genomic signal processing
TL;DR: It is shown that the variation of genomic data along nucleotide sequences can be visualized adequately as simple graphic lines for low and large scales, while for medium scales (thousands to tens of thousands of base pairs) the statistical descriptions have to be used.