Open Access · Journal Article

Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

Matko Glunčić, +1 more
01 Jan 2013, Vol. 41, Iss. 1
TLDR
This work presents several case studies of GRM use and describes how the complete set of a K-string ensemble enables direct mapping of a symbolic DNA sequence into the frequency domain, with repeats identified as peaks in the GRM diagram.
Abstract
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the period-3 pattern in exons, implementation of a ‘magnifying glass’ effect, identification of complex HOR patterns, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
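
The idea that repeats show up as peaks over repeat length can be illustrated with a minimal Python sketch. This is not the GRM software or the authors' exact procedure; it assumes a simplified reading of the abstract in which a repeat of length L makes each K-string recur at distance L, so that a histogram of recurrence distances, accumulated over all K-strings present in the sequence, peaks at L. The function names `repeat_length_spectrum` and `dominant_lengths`, the default `k = 8`, and the 17-bp test unit are hypothetical choices made for illustration only.

```python
from collections import Counter


def repeat_length_spectrum(sequence: str, k: int = 8) -> Counter:
    """Histogram of distances between consecutive occurrences of each k-string.

    Simplified illustration (an assumption, not the published GRM code):
    every k-string acts as a marker, and the distance to its previous
    occurrence is counted as one vote for that repeat length.
    """
    last_seen = {}        # k-string -> start position of its latest occurrence
    spectrum = Counter()  # distance (candidate repeat length) -> vote count
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


def dominant_lengths(spectrum: Counter, n: int = 3) -> list:
    """Return the n distances with the highest counts (the 'peaks')."""
    return [length for length, _ in spectrum.most_common(n)]


# Toy input: a hypothetical 17-bp unit repeated in tandem 20 times.
unit = "ACGTTGCAAGCTTAGGC"
tandem = unit * 20
spectrum = repeat_length_spectrum(tandem, k=8)
print(dominant_lengths(spectrum, n=1))
```

In this toy case all votes fall on the unit length, because no 8-string occurs twice within one copy of the unit. On real sequences, substitutions and indels would disrupt only the K-strings that overlap them, which is one plausible reason a K-string ensemble tolerates mutated repeats; a smaller `k` trades specificity for tolerance.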
