Open Access Journal Article

Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

Matko Glunčić, +1 more
01 Jan 2013, Vol. 41, Iss. 1
TLDR
This work presents the GRM algorithm, which uses the complete set of a K-string ensemble to map a symbolic DNA sequence directly into the frequency domain so that repeats appear as peaks in the GRM diagram, and illustrates its use with several case studies.
Abstract
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. This efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
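The idea of reading repeats off as peaks can be illustrated with a small sketch. It is not the GRM implementation and makes no claim about how the published software works internally. As a simplifying assumption, it records the distance between consecutive occurrences of every K-string in the sequence and histograms those distances, so that a tandem repeat unit of length n shows up as a peak at n. The function names `kmer_spacing_spectrum` and `dominant_period`, and the default `k=8`, are hypothetical choices made for this illustration only.

```python
from collections import Counter, defaultdict


def kmer_spacing_spectrum(seq, k=8):
    """Histogram of distances between consecutive occurrences of each k-mer.

    Simplified stand-in for a repeat map (an assumption, not the GRM code):
    the key is a distance in bases, the value is how often it was observed.
    """
    # Collect the start positions of every k-mer present in the sequence.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Count the spacing between each pair of neighbouring occurrences.
    spectrum = Counter()
    for occurrences in positions.values():
        for left, right in zip(occurrences, occurrences[1:]):
            spectrum[right - left] += 1
    return spectrum


def dominant_period(seq, k=8):
    """Return the most frequent spacing, or None if no k-mer repeats."""
    spectrum = kmer_spacing_spectrum(seq, k)
    if not spectrum:
        return None
    # Highest count wins; ties are broken in favour of the shorter distance.
    return max(spectrum.items(), key=lambda item: (item[1], -item[0]))[0]


if __name__ == "__main__":
    unit = "ACGTTGCAAGCT"  # hypothetical 12-base repeat unit
    tandem = unit * 20
    print(dominant_period(tandem))
```

Run on the 20-copy tandem array above, the sketch reports a dominant spacing of 12, the length of the repeat unit. A single substitution destroys only the K-strings that overlap it, so the peak at 12 remains, which gives a rough intuition for why a K-string based approach can tolerate mutations.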

