Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

doi:10.1093/NAR/GKS721

Open Access Journal Article

Matko Glunčić, +1 more

- 01 Jan 2013 -

Nucleic Acids Research

- Vol. 41, Iss: 1

TLDR

This work presents several case studies of GRM use and describes how the complete set of a K-string ensemble enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram.

Abstract:

The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the period-3 pattern in exons, implementation of a ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes).
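To make the idea of "repeats as peaks" concrete, here is a minimal sketch in Python. It is not the authors' GRM implementation, and the abstract does not specify the mapping in detail. The sketch assumes, purely for illustration, that each K-string occurring in the sequence contributes the distances between its consecutive occurrences to a histogram, so that a dominant repeat length appears as the tallest bar. The function names `repeat_length_histogram` and `top_peaks` and the default `k` are hypothetical.

```python
from collections import Counter


def repeat_length_histogram(sequence: str, k: int = 8) -> Counter:
    """Histogram of distances between consecutive occurrences of each K-string.

    Illustrative assumption only: a simplified stand-in for mapping a symbolic
    sequence to a length/frequency diagram, not the published GRM code.
    """
    last_seen = {}          # K-string -> position of its most recent occurrence
    histogram = Counter()   # distance (candidate repeat length) -> count
    for pos in range(len(sequence) - k + 1):
        kmer = sequence[pos:pos + k]
        if kmer in last_seen:
            histogram[pos - last_seen[kmer]] += 1
        last_seen[kmer] = pos
    return histogram


def top_peaks(histogram: Counter, n: int = 3) -> list:
    """Return the n most frequent distances, i.e. the tallest peaks."""
    return [length for length, _ in histogram.most_common(n)]


if __name__ == "__main__":
    # Toy tandem repeat: an 11-bp unit repeated 20 times, with one substitution.
    unit = "ACGTTGCAAGC"
    bases = list(unit * 20)
    bases[100] = "T"  # simulate a point mutation
    hist = repeat_length_histogram("".join(bases), k=4)
    print(top_peaks(hist, n=2))
```

In this toy case a single substitution only disturbs the few K-strings that overlap it, so the peak at the unit length still dominates. That is one simple way to see why a K-string ensemble can tolerate mutations, although the real algorithm may achieve robustness differently.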

Citations

Journal Article

TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads

Petr Novák, +5 more

- 07 Jul 2017 -

Nucleic Acids Research

TL;DR: A novel computational pipeline that circumvents the problem of difficult to assemble satellite DNA characterization by detecting satellite repeats directly from unassembled short reads by employing graph-based sequence clustering to identify groups of reads that represent repetitive elements.

Book

Data Mining Techniques for the Life Sciences

Oliviero Carugo, +1 more

TL;DR: "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.

Journal Article

Understanding Long-range Correlations in DNA Sequences

Wentian Li, +2 more

- 22 Mar 1994 -

arXiv: Chaotic Dynamics

TL;DR: A review of the literature on statistical long-range correlation in DNA sequences can be found in this paper, where the authors conclude that a mixture of many length scales (including some relatively long ones) is responsible for the observed 1/f-like spectral component.

Journal Article

Satellite DNA evolution: old ideas, new approaches.

Sarah Sander Lower, +3 more

- 23 Mar 2018 -

Current Opinion in Genetics & Developmen...

TL;DR: Advances in computational tools and sequencing technologies now enable genome-wide identification and quantification of satellite sequences; this review describes how their applications are furthering knowledge of satellite evolution and function.

Journal Article

The in vivo genetic program of murine primordial lung epithelial progenitors

Laertis Ikonomou, +26 more

- 31 Jan 2020 -

Nature Communications

TL;DR: Bulk RNA-sequencing is used to describe the unique genetic program of in vivo murine lung primordial progenitors and computationally identify signaling pathways that are involved in their cell-fate determination from pre-specified embryonic foregut.

References

Journal Article

Human centromeric DNAs

Charles Lee, +4 more

- 02 Jun 1997 -

Human Genetics

TL;DR: An overview of currently identified human centromeres: their discoveries, molecular characterization, and organization with respect to other centromeric repetitive DNA families is presented.

Journal Article

Mining microsatellites in eukaryotic genomes

Prakash C. Sharma, +2 more

- 01 Nov 2007 -

Trends in Biotechnology

TL;DR: This review presents recent developments in in silico mining of microsatellites to reveal various facets of the distribution and dynamics of microsatellites in eukaryotic genomes.

Journal Article

Empirical comparison of ab initio repeat finding programs

Surya Saha, +3 more

- 01 Apr 2008 -

Nucleic Acids Research

TL;DR: Side-by-side evaluations of six of the most widely used ab initio repeat finding programs reveal profound differences in the utility with some identifying virtually their entire substrate as repetitive, others making reasonable estimates of repetition, and some missing almost all repeats.

Journal Article

A measure of DNA periodicity

B.D. Silverman, +1 more

- 07 Feb 1986 -

Journal of Theoretical Biology

TL;DR: The transform is invariant to the labelling of the bases and can therefore be used as a measure of periodicity for segments of DNA with differing base content and can also be conveniently used to search for base periodicities within large DNA data bases.

Proceedings Article

Genomic signal processing

P.D. Cristea

TL;DR: It is shown that the variation of genomic data along nucleotide sequences can be visualized adequately as simple graphic lines for low and large scales, while for medium scales (thousands to tens of thousands of base pairs) the statistical descriptions have to be used.
