Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats.

doi:10.1186/1471-2105-9-466

Journal Article•DOI•

Vladimir Paar¹, Nenad Pavin², Ivan Basar¹, Marija Rosandić¹, Matko Glunčić¹, Nils Paar¹•Institutions (2)

University of Zagreb¹, Max Planck Society²

03 Nov 2008-BMC Bioinformatics (BioMed Central)-Vol. 9, Iss: 1, pp 466-466

TL;DR: DFT provides a detection method for higher order periodicity that is robust with respect to monomer insertions and deletions, random sequence insertions, etc.

Abstract: Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
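
As a rough illustration of how a DFT power spectrum can expose both a monomer period and a higher order repeat (HOR) period, the sketch below sums the power spectra of the four 0/1 base-indicator sequences of a toy sequence. It is not the authors' pipeline: the function name `power_spectrum`, the 8-bp "monomers" and the 16-bp HOR unit are all hypothetical, and the DFT is a naive O(N²) loop chosen for readability.

```python
import cmath

def power_spectrum(seq, alphabet="ACGT"):
    """Sum over bases of |DFT of the 0/1 indicator sequence|^2, for k = 0..N-1."""
    n = len(seq)
    power = [0.0] * n
    for base in alphabet:
        # Positions where this base occurs (the 1s of its indicator sequence).
        positions = [pos for pos, s in enumerate(seq) if s == base]
        for k in range(n):
            coeff = sum(cmath.exp(-2j * cmath.pi * k * pos / n) for pos in positions)
            power[k] += abs(coeff) ** 2
    return power

# Hypothetical toy data: two 8-bp "monomers" that differ in one base
# are concatenated into a 16-bp HOR unit, which is then repeated.
monomer_a = "AAAACCGT"
monomer_b = "AAAACCGA"
hor_unit = monomer_a + monomer_b
seq = hor_unit * 6                      # N = 96
spec = power_spectrum(seq)

n = len(seq)
k_monomer = n // len(monomer_a)         # frequency index of the 8-bp period
k_hor = n // len(hor_unit)              # frequency index of the 16-bp period
print("power at monomer period:", round(spec[k_monomer], 3))
print("power at HOR period:    ", round(spec[k_hor], 3))
print("power off-harmonic:     ", round(spec[k_hor - 1], 3))
```

In this toy case the peak at the monomer period is the stronger one, the peak at the HOR period is weaker but clearly nonzero, and frequencies that are not harmonics of the HOR period carry no power beyond rounding error. Real alphoid sequences are far noisier, so peak detection there would need thresholds that this sketch does not attempt.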

Citations

Book•DOI•

Data Mining Techniques for the Life Sciences

Oliviero Carugo, Frank Eisenhaber

14 Dec 2009

TL;DR: "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.

Abstract: Whereas getting exact data about living systems and sophisticated experimental procedures have primarily absorbed the minds of researchers previously, the development of high-throughput technologies has caused the weight to increasingly shift to the problem of interpreting accumulated data in terms of biological function and biomolecular mechanisms. In "Data Mining Techniques for the Life Sciences", experts in the field contribute valuable information about the sources of information and the techniques used for "mining" new insights out of databases. Beginning with a section covering the concepts and structures of important groups of databases for biomolecular mechanism research, the book then continues with sections on formal methods for analyzing biomolecular data and reviews of concepts for analyzing biomolecular sequence data in context with other experimental results that can be mapped onto genomes. As a volume of the highly successful Methods in Molecular Biology series, this work provides the kind of detailed description and implementation advice that is crucial for getting optimal results. Authoritative and easy to reference, "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.

135 citations

Cites background or methods from "Hierarchical structure of cascade o..."

...More recently, new technologies of third and fourth generation sequencing [5] such as single cell molecule [6], nanopore-based [7] have been applied to whole-transcriptome analysis that opened a possibility for profiling rare or heterogeneous populations of cells....
[...]
...The frequency of stabilizing and destabilizing mutations in all single mutants [5] showed that most of the mutational experiments have been carried out with hydrophobic substitutions (replacement of one hydrophobic residue with another, e....
[...]
...The stability data for a set of 180 double mutants have been collected from ProTherm database [3, 5] and related them with sequence based features such as wild-type residue, mutant residue, and three neighboring residues on both directions of the mutant site....
[...]
...org/) [5], with the aim of coordinating and synchronizing the curation effort of all the participants and to offer a unified, freely available, consistently annotated and nonredundant molecular interaction dataset....
[...]
...The data come in three different formats: old-style PDB-format files, macromolecular Crystallographic Information File (mmCIF) format [5], and a XMLstyle format called PDBML/XML [6]....
[...]

Journal Article•DOI•

Understanding Long-range Correlations in DNA Sequences

Wentian Li¹, Thomas G. Marr¹, Kunihiko Kaneko²•Institutions (2)

Cold Spring Harbor Laboratory¹, University of Tokyo²

22 Mar 1994-arXiv: Chaotic Dynamics

TL;DR: A review of the literature on statistical long-range correlation in DNA sequences can be found in this paper, where the authors conclude that a mixture of many length scales (including some relatively long ones) is responsible for the observed 1/f-like spectral component.

Abstract: In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understand long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-range correlation in DNA sequences.
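
The abstract says only that the expansion-modification system combines point duplication and point mutation. A minimal sketch of one such process is given below, under the assumption (not stated in the abstract) that the alphabet is binary and that at every step each symbol either flips with probability `p_mutate` or is duplicated otherwise. The function name, parameters and seeding are hypothetical choices made for illustration.

```python
import random

def expansion_modification(steps, p_mutate, seed=0, start=(0,)):
    """Iterate a binary duplication/mutation rule for a given number of steps.

    Assumed rule (an illustration, not taken from the abstract):
      with probability p_mutate      s -> 1 - s   (point mutation)
      with probability 1 - p_mutate  s -> s s     (point duplication)
    """
    rng = random.Random(seed)
    seq = list(start)
    for _ in range(steps):
        nxt = []
        for s in seq:
            if rng.random() < p_mutate:
                nxt.append(1 - s)       # point mutation: flip the symbol
            else:
                nxt.extend((s, s))      # point duplication: copy the symbol
        seq = nxt
    return seq

only_duplication = expansion_modification(steps=10, p_mutate=0.0)
only_mutation = expansion_modification(steps=3, p_mutate=1.0)
mixed = expansion_modification(steps=12, p_mutate=0.1)
print(len(only_duplication), only_mutation, len(mixed))
```

With `p_mutate=0.0` the sequence simply doubles at each step, and with `p_mutate=1.0` it stays at length one and alternates between 0 and 1. Intermediate values give the mixed sequences whose spectra the review discusses; estimating those spectra is outside this sketch.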

130 citations

Measure representation and multifractal analysis of complete genomes

Zu-Guo Yu¹,², Vo Anh², Ka-Sing Lau³•Institutions (3)

Xiangtan University¹, Queensland University of Technology², The Chinese University of Hong Kong³

01 Aug 2001

TL;DR: Spectral analyses indicate that these measure representations, considered as time series, exhibit strong long-range correlation; the paper also discusses the multifractal property of the measure representation and the classification of bacteria.

Abstract: This paper introduces the notion of measure representation of DNA sequences. Spectral analysis and multifractal analysis are then performed on the measure representations of a large number of complete genomes. The main aim of this paper is to discuss the multifractal property of the measure representation and the classification of bacteria. From the measure representations and the values of the D_q spectra and related C_q curves, it is concluded that these complete genomes are not random sequences. In fact, spectral analyses performed indicate that these measure representations, considered as time series, exhibit strong long-range correlation. Here the long-range correlation is for the K-strings with dictionary ordering, and it is different from the base pair correlations introduced by other people. For substrings with length K=8, the D_q spectra of all organisms studied are multifractal-like and sufficiently smooth for the C_q curves to be meaningful. With the decreasing value of K, the multifractality lessens. The C_q curves of all bacteria resemble a classical phase transition at a critical point. But the "analogous" phase transitions of chromosomes of nonbacteria organisms are different. Apart from chromosome 1 of C. elegans, they exhibit the shape of double-peaked specific heat function. A classification of genomes of bacteria by assigning to each sequence a point in two-dimensional space (D_{-1}, D_1) and in three-dimensional space (D_{-1}, D_1, D_{-2}) was given. Bacteria that are close phylogenetically are almost close in the spaces (D_{-1}, D_1) and (D_{-1}, D_1, D_{-2}).
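
To make the idea of a measure over K-strings in dictionary order concrete, the sketch below counts overlapping K-strings and stores their relative frequencies in a list indexed by dictionary rank. The base order A < C < G < T, the function name `measure_representation` and the use of overlapping windows are assumptions made for illustration; the abstract itself does not spell them out.

```python
from collections import Counter

def measure_representation(seq, k, order="ACGT"):
    """Return 4**k relative frequencies, one per K-string in dictionary order.

    Entry i is the frequency of the K-string whose base-4 digits
    (under the assumed order A=0, C=1, G=2, T=3) spell the number i.
    """
    digit = {base: i for i, base in enumerate(order)}
    total = len(seq) - k + 1            # number of overlapping K-strings
    counts = Counter(seq[i:i + k] for i in range(total))
    measure = [0.0] * (4 ** k)
    for kstring, count in counts.items():
        index = 0
        for base in kstring:
            index = 4 * index + digit[base]
        measure[index] = count / total
    return measure

mu = measure_representation("ACGTACGT", 2)
print([round(m, 3) for m in mu])
```

Read from left to right, the returned list can be treated as a series over the 4^K subintervals of [0, 1), which is the kind of object on which spectral or multifractal analysis could then be run. Characters outside the assumed four-letter alphabet would raise a `KeyError` here and would need separate handling.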

102 citations

Journal Article•DOI•

Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

Matko Glunčić¹, Vladimir Paar¹•Institutions (1)

University of Zagreb¹

01 Jan 2013-Nucleic Acids Research

TL;DR: This work presents several case studies of the global repeat map (GRM) algorithm, which uses a complete K-string ensemble to map a symbolic DNA sequence directly into the frequency domain, so that repeats are identified as peaks in the GRM diagram.

Abstract: The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of a complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use, in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of Period 3 pattern in exons, implementation of the 'magnifying glass' effect, identification of complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes).

27 citations

Cites result from "Hierarchical structure of cascade o..."

...These GRM results are in accordance with the pattern of previous results obtained by using heuristic algorithms (96)....
[...]

Journal Article•DOI•

Coexistence of different base periodicities in prokaryotic genomes as related to DNA curvature, supercoiling, and transcription.

G.I. Kravatskaya¹, Y. V. Kravatsky¹, V. R. Chechetkin¹, Vladimir G. Tumanyan¹•Institutions (1)

Engelhardt Institute of Molecular Biology¹

01 Sep 2011-Genomics

TL;DR: The comparison with available experimental data indicates that promoters with the most pronounced periodicities may be related to the supercoiling-sensitive genes.

23 citations

Cites background from "Hierarchical structure of cascade o..."

...This sum is invariant with respect to complementary inversion of a sequence [34] and is more convenient for comparing the helical periodicities in the complete genome and in the promoter sequences (in the latter case, the promoters on two chains were always compiled as 5′–3′ sequences)....
[...]
...through sets of equidistant peaks [31, 34, 35]....
[...]

References

Journal Article•DOI•

Equivalence of two Fourier methods for biological sequences

Eivind Coward¹•Institutions (1)

Norwegian University of Science and Technology¹

15 Nov 1997-Journal of Mathematical Biology

TL;DR: Two methods for defining Fourier power spectra for DNA sequences or other biological sequences are compared; although their Fourier transforms differ, the power spectra of the two methods are shown to be essentially the same.

Abstract: Two methods for defining Fourier power spectra for DNA sequences or other biological sequences are compared. The first method uses indicator sequences for each letter. The second method by Silverman and Linsker assigns to each letter a vertex of a regular tetrahedron in space, and this can be generalized to any dimension. While giving different Fourier transforms, it is shown that the power spectra of the two methods are essentially the same. This is also true if one replaces the Fourier transform in both methods with another linear transform, such as the Walsh transform.
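
The claimed equivalence can be checked numerically on a small example. The sketch below computes the summed indicator-sequence power spectrum and the power spectrum of a tetrahedral encoding, then compares them at all nonzero frequencies. The vertex coordinates, the unit-length normalization and the sample sequence are assumptions made here; with unit-length vertices (pairwise dot product -1/3) the two spectra should differ by a constant factor of 4/3 for k ≥ 1, because the four indicator sequences sum to a constant whose DFT vanishes away from k = 0.

```python
import cmath
import math

def dft(values):
    """Naive O(N^2) discrete Fourier transform of a real sequence."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * pos / n)
                for pos, v in enumerate(values))
            for k in range(n)]

def indicator_power(seq, alphabet="ACGT"):
    """Method 1: sum of |DFT|^2 over the four 0/1 indicator sequences."""
    power = [0.0] * len(seq)
    for base in alphabet:
        coeffs = dft([1.0 if s == base else 0.0 for s in seq])
        for k, c in enumerate(coeffs):
            power[k] += abs(c) ** 2
    return power

# Assumed unit-length vertices of a regular tetrahedron (pairwise dot product -1/3).
R = 1 / math.sqrt(3)
VERTEX = {"A": (R, R, R), "C": (R, -R, -R), "G": (-R, R, -R), "T": (-R, -R, R)}

def tetrahedron_power(seq):
    """Method 2: sum of |DFT|^2 over the three coordinates of the vertex encoding."""
    power = [0.0] * len(seq)
    for axis in range(3):
        coeffs = dft([VERTEX[s][axis] for s in seq])
        for k, c in enumerate(coeffs):
            power[k] += abs(c) ** 2
    return power

seq = "ACGTTGCAAGGCTTACGATCCGTA"          # arbitrary sample sequence
p_ind = indicator_power(seq)
p_tet = tetrahedron_power(seq)
max_gap = max(abs(p_tet[k] - 4 / 3 * p_ind[k]) for k in range(1, len(seq)))
print("largest deviation from the 4/3 ratio:", max_gap)
```

The k = 0 term is excluded on purpose: it reflects base composition only, and the two encodings treat it differently. A different vertex scaling would change the constant factor but not the proportionality.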

47 citations

"Hierarchical structure of cascade o..." refers methods in this paper

...Different computational techniques have been used: Fourier spectral analysis [4-20], wavelet transform [21], DNA walk analysis [22-25], information theory measures [26-28], informational decomposition [29,30], quaternionic periodicity transform [31], exactly periodic subspace decomposition [32,33], portrait method [34], enhance algorithm for distance frequency distribution [35], etc....
[...]

Journal Article•DOI•

Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences

Jianbo Gao¹, Yan Qi, Yinhe Cao, Wen-wen Tung•Institutions (1)

University of Florida¹

30 Jun 2005-BioMed Research International

TL;DR: Describes a novel way to develop efficient codon indices by simultaneously characterizing the fractal and periodic features of a DNA sequence, and evaluates the new index on all 16 yeast chromosomes.

Abstract: Most codon indices used today are based on highly biased nonrandom usage of codons in coding regions. The background of a coding or noncoding DNA sequence, however, is fairly random, and can be characterized as a random fractal. When a gene-finding algorithm incorporates multiple sources of information about coding regions, it becomes more successful. It is thus highly desirable to develop new and efficient codon indices by simultaneously characterizing the fractal and periodic features of a DNA sequence. In this paper, we describe a novel way of achieving this goal. The efficiency of the new codon index is evaluated by studying all of the 16 yeast chromosomes. In particular, we show that the method automatically and correctly identifies which of the three reading frames is the one that contains a gene.

40 citations

Journal Article•DOI•

Organization, polymorphism, and molecular cytogenetics of chromosome-specific α-satellite DNA from the centromere of chromosome 2

Thomas Haaf¹, Huntington F. Willard¹•Institutions (1)

Stanford University¹

01 May 1992-Genomics

TL;DR: Describes the isolation and characterization of an alpha-satellite subset specific for human chromosome 2, which is organized as a series of diverged 680-bp tetramers revealed after digestion of genomic DNA with HaeIII, HindIII, HinfI, StuI, and XbaI.

39 citations

Journal Article•DOI•

Multifractal analysis of DNA sequences using a novel chaos-game representation

José M. Gutiérrez¹, Miguel A. Rodríguez², G. Abramson•Institutions (2)

University of Cantabria¹, Spanish National Research Council²

01 Nov 2001-Physica A-statistical Mechanics and Its Applications

TL;DR: No general statement can be made on the influence of coding and non-coding content on the correlation length of a given sequence, and the multifractal spectrum is shown to be more sensitive for detecting dependence structures within the DNA sequence than the averaged contribution given by redundancy.

Abstract: We present a generalization of the standard chaos-game representation method introduced by Jeffrey. To this aim, a DNA symbolic sequence is mapped onto a singular measure on the attractor of a particular IFS model, which is a perfect statistical representation of the sequence. A multifractal analysis of the resulting measure is introduced and an interpretation of singularities in terms of mutual information and redundancy (statistical dependence) among subsequence symbols within the DNA sequence is provided. The multifractal spectrum is also shown to be more sensitive for detecting dependence structures within the DNA sequence than the averaged contribution given by redundancy. This method presents several advantages with respect to other representations such as walks or interfaces, which may introduce spurious effects. In contrast with the results obtained by other standard methods, here we note that no general statement can be made on the influence of coding and non-coding content on the correlation length of a given sequence.
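
For orientation, the standard chaos-game representation that this abstract generalizes can be sketched in a few lines: starting from the centre of the unit square, each base moves the current point halfway toward that base's corner. The corner assignment below and the function name `cgr_points` are assumptions made for illustration, and the sketch does not attempt the IFS-based generalization or the multifractal analysis described in the abstract.

```python
# Assumed corner assignment for the four bases on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq, start=(0.5, 0.5)):
    """Return the chaos-game trajectory of a DNA sequence as a list of (x, y) points."""
    x, y = start
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

points = cgr_points("AGT")
print(points)
```

Because each step halves the distance to a corner, the last K bases determine which sub-square of side 2^-K the point falls in, so the density of points over the square encodes K-string frequencies.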

39 citations

Journal Article•DOI•

Distance, correlation and mutual information among portraits of organisms based on complete genomes

Zu-Guo Yu¹,², Po Jiang³•Institutions (3)

Academia Sinica¹, Xiangtan University², University of Michigan³

16 Jul 2001-Physics Letters A

TL;DR: In this paper, four parameters (distance, correlation coefficient, entropy and mutual information) are introduced to provide exact measures of the difference between a real genome and its corresponding white-noise genome.

38 citations

"Hierarchical structure of cascade o..." refers methods in this paper

...Different computational techniques have been used: Fourier spectral analysis [4-20], wavelet transform [21], DNA walk analysis [22-25], information theory measures [26-28], informational decomposition [29,30], quaternionic periodicity transform [31], exactly periodic subspace decomposition [32,33], portrait method [34], enhance algorithm for distance frequency distribution [35], etc....
[...]