Journal Article

Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats.

03 Nov 2008-BMC Bioinformatics (BioMed Central)-Vol. 9, Iss: 1, pp 466-466
TL;DR: DFT provides a robust method for detecting higher order periodicity; the detection is stable with respect to monomer insertions and deletions, random sequence insertions, etc.
Abstract: Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve the prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
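
As a rough illustration of how a DFT power spectrum exposes periodicity in a DNA sequence, here is a minimal Python sketch. It is not the authors' implementation: the four binary indicator sequences (one per base), the summed spectrum, the function names `power_spectrum` and `peak_bins`, the `rel_threshold` parameter, and the toy sequence are all assumptions made for this example.

```python
import numpy as np

def power_spectrum(seq):
    """Summed DFT power spectrum of the four base indicator sequences.

    Assumption: each base is mapped to a 0/1 indicator sequence; the
    paper's own numerical mapping may differ.
    """
    seq = seq.upper()
    total = np.zeros(len(seq) // 2 + 1)
    for base in "ACGT":
        indicator = np.array([1.0 if ch == base else 0.0 for ch in seq])
        indicator -= indicator.mean()  # drop the constant (k = 0) component
        total += np.abs(np.fft.rfft(indicator)) ** 2
    return total

def peak_bins(seq, rel_threshold=0.5):
    """Frequency bins k >= 1 whose power reaches rel_threshold * maximum."""
    spectrum = power_spectrum(seq)
    cutoff = rel_threshold * spectrum[1:].max()
    return [k for k in range(1, len(spectrum)) if spectrum[k] >= cutoff]

if __name__ == "__main__":
    toy = "ACGTTGCA" * 32  # hypothetical exact tandem repeat, unit length 8
    for k in peak_bins(toy):
        print(f"bin {k}: period {len(toy) / k:g} bases")
```

A bin k in a sequence of length N corresponds to a period of N/k bases. For an exactly periodic sequence whose repeat length p divides N, power is confined to multiples of N/p, so the fundamental peak is accompanied by harmonics. Real alphoid sequences are only approximately periodic, so their peaks would be broadened and sit on a noisy background.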


Citations
01 Jan 2004
TL;DR: In this paper, an improved method, called the Alternative Spectral Rotation (ASR) measure, for predicting protein coding regions in rice DNA has been developed, and its accuracy is higher than that of the SR measure and the Spectral Content (SC) measure.
Abstract: An improved method, called the Alternative Spectral Rotation (ASR) measure, for predicting protein coding regions in rice DNA has been developed. The method is based on the Spectral Rotation (SR) measure proposed by Kotlar and Lavner, and its accuracy is higher than that of the SR measure and the Spectral Content (SC) measure proposed by Tiwari et al. To increase the identification accuracy, we chose three different coding characters, namely the asymmetric, purine, and stop-codon variables, as parameters, and a satisfactory result was obtained with Linear Discriminant Analysis (LDA).

1 citation

Journal Article
TL;DR: The application of spectral analysis and spectrograms using a novel numerical representation to identify and study alpha satellite higher order repeats in human chromosomes 7 and 17 is investigated.
Abstract: Detection of tandem repeats can be used for phylogenic studies and disease diagnosis. The numerical representation of genomic signals is very important, as many of the methods for detecting repeated sequences are part of the DSP field. These methods involve the application of a kind of transformation. Applying a transform technique requires mapping the symbolic domain into the numeric domain in such a way that no additional structure is placed on the symbolic sequence beyond that inherent to it. Here we investigate the application of spectral analysis and spectrograms using a novel numerical representation to identify and study alpha satellite higher order repeats in human chromosomes 7 and 17.

1 citation


Cites background or methods from "Hierarchical structure of cascade o..."

  • ...One way to do this was proposed in [4] [5] [14] as quartic mapping....

    [...]

  • ...In case of AC017075 high-order repeats were identified in the central domain (positions 31338 to 177434, total length 148147bp) while in the front domain of genomic sequence (31337 bp) and in the back domain (15843 bp), alpha satellite monomers were found [2] [14]....

    [...]

  • ...[14] Paar V, Pavin N, Basar I, Rosandic M, Gluncic M, Paar N, Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats, BMC Bioinformatics, 2008 Nov 3; 9(1):466....

    [...]

Book Chapter
TL;DR: Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences and examples of correlation of latent profile periodicity revealed in the CDSs with structural-functional properties in the proteins are given.
Abstract: Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from an analysis of the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of approximate tandem repeat. Examples are given of correlations between the latent profile periodicity revealed in CDSs and the structural-functional properties of the proteins.

1 citation

Journal Article
TL;DR: In this article, quasi-regular large-scale segmentation in genomic sequences of different ssRNA, ssDNA, and dsDNA viruses was studied by combining discrete direct and double Fourier transforms.
Abstract: The assembly and maturation of viruses with icosahedral capsids must be coordinated with icosahedral symmetry. The icosahedral symmetry also imposes restrictions on the cooperative specific interactions between genomic RNA/DNA and coat proteins, which should be reflected in quasi-regular segmentation of viral genomic sequences. Combining discrete direct and double Fourier transforms, we studied the quasi-regular large-scale segmentation in genomic sequences of different ssRNA, ssDNA, and dsDNA viruses. The particular representatives included satellite tobacco mosaic virus and the strains of satellite tobacco necrosis virus, STNV-C, STNV-1, STNV-2, Escherichia phages MS2, phiX174, alpha3, and HK97, and Simian virus 40. In all their genomes, we found significant quasi-regular segmentation of genomic sequences related to the virion assembly and the genome packaging within the icosahedral capsid. We also found good correspondence between our results and available cryo-electron microscopy data on capsid structures and genome packaging in these viruses. Fourier analysis of genomic sequences provides additional insight into mechanisms of hierarchical genome packaging and may be used to verify the concepts of 3-fold or 5-fold intermediates in virion assembly. The results of sequence analysis should be taken into account when choosing models and interpreting data. They may also be helpful for the development of antiviral drugs.

1 citation

Proceedings Article
08 Aug 2011
TL;DR: Results obtained by combining a customized dot plot analysis with a numerical representation to isolate position and length of DNA repeats from human alpha satellite DNA are presented.
Abstract: Detection of DNA repeats can be used for phylogenetic studies and disease diagnosis. A major difficulty in identification of DNA repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length. Numerical representation of genomic signals is very important as many of the methods for detecting repeated sequences are part of the digital signal processing field. This paper presents results obtained by combining a customized dot plot analysis with a numerical representation to isolate position and length of DNA repeats from human alpha satellite DNA.

Cites background from "Hierarchical structure of cascade o..."

  • ...RESULTS AND DISCUSSION Our case study was the 16mer high order repeat in AC136363 from human chromosome 17 (GenBank) which contains dispersed alphoid sequences, both higher-order and monomeric alpha-satellite [6]....

    [...]

References
Journal Article
12 Mar 1992-Nature
TL;DR: This work proposes a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which it refers to as a 'DNA walk', and uncovers a remarkably long-range power law correlation.
Abstract: DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
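
The 'DNA walk' mapping described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: the function names `dna_walk` and `fluctuation` are hypothetical, the sign convention (pyrimidine = step up, purine = step down) is an assumption, and the fluctuation measure is taken simply as the standard deviation of walk displacements over a window of length l.

```python
import numpy as np

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G).

    The sign convention is an assumption made for this sketch.
    """
    steps = []
    for ch in seq.upper():
        if ch in "CT":
            steps.append(1)
        elif ch in "AG":
            steps.append(-1)
        else:
            raise ValueError(f"unexpected symbol: {ch!r}")
    return np.cumsum(np.array(steps, dtype=int))

def fluctuation(walk, l):
    """Standard deviation of the displacement y(i + l) - y(i) over all i."""
    delta = walk[l:] - walk[:-l]
    return float(np.std(delta))

if __name__ == "__main__":
    walk = dna_walk("CTAGGCTTAACC")  # hypothetical toy sequence
    print(walk.tolist())
    print(fluctuation(walk, 2))
```

Long-range correlation would then be probed by examining how `fluctuation(walk, l)` grows with the window length l, for example on a log-log plot. A toy sequence like the one above is far too short for that; the sketch only shows the mapping itself.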

1,314 citations


"Hierarchical structure of cascade o..." refers background or methods in this paper

  • ...Statistical studies of DNA sequences have been instigated by finding of the 1/f^β long-range power-law correlations in human genomic sequences, indicating the presence of scale invariant structure [4,5,22], implying that the underlying system shows fractal properties [25,76,77]....

    [...]

  • ...Different computational techniques have been used: Fourier spectral analysis [4-20], wavelet transform [21], DNA walk analysis [22-25], information theory measures [26-28], informational decomposition [29,30], quaternionic periodicity transform [31], exactly periodic subspace decomposition [32,33], portrait method [34], enhance algorithm for distance frequency distribution [35], etc....

    [...]

  • ...A single binary sequence was used by mapping genomic sequence into purine/pyrimidine representation [22], or into weak bond/strong bond representation [109]....

    [...]

  • ...A sharp peak of period three was found in a search for periodic regularities on a sample set of human exons [5,9,10,22,54,60,64]....

    [...]

Journal Article
Ian Dunham, Nobuyoshi Shimizu, Bruce A. Roe, S. Chissoe, +220 more authors (15 institutions)
02 Dec 1999-Nature
TL;DR: The sequence of the euchromatic part of human chromosome 22 is reported, which consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.
Abstract: Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organism and for the study of evolution through cross-species sequence comparison. The power of this approach has been amply demonstrated by the determination of the sequences of a number of microbial and model organisms. The next step is to obtain the complete sequence of the entire human genome. Here we report the sequence of the euchromatic part of human chromosome 22. The sequence obtained consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.

1,075 citations


"Hierarchical structure of cascade o..." refers methods in this paper

  • ...The relative height of the corresponding peak in Fourier spectrum is a good discriminator of coding potential and has been used to detect coding regions [9,14,37,45,49,65-75]....

    [...]

Journal Article
TL;DR: The test has been thoroughly proven on 400,000 bases of sequence data: it misclassifies 5% of the regions tested and gives an answer of "No Opinion" one fifth of the time.
Abstract: We give a test for protein coding regions which is based on simple and universal differences between protein-coding and noncoding DNA. The test is simple enough to use without a computer and is completely objective. The test has been thoroughly proven on 400,000 bases of sequence data: it misclassifies 5% of the regions tested and gives an answer of "No Opinion" one fifth of the time. We predict some new coding and noncoding regions in published sequences.

875 citations


"Hierarchical structure of cascade o..." refers methods or result in this paper

  • ...This is in accordance with previous conclusions that the period-3 feature is usually lacking or is weak in noncoding regions [7,9,37,39,41,66]....

    [...]

  • ...The relative height of the corresponding peak in Fourier spectrum is a good discriminator of coding potential and has been used to detect coding regions [9,14,37,45,49,65-75]....

    [...]

Journal Article
24 May 1985-Science
TL;DR: This approach has revealed that the distribution of genes, integrated viral sequences, and interspersed repeats is highly nonuniform in the genome, and that the base composition and ratio of CpG to GpC in both coding and noncoding sequences, as well as codon usage, mainly depend on the GC content of the isochores harboring the sequences.
Abstract: Most of the nuclear genome of warm-blooded vertebrates is a mosaic of very long (much greater than 200 kilobases) DNA segments, the isochores; these isochores are fairly homogeneous in base composition and belong to a small number of major classes distinguished by differences in guanine-cytosine (GC) content. The families of DNA molecules derived from such classes can be separated and used to study the genome distribution of any sequence which can be probed. This approach has revealed (i) that the distribution of genes, integrated viral sequences, and interspersed repeats is highly nonuniform in the genome, and (ii) that the base composition and ratio of CpG to GpC in both coding and noncoding sequences, as well as codon usage, mainly depend on the GC content of the isochores harboring the sequences. The compositional compartmentalization of the genome of warm-blooded vertebrates is discussed with respect to its evolutionary origin, its causes, and its effects on chromosome structure and function.

860 citations


"Hierarchical structure of cascade o..." refers background in this paper

  • ...It has been pointed out that the mosaic structure of genome is presumably responsible for long-range correlations [79,85,86]....

    [...]