Proceedings Article

Digital signal processing for gene prediction

TL;DR: This paper presents a harmonic suppression filter and a parametric minimum variance (MV) spectrum estimation technique for gene prediction, and shows that both filtering techniques are able to detect smaller exon regions, while the adaptive MV filter minimizes the power in introns (non-coding regions), giving more suppression to the intron regions.
Abstract: Identifying gene locations in a DNA sequence is one of the important problems in genomics. Nucleotides in the exons of a DNA sequence show a periodicity at frequency f = 1/3. This period-3 property in the exons of eukaryotic gene sequences enables signal-processing-based time-domain and frequency-domain methods to predict these regions. Identifying the period-3 regions helps in predicting gene locations within the billions-long DNA sequences of eukaryotic cells. Existing non-parametric filtering techniques are less effective at detecting small exons. This paper presents a harmonic suppression filter and a parametric minimum variance (MV) spectrum estimation technique for gene prediction. We show that both filtering techniques are able to detect smaller exon regions and that the adaptive MV filter minimizes the power in introns (non-coding regions), giving more suppression to the intron regions. Furthermore, 2-simplex mapping is used to reduce the computational complexity.
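The period-3 property described in the abstract can be illustrated with a minimal sketch. It assumes the common binary indicator mapping (one 0/1 sequence per base) rather than the paper's 2-simplex mapping, and evaluates the DFT only at f = 1/3 inside a sliding window. The function names, the window length, and the normalization by window length are illustrative choices, not taken from the paper.

```python
import cmath
import math

BASES = "ACGT"


def period3_power(segment):
    """Sum over the four bases of |DFT at f = 1/3|^2 of each binary
    indicator sequence, divided by the segment length (assumed normalization)."""
    n = len(segment)
    total = 0.0
    for base in BASES:
        # X_b(1/3) = sum_k u_b[k] * exp(-j*2*pi*k/3), with u_b[k] = 1 where segment[k] == base
        coeff = sum(cmath.exp(-2j * math.pi * k / 3)
                    for k, symbol in enumerate(segment) if symbol == base)
        total += abs(coeff) ** 2
    return total / n


def sliding_period3(seq, window=30, step=3):
    """Period-3 measure for each window position along the sequence."""
    return [period3_power(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Toy input: a period-4 stretch followed by a strictly period-3 stretch.
    toy = "ACGT" * 6 + "ATG" * 10
    for start, value in zip(range(0, len(toy), 3), sliding_period3(toy)):
        print(start, round(value, 3))
```

On the toy input, the measure rises as the window moves into the period-3 stretch. Real exons are far less regular, so thresholds and window lengths would need to be tuned on actual data.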
Citations
Journal Article
TL;DR: This work provides an accessible introduction and comparative review of DSP methods for the identification of protein-coding regions by breaking down the approaches into four steps, and suggests new combinations that may be worthy of future study.
Abstract: The identification of regions of DNA sequences that code for proteins is one of the most fundamental applications in bioinformatics. These protein-coding regions are in contrast to other DNA regions that encode functional RNA molecules, provide structural stability of chromosomes, serve as genetic raw materials, represent molecular fossils, or have no known purpose (sometimes called “junk DNA”). A number of approaches have been suggested for differentiating between the protein-coding and non-protein-coding regions of DNA. A selection of these approaches is based on digital signal processing (DSP) techniques. These DSP techniques rely on the phenomenon that protein-coding regions have a prominent power spectrum peak at frequency f = ⅓ arising from the length of codons (three nucleic acids). This article partitions the identification of protein-coding regions into four discrete steps. Based on this partitioning, DSP techniques can be easily described and compared based on their unique implementatio...

75 citations

Proceedings Article
01 Dec 2012
TL;DR: The performance of an anti-notch filter with cascaded lattice structure and a harmonic suppressor with comb filter are compared for identification of coding regions of the C. elegans F56F11.4a chromosome.
Abstract: Gene prediction is an important topic in genomic research. Various techniques are in use for identification of protein-coding regions in genes. Application of the Fourier technique is one of the most popular methods of gene prediction, in which the prediction algorithm is based on the period-3 property of DNA, where it exhibits a prominent peak in coding regions. Spectrum estimation by the Fourier method generates various harmonics, generally known as 1/f noise, along with sharp peaks, which may lead to false prediction of coding regions. Researchers have used various parametric and non-parametric filters to tackle this problem and improve the accuracy of prediction. The performance of an anti-notch filter with cascaded lattice structure on the one hand and a harmonic suppressor with comb filter on the other have been compared here for identification of coding regions of the C. elegans F56F11.4a chromosome. The authors have analyzed the performance in terms of standard deviation and signal-to-noise ratio. A Matlab Simulink environment has been used for filter realization and performance analysis.
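To make the anti-notch idea concrete, here is a minimal sketch of a two-pole resonator centred on ω = 2π/3. This is a plain direct-form recursion chosen for brevity, not the cascaded lattice structure compared in the article, and the pole radius r = 0.95 and the function names are assumptions.

```python
def antinotch_period3(x, r=0.95):
    """Two-pole resonator with poles at r*exp(+/- j*2*pi/3).

    Since 2*cos(2*pi/3) = -1, the recursion is
        y[n] = x[n] - r*y[n-1] - r**2 * y[n-2].
    A pole radius r closer to 1 gives a narrower passband around f = 1/3.
    """
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y0 = sample - r * y1 - r * r * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out


def mean_power(y):
    """Average squared amplitude of a sequence."""
    return sum(v * v for v in y) / len(y)


if __name__ == "__main__":
    period3 = [1.0, 0.0, 0.0] * 100       # indicator sequence with period 3
    period4 = [1.0, 0.0, 0.0, 0.0] * 75   # same length, period 4
    print(mean_power(antinotch_period3(period3)))
    print(mean_power(antinotch_period3(period4)))
```

The period-3 input passes with much higher output power than the period-4 input, which is the behaviour an anti-notch filter relies on to highlight coding regions.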

5 citations


Cites methods from "Digital signal processing for gene ..."

  • ...Tomar et al. [11] used harmonic suppression and adaptive MV filters for minimization of power in intron regions and were able to detect smaller exons....


Proceedings Article
01 Dec 2019
TL;DR: In this paper, a model-independent method for acceptor site prediction is proposed, based on the adaptive short-time Fourier transform (ASTFT), a period-3 measure, and principal component analysis (PCA).
Abstract: Signal Processing plays a very important role in the annotation of genome data. It helps to find out different structural features present in the DNA sequences like exonic regions, intronic regions, untranslated regions, promoter regions, CpG islands, etc. The detection of the exonic regions (protein coding regions) is very important for accurate gene prediction. The accurate identification of exonic regions is associated with the prediction of acceptor and donor splice sites. In this work, adaptive short time Fourier transform (ASTFT), period-3 measure, and principal component analysis (PCA) based model independent method for the acceptor site prediction has been proposed. The performance of the proposed method has been compared with the windowed discrete Fourier transform (WDFT).

3 citations


Cites methods from "Digital signal processing for gene ..."

  • ...developed a model independent method based on windowed discrete Fourier transform (WDFT) [2], for the prediction of ASS using period-3 feature [9],[10],[11],[12],[13], [14],[15],[16] present in the exonic regions....


01 Dec 2013
TL;DR: In this paper, a novel approach was applied by combining Principal Component Analysis with Minimum Variance Estimator for effective gene prediction, which reduced the dimension of the data set by projecting the raw data onto a few prominent eigenvectors with large eigenvalues.
Abstract: The problem under consideration concerns information extraction from eukaryotic DNA sequences regarding the existence of protein-coding regions. Spectral analysis using classical Fourier transform techniques such as the Discrete Fourier Transform (DFT) has long been used for this purpose with the help of period-3 peaks. Since this method has a low Signal to Noise Ratio (SNR), the spectral peaks are difficult to distinguish from the background noise. Researchers have designed various types of filters to suppress this noise so that the period-3 peaks are revealed prominently. In this article, a novel approach was applied by combining Principal Component Analysis with the Minimum Variance Estimator for effective gene prediction. Here the PSD of the DNA sequence has been estimated using the Minimum Variance method, in which noise reduction has been accomplished by Principal Component Analysis of the correlation matrix. In the process, the dimension of the data set was reduced by projecting the raw data onto a few prominent eigenvectors with large eigenvalues. The resulting reduced-rank approximation to the correlation matrix was then used for spectrum estimation. The results were compared with those of the Blackman-Tukey Power Spectrum Estimator, which is a modified form of the Periodogram method. Eukaryotic genes from various organisms taken from NCBI GenBank have been used as test samples. A single-sequence mapping method that assigns real and imaginary values to the nucleotide bases was employed. The superiority of the PCA-based Minimum Variance method over the Blackman-Tukey method was established with the help of spectral plots in terms of both resolution and quality factor.

2 citations

Journal Article
TL;DR: In this work, the software module has been implemented using MATLAB 2009a, which supports the Bioinformatics Toolbox, and DSP techniques such as the Fast Fourier Transform (FFT) and Hamming window are incorporated in the algorithm.
Abstract: Digital Signal Processing (DSP) applications in bioinformatics have received great attention in recent years, where new effective methods for genomic sequence analysis, such as the detection of coding regions, have been developed. Rheumatic Arthritis (RA) is a chronic systemic inflammatory disease involving primarily the peripheral synovial joints. In this work, the software module has been implemented using MATLAB 2009a, which supports the Bioinformatics Toolbox. DSP techniques such as the Fast Fourier Transform (FFT) and Hamming window are incorporated in the algorithm. Quantitative analysis is performed using mean amplitude and mean normalized frequency parameters computed from the power spectrum. The algorithm is tested for different normal and abnormal DNA sequences available in the National Center for Biotechnology Information (NCBI) database.

2 citations

References
Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, +245 more; Institutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
01 Aug 1969
TL;DR: In this article, a high-resolution frequency-wavenumber power spectral density estimation method is proposed, which employs a wavenumber window whose shape changes as a function of the wavenumber at which an estimate is obtained.
Abstract: The output of an array of sensors is considered to be a homogeneous random field. In this case there is a spectral representation for this field, similar to that for stationary random processes, which consists of a superposition of traveling waves. The frequency-wavenumber power spectral density provides the mean-square value for the amplitudes of these waves and is of considerable importance in the analysis of propagating waves by means of an array of sensors. The conventional method of frequency-wavenumber power spectral density estimation uses a fixed-wavenumber window and its resolution is determined essentially by the beam pattern of the array of sensors. A high-resolution method of estimation is introduced which employs a wavenumber window whose shape changes and is a function of the wavenumber at which an estimate is obtained. It is shown that the wavenumber resolution of this method is considerably better than that of the conventional method. Application of these results is given to seismic data obtained from the large aperture seismic array located in eastern Montana. In addition, the application of the high-resolution method to other areas, such as radar, sonar, and radio astronomy, is indicated.

5,415 citations


"Digital signal processing for gene ..." refers methods in this paper

  • ...In this section, we develop the Minimum Variance (MV) method of spectrum estimation, which is an adaptation of the Maximum Likelihood Method (MLM) developed by Capon for the analysis of two-dimensional power spectral densities [23]....


Book
01 Jan 1997
TL;DR: This book covers exact string matching, suffix trees and their uses, inexact matching, sequence alignment and dynamic programming, and related topics such as mapping, sequencing, and evolutionary trees.
Abstract: Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms 2. Exact matching: classical comparison-based methods 3. Exact matching: a deeper look at classical methods 4. Semi-numerical string matching Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees 6. Linear time construction of suffix trees 7. First applications of suffix trees 8. Constant time lowest common ancestor retrieval 9. More applications of suffix trees Part III. Inexact Matching, Sequence Alignment and Dynamic Programming: 10. The importance of (sub)sequence comparison in molecular biology 11. Core string edits, alignments and dynamic programming 12. Refining core string edits and alignments 13. Extending the core problems 14. Multiple string comparison: the Holy Grail 15. Sequence database and their uses: the motherlode Part IV. Currents, Cousins and Cameos: 16. Maps, mapping, sequencing and superstrings 17. Strings and evolutionary trees 18. Three short topics 19. Models of genome-level mutations.

3,904 citations

Book
19 Apr 1996
TL;DR: The main thrust is to provide students with a solid understanding of a number of important and related advanced topics in digital signal processing such as Wiener filters, power spectrum estimation, signal modeling and adaptive filtering.
Abstract: From the Publisher: The main thrust is to provide students with a solid understanding of a number of important and related advanced topics in digital signal processing such as Wiener filters, power spectrum estimation, signal modeling and adaptive filtering. Scores of worked examples illustrate fine points, compare techniques and algorithms and facilitate comprehension of fundamental concepts. Also features an abundance of interesting and challenging problems at the end of every chapter.

2,549 citations


"Digital signal processing for gene ..." refers background in this paper

  • ...trix of exponential vector e, Rx is the p × p autocorrelation Toeplitz matrix of the samples in the current window and g is the impulse response of the Minimum Variance filter with band-pass frequency ω = 2π/3. Refer to [22] for detailed derivation....
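
The quantities named in this excerpt (the exponential vector e, the p × p autocorrelation Toeplitz matrix Rx, and the filter impulse response g at ω = 2π/3) can be tied together in a minimal sketch. It assumes the standard Capon formulation, g = Rx⁻¹e / (eᴴRx⁻¹e) with minimized output power 1 / (eᴴRx⁻¹e), together with a biased autocorrelation estimate and a small diagonal loading term. None of these choices is confirmed by the excerpt, and the function names are hypothetical.

```python
import cmath
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        s = m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = s / m[row][row]
    return x


def mv_period3(x, p=8, loading=1e-6):
    """Minimum variance filter at omega = 2*pi/3 for one window of samples.

    Returns (power, g): power = 1 / (e^H R^-1 e), g = R^-1 e / (e^H R^-1 e).
    `loading` (relative diagonal loading) is an assumption added to keep R
    invertible for strictly periodic inputs.
    """
    n = len(x)
    # Biased autocorrelation estimates for lags 0..p-1 (assumed estimator).
    acf = [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(p)]
    R = [[acf[abs(i - j)] + (loading * acf[0] if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    w = 2 * math.pi / 3
    e = [cmath.exp(1j * w * k) for k in range(p)]
    v = solve(R, e)                                       # v = R^-1 e
    denom = sum(ek.conjugate() * vk for ek, vk in zip(e, v)).real
    g = [vk / denom for vk in v]                          # unity gain at omega
    return 1.0 / denom, g


if __name__ == "__main__":
    power, g = mv_period3([1.0, 0.0, 0.0] * 40, p=8)
    print(power)
```

By construction the filter satisfies eᴴg = 1, so a component at f = 1/3 passes unchanged while the total output power is minimized. Applying this window by window along a mapped DNA sequence yields a power profile in which low values would be expected in regions lacking period-3 content.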
