Author

Changchuan Yin

Other affiliations: University of Phoenix
Bio: Changchuan Yin is an academic researcher from University of Illinois at Chicago. The author has contributed to research in topics: Multiple sequence alignment & Discrete Fourier transform. The author has an h-index of 12 and has co-authored 20 publications receiving 630 citations. Previous affiliations of Changchuan Yin include University of Phoenix.

Papers
Journal ArticleDOI
TL;DR: A new method to predict protein coding regions is developed based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature.

169 citations

Journal ArticleDOI
TL;DR: The results presented in this paper provide an efficient way to compute the Fourier power spectrum at N/3 and the noise signal in gene-finding methods by calculating the nucleotide distributions in the three codon positions.
Abstract: The 3-base periodicity, identified as a pronounced peak at the frequency N/3 (N is the length of the DNA sequence) of the Fourier power spectrum of protein coding regions, is used as a marker in gene-finding algorithms to distinguish protein coding regions (exons) from noncoding regions (introns) of genomes. In this paper, we explain this phenomenon, which results from a nonuniform distribution of nucleotides in the three codon positions. There is a linear correlation between the nucleotide distributions in the three codon positions and the power spectrum at the frequency N/3. Furthermore, this study indicates the relationship between the length of a DNA sequence and the variance of nucleotide distributions and the average Fourier power spectrum, which is the noise signal in gene-finding methods. The results presented in this paper provide an efficient way to compute the Fourier power spectrum at N/3 and the noise signal in gene-finding methods by calculating the nucleotide distributions ...

89 citations
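
The link between codon-position counts and the spectrum at N/3 described in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the common convention of four binary indicator sequences (one per nucleotide) and a sequence length that is a multiple of 3, and the function names are hypothetical. For one nucleotide with counts F1, F2, F3 in the three codon positions, the DFT coefficient at N/3 is F1 + F2·w + F3·w² with w = exp(-2πi/3), so its squared magnitude reduces to F1² + F2² + F3² - F1·F2 - F1·F3 - F2·F3.

```python
import cmath
from collections import Counter


def _check_length(seq):
    # k = N/3 is only an integer DFT frequency when N is a multiple of 3.
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")


def power_at_one_third_dft(seq):
    """Power at frequency N/3, evaluated directly from the DFT definition.

    Sums |U_x(N/3)|^2 over the four binary indicator sequences x in ACGT.
    Characters other than A, C, G, T are ignored.
    """
    _check_length(seq)
    total = 0.0
    for base in "ACGT":
        # exp(-2*pi*i*k*n/N) with k = N/3 simplifies to exp(-2*pi*i*n/3).
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def power_at_one_third_counts(seq):
    """Same quantity, computed only from nucleotide counts per codon position."""
    _check_length(seq)
    counts = Counter((ch, pos % 3) for pos, ch in enumerate(seq))
    total = 0
    for base in "ACGT":
        f1, f2, f3 = (counts[(base, j)] for j in range(3))
        total += f1 * f1 + f2 * f2 + f3 * f3 - f1 * f2 - f1 * f3 - f2 * f3
    return total


if __name__ == "__main__":
    example = "ATGGCGAAGCTGGCCTAA"  # made-up sequence for illustration
    print(power_at_one_third_dft(example))
    print(power_at_one_third_counts(example))
```

The count-based version needs a single pass over the sequence and no complex arithmetic, which is the kind of shortcut the abstract refers to. A sequence whose nucleotides are spread evenly over the three codon positions gives zero power at N/3 under this formula.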

Journal ArticleDOI
TL;DR: This work proposes a new alignment-free similarity measure of DNA sequences using the Discrete Fourier Transform (DFT), and assesses the accuracy of the similarity metric in hierarchical clustering using simulated DNA and virus sequences.

72 citations
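
To make the idea of a DFT-based, alignment-free comparison concrete, here is a rough sketch under stated assumptions. The listing above does not give the paper's exact measure, so this example swaps in a simple stand-in: the Euclidean distance between the power spectra of the four binary indicator sequences, restricted to sequences of equal length. How the published method handles sequences of different lengths is not described here and is not reproduced. The names `power_spectrum` and `spectral_distance` are hypothetical.

```python
import cmath
import math


def power_spectrum(seq, base):
    """Power spectrum |U(k)|^2 of the binary indicator sequence for one base.

    Uses a direct O(N^2) DFT, which is adequate for short illustrative inputs.
    """
    n = len(seq)
    indicator = [1.0 if ch == base else 0.0 for ch in seq]
    spectrum = []
    for k in range(n):
        coeff = sum(
            indicator[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between the stacked A, C, G, T power spectra.

    Assumption for this sketch: both sequences have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    squared = 0.0
    for base in "ACGT":
        spec_a = power_spectrum(seq_a, base)
        spec_b = power_spectrum(seq_b, base)
        squared += sum((x - y) ** 2 for x, y in zip(spec_a, spec_b))
    return math.sqrt(squared)


if __name__ == "__main__":
    print(spectral_distance("ATGCATGC", "TGCATGCA"))  # circular shift
    print(spectral_distance("AAAAAAAA", "ACACACAC"))
```

Because a power spectrum discards phase, a circular shift of a sequence leaves it unchanged, so the first comparison comes out at (numerically) zero without any alignment step. A distance matrix built this way could then be passed to a hierarchical clustering routine.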

Journal ArticleDOI
TL;DR: Experimental results on various datasets show that the proposed clustering method provides an efficient tool to classify genes and genomes and is remarkably faster than other multiple sequence alignment and alignment-free methods.

71 citations

Journal ArticleDOI
TL;DR: The key advance is that moment vectors are constructed from DNA sequences using this new graphical method, and the correspondence between moment vectors and DNA sequences is proved to be one-to-one.
Abstract: A genome space is a moduli space of genomes. In this space, each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Currently, there is no method to represent genomes by a point in a space without losing biological information. Here, we propose a new graphical representation for DNA sequences. The breakthrough of the subject is that we can construct the moment vectors from DNA sequences using this new graphical method and prove that the correspondence between moment vectors and DNA sequences is one-to-one. Using these moment vectors, we have constructed a novel genome space as a subspace in R^N. It allows us to show that the SARS-CoV is most closely related to a coronavirus from the palm civet, not from a bird as initially suspected, and that the newly discovered human coronavirus HCoV-HKU1 is more closely related to SARS than to any other known member of the group 2 coronaviruses. Furthermore, we reconstructed the phylogenetic tree for 34 lentiviruses (including human immunodeficiency virus) based on their whole genome sequences. Our genome space will provide a new powerful tool for analyzing the classification of genomes and their phylogenetic relationships.

66 citations


Cited by
Book
01 Jan 2002
TL;DR: In this paper, the value of the variable in each equation is determined by a linear combination of the values of the variables in the equation and the variable's value in the solution.
Abstract: Determine the value of the variable in each equation.

635 citations

Journal Article
TL;DR: In this article, the authors discuss evidence that copy-number variants affect phenotypes, directions for basic knowledge to support clinical study of CNVs, the challenge of genotyping CNPs in clinical cohorts, the use of SNPs as markers for CNPs and statistical challenges in testing CNVs for association with disease.
Abstract: The central goal of human genetics is to understand the inherited basis of human variation in phenotypes, elucidating human physiology, evolution and disease. Rare mutations have been found underlying two thousand mendelian diseases; more recently, it has become possible to assess systematically the contribution of common SNPs to complex disease. The known role of copy-number alterations in sporadic genomic disorders, combined with emerging information about inherited copy-number variation, indicates the importance of systematically assessing copy-number variants (CNVs), including common copy-number polymorphisms (CNPs), in disease. Here we discuss evidence that CNVs affect phenotypes, directions for basic knowledge to support clinical study of CNVs, the challenge of genotyping CNPs in clinical cohorts, the use of SNPs as markers for CNPs and statistical challenges in testing CNVs for association with disease. Critical needs are high-resolution maps of common CNPs and techniques that accurately determine the allelic state of affected individuals.

583 citations

Journal ArticleDOI
TL;DR: This work provides a guide to the currently available alignment-free sequence analysis tools and addresses how these methods work, how they compare to alignment-based methods, and what potential they hold for researchers' own work.
Abstract: Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what their potential is for use for their research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.

367 citations

Journal Article
TL;DR: Mapping sequences to EIIP indicator sequences and taking their power spectra in a sliding Kaiser window gives better discrimination between exon areas and non-coding areas of a number of genomes than the existing method, which applies a rectangular window to binary indicator sequences.
Abstract: In this paper, a revision of the existing method of locating exons by a genomic signal processing technique employing four binary indicator sequences is presented. The existing method relies on the pronounced period-three peaks observed in the Fourier power spectrum of the exon regions, which are absent in non-coding regions. The authors have abandoned the four sequences altogether and adopted a single 'EIIP indicator sequence', which is formed by substituting the electron-ion interaction pseudopotentials (EIIP) of the nucleotides A, G, C and T in the DNA sequence, reducing the computational overhead by 75%. The power spectrum of this sequence reveals period-three peaks for exon regions. Also, a number of exons have been identified which exhibit period-three peaks when mapped to the 'EIIP indicator sequence' and which do not show the same when the binary indicator sequences are employed. We could get better discrimination between exon areas and non-coding areas of a number of genomes when the sequences are mapped to EIIP indicator sequences and the power spectra of the same are taken in a sliding Kaiser window, compared to the existing method using a rectangular window which utilizes binary indicator sequences.

186 citations
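
The EIIP approach summarized above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the EIIP values are the commonly tabulated ones for nucleotides, the Kaiser shape parameter `beta=4.0` and the window length are arbitrary illustrative choices (the abstract does not state them), and the function names are hypothetical. The Kaiser window is built from a truncated power series for the modified Bessel function I0 so that the example needs only the standard library.

```python
import cmath
import math

# Commonly tabulated EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def bessel_i0(x, terms=25):
    """Modified Bessel function I0(x) via its series sum_k ((x/2)^k / k!)^2."""
    total = 0.0
    term = 1.0  # holds (x/2)^k / k!
    for k in range(terms):
        total += term * term
        term *= (x / 2) / (k + 1)
    return total


def kaiser_window(length, beta):
    """Kaiser window of the given length; beta=0 gives a rectangular window."""
    if length == 1:
        return [1.0]
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        ratio = 2 * n / (length - 1) - 1  # runs from -1 to 1
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1 - ratio * ratio))) / denom)
    return window


def period3_power_profile(seq, window_len=351, beta=4.0):
    """Period-3 power of the EIIP indicator sequence in each sliding window.

    window_len should be a multiple of 3 so that window_len/3 is an exact
    DFT frequency. Raises KeyError on characters other than A, C, G, T.
    """
    signal = [EIIP[ch] for ch in seq]
    window = kaiser_window(window_len, beta)
    phases = [cmath.exp(-2j * cmath.pi * t / 3) for t in range(window_len)]
    profile = []
    for start in range(len(signal) - window_len + 1):
        coeff = sum(
            w * x * p
            for w, x, p in zip(window, signal[start:start + window_len], phases)
        )
        profile.append(abs(coeff) ** 2)
    return profile


if __name__ == "__main__":
    periodic = period3_power_profile("ATG" * 30, window_len=30)
    flat = period3_power_profile("A" * 90, window_len=30)
    print(min(periodic), max(flat))
```

Only one numeric sequence is transformed per window instead of four binary ones, which is where the reduction in computation mentioned in the abstract comes from. In the toy run, the strictly periodic input yields much larger period-3 power than the homopolymer, whose residual power is just window leakage.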

Journal ArticleDOI
TL;DR: A new method to predict protein coding regions is developed based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature.

169 citations