Showing papers by "Christian J. Michel" published in 1997


Journal Article
TL;DR: The code X0(MIT) has four important properties: a minimal window of five nucleotides to automatically retrieve frame 0; an occurrence probability equal to 6.3 × 10⁻⁵; a low frequency (12% on average) of misplaced trinucleotides in the shifted frames; and an occurrence of the four types of nucleotides in the first and second trinucleotide sites but no nucleotide G in the third trinucleotide site.

36 citations


Journal Article
TL;DR: A strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed, as six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes.
Abstract: A statistical analysis with 12,288 autocorrelation functions applied to protein (coding) genes of prokaryotes and eukaryotes identifies three subsets of trinucleotides in their three frames: T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in frame 0 (the reading frame established by the ATG start trinucleotide), T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2 (frames 1 and 2 being frame 0 shifted by one and two nucleotides, respectively, to the right). These three subsets are identical in the two gene populations and have five important properties: (i) the property of maximal (20 trinucleotides) circular code for X0 (resp. X1, X2), which allows frame 0 (resp. 1, 2) to be retrieved automatically in any region of the gene without a start codon; (ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, which allows the two paired reading frames of a DNA double helix to code for amino acids simultaneously; (iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property, with an occurrence probability of X0 equal to 6 × 10⁻⁸; and (v) the concatenation properties in favour of an evolutionary code: a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum (13 nucleotides) length of the minimal window to retrieve the frame automatically, and an occurrence of the four types of nucleotides in the three trinucleotide sites. In the Discussion, a simulation based on an independent mixing of the trinucleotides of T0 allows the two subsets T1 and T2 to be retrieved.
Then, the identified subsets T0, T1 and T2, rewritten in the 2-letter genetic alphabet {R, Y} (R = purine = A or G, Y = pyrimidine = C or T), allow the RNY model (N = R or Y) to be retrieved and explain previous works in the alphabet {R, Y}. These three subsets are then related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins is observed, as six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in proteins of both prokaryotes and eukaryotes.
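
The complementarity property (ii), the permutation property (iii) and the count of 13 coded amino acids can be checked directly from the X0 list given in the abstract. The Python sketch below is illustrative only and is not taken from the paper: the helper names `C`, `P` and `image` are chosen here to mirror the abstract's notation, and the codon table is assumed to be the standard genetic code.

```python
from itertools import product

# X0 exactly as listed in the abstract (20 trinucleotides).
X0 = set("""AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC
            GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC""".split())

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def C(w):
    """DNA complementarity map (reverse complement), e.g. C("AAC") == "GTT"."""
    return "".join(COMPLEMENT[b] for b in reversed(w))


def P(w):
    """Circular permutation by one nucleotide, e.g. P("AAC") == "ACA"."""
    return w[1:] + w[:1]


def image(f, words):
    """Apply a trinucleotide map to every element of a set (hypothetical helper)."""
    return {f(w) for w in words}


# Property (iii): X1 and X2 are deduced from X0 through P.
X1 = image(P, X0)
X2 = image(P, X1)

T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

# Assumption: standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter amino acid codes reached by T0, and those it never reaches.
coded = {CODON_TABLE[w] for w in T0}
not_coded = set(AMINO) - {"*"} - coded

print("C(T0) == T0:", image(C, T0) == T0)
print("C(T1) == T2:", image(C, T1) == T2)
print("C(T2) == T1:", image(C, T2) == T1)
print("amino acids coded by T0:", len(coded), "".join(sorted(coded)))
print("amino acids not coded by T0:", "".join(sorted(not_coded)))
```

The sketch only tests set equalities on the listed trinucleotides; it does not test the circular code property (i) or reproduce the statistical analysis.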

31 citations


Journal Article
TL;DR: A quantitative study of three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences, related to a new property of the C3 code X0 involving substitutions.

11 citations