
Showing papers by "Christian J. Michel" published in 2019


Journal ArticleDOI
01 Dec 2019-RNA
TL;DR: It is proposed that error-correcting circular codes represented an important step in the emergence of the modern genetic code, and would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.
Abstract: The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.
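The comma-free property mentioned in the abstract above can be checked directly from its standard definition: no word of the code may appear in a shifted frame of any concatenation of two code words. The following is a minimal sketch, not the authors' software; the helper names `build_code` and `is_comma_free` are hypothetical, and the codes are built from the R, Y, N notation given in the abstract.

```python
from itertools import product

# Nucleotide classes as defined in the abstract (R = G or A, Y = C or T, N = any).
R = "AG"
Y = "CT"
N = "ACGT"

def build_code(first, second, third):
    """Return the set of trinucleotides whose letters come from the given classes."""
    return {a + b + c for a, b, c in product(first, second, third)}

def is_comma_free(code):
    """Check the comma-free property for a set of trinucleotides.

    For every ordered pair of code words w1, w2, the two trinucleotides read
    in the shifted frames of w1 + w2 must not belong to the code.
    """
    for w1, w2 in product(code, repeat=2):
        pair = w1 + w2
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

RNY = build_code(R, N, Y)    # 16 trinucleotides
RRY = build_code(R, R, Y)    # 8 trinucleotides
GNC = build_code("G", N, "C")  # 4 trinucleotides

for name, code in (("RNY", RNY), ("RRY", RRY), ("GNC", GNC)):
    print(name, len(code), is_comma_free(code))
```

In RNY, every shifted trinucleotide either ends with a purine or starts with a pyrimidine, so it cannot have the form RNY; RRY and GNC are subsets of RNY and inherit the property.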

30 citations


Journal ArticleDOI
TL;DR: In a large-scale analysis involving complete genomes from four mammals and nine different yeast species, specific evolutionary pressures on the X motifs in the genes of all the genomes are highlighted, and important new properties of X motif conservation at the level of the encoded amino acids are identified.
Abstract: A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel, 2015, 2017; Arques and Michel, 1996). This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code (Arques and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. In a recent study of the X motifs in the complete genome of the yeast, Saccharomyces cerevisiae, it was shown that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution? and do they continue to play a functional role in the processes of genome decoding and protein production? In a large-scale analysis involving complete genomes from four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data concerning protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. In a general way, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.

20 citations


Journal ArticleDOI
10 Feb 2019-Life
TL;DR: The distribution of new classes of motifs in genes is studied, and the complementarity property involved in the antiparallel (DNA double helix, RNA stem) and parallel sequences could also be fundamental for coding genes with an unambiguous trinucleotide decoding in the two 5′–3′ and 3′–5′ directions or the 5′–3′ direction only.
Abstract: We study the distribution of new classes of motifs in genes, a research field that has not been investigated to date. A single-frame motif SF has no trinucleotide in reading frame (frame 0) that occurs in a shifted frame (frame 1 or 2), e.g., the dicodon AAACAA is SF as the trinucleotides AAA and CAA do not occur in a shifted frame. A motif which is not single-frame SF is multiple-frame MF. Several classes of MF motifs are defined and analysed. The distributions of single-frame SF motifs (associated with an unambiguous trinucleotide decoding in the two 5'–3' and 3'–5' directions) and 5' unambiguous motifs 5'U (associated with an unambiguous trinucleotide decoding in the 5'–3' direction only) are analysed without and with constraints. The constraints studied are: initiation and stop codons, periodic codons AAA, CCC, GGG, TTT, antiparallel complementarity and parallel complementarity. Taken together, these results suggest that the complementarity property involved in the antiparallel (DNA double helix, RNA stem) and parallel sequences could also be fundamental for coding genes with an unambiguous trinucleotide decoding in the two 5'–3' and 3'–5' directions or the 5'–3' direction only. Furthermore, the single-frame motifs SF with a property of trinucleotide decoding and the framing motifs F (also called circular code motifs; first introduced by Michel (2012)) with a property of reading frame decoding may have been involved in the early life genes to build the modern genetic code and the extant genes. They could have been involved in the stage without anticodon-amino acid interactions or in the Implicated Site Nucleotides (ISN) of RNA interacting with the amino acids. Finally, the SF and MF dipeptides associated with the SF and MF dicodons, respectively, are studied and their importance for biology and the origin of life discussed.
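The single-frame definition in the abstract above can be illustrated with a short check. This is a sketch under stated assumptions rather than the paper's method: the function names are hypothetical, the motif length is assumed to be a multiple of 3, and only complete trinucleotides are read in the shifted frames, which matches the AAACAA example.

```python
def frame_trinucleotides(motif, frame):
    """List the complete trinucleotides of `motif` read from offset `frame` (0, 1 or 2)."""
    return [motif[i:i + 3] for i in range(frame, len(motif) - 2, 3)]

def is_single_frame(motif):
    """Return True if no frame-0 trinucleotide also occurs in frame 1 or frame 2."""
    in_frame = set(frame_trinucleotides(motif, 0))
    shifted = set(frame_trinucleotides(motif, 1)) | set(frame_trinucleotides(motif, 2))
    return in_frame.isdisjoint(shifted)

# Example from the abstract: frame 0 reads AAA, CAA; the shifted frames read AAC and ACA.
print(is_single_frame("AAACAA"))
# A periodic dicodon reads AAA in every frame, so it is multiple-frame (MF).
print(is_single_frame("AAAAAA"))
```

A motif for which `is_single_frame` returns False is multiple-frame (MF) in the abstract's terminology.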

11 citations


Journal ArticleDOI
TL;DR: Any maximal dinucleotide circular code of size 6 can be embedded into a maximal mixed (di,tri)-nucleotide circular code such that its trinucleotide component is a maximal C3-comma-free code.
Abstract: By an extensive statistical analysis in genes of bacteria, archaea, eukaryotes, plasmids and viruses, a maximal C3-self-complementary trinucleotide circular code has been found to have the highest average occurrence in the reading frame of the ribosome during translation. Circular codes may play an important role in maintaining the correct reading frame. On the other hand, as several evolutionary theories propose primeval codes based on dinucleotides, trinucleotides and tetranucleotides, mixed circular codes were investigated. By using a graph-theoretical approach of circular codes recently developed, we study mixed circular codes, which are the union of a dinucleotide circular code, a trinucleotide circular code and a tetranucleotide circular code. Maximal mixed circular codes of (di,tri)-nucleotides, (tri,tetra)-nucleotides and (di,tri,tetra)-nucleotides are constructed, respectively. In particular, we show that any maximal dinucleotide circular code of size 6 can be embedded into a maximal mixed (di,tri)-nucleotide circular code such that its trinucleotide component is a maximal C3-comma-free code. The growth function of self-complementary mixed circular codes of dinucleotides and trinucleotides is given. Self-complementary mixed circular codes could have been involved in primitive genetic processes.
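As background for the graph-theoretical approach mentioned in the abstract above, the circular code literature associates a directed graph with a code whose words all have the same length: each word contributes an edge from every proper prefix to the matching suffix, and the code is circular exactly when this graph has no directed cycle. The sketch below assumes that fixed-length setting only; it does not reproduce the paper's treatment of mixed codes, and the function names are hypothetical.

```python
def code_graph(code):
    """Build the prefix -> suffix edges for every word of the code."""
    edges = {}
    for word in code:
        for i in range(1, len(word)):
            edges.setdefault(word[:i], set()).add(word[i:])
    return edges

def is_acyclic(edges):
    """Depth-first search that reports whether the directed graph has no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(vertex):
        state[vertex] = GREY
        for nxt in edges.get(vertex, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                return False  # back edge: a directed cycle exists
            if colour == WHITE and not visit(nxt):
                return False
        state[vertex] = BLACK
        return True

    return all(visit(v) for v in edges if state.get(v, WHITE) == WHITE)

def is_circular(code):
    """Circularity test for a code whose words share one length (assumption)."""
    return is_acyclic(code_graph(code))

# A dinucleotide code of size 6: its edges all follow the order A, C, G, T.
print(is_circular({"AC", "AG", "AT", "CG", "CT", "GT"}))
# AC and CA are circular shifts of each other, which creates the cycle A -> C -> A.
print(is_circular({"AC", "CA"}))
```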

7 citations


22 Nov 2019
TL;DR: New methods to study comma-free codes achieving the maximum size, given the cardinality of the alphabet and the length of the words, are developed, and a characterisation of-letter non-overlapping codes is provided, which allows the number of such codes that are not contained in any strictly larger one to be determined.
Abstract: Comma-free codes have been widely studied in the last sixty years, from points of view as diverse as biology, information theory and combinatorics. We develop new methods to study comma-free codes achieving the maximum size, given the cardinality of the alphabet and the length of the words. Specifically, we are interested in counting the number of such codes. We provide (two different proofs for) a closed formula. The approach introduced is further developed to tackle well-known sub-families of comma-free codes, such as self-complementary and (generalisations of) non-overlapping codes. We also study codes that are not contained in strictly larger ones. For instance, we determine the maximal size of self-complementary comma-free codes and the number of codes reaching the bound. We provide a characterisation of-letter non-overlapping codes (over an alphabet of cardinality n), which allows us to devise the number of such codes that are not contained in any strictly larger one. Our approach mixes combinatorial and graph-theoretical arguments.

1 citation