Showing papers on "Pseudogene" published in 2011


Journal Article
05 Aug 2011-Cell
TL;DR: It is proposed that this "competing endogenous RNA" (ceRNA) activity forms a large-scale regulatory network across the transcriptome, greatly expanding the functional genetic information in the human genome and playing important roles in pathological conditions, such as cancer.

5,334 citations


Journal Article
TL;DR: The OBP and CSP gene families are suggested to be homologous, with their most recent common ancestor dating to after the terrestrialization of Arthropoda (380-450 Ma), and a scenario for the origin and diversification of these families is proposed.
Abstract: Chemoreception is a biological process essential for the survival of animals, as it allows the recognition of important volatile cues for the detection of food, egg-laying substrates, mates, or predators, among other purposes. Furthermore, its role in pheromone detection may contribute to evolutionary processes, such as reproductive isolation and speciation. This key role in several vital biological processes makes chemoreception a particularly interesting system for studying the role of natural selection in molecular adaptation. Two major gene families are involved in the perireceptor events of the chemosensory system: the odorant-binding protein (OBP) and chemosensory protein (CSP) families. Here, we have conducted an exhaustive comparative genomic analysis of these gene families in 20 Arthropoda species. We show that the evolution of the OBP and CSP gene families is highly dynamic, with a high number of gains and losses of genes, pseudogenes, and independent origins of subfamilies. Taken together, our data clearly support the birth-and-death model for the evolution of these gene families with an overall high gene turnover rate. Moreover, we show that the genome organization of the two families is significantly more clustered than expected by chance and, more important, that this pattern appears to be actively maintained across the Drosophila phylogeny. Finally, we suggest the homologous nature of the OBP and CSP gene families, dating their most recent common ancestor to after the terrestrialization of Arthropoda (380-450 Ma), and we propose a scenario for the origin and diversification of these families.

380 citations


Journal Article
01 May 2011-RNA
TL;DR: The ways in which pseudogenes exert their effect on coding genes are described and the role of pseudogenes in the increasingly complex web of noncoding RNA that contributes to normal cellular regulation is explored.
Abstract: Pseudogenes have long been labeled as "junk" DNA, failed copies of genes that arise during the evolution of genomes. However, recent results are challenging this moniker; indeed, some pseudogenes appear to harbor the potential to regulate their protein-coding cousins. Far from being silent relics, many pseudogenes are transcribed into RNA, some exhibiting a tissue-specific pattern of activation. Pseudogene transcripts can be processed into short interfering RNAs that regulate coding genes through the RNAi pathway. In another remarkable discovery, it has been shown that pseudogenes are capable of regulating tumor suppressors and oncogenes by acting as microRNA decoys. The finding that pseudogenes are often deregulated during cancer progression warrants further investigation into the true extent of pseudogene function. In this review, we describe the ways in which pseudogenes exert their effect on coding genes and explore the role of pseudogenes in the increasingly complex web of noncoding RNA that contributes to normal cellular regulation.

366 citations


Journal Article
TL;DR: An update of ALDH genes in several recently sequenced vertebrates is provided and the associated records found in the National Center for Biotechnology Information (NCBI) gene database are clarified, highlighting where and when likely gene-duplication and gene-loss events have occurred.
Abstract: Members of the aldehyde dehydrogenase gene (ALDH) superfamily play an important role in the enzymic detoxification of endogenous and exogenous aldehydes and in the formation of molecules that are important in cellular processes, like retinoic acid, betaine and gamma-aminobutyric acid. ALDHs exhibit additional, non-enzymic functions, including the capacity to bind to some hormones and other small molecules and to diminish the effects of ultraviolet irradiation in the cornea. Mutations in ALDH genes leading to defective aldehyde metabolism are the molecular basis of several diseases, including gamma-hydroxybutyric aciduria, pyridoxine-dependent seizures, Sjogren-Larsson syndrome and type II hyperprolinaemia. Interestingly, several ALDH enzymes appear to be markers for normal and cancer stem cells. The superfamily is evolutionarily ancient and is represented within Archaea, Eubacteria and Eukarya taxa. Recent improvements in DNA and protein sequencing have led to the identification of many new ALDH family members. To date, the human genome contains 19 known ALDH genes, as well as many pseudogenes. Whole-genome sequencing allows for comparison of the entire complement of ALDH family members among organisms. This paper provides an update of ALDH genes in several recently sequenced vertebrates and aims to clarify the associated records found in the National Center for Biotechnology Information (NCBI) gene database. It also highlights where and when likely gene-duplication and gene-loss events have occurred. This information should be useful to future studies that might wish to compare the role of ALDH members among species and how the gene superfamily as a whole has changed throughout evolution.

290 citations


Journal Article
TL;DR: Gene networks involved in generating key differences between the queen and worker castes show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor.
Abstract: We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.

266 citations


Journal Article
TL;DR: Genomics reveals extreme genome reduction and massive gene loss in highly vertebrate-pathogenic Rickettsia compared to less virulent or endosymbiotic species, challenging traditional concepts of pathogenesis that focused primarily on the acquisition of virulence factors.
Abstract: Rickettsia are best known as strictly intracellular vector-borne bacteria that cause mild to severe diseases in humans and other animals. Recent advances in molecular tools and biological experiments have unveiled a wide diversity of Rickettsia spp. that include species with a broad host range and some species that act as endosymbiotic associates. Molecular phylogenies of Rickettsia spp. contain some ambiguities, such as the position of R. canadensis and relationships within the spotted fever group. In the modern era of genomics, with an ever-increasing number of sequenced genomes, there is enhanced interest in the use of whole-genome sequences to understand pathogenesis and assess evolutionary relationships among rickettsial species. Rickettsia have small genomes (1.1-1.5 Mb) as a result of reductive evolution. These genomes contain split genes, gene remnants and pseudogenes that, owing to the colinearity of some rickettsial genomes, may represent different steps of the genome degradation process. Genomics reveal extreme genome reduction and massive gene loss in highly vertebrate-pathogenic Rickettsia compared to less virulent or endosymbiotic species. Information gleaned from rickettsial genomics challenges traditional concepts of pathogenesis that focused primarily on the acquisition of virulence factors. Another intriguing phenomenon about the reduced rickettsial genomes concerns the large fraction of non-coding DNA and possible functionality of these "non-coding" sequences, because of the high conservation of these regions. Despite genome streamlining, Rickettsia spp. contain gene families, selfish DNA, repeat palindromic elements and genes encoding eukaryotic-like motifs. These features participate in sequence and functional diversity and may play a crucial role in adaptation to the host cell and pathogenesis. 
Genome analyses have identified a large fraction of mobile genetic elements, including plasmids, suggesting the possibility of lateral gene transfer in these intracellular bacteria. Phylogenetic analyses have identified several candidates for horizontal gene acquisition among Rickettsia spp. including tra, pat2, and genes encoding for the type IV secretion system and ATP/ADP translocase that may have been acquired from bacteria living in amoebae. Gene loss, gene duplication, DNA repeats and lateral gene transfer all have shaped rickettsial genome evolution. A comprehensive analysis of the entire genome, including genes and non-coding DNA, will help to unlock the mysteries of rickettsial evolution and pathogenesis.

235 citations


Journal Article
TL;DR: This work uses survey sequence to examine the genic content of hexaploid wheat group 1 chromosomes, in comparison with barley, and other model grass genomes, finding that wheat and barley accumulate dramatically more nonsyntenic genes, many of which appear to be pseudogenes.
Abstract: All six arms of the group 1 chromosomes of hexaploid wheat (Triticum aestivum) were sequenced with Roche/454 to 1.3- to 2.2-fold coverage and compared with similar data sets from the homoeologous chromosome 1H of barley (Hordeum vulgare). Six to ten thousand gene sequences were sampled per chromosome. These were classified into genes that have their closest homologs in the Triticeae group 1 syntenic region in Brachypodium, rice (Oryza sativa), and/or sorghum (Sorghum bicolor) and genes that have their homologs elsewhere in these model grass genomes. Although the number of syntenic genes was similar between the homologous groups, the amount of nonsyntenic genes was found to be extremely diverse between wheat and barley and even between wheat subgenomes. Besides a small core group of genes that are nonsyntenic in other grasses but conserved among Triticeae, we found thousands of genic sequences that are specific to chromosomes of one single species or subgenome. By examining in detail 50 genes from chromosome 1H for which BAC sequences were available, we found that many represent pseudogenes that resulted from transposable element activity and double-strand break repair. Thus, Triticeae seem to accumulate nonsyntenic genes frequently. Since many of them are likely to be pseudogenes, total gene numbers in Triticeae are prone to pronounced overestimates.

207 citations


Journal Article
TL;DR: The S. symbiotica genome provides a rare opportunity to study genome evolution in a recently derived heritable symbiont and exhibits several of the hallmarks of genome evolution observed in more ancient symbionts, including elevated rates of evolution and reduction in genome size.
Abstract: All vertically transmitted bacterial symbionts undergo a process of genome reduction over time, resulting in tiny, gene-dense genomes. Comparison of genomes of ancient bacterial symbionts gives only limited information about the early stages in the transition from a free-living to symbiotic lifestyle because many changes become obscured over time. Here, we present the genome sequence for the recently evolved aphid symbiont Serratia symbiotica. The S. symbiotica genome exhibits several of the hallmarks of genome evolution observed in more ancient symbionts, including elevated rates of evolution and reduction in genome size. The genome also shows evidence for massive genomic decay compared with free-living relatives in the same genus of bacteria, including large deletions, many pseudogenes, and a slew of rearrangements, perhaps promoted by mobile DNA. Annotation of pseudogenes allowed examination of the past and current metabolic capabilities of S. symbiotica and revealed a somewhat random process of gene inactivation with respect to function. Analysis of mutational patterns showed that deletions are more common in neutral DNA. The S. symbiotica genome provides a rare opportunity to study genome evolution in a recently derived heritable symbiont.

191 citations


Journal Article
TL;DR: Most of the classical small ncRNA genes have now been provided with a unique nomenclature, and work on naming the long (> 200 nucleotides) non-coding RNAs (lncRNAs) is ongoing.
Abstract: Previously, the majority of the human genome was thought to be 'junk' DNA with no functional purpose. Over the past decade, the field of RNA research has rapidly expanded, with a concomitant increase in the number of non-protein coding RNA (ncRNA) genes identified in this 'junk'. Many of the encoded ncRNAs have already been shown to be essential for a variety of vital functions, and this wealth of annotated human ncRNAs requires standardised naming in order to aid effective communication. The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 30,000 approved gene symbols currently listed in the HGNC database (http://www.genenames.org/search), the majority represent protein-coding genes; however, they also include pseudogenes, phenotypic loci and some genomic features. In recent years the list has also increased to include almost 3,000 named human ncRNA genes. HGNC is actively engaging with the RNA research community in order to provide unique symbols and names for each sequence that encodes an ncRNA. Most of the classical small ncRNA genes have now been provided with a unique nomenclature, and work on naming the long (> 200 nucleotides) non-coding RNAs (lncRNAs) is ongoing.

181 citations


Journal Article
TL;DR: Combined phenotypic and expression analysis indicated that, whereas 5AQ plays a major role in conferring domestication-related traits, 5Dq contributes directly and 5Bq indirectly to suppression of the speltoid phenotype; all three homoeoalleles contribute to the domestication traits.
Abstract: The Q gene encodes an AP2-like transcription factor that played an important role in domestication of polyploid wheat. The chromosome 5A Q alleles (5AQ and 5Aq) have been well studied, but much less is known about the q alleles on wheat homoeologous chromosomes 5B (5Bq) and 5D (5Dq). We investigated the organization, evolution, and function of the Q/q homoeoalleles in hexaploid wheat (Triticum aestivum L.). Q/q gene sequences are highly conserved within and among the A, B, and D genomes of hexaploid wheat, the A and B genomes of tetraploid wheat, and the A, S, and D genomes of the diploid progenitors, but the intergenic regions of the Q/q locus are highly divergent among homoeologous genomes. Duplication of the q gene 5.8 Mya was likely followed by selective loss of one of the copies from the A genome progenitor and the other copy from the B, D, and S genomes. A recent V(329)-to-I mutation in the A lineage is correlated with the Q phenotype. The 5Bq homoeoallele became a pseudogene after allotetraploidization. Expression analysis indicated that the homoeoalleles are coregulated in a complex manner. Combined phenotypic and expression analysis indicated that, whereas 5AQ plays a major role in conferring domestication-related traits, 5Dq contributes directly and 5Bq indirectly to suppression of the speltoid phenotype. The evolution of the Q/q loci in polyploid wheat resulted in the hyperfunctionalization of 5AQ, pseudogenization of 5Bq, and subfunctionalization of 5Dq, all contributing to the domestication traits.

150 citations


Journal Article
TL;DR: This paper reports on the genome sequence of a second species of Erodium, E. carvifolium, representing the second major clade (clade II) in the phylogeny of this genus, and describes the most recent loss of functional ndh genes among photosynthetic seed plants and the second such loss among angiosperms.
Abstract: Plastid genomes in the flowering plant family Geraniaceae are known to be highly rearranged based on complete sequences representing the four major genera Erodium, Geranium, Monsonia, and Pelargonium. In this paper we report on the genome sequence of a second species of Erodium, E. carvifolium, representing the second major clade (clade II) in the phylogeny of this genus. Comparison of this genome sequence to the previously published sequence of E. texanum from clade I demonstrates that the plastid genomes of these two species encode the same number of proteins but differ greatly in their relative degree of rearrangement; 14 kb of additional sequence in E. texanum contains complex repeats associated with rearrangement endpoints, whereas the plastid genome of E. carvifolium is streamlined at 116 kb and displays no unique alterations in gene order. Furthermore, these species from both major clades of Erodium contain intact NADH dehydrogenase (ndh) genes, but the 11 ndh genes are represented as pseudogenes in a small clade of 13 species. It is unclear whether plastid-encoded ndh genes have been lost entirely or functionally transferred to the nucleus. This is the third report of the absence of functional ndh genes, and the current study describes the most recent loss of these genes among photosynthetic seed plants and the second such loss among angiosperms. The other ndh losses from Pinaceae/Gnetales and Orchidaceae are much more ancient. Comparative biochemistry between Erodium species with and without plastid-encoded ndh genes may elucidate changes in photosynthetic function and the role of the Ndh complex.

Journal Article
TL;DR: A family of 22 telomere-associated lncRNAs in P. falciparum is characterized; homologous lncRNA-TARE loci are coordinately expressed after parasite DNA replication and are poised to play an important role in P. falciparum telomere maintenance, virulence gene regulation, and potentially other processes of parasite chromosome end biology.
Abstract: Mounting evidence suggests a major role for epigenetic feedback in Plasmodium falciparum transcriptional regulation. Long non-coding RNAs (lncRNAs) have recently emerged as a new paradigm in epigenetic remodeling. We therefore set out to investigate putative roles for lncRNAs in P. falciparum transcriptional regulation. We used a high-resolution DNA tiling microarray to survey transcriptional activity across 22.6% of the P. falciparum strain 3D7 genome. We identified 872 protein-coding genes and 60 putative P. falciparum lncRNAs under developmental regulation during the parasite's pathogenic human blood stage. Further characterization of lncRNA candidates led to the discovery of an intriguing family of lncRNA telomere-associated repetitive element transcripts, termed lncRNA-TARE. We have quantified lncRNA-TARE expression at 15 distinct chromosome ends and mapped putative transcriptional start and termination sites of lncRNA-TARE loci. Remarkably, we observed coordinated and stage-specific expression of lncRNA-TARE on all chromosome ends tested, and two dominant transcripts of approximately 1.5 kb and 3.1 kb transcribed towards the telomere. We have characterized a family of 22 telomere-associated lncRNAs in P. falciparum. Homologous lncRNA-TARE loci are coordinately expressed after parasite DNA replication, and are poised to play an important role in P. falciparum telomere maintenance, virulence gene regulation, and potentially other processes of parasite chromosome end biology. Further study of lncRNA-TARE and other promising lncRNA candidates may provide mechanistic insight into P. falciparum transcriptional regulation.

Journal Article
TL;DR: A PCR-based method is described that uses unique regions of the human mitochondrial genome not duplicated in the nuclear genome, together with template treatment to remove dilution bias, to accurately quantify mtDNA from human samples.

Journal Article
28 Jan 2011-PLOS ONE
TL;DR: It appears that the FBX superfamily has independently undergone substantial birth/death in many plant lineages, with its size and rapid evolution potentially reflecting a central role for ubiquitylation in driving plant fitness.
Abstract: The emergence of multigene families has been hypothesized as a major contributor to the evolution of complex traits and speciation. To help understand how such multigene families arose and diverged during plant evolution, we examined the phylogenetic relationships of F-Box (FBX) genes, one of the largest and most polymorphic superfamilies known in the plant kingdom. FBX proteins comprise the target recognition subunit of SCF-type ubiquitin-protein ligases, where they individually recruit specific substrates for ubiquitylation. Through the extensive analysis of 10,811 FBX loci from 18 plant species, ranging from the alga Chlamydomonas reinhardtii to numerous monocots and eudicots, we discovered strikingly diverse evolutionary histories. The number of FBX loci varies widely and appears independent of the growth habit and life cycle of land plants, with as few as 198 predicted for Carica papaya to as many as 1350 predicted for Arabidopsis lyrata. This number differs substantially even among closely related species, with evidence for extensive gains/losses. Despite this extraordinary inter-species variation, one subset of FBX genes was conserved among most species examined. Together with evidence of strong purifying selection and expression, the ligases synthesized from these conserved loci likely direct essential ubiquitylation events. Another subset was much more lineage specific, showed more relaxed purifying selection, and was enriched in loci with little or no evidence of expression, suggesting that they either control more limited, species-specific processes or arose from genomic drift and thus may provide reservoirs for evolutionary innovation. Numerous FBX loci were also predicted to be pseudogenes with their numbers tightly correlated with the total number of FBX genes in each species. Taken together, it appears that the FBX superfamily has independently undergone substantial birth/death in many plant lineages, with its size and rapid evolution potentially reflecting a central role for ubiquitylation in driving plant fitness.

Journal Article
TL;DR: A novel pipeline is presented that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform, for the first time, large-scale gene validation and discovery in the mouse genome.
Abstract: Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2–derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).

Journal Article
03 Oct 2011-Mycology
TL;DR: Tools and methods for eukaryotic genome annotation are described, with a focus on fungal nuclear and mitochondrial genomes, and the application of the latest technologies and tools to improve the quality of predicted gene sets is highlighted.
Abstract: Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.

Journal Article
TL;DR: Comparative genomics of four different strains revealed remarkable conservation of the genome yet uncovered 215 polymorphic sites, mainly single nucleotide polymorphisms, and a handful of new pseudogenes, which helped retrace the evolution of M. leprae.
Abstract: Leprosy, which has afflicted human populations for millennia, results from infection with Mycobacterium leprae, an unculturable pathogen with an exceptionally long generation time. Considerable insight into the biology and drug resistance of the leprosy bacillus has been obtained from genomics. M. leprae has undergone reductive evolution and pseudogenes now occupy half of its genome. Comparative genomics of four different strains revealed remarkable conservation of the genome (99.995% identity) yet uncovered 215 polymorphic sites, mainly single nucleotide polymorphisms, and a handful of new pseudogenes. Mapping these polymorphisms in a large panel of strains defined 16 single nucleotide polymorphism-subtypes that showed strong geographical associations and helped retrace the evolution of M. leprae.

Journal Article
20 Jun 2011-PLOS ONE
TL;DR: Deletion-coupled insertions show that Hqr. walsbyi evolves by uptake and precise integration of foreign DNA, probably originating from close relatives.
Abstract: Background: Haloquadratum walsbyi commonly dominates the microbial flora of hypersaline waters. Its cells are extremely fragile squares requiring >14%(w/v) salt for growth, properties that should limit its dispersal and promote geographical isolation and divergence. To assess this, the genome sequences of two isolates recovered from sites at near maximum distance on Earth were compared. Principal Findings: Both chromosomes are 3.1 Mb in size, and 84% of each sequence was highly similar to the other (98.6% identity), comprising the core sequence. ORFs of this shared sequence were completely syntenic (conserved in genomic orientation and order), without inversion or rearrangement. Strain-specific insertions/deletions could be precisely mapped, often allowing the genetic events to be inferred. Many inferred deletions were associated with short direct repeats (4–20 bp). Deletion-coupled insertions are frequent, producing different sequences at identical positions. In cases where the inserted and deleted sequences are homologous, this leads to variant genes in a common syntenic background (as already described by others). Cas/CRISPR systems are present in C23T but have been lost in HBSQ001 except for a few spacer remnants. Numerous types of mobile genetic elements occur in both strains, most of which appear to be active, and with some specifically targeting others. Strain C23T carries two ∼6 kb plasmids that show similarity to halovirus His1 and to sequences near halovirus/plasmid gene clusters commonly found in haloarchaea. Conclusions: Deletion-coupled insertions show that Hqr. walsbyi evolves by uptake and precise integration of foreign DNA, probably originating from close relatives. Change is also driven by mobile genetic elements but these do not by themselves explain the atypically low gene coding density found in this species. The remarkable genome conservation despite the presence of active systems for genome rearrangement implies both an efficient global dispersal system, and a high selective fitness for this species.

Journal Article
TL;DR: The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications.
Abstract: Background: Streptococcus thermophilus represents the only species among the streptococci that has “Generally Regarded As Safe” status and that plays an economically important role in the fermentation of yogurt and cheeses. We conducted comparative genome analysis of S. thermophilus LMD-9 to identify unique gene features as well as features that contribute to its adaptation to the dairy environment. In addition, we investigated the transcriptome response of LMD-9 during growth in milk in the presence of Lactobacillus delbrueckii ssp. bulgaricus, a companion culture in yogurt fermentation, and during lytic bacteriophage infection. Results: The S. thermophilus LMD-9 genome is comprised of a 1.8 Mbp circular chromosome (39.1% GC; 1,834 predicted open reading frames) and two small cryptic plasmids. Genome comparison with the previously sequenced LMG 18311 and CNRZ1066 strains revealed 114 kb of LMD-9 specific chromosomal region, including genes that encode for histidine biosynthetic pathway, a cell surface proteinase, various host defense mechanisms and a phage remnant. Interestingly, also unique to LMD-9 are genes encoding for a putative mucus-binding protein, a peptide transporter, and exopolysaccharide biosynthetic proteins that have close orthologs in human intestinal microorganisms. LMD-9 harbors a large number of pseudogenes (13% of ORFeome), indicating that like LMG 18311 and CNRZ1066, LMD-9 has also undergone major reductive evolution, with the loss of carbohydrate metabolic genes and virulence genes found in their streptococcal counterparts. Functional genome distribution analysis of ORFeomes among streptococci showed that all three S. thermophilus strains formed a distinct functional cluster, further establishing their specialized adaptation to the nutrient-rich milk niche. An upregulation of CRISPR1 expression in LMD-9 during lytic bacteriophage DT1 infection suggests its protective role against phage invasion. When co-cultured with L. bulgaricus, LMD-9 overexpressed genes involved in amino acid transport and metabolism as well as DNA replication. Conclusions: The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications. The presence of a unique exopolysaccharide gene cluster and cell surface protein orthologs commonly associated with probiotic functionality revealed potential probiotic applications of LMD-9.

Book
15 Dec 2011
TL;DR: Conservation versus polymorphism of the MHC in relation to transplantation, immune responses, and autoimmune disease and multiple mutational mechanisms which generate polymorphism at H-2K are compared.
Abstract: Organization and evolution of the MHC chromosomal region: An overview.- Reconstruction of phylogenetic trees and evolution of major histocompatibility complex genes.- Trans-species polymorphism of HLA molecules, founder principle, and human evolution.- Calibrating evolutionary rates at major histocompatibility complex loci.- Concerted mutagenesis: Its potential impact on interpretation of evolutionary relationships.- Two models of evolution of the class I MHC.- Evolution of MHC domains: Strategy for isolation of MHC genes from primitive animals.- Generation of allelic polymorphism at the DRB1 locus of primates by exchange of polymorphic domains: A plausible hypothesis?.- A phylogenetic investigation of MHC class II DRB genes reveals convergent evolution in the antigen binding site.- Diversification of class II A? within the genus Mus.- Molecular and genetic mechanisms involved in the generation of Mhc diversity.- Evidence for multiple mutational mechanisms which generate polymorphism at H-2K.- Contributions of interlocus exchange to the structural diversity of the H-2K, D, and L alleles.- Evolution of Great Ape MHC class I genes.- Evolution of New World primate MHC class I genes.- Polymorphisms of the major histocompatibility complex in Old and New World primates.- Mhc class I genes of New World monkeys and their relationship to human genes.- Selective inactivation of the primate Mhc-DQA2 locus.- Is DQB2 functional among nonhuman primates?.- Alu repeats and evolution of the HLA-DQA1 locus.- The Alu repeats of the primate DRB genes.- Interpreting MHC disequilibrium.- Frozen haplotypes in Mhc evolution.- The age and evolution of the DRB pseudogenes.- Organization and evolution of the HLA-DRB genes.- The MHC of Peromyscus Leucopus (Mhc-Pele) illustrates large- and small-scale expansion in the phylogeny of MHC loci.- Sequence and evolution of bovine MHC class I genes.- Evolution of MHC molecules in nonmammalian vertebrates.- The polymorphic B-G antigens of the chicken MHC - Do the structure and tissue distribution suggest a function?.- Evolution of primate C4 and CYP21 genes.- Mapping of a hot spot in the major recombination area of the mouse H-2 complex.- Conservation versus polymorphism of the MHC in relation to transplantation, immune responses, and autoimmune disease.- HLA associations with malaria in Africa: Some implications for MHC evolution.- The evolution of MHC-based mating preferences in Mus.- Possible MHC associated heterozygous advantage in wild mouse populations.- Antigen presentation by neoclassical MHC class I gene products in murine rodents.- Mls antigens (superantigens), class II MHC, and Tcr repertoire: Co-adaptive evolution.- Diversity and evolution at the Eb recombinational hotspot in the mouse.- Molecular dissection of the Eb recombinational hotspot in the mouse.- Molecular cloning of nurse shark cDNAs with high sequence similarity to nucleoside diphosphate kinase genes.- Author Index.

Journal ArticleDOI
TL;DR: In this paper, a method for detecting germline mutations was developed by combining an original RNA-based cDNA-PCR mutation detection method and denaturing high-performance liquid chromatography (DHPLC) with multiplex ligation-dependent probe amplification (MLPA).

Journal ArticleDOI
TL;DR: New insights are provided on understanding the roles of nuclear mtDNA sequences in genome complexities of these mosquitoes and stretches of homologies are present among the genic and non-genic NUMTs that may play important roles in genomic rearrangement of NUMTs in these genomes.

Journal ArticleDOI
TL;DR: The Ma gene from the Myrobalan plum confers complete-spectrum, heat-stable, and high-level resistance to RKN, which is remarkable in comparison with the Mi-1 gene from tomato (Solanum lycopersicum), the sole RKN resistance gene cloned.
Abstract: Root-knot nematode (RKN) Meloidogyne species are major polyphagous pests of most crops worldwide, and cultivars with durable resistance are urgently needed because of nematicide bans. The Ma gene from the Myrobalan plum (Prunus cerasifera) confers complete-spectrum, heat-stable, and high-level resistance to RKN, which is remarkable in comparison with the Mi-1 gene from tomato (Solanum lycopersicum), the sole RKN resistance gene cloned. We report here the positional cloning and the functional validation of the Ma locus present at the heterozygous state in the P.2175 accession. High-resolution mapping totaling over 3,000 segregants reduced the Ma locus interval to a 32-kb cluster of three Toll/Interleukin1 Receptor-Nucleotide Binding Site-Leucine-Rich Repeat (LRR) genes (TNL1–TNL3), including a pseudogene (TNL2) and a truncated gene (TNL3). The sole complete gene in this interval (TNL1) was validated as Ma, as it conferred the same complete-spectrum and high-level resistance (as in P.2175) using its genomic sequence and native promoter region in Agrobacterium rhizogenes-transformed hairy roots and composite plants. The full-length cDNA (2,048 amino acids) of Ma is the longest of all Resistance genes cloned to date. Its TNL structure is completed by a huge post-LRR (PL) sequence (1,088 amino acids) comprising five repeated carboxyl-terminal PL exons with two conserved motifs. The amino-terminal region (213 amino acids) of the LRR exon is conserved between alleles and contrasts with the high interallelic polymorphisms of its distal region (111 amino acids) and of PL domains. The Ma gene highlights the importance of these uncharacterized PL domains, which may be involved in pathogen recognition through the decoy hypothesis or in nuclear signaling.

Journal ArticleDOI
TL;DR: The content and distribution of tRNA genes in five flowering plants and one green alga are analysed, and a new family of tRNA-related short interspersed nuclear elements (SINEs) in the Populus trichocarpa nuclear genome is identified.
Abstract: Although transfer RNA (tRNA) has a fundamental role in cell life, little is known about tRNA gene organization and expression on a genome-wide scale in eukaryotes, particularly plants. Here, we analyse the content and distribution of tRNA genes in five flowering plants and one green alga. The tRNA gene content is homogenous in plants, and is mostly correlated with genome size. The number of tRNA pseudogenes and organellar-like tRNA genes present in nuclear genomes varies greatly from one plant species to another. These pseudogenes or organellar-like genes appear to be generated or inserted randomly during evolution. Interestingly, we identified a new family of tRNA-related short interspersed nuclear elements (SINEs) in the Populus trichocarpa nuclear genome. In higher plants, intron-containing tRNA genes are rare, and correspond to genes coding for tRNA(Tyr) and tRNA(Mete). By contrast, in green algae, more than half of the tRNA genes contain an intron. This suggests divergent means of intron acquisition and the splicing process between green algae and land plants. Numerous tRNAs are co-transcribed in Chlamydomonas, but they are mostly transcribed as a single unit in flowering plants. The only exceptions are tRNA(Gly)-snoRNA and tRNA(Mete)-snoRNA cotranscripts in dicots and monocots, respectively. The internal or external motifs required for efficient transcription of tRNA genes by RNA polymerase III are well conserved among angiosperms. A brief analysis of the mitochondrial and plastidial tRNA gene populations is also provided.

Journal ArticleDOI
01 Feb 2011-Genetica
TL;DR: A method based on sequence comparisons of lineages with and without functional GLO genes was used to calculate inactivation dates of 61 and 14 MYA for the primate and guinea pig genes, respectively, consistent with previous phylogeny-based estimates.
Abstract: The capacity to biosynthesize ascorbic acid has been lost in a number of species including primates, guinea pigs, teleost fishes, bats, and birds. This inability results from mutations in the GLO gene coding for L-gulono-γ-lactone oxidase, the enzyme responsible for catalyzing the last step in the vitamin C biosynthetic pathway. We analyzed available primate and rodent GLO gene sequences to determine their evolutionary history. We used a method based on sequence comparisons of lineages with and without functional GLO genes to calculate inactivation dates of 61 and 14 MYA for the primate and guinea pig genes, respectively. These estimates are consistent with previous phylogeny-based estimates. An analysis of transposable element distribution in the primate and rodent GLO sequences did not reveal conclusive evidence that illegitimate recombination between repeats has contributed to the loss of exons in the primate and guinea pig genes.

Journal ArticleDOI
TL;DR: Mechanistically, MYLKP1 overexpression inhibits smMLCK expression in cancer cells by decreasing RNA stability, leading to increased cell proliferation, and these studies provide strong evidence for the functional involvement of pseudogenes in carcinogenesis.
Abstract: Pseudogenes are considered nonfunctional genomic artifacts of catastrophic pathways. Recent evidence, however, indicates novel roles for pseudogenes as regulators of gene expression. We tested the functionality of myosin light chain kinase pseudogene (MYLKP1) in human cells and tissues by RT-PCR, promoter activity, and cell proliferation assays. MYLKP1 is partially duplicated from the original MYLK gene that encodes nonmuscle and smooth muscle myosin light chain kinase (smMLCK) isoforms and regulates cell contractility and cytokinesis. Despite strong homology with the smMLCK promoter (∼ 89.9%), the MYLKP1 promoter is minimally active in normal bronchial epithelial cells but highly active in lung adenocarcinoma cells. Moreover, MYLKP1 and smMLCK exhibit negatively correlated transcriptional patterns in normal and cancer cells with MYLKP1 strongly expressed in cancer cells and smMLCK highly expressed in non-neoplastic cells. For instance, expression of smMLCK decreased (19.5 ± 4.7 fold) in colon carcinoma tissues compared to normal colon tissues. Mechanistically, MYLKP1 overexpression inhibits smMLCK expression in cancer cells by decreasing RNA stability, leading to increased cell proliferation. These studies provide strong evidence for the functional involvement of pseudogenes in carcinogenesis and suggest MYLKP1 as a potential novel diagnostic or therapeutic target in human cancers.

Journal ArticleDOI
01 Jul 2011-Gene
TL;DR: In this paper, a new genome-wide analysis of cytochrome P450 genes was performed based on the advances in the silkworm genome project; a total of 84 CYP-related sequences were identified and classified into 26 families and 47 subfamilies according to standard nomenclature.

Journal ArticleDOI
TL;DR: The expression of OCT4 splicing variant and various pseudogenes at both the mRNA and protein levels in human somatic tumours might call into question the reliability of the results regarding OCT4 expression and function in tumourigenesis.
Abstract: The POU family transcription factor OCT4 is required for maintaining the pluripotency of embryonic stem cells and for generating induced pluripotent stem cells. Although OCT4 is clearly shown to be expressed in some pluripotent germ cell tumours, its expression in human somatic tumours remains controversial. Some studies have shown that OCT4 is expressed in adult stem cells, somatic cancers and, further, cancer stem cells, while other studies failed to make such an observation. It is thus important to ascertain whether OCT4 is expressed in human somatic tumours. By using RT-PCR and sequencing analysis, three OCT4 pseudogenes, viz. OCT4-pg1, OCT4-pg3 and OCT4-pg4 but excluding the OCT4 gene, were found to be expressed in two types of human solid tumours, glioma and breast carcinoma, from which cancer stem cells had earlier been isolated. The protein expression of these pseudogenes was further demonstrated by immunochemistry and western blotting. Along with this, it was shown that OCT4 pseudogenes lacked OCT4-like activities. The expression of OCT4 splicing variant and various pseudogenes at both the mRNA and protein levels in human somatic tumours might call into question the reliability of the results regarding OCT4 expression and function in tumourigenesis. Hence, in investigations of OCT4 expression in cancers and stem cells, different approaches with appropriate controls would be desirable to exclude possibility of false-positive results.

Journal ArticleDOI
TL;DR: Identification of a candidate gene underlying a trans-eQTL demonstrated the feasibility of eQTL cloning in maize and could help to understand the mechanism of gene expression regulation.
Abstract: Expression QTL analyses have shed light on transcriptional regulation in numerous species of plants, animals, and yeasts. These microarray-based analyses identify regulators of gene expression as either cis-acting factors that regulate proximal genes, or trans-acting factors that function through a variety of mechanisms to affect transcript abundance of unlinked genes. A hydroponics-based genetical genomics study in roots of a Zea mays IBM2 Syn10 double haploid population identified tens of thousands of cis-acting and trans-acting eQTL. Cases of false-positive eQTL, which result from the lack of complete genomic sequences from both parental genomes, were described. A candidate gene for a trans-acting regulatory factor was identified through positional cloning. The unexpected regulatory function of a class I glutamine amidotransferase controls the expression of an ABA 8'-hydroxylase pseudogene. Identification of a candidate gene underlying a trans-eQTL demonstrated the feasibility of eQTL cloning in maize and could help to understand the mechanism of gene expression regulation. Lack of complete genome sequences from both parents could cause the identification of false-positive cis- and trans-acting eQTL.

Journal ArticleDOI
TL;DR: The events leading from the initial discovery of the 'useless' pseudogene to its breakthrough as a functional molecule with hitherto unknown potential to influence human disease are chronicled.