Showing papers on "Pseudogene published in 2008"


Journal Article
22 May 2008-Nature
TL;DR: These findings indicate a function for pseudogenes in regulating gene expression by means of the RNA interference pathway and may, in part, explain the evolutionary pressure to conserve argonaute-mediated catalysis in mammals.
Abstract: Over evolutionary time, many genes undergo duplication and one copy accumulates mutations that render it non-functional. These 'pseudogenes' are generally thought to be rather uninteresting, dead-end pieces of the genome. Yet there now appears to be more to it than that. Two groups report in this issue on pseudogenes that can in fact influence gene expression. The mechanism involves pairing of RNA antisense transcripts from pseudogenes with the mRNAs of protein-coding genes, forming a duplex RNA that is processed into endogenous siRNAs. Over evolutionary time genes can undergo duplication, and may accumulate mutations that render them non-functional pseudogenes, which are thought to be uninteresting. This study (and that of the group of Sasaki) shows that pseudogenes can in fact influence gene expression. Pseudogenes populate the mammalian genome as remnants of artefactual incorporation of coding messenger RNAs into transposon pathways [1]. Here we show that a subset of pseudogenes generates endogenous small interfering RNAs (endo-siRNAs) in mouse oocytes. These endo-siRNAs are often processed from double-stranded RNAs formed by hybridization of spliced transcripts from protein-coding genes to antisense transcripts from homologous pseudogenes. An inverted repeat pseudogene can also generate abundant small RNAs directly. A second class of endo-siRNAs may enforce repression of mobile genetic elements, acting together with Piwi-interacting RNAs. Loss of Dicer, a protein integral to small RNA production, increases expression of endo-siRNA targets, demonstrating their regulatory activity. Our findings indicate a function for pseudogenes in regulating gene expression by means of the RNA interference pathway and may, in part, explain the evolutionary pressure to conserve argonaute-mediated catalysis in mammals.

1,059 citations


Journal Article
TL;DR: The spectrum of GBA mutations and their distribution in the patient population, evolutionary conservation, clinical presentations, and how they may affect the structure and function of glucocerebrosidase are discussed.
Abstract: Gaucher disease (GD) is an autosomal recessive disorder caused by the deficiency of glucocerebrosidase, a lysosomal enzyme that catalyses the hydrolysis of the glycolipid glucocerebroside to ceramide and glucose. Lysosomal storage of the substrate in cells of the reticuloendothelial system leads to multisystemic manifestations, including involvement of the liver, spleen, bone marrow, lungs, and nervous system. Patients with GD have highly variable presentations and symptoms that, in many cases, do not correlate well with specific genotypes. Almost 300 unique mutations have been reported in the glucocerebrosidase gene (GBA), with a distribution that spans the gene. These include 203 missense mutations, 18 nonsense mutations, 36 small insertions or deletions that lead to either frameshifts or in-frame alterations, 14 splice junction mutations, and 13 complex alleles carrying two or more mutations in cis. Recombination events with a highly homologous pseudogene downstream of the GBA locus also have been identified, resulting from gene conversion, fusion, or duplication. In this review we discuss the spectrum of GBA mutations and their distribution in the patient population, evolutionary conservation, clinical presentations, and how they may affect the structure and function of glucocerebrosidase.

578 citations


Journal Article
TL;DR: It seems that mutation by gene duplication and inactivation has important roles in both the adaptive and non-adaptive evolution of chemosensation.
Abstract: Chemosensory receptors are essential for the survival of organisms that range from bacteria to mammals. Recent studies have shown that the numbers of functional chemosensory receptor genes and pseudogenes vary enormously among the genomes of different animal species. Although much of the variation can be explained by the adaptation of organisms to different environments, it has become clear that a substantial portion is generated by genomic drift, a random process of gene duplication and deletion. Genomic drift also generates a substantial amount of copy-number variation in chemosensory receptor genes within species. It seems that mutation by gene duplication and inactivation has important roles in both the adaptive and non-adaptive evolution of chemosensation.

540 citations


Journal Article
TL;DR: The genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1 are presented.
Abstract: We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”.

514 citations


Journal Article
TL;DR: Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendent of S. Enteritidis, and it is proposed that experimental analysis in chickens and mice could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.
Abstract: We have determined the complete genome sequences of a host-promiscuous Salmonella enterica serovar Enteritidis PT4 isolate P125109 and a chicken-restricted Salmonella enterica serovar Gallinarum isolate 287/91. Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendent of S. Enteritidis. Significantly, the genome of S. Gallinarum has undergone extensive degradation through deletion and pseudogene formation. Comparison of the pseudogenes in S. Gallinarum with those identified previously in other host-adapted bacteria reveals the loss of many common functional traits and provides insights into possible mechanisms of host and tissue adaptation. We propose that experimental analysis in chickens and mice of S. Enteritidis-harboring mutations in functional homologs of the pseudogenes present in S. Gallinarum could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.

409 citations


Journal Article
TL;DR: The chromatophore is characterized as a photosynthetic entity that is absolutely dependent on its host for growth and survival; chromatophores are the only known cyanobacterial descendants besides plastids with a significantly reduced genome that confer photosynthesis to their eukaryotic host.

274 citations


Journal Article
TL;DR: The genome sequence of A. salmonicida was determined to provide a better understanding of the virulence factors used by this pathogen to infect fish and provide insights into the mechanisms used by the bacterium for infection and avoidance of host defence systems.
Abstract: Aeromonas salmonicida subsp. salmonicida is a Gram-negative bacterium that is the causative agent of furunculosis, a bacterial septicaemia of salmonid fish. While other species of Aeromonas are opportunistic pathogens or are found in commensal or symbiotic relationships with animal hosts, A. salmonicida subsp. salmonicida causes disease in healthy fish. The genome sequence of A. salmonicida was determined to provide a better understanding of the virulence factors used by this pathogen to infect fish. The nucleotide sequences of the A. salmonicida subsp. salmonicida A449 chromosome and two large plasmids are characterized. The chromosome is 4,702,402 bp and encodes 4388 genes, while the two large plasmids are 166,749 and 155,098 bp with 178 and 164 genes, respectively. Notable features are a large inversion in the chromosome and, in one of the large plasmids, the presence of a Tn21 composite transposon containing mercury resistance genes and an In2 integron encoding genes for resistance to streptomycin/spectinomycin, quaternary ammonia compounds, sulphonamides and chloramphenicol. A large number of genes encoding potential virulence factors were identified; however, many appear to be pseudogenes since they contain insertion sequences, frameshifts or in-frame stop codons. A total of 170 pseudogenes and 88 insertion sequences (of ten different types) are found in the A. salmonicida genome. Comparison with the A. hydrophila ATCC 7966T genome reveals multiple large inversions in the chromosome as well as an approximately 9% difference in gene content indicating instances of single gene or operon loss or gain. A limited number of the pseudogenes found in A. salmonicida A449 were investigated in other Aeromonas strains and species. While nearly all the pseudogenes tested are present in A. salmonicida subsp. salmonicida strains, only about 25% were found in other A. salmonicida subspecies and none were detected in other Aeromonas species. Relative to the A. hydrophila ATCC 7966T genome, the A. salmonicida subsp. salmonicida genome has acquired multiple mobile genetic elements, undergone substantial rearrangement and developed a significant number of pseudogenes. These changes appear to be a consequence of adaptation to a specific host, salmonid fish, and provide insights into the mechanisms used by the bacterium for infection and avoidance of host defence systems.
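
The abstract above names frameshifts and in-frame stop codons among the features that mark a gene as a likely pseudogene. As a rough illustration only, and not the annotation pipeline used by the authors, the following Python sketch flags a coding sequence that carries either signal. The function names (premature_stops, looks_disrupted) and the toy sequences are hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and a length that is not a multiple of three is treated only as a crude hint of a frameshift.

```python
# Illustrative sketch only: names and toy sequences are hypothetical,
# and this is not the method used in the study above.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard stop codons


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the final codon."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    # The last complete codon is skipped because a normal gene ends in a stop.
    return [i for i in range(n_codons - 1)
            if seq[3 * i:3 * i + 3] in STOP_CODONS]


def looks_disrupted(cds):
    """True if the sequence has a premature in-frame stop or a length
    that is not a multiple of three (a crude frameshift hint)."""
    frameshift_hint = len(cds) % 3 != 0
    return frameshift_hint or bool(premature_stops(cds))


intact = "ATGGCTGAATAA"     # ATG GCT GAA TAA; the out-of-frame TGA is ignored
disrupted = "ATGTAGGAATAA"  # ATG TAG GAA TAA; stop at codon index 1

print(premature_stops(intact))     # []
print(premature_stops(disrupted))  # [1]
print(looks_disrupted(disrupted))  # True
```

A real annotation would compare each candidate with an intact homolog and would also look for insertion sequences, which this sketch does not attempt.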

253 citations


Journal Article
TL;DR: The human hsp70-gene family is characterized by a remarkable evolutionary diversity that mainly resulted from multiple duplications and retrotranspositions of a highly expressed gene, HSPA8.
Abstract: Hsp70 chaperones are required for key cellular processes and response to environmental changes and survival but they have not been fully characterized yet. The human hsp70-gene family has an unknown number of members (eleven counted over ten years ago); some have been described but the information is incomplete and inconsistent. A coherent body of knowledge encompassing all family components that would facilitate their study individually and as a group is lacking. Nowadays, the study of chaperone genes benefits from the availability of genome sequences and a new protocol, chaperonomics, which we applied to elucidate the human hsp70 family. We identified 47 hsp70 sequences, 17 genes and 30 pseudogenes. The genes distributed into seven evolutionarily distinct groups with distinguishable subgroups according to phylogenetic and other data, such as exon-intron and protein features. The N-terminal ATP-binding domain (ABD) was conserved at least partially in the majority of the proteins but the C-terminal substrate-binding domain (SBD) was not. Nine proteins were typical Hsp70s (65–80 kDa) with ABD and SBD, two were lighter lacking partly or totally the SBD, and six were heavier (>80 kDa) with divergent C-terminal domains. We also analyzed exon-intron features, transcriptional variants and protein structure and isoforms, and modality and patterns of expression in various tissues and developmental stages. Evolutionary analyses, including human hsp70 genes and pseudogenes, and other eukaryotic hsp70 genes, showed that six human genes encoding cytosolic Hsp70s and 27 pseudogenes originated from retro-transposition of HSPA8, a gene highly expressed in most tissues and developmental stages. The human hsp70-gene family is characterized by a remarkable evolutionary diversity that mainly resulted from multiple duplications and retrotranspositions of a highly expressed gene, HSPA8. Human Hsp70 proteins are clustered into seven evolutionary Groups, with divergent C-terminal domains likely defining their distinctive functions. These functions may also be further defined by the observed differences in the N-terminal domain.

248 citations


Journal Article
TL;DR: The first complete genome sequence of a termite gut symbiont—an uncultured bacterium named Rs-D17 belonging to the candidate phylum Termite Group 1—is presented, suggesting that this bacterial group plays a key role in the gut symbiotic system by stably supplying essential nitrogenous compounds deficient in lignocelluloses to their host protists and the termites.
Abstract: Termites harbor a symbiotic gut microbial community that is responsible for their ability to thrive on recalcitrant plant matter. The community comprises diverse microorganisms, most of which are as yet uncultivable; the detailed symbiotic mechanism remains unclear. Here, we present the first complete genome sequence of a termite gut symbiont—an uncultured bacterium named Rs-D17 belonging to the candidate phylum Termite Group 1 (TG1). TG1 is a dominant group in termite guts, found as intracellular symbionts of various cellulolytic protists, without any physiological information. To acquire the complete genome sequence, we collected Rs-D17 cells from only a single host protist cell to minimize their genomic variation and performed isothermal whole-genome amplification. This strategy enabled us to reconstruct a circular chromosome (1,125,857 bp) encoding 761 putative protein-coding genes. The genome additionally contains 121 pseudogenes assigned to categories, such as cell wall biosynthesis, regulators, transporters, and defense mechanisms. Despite its apparent reductive evolution, the ability to synthesize 15 amino acids and various cofactors is retained, some of these genes having been duplicated. Considering that diverse termite-gut protists harbor TG1 bacteria, we suggest that this bacterial group plays a key role in the gut symbiotic system by stably supplying essential nitrogenous compounds deficient in lignocelluloses to their host protists and the termites. Our results provide a breakthrough to clarify the functions of and the interactions among the individual members of this multilayered symbiotic complex.

228 citations


Journal Article
TL;DR: The Trachelium chloroplast genome shares with Pelargonium and Jasminum both a higher number of repeats and larger repeated sequences in comparison to eight other angiosperm chloroplast genomes, and these are concentrated near rearrangement endpoints.
Abstract: Chloroplast genome organization, gene order, and content are highly conserved among land plants. We sequenced the chloroplast genome of Trachelium caeruleum L. (Campanulaceae), a member of an angiosperm family known for highly rearranged genomes. The total genome size is 162,321 bp, with an inverted repeat (IR) of 27,273 bp, large single-copy (LSC) region of 100,114 bp, and small single-copy (SSC) region of 7,661 bp. The genome encodes 112 different genes, with 17 duplicated in the IR, a tRNA gene (trnI-cau) duplicated once in the LSC region, and a protein-coding gene (psbJ) with two duplicate copies, for a total of 132 putatively intact genes. ndhK may be a pseudogene with internal stop codons, and clpP, ycf1, and ycf2 are so highly diverged that they also may be pseudogenes. ycf15, rpl23, infA, and accD are truncated and likely nonfunctional. The most conspicuous feature of the Trachelium genome is the presence of 18 internally unrearranged blocks of genes inverted or relocated within the genome relative to the ancestral gene order of angiosperm chloroplast genomes. Recombination between repeats or tRNA genes has been suggested as a mechanism of chloroplast genome rearrangements. The Trachelium chloroplast genome shares with Pelargonium and Jasminum both a higher number of repeats and larger repeated sequences in comparison to eight other angiosperm chloroplast genomes, and these are concentrated near rearrangement endpoints. Genes for tRNAs occur at many but not all inversion endpoints, so some combination of repeats and tRNA genes may have mediated these rearrangements.
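
The region sizes and gene counts quoted in this abstract can be cross-checked with simple arithmetic. The Python sketch below assumes the usual quadripartite plastid layout (one LSC, one SSC and two copies of the IR). It also reads "duplicated once" as one extra copy of trnI-cau and "two duplicate copies" as two extra copies of psbJ; that reading is an interpretation of the abstract's wording, not something the abstract states explicitly. Variable names are illustrative.

```python
# Region lengths in bp, as quoted in the abstract.
LSC = 100_114
SSC = 7_661
IR = 27_273
REPORTED_TOTAL = 162_321


def quadripartite_length(lsc, ssc, ir):
    """Total length assuming one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)

# Gene tally. The extra-copy counts below are an assumed reading of the text.
different_genes = 112
duplicated_in_ir = 17   # one extra copy each
extra_trnI_cau = 1      # "duplicated once in the LSC region"
extra_psbJ = 2          # "two duplicate copies"
intact_genes = different_genes + duplicated_in_ir + extra_trnI_cau + extra_psbJ

print(total, total == REPORTED_TOTAL)  # 162321 True
print(intact_genes)                    # 132
```

Under these assumptions both figures agree with the abstract: 162,321 bp in total and 132 putatively intact genes.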

225 citations


Journal Article
TL;DR: This study showed that the other expressed AtFLS sequences have tissue- and cell type-specific promoter activities that overlap with those of AtFLS1 and encode proteins that interact with other flavonoid enzymes in yeast two-hybrid assays, suggesting that these “pseudogenes” have alternative, noncatalytic functions that have not yet been uncovered.
Abstract: The genome of Arabidopsis (Arabidopsis thaliana) contains five sequences with high similarity to FLAVONOL SYNTHASE1 (AtFLS1), a previously characterized flavonol synthase gene that plays a central role in flavonoid metabolism. This apparent redundancy suggests the possibility that Arabidopsis uses multiple isoforms of FLS with different substrate specificities to mediate the production of the flavonols, quercetin and kaempferol, in a tissue-specific and inducible manner. However, biochemical and genetic analysis of the six AtFLS sequences indicates that, although several of the members are expressed, only AtFLS1 encodes a catalytically competent protein. AtFLS1 also appears to be the only member of this group that influences flavonoid levels and the root gravitropic response in seedlings under nonstressed conditions. This study showed that the other expressed AtFLS sequences have tissue- and cell type-specific promoter activities that overlap with those of AtFLS1 and encode proteins that interact with other flavonoid enzymes in yeast two-hybrid assays. Thus, it is possible that these "pseudogenes" have alternative, noncatalytic functions that have not yet been uncovered.

Journal Article
TL;DR: Beetle pupae were injected with TcOr1 dsRNA; unlike sham-injected and control beetles, these knock-down beetles showed no significant response to the Tribolium aggregation pheromone, supporting the hypothesis that TcOr1 plays a similar decisive role in olfaction to DmOr83b.

Journal Article
TL;DR: The bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis are investigated, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago.
Abstract: Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we thoroughly investigated the bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis. Exhaustive PCR detection and Southern blot analysis suggested that ∼30% of Wolbachia genes, in terms of the gene repertoire of wMel, are present on the insect nuclear genome. Fluorescent in situ hybridization located the transferred genes on the proximal region of the basal short arm of the X chromosome. Molecular evolutionary and other lines of evidence indicated that the transferred genes are probably derived from a single lateral transfer event. The transferred genes were, for the length examined, structurally disrupted, freed from functional constraints, and transcriptionally inactive. Hence, most, if not all, of the transferred genes have been pseudogenized. Notwithstanding this, the transferred genes were ubiquitously detected from Japanese and Taiwanese populations of C. chinensis, while the number of the transferred genes detected differed between the populations. The transferred genes were not detected from congenic beetle species, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago. These features of the laterally transferred endosymbiont genes are compared with the evolutionary patterns of mitochondrial and plastid genome fragments acquired by nuclear genomes through recent endosymbiotic gene transfers.

Journal Article
TL;DR: Comparisons of the Trifolium plastid genome with the Plant Repeat Database and searches for flanking inverted repeats suggest that the high incidence of dispersed repeats and rearrangements is not likely the result of transposition.
Abstract: The plastid genome of Trifolium subterraneum is 144,763 bp, about 20 kb longer than those of closely related legumes, which also lost one copy of the large inverted repeat (IR). The genome has undergone extensive genomic reconfiguration, including the loss of six genes (accD, infA, rpl22, rps16, rps18, and ycf1) and two introns (clpP and rps12) and numerous gene order changes, attributable to 14-18 inversions. All endpoints of rearranged gene clusters are flanked by repeated sequences, tRNAs, or pseudogenes. One unusual feature of the Trifolium subterraneum genome is the large number of dispersed repeats, which comprise 19.5% (ca. 28 kb) of the genome (versus about 4% for other angiosperms) and account for part of the increase in genome size. Nine genes (psbT, rbcL, clpP, rps3, rpl23, atpB, psbN, trnI-cau, and ycf3) have also been duplicated either partially or completely. rpl23 is the most highly duplicated gene, with portions of this gene duplicated six times. Comparisons of the Trifolium plastid genome with the Plant Repeat Database and searches for flanking inverted repeats suggest that the high incidence of dispersed repeats and rearrangements is not likely the result of transposition. Trifolium has 19.5 kb of unique DNA distributed among 160 fragments ranging in size from 30 to 494 bp, greatly surpassing the other five sequenced legume plastid genomes in novel DNA content. At least some of this unique DNA may represent horizontal transfer from bacterial genomes. These unusual features provide direction for the development of more complex models of plastid genome evolution.

Journal Article
TL;DR: It is shown that in nine bird species from seven orders, the majority of amplified OR sequences are predicted to be from potentially functional genes, which suggests that olfaction in birds may be a more important sense than generally believed.
Abstract: Among vertebrates, the sense of smell is mediated by olfactory receptors (ORs) expressed in sensory neurons within the olfactory epithelium. Comparative genomic studies suggest that the olfactory acuity of mammalian species correlates positively with both the total number and the proportion of functional OR genes encoded in their genomes. In contrast to mammals, avian olfaction is poorly understood, with birds widely regarded as relying primarily on visual and auditory inputs. Here, we show that in nine bird species from seven orders (blue tit, Cyanistes caeruleus; black coucal, Centropus grillii; brown kiwi, Apteryx australis; canary, Serinus canaria; galah, Eolophus roseicapillus; red jungle fowl, Gallus gallus; kakapo, Strigops habroptilus; mallard, Anas platyrhynchos; snow petrel, Pagodroma nivea), the majority of amplified OR sequences are predicted to be from potentially functional genes. This finding is somewhat surprising as one previous report suggested that the majority of OR genes in an avian (red jungle fowl) genomic sequence are non-functional pseudogenes. We also show that it is not the estimated proportion of potentially functional OR genes, but rather the estimated total number of OR genes that correlates positively with relative olfactory bulb size, an anatomical correlate of olfactory capability. We further demonstrate that all the nine bird genomes examined encode OR genes belonging to a large gene clade, termed γ-c, the expansion of which appears to be a shared characteristic of class Aves. In summary, our findings suggest that olfaction in birds may be a more important sense than generally believed.

Journal Article
TL;DR: A large-scale analysis provides a comprehensive evaluation on the Arabidopsis GRAS members, and the phenome-ready unimutant collection will be a useful resource to better understand individual GRAS proteins that play diverse roles in plant growth and development.
Abstract: GRAS proteins belong to a plant-specific transcription factor family. Currently, 33 GRAS members including a putative expressed pseudogene have been identified in the Arabidopsis genome. With a reverse genetic approach, we have constructed a "phenome-ready unimutant collection" of the GRAS genes in Arabidopsis thaliana. Of this collection, we focused on loss-of-function mutations in 23 novel GRAS members. Under standard conditions, homozygous mutants have no obvious morphological phenotypes compared with those of wild-type plants. Expression analysis of GRAS genes using quantitative real-time RT-PCR (qRT-PCR), microarray data mining, and promoter::GUS reporter fusions revealed their tissue-specific expression patterns. Our analysis of protein-protein interaction and subcellular localization of individual GRAS members indicated their roles as transcription regulators. In our yeast two-hybrid (Y2H) assay, we confirmed the protein-protein interaction between SHORT-ROOT (SHR) and SCARECROW (SCR). Furthermore, we identified a new SHR-interacting protein, SCARECROW-LIKE23 (SCL23), which is the most closely related to SCR. Our large-scale analysis provides a comprehensive evaluation on the Arabidopsis GRAS members, and also our phenome-ready unimutant collection will be a useful resource to better understand individual GRAS proteins that play diverse roles in plant growth and development.

Journal Article
TL;DR: The complete genome sequence of O. tsutsugamushi strain Ikeda is determined, which comprises a single chromosome of 2 008 987 bp and contains 1967 protein coding sequences (CDSs), and is a prominent example of the high plasticity of bacterial genomes.
Abstract: Scrub typhus (‘Tsutsugamushi’ disease in Japanese) is a mite-borne infectious disease. The causative agent is Orientia tsutsugamushi, an obligate intracellular bacterium belonging to the family Rickettsiaceae of the subdivision alpha-Proteobacteria. In this study, we determined the complete genome sequence of O. tsutsugamushi strain Ikeda, which comprises a single chromosome of 2 008 987 bp and contains 1967 protein coding sequences (CDSs). The chromosome is much larger than those of other members of Rickettsiaceae, and 46.7% of the sequence was occupied by repetitive sequences derived from an integrative and conjugative element, 10 types of transposable elements, and seven types of short repeats of unknown origins. The massive amplification and degradation of these elements have generated a huge number of repeated genes (1196 CDSs, categorized into 85 families), many of which are pseudogenes (766 CDSs), and also induced intensive genome shuffling. By comparing the gene content with those of other family members of Rickettsiaceae, we identified the core gene set of the family Rickettsiaceae and found that, while much more extensive gene loss has taken place among the housekeeping genes of Orientia than those of Rickettsia, O. tsutsugamushi has acquired a large number of foreign genes. The O. tsutsugamushi genome sequence is thus a prominent example of the high plasticity of bacterial genomes, and provides the genetic basis for a better understanding of the biology of O. tsutsugamushi and the pathogenesis of ‘Tsutsugamushi’ disease.

Journal Article
TL;DR: The study revealed that the extracellular matrix surrounding vertebrate eggs contains three to at least six ZP glycoproteins, and provides new directions to investigate the molecular basis of sperm-egg recognition, a mechanism which is not yet elucidated.
Abstract: Vertebrate eggs are surrounded by an extracellular matrix with similar functions and conserved individual components: the zona pellucida (ZP) glycoproteins. In mammals, chickens, frogs, and some fish species, we established an updated list of the ZP genes, studied the relationships within the ZP gene family using phylogenetic analysis, and identified ZP pseudogenes. Our study confirmed the classification of ZP genes in six subfamilies: ZPA/ZP2, ZPB/ZP4, ZPC/ZP3, ZP1, ZPAX, and ZPD. The identification of a Zpb pseudogene in the mouse genome, Zp1 pseudogenes in the dog and bovine genomes, and Zpax pseudogenes in the human, chimpanzee, macaque, and bovine genomes showed that the evolution of ZP genes mainly occurs by death of genes. Our study revealed that the extracellular matrix surrounding vertebrate eggs contains three to at least six ZP glycoproteins. Mammals can be classified in three categories. In the mouse, the ZP is composed of three ZP proteins (ZPA/ZP2, ZPC/ZP3, and ZP1). In dog, cattle and, putatively, pig, cat, and rabbit, the zona is composed of three ZP proteins (ZPA/ZP2, ZPB/ZP4, and ZPC/ZP3). In human, chimpanzee, macaque, and rat, the ZP is composed of four ZP proteins (ZPA/ZP2, ZPB/ZP4, ZPC/ZP3, and ZP1). Our review provides new directions to investigate the molecular basis of sperm-egg recognition, a mechanism which is not yet elucidated.

Journal Article
TL;DR: The DNA sequences of wheat Acc-1 and Acc-2 loci, encoding the plastid and cytosolic forms of the enzyme acetyl-CoA carboxylase, were analyzed with a view to understanding the evolution of these genes and the origin of the three genomes in modern hexaploid wheat.
Abstract: The DNA sequences of wheat Acc-1 and Acc-2 loci, encoding the plastid and cytosolic forms of the enzyme acetyl-CoA carboxylase, were analyzed with a view to understanding the evolution of these genes and the origin of the three genomes in modern hexaploid wheat. Acc-1 and Acc-2 loci from each of the wheats Triticum urartu (A genome), Aegilops tauschii (D genome), Triticum turgidum (AB genome), and Triticum aestivum (ABD genome), as well as two Acc-2-related pseudogenes from T. urartu were sequenced. The 2.3-2.4 Mya divergence time calculated here for the three homoeologous chromosomes, on the basis of coding and intron sequences of the Acc-1 genes, is at the low end of other estimates. Our clock was calibrated by using 60 Mya for the divergence between wheat and maize. On the same time scale, wheat and barley diverged 11.6 Mya, based on sequences of Acc and other genes. The regions flanking the Acc genes are not conserved among the A, B, and D genomes. They are conserved when comparing homoeologous genomes of diploid, tetraploid, and hexaploid wheats. Substitution rates in intergenic regions consisting primarily of repetitive sequences vary substantially along the loci and on average are 3.5-fold higher than the Acc intron substitution rates. The composition of the Acc homoeoloci suggests haplotype divergence exceeding in some cases 0.5 Mya. Such variation might result in a significant overestimate of the time since tetraploid wheat formation, which occurred no more than 0.5 Mya.

Journal ArticleDOI
25 Sep 2008-PLOS ONE
TL;DR: The findings illustrate that increasing genomic complexity of the Mup gene family is not evolutionarily isolated, but is instead a recurring mechanism of generating coding diversity consistent with a species-specific function in mammals.
Abstract: Species-specific chemosignals, pheromones, regulate social behaviors such as aggression, mating, pup-suckling, territory establishment, and dominance. The identity of these cues remains mostly undetermined and few mammalian pheromones have been identified. Genetically-encoded pheromones are expected to exhibit several different mechanisms for coding 1) diversity, to enable the signaling of multiple behaviors, 2) dynamic regulation, to indicate age and dominance, and 3) species-specificity. Recently, the major urinary proteins (Mups) have been shown to function themselves as genetically-encoded pheromones to regulate species-specific behavior. Mups are multiple highly related proteins expressed in combinatorial patterns that differ between individuals, gender, and age; which are sufficient to fulfill the first two criteria. We have now characterized and fully annotated the mouse Mup gene content in detail. This has enabled us to further analyze the extent of Mup coding diversity and determine their potential to encode species-specific cues. Our results show that the mouse Mup gene cluster is composed of two subgroups: an older, more divergent class of genes and pseudogenes, and a second class with high sequence identity formed by recent sequential duplications of a single gene/pseudogene pair. Previous work suggests that truncated Mup pseudogenes may encode a family of functional hexapeptides with the potential for pheromone activity. Sequence comparison, however, reveals that they have limited coding potential. Similar analyses of nine other completed genomes find Mup gene expansions in divergent lineages, including those of rat, horse and grey mouse lemur, occurring independently from a single ancestral Mup present in other placental mammals. Our findings illustrate that increasing genomic complexity of the Mup gene family is not evolutionarily isolated, but is instead a recurring mechanism of generating coding diversity consistent with a species-specific function in mammals.

Journal ArticleDOI
TL;DR: The olfactory receptor gene (OR) superfamily is the largest in the human genome and contains 390 putatively functional genes and 465 pseudogenes arranged into 18 gene families and 300 subfamilies; the gene:pseudogene ratio is lowest in human, higher in chimpanzee and highest in rat and mouse.
Abstract: The olfactory receptor gene (OR) superfamily is the largest in the human genome. The superfamily contains 390 putatively functional genes and 465 pseudogenes arranged into 18 gene families and 300 subfamilies. Even members within the same subfamily are often located on different chromosomes. OR genes are located on all autosomes except chromosome 20, plus the X chromosome but not the Y chromosome. The gene:pseudogene ratio is lowest in human, higher in chimpanzee and highest in rat and mouse -- most likely reflecting the greater need of olfaction for survival in the rodent than in the human. The OR genes undergo allelic exclusion, each sensory neurone expressing usually only one odourant receptor allele; the mechanism by which this phenomenon is regulated is not yet understood. The nomenclature system (based on evolutionary divergence of genes into families and subfamilies of the OR gene superfamily) has been designed similarly to that originally used for the CYP gene superfamily.

Journal ArticleDOI
TL;DR: A multiplex ligation-dependent probe amplification assay was developed employing base-pair differences between PKD1 and the six pseudogenes to generate PKD1-specific probes, which improves detection levels and the reliability of molecular testing of patients with ADPKD.

Journal ArticleDOI
TL;DR: The results suggest that recombination activity is not a direct cause of convergent gene rearrangement; rather, homoplasious gene rearrangement seems to be mediated by persistence of a copied genomic condition through several lineage splits and subsequent parallel deletions.
Abstract: In Malagasy frogs of the family Mantellidae, the genus Mantella is known to possess highly reorganized mitochondrial (mt) genomes with the following characteristics: 1) some rearranged gene positions, 2) 2 distinct genes and a pseudogene corresponding to the transfer RNA gene for methionine (trnM), and 3) 2 control regions (CRs) with almost identical nucleotide sequences. These unique genomic features were observed concentrated between the duplicated CRs surrounding cytochrome b (cob) and nicotinamide adenine dinucleotide dehydrogenase subunit 2 (cnad2) genes. To elucidate the mechanisms and evolutionary pathway that yielded the derived genome condition, we surveyed the reorganized genomic portion for all 12 mantellid genera. Our results show that the mt genomes of 7 genera retain the ancestral condition. In contrast, adding to Mantella, 4 genera of the subfamily Mantellinae, Blommersia, Guibemantis, Wakea, and Spinomantis, share several derived genomic characters. Furthermore, mt genomes of these mantellines showed additional structural divergences, resulting in different genome conditions between them. The high frequency of genomic reorganization does not correlate with nucleotide substitution rate. The encountered mt genomic conditions also suggest the occurrences of stepwise gene duplication and deletion events during the evolution of mantellines. Simultaneously, the majority of duplication events seems to be mediated by general (homologous) or illegitimate recombination, and general recombination also plays a role in concerted sequence evolution between multiple CRs. Considering our observations and recent conditional evidences, the following outlines can be expected for recombination processes in mt genome reorganization. 1) The CR is the "hot spot" of recombination; 2) highly frequent recombination between CRs may be mediated by a replication fork barrier lying in the CR; 3) general recombination has a potential to cause gene rearrangement in upstream regions of multiple CRs as the results of gene conversion and unequal crossing over processes. Our results also suggest that recombination activity is not a direct cause of convergent gene rearrangement; rather, homoplasious gene rearrangement seems to be mediated by persistence of a copied genomic condition through several lineage splits and subsequent parallel deletions.

Journal ArticleDOI
TL;DR: Chemoreceptor gene families in Caenorhabditis species are large and evolutionarily dynamic as a result of gene duplication and gene loss, and the gray pawn hypothesis is proposed: individual genes are of little significance, but the aggregate of a large number of diverse genes is required to cover a large phenotype space.
Abstract: Chemoreceptor proteins mediate the first step in the transduction of environmental chemical stimuli, defining the breadth of detection and conferring stimulus specificity. Animal genomes contain families of genes encoding chemoreceptors that mediate taste, olfaction, and pheromone responses. The size and diversity of these families reflect the biology of chemoperception in specific species. Based on manual curation and sequence comparisons among putative G-protein-coupled chemoreceptor genes in the nematode Caenorhabditis elegans, we identified approximately 1300 genes and 400 pseudogenes in the 19 largest gene families, most of which fall into larger superfamilies. In the related species C. briggsae and C. remanei, we identified most or all genes in each of the 19 families. For most families, C. elegans has the largest number of genes and C. briggsae the smallest number, suggesting changes in the importance of chemoperception among the species. Protein trees reveal family-specific and species-specific patterns of gene duplication and gene loss. The frequency of strict orthologs varies among the families, from just over 50% in two families to less than 5% in three families. Several families include large species-specific expansions, mostly in C. elegans and C. remanei. Chemoreceptor gene families in Caenorhabditis species are large and evolutionarily dynamic as a result of gene duplication and gene loss. These dynamics shape the chemoreceptor gene complements in Caenorhabditis species and define the receptor space available for chemosensory responses. To explain these patterns, we propose the gray pawn hypothesis: individual genes are of little significance, but the aggregate of a large number of diverse genes is required to cover a large phenotype space.

Journal ArticleDOI
TL;DR: CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes, which means they occur more often at the periphery of the protein interaction network.

Journal ArticleDOI
TL;DR: A pattern of genome evolution congruent with functional gene losses in parasitic angiosperms is observed but it is suggested that A. mirabilis' plastid genome represents a genome in the early stages of decay following the relaxation of selection pressures.
Abstract: Aneura mirabilis is a parasitic liverwort that exploits an existing mycorrhizal association between a basidiomycete and a host tree. This unusual liverwort is the only known parasitic seedless land plant with a completely nonphotosynthetic life history. The complete plastid genome of A. mirabilis was sequenced to examine the effect of its nonphotosynthetic life history on plastid genome content. Using a partial genomic fosmid library approach, the genome was sequenced and shown to be 108,007 bp with a structure typical of green plant plastids. Comparisons were made with the plastid genome of Marchantia polymorpha, the only other liverwort plastid sequence available. All ndh genes are either absent or pseudogenes. Five of 15 psb genes are pseudogenes, as are 2 of 6 psa genes and 2 of 6 pet genes. Pseudogenes of cysA, cysT, ccsA, and ycf3 were also detected. The remaining complement of genes present in M. polymorpha is present in the plastid of A. mirabilis with intact open reading frames. All pseudogenes and gene losses co-occur with losses detected in the plastid of the parasitic angiosperm Epifagus virginiana, though the latter has functional gene losses not found in A. mirabilis. The plastid genome sequence of A. mirabilis represents only the second liverwort, and first mycoheterotroph, to have its plastid genome sequenced. We observed a pattern of genome evolution congruent with functional gene losses in parasitic angiosperms but suggest that its plastid genome represents a genome in the early stages of decay following the relaxation of selection pressures.

Journal ArticleDOI
TL;DR: It is observed that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases, and results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee, a finding potentially related to the known diminution of the human OR repertoire.
Abstract: Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (~55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ~50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

Journal ArticleDOI
05 Jun 2008-Nature
TL;DR: Pseudogenes constitute many of the non-coding DNA sequences that make up large parts of genomes, and it now emerges that some of them have active regulatory roles.
Abstract: Pseudogenes constitute many of the non-coding DNA sequences that make up large parts of genomes. Once considered merely protein fossils, it now emerges that some of them have active regulatory roles.

Journal ArticleDOI
TL;DR: The observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns, and it is proposed that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.
Abstract: Background: The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution. Results: We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified - including a gene/pseudogene pair - suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes. Conclusion: Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.

Journal ArticleDOI
TL;DR: This is the first activating human Siglec receptor found to have functional and non‐functional alleles within the population.
Abstract: Sialic acid binding immunoglobulin-like lectins (Siglec) are important components of immune recognition. The organization of Siglec genes in different species is consistent with rapid selection imposed by pathogens. We studied SIGLEC11 genes in human, rodent, dog, cow and non-human primates. The lineages of SIGLEC11 genes in these species have undergone dynamic gene duplication and conversion, forming a potential inhibitory (SIGLEC11)/activating (SIGLEC16) receptor pair in chimpanzee and humans. A cDNA encoding human Siglec-16, currently classed as a pseudogene in the databases (SIGLECP16), is expressed in various cell lines and tissues. A polymorphism screen for the two alleles (wild type and four-base pair deletion, 4bpDelta) of SIGLEC16 found their frequencies to be 50% amongst the UK population. A search for donor sequences for SIGLEC16 revealed a subfamily of activating Siglec with charged transmembrane domains predicted to associate with ITAM-encoding adaptor proteins. In support of this, Siglec-16 was expressed at the cell surface in the presence of DAP12, but not the FcRgamma chain. Using antisera specific to the cytoplasmic tail of Siglec-16, we identified Siglec-16 expression in CD14(+) tissue macrophages and in normal human brain, cancerous oesophagus and lung. This is the first activating human Siglec receptor found to have functional and non-functional alleles within the population.