
Showing papers on "Pseudogene published in 2010"


Journal ArticleDOI
24 Jun 2010-Nature
TL;DR: It is found that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role, and this analysis extended to other cancer-related genes that possess pseudogenes, and revealed a non-coding function for mRNAs.
Abstract: The canonical role of messenger RNA (mRNA) is to deliver protein-coding information to sites of protein synthesis. However, given that microRNAs bind to RNAs, we hypothesized that RNAs could possess a regulatory role that relies on their ability to compete for microRNA binding, independently of their protein-coding function. As a model for the protein-coding-independent role of RNAs, we describe the functional relationship between the mRNAs produced by the PTEN tumour suppressor gene and its pseudogene PTENP1 and the critical consequences of this interaction. We find that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role. We also show that the PTENP1 locus is selectively lost in human cancer. We extended our analysis to other cancer-related genes that possess pseudogenes, such as oncogenic KRAS. We also demonstrate that the transcripts of protein-coding genes such as PTEN are biologically active. These findings attribute a novel biological role to expressed pseudogenes, as they can regulate coding gene expression, and reveal a non-coding function for mRNAs.

2,107 citations


Journal ArticleDOI
TL;DR: The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency.
Abstract: The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary time. In the laboratory and in nature, numts enter the nuclear DNA via non-homologous end joining (NHEJ) at double-strand breaks (DSBs). The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency. Polymorphic numts in humans link maternally inherited mitochondrial genotypes to nuclear DNA haplotypes during the past, offering new opportunities to associate nuclear markers with mitochondrial markers back in time.

550 citations


Journal ArticleDOI
TL;DR: Recently, a non-coding RNA expressed from a human pseudogene was reported to regulate the corresponding protein-coding mRNA by acting as a decoy for microRNAs (miRNAs) that bind to common sites in the 3′ untranslated regions (UTRs).

423 citations


01 Oct 2010
TL;DR: Questions are raised about the potential ability of thousands of non-coding transcripts to interact with miRNAs and influence the expression of miRNA target genes and some criteria for screening candidate sponge RNAs are considered.
Abstract: Recently, a non-coding RNA expressed from a human pseudogene was reported to regulate the corresponding protein-coding mRNA by acting as a decoy for microRNAs (miRNAs) that bind to common sites in the 3′ untranslated regions (UTRs). It was proposed that competing for miRNAs might be a general activity of pseudogenes. This study raises questions about the potential ability of thousands of non-coding transcripts to interact with miRNAs and influence the expression of miRNA target genes. Three years ago, artificial miRNA decoys termed ‘miRNA sponges’ were introduced as a means to create loss-of-function phenotypes for miRNA families in cell culture and in virally infected tissue and transgenic animals. Given the efficacy of miRNA sponges expressed from stable chromosomal insertions, it seemed plausible that natural non-coding RNAs might have evolved to sequence-specifically sequester miRNAs. The first such endogenous sponge RNA was discovered in plants and found to attenuate a miRNA-mediated response to an environmental stress. More recently, a viral non-coding RNA was observed to sequester and promote the degradation of a cellular miRNA in infected primate cells. In this review we discuss the potential and proven roles for endogenous miRNA sponges and consider some criteria for screening candidate sponge RNAs.

395 citations


Journal ArticleDOI
TL;DR: This study used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum, and greatly improves existing annotation of the P. falciparum genome.
Abstract: Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5′ and 3′ untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.

370 citations


Journal ArticleDOI
TL;DR: The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine; the study provides detailed experimental functional annotation of 39 members of this gene family and comprehensive information about gene structure and phylogeny.
Abstract: Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. 
We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and phylogeny for the entire currently known VvTPS gene family.

368 citations


01 Jan 2010
TL;DR: It is found that the PTENP1 locus is selectively lost in human cancer, and it is demonstrated that the transcripts of protein-coding genes such as PTEN are biologically active, which attribute a novel biological role to expressed pseudogenes, as they can regulate coding gene expression, and reveal a non-coding function for mRNAs.
Abstract: Laura Poliseno, Leonardo Salmena, Jiangwen Zhang, Brett Carver, William J. Haveman & Pier Paolo Pandolfi. The canonical role of messenger RNA (mRNA) is to deliver protein-coding information to sites of protein synthesis. However, given that microRNAs bind to RNAs, we hypothesized that RNAs could possess a regulatory role that relies on their ability to compete for microRNA binding, independently of their protein-coding function. As a model for the protein-coding-independent role of RNAs, we describe the functional relationship between the mRNAs produced by the PTEN tumour suppressor gene and its pseudogene PTENP1 and the critical consequences of this interaction. We find that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role. We also show that the PTENP1 locus is selectively lost in human cancer. We extended our analysis to other cancer-related genes that possess pseudogenes, such as oncogenic KRAS. We also demonstrate that the transcripts of protein-coding genes such as PTEN are biologically active. These findings attribute a novel biological role to expressed pseudogenes, as they can regulate coding gene expression, and reveal a non-coding function for mRNAs.

365 citations


Journal ArticleDOI
TL;DR: The expression patterns of genes implicated in nodulation, and also transcription factors, are investigated using both the Solexa sequence data and large-scale qRT-PCR, facilitating both basic and applied aspects of soybean research.
Abstract: Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.

345 citations


Journal ArticleDOI
TL;DR: This survey of intragenomic diversity of 16S rRNA genes provides strong evidence supporting the theory of ribosomal constraint.
Abstract: Analysis of intragenomic variation of 16S rRNA genes is a unique approach to examining the concept of ribosomal constraints on rRNA genes; the degree of variation is an important parameter to consider for estimation of the diversity of a complex microbiome in the recently initiated Human Microbiome Project (http://nihroadmap.nih.gov/hmp). The current GenBank database has a collection of 883 prokaryotic genomes representing 568 unique species, of which 425 species contained 2 to 15 copies of 16S rRNA genes per genome (2.22 ± 0.81). Sequence diversity among the 16S rRNA genes in a genome was found in 235 species (from 0.06% to 20.38%; 0.55% ± 1.46%). Compared with the 16S rRNA-based threshold for operational definition of species (1 to 1.3% diversity), the diversity was borderline (between 1% and 1.3%) in 10 species and >1.3% in 14 species. The diversified 16S rRNA genes in Haloarcula marismortui (diversity, 5.63%) and Thermoanaerobacter tengcongensis (6.70%) were highly conserved at the secondary structure level, while the diversified gene in B. afzelii (20.38%) appears to be a pseudogene. The diversified genes in the remaining 21 species were also conserved, except for a truncated 16S rRNA gene in “Candidatus Protochlamydia amoebophila.” Thus, this survey of intragenomic diversity of 16S rRNA genes provides strong evidence supporting the theory of ribosomal constraint. Taxonomic classification using the 16S rRNA-based operational threshold could misclassify a number of species into more than one species, leading to an overestimation of the diversity of a complex microbiome. This phenomenon is especially seen in 7 bacterial species associated with the human microbiome or diseases.
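The comparison against the 1 to 1.3% species threshold described in this abstract can be illustrated with a minimal Python sketch. It assumes that diversity is the percentage of mismatched positions between two pre-aligned, equal-length 16S copies and that a genome is summarized by its largest pairwise value; the abstract does not state how the authors computed diversity, so the function names and both choices are hypothetical.

```python
from itertools import combinations


def percent_diversity(seq_a, seq_b):
    """Percent of mismatched positions between two pre-aligned sequences.

    Assumption: a plain mismatch fraction; not necessarily the paper's metric.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


def max_intragenomic_diversity(copies):
    """Largest pairwise diversity among the 16S copies of one genome.

    Returns 0.0 when the genome carries fewer than two copies.
    """
    return max(
        (percent_diversity(a, b) for a, b in combinations(copies, 2)),
        default=0.0,
    )


def classify(diversity_pct, lower=1.0, upper=1.3):
    """Place a diversity value relative to the 1 to 1.3% operational threshold."""
    if diversity_pct > upper:
        return "above threshold"
    if diversity_pct >= lower:
        return "borderline"
    return "below threshold"
```

Under these assumptions, the 20.38% value reported for B. afzelii would be classified as above the threshold, and the mean of 0.55% as below it.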

265 citations


Journal ArticleDOI
TL;DR: Experimental evidence supports the hypothesis that DBA is primarily the result of defective ribosome synthesis, and bioinformatic tools show that gene conversion mechanism is not common in RP genes mutagenesis, notwithstanding the abundance of RP pseudogenes.
Abstract: Diamond-Blackfan Anemia (DBA) is characterized by a defect of erythroid progenitors and, clinically, by anemia and malformations. DBA exhibits an autosomal dominant pattern of inheritance with incomplete penetrance. Currently nine genes, all encoding ribosomal proteins (RP), have been found mutated in approximately 50% of patients. Experimental evidence supports the hypothesis that DBA is primarily the result of defective ribosome synthesis. By means of a large collaboration among six centers, we report here a mutation update that includes nine genes and 220 distinct mutations, 56 of which are new. The DBA Mutation Database now includes data from 355 patients. Of those where inheritance has been examined, 125 patients carry a de novo mutation and 72 an inherited mutation. Mutagenesis may be ascribed to slippage in 65.5% of indels, whereas CpG dinucleotides are involved in 23% of transitions. Using bioinformatic tools we show that gene conversion mechanism is not common in RP genes mutagenesis, notwithstanding the abundance of RP pseudogenes. Genotype-phenotype analysis reveals that malformations are more frequently associated with mutations in RPL5 and RPL11 than in the other genes. All currently reported DBA mutations together with their functional and clinical data are included in the DBA Mutation Database.

223 citations


Journal ArticleDOI
TL;DR: Functional OR gene repertoires were reduced independently in the multiple origins of aquatic mammals and were significantly divergent in bats, rejecting recent neutralist views of olfactory subgenome evolution and correlate specific OR gene families with physiological requirements.
Abstract: The ability to smell is governed by the largest gene family in mammalian genomes, the olfactory receptor (OR) genes. Although these genes are well annotated in the finished human and mouse genomes, we still do not understand which receptors bind specific odorants or how they fully function. Previous comparative studies have been taxonomically limited and mostly focused on the percentage of OR pseudogenes within species. No study has investigated the adaptive changes of functional OR gene families across phylogenetically and ecologically diverse mammals. To determine the extent to which OR gene repertoires have been influenced by habitat, sensory specialization, and other ecological traits, to better understand the functional importance of specific OR gene families and thus the odorants they bind, we compared the functional OR gene repertoires from 50 mammalian genomes. We amplified more than 2000 OR genes in aquatic, semi-aquatic, and flying mammals and coupled these data with 48,000 OR genes from mostly terrestrial mammals, extracted from genomic projects. Phylogenomic, Bayesian assignment, and principal component analyses partitioned species by ecotype (aquatic, semi-aquatic, terrestrial, flying) rather than phylogenetic relatedness, and identified OR families important for each habitat. Functional OR gene repertoires were reduced independently in the multiple origins of aquatic mammals and were significantly divergent in bats. We reject recent neutralist views of olfactory subgenome evolution and correlate specific OR gene families with physiological requirements, a preliminary step toward unraveling the relationship between specific odors and respective OR gene families.

Journal ArticleDOI
TL;DR: The results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the host nuclear genome, but suggest that aphids utilize a set of duplicated genes acquired from other bacteria in the context of the Buchnera–aphid mutualism.
Abstract: Genome reduction is typical of obligate symbionts. In cellular organelles, this reduction partly reflects transfer of ancestral bacterial genes to the host genome, but little is known about gene transfer in other obligate symbioses. Aphids harbor anciently acquired obligate mutualists, Buchnera aphidicola (Gammaproteobacteria), which have highly reduced genomes (420–650 kb), raising the possibility of gene transfer from ancestral Buchnera to the aphid genome. In addition, aphids often harbor other bacteria that also are potential sources of transferred genes. Previous limited sampling of genes expressed in bacteriocytes, the specialized cells that harbor Buchnera, revealed that aphids acquired at least two genes from bacteria. The newly sequenced genome of the pea aphid, Acyrthosiphon pisum, presents the first opportunity for a complete inventory of genes transferred from bacteria to the host genome in the context of an ancient obligate symbiosis. Computational screening of the entire A. pisum genome, followed by phylogenetic and experimental analyses, provided strong support for the transfer of 12 genes or gene fragments from bacteria to the aphid genome: three LD–carboxypeptidases (LdcA1, LdcA2,ψLdcA), five rare lipoprotein As (RlpA1-5), N-acetylmuramoyl-L-alanine amidase (AmiD), 1,4-beta-N-acetylmuramidase (bLys), DNA polymerase III alpha chain (ψDnaE), and ATP synthase delta chain (ψAtpH). Buchnera was the apparent source of two highly truncated pseudogenes (ψDnaE and ψAtpH). Most other transferred genes were closely related to genes from relatives of Wolbachia (Alphaproteobacteria). At least eight of the transferred genes (LdcA1, AmiD, RlpA1-5, bLys) appear to be functional, and expression of seven (LdcA1, AmiD, RlpA1-5) are highly upregulated in bacteriocytes. The LdcAs and RlpAs appear to have been duplicated after transfer. 
Our results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the host nuclear genome, but suggest that aphids utilize a set of duplicated genes acquired from other bacteria in the context of the Buchnera–aphid mutualism.

Journal ArticleDOI
TL;DR: It is shown that primate V1R decline happened prior to acquisition of trichromatic vision, earlier during evolution than was previously thought, and that it is extremely unlikely that decline of the dog V1R repertoire occurred in response to selective pressures imposed by humans during domestication.
Abstract: We report an evolutionary analysis of the V1R gene family across 37 mammalian genomes. V1Rs comprise one of three chemosensory receptor families expressed in the vomeronasal organ, and contribute to pheromone detection. We first demonstrate that Trace Archive data can be used effectively to determine V1R family sizes and to obtain sequences of most V1R family members. Analyses of V1R sequences from trace data and genome assemblies show that species-specific expansions previously observed in only eight species were prevalent throughout mammalian evolution, resulting in "semi-private" V1R repertoires for most mammals. The largest families are found in mouse and platypus, whose V1R repertoires have been published previously, followed by mouse lemur and rabbit (approximately 215 and approximately 160 intact V1Rs, respectively). In contrast, two bat species and dolphin possess no functional V1Rs, only pseudogenes, and suffered inactivating mutations in the vomeronasal signal transduction gene Trpc2. We show that primate V1R decline happened prior to acquisition of trichromatic vision, earlier during evolution than was previously thought. We also show that it is extremely unlikely that decline of the dog V1R repertoire occurred in response to selective pressures imposed by humans during domestication. Functional repertoire sizes in each species correlate roughly with anatomical observations of vomeronasal organ size and quality; however, no single ecological correlate explains the very diverse fates of this gene family in different mammalian genomes. V1Rs provide one of the most extreme examples observed to date of massive gene duplication in some genomes, with loss of all functional genes in other species.

Journal ArticleDOI
TL;DR: This article summarizes information about the structure and utility of the phylogenetically informative spacer regions of the rDNA, namely the internal and external transcribed spacer regions as well as the intergenic spacer (IGS).
Abstract: The nuclear ribosomal locus coding for the large subunit is represented in tandem arrays in the plant genome. These consecutive gene blocks, consisting of several regions, are widely applied in plant phylogenetics. The regions coding for the subunits of the rRNA have the lowest rate of evolution. The spacer regions, such as the internal transcribed spacers (ITS) and external transcribed spacers (ETS), are also widely utilized in phylogenetics. The fact that these regions are present in many copies in the plant genome is an advantage for laboratory practice but might be a problem for phylogenetic analysis. Besides routine usage, the rDNA regions provide great potential to study complex evolutionary mechanisms, such as reticulate events or array duplications. The understanding of these processes is based on the observation that the multiple copies of rDNA regions are homogenized through concerted evolution. This phenomenon results in paralogous copies, which can be misleading when incorporated in phylogenetic analyses. The fact that non-functional copies or pseudogenes can coexist with orthologues in a single individual also makes the analysis difficult. This article summarizes the information about the structure and utility of the phylogenetically informative spacer regions of the rDNA, namely internal and external transcribed spacer regions as well as the intergenic spacer (IGS).

Journal ArticleDOI
TL;DR: It is found that after their initial formation, the youngest pseudogenes in Salmonella genomes have a very high likelihood of being removed by deletional processes and are eliminated too rapidly to be governed by a strictly neutral model of stochastic loss.
Abstract: Pseudogenes are usually considered to be completely neutral sequences whose evolution is shaped by random mutations and chance events. It is possible, however, for disrupted genes to generate products that are deleterious due either to the energetic costs of their transcription and translation or to the formation of toxic proteins. We found that after their initial formation, the youngest pseudogenes in Salmonella genomes have a very high likelihood of being removed by deletional processes and are eliminated too rapidly to be governed by a strictly neutral model of stochastic loss. Those few highly degraded pseudogenes that have persisted in Salmonella genomes correspond to genes with low expression levels and low connectivity in gene networks, such that their inactivation and any initial deleterious effects associated with their inactivation are buffered. Although pseudogenes have long been considered the paradigm of neutral evolution, the distribution of pseudogenes among Salmonella strains indicates that removal of many of these apparently functionless regions is attributable to positive selection.

Journal ArticleDOI
TL;DR: A new nomenclature system is described that identifies homolog gene families and allocates a unique name for each gene in human, mouse, and rat carboxylesterase genes and serves as a model for naming CES genes from other mammalian species.
Abstract: Mammalian carboxylesterase (CES or Ces) genes encode enzymes that participate in xenobiotic, drug, and lipid metabolism in the body and are members of at least five gene families. Tandem duplications have added more genes for some families, particularly for mouse and rat genomes, which has caused confusion in naming rodent Ces genes. This article describes a new nomenclature system for human, mouse, and rat carboxylesterase genes that identifies homolog gene families and allocates a unique name for each gene. The guidelines of human, mouse, and rat gene nomenclature committees were followed and “CES” (human) and “Ces” (mouse and rat) root symbols were used followed by the family number (e.g., human CES1). Where multiple genes were identified for a family or where a clash occurred with an existing gene name, a letter was added (e.g., human CES4A; mouse and rat Ces1a) that reflected gene relatedness among rodent species (e.g., mouse and rat Ces1a). Pseudogenes were named by adding “P” and a number to the human gene name (e.g., human CES1P1) or by using a new letter followed by ps for mouse and rat Ces pseudogenes (e.g., Ces2d-ps). Gene transcript isoforms were named by adding the GenBank accession ID to the gene symbol (e.g., human CES1_AB119995 or mouse Ces1e_BC019208). This nomenclature improves our understanding of human, mouse, and rat CES/Ces gene families and facilitates research into the structure, function, and evolution of these gene families. It also serves as a model for naming CES genes from other mammalian species.
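The naming rules listed in this abstract can be sketched as a few small Python helpers. This is only an illustration built from the examples quoted above (CES1, CES4A, Ces1a, CES1P1, Ces2d-ps, CES1_AB119995); the function names and argument layout are hypothetical and are not part of the published nomenclature.

```python
RODENTS = {"mouse", "rat"}


def ces_symbol(species, family, letter=""):
    """Build a gene symbol: CES root for human, Ces root for mouse and rat.

    An optional letter distinguishes multiple genes within a family
    (upper case for human, lower case for rodents).
    """
    if species == "human":
        return f"CES{family}{letter.upper()}"
    if species in RODENTS:
        return f"Ces{family}{letter.lower()}"
    raise ValueError(f"unsupported species: {species}")


def human_pseudogene(family, number):
    """Human pseudogenes add 'P' and a number to the gene name, e.g. CES1P1."""
    return f"CES{family}P{number}"


def rodent_pseudogene(family, letter):
    """Mouse and rat pseudogenes use a new letter followed by '-ps', e.g. Ces2d-ps."""
    return f"Ces{family}{letter.lower()}-ps"


def isoform(gene_symbol, accession):
    """Transcript isoforms append the GenBank accession ID to the gene symbol."""
    return f"{gene_symbol}_{accession}"
```

For instance, `isoform(ces_symbol("mouse", 1, "e"), "BC019208")` reproduces the mouse example `Ces1e_BC019208` quoted in the abstract.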

Journal ArticleDOI
TL;DR: A pipeline to detect human unitary pseudogenes through analyzing the global inventory of orthologs between the human genome and its mammalian relatives is developed and it is shown that for a group of 76, the functional genes appear to be disabled at a fairly uniform rate throughout primate evolution.
Abstract: Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution. We have developed a pipeline to detect human unitary pseudogenes through analyzing the global inventory of orthologs between the human genome and its mammalian relatives. We focus on gene losses along the human lineage after the divergence from rodents about 75 million years ago. In total, we identify 76 unitary pseudogenes, including previously annotated ones, and many novel ones. By comparing each of these to its functioning ortholog in other mammals, we can approximately date the creation of each unitary pseudogene (that is, the gene 'death date') and show that for our group of 76, the functional genes appear to be disabled at a fairly uniform rate throughout primate evolution - not all at once, correlated, for instance, with the 'Alu burst'. Furthermore, we identify 11 unitary pseudogenes that are polymorphic - that is, they have both nonfunctional and functional alleles currently segregating in the human population. Comparing them with their orthologs in other primates, we find that two of them are in fact pseudogenes in non-human primates, suggesting that they represent cases of a gene being resurrected in the human lineage. This analysis of unitary pseudogenes provides insights into the evolutionary constraints faced by different organisms and the timescales of functional gene loss in humans.

Journal ArticleDOI
TL;DR: The zebrafish genome encodes the largest repertoire of functional vertebrate aquaporins with dual paralogy to human isoforms, and a new classification for the piscine aquaporin superfamily is proposed.
Abstract: Aquaporins are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. These proteins are vital for maintaining water homeostasis in living organisms. In mammals, thirteen aquaporins (AQP0-12) have been characterized, but in lower vertebrates, such as fish, the diversity, structure and substrate specificity of these membrane channel proteins are largely unknown. The screening and isolation of transcripts from the zebrafish (Danio rerio) genome revealed eighteen sequences structurally related to the four subfamilies of tetrapod aquaporins, i.e., aquaporins (AQP0, -1 and -4), water and glycerol transporters or aquaglyceroporins (Glps; AQP3 and AQP7-10), a water and urea transporter (AQP8), and two unorthodox aquaporins (AQP11 and -12). Phylogenetic analyses of nucleotide and deduced amino acid sequences demonstrated dual paralogy between teleost and human aquaporins. Three of the duplicated zebrafish isoforms have unlinked loci, two have linked loci, while DrAqp8 was found in triplicate across two chromosomes. Genomic sequencing, structural analysis, and maximum likelihood reconstruction, further revealed the presence of a putative pseudogene that displays hybrid exons similar to tetrapod AQP5 and -1. Ectopic expression of the cloned transcripts in Xenopus laevis oocytes demonstrated that zebrafish aquaporins and Glps transport water or water, glycerol and urea, respectively, whereas DrAqp11b and -12 were not functional in oocytes. Contrary to humans and some rodents, intrachromosomal duplicates of zebrafish AQP8 were water and urea permeable, while the genomic duplicate only transported water. All aquaporin transcripts were expressed in adult tissues and found to have divergent expression patterns. In some tissues, however, redundant expression of transcripts encoding two duplicated paralogs seems to occur. 
The zebrafish genome encodes the largest repertoire of functional vertebrate aquaporins with dual paralogy to human isoforms. Our data reveal an early and specific diversification of these integral membrane proteins at the root of the crown-clade of Teleostei. Despite the increase in gene copy number, zebrafish aquaporins mostly retain the substrate specificity characteristic of the tetrapod counterparts. Based upon the integration of phylogenetic, genomic and functional data we propose a new classification for the piscine aquaporin superfamily.

Journal ArticleDOI
04 Mar 2010-Nature
TL;DR: Experimental evidence suggests that inactivation of the GAL3 and GAL80 regulatory genes facilitated the origin and long-term maintenance of the two gene network states, and introduces a remarkable type of intraspecific variation that may be widespread.
Abstract: Local adaptations within species are often governed by several interacting genes scattered throughout the genome. Single-locus models of selection cannot explain the maintenance of such complex variation because recombination separates co-adapted alleles. Here we report a previously unrecognized type of intraspecific multi-locus genetic variation that has been maintained over a vast period. The galactose (GAL) utilization gene network of Saccharomyces kudriavzevii, a relative of brewer's yeast, exists in two distinct states: a functional gene network in Portuguese strains and, in Japanese strains, a non-functional gene network of allelic pseudogenes. Genome sequencing of all available S. kudriavzevii strains revealed that none of the functional GAL genes were acquired from other species. Rather, these polymorphisms have been maintained for nearly the entire history of the species, despite more recent gene flow genome-wide. Experimental evidence suggests that inactivation of the GAL3 and GAL80 regulatory genes facilitated the origin and long-term maintenance of the two gene network states. This striking example of a balanced unlinked gene network polymorphism introduces a remarkable type of intraspecific variation that may be widespread.

Journal ArticleDOI
TL;DR: The analyses of the zebra finch MHC suggest a complex history involving chromosomal fission, gene duplication and translocation in the history of the MHC in birds, and highlight striking differences in MHC structure and organization among avian lineages.
Abstract: Background: Due to its high polymorphism and importance for disease resistance, the major histocompatibility complex (MHC) has been an important focus of many vertebrate genome projects. Avian MHC organization is of particular interest because the chicken Gallus gallus, the avian species with the best characterized MHC, possesses a highly streamlined minimal essential MHC, which is linked to resistance against specific pathogens. The extent to which this organization describes the situation in other birds, and whether it represents a derived or ancestral condition, remains unclear. The sequencing of the zebra finch Taeniopygia guttata genome, in combination with targeted bacterial artificial chromosome (BAC) sequencing, has allowed us to characterize an MHC from a highly divergent and diverse avian lineage, the passerines. Results: The zebra finch MHC exhibits a complex structure and history involving gene duplication and fragmentation. The zebra finch MHC includes multiple Class I and Class II genes, some of which appear to be pseudogenes, and spans a much more extensive genomic region than the chicken MHC, as evidenced by the presence of MHC genes on each of seven BACs spanning 739 kb. Cytogenetic (FISH) evidence and the genome assembly itself place core MHC genes on as many as four chromosomes with TAP and Class I genes mapping to different chromosomes. MHC Class II regions are further characterized by high endogenous retroviral content. Lastly, we find strong evidence of selection acting on sites within passerine MHC Class I and Class II genes. Conclusion: The zebra finch MHC differs markedly from that of the chicken, the only other bird species with a complete genome sequence. The apparent lack of synteny between TAP and the expressed MHC Class I locus is in fact reminiscent of a pattern seen in some mammalian lineages and may represent convergent evolution.
Our analyses of the zebra finch MHC suggest a complex history involving chromosomal fission, gene duplication and translocation in the history of the MHC in birds, and highlight striking differences in MHC structure and organization among avian lineages.

Journal ArticleDOI
TL;DR: Next generation sequencing offers a universal, affordable method for the characterization and, in perspective, genotyping of MHC systems of virtually any complexity and is an effective tool for advancing the understanding of the MHC class II structure and evolutionary patterns in Passeriformes.
Abstract: Because of their functional significance, the Major Histocompatibility Complex (MHC) class I and II genes have been the subject of continuous interest in the fields of ecology, evolution and conservation. In some vertebrate groups MHC consists of multiple loci with similar alleles; therefore, the multiple loci must be genotyped simultaneously. In such complex systems, understanding of the evolutionary patterns and their causes has been limited due to challenges posed by genotyping. Here we used 454 amplicon sequencing to characterize MHC class IIB exon 2 variation in the collared flycatcher, an important organism in evolutionary and immuno-ecological studies. On the basis of over 152,000 sequencing reads we identified 194 putative alleles in 237 individuals. We found an extreme complexity of the MHC class IIB in the collared flycatchers, with our estimates pointing to the presence of at least nine expressed loci and a large, though difficult to estimate precisely, number of pseudogene loci. Many similar alleles occurred in the pseudogenes indicating either a series of recent duplications or extensive concerted evolution. The expressed alleles showed unambiguous signals of historical selection and the occurrence of apparent interlocus exchange of alleles. Placing the collared flycatcher's MHC sequences in the context of passerine diversity revealed transspecific MHC class II evolution within the Muscicapidae family. 454 amplicon sequencing is an effective tool for advancing our understanding of the MHC class II structure and evolutionary patterns in Passeriformes. We found a highly dynamic pattern of evolution of MHC class IIB genes with strong signals of selection and pronounced sequence divergence in expressed genes, in contrast to the apparent sequence homogenization in pseudogenes. 
We show that next generation sequencing offers a universal, affordable method for the characterization and, in perspective, genotyping of MHC systems of virtually any complexity.

Journal ArticleDOI
TL;DR: This study reports the best evidence to date that multiple mitochondrial genes can be transferred via a single HGT event and that transfer occurred via a strictly DNA-level intermediate, and suggests that transferred genes may be evolutionarily important in generating mitochondrial genetic diversity.
Abstract: Horizontal gene transfer (HGT) is relatively common in plant mitochondrial genomes but the mechanisms, extent and consequences of transfer remain largely unknown. Previous results indicate that parasitic plants are often involved as either transfer donors or recipients, suggesting that direct contact between parasite and host facilitates genetic transfer among plants. In order to uncover the mechanistic details of plant-to-plant HGT, the extent and evolutionary fate of transfer was investigated between two groups: the parasitic genus Cuscuta and a small clade of Plantago species. A broad polymerase chain reaction (PCR) survey of mitochondrial genes revealed that at least three genes (atp1, atp6 and matR) were recently transferred from Cuscuta to Plantago. Quantitative PCR assays show that these three genes have a mitochondrial location in the one species line of Plantago examined. Patterns of sequence evolution suggest that these foreign genes degraded into pseudogenes shortly after transfer and reverse transcription (RT)-PCR analyses demonstrate that none are detectably transcribed. Three cases of gene conversion were detected between native and foreign copies of the atp1 gene. The identical phylogenetic distribution of the three foreign genes within Plantago and the retention of cytidines at ancestral positions of RNA editing indicate that these genes were probably acquired via a single, DNA-mediated transfer event. However, samplings of multiple individuals from two of the three species in the recipient Plantago clade revealed complex and perplexing phylogenetic discrepancies and patterns of sequence divergence for all three of the foreign genes. This study reports the best evidence to date that multiple mitochondrial genes can be transferred via a single HGT event and that transfer occurred via a strictly DNA-level intermediate. 
The discovery of gene conversion between co-resident foreign and native mitochondrial copies suggests that transferred genes may be evolutionarily important in generating mitochondrial genetic diversity. Finally, the complex relationships within each lineage of transferred genes imply a surprisingly complicated history of these genes in Plantago subsequent to their acquisition via HGT and this history probably involves some combination of additional transfers (including intracellular transfer), gene duplication, differential loss and mutation-rate variation. Unravelling this history will probably require sequencing multiple mitochondrial and nuclear genomes from Plantago. See Commentary: http://www.biomedcentral.com/1741-7007/8/147 .

Journal ArticleDOI
TL;DR: Evidence is provided for a regulatory role of an expressed pseudogene in humans and a novel mechanistic linkage between pseudogene HMGA1-p expression and type 2 diabetes mellitus is established.
Abstract: Pseudogenes are prevalent in the human genome; however, their biological function is relatively unknown. In this study, the high mobility group A1 (HMGA1) pseudogene is shown to destabilize HMGA1 mRNA. These findings have implications for diabetes, as two patients are reported to express high levels of the HMGA1 pseudogene.

Journal ArticleDOI
TL;DR: Novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm are demonstrated and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event.
Abstract: Plant mitochondrial genomes are known for their complexity, and there is abundant evidence demonstrating that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS) is a phenomenon caused by incompatibility between the nucleus and mitochondria that has been discovered in various plant species. As the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this important trait for crop breeding. Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp), and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp), were newly sequenced in this study. Compared to the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica) mitochondrial genome, the presence of 54 out of 56 protein-encoding genes (including pseudo-genes), 22 tRNA genes (including pseudo-tRNAs), and three rRNA genes was conserved. Two other genes were not present in the CW-CMS mitochondrial genome, and one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare, and novel genetic structures were formed by these genomic rearrangements in the two CMS lines. At least one of the genomic rearrangements was completely unique to each CMS line and not present in 69 rice cultivars or 9 accessions of O. rufipogon. Our results demonstrate novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm, and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event. 
Genomic rearrangements were dynamic in the CMS lines in comparison with those of rice cultivars, suggesting that 'death' and possible 'birth' processes of the CMS genes occurred during the breeding history of rice.

Journal ArticleDOI
TL;DR: In this article, the authors identify and describe the rainbow trout (Oncorhynchus mykiss) TLR7 and TLR8 gene orthologs and their mRNA expression.
Abstract: Induction of the innate immune pathways is critical for early anti-viral defense but there is limited understanding of how teleost fish recognize viral molecules and activate these pathways. In mammals, Toll-like receptors (TLR) 7 and 8 bind single-stranded RNA of viral origin and are activated by synthetic anti-viral imidazoquinoline compounds. Herein, we identify and describe the rainbow trout (Oncorhynchus mykiss) TLR7 and TLR8 gene orthologs and their mRNA expression. Two TLR7/8 loci were identified from a rainbow trout bacterial artificial chromosome (BAC) library using DNA fingerprinting and genetic linkage analyses. Direct sequencing of two representative BACs revealed intact omTLR7 and omTLR8a1 open reading frames (ORFs) located on chromosome 3 and a second locus on chromosome 22 that contains an omTLR8a2 ORF and a putative TLR7 pseudogene. We used the omTLR8a1/2 nomenclature for the two trout TLR8 genes as phylogenetic analysis revealed that they and all the other teleost TLR8 genes sequenced to date are similar to the zebrafish TLR8a, but are distinct from the zebrafish TLR8b. The duplicated trout loci exhibit conserved synteny with other fish genomes extending beyond the tandem of TLR7/8 genes. The trout TLR7 and 8a1/2 genes are composed of a single large exon similar to all other described TLR7/8 genes. The omTLR7 ORF is predicted to encode a 1049 amino acid (aa) protein with 84% similarity to the Fugu TLR7 and a conserved pattern of predicted leucine-rich repeats (LRR). The omTLR8a1 and omTLR8a2 are predicted to encode 1035- and 1034-aa proteins, respectively, and have 86% similarity to each other. omTLR8a1 is likely the ortholog of the only Atlantic salmon TLR8 gene described to date as they have 95% aa sequence similarity. The tissue expression profiles of omTLR7, omTLR8a1 and omTLR8a2 in healthy trout were highest in spleen tissue followed by anterior and then posterior kidney tissues. 
Rainbow trout anterior kidney leukocytes produced elevated levels of pro-inflammatory and type I interferon cytokine mRNA in response to stimulation with the human TLR7/8 agonist R848 or the TLR3 agonist poly I:C. Only poly I:C-induced IFN2 transcription was significantly suppressed in the presence of chloroquine, a compound known to block endosomal acidification and inhibit endosomal maturation. The effect of chloroquine on R848-induced cytokine expression was equivocal and so it remains questionable whether rainbow trout recognition of R848 requires endosomal maturation. TLR7 and TLR8a1 expression levels in rainbow trout anterior kidney leukocytes were not affected by poly I:C or R848 treatments, but surprisingly, TLR8a2 expression was moderately down-regulated by R848. The down-regulation of omTLR8a2 may imply that this gene has evolved to a new or altered function in rainbow trout, as often occurs when the two duplicated genes remain active.

Journal ArticleDOI
12 Feb 2010-PLOS ONE
TL;DR: Infection by T. cruzi has the unexpected consequence of increasing human genetic diversity, and Chagas disease may be a fortuitous share of negative selection.
Abstract: Interspecies DNA transfer is a major biological process leading to the accumulation of mutations inherited by sexual reproduction among eukaryotes. Lateral DNA transfer events and their inheritance have been challenging to document. In this study we modified a thermal asymmetric interlaced PCR by using additional targeted primers, along with Southern blots, fluorescence techniques, and bioinformatics, to identify lateral DNA transfer events from parasite to host. Instances of naturally occurring human infections by Trypanosoma cruzi are documented, where mitochondrial minicircles integrated mainly into retrotransposable LINE-1 of various chromosomes. The founders of five families show minicircle integrations that were transferred vertically to their progeny. Microhomology end-joining of 6 to 22 AC-rich nucleotide repeats in the minicircles and host DNA mediates foreign DNA integration. Heterogeneous minicircle sequences were distributed randomly among families, with diversity increasing due to subsequent rearrangement of inserted fragments. Mosaic recombination and hitchhiking on retrotransposition events to different loci were more prevalent in germ line as compared to somatic cells. Potential new genes, pseudogenes, and knockouts were identified. A pathway of minicircle integration and maintenance in the host genome is suggested. Thus, infection by T. cruzi has the unexpected consequence of increasing human genetic diversity, and Chagas disease may be a fortuitous share of negative selection. This demonstration of contemporary transfer of eukaryotic DNA to the human genome and its subsequent inheritance by descendants introduces a significant change in the scientific concept of evolutionary biology and medicine.

Journal ArticleDOI
TL;DR: This work examined the evolutionary fate of the motilin (MLN) hormone gene, after the pseudogenization of its specific receptor, MLN receptor (MLNR), on the rodent lineage, and speculated that the MLNR gene became a pseudogene before the divergence of the squirrel and other rodents about 75 mya.
Abstract: Specific interactions among biomolecules drive virtually all cellular functions and underlie phenotypic complexity and diversity. Biomolecules are not isolated particles, but are elements of integrated interaction networks, and play their roles through specific interactions. Simultaneous emergence or loss of multiple interacting partners is unlikely. If one of the interacting partners is lost, then what are the evolutionary consequences for the retained partner? Taking advantage of the availability of a large number of mammalian genome sequences and knowledge of the phylogenetic relationships of the species, we examined the evolutionary fate of the motilin (MLN) hormone gene, after the pseudogenization of its specific receptor, MLN receptor (MLNR), on the rodent lineage. We speculate that the MLNR gene became a pseudogene before the divergence of the squirrel and other rodents about 75 mya. The evolutionary consequences for the MLN gene were diverse. While an intact open reading frame for the MLN gene, which appears functional, was preserved in the kangaroo rat, the MLN gene became inactivated independently on the lineages leading to the guinea pig and the common ancestor of the mouse and rat. Gain and loss of specific interactions among biomolecules through the birth and death of genes for biomolecules point to a general evolutionary dynamic: gene birth and death are widespread phenomena in genome evolution, at the genetic level; thus, once mutations arise, a stepwise process of elaboration and optimization ensues, which gradually integrates and orders mutations into a coherent pattern.

Journal ArticleDOI
TL;DR: Analysis of the complete mitochondrial genome from hornwort Phaeoceros laevis indicates that mitochondrial genome evolution in hornworts is less conservative than in liverworts, but has not reached the dynamic level as seen in seed plants.
Abstract: Plants have large and complex mitochondrial genomes in comparison to other eukaryotes. In bryophytes, the mitochondrial genomes exhibit a mixed mode of conservative and dynamic evolution. Here, we sequenced the complete mitochondrial genome from hornwort Phaeoceros laevis, to investigate the level of conservation in mitochondrial genome evolution within hornworts. The circular molecule consists of 209,482 base pairs and represents the largest known mitochondrial genome of bryophytes. It contains 30 protein genes, 3 rRNA genes, and 21 tRNA genes, with 34 cis-spliced group II introns disrupting 16 protein genes. There are 11 pseudogenes in this genome, and nine of them are shared with the other fully sequenced hornwort chondriome from Megaceros aenigmaticus, a distant relative of P. laevis. These pseudogenes were likely formed during an early stage of hornwort evolution. The two hornwort chondriomes differ by four inversions and translocations, seven genes, and four introns in the genome structure and organization. At the sequence level, they are very similar, with the identity values ranging mostly from 80 to 95% in intergenic spacers, introns, and exons. These data indicate that mitochondrial genome evolution in hornworts is less conservative than in liverworts, but has not reached the dynamic level as seen in seed plants.

Journal ArticleDOI
TL;DR: This study investigates the evolutionary history of the multi-member double-homeobox gene family in eutherian mammals and suggests a possible mechanism for the generation of the DUX gene structure.
Abstract: DUX4 is causally involved in the molecular pathogenesis of the neuromuscular disorder facioscapulohumeral muscular dystrophy (FSHD). It has previously been proposed to have arisen by retrotransposition of DUXC, one of four known intron-containing DUX genes. Here, we investigate the evolutionary history of this multi-member double-homeobox gene family in eutherian mammals. Our analysis of the DUX family shows the distribution of different homologues across the mammalian class, including events of secondary loss. Phylogenetic comparison, analysis of gene structures and information from syntenic regions confirm the paralogous relationship of Duxbl and DUXB and characterize their relationship with DUXA and DUXC. We further identify Duxbl pseudogene orthologues in primates. A survey of non-mammalian genomes identified a single-homeobox gene (sDUX) as a likely representative homologue of the mammalian DUX ancestor before the homeobox duplication. Based on the gene structure maps, we suggest a possible mechanism for the generation of the DUX gene structure. Our study underlines how secondary loss of orthologues can obscure the true ancestry of individual gene family members. Their relationships should be considered when interpreting the relevance of functional data from DUX4 homologues such as Dux and Duxbl to FSHD.

Journal ArticleDOI
TL;DR: The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown.
Abstract: Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. We have identified a total of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. Of the full-length sequences, 195 genes belong to the A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress-responsive soybean P450s were retrieved from 99 soybean microarray libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expression of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes was confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays.
First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction.
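The co-expression assumption described in this abstract (genes with similar expression patterns across conditions may be functionally related) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or cutoff the authors used, so the Pearson correlation, the 0.9 threshold, the function names, and the gene labels and expression values below are all assumptions made purely for illustration, not data or methods from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_partners(expression, query, threshold=0.9):
    """Return (gene, r) pairs whose profile correlates with the query gene.

    `expression` maps a gene label to its expression values across the
    same ordered set of conditions. The threshold is an assumed cutoff.
    """
    query_profile = expression[query]
    hits = []
    for gene, profile in expression.items():
        if gene == query:
            continue
        r = pearson(query_profile, profile)
        if r >= threshold:
            hits.append((gene, r))
    # Strongest correlations first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy profiles across five conditions (not real measurements)
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],  # tracks geneA closely
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],   # moves against geneA
}

partners = coexpressed_partners(toy_expression, "geneA")
```

In this toy setting only `geneB` passes the cutoff for `geneA`. A real analysis would work on normalized array or sequencing data and would typically also correct for multiple testing.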