
Showing papers on "Pseudogene published in 2009"


Journal ArticleDOI
TL;DR: Whole-genome sequencing of the multiple drug resistant ST313 isolate D23580 showed evidence of genome degradation, including pseudogene formation and chromosomal deletions, when compared with other S. Typhimurium genome sequences, and genome analysis of other epidemic ST313 isolates from Malawi and Kenya provided evidence for microevolution and clonal replacement in the field.
Abstract: Whereas most nontyphoidal Salmonella (NTS) are associated with gastroenteritis, there has been a dramatic increase in reports of NTS-associated invasive disease in sub-Saharan Africa. Salmonella enterica serovar Typhimurium isolates are responsible for a significant proportion of the reported invasive NTS in this region. Multilocus sequence analysis of invasive S. Typhimurium from Malawi and Kenya identified a dominant type, designated ST313, which currently is rarely reported outside of Africa. Whole-genome sequencing of a multiple drug resistant (MDR) ST313 NTS isolate, D23580, identified a distinct prophage repertoire and a composite genetic element encoding MDR genes located on a virulence-associated plasmid. Further, there was evidence of genome degradation, including pseudogene formation and chromosomal deletions, when compared with other S. Typhimurium genome sequences. Some of this genome degradation involved genes previously implicated in virulence of S. Typhimurium or genes for which the orthologs in S. Typhi are either pseudogenes or are absent. Genome analysis of other epidemic ST313 isolates from Malawi and Kenya provided evidence for microevolution and clonal replacement in the field.

504 citations


Journal ArticleDOI
TL;DR: In this article, the effects of salt, osmotic, cold and heat stress as well as application of the hormone abscisic acid (ABA), an important mediator of stress responses, were analyzed in the Arabidopsis thaliana transcriptome.
Abstract: The responses of plants to abiotic stresses are accompanied by massive changes in transcriptome composition. To provide a comprehensive view of stress-induced changes in the Arabidopsis thaliana transcriptome, we have used whole-genome tiling arrays to analyze the effects of salt, osmotic, cold and heat stress as well as application of the hormone abscisic acid (ABA), an important mediator of stress responses. Among annotated genes in the reference strain Columbia we have found many stress-responsive genes, including several transcription factor genes as well as pseudogenes and transposons that have been missed in previous analyses with standard expression arrays. In addition, we report hundreds of newly identified, stress-induced transcribed regions. These often overlap with known, annotated genes. The results are accessible through the Arabidopsis thaliana Tiling Array Express (At-TAX) homepage, which provides convenient tools for displaying expression values of annotated genes, as well as visualization of unannotated transcribed regions along each chromosome.

301 citations


Journal ArticleDOI
TL;DR: High-density, strand-specific cDNA sequencing (ssRNA–seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi and provided a novel and powerful approach to the characterization of the bacterial transcriptome.
Abstract: High-density, strand-specific cDNA sequencing (ssRNA–seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3′- or 5′-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA–seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA–seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

267 citations


Journal ArticleDOI
TL;DR: Current understanding of gene family evolution is reviewed, including methods for inferring copy number change, evidence for adaptive expansion and adaptive contraction of gene families, and the origins of new families and deaths of previously established ones, followed by a perspective on challenges and promising directions for future research.
Abstract: One of the unique insights provided by the growing number of fully sequenced genomes is the pervasiveness of gene duplication and gene loss. Indeed, several metrics now suggest that rates of gene birth and death per gene are only 10-40% lower than nucleotide substitutions per site, and that per nucleotide, the consequent lineage-specific expansion and contraction of gene families may play at least as large a role in adaptation as changes in orthologous sequences. While gene family evolution is pervasive, it may be especially important in our own evolution since it appears that the "revolving door" of gene duplication and loss has undergone multiple accelerations in the lineage leading to humans. In this paper, we review current understanding of gene family evolution including: methods for inferring copy number change, evidence for adaptive expansion and adaptive contraction of gene families, the origins of new families and deaths of previously established ones, and finally we conclude with a perspective on challenges and promising directions for future research.

246 citations


Journal ArticleDOI
TL;DR: New insights are emerging from genetic analyses of gene expression in cells at rest and following exposure to stimuli, leading to a better understanding of how expression levels of individual genes are regulated and how genes interact with each other.
Abstract: There is extensive natural variation in human gene expression. As quantitative phenotypes, expression levels of genes are heritable. Genetic linkage and association mapping have identified cis- and trans-acting DNA variants that influence expression levels of human genes. New insights into human gene regulation are emerging from genetic analyses of gene expression in cells at rest and following exposure to stimuli. The integration of these genetic mapping results with data from co-expression networks is leading to a better understanding of how expression levels of individual genes are regulated and how genes interact with each other. These findings are important for basic understanding of gene regulation and of diseases that result from disruption of normal gene regulation.

236 citations


Journal ArticleDOI
TL;DR: The observation that the attenuated Dugway isolate has the largest genome with the fewest pseudogenes and IS elements suggests that this isolate's lineage is at an earlier stage of pathoadaptation than the NM, K, and G lineages.
Abstract: Genetically distinct isolates of Coxiella burnetii, the cause of human Q fever, display different phenotypes with respect to in vitro infectivity/cytopathology and pathogenicity for laboratory animals. Moreover, correlations between C. burnetii genomic groups and human disease presentation (acute versus chronic) have been described, suggesting that isolates have distinct virulence characteristics. To provide a more-complete understanding of C. burnetii's genetic diversity, evolution, and pathogenic potential, we deciphered the whole-genome sequences of the K (Q154) and G (Q212) human chronic endocarditis isolates and the naturally attenuated Dugway (5J108-111) rodent isolate. Cross-genome comparisons that included the previously sequenced Nine Mile (NM) reference isolate (RSA493) revealed both novel gene content and disparate collections of pseudogenes that may contribute to isolate virulence and other phenotypes. While C. burnetii genomes are highly syntenous, recombination between abundant insertion sequence (IS) elements has resulted in genome plasticity manifested as chromosomal rearrangement of syntenic blocks and DNA insertions/deletions. The numerous IS elements, genomic rearrangements, and pseudogenes of C. burnetii isolates are consistent with genome structures of other bacterial pathogens that have recently emerged from nonpathogens with expanded niches. The observation that the attenuated Dugway isolate has the largest genome with the fewest pseudogenes and IS elements suggests that this isolate's lineage is at an earlier stage of pathoadaptation than the NM, K, and G lineages.

197 citations


Journal ArticleDOI
TL;DR: In this article, the authors sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence and identified 168 species-specific genes including some hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3.
Abstract: Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.

192 citations


Journal ArticleDOI
01 Aug 2009-Genetics
TL;DR: A genomewide comparison of the major class of resistance gene family, the nucleotide-binding site–leucine-rich repeat (NBS–LRR) gene family, between 93-11 (indica) and Nipponbare (japonica), with a focus on their pseudogene members, found great differences in either constitution or distribution of pseudogenes between the two genomes.
Abstract: Rice blast, caused by Magnaporthe oryzae, is one of the most devastating diseases. The two major subspecies of Asian cultivated rice (Oryza sativa L.), indica and japonica, have shown obvious differences in rice blast resistance, but the genomic basis that underlies the difference is not clear. We performed a genomewide comparison of the major class of resistant gene family, the nucleotide-binding site–leucine-rich repeat (NBS–LRR) gene family, between 93-11 (indica) and Nipponbare (japonica) with a focus on their pseudogene members. We found great differences in either constitution or distribution of pseudogenes between the two genomes. According to this comparison, we designed the PCR-based molecular markers specific to the Nipponbare NBS–LRR pseudogene alleles and used them as cosegregation markers for blast susceptibility in a segregation population from a cross between a rice blast-resistant indica variety and a susceptible japonica variety. Through this approach, we identified a new blast resistance gene, Pid3, in the indica variety, Digu. The allelic Pid3 loci in most of the tested japonica varieties were identified as pseudogenes due to a nonsense mutation at the nucleotide position 2208 starting from the translation initiation site. However, this mutation was not found in any of the tested indica varieties, African cultivated rice varieties, or AA genome-containing wild rice species. These results suggest that the pseudogenization of Pid3 in japonica occurred after the divergence of indica and japonica.

188 citations
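
The abstract above locates the Pid3 nonsense mutation by nucleotide position, counted from the translation initiation site. As a minimal sketch of how such a position maps onto a codon, and how a premature stop could be found in a coding sequence, the Python below uses two short toy sequences. The sequences and the function names `codon_number` and `first_premature_stop` are hypothetical illustrations, not the Pid3 sequence or the authors' method.

```python
# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_number(nt_position):
    """Map a 1-based nucleotide position (counted from the A of ATG)
    to the 1-based number of the codon that contains it."""
    return (nt_position - 1) // 3 + 1


def first_premature_stop(cds):
    """Return (codon number, position of that codon's third base), both
    1-based, for the first in-frame stop codon before the final codon.
    Return None if the reading frame is open up to the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is skipped because it is the normal stop.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i + 1, 3 * i + 3
    return None


# Hypothetical toy alleles: the second carries TGG -> TGA at position 9.
intact_allele = "ATGGCTTGGAAATAA"
mutant_allele = "ATGGCTTGAAAATAA"

print(first_premature_stop(intact_allele))  # None
print(first_premature_stop(mutant_allele))  # (3, 9)
print(codon_number(2208))                   # 736
```

Under this counting convention, position 2208 is the third base of codon 736, since 2208 is exactly 3 × 736.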


Journal ArticleDOI
TL;DR: The position-sensitive iterative database search program PSI-BLAST connected the mammalian ARTs with most known bacterial ADP-ribosylating toxins, whereas yeast, worm, fly, and mustard weed lack both related open reading frames and ADP-ribosylhydrolase genes, suggesting that the two enzyme families that catalyze reversible mono-ADP-ribosylation either were lost from the genomes of these nonchordata eucaryotes or were subject to horizontal gene transfer between kingdoms.
Abstract: ADP-ribosyltransferases including toxins secreted by Vibrio cholerae, Pseudomonas aeruginosa, and other pathogenic bacteria inactivate the function of human target proteins by attaching ADP-ribose onto a critical amino acid residue. Cross-species polymerase chain reaction (PCR) and database mining identified the orthologs of these ADP-ribosylating toxins in humans and the mouse. The human genome contains four functional toxin-related ADP-ribosyltransferase genes (ARTs) and two related intron-containing pseudogenes; the mouse has six functional orthologs. The human and mouse ART genes map to chromosomal regions with conserved linkage synteny. The individual ART genes reveal highly restricted expression patterns, which are largely conserved in humans and the mouse. We confirmed the predicted extracellular location of the ART proteins by expressing recombinant ARTs in insect cells. Two human and four mouse ARTs contain the active site motif (R-S-EXE) typical of arginine-specific ADP-ribosyltransferases and exhibit the predicted enzyme activities. Two other human ARTs and their murine orthologues deviate in the active site motif and lack detectable enzyme activity. Conceivably, these ARTs may have acquired a new specificity or function. The position-sensitive iterative database search program PSI-BLAST connected the mammalian ARTs with most known bacterial ADP-ribosylating toxins. In contrast, no related open reading frames occur in the four completed genomes of lower eucaryotes (yeast, worm, fly, and mustard weed). Interestingly, these organisms also lack genes for ADP-ribosylhydrolases, the enzymes that reverse protein ADP-ribosylation. This suggests that the two enzyme families that catalyze reversible mono-ADP-ribosylation either were lost from the genomes of these nonchordata eucaryotes or were subject to horizontal gene transfer between kingdoms.

176 citations


Journal ArticleDOI
TL;DR: Recombination and pseudogene-formation have been important mechanisms of genetic convergence between Paratyphi A and Typhi, with most pseudogenes arising independently after extensive recombination between the serovars.
Abstract: Of the > 2000 serovars of Salmonella enterica subspecies I, most cause self-limiting gastrointestinal disease in a wide range of mammalian hosts. However, S. enterica serovars Typhi and Paratyphi A are restricted to the human host and cause the similar systemic diseases typhoid and paratyphoid fever. Genome sequence similarity between Paratyphi A and Typhi has been attributed to convergent evolution via relatively recent recombination of a quarter of their genomes. The accumulation of pseudogenes is a key feature of these and other host-adapted pathogens, and overlapping pseudogene complements are evident in Paratyphi A and Typhi. We report the 4.5 Mbp genome of a clinical isolate of Paratyphi A, strain AKU_12601, completely sequenced using capillary techniques and subsequently checked using Illumina/Solexa resequencing. Comparison with the published genome of Paratyphi A ATCC9150 revealed the two are collinear and highly similar, with 188 single nucleotide polymorphisms and 39 insertions/deletions. A comparative analysis of pseudogene complements of these and two finished Typhi genomes (CT18, Ty2) identified several pseudogenes that had been overlooked in prior genome annotations of one or both serovars, and identified 66 pseudogenes shared between serovars. By determining whether each shared and serovar-specific pseudogene had been recombined between Paratyphi A and Typhi, we found evidence that most pseudogenes have accumulated after the recombination between serovars. We also divided pseudogenes into relative-time groups: ancestral pseudogenes inherited from a common ancestor, pseudogenes recombined between serovars which likely arose between initial divergence and later recombination, serovar-specific pseudogenes arising after recombination but prior to the last evolutionary bottlenecks in each population, and more recent strain-specific pseudogenes. 
Recombination and pseudogene-formation have been important mechanisms of genetic convergence between Paratyphi A and Typhi, with most pseudogenes arising independently after extensive recombination between the serovars. The recombination events, along with divergence of and within each serovar, provide a relative time scale for pseudogene-forming mutations, affording rare insights into the progression of functional gene loss associated with host adaptation in Salmonella.

164 citations


Journal ArticleDOI
TL;DR: Results provide new insight into response to iloperidone, developed with the ultimate goal of directing therapy to patients with the highest benefit-to-risk ratio.
Abstract: A whole genome association study was performed in a phase 3 clinical trial conducted to evaluate a novel antipsychotic, iloperidone, administered to treat patients with schizophrenia. Genotypes of 407 patients were analyzed for 334,563 single nucleotide polymorphisms (SNPs). SNPs associated with iloperidone efficacy were identified within the neuronal PAS domain protein 3 gene (NPAS3), close to a translocation breakpoint site previously observed in a family with schizophrenia. Five other loci were identified that include the XK, Kell blood group complex subunit-related family, member 4 gene (XKR4), the tenascin-R gene (TNR), the glutamate receptor, ionotropic, AMPA 4 gene (GRIA4), the glial cell line-derived neurotrophic factor receptor-alpha2 gene (GFRA2), and the NUDT9P1 pseudogene located in the chromosomal region of the serotonin receptor 7 gene (HTR7). The study of these polymorphisms and genes may lead to a better understanding of the etiology of schizophrenia and of its treatment. These results provide new insight into response to iloperidone, developed with the ultimate goal of directing therapy to patients with the highest benefit-to-risk ratio.

Journal ArticleDOI
TL;DR: The complete chloroplast (cp) genome of the scaly tree fern Alsophila spinulosa (Cyatheaceae) was sequenced; it is 156,661 bp in size and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each).
Abstract: Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae). The Alsophila cp genome is 156,661 base pairs (bp) in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp-region that is absent from the Adiantum cp genome. An additional minor inversion of the trnD-GUC, which is possibly shared by all ferns, was identified by comparison between the fern and other land plant cp genomes. By comparing four fern cp genome sequences it was confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages. 
The Alsophila cp genome is very similar to that of the polypod fern Adiantum in terms of gene content, gene order and GC content. However, there exist some striking differences between them: the trnR-UCG gene represents a putative molecular apomorphy of tree ferns; and the repeats observed at one inversion endpoint may be a vestige of some unknown rearrangement(s). This work provided fresh insights into the fern cp genome evolution as well as useful data for future phylogenetic studies.
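
The region lengths and gene counts quoted in the abstract above can be cross-checked with simple arithmetic: a quadripartite chloroplast genome is the sum of the large single copy region, the small single copy region, and two copies of the inverted repeat. The short Python check below is only an illustrative sketch. The variable and function names are hypothetical, and all numbers are taken from the abstract.

```python
# Region lengths in base pairs, as reported in the abstract.
LSC_BP = 86_308   # large single copy region
SSC_BP = 21_623   # small single copy region
IR_BP = 24_365    # one copy of the inverted repeat
REPORTED_TOTAL_BP = 156_661


def quadripartite_total(lsc, ssc, ir):
    """Total genome length for a quadripartite structure: LSC + SSC + 2 x IR."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_total(LSC_BP, SSC_BP, IR_BP)
print(total_bp, total_bp == REPORTED_TOTAL_BP)  # 156661 True

# Gene tally from the abstract: 85 protein-coding, 4 rRNA and 28 tRNA genes.
gene_total = 85 + 4 + 28
print(gene_total)  # 117
```

Both checks agree with the reported totals of 156,661 bp and 117 different genes.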

Journal ArticleDOI
TL;DR: The hypotheses that T2R gene repertoires are closely related to the dietary habits of different species and that birth-and-death evolution is associated with adaptations to dietary changes are supported.
Abstract: Sensing bitter tastes is crucial for many animals because it can prevent them from ingesting harmful foods. This process is mainly mediated by the bitter taste receptors (T2R), which are largely expressed in the taste buds. Previous studies have identified some T2R gene repertoires, and marked variation in repertoire size has been noted among species. However, the mechanisms underlying the evolution of vertebrate T2R genes remain poorly understood. To better understand the evolutionary pattern of these genes, we identified 16 T2R gene repertoires based on the high coverage genome sequences of vertebrates and studied the evolutionary changes in the number of T2R genes during birth-and-death evolution using the reconciled-tree method. We found that the number of T2R genes and the fraction of pseudogenes vary extensively among species. Based on the results of phylogenetic analysis, we showed that T2R gene families in teleost fishes are more diverse than those in tetrapods. In addition to the independent gene expansions in teleost fishes, frogs and mammals, lineage-specific gene duplications were also detected in lizards. Furthermore, extensive gains and losses of T2R genes were detected in each lineage during their evolution, resulting in widely differing T2R gene repertoires. These results further support the hypotheses that T2R gene repertoires are closely related to the dietary habits of different species and that birth-and-death evolution is associated with adaptations to dietary changes.

Journal ArticleDOI
TL;DR: Three different SINEs (Bov-A2, Bov-tA and Bov-B) were found in Bovidae DNA sequences; apparently, the artiodactyle SINEs were established after the divergence leading to the Suidae and Bovidae but before the radiation within these families, and oligonucleotides were designed for a specific amplification of DNA from Bovidae.
Abstract: DNA sequences from Bovidae (cattle, goats and sheep) in the EMBL nucleotide database contain several short interspersed repeated sequences (SINEs). Three different SINEs have been found: Bov-A2, containing two 115-bp A elements; Bov-tA, a tRNA pseudogene coupled to an A element; and Bov-B of 560 bp or less and partially homologous to the A element. Bov-A2, Bov-tA and Bov-B occupy about 1.8%, 1.6% and 0.5%, respectively, of the bovine genome as represented in the nucleotide database. Apart from a tRNA-like sequence in both Bov-tA and the porcine SINEs, there was no similarity with the porcine SINEs. Apparently, the artiodactyle SINEs were established after the divergence leading to the Suidae and Bovidae but before the radiation within these families. Oligonucleotides were designed for a specific amplification of DNA from Bovidae.

Journal ArticleDOI
TL;DR: This analysis sequenced and compared nine orthologous genomic regions encompassing the Adh1-Adh2 genes (from six diploid genome types) with the rice reference sequence and revealed the architectural complexities and dynamic evolution of this region over the past ∼15 million years.
Abstract: Oryza (23 species; 10 genome types) contains the world's most important food crop - rice. Although the rice genome serves as an essential tool for biological research, little is known about the evolution of the other Oryza genome types. They contain a historical record of genomic changes that led to diversification of this genus around the world as well as an untapped reservoir of agriculturally important traits. To investigate the evolution of the collective Oryza genome, we sequenced and compared nine orthologous genomic regions encompassing the Adh1-Adh2 genes (from six diploid genome types) with the rice reference sequence. Our analysis revealed the architectural complexities and dynamic evolution of this region that have occurred over the past ∼15 million years. Of the 46 intact genes and four pseudogenes in the japonica genome, 38 (76%) fell into eight multigene families. Analysis of the evolutionary history of each family revealed independent and lineage-specific gain and loss of gene family members as frequent causes of synteny disruption. Transposable elements were shown to mediate massive replacement of intergenic space (>95%), gene disruption, and gene/gene fragment movement. Three cases of long-range structural variation (inversions/deletions) spanning several hundred kilobases were identified that contributed significantly to genome diversification.

Journal ArticleDOI
TL;DR: The origin of vertebrate OR genes can be traced back to the common ancestor of all chordate species, but insects, nematodes and echinoderms utilise distinctive families of chemoreceptors, suggesting that chemoreceptor genes have evolved many times independently in animal evolution.
Abstract: Olfaction is essential for the survival of animals. Versatile odour molecules in the environment are received by olfactory receptors (ORs), which form the largest multigene family in vertebrates. Identification of the entire repertoires of OR genes using bioinformatics methods from the whole-genome sequences of diverse organisms revealed that the numbers of OR genes vary enormously, ranging from ~1,200 in rats and ~400 in humans to ~150 in zebrafish and ~15 in pufferfish. Most species have a considerable fraction of pseudogenes. Extensive phylogenetic analyses have suggested that the numbers of gene gains and losses are extremely large in the OR gene family, which is a striking example of the birth-and-death evolution. It appears that OR gene repertoires change dynamically, depending on each organism's living environment. For example, higher primates equipped with a well-developed vision system have lost a large number of OR genes. Moreover, two groups of OR genes for detecting airborne odorants greatly expanded after the time of terrestrial adaptation in the tetrapod lineage, whereas fishes retain diverse repertoires of genes that were present in aquatic ancestral species. The origin of vertebrate OR genes can be traced back to the common ancestor of all chordate species, but insects, nematodes and echinoderms utilise distinctive families of chemoreceptors, suggesting that chemoreceptor genes have evolved many times independently in animal evolution.

Journal ArticleDOI
TL;DR: New secreted glycoside hydrolases and carbohydrate esterases were identified in the genome, revealing a diverse biomass-degrading enzyme repertoire far greater than previously characterized and elevating the industrial value of this organism.
Abstract: We present here the complete 2.4 Mb genome of the cellulolytic actinobacterial thermophile, Acidothermus cellulolyticus 11B. New secreted glycoside hydrolases and carbohydrate esterases were identified in the genome, revealing a diverse biomass-degrading enzyme repertoire far greater than previously characterized, and significantly elevating the industrial value of this organism. A sizable fraction of these hydrolytic enzymes break down plant cell walls and the remaining either degrade components in fungal cell walls or metabolize storage carbohydrates such as glycogen and trehalose, implicating the relative importance of these different carbon sources. A novel feature of the A. cellulolyticus secreted cellulolytic and xylanolytic enzymes is that they are fused to multiple tandemly arranged carbohydrate binding modules (CBM), from families 2 and 3. Interestingly, CBM3 was found to be always N-terminal to CBM2, suggesting a functional constraint driving this organization. While the catalytic domains of these modular enzymes are either diverse or unrelated, the CBMs were found to be highly conserved in sequence and may suggest selective substrate-binding interactions. For the most part, thermophilic patterns in the genome and proteome of A. cellulolyticus were weak, which may be reflective of the recent evolutionary history of A. cellulolyticus since its divergence from its closest phylogenetic neighbor Frankia, a mesophilic plant endosymbiont and soil dweller. However, ribosomal proteins and non-coding RNAs (rRNA and tRNAs) in A. cellulolyticus showed thermophilic traits suggesting the importance of adaptation of cellular translational machinery to environmental temperature. Elevated occurrence of IVYWREL amino acids in A. cellulolyticus orthologs compared to mesophiles, and inverse preferences for G and A at the first and third codon positions also point to its ongoing thermoadaptation. 
Additional interesting features in the genome of this cellulolytic, hot-springs dwelling prokaryote include a low occurrence of pseudogenes or mobile genetic elements, an unexpected complement of flagellar genes, and presence of three laterally-acquired genomic islands of likely ecophysiological value.

Journal ArticleDOI
TL;DR: Differential methylation of the CpG island skews RB1 gene expression in favor of the maternal allele, indicating that RB1 is imprinted in the same direction as CDKN1C, which operates upstream of RB1.
Abstract: Genomic imprinting is an epigenetic process leading to parent-of-origin–specific DNA methylation and gene expression. To date, ∼60 imprinted human genes are known. Based on genome-wide methylation analysis of a patient with multiple imprinting defects, we have identified a differentially methylated CpG island in intron 2 of the retinoblastoma (RB1) gene on chromosome 13. The CpG island is part of a 5′-truncated, processed pseudogene derived from the KIAA0649 gene on chromosome 9 and corresponds to two small CpG islands in the open reading frame of the ancestral gene. It is methylated on the maternal chromosome 13 and acts as a weak promoter for an alternative RB1 transcript on the paternal chromosome 13. In four other KIAA0649 pseudogene copies, which are located on chromosome 22, the two CpG islands have deteriorated and the CpG dinucleotides are fully methylated. By analysing allelic RB1 transcript levels in blood cells, as well as in hypermethylated and 5-aza-2′-deoxycytidine–treated lymphoblastoid cells, we have found that differential methylation of the CpG island skews RB1 gene expression in favor of the maternal allele. Thus, RB1 is imprinted in the same direction as CDKN1C, which operates upstream of RB1. The imprinting of two components of the same pathway indicates that there has been strong evolutionary selection for maternal inhibition of cell proliferation.


Journal ArticleDOI
TL;DR: The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters, which utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche.
Abstract: Streptococcus uberis, a Gram positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain 0140J was undertaken to help elucidate the biology of this effective bovine pathogen. The genome revealed 1,825 predicted coding sequences (CDSs) of which 62 were identified as pseudogenes or gene fragments. Comparisons with related pyogenic streptococci identified a conserved core (40%) of orthologous CDSs. Intriguingly, S. uberis 0140J displayed a lower number of mobile genetic elements when compared with other pyogenic streptococci; however, bacteriophage-derived islands and a putative genomic island were identified. Comparative genomics analysis revealed most similarity to the genomes of Streptococcus agalactiae and Streptococcus equi subsp. zooepidemicus. In contrast, streptococcal orthologs were not identified for 11% of the CDSs, indicating either unique retention of ancestral sequence, or acquisition of sequence from alternative sources. Functions including transport, catabolism, regulation and CDSs encoding cell envelope proteins were over-represented in this unique gene set; a limited array of putative virulence CDSs was identified. S. uberis utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche. The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters.

Journal ArticleDOI
TL;DR: It is indicated that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents.
Abstract: Background: The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances. Results: We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800). The majority of them are processed (generated by retrotransposition). However, we do not see a correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant as previous reports indicated there are exactly 80 ribosomal protein genes encoded by the human genome. Conclusions: Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals and a relatively larger amount of it in the rodent lineage.

Journal ArticleDOI
TL;DR: Preliminary phylogenetic analysis of its 16S rRNA gene and a few other gene segments revealed significant divergence from Mycobacterium leprae, a well-known cause of leprosy, that justifies the status of M. lepromatosis as a new species.
Abstract: Mycobacterium lepromatosis is a newly discovered leprosy-causing organism. Preliminary phylogenetic analysis of its 16S rRNA gene and a few other gene segments revealed significant divergence from Mycobacterium leprae, a well-known cause of leprosy, that justifies the status of M. lepromatosis as a new species. In this study we analyzed the sequences of 20 genes and pseudogenes (22,814 nucleotides). Overall, the level of matching of these sequences with M. leprae sequences was 90.9%, which substantiated the species-level difference; the levels of matching for the 16S rRNA genes and 14 protein-encoding genes were 98.0% and 93.1%, respectively, but the level of matching for five pseudogenes was only 79.1%. Five conserved protein-encoding genes were selected to construct phylogenetic trees and to calculate the numbers of synonymous substitutions (dS values) and nonsynonymous substitutions (dN values) in the two species. Robust phylogenetic trees constructed using concatenated alignment of these genes placed M. lepromatosis and M. leprae in a tight cluster with long terminal branches, implying that the divergence occurred long ago. The dS and dN values were also much higher than those for other closest pairs of mycobacteria. The dS values were 14 to 28% of the dS values for M. leprae and Mycobacterium tuberculosis, a more divergent pair of species. These results thus indicate that M. lepromatosis and M. leprae diverged 10 million years ago. The M. lepromatosis pseudogenes analyzed that were also pseudogenes in M. leprae showed nearly neutral evolution, and their relative ages were similar to those of M. leprae pseudogenes, suggesting that they were pseudogenes before divergence. Taken together, the results described above indicate that M. lepromatosis and M. leprae diverged from a common ancestor after the massive gene inactivation event described previously for M. leprae.
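The dating argument in this abstract rests on scaling one divergence time by a ratio of synonymous substitution values. The following is a minimal sketch of that arithmetic, assuming a strict molecular clock (divergence time proportional to dS). The abstract does not state the calibration the authors used, so the function name and the back-calculated reference range are illustrative assumptions only, not the paper's method.

```python
# Sketch only: assumes a strict molecular clock, i.e. divergence time is
# proportional to dS. The calibration used in the paper is not given in
# the abstract; everything derived here is a consistency check.

def implied_reference_time(ds_ratio: float, pair_time_my: float) -> float:
    """Reference-pair divergence time (My) implied by a dS ratio.

    ds_ratio     -- dS(closer pair) / dS(reference pair), between 0 and 1
    pair_time_my -- divergence time of the closer pair, in million years
    """
    if ds_ratio <= 0:
        raise ValueError("ds_ratio must be positive")
    return pair_time_my / ds_ratio


# Values quoted in the abstract.
DS_RATIO_LOW, DS_RATIO_HIGH = 0.14, 0.28  # dS is 14 to 28% of the reference pair
PAIR_TIME_MY = 10.0                       # M. lepromatosis / M. leprae divergence

# A larger ratio implies a younger reference split, and vice versa.
ref_min = implied_reference_time(DS_RATIO_HIGH, PAIR_TIME_MY)
ref_max = implied_reference_time(DS_RATIO_LOW, PAIR_TIME_MY)

print(f"Implied M. leprae / M. tuberculosis divergence: "
      f"{ref_min:.1f} to {ref_max:.1f} My")
```

Under these assumptions, the quoted 14 to 28% range and the 10-million-year estimate are mutually consistent with an M. leprae and M. tuberculosis split of roughly 36 to 71 million years ago.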

Journal ArticleDOI
TL;DR: Quantitative reverse transcriptase PCR assays were developed to systematically profile all of the rainbow trout IFN transcripts, with high specificity at an individual gene level, in naïve fish and after stimulation with virus or viral-related molecules and will be useful for future studies aimed at identifying which cells produce IFNs at early time points after infection.

Journal ArticleDOI
TL;DR: An in silico analysis of the bovine genome identified 18 distinct PAG genes and 14 pseudogenes; boPAG-2 was found to be the most abundant of all boPAG transcripts, and evidence was provided for a role of ETS- and DDVL-related TFs in its regulation.
Abstract: The pregnancy-associated glycoproteins (PAGs) belong to a large family of aspartic peptidases expressed exclusively in the placenta of species in the Artiodactyla order. In cattle, the PAG gene family comprises at least 22 transcribed genes, as well as some variants. Phylogenetic analyses have shown that the PAG family segregates into 'ancient' and 'modern' groupings. Along with sequence differences between family members, there are clear distinctions in their spatio-temporal distribution and in their relative level of expression. In this report, 1) we performed an in silico analysis of the bovine genome to further characterize the PAG gene family, 2) we scrutinized proximal promoter sequences of the PAG genes to evaluate the evolutionary pressures operating on them and to identify putative regulatory regions, 3) we determined relative transcript abundance of selected PAGs during pregnancy, and 4) we performed preliminary characterization of the putative regulatory elements for one of the candidate PAGs, bovine (bo) PAG-2. From our analysis of the bovine genome, we identified 18 distinct PAG genes and 14 pseudogenes. We observed that the first 500 base pairs upstream of the translational start site contained multiple regions that are conserved among all boPAGs. However, a preponderance of conserved regions that harbor recognition sites for putative transcriptional factors (TFs) were found only in the modern boPAG grouping and not in the ancient boPAGs. We gathered evidence by means of Q-PCR and screening of EST databases to show that boPAG-2 is the most abundant of all boPAG transcripts. Finally, we provided preliminary evidence for the role of ETS- and DDVL-related TFs in the regulation of the boPAG-2 gene. PAGs represent a relatively large gene family in the bovine genome.
The proximal promoter regions of these genes display differences in putative TF binding sites, likely contributing to observed differences in spatial and temporal expression. We also discovered that boPAG-2 is the most abundant of all boPAG transcripts and provided evidence for the role of ETS and DDVL TFs in its regulation. These experiments mark the crucial first step in discerning the complex transcriptional regulation operating within the boPAG gene family.

Journal ArticleDOI
TL;DR: It is shown that the three Tetrahymena Piwi family proteins (Twis) preferentially expressed in growing cells differ in their genetic essentiality and subcellular localization, and affinity purification of all eight distinct Twi proteins revealed unique properties of their bound sRNAs.
Abstract: PAZ/PIWI domain (PPD) proteins carrying small RNAs (sRNAs) function in gene and genome regulation. The ciliate Tetrahymena thermophila encodes numerous PPD proteins exclusively of the Piwi clade. We show that the three Tetrahymena Piwi family proteins (Twis) preferentially expressed in growing cells differ in their genetic essentiality and subcellular localization. Affinity purification of all eight distinct Twi proteins revealed unique properties of their bound sRNAs. Deep sequencing of Twi-bound and total sRNAs in strains disrupted for various silencing machinery uncovered an unanticipated diversity of 23- to 24-nt sRNA classes in growing cells, each with distinct genetic requirements for accumulation. Altogether, Twis distinguish sRNAs derived from loci of pseudogene families, three types of DNA repeats, structured RNAs, and EST-supported loci with convergent or paralogous transcripts. Most surprisingly, Twi7 binds complementary strands of unequal length, while Twi10 binds a specific permutation of the guanosine-rich telomeric repeat. These studies greatly expand the structural and functional repertoire of endogenous sRNAs and RNPs.

Journal ArticleDOI
TL;DR: The principal aim of this paper is to identify the complete CYPome of Aspergillus nidulans from the genome sequence version AN.3 deposited at the Broad Institute, assign the appropriate CYP nomenclature and define function where possible.

Journal ArticleDOI
20 Mar 2009-PLOS ONE
TL;DR: This model suggests that the gene duplications and single exon retrotransposons of mammalian type-III interferons are positively selected for within a genome, and could be responsible for the generation of the IL-10 family and also the single exon type-I interferons.
Abstract: Background: Type-I interferons, type-II interferons, and the IL-10 family are helical cytokines with similar three-dimensional folds. However, their homologous relationship is difficult to detect on the basis of sequence alone. We have previously described the discovery of the human type-III interferons (IFN lambda-1, -2, -3 or IL-29, IL-28A, IL-28B), which required a combination of manual and computational techniques applied to predicted protein sequences. Principal Findings: Here we describe how the use of gene structure analysis and comparative genomics enabled a more extensive understanding of these genes early in the discovery process. More recently, additional mammalian genome sequences have shown that there are between one and potentially nine copies of interferon lambda genes in each genome, and that several species have single exon versions of the interferon lambda gene. Significance: The variable number of single exon type-I interferons in mammals, along with recently identified genes in zebrafish homologous to interferons, allows a story of interferon evolution to be proposed. This model suggests that the gene duplications and single exon retrotransposons of mammalian type-III interferons are positively selected for within a genome. These characteristics are also shared with the fish interferons and could be responsible for the generation of the IL-10 family and also the single exon type-I interferons.

Journal ArticleDOI
TL;DR: The results presented here indicate that Lig diversity has important ramifications for the selection of Lig polypeptides for use in diagnosis and as vaccine candidates. This sequence information will aid the identification of highly conserved regions within the Lig proteins and improve the performance characteristics of the Lig proteins in diagnostic assays and in subunit vaccine formulations.

Journal ArticleDOI
TL;DR: Heterogeneity in unit structure may reflect ongoing homogenisation of variant unit types without fixation for any particular variant.
Abstract: Typically in plants, the 5S and 35S ribosomal DNA (rDNA) encoding two major ribosomal RNA species occur at separate loci. However, in some algae, bryophytes and ferns, they are at the same locus (linked arrangement). Southern blot hybridisation, polymerase chain reactions (PCR), fluorescent in situ hybridisation, cloning and sequencing were used to reveal 5S and 35S rDNA genomic organisation in Artemisia. We observed thousands of rDNA units at two to three loci containing 5S rDNA in an inverted orientation within the inter-genic spacer (IGS) of 35S rDNA. The sequenced clones of 26-18S IGS from Artemisia absinthium appeared to contain a conserved 5S gene insertion proximal to the 26S gene terminus (5S rDNA-1) and a second less conserved 5S insertion (5S rDNA-2) further downstream. Whilst the 5S rDNA-1 showed all the structural features of a functional gene, the 5S rDNA-2 had a deletion in the internal promoter and probably represents a pseudogene. The linked arrangement probably evolved before the divergence of Artemisia from the rest of Asteraceae (>10 Myrs). This arrangement may have involved retrotransposons and, once formed, spread via mechanisms of concerted evolution. Heterogeneity in unit structure may reflect ongoing homogenisation of variant unit types without fixation for any particular variant.

Journal ArticleDOI
TL;DR: The results show that pseudogenes derived from protein-coding genes are prevalent in the rice genome, and a subset of them are strong candidates for producing small RNAs with novel regulatory roles.
Abstract: Pseudogenes are significant components of eukaryotic genomes, and some have acquired novel regulatory roles. To date, no study has characterized rice pseudogenes systematically or addressed their impact on the structure and function of the rice genome. In this genome-wide study, we have identified 11,956 non-transposon-related rice pseudogenes, most of which are from gene duplications. About 12% of the rice protein-coding genes, half of which are in singleton families, have a pseudogene paralog. Interestingly, we found that 145 of these pseudogenes potentially gave rise to antisense small RNAs after examining ∼1.5 million small RNAs from developing rice grains. The majority (>50%) of these antisense RNAs are 24-nucleotides long, a feature often seen in plant repeat-associated small interfering RNAs (siRNAs) produced by RNA-dependent RNA polymerase (RDR2) and Dicer-like protein 3 (DCL3), suggesting that some pseudogene-derived siRNAs may be implicated in repressing pseudogene transcription (i.e., cis-acting). Multiple lines of evidence, however, indicate that small RNAs from rice pseudogenes might also function as natural antisense siRNAs either by interacting with the complementary sense RNAs from functional parental genes (38 cases) or by forming double-strand RNAs with transcripts of adjacent paralogous pseudogenes (2 cases) (i.e., trans-acting). Further examinations of five additional small RNA libraries revealed that pseudogene-derived antisense siRNAs could be produced in specific rice developmental stages or physiological growth conditions, suggesting their potentially important roles in normal rice development. In summary, our results show that pseudogenes derived from protein-coding genes are prevalent in the rice genome, and a subset of them are strong candidates for producing small RNAs with novel regulatory roles. Our findings suggest that pseudogenes of exapted functions may be a phenomenon ubiquitous in eukaryotic organisms.
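The observation that most pseudogene-derived antisense small RNAs are 24 nucleotides long comes down to a length distribution over sequenced reads. A minimal sketch of that tally follows; the reads are made-up placeholders, and the function name is hypothetical rather than taken from the paper's pipeline.

```python
from collections import Counter


def length_fractions(reads):
    """Return {length: fraction of reads} for a list of small RNA sequences."""
    counts = Counter(len(read) for read in reads)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical antisense reads mapped to pseudogenes (placeholder sequences).
reads = [
    "A" * 21,
    "C" * 24,
    "G" * 24,
    "U" * 24,
    "A" * 22,
]

fractions = length_fractions(reads)

# The abstract's criterion: more than 50% of the antisense RNAs are 24 nt long.
majority_is_24nt = fractions.get(24, 0.0) > 0.5

for length, fraction in fractions.items():
    print(f"{length} nt: {fraction:.0%}")
print("24-nt class is the majority:", majority_is_24nt)
```

In a real analysis the reads would come from alignments of small RNA libraries against annotated pseudogene loci, restricted to the antisense strand; that mapping step is outside the scope of this sketch.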