Showing papers on "Pseudogene" published in 2005


Journal Article
Matthew Berriman1, Elodie Ghedin2, Elodie Ghedin3, Christiane Hertz-Fowler1, Gaëlle Blandin2, Hubert Renauld1, Daniella Castanheira Bartholomeu2, Nicola Lennard1, Elisabet Caler2, N. Hamlin1, Brian J. Haas2, Ulrike Böhme1, Linda Hannick2, Martin Aslett1, Joshua Shallom2, Lucio Marcello4, Lihua Hou2, Bill Wickstead5, U. Cecilia M. Alsmark6, Claire Arrowsmith1, Rebecca Atkin1, Andrew Barron1, Frédéric Bringaud7, Karen Brooks1, Mark Carrington8, Inna Cherevach1, Tracey-Jane Chillingworth1, Carol Churcher1, Louise Clark1, Craig Corton1, Ann Cronin1, Robert L. Davies1, Jonathon Doggett1, Appolinaire Djikeng2, Tamara Feldblyum2, Mark C. Field8, Audrey Fraser1, Ian Goodhead1, Zahra Hance1, David Harper1, Barbara Harris1, Heidi Hauser1, Jessica B. Hostetler2, Al Ivens1, Kay Jagels1, David W. Johnson1, Justin Johnson2, Kristine Jones2, Arnaud Kerhornou1, Hean Koo2, Natasha Larke1, Scott M. Landfear9, Christopher Larkin2, Vanessa Leech8, Alexandra Line1, Angela Lord1, Annette MacLeod4, P. Mooney1, Sharon Moule1, David M. A. Martin10, Gareth W. Morgan11, Karen Mungall1, Halina Norbertczak1, Doug Ormond1, Grace Pai2, Christopher S. Peacock1, Jeremy Peterson2, Michael A. Quail1, Ester Rabbinowitsch1, Marie-Adèle Rajandream1, Chris P Reitter8, Steven L. Salzberg2, Mandy Sanders1, Seth Schobel2, Sarah Sharp1, Mark Simmonds1, Anjana J. Simpson2, Luke J. Tallon2, C. Michael R. Turner4, Andrew Tait4, Adrian Tivey1, Susan Van Aken2, Danielle Walker1, David Wanless2, Shiliang Wang2, Brian White1, Owen White2, Sally Whitehead1, John Woodward1, Jennifer R. Wortman2, Mark Raymond Adams12, T. Martin Embley6, Keith Gull5, Elisabetta Ullu13, J. David Barry4, Alan H. Fairlamb10, Fred R. Opperdoes14, Barclay G. Barrell1, John E. Donelson15, Neil Hall2, Neil Hall16, Claire M. Fraser2, Sara E. Melville8, Najib M. El-Sayed3, Najib M. El-Sayed2 
15 Jul 2005-Science
TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.
Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

1,631 citations


Journal Article
Alasdair Ivens1, Christopher S. Peacock1, Elizabeth A. Worthey2, Lee Murphy1, Gautam Aggarwal2, Matthew Berriman1, Ellen Sisk2, Marie-Adèle Rajandream1, Ellen Adlem1, Rita Aert3, Atashi Anupama2, Zina Apostolou, Philip Attipoe2, Nathalie Bason1, Christopher Bauser4, Alfred Beck5, Stephen M. Beverley6, Gabriella Bianchettin7, K. Borzym5, G. Bothe4, Carlo V. Bruschi8, Carlo V. Bruschi7, Matt Collins1, Eithon Cadag2, Laura Ciarloni7, Christine Clayton, Richard M.R. Coulson9, Ann Cronin1, Angela K. Cruz10, Robert L. Davies1, Javier G. De Gaudenzi11, Deborah E. Dobson6, Andreas Duesterhoeft, Gholam Fazelina2, Nigel Fosker1, Alberto C.C. Frasch11, Audrey Fraser1, Monika Fuchs, Claudia Gabel, Arlette Goble1, André Goffeau12, David Harris1, Christiane Hertz-Fowler1, Helmut Hilbert, David Horn13, Yiting Huang2, Sven Klages5, Andrew J Knights1, Michael Kube5, Natasha Larke1, Lyudmila Litvin2, Angela Lord1, Tin Louie2, Marco A. Marra, David Masuy12, Keith R. Matthews14, Shulamit Michaeli, Jeremy C. Mottram15, Silke Müller-Auer, Heather Munden2, Siri Nelson2, Halina Norbertczak1, Karen Oliver1, Susan O'Neil1, Martin Pentony2, Thomas M. Pohl4, Claire Price1, Bénédicte Purnelle12, Michael A. Quail1, Ester Rabbinowitsch1, Richard Reinhardt5, Michael A. Rieger, Joel Rinta2, Johan Robben3, Laura Robertson2, Jeronimo C. Ruiz10, Simon Rutter1, David L. Saunders1, Melanie Schäfer, Jacquie Schein, David C. Schwartz16, Kathy Seeger1, Amber Seyler2, Sarah Sharp1, Heesun Shin, Dhileep Sivam2, Rob Squares1, Steve Squares1, Valentina Tosato7, Christy Vogt2, Guido Volckaert3, Rolf Wambutt, T. Warren1, Holger Wedler, John Woodward1, Shiguo Zhou16, Wolfgang Zimmermann, Deborah F. Smith17, Jenefer M. Blackwell18, Kenneth Stuart2, Kenneth Stuart19, Bart Barrell1, Peter J. Myler2, Peter J. Myler19 
15 Jul 2005-Science
TL;DR: The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling.
Abstract: Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.

1,357 citations


Journal Article
TL;DR: The complete sequences of Takifugu Toll-like receptor (TLR) loci and gene predictions from many draft genomes enable comprehensive molecular phylogenetic analysis, which shows that coincidental evolution plays a minor role in TLR evolution.
Abstract: The complete sequences of Takifugu Toll-like receptor (TLR) loci and gene predictions from many draft genomes enable comprehensive molecular phylogenetic analysis. Strong selective pressure for recognition of and response to pathogen-associated molecular patterns has maintained a largely unchanging TLR recognition in all vertebrates. There are six major families of vertebrate TLRs. This repertoire is distinct from that of invertebrates. TLRs within a family recognize a general class of pathogen-associated molecular patterns. Most vertebrates have exactly one gene ortholog for each TLR family. The family including TLR1 has more species-specific adaptations than other families. A major family including TLR11 is represented in humans only by a pseudogene. Coincidental evolution plays a minor role in TLR evolution. The sequencing phase of this study produced finished genomic sequences for the 12 Takifugu rubripes TLRs. In addition, we have produced >70 gene models, including sequences from the opossum, chicken, frog, dog, sea urchin, and sea squirt.

1,126 citations


Journal Article
19 Aug 2005-Science
TL;DR: P. ubique, the first cultured member of the SAR11 clade, has the smallest genome and encodes the smallest number of predicted open reading frames known for a free-living microorganism.
Abstract: The SAR11 clade consists of very small, heterotrophic marine α-proteobacteria that are found throughout the oceans, where they account for about 25% of all microbial cells. Pelagibacter ubique, the first cultured member of this clade, has the smallest genome and encodes the smallest number of predicted open reading frames known for a free-living microorganism. In contrast to parasitic bacteria and archaea with small genomes, P. ubique has complete biosynthetic pathways for all 20 amino acids and all but a few cofactors. P. ubique has no pseudogenes, introns, transposons, extrachromosomal elements, or inteins; few paralogs; and the shortest intergenic spacers yet observed for any cell.

1,056 citations


Journal Article
TL;DR: It is proposed that variation in intragenic repeat number provides the functional diversity of cell surface antigens that, in fungi and other pathogens, allows rapid adaptation to the environment and elusion of the host immune system.
Abstract: Tandemly repeated DNA sequences are highly dynamic components of genomes. Most repeats are in intergenic regions, but some are in coding sequences or pseudogenes. In humans, expansion of intragenic triplet repeats is associated with various diseases, including Huntington chorea and fragile X syndrome. The persistence of intragenic repeats in genomes suggests that there is a compensating benefit. Here we show that in the genome of Saccharomyces cerevisiae, most genes containing intragenic repeats encode cell-wall proteins. The repeats trigger frequent recombination events in the gene or between the gene and a pseudogene, causing expansion and contraction in the gene size. This size variation creates quantitative alterations in phenotypes (e.g., adhesion, flocculation or biofilm formation). We propose that variation in intragenic repeat number provides the functional diversity of cell surface antigens that, in fungi and other pathogens, allows rapid adaptation to the environment and elusion of the host immune system.

617 citations


Journal Article
TL;DR: Characterization of all ABC transporters from the human genome and from model organisms will lead to additional insights into normal physiology and human disease.
Abstract: The ATP-binding cassette (ABC) superfamily of genes encode membrane proteins that transport a diverse set of substrates across membranes. Mutations in ABC transporters cause or contribute to many different Mendelian and complex disorders including adrenoleukodystrophy, cystic fibrosis, retinal degeneration, hypercholesterolemia, and cholestasis. The genes play important roles in protecting organisms from xenobiotics and transport compounds across the intestine, blood-brain barrier, and the placenta. There are 48 ABC genes in the human genome divided into seven subfamilies based on amino acid sequence similarities and phylogeny. These seven subfamilies are represented in all eukaryotic genomes and are therefore of ancient origin. Sequencing the genomes of numerous vertebrate organisms has allowed the complement of ABC transporters to be characterized and the evolution of the genes to be assessed. Most ABC transporters are conserved in all vertebrates, but there are also several examples of recent duplication and gene loss. For genes with a conserved ortholog, animal models have been identified or developed that can be used to probe the function and regulation of selected genes. Genes that are restricted to a specific group of animals may represent specialized functions that could provide insight into unique biological properties of that organism. Further characterization of all ABC transporters from the human genome and from model organisms will lead to additional insights into normal physiology and human disease.

575 citations


Journal Article
01 Dec 2005-Genomics
TL;DR: Phylogenetic analyses based on both nucleotide and protein data demonstrated that HSP90(AA+AB+B) formed a monophyletic clade, whereas TRAP is a relatively distant paralogue of this clade.

346 citations


Journal Article
TL;DR: Analysis of the aldehyde dehydrogenase (ALDH) gene superfamily showed that the human genome contains 19 putatively functional genes and three pseudogenes, and the ALDH gene products appear to be multifunctional proteins, possessing both catalytic and non-catalytic properties.
Abstract: The aldehyde dehydrogenase (ALDH) gene superfamily encodes enzymes that are critical for certain life processes and detoxification via the NAD(P)+-dependent oxidation of numerous endogenous and exogenous aldehyde substrates, including pharmaceuticals and environmental pollutants. Analysis of the ALDH gene superfamily in the latest databases showed that the human genome contains 19 putatively functional genes and three pseudogenes. A number of ALDH genes are upregulated as a part of the oxidative stress response and inexplicably overexpressed in various tumours, leading to problems during cancer chemotherapy. Mutations in ALDH genes cause inborn errors of metabolism (such as Sjögren-Larsson syndrome, type II hyperprolinaemia and γ-hydroxybutyric aciduria) and are likely to contribute to several complex diseases, including cancer and Alzheimer's disease. The ALDH gene products appear to be multifunctional proteins, possessing both catalytic and non-catalytic properties.

342 citations


Journal Article
TL;DR: Mice and humans must deploy their immune resources against vacuolar pathogens in radically different ways, and the absence of the p47 resistance system in humans suggests that possession of this resistance system carries significant costs that are not outweighed by the benefits.
Abstract: Background: Members of the p47 (immunity-related GTPases (IRG) family) GTPases are essential, interferon-inducible resistance factors in mice that are active against a broad spectrum of important intracellular pathogens. Surprisingly, there are no reports of p47 function in humans. Results: Here we show that the p47 GTPases are represented by 23 genes in the mouse, whereas humans have only a single full-length p47 GTPase and an expressed, truncated presumed pseudogene. The human full-length gene is orthologous to an isolated mouse p47 GTPase that carries no interferon-inducible elements in the promoter of either species and is expressed constitutively in the mature testis of both species. Thus, there is no evidence for a p47 GTPase-based resistance system in humans. Dogs have several interferon-inducible p47s, and so the primate lineage that led to humans appears to have lost an ancient function. Multiple p47 GTPases are also present in the zebrafish, but there is only a tandem p47 gene pair in pufferfish. Conclusion: Mice and humans must deploy their immune resources against vacuolar pathogens in radically different ways. This carries significant implications for the use of the mouse as a model of human infectious disease. The absence of the p47 resistance system in humans suggests that possession of this resistance system carries significant costs that, in the primate lineage that led to humans, are not outweighed by the benefits. The origin of the vertebrate p47 system is obscure.

302 citations


Journal Article
TL;DR: Mouse and human rely on different natural killer (NK) receptor gene families (Ly49 in mouse, KIR in human); understanding the biological significance of this polarized choice may be aided by studying which NK receptor genes are used in other vertebrates, especially in relation to species-specific differences in genes for major histocompatibility complex class I molecules.
Abstract: Many receptors on natural killer (NK) cells recognize major histocompatibility complex class I molecules in order to monitor unhealthy tissues, such as cells infected with viruses, and some tumors. Genes encoding families of NK receptors and related sequences are organized into two main clusters in humans: the natural killer complex on Chromosome 12p13.1, which encodes C-type lectin molecules, and the leukocyte receptor complex on Chromosome 19q13.4, which encodes immunoglobulin superfamily molecules. The composition of these gene clusters differs markedly between closely related species, providing evidence for rapid, lineage-specific expansions or contractions of sets of loci. The choice of NK receptor genes is polarized in the two species most studied, mouse and human. In mouse, the C-type lectin-related Ly49 gene family predominates. Conversely, the single Ly49 sequence is a pseudogene in humans, and the immunoglobulin superfamily KIR gene family is extensive. These different gene sets encode proteins that are comparable in function and genetic diversity, even though they have undergone species-specific expansions. Understanding the biological significance of this curious situation may be aided by studying which NK receptor genes are used in other vertebrates, especially in relation to species-specific differences in genes for major histocompatibility complex class I molecules.

279 citations


Journal Article
TL;DR: By sequencing and annotating the complete 1,197,687-bp genome of the St. Maries strain of A. marginale, it is shown that this surface coat is dominated by two families containing immunodominant proteins: the msp2 superfamily and the msp1 superfamily.
Abstract: The rickettsia Anaplasma marginale is the most prevalent tick-borne livestock pathogen worldwide and is a severe constraint to animal health. A. marginale establishes lifelong persistence in infected ruminants and these animals serve as a reservoir for ticks to acquire and transmit the pathogen. Within the mammalian host, A. marginale generates antigenic variants by changing a surface coat composed of numerous proteins. By sequencing and annotating the complete 1,197,687-bp genome of the St. Maries strain of A. marginale, we show that this surface coat is dominated by two families containing immunodominant proteins: the msp2 superfamily and the msp1 superfamily. Of the 949 annotated coding sequences, just 62 are predicted to be outer membrane proteins, and of these, 49 belong to one of these two superfamilies. The genome contains unusual functional pseudogenes that belong to the msp2 superfamily and play an integral role in surface coat antigenic variation, and are thus distinctly different from pseudogenes described as byproducts of reductive evolution in other Rickettsiales.

Journal Article
TL;DR: A new comparative census of members of cytokine and chemokine gene families is made, distinguishing the core set of molecules likely to be common to all higher vertebrates from those particular to these 300 million-year-old lineages.
Abstract: As most mechanisms of adaptive immunity evolved during the divergence of vertebrates, the immune systems of extant vertebrates represent different successful variations on the themes initiated in their earliest common ancestors. The genes involved in elaborating these mechanisms have been subject to exceptional selective pressures in an arms race with highly adaptable pathogens, resulting in highly divergent sequences of orthologous genes and the gain and loss of members of gene families as different species find different solutions to the challenge of infection. Consequently, it has been difficult to transfer to the chicken detailed knowledge of the molecular mechanisms of the mammalian immune system and, thus, to enhance the already significant contribution of chickens toward understanding the evolution of immunity. The availability of the chicken genome sequence provides the opportunity to resolve outstanding questions concerning which molecular components of the immune system are shared between mammals and birds and which represent their unique evolutionary solutions. We have integrated genome data with existing knowledge to make a new comparative census of members of cytokine and chemokine gene families, distinguishing the core set of molecules likely to be common to all higher vertebrates from those particular to these 300 million-year-old lineages. Some differences can be explained by the different architectures of the mammalian and avian immune systems. Chickens lack lymph nodes and also the genes for the lymphotoxins and lymphotoxin receptors. The lack of functional eosinophils correlates with the absence of the eotaxin genes and our previously reported observation that interleukin-5 (IL-5) is a pseudogene. To summarize, in the chicken genome, we can identify the genes for 23 ILs, 8 type I interferons (IFNs), IFN-gamma, 1 colony-stimulating factor (GM-CSF), 2 of the 3 known transforming growth factors (TGFs), 24 chemokines (1 XCL, 14 CCL, 8 CXCL, and 1 CX3CL), and 10 tumor necrosis factor superfamily (TNFSF) members. Receptor genes present in the genome suggest the likely presence of 2 other ILs, 1 other CSF, and 2 other TNFSF members.

Journal Article
TL;DR: It is suggested that the formation of pseudogenes may provide a simple evolutionary pathway that complements gene acquisition to enhance virulence and antimicrobial resistance in S.Choleraesuis.
Abstract: Salmonella enterica serovar Choleraesuis (S.Choleraesuis), a highly invasive serovar among non-typhoidal Salmonella, usually causes sepsis or extra-intestinal focal infections in humans. S.Choleraesuis infections have now become particularly difficult to treat because of the emergence of resistance to multiple antimicrobial agents. The 4.7 Mb genome sequence of a multidrug-resistant S.Choleraesuis strain SC-B67 was determined. Genome wide comparison of three sequenced Salmonella genomes revealed that more deletion events occurred in S.Choleraesuis SC-B67 and S.Typhi CT18 relative to S.Typhimurium LT2. S.Choleraesuis has 151 pseudogenes, which, among the three Salmonella genomes, include the highest percentage of pseudogenes arising from the genes involved in bacterial chemotaxis signal-transduction pathways. Mutations in these genes may increase smooth swimming of the bacteria, potentially allowing more effective interactions with and invasion of host cells to occur. A key regulatory gene of TetR/AcrR family, acrR, was inactivated through the introduction of an internal stop codon resulting in overexpression of AcrAB that appears to be associated with ciprofloxacin resistance. While lateral gene transfer providing basic functions to allow niche expansion in the host and environment is maintained during the evolution of different serovars of Salmonella, genes providing little overall selective benefit may be lost rapidly. Our findings suggest that the formation of pseudogenes may provide a simple evolutionary pathway that complements gene acquisition to enhance virulence and antimicrobial resistance in S.Choleraesuis.

Journal Article
TL;DR: The data suggest that the genomic MAI undergoes frequent transposition events, which lead to subsequent deletion by homologous recombination under physiological stress conditions, which can be interpreted in terms of adaptation to physiological stress and might contribute to the genetic plasticity and mobilization of the magnetosome island.
Abstract: Genes involved in magnetite biomineralization are clustered in the genome of the magnetotactic bacterium Magnetospirillum gryphiswaldense. We analyzed a 482-kb genomic fragment, in which we identified an approximately 130-kb region representing a putative genomic “magnetosome island” (MAI). In addition to all known magnetosome genes, the MAI contains genes putatively involved in magnetosome biomineralization and numerous genes with unknown functions, as well as pseudogenes, and it is particularly rich in insertion elements. Substantial sequence polymorphism of clones from different subcultures indicated that this region undergoes frequent rearrangements during serial subcultivation in the laboratory. Spontaneous mutants affected in magnetosome formation arise at a frequency of up to 10⁻² after prolonged storage of cells at 4°C or exposure to oxidative stress. All nonmagnetic mutants exhibited extended and multiple deletions in the MAI and had lost either parts of or the entire mms and mam gene clusters encoding magnetosome proteins. The mutations were polymorphic with respect to the sites and extents of deletions, but all mutations were found to be associated with the loss of various copies of insertion elements, as revealed by Southern hybridization and PCR analysis. Insertions and deletions in the MAI were also found in different magnetosome-producing clones, indicating that parts of this region are not essential for the magnetic phenotype. Our data suggest that the genomic MAI undergoes frequent transposition events, which lead to subsequent deletion by homologous recombination under physiological stress conditions. This can be interpreted in terms of adaptation to physiological stress and might contribute to the genetic plasticity and mobilization of the magnetosome island.

Journal Article
TL;DR: The sweet-receptor genes of domestic cats, as well as those of other members of the Felidae family of obligate carnivores (tiger and cheetah), are characterized, and it is concluded that cat Tas1r3 is an apparently functional and expressed receptor but that cat Tas1r2 is an unexpressed pseudogene.
Abstract: Although domestic cats (Felis silvestris catus) possess an otherwise functional sense of taste, they, unlike most mammals, do not prefer and may be unable to detect the sweetness of sugars. One possible explanation for this behavior is that cats lack the sensory system to taste sugars and therefore are indifferent to them. Drawing on work in mice, demonstrating that alleles of sweet-receptor genes predict low sugar intake, we examined the possibility that genes involved in the initial transduction of sweet perception might account for the indifference to sweet-tasting foods by cats. We characterized the sweet-receptor genes of domestic cats as well as those of other members of the Felidae family of obligate carnivores, tiger and cheetah. Because the mammalian sweet-taste receptor is formed by the dimerization of two proteins (T1R2 and T1R3; gene symbols Tas1r2 and Tas1r3), we identified and sequenced both genes in the cat by screening a feline genomic BAC library and by performing PCR with degenerate primers on cat genomic DNA. Gene expression was assessed by RT-PCR of taste tissue, in situ hybridization, and immunohistochemistry. The cat Tas1r3 gene shows high sequence similarity with functional Tas1r3 genes of other species. Message from Tas1r3 was detected by RT-PCR of taste tissue. In situ hybridization and immunohistochemical studies demonstrate that Tas1r3 is expressed, as expected, in taste buds. However, the cat Tas1r2 gene shows a 247-base pair microdeletion in exon 3 and stop codons in exons 4 and 6. There was no evidence of detectable mRNA from cat Tas1r2 by RT-PCR or in situ hybridization, and no evidence of protein expression by immunohistochemistry. Tas1r2 in tiger and cheetah and in six healthy adult domestic cats all show a similar deletion and stop codons. We conclude that cat Tas1r3 is an apparently functional and expressed receptor but that cat Tas1r2 is an unexpressed pseudogene. A functional sweet-taste receptor heteromer cannot form, and thus the cat lacks the receptor likely necessary for detection of sweet stimuli. This molecular change was very likely an important event in the evolution of the cat's carnivorous behavior.

Journal Article
TL;DR: The genome of B. abortus 2308, the virulent prototype biovar 1 strain, is presented and compared with the two other human pathogenic Brucella species and with B. abortus field isolate 9-941; metabolic changes are consistent with adaptation of brucellae to an intracellular life-style.
Abstract: Despite their high DNA identity and a proposal to group classical Brucella species as biovars of Brucella melitensis, the commonly recognized Brucella species can be distinguished by distinct biochemical and fatty acid characters, as well as by a marked host range (e.g., Brucella suis for swine, B. melitensis for sheep and goats, and Brucella abortus for cattle). Here we present the genome of B. abortus 2308, the virulent prototype biovar 1 strain, and its comparison to the two other human pathogenic Brucella species and to B. abortus field isolate 9-941. The global distribution of pseudogenes, deletions, and insertions supports previous indications that B. abortus and B. melitensis share a common ancestor that diverged from B. suis. With the exception of a dozen genes, the genetic complements of both B. abortus strains are identical, whereas the three species differ in gene content and pseudogenes. The pattern of species-specific gene inactivations affecting transcriptional regulators and outer membrane proteins suggests that these inactivations may play an important role in the establishment of host specificity and may have been a primary driver of speciation in the genus Brucella. Despite being nonmotile, the brucellae contain flagellum gene clusters and display species-specific flagellar gene inactivations, which lead to the putative generation of different versions of flagellum-derived structures and may contribute to differences in host specificity and virulence. Metabolic changes such as the lack of complete metabolic pathways for the synthesis of numerous compounds (e.g., glycogen, biotin, NAD, and choline) are consistent with adaptation of brucellae to an intracellular life-style.

Journal Article
TL;DR: It is concluded that MULE-mediated host gene duplication results in the formation of pseudogenes, not novel functional protein-coding genes; however, the transcribed duplications possess characteristics consistent with a potential role in the regulation of host gene expression.
Abstract: DNA transposons are known to frequently capture duplicated fragments of host genes. The evolutionary impact of this phenomenon depends on how frequently the fragments retain protein-coding function as opposed to becoming pseudogenes. Gene fragment duplication by Mutator-like elements (MULEs) has previously been documented in maize, Arabidopsis, and rice. Here we present a rigorous genome-wide analysis of MULEs in the model plant Oryza sativa (domesticated rice). We identify 8274 MULEs with intact termini and target-site duplications (TSDs) and show that 1337 of them contain duplicated host gene fragments. Through a detailed examination of the 5% of duplicated gene fragments that are transcribed, we demonstrate that virtually all cases contain pseudogenic features such as fragmented conserved protein domains, frameshifts, and premature stop codons. In addition, we show that the distribution of the ratio of nonsynonymous to synonymous amino acid substitution rates for the duplications agrees with the expected distribution for pseudogenes. We conclude that MULE-mediated host gene duplication results in the formation of pseudogenes, not novel functional protein-coding genes; however, the transcribed duplications possess characteristics consistent with a potential role in the regulation of host gene expression.
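Two of the pseudogenic features named in this abstract, frameshifts and premature stop codons, can be illustrated with a toy check. The sketch below is not the authors' pipeline. The function name `pseudogenic_features` and the example sequences are hypothetical, the standard genetic code is assumed, and a sequence length that is not a multiple of three is used only as a crude proxy for a frameshift (a real analysis would align the fragment against its parent gene).

```python
# Toy illustration only: flags simple disablements in a putative coding sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def pseudogenic_features(cds: str) -> dict:
    """Report simple disablements in a DNA sequence read in frame 0."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Any stop codon before the last complete codon is treated as premature.
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    frame_broken = len(seq) % 3 != 0  # crude frameshift proxy (assumption)
    return {
        "length_not_multiple_of_3": frame_broken,
        "premature_stop_codon_indices": premature,
        "looks_disabled": frame_broken or bool(premature),
    }


if __name__ == "__main__":
    print(pseudogenic_features("ATGGCTGAATTTTAA"))  # intact toy ORF
    print(pseudogenic_features("ATGTGAGAATTTTAA"))  # in-frame stop at codon 1
    print(pseudogenic_features("ATGGCGAATTTTAA"))   # one base deleted
```

The check reads only a single frame and does not look for fragmented protein domains, so it covers just part of the evidence the abstract describes.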

Journal Article
TL;DR: The transcribed processed pseudogene (TPΨg), which is disabled but nonetheless transcribed, is identified in human; TPΨgs are unlike other PΨgs and processed genes in two ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; (ii) only 5% of human TPΨgs have potential orthologs in mouse, which indicates that the vast majority of TPΨgs are lineage specific.
Abstract: Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (PΨgs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human PΨgs and also dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPΨg), which is disabled but nonetheless transcribed. TPΨgs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPΨg. To elucidate their role, we identified human TPΨgs by mapping expressed sequences onto PΨgs and, reciprocally, extracting TPΨgs from known mRNAs. We consider only those PΨgs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further expression verification. Overall, we find 166–233 TPΨgs (~4–6% of PΨgs). Proteins/transcripts with the highest numbers of homologous TPΨgs generally have many homologous PΨgs and are abundantly expressed. TPΨgs are significantly overrepresented near both the 5′ and 3′ ends of genes; this suggests that TPΨgs can be formed through gene-promoter co-option, or intrusion into untranslated regions. However, roughly half of the TPΨgs are located away from genes in the intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPΨgs are unlike other PΨgs and processed genes in the following ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; (ii) only 5% of human TPΨgs have potential orthologs in mouse. This latter finding indicates that the vast majority of TPΨgs is lineage specific. This is likely linked to well-documented extensive lineage-specific SINE/LINE activity. The list of TPΨgs is available at:

Journal ArticleDOI
TL;DR: It is reported that an Oct4 pseudogene on human chromosome 10 (Oct4-pg5) and a pseudogene on chromosome 8 (Oct4-pg1) were transcribed in the cancer cell lines and cancer tissues tested, but were not found to be transcribed in the embryonic carcinoma cells, human fibroblasts, and normal tissues tested.

Journal ArticleDOI
TL;DR: The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae, suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species.
Abstract: The obligate intracellular bacterial pathogen Chlamydophila abortus strain S26/3 (formerly the abortion subtype of Chlamydia psittaci) is an important cause of late gestation abortions in ruminants and pigs. Furthermore, although relatively rare, zoonotic infection can result in acute illness and miscarriage in pregnant women. The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae. The 1,144,377-bp genome contains 961 predicted coding sequences, 842 of which are conserved with those of Chlamydophila caviae and Chlamydophila pneumoniae. Within this conserved Cp. abortus core genome we have identified the major regions of variation and have focused our analysis on these loci, several of which were found to encode highly variable protein families, such as TMH/Inc and Pmp families, which are strong candidates for the source of diversity in host tropism and disease causation in this group of organisms. Significantly, Cp. abortus lacks any toxin genes, and also lacks genes involved in tryptophan metabolism and nucleotide salvaging (guaB is present as a pseudogene), suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species.

Journal ArticleDOI
TL;DR: The results demonstrate that the 5S family in filamentous fungi of the subphylum Pezizomycotina is characterized by birth-and-death evolution under strong purifying selection, and suggest that birth-and-death evolution occurs at different rates in the genera examined.
Abstract: In eukaryotes, the primary components of the ribosome are encoded by multicopy nuclear ribosomal RNA (rRNA) genes: 28/26S, 18S, 5.8S, and 5S. Copies of these genes are typically localized within tandem arrays and homogenized within a genome. As a result, nuclear rRNA gene families have become a paradigm of concerted evolution. In filamentous fungi of the subphylum Pezizomycotina, 5S rRNA genes exist as a large and dispersed multigene family, with between 50 and 100 copies per genome. To determine whether these genes defy the concerted evolution paradigm, we examined the patterns of evolution of these genes by using sequences from the complete genomes of four species. Analyses of these sequences revealed (i) multiple 5S gene types within a genome, (ii) interspecies clustering of gene types, (iii) multiple identical gene types shared among species, (iv) multiple pseudogenes within a genome, and (v) presence/absence variation of individual 5S copies in comparisons of closely related species. These results demonstrate that the 5S family in these species is characterized by birth-and-death evolution under strong purifying selection. Furthermore, our results suggest that birth-and-death evolution occurs at different rates in the genera examined, and that the multiplication and movement of 5S genes across the genome are highly dynamic. As such, we hypothesize that a mechanism resembling retroposition controls 5S rRNA gene amplification, dispersal, and integration in the genomes of filamentous fungi.

Journal ArticleDOI
TL;DR: The complete 1,516,355-bp sequence of the type strain, the stock derived from the South African Welgevonden isolate, is reported, showing a large number of tandemly repeated and duplicated sequences, some of continuously variable copy number, which contributes to the low proportion of coding sequence.
Abstract: Heartwater, a tick-borne disease of domestic and wild ruminants, is caused by the intracellular rickettsia Ehrlichia ruminantium (previously known as Cowdria ruminantium). It is a major constraint to livestock production throughout sub-Saharan Africa, and it threatens to invade the Americas, yet there is no immediate prospect of an effective vaccine. A shotgun genome sequencing project was undertaken in the expectation that access to the complete protein coding repertoire of the organism will facilitate the search for vaccine candidate genes. We report here the complete 1,516,355-bp sequence of the type strain, the stock derived from the South African Welgevonden isolate. Only 62% of the genome is predicted to be coding sequence, encoding 888 proteins and 41 stable RNA species. The most striking feature is the large number of tandemly repeated and duplicated sequences, some of continuously variable copy number, which contributes to the low proportion of coding sequence. These repeats have mediated numerous translocation and inversion events that have resulted in the duplication and truncation of some genes and have also given rise to new genes. There are 32 predicted pseudogenes, most of which are truncated fragments of genes associated with repeats. Rather than being the result of the reductive evolution seen in other intracellular bacteria, these pseudogenes appear to be the product of ongoing sequence duplication events.

Journal ArticleDOI
01 Sep 2005-Nature
TL;DR: DNA sequences of unique, Y-linked genes in chimpanzee and human, which diverged about six million years ago, are compared to find evidence that in the human lineage, all such genes were conserved through purifying selection.
Abstract: The human Y chromosome, transmitted clonally through males, contains far fewer genes than the sexually recombining autosome from which it evolved. The enormity of this evolutionary decline has led to predictions that the Y chromosome will be completely bereft of functional genes within ten million years. Although recent evidence of gene conversion within massive Y-linked palindromes runs counter to this hypothesis, most unique Y-linked genes are not situated in palindromes and have no gene conversion partners. The 'impending demise' hypothesis thus rests on understanding the degree of conservation of these genes. Here we find, by systematically comparing the DNA sequences of unique, Y-linked genes in chimpanzee and human, which diverged about six million years ago, evidence that in the human lineage, all such genes were conserved through purifying selection. In the chimpanzee lineage, by contrast, several genes have sustained inactivating mutations. Gene decay in the chimpanzee lineage might be a consequence of positive selection focused elsewhere on the Y chromosome and driven by sperm competition.

Journal ArticleDOI
TL;DR: Although many of the comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.
Abstract: Pseudogenes are now known to be a regular feature of bacterial genomes and are found in particularly high numbers within the genomes of recently emerged bacterial pathogens. As most pseudogenes are recognized by sequence alignments, we use newly available genomic sequences to identify the pseudogenes in 11 genomes from 4 bacterial genera, each of which contains at least 1 human pathogen. The numbers of pseudogenes range from 27 in Staphylococcus aureus MW2 to 337 in Yersinia pestis CO92 (i.e. 1–8% of the annotated genes in the genome). Most pseudogenes are formed by small frameshifting indels, but because stop codons are A + T-rich, the two low-G + C Gram-positive taxa (Streptococcus and Staphylococcus) have relatively high fractions of pseudogenes generated by nonsense mutations when compared with more G + C-rich genomes. Over half of the pseudogenes are produced from genes whose original functions were annotated as ‘hypothetical’ or ‘unknown’; however, several broadly distributed genes involved in nucleotide processing, repair or replication have become pseudogenes in one of the sequenced Vibrio vulnificus genomes. Although many of our comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.
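
The remark that stop codons are A + T-rich can be checked with a few lines of Python. This is a minimal illustrative sketch, not the authors' pipeline: it assumes the standard genetic code (stop codons TAA, TAG, TGA), and all function names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical and not taken from the paper.
# Assumes the standard genetic code, whose stop codons are TAA, TAG and TGA.
STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C bases in a DNA string."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def mean_stop_codon_gc() -> float:
    """Average G + C fraction over the three stop codons."""
    return sum(gc_fraction(c) for c in STOP_CODONS) / len(STOP_CODONS)


def single_substitution_stops(codon: str) -> int:
    """Count the single-base substitutions that turn `codon` into a stop codon."""
    codon = codon.upper()
    count = 0
    for i in range(3):
        for base in "ACGT":
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if mutant in STOP_CODONS:
                    count += 1
    return count
```

Under these assumptions the stop codons average about 22% G + C, and an A + T-rich codon such as TAT is one substitution away from two different stop codons while GGG is one substitution away from none, which is consistent with the abstract's explanation for the higher share of nonsense mutations in low-G + C genomes.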

Journal ArticleDOI
TL;DR: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome, with special emphasis on the final annotation release (version 5).
Abstract: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications. Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5). Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.

Journal ArticleDOI
TL;DR: A complete or almost complete list of OR genes in the dog and the rat is established and the sequences of these genes within and between the two species are compared.
Abstract: Dogs and rats have a highly developed capability to detect and identify odorant molecules, even at minute concentrations. Previous analyses have shown that the olfactory receptors (ORs) that specifically bind odorant molecules are encoded by the largest gene family sequenced in mammals so far. We identified five amino acid patterns characteristic of ORs in the recently sequenced boxer dog and brown Norway rat genomes. Using these patterns, we retrieved 1,094 dog genes and 1,493 rat genes from these shotgun sequences. The retrieved sequences constitute the olfactory receptor repertoires of these two animals. Subsets of 20.3% (for the dog) and 19.5% (for the rat) of these genes were annotated as pseudogenes as they had one or several mutations interrupting their open reading frames. We performed phylogenetic studies and organized these two repertoires into classes, families and subfamilies. We have established a complete or almost complete list of OR genes in the dog and the rat and have compared the sequences of these genes within and between the two species. Our results provide insight into the evolutionary development of these genes and the local amplifications that have led to the specific amplification of many subfamilies. We have also compared the dog and rat ORs with the human and mouse OR repertoires.
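
The criterion of mutations interrupting an open reading frame can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' annotation procedure: it treats a sequence as interrupted if its length is not a multiple of three (a crude frameshift proxy) or if an in-frame stop codon occurs before the last codon. All names are hypothetical, and the counts are only approximations implied by the rounded percentages in the abstract.

```python
# Simplified illustration; not the authors' annotation procedure.
# All names are hypothetical. Assumes the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def looks_interrupted(cds: str) -> bool:
    """Crude pseudogene flag: frameshifted length or premature stop."""
    return len(cds) % 3 != 0 or has_premature_stop(cds)


def approx_count(total: int, percent: float) -> int:
    """Approximate gene count implied by a rounded percentage."""
    return round(total * percent / 100)


# Approximate pseudogene counts implied by the abstract's percentages.
dog_pseudogenes = approx_count(1094, 20.3)
rat_pseudogenes = approx_count(1493, 19.5)
```

With the figures quoted above, this gives roughly 222 dog and 291 rat OR pseudogenes; the exact counts reported in the paper may differ slightly because the percentages are rounded.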

Journal ArticleDOI
01 May 2005-Genetics
TL;DR: The results show that primates have accumulated more pseudogenes than mice after their separation from the common ancestor and that lineage-specific pseudogenization becomes more conspicuous in humans than in nonhuman primates.
Abstract: Since the process of becoming dead genes or pseudogenes (pseudogenization) is irreversible and can occur rather rapidly under certain environmental circumstances, it is one plausible determinant for characterizing species specificity. To test this evolutionary hypothesis, we analyzed the tempo and mode of duplication and pseudogenization of bitter taste receptor (T2R) genes in humans as well as in 12 nonhuman primates. The results show that primates have accumulated more pseudogenes than mice after their separation from the common ancestor and that lineage-specific pseudogenization becomes more conspicuous in humans than in nonhuman primates. Although positive selection has operated on some amino acids in extracellular domains, functional constraints against T2R genes are more relaxed in primates than in mice and this trend has culminated in the rapid deterioration of the bitter-tasting capability in humans. Since T2R molecules play an important role in avoiding generally bitter toxic and harmful substances, substantial modification of the T2R gene repertoire is likely to reflect different responses to changes in the environment and to result from species-specific food preference during primate evolution.

Journal ArticleDOI
01 Sep 2005-Genomics
TL;DR: The first global draft of the V2r gene repertoire is reported, composed of approximately 200 genes and pseudogenes, and opens the door to genomic-level studies of the structure, function, and evolution of this diverse group of sensory receptors.

Journal ArticleDOI
TL;DR: Phylogenetic analyses of OsWAKs, Arabidopsis WAK/WAK-Likes, and barley (Hordeum vulgare) HvWAKs show that the OsWAK gene family expanded in the rice genome due to lineage-specific expansion of the family in monocots.
Abstract: The wall-associated kinase (WAK) gene family, one of the receptor-like kinase (RLK) gene families in plants, plays important roles in cell expansion, pathogen resistance, and heavy-metal stress tolerance in Arabidopsis (Arabidopsis thaliana). Through a reiterative database search and manual reannotation, we identified 125 OsWAK gene family members from rice (Oryza sativa) japonica cv Nipponbare; 37 (approximately 30%) OsWAKs were corrected/reannotated from earlier automated annotations. Of the 125 OsWAKs, 67 are receptor-like kinases, 28 receptor-like cytoplasmic kinases, 13 receptor-like proteins, 12 short genes, and five pseudogenes. The two-intron gene structure of the Arabidopsis WAK/WAK-Likes is generally conserved in OsWAKs; however, extra/missed introns were observed in some OsWAKs either in extracellular regions or in protein kinase domains. In addition to the 38 OsWAKs with full-length cDNA sequences and the 11 with rice expressed sequence tag sequences, gene expression analyses, using tiling-microarray analysis of the 20 OsWAKs on chromosome 10 and reverse transcription-PCR analysis for five OsWAKs, indicate that the majority of identified OsWAKs are likely expressed in rice. Phylogenetic analyses of OsWAKs, Arabidopsis WAK/WAK-Likes, and barley (Hordeum vulgare) HvWAKs show that the OsWAK gene family expanded in the rice genome due to lineage-specific expansion of the family in monocots. Localized gene duplications appear to be the primary genetic event in OsWAK gene family expansion and the 125 OsWAKs, present on all 12 chromosomes, are mostly clustered.

Journal ArticleDOI
TL;DR: Differential methylation of genic and non-genic sequences was observed in all species tested, from non-vascular to vascular plants, but in some cases, such as wheat and pine, a lower than expected level of enrichment was observed.
Abstract: The hypomethylated fraction of plant genomes is usually enriched in genes and can be selectively cloned using methylation filtration (MF). Therefore, MF has been used as a gene enrichment technology in sorghum and maize, where gene enrichment was proportional to genome size. Here we apply MF to a broad variety of plant species spanning a wide range of genome sizes. Differential methylation of genic and non-genic sequences was observed in all species tested, from non-vascular to vascular plants, but in some cases, such as wheat and pine, a lower than expected level of enrichment was observed. Remarkably, hexaploid wheat and pine show a dramatically large number of gene-like sequences relative to other plants. In hexaploid wheat, this apparent excess of genes may reflect an abundance of methylated pseudogenes, which may thus be more prevalent in recent polyploids.