Showing papers on "Pseudogene" published in 2004


Journal ArticleDOI
21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations


Journal ArticleDOI
TL;DR: The cytochrome P450 (CYP) gene superfamily is summarized and complete identification of all pseudogene sequences is likely to be clinically important, because some of these highly similar exons can interfere with PCR-based genotyping assays.
Abstract: Objectives: Completion of both the mouse and human genome sequences in the private and public sectors has prompted comparison between the two species at multiple levels. This review summarizes the cytochrome P450 (CYP) gene superfamily. For the first time, we have the ability to compare complete sets

971 citations


Journal ArticleDOI
TL;DR: This work analyzed the structural variations, which are the basis of functional diversification, as well as the genomic organization of the S100 family in human, compared it with the S100 repertoires in mouse and rat, and identified evolutionarily related subgroups of S100 proteins within the three species.

824 citations


Journal ArticleDOI
TL;DR: The complete genomic sequence of Y. pseudotuberculosis IP32953 is reported and provides a sobering example of how a highly virulent epidemic clone can suddenly emerge from a less virulent, closely related progenitor.
Abstract: Yersinia pestis, the causative agent of plague, is a highly uniform clone that diverged recently from the enteric pathogen Yersinia pseudotuberculosis. Despite their close genetic relationship, they differ radically in their pathogenicity and transmission. Here, we report the complete genomic sequence of Y. pseudotuberculosis IP32953 and its use for detailed genome comparisons with available Y. pestis sequences. Analyses of identified differences across a panel of Yersinia isolates from around the world reveal 32 Y. pestis chromosomal genes that, together with the two Y. pestis-specific plasmids, to our knowledge, represent the only new genetic material in Y. pestis acquired since the divergence from Y. pseudotuberculosis. In contrast, 149 other pseudogenes (doubling the previous estimate) and 317 genes absent from Y. pestis were detected, indicating that as many as 13% of Y. pseudotuberculosis genes no longer function in Y. pestis. Extensive insertion sequence-mediated genome rearrangements and reductive evolution through massive gene loss, resulting in elimination and modification of preexisting gene expression pathways, appear to be more important than acquisition of genes in the evolution of Y. pestis. These results provide a sobering example of how a highly virulent epidemic clone can suddenly emerge from a less virulent, closely related progenitor.

599 citations


Journal ArticleDOI
TL;DR: Types of odorant structures that may be recognized by some subfamilies were predicted by identifying subfamilies that contain ORs with known odor ligands or human homologs of such ORs, and most subfamilies are encoded by a single chromosomal locus.
Abstract: Humans perceive an immense variety of chemicals as having distinct odors. Odor perception initiates in the nose, where odorants are detected by a large family of olfactory receptors (ORs). ORs have diverse protein sequences but can be assigned to subfamilies on the basis of sequence relationships. Members of the same subfamily have related sequences and are likely to recognize structurally related odorants. To gain insight into the mechanisms underlying odor perception, we analyzed the human OR gene family. By searching the human genome database, we identified 339 intact OR genes and 297 OR pseudogenes. Determination of their genomic locations showed that OR genes are unevenly distributed among 51 different loci on 21 human chromosomes. Sequence comparisons showed that the human OR family is composed of 172 subfamilies. Types of odorant structures that may be recognized by some subfamilies were predicted by identifying subfamilies that contain ORs with known odor ligands or human homologs of such ORs. Analysis of the chromosomal locations of members of each OR subfamily revealed that most subfamilies are encoded by a single chromosomal locus. Moreover, many loci encode only one or a few subfamilies, suggesting that different parts of the genome may, to some extent, be involved in the detection of different types of odorant structural motifs.

551 citations


Journal ArticleDOI
TL;DR: Analysis of 13 eukaryotic species with sequenced mitochondrial and nuclear genomes reveals a large interspecific variation of NUMT number and size.
Abstract: Mitochondrial DNA sequences are frequently transferred to the nucleus giving rise to the so-called nuclear mitochondrial DNA (NUMT). Analysis of 13 eukaryotic species with sequenced mitochondrial and nuclear genomes reveals a large interspecific variation of NUMT number and size. Copy number ranges from none or few copies in Anopheles, Caenorhabditis, Plasmodium, Drosophila, and Fugu to more than 500 in human, rice, and Arabidopsis. The average size is between 62 (baker's yeast) and 647 bps (Neurospora), respectively. A correlation between the abundance of NUMTs and the size of the nuclear or the mitochondrial genomes, or of the nuclear gene density, is not evident. Other factors, such as the number and/or stability of mitochondria in the germline, or species-specific mechanisms controlling accumulation/loss of nuclear DNA, might be responsible for the interspecific diversity in NUMT accumulation.

450 citations


Journal ArticleDOI
TL;DR: This work estimates the proportion of OR pseudogenes in 19 primate species by surveying randomly chosen subsets of 100 OR genes from each species and finds that apes, Old World monkeys and one New World monkey, the howler monkey, have a significantly higher proportion of OR pseudogenes than other New World monkeys or the lemur.
Abstract: Olfactory receptor (OR) genes constitute the molecular basis for the sense of smell and are encoded by the largest gene family in mammalian genomes. Previous studies suggested that the proportion of pseudogenes in the OR gene family is significantly larger in humans than in other apes and significantly larger in apes than in the mouse. To investigate the process of degeneration of the olfactory repertoire in primates, we estimated the proportion of OR pseudogenes in 19 primate species by surveying randomly chosen subsets of 100 OR genes from each species. We find that apes, Old World monkeys and one New World monkey, the howler monkey, have a significantly higher proportion of OR pseudogenes than do other New World monkeys or the lemur (a prosimian). Strikingly, the howler monkey is also the only New World monkey to possess full trichromatic vision, along with Old World monkeys and apes. Our findings suggest that the deterioration of the olfactory repertoire occurred concomitant with the acquisition of full trichromatic color vision in primates.

431 citations


Journal ArticleDOI
TL;DR: In this paper, data mining methods have been used to identify 356 Cyt P450 genes and 99 related pseudogenes in the rice (Oryza sativa) genome using sequence information available from both the indica and japonica strains.
Abstract: Data mining methods have been used to identify 356 Cyt P450 genes and 99 related pseudogenes in the rice (Oryza sativa) genome using sequence information available from both the indica and japonica strains. Because neither of these genomes is completely available, some genes have been identified in only one strain, and 28 genes remain incomplete. Comparison of these rice genes with the 246 P450 genes and 26 pseudogenes in the Arabidopsis genome has indicated that most of the known plant P450 families existed before the monocot-dicot divergence that occurred approximately 200 million years ago. Comparative analysis of P450s in the Pinus expressed sequence tag collections has identified P450 families that predated the separation of gymnosperms and flowering plants. Complete mapping of all available plant P450s onto the Deep Green consensus plant phylogeny highlights certain lineage-specific families maintained (CYP80 in Ranunculales) and lineage-specific families lost (CYP92 in Arabidopsis) in the course of evolution.

426 citations


Journal ArticleDOI
TL;DR: This analysis of the mouse OR gene family suggests that humans and mice recognize many of the same odorant structural motifs, but mice may be superior in odor sensitivity and discrimination.
Abstract: In mammals, odor detection in the nose is mediated by a diverse family of olfactory receptors (ORs), which are used combinatorially to detect different odorants and encode their identities. The OR family can be divided into subfamilies whose members are highly related and are likely to recognize structurally related odorants. To gain further insight into the mechanisms underlying odor detection, we analyzed the mouse OR gene family. Exhaustive searches of a mouse genome database identified 913 intact OR genes and 296 OR pseudogenes. These genes were localized to 51 different loci on 17 chromosomes. Sequence comparisons showed that the mouse OR family contains 241 subfamilies. Subfamily sizes vary extensively, suggesting that some classes of odorants may be more easily detected or discriminated than others. Determination of subfamilies that contain ORs with identified ligands allowed tentative functional predictions for 19 subfamilies. Analysis of the chromosomal locations of members of each subfamily showed that many OR gene loci encode only one or a few subfamilies. Furthermore, most subfamilies are encoded by a single locus, suggesting that different loci may encode receptors for different types of odorant structural features. Comparison of human and mouse OR subfamilies showed that the two species have many, but not all, subfamilies in common. However, mouse subfamilies are usually larger than their human counterparts. This finding suggests that humans and mice recognize many of the same odorant structural motifs, but mice may be superior in odor sensitivity and discrimination.
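The gene counts in this abstract and in the human OR entry earlier in this listing (339 intact genes, 297 pseudogenes) allow a quick comparison of the pseudogene share in each repertoire. The following is a minimal sketch of that arithmetic, not part of either paper; the function name `pseudogene_fraction` is a hypothetical helper introduced here for illustration.

```python
# Counts quoted in the two OR abstracts in this listing.
# pseudogene_fraction is a hypothetical helper, not from either paper.

def pseudogene_fraction(intact: int, pseudo: int) -> float:
    """Share of the OR repertoire that consists of pseudogenes."""
    return pseudo / (intact + pseudo)

human = pseudogene_fraction(intact=339, pseudo=297)  # human OR entry
mouse = pseudogene_fraction(intact=913, pseudo=296)  # mouse OR entry

print(f"human: {human:.1%}, mouse: {mouse:.1%}")
# prints "human: 46.7%, mouse: 24.5%"
```

On these counts, the two species carry almost the same number of OR pseudogenes, but pseudogenes make up a much larger share of the smaller human repertoire.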

361 citations


Journal ArticleDOI
TL;DR: The results suggest that the majority of expression differences observed between species are selectively neutral or nearly neutral and likely to be of little or no functional significance, so the identification of expression differences fixed by selection should be based on null hypotheses assuming functional neutrality.
Abstract: Microarray technologies allow the identification of large numbers of expression differences within and between species. Although environmental and physiological stimuli are clearly responsible for changes in the expression levels of many genes, it is not known whether the majority of changes of gene expression fixed during evolution between species and between various tissues within a species are caused by Darwinian selection or by stochastic processes. We find the following: (1) expression differences between species accumulate approximately linearly with time; (2) gene expression variation among individuals within a species correlates positively with expression divergence between species; (3) rates of expression divergence between species do not differ significantly between intact genes and expressed pseudogenes; (4) expression differences between brain regions within a species have accumulated approximately linearly with time since these regions emerged during evolution. These results suggest that the majority of expression differences observed between species are selectively neutral or nearly neutral and likely to be of little or no functional significance. Therefore, the identification of gene expression differences between species fixed by selection should be based on null hypotheses assuming functional neutrality. Furthermore, it may be possible to apply a molecular clock based on expression differences to infer the evolutionary history of tissues.

357 citations


Journal ArticleDOI
05 Aug 2004-Nature
TL;DR: A direct and unbiased estimate of the nuclear mutation rate and its molecular spectrum is provided with a set of C. elegans mutation-accumulation lines that reveal a mutation rate about tenfold higher than previous indirect estimates and an excess of insertions over deletions.
Abstract: Mutations have pivotal functions in the onset of genetic diseases and are the fundamental substrate for evolution. However, present estimates of the spontaneous mutation rate and spectrum are derived from indirect and biased measurements. For instance, mutation rate estimates for Caenorhabditis elegans are extrapolated from observations on a few genetic loci with visible phenotypes and vary over an order of magnitude [1]. Alternative approaches in mammals, relying on phylogenetic comparisons of pseudogene loci [2] and fourfold degenerate codon positions [3], suffer from uncertainties in the actual number of generations separating the compared species and the inability to exclude biases associated with natural selection. Here we provide a direct and unbiased estimate of the nuclear mutation rate and its molecular spectrum with a set of C. elegans mutation-accumulation lines that reveal a mutation rate about tenfold higher than previous indirect estimates and an excess of insertions over deletions. Because deletions dominate patterns of C. elegans pseudogene variation [4,5], our observations indicate that natural selection might be significant in promoting small genome size, and challenge the prevalent assumption that pseudogene divergence accurately reflects the spontaneous mutation spectrum.

Journal ArticleDOI
TL;DR: The human major histocompatibility genomic region at chromosomal position 6p21 encodes the six classical transplantation HLA genes and many other genes that have important roles in the regulation of the immune system as well as in some fundamental cellular processes.
Abstract: The human major histocompatibility (MHC) genomic region at chromosomal position 6p21 encodes the six classical transplantation HLA genes and many other genes that have important roles in the regulation of the immune system as well as in some fundamental cellular processes. This small segment of the human genome has been associated with more than 100 diseases, including common diseases such as diabetes, rheumatoid arthritis, psoriasis, asthma and various autoimmune disorders. The MHC 3.6 Mb genomic sequence was first reported in 1999 with the annotation of 224 gene loci. The locus and allelic information of the MHC continue to be updated by identifying newly mapped expressed genes and pseudogenes based on comparative genomics, SNP analysis and cDNA projects. Since 1999, innovations in bioinformatics and gene-specific functional databases and studies on the MHC genes have resulted in numerous changes to gene names and better ways to update and link the MHC gene symbols, names and sequences together with function, variation and disease associations. In this study, we present a brief overview of the MHC genomic structure and the recent information that we have gathered on the MHC gene loci via LocusLink at the National Center for Biotechnology Information (http://www.ncbi.nih.gov/) and the MHC genes' association with various diseases taken from publications and records in public databases, such as the Online Mendelian Inheritance in Man and the Genetic Association Database.

Journal ArticleDOI
Panos Deloukas, M Earthrowl, Darren Grafham, Marc Rubenfield, Lisa French, Charles A. Steward, Sarah Sims, Matthew Jones, S. Searle, Carol Scott, Kerstin Howe, Sarah E. Hunt, T D Andrews, James G. R. Gilbert, David Swarbreck, Jennifer L. Ashurst, A Taylor, J Battles, Christine P. Bird, R Ainscough, J P Almeida, R I S Ashwell, K D Ambrose, A K Babbage, C L Bagguley, J Bailey, Ruby Banerjee, K Bates, Helen Beasley, S Bray-Allen, A J Brown, J Y Brown, D C Burford, W Burrill, John Burton, Patrick Cahill, D Camire, Nigel P. Carter, J C Chapman, S Y Clark, G Clarke, C M Clee, S. M. Clegg, N Corby, Alan Coulson, Pawandeep Dhami, I Dutta, Matthew Dunn, L M Faulkner, Adam Frankish, J Frankland, P Garner, J Garnett, Susan M. Gribble, C Griffiths, Russell J. Grocock, Erik Gustafson, S Hammond, Joanna Harley, E. Hart, Paul Heath, T P Ho, B Hopkins, J Horne, Philip Howden, Elizabeth J. Huckle, C Hynds, Chris Johnson, David W. Johnson, A Kana, M. Kay, A M Kimberley, J K Kershaw, M Kokkinaki, Gavin K. Laird, S Lawlor, H M Lee, Daniel Leongamornlert, G Laird, Christine Lloyd, D. M. Lloyd, Jane E. Loveland, J Lovell, Stuart McLaren, Kirsten McLay, Amanda McMurray, M Mashreghi-Mohammadi, Lucy Matthews, Sarah Milne, T Nickerson, M Nguyen, E K Overton-Larty, Sophie Palmer, A. V. Pearce, A I Peck, Sarah Pelan, Benjamin Phillimore, K M Porter, Catherine M. Rice, A Rogosin, Mark T. Ross, Theologia Sarafidou, Harminder Sehra, Ratna Shownkeen, C. D. Skuce, Michelle Smith, L Standring, N Sycamore, J Tester, A Thorpe, W Torcasso, Alan Tracey, A Tromans, J Tsolas, Melanie M. Wall, J Walsh, H Wang, Keith Weinstock, Anthony P. West, David Willey, S. Whitehead, Laurens G. Wilming, Paul Wray, L Young, Yuan Chen, Ruth C. Lovering, Nicholas K. Moschonas, Reiner Siebert, Kim Fechtel, David Bentley, Richard Durbin, Tim Hubbard, Lynn Doucette-Stamm, Stephan Beck, Douglas Smith, Jane Rogers
27 May 2004-Nature
TL;DR: Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.
Abstract: Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.

Journal ArticleDOI
TL;DR: Comparisons of genomic features with those of closely related bacteria retaining free-living stages indicate that rapid evolutionary change often occurs immediately after host restriction, which represents a general syndrome of genome evolution.

Journal ArticleDOI
TL;DR: FusionDB constitutes a resource dedicated to in-depth analysis of bacterial and archaeal gene fusion events, which can provide the 'Rosetta stone' in the search for potential protein-protein interactions, as well as metabolic and regulatory networks.
Abstract: FusionDB (http://igs-server.cnrs-mrs.fr/FusionDB/) constitutes a resource dedicated to in-depth analysis of bacterial and archaeal gene fusion events. Such events can provide the 'Rosetta stone' in the search for potential protein-protein interactions, as well as metabolic and regulatory networks. However, the false positive rate of this approach may be quite high, prompting a detailed scrutiny of putative gene fusion events. FusionDB readily provides much of the information required for that task. Moreover, FusionDB extends the notion of gene fusion from that of a single gene to that of a family of genes by assembling pairs of genes from different genomes that belong to the same Cluster of Orthologous Groups (COG). Multiple sequence alignments and phylogenetic tree reconstruction for the N- and C-terminal parts of these 'COG fusion' events are provided to distinguish single and multiple fusion events from cases of gene fission, pseudogenes and other false positives. Finally, gene fusion events with matches to known structures of heterodimers in the Protein Data Bank (PDB) are identified and may be visualized. FusionDB is fully searchable with access to sequence and alignment data at all levels. A number of different scores are provided to easily differentiate 'real' from 'questionable' cases, especially when larger database searches are performed. FusionDB is cross-linked with the 'Phylogenomic Display of Bacterial Genes' (PhydBac) online web server. Together, these servers provide the complete set of information required for in-depth analysis of non-homology-based gene function attribution.

Journal ArticleDOI
TL;DR: This analysis explored the sequence of the human genome to define the composition of the PTP family and discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts.
Abstract: The protein tyrosine phosphatases (PTPs) are now recognized as critical regulators of signal transduction under normal and pathophysiological conditions. In this analysis we have explored the sequence of the human genome to define the composition of the PTP family. Using public and proprietary sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing and provide predicted amino acid sequences for four human PTPs that are currently defined by fragments only. Finally, we correlated each PTP locus with genetic disease markers and identified 4 PTPs that map to known susceptibility loci for type 2 diabetes and 19 PTPs that map to regions frequently deleted in human cancers. We have made our analysis available at http://ptp.cshl.edu or http://science.novonordisk.com/ptp and we hope this resource will facilitate the functional characterization of these key enzymes.

Journal ArticleDOI
TL;DR: Transfer of a cDNA corresponding to the chimeric Trim5-CypA transcript into human cells confers cyclosporin A-sensitive resistance to HIV-1 infection; the restriction factor appears to be a chimeric protein created by retrotransposon-mediated exon shuffling.
Abstract: Lv1 restriction of HIV-1 in the cells of Old World monkeys is associated with the expression of the Trim5 gene. Uniquely, in owl monkey kidney cells, HIV-1 restriction is dependent on the ability of incoming viral capsid protein to bind cyclophilin A (CypA). Cloning of the owl monkey Trim5 gene now reveals the presence of an inserted CypA pseudogene within intron 7 of the Trim5 gene. This insertion results in the formation of a chimeric Trim5-CypA transcript. Transfer of a cDNA corresponding to this transcript into human cells confers cyclosporin A-sensitive resistance to HIV-1 infection. The restriction factor appears to be a chimeric protein created by retrotransposon-mediated exon shuffling.

Journal ArticleDOI
01 Jan 2004-Genomics
TL;DR: Combined database analysis and cDNA cloning have demonstrated that the primary transcript of the mammalian TDP genes undergoes alternative splicing to generate 11 mRNAs, including the one encoding TDP-43, which provides further support for the functional complexity of the eukaryotic TDP genes.

Journal ArticleDOI
Sean Caenepeel, Glen Charydczak, Sucha Sudarsanam, Tony Hunter, Gerard Manning
TL;DR: The full protein kinase (PK) complement (kinome) of mouse is determined, which includes many novel kinases and corrections or extensions to >150 published sequences, and links 163 kinases to mutant phenotypes and unlocks the use of mouse genetics to determine functions of orthologous human kinases.
Abstract: We have determined the full protein kinase (PK) complement (kinome) of mouse. This set of 540 genes includes many novel kinases and corrections or extensions to >150 published sequences. The mouse has orthologs for 510 of the 518 human PKs. Nonorthologous kinases arise only by retrotransposition and gene decay. Orthologous kinase pairs vary in sequence conservation along their length, creating a map of functionally important regions for every kinase pair. Many species-specific sequence inserts exist and are frequently alternatively spliced, allowing for the creation of evolutionary lineage-specific functions. Ninety-seven kinase pseudogenes were found, all distinct from the 107 human kinase pseudogenes. Chromosomal mapping links 163 kinases to mutant phenotypes and unlocks the use of mouse genetics to determine functions of orthologous human kinases.

Journal ArticleDOI
TL;DR: Completion of the Arabidopsis thaliana genome sequencing project enabled the first determination of the total number of GH Family 1 members in a higher plant; the authors intend to investigate the substrate specificity of each mature hydrolase after its heterologous expression in the Pichia pastoris expression system.
Abstract: In plants, Glycoside Hydrolase (GH) Family 1 beta-glycosidases are believed to play important roles in many diverse processes including chemical defense against herbivory, lignification, hydrolysis of cell wall-derived oligosaccharides during germination, and control of active phytohormone levels. Completion of the Arabidopsis thaliana genome sequencing project has enabled us, for the first time, to determine the total number of Family 1 members in a higher plant. Reiterative database searches revealed a multigene family of 48 members that includes eight probable pseudogenes. Manual reannotation and analysis of the entire family were undertaken to rectify existing misannotations and identify phylogenetic relationships among family members. Forty-seven members (designated BGLU1 through BGLU47) share a common evolutionary origin and were subdivided into approximately 10 subfamilies based on phylogenetic analysis and consideration of intron-exon organizations. The forty-eighth member of this family (At3g06510; sfr2) is a beta-glucosidase-like gene that belongs to a distinct lineage. Information pertaining to expression patterns and potential functions of Arabidopsis GH Family 1 members is presented. To determine the biological function of all family members, we intend to investigate the substrate specificity of each mature hydrolase after its heterologous expression in the Pichia pastoris expression system. To test the validity of this approach, the BGLU44-encoded hydrolase was expressed in P. pastoris and purified to homogeneity. When tested against a wide range of natural and synthetic substrates, this enzyme showed a preference for beta-mannosides including 1,4-beta-D-mannooligosaccharides, suggesting that it may be involved in A. thaliana in degradation of mannans, galactomannans, or glucogalactomannans. Supporting this notion, BGLU44 shared high sequence identity and similar gene organization with tomato endosperm beta-mannosidase and barley seed beta-glucosidase/beta-mannosidase BGQ60.

Journal ArticleDOI
TL;DR: After eliminating genes, introns, ORFs, and plastid-derived DNA, nearly three-fourths of the maize NB mitochondrial genome is still of unknown origin and function.
Abstract: The NB mitochondrial genome found in most fertile varieties of commercial maize (Zea mays subsp. mays) was sequenced. The 569,630-bp genome maps as a circle containing 58 identified genes encoding 33 known proteins, 3 ribosomal RNAs, and 21 tRNAs that recognize 14 amino acids. Among the 22 group II introns identified, 7 are trans-spliced. There are 121 open reading frames (ORFs) of at least 300 bp, only 3 of which exist in the mitochondrial genome of rice (Oryza sativa). In total, the identified mitochondrial genes, pseudogenes, ORFs, and cis-spliced introns extend over 127,555 bp (22.39%) of the genome. Integrated plastid DNA accounts for an additional 25,281 bp (4.44%) of the mitochondrial DNA, and phylogenetic analyses raise the possibility that copy correction with DNA from the plastid is an ongoing process. Although the genome contains six pairs of large repeats that cover 17.35% of the genome, small repeats (20–500 bp) account for only 5.59%, and transposable element sequences are extremely rare. MultiPip alignments show that maize mitochondrial DNA has little sequence similarity with other plant mitochondrial genomes, including that of rice, outside of the known functional genes. After eliminating genes, introns, ORFs, and plastid-derived DNA, nearly three-fourths of the maize NB mitochondrial genome is still of unknown origin and function.
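The percentages in this abstract can be re-derived from the base-pair counts it quotes. The sketch below is only a consistency check on those figures, assuming the annotated features and the integrated plastid DNA do not overlap (the abstract's wording suggests this but does not state it); the `percent` helper and variable names are hypothetical.

```python
# Base-pair figures quoted in the abstract above.
GENOME_BP = 569_630
ANNOTATED_BP = 127_555  # genes, pseudogenes, ORFs, cis-spliced introns
PLASTID_BP = 25_281     # integrated plastid DNA

def percent(part: int, whole: int) -> float:
    """Percentage of `whole` covered by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)

annotated_pct = percent(ANNOTATED_BP, GENOME_BP)
plastid_pct = percent(PLASTID_BP, GENOME_BP)
# Assumption: the two categories do not overlap, so the remainder is
# sequence with no assigned origin or function.
unexplained_pct = percent(GENOME_BP - ANNOTATED_BP - PLASTID_BP, GENOME_BP)

print(annotated_pct, plastid_pct, unexplained_pct)
# prints "22.39 4.44 73.17"
```

The remainder of about 73% matches the abstract's statement that nearly three-fourths of the genome is of unknown origin and function.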

Journal ArticleDOI
TL;DR: Members of phylogenetic subgroups of the class 2 NBS-LRR genes mapped to as many as ten different chromosomes, and the patterns of duplication indicate that these genes were duplicated by many independent genetic events that have occurred continuously through the expansion of the NBS-LRR superfamily and the evolution of the modern rice genome.
Abstract: The availability of the rice genome sequence enabled the global characterization of nucleotide-binding site (NBS)-leucine-rich repeat (LRR) genes, the largest class of plant disease resistance genes. The rice genome carries approximately 500 NBS-LRR genes that are very similar to the non-Toll/interleukin-1 receptor homology region (TIR) class (class 2) genes of Arabidopsis but none that are homologous to the TIR class genes. Over 100 of these genes were predicted to be pseudogenes in the rice cultivar Nipponbare, but some of these are functional in other rice lines. Over 80 other NBS-encoding genes were identified that belonged to four different classes, only two of which are present in dicotyledonous plant sequences present in databases. Map positions of the identified genes show that these genes occur in clusters, many of which included members from distantly related groups. Members of phylogenetic subgroups of the class 2 NBS-LRR genes mapped to as many as ten different chromosomes. The patterns of duplication of the NBS-LRR genes indicate that they were duplicated by many independent genetic events that have occurred continuously through the expansion of the NBS-LRR superfamily and the evolution of the modern rice genome. Genetic events, such as inversions, that inhibit the ability of recently duplicated genes to recombine promote the divergence of their sequences by inhibiting concerted evolution.

Journal ArticleDOI
TL;DR: It is concluded that strain 91001 and other strains isolated from M. brandti might have evolved from ancestral Y. pestis in a different lineage, and that the large genome fragment deletions and some pseudogenes in the 91001 chromosome may contribute to its unique nonpathogenicity to humans and host-specificity.
Abstract: Genomics provides an unprecedented opportunity to probe in minute detail into the genome of one of the world’s most deadly pathogenic bacteria, Yersinia pestis. Here we report the complete genome sequence of Y. pestis strain 91001, a human-avirulent strain isolated from the rodent Brandt’s vole (Microtus brandti). The genome of strain 91001 consists of one chromosome and four plasmids (pPCP1, pCD1, pMT1 and pCRY). The 9609-bp pPCP1 plasmid of strain 91001 is almost identical to the counterparts from reference strains (CO92 and KIM). There are 98 genes in the 70,159-bp range of plasmid pCD1. The 106,642-bp plasmid pMT1 has slightly different architecture compared with the reference ones. pCRY is a novel plasmid discovered in this work. It is 21,742 bp long and harbors a cryptic type IV secretory system. The chromosome of 91001 is 4,595,065 bp in length. Among the 4037 predicted genes, 141 are possible pseudogenes. Due to the rearrangements mediated by insertion elements, the structure of the 91001 chromosome shows dramatic differences compared with CO92 and KIM. Based on the analysis of plasmids and chromosome architectures, pseudogene distribution, nitrate reduction negative mechanism and gene comparison, we conclude that strain 91001 and other strains isolated from M. brandti might have evolved from ancestral Y. pestis in a different lineage. The large genome fragment deletions in the 91001 chromosome and some pseudogenes may contribute to its unique nonpathogenicity to humans and host-specificity.

Journal ArticleDOI
21 Jul 2004-Gene
TL;DR: It is suggested that there are no functional HO-3 genes in rat and that the HO-3a and HO-3b genes are processed pseudogenes derived from HO-2 transcripts.

Journal ArticleDOI
TL;DR: Plant MADS-box genes form a large gene family for transcription factors and are involved in various aspects of developmental processes, including flower development, and the higher rate of birth-and-death evolution in type I genes appeared partly due to a higher frequency of segmental gene duplication and weaker purifying selection.
Abstract: Plant MADS-box genes form a large gene family for transcription factors and are involved in various aspects of developmental processes, including flower development. They are known to be subject to birth-and-death evolution, but the detailed features of this mode of evolution remain unclear. To have a deeper insight into the evolutionary pattern of this gene family, we enumerated all available functional and nonfunctional (pseudogene) MADS-box genes from the Arabidopsis and rice genomes. Plant MADS-box genes can be classified into types I and II genes on the basis of phylogenetic analysis. Conducting extensive homology search and phylogenetic analysis, we found 64 presumed functional and 37 nonfunctional type I genes and 43 presumed functional and 4 nonfunctional type II genes in Arabidopsis. We also found 24 presumed functional and 6 nonfunctional type I genes and 47 presumed functional and 1 nonfunctional type II genes in rice. Our phylogenetic analysis indicated there were at least about four to eight type I genes and ≈15–20 type II genes in the most recent common ancestor of Arabidopsis and rice. It has also been suggested that type I genes have experienced a higher rate of birth-and-death evolution than type II genes in angiosperms. Furthermore, the higher rate of birth-and-death evolution in type I genes appeared partly due to a higher frequency of segmental gene duplication and weaker purifying selection in type I than in type II genes.

Journal ArticleDOI
TL;DR: Complementation analysis by stable potato transformation showed that the gene Gro1-4 conferred resistance to G. rostochiensis pathotype Ro1, and RT-PCR demonstrated that members of the Gro1 gene family are expressed in most potato tissues.
Abstract: The endoparasitic root cyst nematode Globodera rostochiensis causes considerable damage in potato cultivation. In the past, major genes for nematode resistance have been introgressed from related potato species into cultivars. Elucidating the molecular basis of resistance will contribute to the understanding of nematode-plant interactions and assist in breeding nematode-resistant cultivars. The Gro1 resistance locus to G. rostochiensis on potato chromosome VII co-localized with a resistance-gene-like (RGL) DNA marker. This marker was used to isolate from genomic libraries 15 members of a closely related candidate gene family. Analysis of inheritance, linkage mapping, and sequencing reduced the number of candidate genes to three. Complementation analysis by stable potato transformation showed that the gene Gro1-4 conferred resistance to G. rostochiensis pathotype Ro1. Gro1-4 encodes a protein of 1136 amino acids that contains Toll-interleukin 1 receptor (TIR), nucleotide-binding (NB), leucine-rich repeat (LRR) homology domains and a C-terminal domain with unknown function. The deduced Gro1-4 protein differed by 29 amino acid changes from susceptible members of the Gro1 gene family. Sequence characterization of 13 members of the Gro1 gene family revealed putative regulatory elements and a variable microsatellite in the promoter region, insertion of a retrotransposon-like element in the first intron, and a stop codon in the NB coding region of some genes. Sequence analysis of RT-PCR products showed that Gro1-4 is expressed, among other members of the family including putative pseudogenes, in non-infected roots of nematode-resistant plants. RT-PCR also demonstrated that members of the Gro1 gene family are expressed in most potato tissues.

Journal ArticleDOI
TL;DR: The complete genome sequence of R. typhi is presented, and a three-way comparison allowed further in silico analysis of the SpoT split genes, leading the authors to propose that the stringent response system is still functional in these rickettsiae.
Abstract: Rickettsia typhi, the causative agent of murine typhus, is an obligate intracellular bacterium with a life cycle involving both vertebrate and invertebrate hosts. Here we present the complete genome sequence of R. typhi (1,111,496 bp) and compare it to the two published rickettsial genome sequences: R. prowazekii and R. conorii. We identified 877 genes in R. typhi encoding 3 rRNAs, 33 tRNAs, 3 noncoding RNAs, and 838 proteins, 3 of which are frameshifts. In addition, we discovered more than 40 pseudogenes, including the entire cytochrome c oxidase system. The three rickettsial genomes share 775 genes: 23 are found only in R. prowazekii and R. typhi, 15 are found only in R. conorii and R. typhi, and 24 are unique to R. typhi. Although most of the genes are colinear, there is a 35-kb inversion in gene order, which is close to the replication terminus, in R. typhi, compared to R. prowazekii and R. conorii. In addition, we found a 124-kb R. typhi-specific inversion, starting 19 kb from the origin of replication, compared to R. prowazekii and R. conorii. Inversions in this region are also seen in the unpublished genome sequences of R. sibirica and R. rickettsii, indicating that this region is a hot spot for rearrangements. Genome comparisons also revealed a 12-kb insertion in the R. prowazekii genome, relative to R. typhi and R. conorii, which appears to have occurred after the typhus (R. prowazekii and R. typhi) and spotted fever (R. conorii) groups diverged. The three-way comparison allowed further in silico analysis of the SpoT split genes, leading us to propose that the stringent response system is still functional in these rickettsiae.

Journal ArticleDOI
TL;DR: This work has systematically identified approximately 5000 processed pseudogenes in the mouse genome, and estimated that approximately 60% are lineage specific, created after the mouse and human diverged.

Journal ArticleDOI
TL;DR: Changes in anatomical distribution of receptor expression may have played an important part in such functional switching along with changes in receptor structures and ligand preferences as shown by information from different classes of vertebrates.

Journal ArticleDOI
04 Mar 2004-Oncogene
TL;DR: The present data document that mda-5 is a novel type I IFN-inducible gene, which may contribute to apoptosis induction during terminal differentiation and during IFN treatment.
Abstract: Melanoma differentiation associated gene-5 (mda-5) was identified by subtraction hybridization as a novel upregulated gene in HO-1 human melanoma cells induced to terminally differentiate by treatment with IFN-beta+MEZ. Considering its unique structure, consisting of a caspase recruitment domain (CARD) and an RNA helicase domain, it was hypothesized that mda-5 contributes to apoptosis occurring during terminal differentiation. We have currently examined the expression pattern of mda-5 in normal tissues, during induction of terminal differentiation and after treatment with type I IFNs. In addition, we have defined its genomic structure and chromosomal location. IFN-beta, a type I IFN, induces mda-5 expression in a biphasic and dose-dependent manner. Based on its temporal kinetics of induction and lack of requirement for prior protein synthesis mda-5 is an early type I IFN-responsive gene. The level of mda-5 mRNA is in low abundance in normal tissues, whereas expression is induced in a spectrum of normal and cancer cells by IFN-beta. Expression of mda-5 by means of a replication incompetent adenovirus, Ad.mda-5, induces apoptosis in HO-1 cells as confirmed by morphologic, biochemical and molecular assays. Additionally, the combination of Ad.mda-5+MEZ further augments apoptosis as observed in Ad.null or uninfected HO-1 cells induced to terminally differentiate by treatment with IFN-beta+MEZ. The mda-5 gene is located on human chromosome 2q24 and consists of 16 exons, without pseudogenes, and is conserved in the mouse genome. Present data documents that mda-5 is a novel type I IFN-inducible gene, which may contribute to apoptosis induction during terminal differentiation and during IFN treatment. The conserved genomic and protein structure of mda-5 in human and mouse will permit analysis of the evolution and developmental aspects of this gene.