
Showing papers on "Pseudogene" published in 2003


Journal ArticleDOI
TL;DR: The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.
Abstract: The Arabidopsis genome contains ∼200 genes that encode proteins with similarity to the nucleotide binding site and other domains characteristic of plant resistance proteins. Through a reiterative process of sequence analysis and reannotation, we identified 149 NBS-LRR–encoding genes in the Arabidopsis (ecotype Columbia) genomic sequence. Fifty-six of these genes were corrected from earlier annotations. At least 12 are predicted to be pseudogenes. As described previously, two distinct groups of sequences were identified: those that encoded an N-terminal domain with Toll/Interleukin-1 Receptor homology (TIR-NBS-LRR, or TNL), and those that encoded an N-terminal coiled-coil motif (CC-NBS-LRR, or CNL). The encoded proteins are distinct from the 58 predicted adapter proteins in the previously described TIR-X, TIR-NBS, and CC-NBS groups. Classification based on protein domains, intron positions, sequence conservation, and genome distribution defined four subgroups of CNL proteins, eight subgroups of TNL proteins, and a pair of divergent NL proteins that lack a defined N-terminal motif. CNL proteins generally were encoded in single exons, although two subclasses were identified that contained introns in unique positions. TNL proteins were encoded in modular exons, with conserved intron positions separating distinct protein domains. Conserved motifs were identified in the LRRs of both CNL and TNL proteins. In contrast to CNL proteins, TNL proteins contained large and variable C-terminal domains. The extant distribution and diversity of the NBS-LRR sequences has been generated by extensive duplication and ectopic rearrangements that involved segmental duplications as well as microscale events. The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.

1,503 citations


Journal ArticleDOI
TL;DR: New alignment techniques that can handle large gaps in a robust fashion and discriminate between orthologous and paralogous alignments are developed and provide evidence that ≈2% of the genes in the human/mouse common ancestor have been deleted or partially deleted in the mouse.
Abstract: This study examines genomic duplications, deletions, and rearrangements that have happened at scales ranging from a single base to complete chromosomes by comparing the mouse and human genomes. From whole-genome sequence alignments, 344 large (>100-kb) blocks of conserved synteny are evident, but these are further fragmented by smaller-scale evolutionary events. Excluding transposon insertions, on average in each megabase of genomic alignment we observe two inversions, 17 duplications (five tandem or nearly tandem), seven transpositions, and 200 deletions of 100 bases or more. This includes 160 inversions and 75 duplications or transpositions of length >100 kb. The frequencies of these smaller events are not substantially higher in finished portions in the assembly. Many of the smaller transpositions are processed pseudogenes; we define a “syntenic” subset of the alignments that excludes these and other small-scale transpositions. These alignments provide evidence that ≈2% of the genes in the human/mouse common ancestor have been deleted or partially deleted in the mouse. There also appears to be slightly less nontransposon-induced genome duplication in the mouse than in the human lineage. Although some of the events we detect are possibly due to misassemblies or missing data in the current genome sequence or to the limitations of our methods, most are likely to represent genuine evolutionary events. To make these observations, we developed new alignment techniques that can handle large gaps in a robust fashion and discriminate between orthologous and paralogous alignments.

813 citations


Journal ArticleDOI
TL;DR: The insect chemoreceptor superfamily in Drosophila melanogaster is predicted to consist of 62 odorant receptor (Or) and 68 gustatory receptor (Gr) proteins, encoded by families of 60 Or and 60 Gr genes through alternative splicing.
Abstract: The insect chemoreceptor superfamily in Drosophila melanogaster is predicted to consist of 62 odorant receptor (Or) and 68 gustatory receptor (Gr) proteins, encoded by families of 60 Or and 60 Gr genes through alternative splicing. We include two previously undescribed Or genes and two previously undescribed Gr genes; two previously predicted Or genes are shown to be alternative splice forms. Three polymorphic pseudogenes and one highly defective pseudogene are recognized. Phylogenetic analysis reveals deep branches connecting multiple highly divergent clades within the Gr family, and the Or family appears to be a single highly expanded lineage within the superfamily. The genes are spread throughout the Drosophila genome, with some relatively recently diverged genes still clustered in the genome. The Gr5a gene on the X chromosome, which encodes a receptor for the sugar trehalose, has transposed from one such tandem cluster of six genes at cytological location 64, as has Gr61a, and all eight of these receptors might bind sugars. Analysis of intron evolution suggests that the common ancestor consisted of a long N-terminal exon encoding transmembrane domains 1-5 followed by three exons encoding transmembrane domains 6-7. As many as 57 additional introns have been acquired idiosyncratically during the evolution of the superfamily, whereas the ancestral introns and some of the older idiosyncratic introns have been lost at least 48 times independently. Altogether, these patterns of molecular evolution suggest that this is an ancient superfamily of chemoreceptors, probably dating back at least to the origin of the arthropods.

745 citations


Journal ArticleDOI
TL;DR: Identification of disease-associated mutations in an uncharacterized gene, SBDS, in the interval of 1.9 cM at 7q11 is reported, suggesting that SDS may be caused by a deficiency in an aspect of RNA metabolism essential for development of the exocrine pancreas, hematopoiesis and chondrogenesis.
Abstract: Shwachman-Diamond syndrome (SDS; OMIM 260400) is an autosomal recessive disorder with clinical features that include pancreatic exocrine insufficiency, hematological dysfunction and skeletal abnormalities. Here, we report identification of disease-associated mutations in an uncharacterized gene, SBDS, in the interval of 1.9 cM at 7q11 previously shown to be associated with the disease. We report that SBDS has a 1.6-kb transcript and encodes a predicted protein of 250 amino acids. A pseudogene copy (SBDSP) with 97% nucleotide sequence identity resides in a locally duplicated genomic segment of 305 kb. We found recurring mutations resulting from gene conversion in 89% of unrelated individuals with SDS (141 of 158), with 60% (95 of 158) carrying two converted alleles. Converted segments consistently included at least one of two pseudogene-like sequence changes that result in protein truncation. SBDS is a member of a highly conserved protein family of unknown function with putative orthologs in diverse species including archaea and eukaryotes. Archaeal orthologs are located within highly conserved operons that include homologs of RNA-processing genes, suggesting that SDS may be caused by a deficiency in an aspect of RNA metabolism that is essential for development of the exocrine pancreas, hematopoiesis and chondrogenesis.

672 citations


Journal ArticleDOI
12 Dec 2003-Science
TL;DR: Partitions of genes into inferred biological classes identified accelerated evolution in several functional classes, including olfaction and nuclear transport; human-accelerated genes are significantly more likely to underlie major known Mendelian disorders.
Abstract: Even though human and chimpanzee gene sequences are nearly 99% identical, sequence comparisons can nevertheless be highly informative in identifying biologically important changes that have occurred since our ancestral lineages diverged. We analyzed alignments of 7645 chimpanzee gene sequences to their human and mouse orthologs. These three-species sequence alignments allowed us to identify genes undergoing natural selection along the human and chimp lineage by fitting models that include parameters specifying rates of synonymous and nonsynonymous nucleotide substitution. This evolutionary approach revealed an informative set of genes with significantly different patterns of substitution on the human lineage compared with the chimpanzee and mouse lineages. Partitions of genes into inferred biological classes identified accelerated evolution in several functional classes, including olfaction and nuclear transport. In addition to suggesting adaptive physiological differences between chimps and humans, human-accelerated genes are significantly more likely to underlie major known Mendelian disorders.

648 citations


Journal ArticleDOI
19 Dec 2003-Science
TL;DR: It is proposed that stochastic activation of only one OR gene within the cluster and negative feedback regulation by that OR gene product are necessary to ensure the one receptor–one neuron rule.
Abstract: In the mouse olfactory system, each olfactory sensory neuron (OSN) expresses only one odorant receptor (OR) gene in a monoallelic and mutually exclusive manner. Such expression forms the genetic basis for OR-instructed axonal projection of OSNs to the olfactory bulb of the brain during development. Here, we identify an upstream cis-acting DNA region that activates the OR gene cluster in mouse and allows the expression of only one OR gene within the cluster. Deletion of the coding region of the expressed OR gene or a naturally occurring frame-shift mutation allows a second OR gene to be expressed. We propose that stochastic activation of only one OR gene within the cluster and negative feedback regulation by that OR gene product are necessary to ensure the one receptor-one neuron rule.

527 citations


Journal ArticleDOI
TL;DR: The genome of Coxiella burnetii, Nine Mile phase I RSA493, a highly virulent zoonotic pathogen and category B bioterrorism agent, was sequenced by the random shotgun method; the analyses suggest that the obligate intracellular lifestyle of C. burnetii may be a relatively recent innovation.
Abstract: The 1,995,275-bp genome of Coxiella burnetii, Nine Mile phase I RSA493, a highly virulent zoonotic pathogen and category B bioterrorism agent, was sequenced by the random shotgun method. This bacterium is an obligate intracellular acidophile that is highly adapted for life within the eukaryotic phagolysosome. Genome analysis revealed many genes with potential roles in adhesion, invasion, intracellular trafficking, host-cell modulation, and detoxification. A previously uncharacterized 13-member family of ankyrin repeat-containing proteins is implicated in the pathogenesis of this organism. Although the lifestyle and parasitic strategies of C. burnetii resemble those of Rickettsiae and Chlamydiae, their genome architectures differ considerably in terms of presence of mobile elements, extent of genome reduction, metabolic capabilities, and transporter profiles. The presence of 83 pseudogenes displays an ongoing process of gene degradation. Unlike other obligate intracellular bacteria, 32 insertion sequences are found dispersed in the chromosome, indicating some plasticity in the C. burnetii genome. These analyses suggest that the obligate intracellular lifestyle of C. burnetii may be a relatively recent innovation.

516 citations


Journal ArticleDOI
TL;DR: The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus, and represents a basal archaeal lineage and has a highly reduced genome.
Abstract: The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.

506 citations


Journal ArticleDOI
TL;DR: The Drosophila literature is reviewed, and the authors agree with the proposal that pseudogenes be considered "potogenes", i.e., DNA sequences with the potential to become new genes.
Abstract: Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes. It is therefore assumed that all pseudogene mutations are selectively neutral and have equal probability to become fixed in the population. Rather, pseudogenes that have been suitably investigated often exhibit functional roles, such as gene expression, gene regulation, and generation of genetic (antibody, antigenic, and other) diversity. Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over nonsynonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles. We first review the Drosophila literature and then extend the discussion to the various functional features identified in the pseudogenes of other organisms. A pseudogene that has arisen by duplication or retroposition may, a...

460 citations


Journal ArticleDOI
TL;DR: The complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp) was determined and it was found that the strain is distinctive in its large complement of insertion sequences, with several genomic rearrangements mediated by insertion sequences.
Abstract: We determined the complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp). Shigella species cause >1 million deaths per year from dysentery and diarrhea and have a lifestyle that is markedly different from those of closely related bacteria, including Escherichia coli. The genome exhibits the backbone and island mosaic structure of E. coli pathogens, albeit with much less horizontally transferred DNA and lacking 357 genes present in E. coli. The strain is distinctive in its large complement of insertion sequences, with several genomic rearrangements mediated by insertion sequences, 12 cryptic prophages, 372 pseudogenes, and 195 S. flexneri-specific genes. The 2457T genome was also compared with that of a recently sequenced S. flexneri 2a strain, 301. Our data are consistent with Shigella being phylogenetically indistinguishable from E. coli. The S. flexneri-specific regions contain many genes that could encode proteins with roles in virulence. Analysis of these will reveal the genetic basis for aspects of this pathogenic organism's distinctive lifestyle that have yet to be explained.

419 citations


Journal ArticleDOI
TL;DR: It is concluded that a priori determinations of orthology and paralogy of nrDNA sequences should not be made based on the functionality or lack of functionality of those sequences, and that a tree-based approach to identifying pseudogenes, based on comparisons of sequence substitution patterns from putatively conserved regions, has advantages.

Journal ArticleDOI
TL;DR: Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution varies with GC content: processed pseudogenes occur mostly in intermediate GC-content regions.
Abstract: Processed pseudogenes were created by reverse-transcription of mRNAs; they provide snapshots of ancient genes existing millions of years ago in the genome. To find them in the present-day human, we developed a pipeline using features such as intron-absence, frame-disruption, polyadenylation, and truncation. This has enabled us to identify in recent genome drafts approximately 8000 processed pseudogenes (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the numbers on chromosomes proportional to length, suggesting sustained "bombardment" over evolution. However, it does vary with GC-content: Processed pseudogenes occur mostly in intermediate GC-content regions. This is similar to Alus but contrasts with functional genes and L1-repeats. Pseudogenes, moreover, have age profiles similar to Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for approximately 20% of the total. Other notables include cyclophilin-A, keratin, GAPDH, and cytochrome c.
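
The four features named in the abstract above (intron absence, frame disruption, polyadenylation, truncation) can be read as a simple evidence checklist. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Candidate` record, its field names, and the `min_evidence` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical summary of one genomic hit aligned to its parent gene."""
    has_introns: bool           # introns retained relative to the parent mRNA
    has_frame_disruption: bool  # frameshift or premature stop codon present
    has_polya_tail: bool        # poly(A) tract found downstream of the hit
    coding_coverage: float      # fraction of the parent coding region covered (0..1)


def processed_pseudogene_evidence(c: Candidate) -> List[str]:
    """Return the names of the features from the abstract that this hit shows."""
    evidence = []
    if not c.has_introns:
        evidence.append("intron-absence")
    if c.has_frame_disruption:
        evidence.append("frame-disruption")
    if c.has_polya_tail:
        evidence.append("polyadenylation")
    if c.coding_coverage < 1.0:
        evidence.append("truncation")
    return evidence


def looks_processed(c: Candidate, min_evidence: int = 2) -> bool:
    """Assumed rule: require intron absence plus at least one other feature."""
    evidence = processed_pseudogene_evidence(c)
    return "intron-absence" in evidence and len(evidence) >= min_evidence


# A hit covering 94% of the coding region, intronless, disrupted, with a poly(A) tract.
typical = Candidate(has_introns=False, has_frame_disruption=True,
                    has_polya_tail=True, coding_coverage=0.94)
# A hit that keeps its introns looks like a duplicated (nonprocessed) copy instead.
duplicated = Candidate(has_introns=True, has_frame_disruption=True,
                       has_polya_tail=False, coding_coverage=1.0)
```

Requiring intron absence as a mandatory feature is a design choice of this sketch: it is the feature that separates retrotransposed copies from copies made by genomic duplication.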

Journal ArticleDOI
01 May 2003-Nature
TL;DR: The role of an expressed pseudogene (regulation of messenger-RNA stability) in a transgene-insertion mouse mutant exhibiting polycystic kidneys and bone deformity is reported, and the findings point to the functional significance of non-coding RNAs.
Abstract: A pseudogene is a gene copy that does not produce a functional, full-length protein. The human genome is estimated to contain up to 20,000 pseudogenes. Although much effort has been devoted to understanding the function of pseudogenes, their biological roles remain largely unknown. Here we report the role of an expressed pseudogene (regulation of messenger-RNA stability) in a transgene-insertion mouse mutant exhibiting polycystic kidneys and bone deformity. The transgene was integrated into the vicinity of the expressing pseudogene of Makorin1, called Makorin1-p1. This insertion reduced transcription of Makorin1-p1, resulting in destabilization of Makorin1 mRNA in trans by way of a cis-acting RNA decay element within the 5' region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. Either Makorin1 or Makorin1-p1 transgenes could rescue these phenotypes. Our findings demonstrate a specific regulatory role of an expressed pseudogene, and point to the functional significance of non-coding RNAs.

Journal ArticleDOI
TL;DR: The 4.8-Mb complete genome sequence of Salmonella enterica serovar Typhi strain Ty2, a human-specific pathogen causing typhoid fever, is presented, and a half-genome interreplichore inversion in Ty2 relative to CT18 is confirmed.
Abstract: We present the 4.8-Mb complete genome sequence of Salmonella enterica serovar Typhi strain Ty2, a human-specific pathogen causing typhoid fever. A comparison with the genome sequence of recently isolated S. enterica serovar Typhi strain CT18 showed that 29 of the 4,646 predicted genes in Ty2 are unique to this strain, while 84 genes are unique to CT18. Both genomes contain more than 200 pseudogenes; 9 of these genes in CT18 are intact in Ty2, while 11 intact CT18 genes are pseudogenes in Ty2. A half-genome interreplichore inversion in Ty2 relative to CT18 was confirmed. The two strains exhibit differences in prophages, insertion sequences, and island structures. While CT18 carries two plasmids, one conferring multiple drug resistance, Ty2 has no plasmids and is sensitive to antibiotics.

Journal ArticleDOI
TL;DR: The grouping of var genes implies that var gene recombination preferentially occurs within var gene groups and it is speculated that the groups reflect a functional diversification evolved to cope with the varying conditions of transmission and host immune response met by the parasite.
Abstract: Background: The variant surface antigen family Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) is an important target for protective immunity and is implicated in the pathology of malaria through its ability to adhere to host endothelial receptors. The sequence diversity and organization of the 3D7 PfEMP1 repertoire was investigated on the basis of the complete genome sequence. Methods: Using two tree-building methods we analysed the coding and non-coding sequences of 3D7 var and rif genes as well as var genes of other parasite strains. Results: var genes can be sub-grouped into three major groups (group A, B and C) and two intermediate groups B/A and B/C representing transitions between the three major groups. The best defined var group, group A, comprises telomeric genes transcribed towards the telomere encoding PfEMP1s with complex domain structures different from the 4-domain type dominant of groups B and C. Two sequences belonging to the var1 and var2 subfamilies formed independent groups. A rif subgroup transcribed towards the centromere was found neighbouring var genes of group A such that the rif and var 5' regions merged. This organization appeared to be unique for the group A var genes. Conclusion: The grouping of var genes implies that var gene recombination preferentially occurs within var gene groups and it is speculated that the groups reflect a functional diversification evolved to cope with the varying conditions of transmission and host immune response met by the parasite.

Journal ArticleDOI
TL;DR: A complete listing of all ALDH sequences known to date, along with an evolutionary analysis of the eukaryotic ALDHs, is presented.

Journal ArticleDOI
TL;DR: A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)+ RNA and revealed twice as many transcribed bases as have been reported previously.
Abstract: A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)+ RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.

Journal ArticleDOI
TL;DR: It is reported that the genomic DNA of human B7-H4 maps to chromosome 1 and comprises six exons and five introns spanning 66 kb, of which exon 6 is used for alternative splicing to generate two different transcripts.
Abstract: B7-H4 is a recently identified B7 family member that negatively regulates T cell immunity by the inhibition of T cell proliferation, cytokine production, and cell cycle progression. In this study, we report that the genomic DNA of human B7-H4 maps to chromosome 1 and comprises six exons and five introns spanning 66 kb, of which exon 6 is used for alternative splicing to generate two different transcripts. A similar B7-H4 structure is also found in mouse genomic DNA on chromosome 3. A human B7-H4 pseudogene is identified on chromosome 20p11.1 with a single exon and two stop codons in the coding region. Immunohistochemistry analysis using B7-H4-specific mAb demonstrates that B7-H4 is not expressed in the majority of normal human tissues. In contrast, up to 85% (22 of 26) of ovarian cancer and 31% (5 of 16) of lung cancer tissues constitutively express B7-H4. Our results indicate a tight regulation of B7-H4 expression at the translational level in normal peripheral tissues and a potential role of B7-H4 in the evasion of tumor immunity.

Journal ArticleDOI
TL;DR: It is found that humans have accumulated mutations that disrupt OR coding regions roughly 4-fold faster than any other species sampled, suggesting a human-specific process of OR gene disruption, likely due to a reduced chemosensory dependence relative to apes.
Abstract: Olfactory receptor (OR) genes constitute the basis for the sense of smell and are encoded by the largest mammalian gene superfamily of >1,000 genes. In humans, >60% of these are pseudogenes. In contrast, the mouse OR repertoire, although of roughly equal size, contains only ≈20% pseudogenes. We asked whether the high fraction of nonfunctional OR genes is specific to humans or is a common feature of all primates. To this end, we have compared the sequences of 50 human OR coding regions, regardless of their functional annotations, to those of their putative orthologs in chimpanzees, gorillas, orangutans, and rhesus macaques. We found that humans have accumulated mutations that disrupt OR coding regions roughly 4-fold faster than any other species sampled. As a consequence, the fraction of OR pseudogenes in humans is almost twice as high as in the non-human primates, suggesting a human-specific process of OR gene disruption, likely due to a reduced chemosensory dependence relative to apes.

Journal ArticleDOI
TL;DR: The phylogeny of plant glycosyltransferases is substantiated with complete phylogenetic analysis of the A. thaliana UGT multigene family, including intron-exon organization and chromosomal localization.

Journal ArticleDOI
TL;DR: It is found that deletions are about three times more common than insertions, and the frequencies of both these events follow characteristic power-law behavior associated with the size of the indel, but unexpectedly, the frequency of 3 bp deletions violates this trend.
Abstract: Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To be sure to differentiate the sequence changes occurring in the functional genes during evolution from those occurring in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates amongst the 12 possible nucleotide pairs are not homogeneous as they are affected by the type of immediately neighboring nucleotides and the overall local G+C content. Finally, our dataset is large enough that it has many indels, thus allowing for the first time statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both these events follow characteristic power-law behavior associated with the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed.
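
The transition/transversion distinction and the deletion-to-insertion ratio reported above can be illustrated with a short sketch. This is not the authors' analysis code: the function names and the toy alignment are assumptions made for illustration, and only the two indel counts (3740 and 1291) come from the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref!r} -> {alt!r}")
    # Transitions stay within one chemical class (purine<->purine or
    # pyrimidine<->pyrimidine); transversions cross between the classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(parent: str, pseudogene: str) -> dict:
    """Count substitution types over two aligned, equal-length sequences.

    Gap columns ('-') are skipped here; indels would be tallied separately.
    """
    if len(parent) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for r, a in zip(parent.upper(), pseudogene.upper()):
        if r == "-" or a == "-" or r == a:
            continue
        counts[substitution_type(r, a)] += 1
    return counts


# Of the 12 possible ordered nucleotide pairs, 4 are transitions and 8 are
# transversions, so a 2:1 excess of transitions is a strong mutational bias.
pair_counts = {"transition": 0, "transversion": 0}
for ref in sorted(BASES):
    for alt in sorted(BASES):
        if ref != alt:
            pair_counts[substitution_type(ref, alt)] += 1

# Indel counts as reported in the abstract.
deletions, insertions = 3740, 1291
deletion_to_insertion = deletions / insertions  # about 2.9, i.e. roughly threefold

# Toy alignment (hypothetical): A->G and T->C are transitions, C->A a transversion.
toy = count_substitutions("ACGTAC", "GCGCAA")
```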

Journal ArticleDOI
TL;DR: All intergenic regions in the human genome are screened with a combination of homology searches and a functionality test using the ratio of replacement to silent nucleotide substitutions (KA/KS), and nonprocessed pseudogenes appear to be enriched in regions with high gene density.
Abstract: We screened all intergenic regions in the human genome to identify pseudogenes with a combination of homology searches and a functionality test using the ratio of replacement to silent nucleotide substitutions (KA/KS). We identified 19,724 regions of which 95% +/- 3% are estimated to evolve neutrally and thus are likely to encode pseudogenes. Half of these have no detectable truncation in their pseudocoding regions and therefore are not identifiable by methods that require the presence of truncations to prove nonfunctionality. A comparative analysis with the mouse genome showed that 70% of these pseudogenes have a retrotranspositional origin (processed), and the rest arose by segmental duplication (nonprocessed). Although the spread of both types of pseudogenes correlates with chromosome size, nonprocessed pseudogenes appear to be enriched in regions with high gene density. It is likely that the human pseudogenes identified here represent only a small fraction of the total, which probably exceeds the number of genes.
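
The logic of a KA/KS functionality test can be sketched as follows. By the usual convention, KA is the rate of replacement (nonsynonymous) substitutions and KS the rate of silent (synonymous) substitutions; a ratio near 1 is what neutral evolution predicts, while a ratio well below 1 indicates constraint on the encoded protein. The sketch assumes the two rates have already been estimated by some other tool; the function names and the `tolerance` cutoff are hypothetical and are not the statistical test used in the study.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return KA/KS given per-site replacement (ka) and silent (ks) rates."""
    if ka < 0 or ks <= 0:
        raise ValueError("ka must be non-negative and ks must be positive")
    return ka / ks


def classify_constraint(ka: float, ks: float, tolerance: float = 0.25) -> str:
    """Label a region by how far KA/KS departs from the neutral value of 1.

    The tolerance is an illustrative cutoff only; a real analysis would use a
    likelihood-based or other statistical test rather than a fixed band.
    """
    omega = ka_ks_ratio(ka, ks)
    if omega < 1.0 - tolerance:
        return "constrained"   # replacements are removed: likely functional
    if omega > 1.0 + tolerance:
        return "accelerated"   # replacements are favoured
    return "neutral"           # consistent with a pseudogene


# Number of regions implied by the abstract's point estimate (95% of 19,724).
estimated_neutral_regions = round(19724 * 0.95)
```

For example, `classify_constraint(0.02, 0.2)` returns `"constrained"`, whereas `classify_constraint(0.1, 0.1)` returns `"neutral"`.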

Journal ArticleDOI
TL;DR: The complete set of OR genes and their chromosomal locations from the latest human genome sequences are identified and it is shown that the class II OR genes can further be classified into 19 phylogenetic clades supported by high bootstrap values.
Abstract: Olfactory receptor (OR) genes form the largest known multigene family in the human genome. To obtain some insight into their evolutionary history, we have identified the complete set of OR genes and their chromosomal locations from the latest human genome sequences. We detected 388 potentially functional genes that have intact ORFs and 414 apparent pseudogenes. The number and the fraction (48%) of functional genes are considerably larger than the ones previously reported. The human OR genes can clearly be divided into class I and class II genes, as was previously noted. Our phylogenetic analysis has shown that the class II OR genes can further be classified into 19 phylogenetic clades supported by high bootstrap values. We have also found that there are many tandem arrays of OR genes that are phylogenetically closely related. These genes appear to have been generated by tandem gene duplication. However, the relationships between genomic clusters and phylogenetic clades are very complicated. There are a substantial number of cases in which the genes in the same phylogenetic clade are located on different chromosomal regions. In addition, OR genes belonging to distantly related phylogenetic clades are sometimes located very closely in a chromosomal region and form a tight genomic cluster. These observations can be explained by the assumption that several chromosomal rearrangements have occurred at the regions of OR gene clusters and the OR genes contained in different genomic clusters are shuffled.
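
As a quick check of the arithmetic in the abstract above, the functional fraction follows directly from the two reported counts. The variable names are illustrative only.

```python
functional_genes = 388   # potentially functional OR genes with intact ORFs
pseudogenes = 414        # apparent OR pseudogenes

total_or_genes = functional_genes + pseudogenes
functional_fraction = functional_genes / total_or_genes
pseudogene_fraction = pseudogenes / total_or_genes

# Express the fractions as whole percentages, as the abstract does.
functional_percent = round(100 * functional_fraction)
pseudogene_percent = round(100 * pseudogene_fraction)
```

The counts give 802 OR genes in total, of which about 48% are potentially functional and about 52% are apparent pseudogenes.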

Journal ArticleDOI
LaDeana W. Hillier, Robert S. Fulton, Lucinda Fulton, Tina Graves, Kymberlie H. Pepin, Caryn Wagner-McPherson, Dan Layman, Jason Maas, Sara Jaeger, Rebecca S. Walker, Kristine M. Wylie, Mandeep Sekhon, Michael C. Becker, Michelle O'Laughlin, Mark E. Schaller, Ginger A. Fewell, Kimberly D. Delehaunty, Tracie L. Miner, William E. Nash, Matt Cordes, Hui Du, Hui Sun, Jennifer Edwards, Holland Bradshaw-Cordum, Johar Ali, Stephanie Andrews, Amber Isak, Andrew Vanbrunt, Christine Nguyen, Feiyu Du, Betty Lamar, Laura Courtney, Joelle Kalicki, Philip Ozersky, Lauren Bielicki, Kelsi Scott, Andrea Holmes, Richard Harkins, Anthony R. Harris, Cindy Strong, Shunfang Hou, Chad Tomlinson, Sara Dauphin-Kohlberg, Amy Kozlowicz-Reilly, Shawn Leonard, Theresa Rohlfing, Susan M. Rock, Aye-Mon Tin-Wollam, Amanda Abbott, Patrick Minx, Rachel Maupin, Catrina Strowmatt, Phil Latreille, Nancy Miller, Doug Johnson, Jennifer Murray, Jeffrey Woessner, Michael C. Wendl, Shiaw-Pyng Yang, Brian Schultz, John W. Wallis, John Spieth, Tamberlyn Bieri, Joanne O. Nelson, Nicolas Berkowicz, Patricia Wohldmann, Lisa Cook, Matthew T. Hickenbotham, James M. Eldred, Donald Williams, Joseph A. Bedell, Elaine R. Mardis, Sandra W. Clifton, Stephanie L. Chissoe, Marco A. Marra, Christopher K. Raymond, Eric Haugen, Will Gillett, Yang Zhou, R. James, Karen A. Phelps, Shawn Iadanoto, Kerry L. Bubb, Elizabeth Simms, Ruth Levy, James B. Clendenning, Rajinder Kaul, W. James Kent, Terrence S. Furey, Robert Baertsch, Michael R. Brent, Evan Keibler, Paul Flicek, Peer Bork, Mikita Suyama, Jeffrey A. Bailey, Matthew E. Portnoy, David Torrents, Asif T. Chinwalla, Warren Gish, Sean R. Eddy, John Douglas Mcpherson, Maynard V. Olson, Evan E. Eichler, Eric D. Green, Robert H. Waterston, Richard K. Wilson
10 Jul 2003-Nature
TL;DR: The euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far, has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence.
Abstract: Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame.

Journal ArticleDOI
TL;DR: The striking differences in the intergenic landscape between the A and Am genomes that diverged 1 to 3 million years ago provide evidence for a dynamic and rapid genome evolution in wheat species.
Abstract: To study genome evolution in wheat, we have sequenced and compared two large physical contigs of 285 and 142 kb covering orthologous low molecular weight (LMW) glutenin loci on chromosome 1AS of a diploid wheat species (Triticum monococcum subsp monococcum) and a tetraploid wheat species (Triticum turgidum subsp durum). Sequence conservation between the two species was restricted to small regions containing the orthologous LMW glutenin genes, whereas >90% of the compared sequences were not conserved. Dramatic sequence rearrangements occurred in the regions rich in repetitive elements. Dating of long terminal repeat retrotransposon insertions revealed different insertion events occurring during the last 5.5 million years in both species. These insertions are partially responsible for the lack of homology between the intergenic regions. In addition, the gene space was conserved only partially, because different predicted genes were identified on both contigs. Duplications and deletions of large fragments that might be attributable to illegitimate recombination also have contributed to the differentiation of this region in both species. The striking differences in the intergenic landscape between the A and Am genomes that diverged 1 to 3 million years ago provide evidence for a dynamic and rapid genome evolution in wheat species.

Journal ArticleDOI
TL;DR: Genotyping 51 candidate genes in 189 ethnically diverse humans shows an unprecedented prevalence of segregating pseudogenes, identifying one of the most pronounced cases of functional population diversity in the human genome.
Abstract: Of more than 1,000 human olfactory receptor genes, more than half seem to be pseudogenes. We investigated whether the most recent of these disruptions might still segregate with the intact form by genotyping 51 candidate genes in 189 ethnically diverse humans. The results show an unprecedented prevalence of segregating pseudogenes, identifying one of the most pronounced cases of functional population diversity in the human genome.

Journal ArticleDOI
TL;DR: It is predicted that the availability of numerous animal genomes will give rise to a new field of genome zoology in which differences in animal physiology and ethology are illuminated by the study of genomic sequence variations.
Abstract: The extensive similarities between the genomes of human and model organisms are the foundation of much of modern biology, with model organism experimentation permitting valuable insights into biological function and the aetiology of human disease. In contrast, differences among genomes have received less attention. Yet these can be expected to govern the physiological and morphological distinctions apparent among species, especially if such differences are the result of evolutionary adaptation. A recent comparison of the draft sequences of mouse and human genomes has shed light on the selective forces that have predominated in their recent evolutionary histories. In particular, mouse-specific clusters of homologues associated with roles in reproduction, immunity and host defence appear to be under diversifying positive selective pressure, as indicated by high ratios of non-synonymous to synonymous substitution rates. These clusters are also frequently punctuated by homologous pseudogenes. They thus have experienced numerous gene death, as well as gene birth, events. These regions appear, therefore, to have borne the brunt of adaptive evolution that underlies physiological and behavioural innovation in mice. We predict that the availability of numerous animal genomes will give rise to a new field of genome zoology in which differences in animal physiology and ethology are illuminated by the study of genomic sequence variations.

Journal ArticleDOI
TL;DR: It is suggested that a burst of formation of PPs and Alus occurred in the genome of ancestral primates and one possible mechanism is that proteins encoded by members of particular L1 subfamilies acquired an enhanced ability to recognize cytosolic RNAs in trans.
Abstract: Background: Abundant pseudogenes are a feature of mammalian genomes. Processed pseudogenes (PPs) are reverse transcribed from mRNAs. Recent molecular biological studies show that mammalian long interspersed element 1 (L1)-encoded proteins may have been involved in PP reverse transcription. Here, we present the first comprehensive analysis of human PPs using all known human genes as queries. Results: The human genome was queried and 3,664 candidate PPs were identified. The most abundant were copies of genes encoding keratin 18, glyceraldehyde-3-phosphate dehydrogenase and ribosomal protein L21. A simple method was developed to estimate the level of nucleotide substitutions (and therefore the age) of PPs. A Poisson-like age distribution was obtained with a mean age close to that of the Alu repeats, the predominant human short interspersed elements. These data suggest a nearly simultaneous burst of PP and Alu formation in the genomes of ancestral primates. The peak period of amplification of these two distinct retrotransposons was estimated to be 40-50 million years ago. Concordant amplification of certain L1 subfamilies with PPs and Alus was observed. Conclusions: We suggest that a burst of formation of PPs and Alus occurred in the genome of ancestral primates. One possible mechanism is that proteins encoded by members of particular L1 subfamilies acquired an enhanced ability to recognize cytosolic RNAs in trans.

Journal ArticleDOI
TL;DR: The identification of FXRβ as a novel functional receptor in nonprimate animals sheds new light on the species differences in cholesterol metabolism and has strong implications for the interpretation of genetic and pharmacological studies of FXR-directed physiologies and drug discovery programs.
Abstract: Nuclear receptors are ligand-modulated transcription factors. On the basis of the completed human genome sequence, this family was thought to contain 48 functional members. However, by mining human and mouse genomic sequences, we identified FXRβ as a novel family member. It is a functional receptor in mice, rats, rabbits, and dogs but constitutes a pseudogene in humans and primates. Murine FXRβ is widely coexpressed with FXR in embryonic and adult tissues. It heterodimerizes with RXRα and stimulates transcription through specific DNA response elements upon addition of 9-cis-retinoic acid. Finally, we identified lanosterol as a candidate endogenous ligand that induces coactivator recruitment and transcriptional activation by mFXRβ. Lanosterol is an intermediate of cholesterol biosynthesis, which suggests a direct role in the control of cholesterol biosynthesis in nonprimates. The identification of FXRβ as a novel functional receptor in nonprimate animals sheds new light on the species differences in cholesterol metabolism and has strong implications for the interpretation of genetic and pharmacological studies of FXR-directed physiologies and drug discovery programs.

Journal ArticleDOI
TL;DR: The fugu (pufferfish) genome has been sequenced, and a second genome assembly was released 17 May 2002, and all P450 genes and pseudogenes in the available fugu sequence data have been identified, compared to human P450s, and assigned names.