Showing papers on "Pseudogene published in 2000"


Journal ArticleDOI
01 Sep 2000-Genetics
TL;DR: Comparison of rates of evolution for X-linked and autosomal pseudogenes suggests that the male mutation rate is 4 times the female mutation rate, but provides no evidence for a reduction in mutation rate that is specific to the X chromosome.
Abstract: Many previous estimates of the mutation rate in humans have relied on screens of visible mutants. We investigated the rate and pattern of mutations at the nucleotide level by comparing pseudogenes in humans and chimpanzees to (i) provide an estimate of the average mutation rate per nucleotide, (ii) assess heterogeneity of mutation rate at different sites and for different types of mutations, (iii) test the hypothesis that the X chromosome has a lower mutation rate than autosomes, and (iv) estimate the deleterious mutation rate. Eighteen processed pseudogenes were sequenced, including 12 on autosomes and 6 on the X chromosome. The average mutation rate was estimated to be approximately 2.5 × 10^-8 mutations per nucleotide site or 175 mutations per diploid genome per generation. Rates of mutation for both transitions and transversions at CpG dinucleotides are one order of magnitude higher than mutation rates at other sites. Single nucleotide substitutions are 10 times more frequent than length mutations. Comparison of rates of evolution for X-linked and autosomal pseudogenes suggests that the male mutation rate is 4 times the female mutation rate, but provides no evidence for a reduction in mutation rate that is specific to the X chromosome. Using conservative calculations of the proportion of the genome subject to purifying selection, we estimate that the genomic deleterious mutation rate (U) is at least 3. This high rate is difficult to reconcile with multiplicative fitness effects of individual mutations and suggests that synergistic epistasis among harmful mutations may be common.

1,217 citations
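
The two headline numbers in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes a diploid genome size of 7 × 10^9 bp, which is not stated in the abstract but reproduces the 175 figure. It also uses the standard expectation for the X-to-autosome substitution ratio under male-biased mutation, (2/3)(2 + α)/(1 + α), where α is the male-to-female mutation rate ratio. Whether the authors used exactly this formula is an assumption.

```python
# Assumption (not in the abstract): diploid human genome size in base pairs.
DIPLOID_GENOME_BP = 7.0e9
# From the abstract: average mutation rate per nucleotide site per generation.
MUTATION_RATE_PER_SITE = 2.5e-8


def mutations_per_generation(rate_per_site: float, diploid_bp: float) -> float:
    """Expected number of new mutations per diploid genome per generation."""
    return rate_per_site * diploid_bp


def x_to_autosome_ratio(alpha: float) -> float:
    """Expected X/autosome substitution-rate ratio for a male-to-female ratio alpha.

    An X chromosome spends 2/3 of its time in females and 1/3 in males,
    while an autosome spends 1/2 in each sex.
    """
    return (2.0 / 3.0) * (2.0 + alpha) / (1.0 + alpha)


def alpha_from_ratio(ratio: float) -> float:
    """Invert x_to_autosome_ratio: recover alpha from an observed X/A ratio."""
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


if __name__ == "__main__":
    print(mutations_per_generation(MUTATION_RATE_PER_SITE, DIPLOID_GENOME_BP))
    print(x_to_autosome_ratio(4.0))
    print(alpha_from_ratio(0.8))
```

Under these assumptions, a male rate four times the female rate corresponds to X-linked sequences evolving at about 80% of the autosomal rate.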


Journal ArticleDOI
20 Nov 2000-Oncogene
TL;DR: A search of the human genome identified several novel tyrosine kinase genes and enabled the creation of a nonredundant catalog of tyrosine kinase genes; the completed family tree provides a framework for further advances in biomedical science.
Abstract: As the sequencing of the human genome is completed by the Human Genome Project, the analysis of this rich source of information will illuminate many areas in medicine and biology. The protein tyrosine kinases are a large multigene family with particular relevance to many human diseases, including cancer. A search of the human genome for tyrosine kinase coding elements identified several novel genes and enabled the creation of a nonredundant catalog of tyrosine kinase genes. Ninety unique kinase genes can be identified in the human genome, along with five pseudogenes. Of the 90 tyrosine kinases, 58 are receptor type, distributed into 20 subfamilies. The 32 nonreceptor tyrosine kinases can be placed in 10 subfamilies. Additionally, mouse orthologs can be identified for nearly all the human tyrosine kinases. The completion of the human tyrosine kinase family tree provides a framework for further advances in biomedical science.

1,103 citations


Journal ArticleDOI
TL;DR: It is shown–by introducing deletions within either coding sequence of the human LINE–that both ORFs are necessary for the formation of the processed pseudogenes, and that retroviral-like elements are not able to produce similar structures in the same assay, strengthening the unique versatility of LINEs as genome modellers.
Abstract: Long interspersed elements (LINEs) are endogenous mobile genetic elements that have dispersed and accumulated in the genomes of higher eukaryotes via germline transposition, with up to 100,000 copies in mammalian genomes. In humans, LINEs are the major source of insertional mutagenesis, being involved in both germinal and somatic mutant phenotypes. Here we show that the human LINE retrotransposons, which transpose through the reverse transcription of their own transcript, can also mobilize transcribed DNA not associated with a LINE sequence by a process involving the diversion of the LINE enzymatic machinery by the corresponding mRNA transcripts. This results in the 'retroposition' of the transcribed gene and the formation of new copies that disclose features characteristic of the widespread and naturally occurring processed pseudogenes: loss of intron and promoter, acquisition of a poly(A) 3' end and presence of target-site duplications of varying length. We further show-by introducing deletions within either coding sequence of the human LINE-that both ORFs are necessary for the formation of the processed pseudogenes, and that retroviral-like elements are not able to produce similar structures in the same assay. Our results strengthen the unique versatility of LINEs as genome modellers.

789 citations


Journal Article
TL;DR: A single present-day representative of the Toll-like proteins in Drosophila has striking cytoplasmic domain homology to mammalian Tlrs within the cluster that embraces TLRs 1, 2, 4, and 6, which would suggest that an ancestral (pre-vertebrate) Tlr may have adopted a pro-inflammatory function 500 million years ago.
Abstract: We describe three novel genes, encoding members of the Toll-like receptor (Tlr) family (TLR7, TLR8, and TLR9). These Tlr family members, unlike others reported to date, were identified within a genomic database. TLR7 and TLR8 each have three exons, two of which have coding function, and lie in close proximity to one another at Xp22, alongside a pseudogene. The remaining gene (TLR9) resides at 3p21.3 (in linkage with the MyD88 gene), and is expressed in at least two splice forms, one of which is monoexonic and one of which is biexonic, the latter encoding a protein with 57 additional amino acids at the N-terminus. The novel Tlrs comprise a cluster as nearest phylogenetic neighbors. Combining all sequence data related to Toll-like receptors, we have drawn several inferences concerning the phylogeny of vertebrate and invertebrate Tlrs. According to our best estimates, mammalian TLRs 1 and 6 diverged from a common mammalian ancestral gene 95 million years ago. TLR4, which encodes the endotoxin sensor in present-day mammals, emerged as a distinct entity 180 million years ago. TLRs 3 and 5 diverged from a common ancestral gene approximately 150 million years ago, as did Tlr7 and Tlr8. Very likely, fewer Tlrs existed during early vertebrate evolution: at most three or four were transmitted with the primordial vertebrate line. Phylogenetic data that we have adduced in the course of this work also suggest the existence of a Drosophila equivalent of MyD88, and indicate that the plasma membrane protein SIGIRR is a close functional relative of MyD88 in mammals. Finally, a single present-day representative of the Toll-like proteins in Drosophila has striking cytoplasmic domain homology to mammalian Tlrs within the cluster that embraces TLRs 1, 2, 4, and 6. This would suggest that an ancestral (pre-vertebrate) Tlr may have adopted a pro-inflammatory function 500 million years ago.

504 citations


Journal ArticleDOI
TL;DR: The complete nucleotide sequence of the mitochondrial genome of an angiosperm, sugar beet (Beta vulgaris cv TK81-O) is determined and a novel tRNA(Cys) gene (trnC2-GCA) is identified which shows no sequence homology with any tRNAs reported so far in higher plants.
Abstract: We determined the complete nucleotide sequence of the mitochondrial genome of an angiosperm, sugar beet (Beta vulgaris cv TK81-O). The 368 799 bp genome contains 29 protein, five rRNA and 25 tRNA genes, most of which are also shared by the mitochondrial genome of Arabidopsis thaliana, the only other completely sequenced angiosperm mitochondrial genome. However, four genes identified here (namely rps13, trnF-GAA, ccb577 and trnC2-GCA) are missing in Arabidopsis mitochondria. In addition, four genes found in Arabidopsis (ccb228, rpl2, rpl16 and trnY2-GUA) are entirely absent in sugar beet or present only in severely truncated form. Introns, duplicated sequences, additional reading frames and inserted foreign sequences (chloroplast, nuclear and plasmid DNA sequences) contribute significantly to the overall size of the sugar beet mitochondrial genome. Nevertheless, 55.6% of the genome has no obvious features of information. We identified a novel tRNA(Cys) gene (trnC2-GCA) which shows no sequence homology with any tRNA(Cys) genes reported so far in higher plants. Intriguingly, this tRNA gene is actually transcribed into a mature tRNA, whereas the native tRNA(Cys) gene (trnC1-GCA) is most likely a pseudogene.

323 citations


Journal ArticleDOI
11 Feb 2000-Science
TL;DR: The indel spectrum in Laupala crickets, which have a genome size 11 times larger than that of Drosophila, is examined to test the hypothesis that some variation in genome size can be attributed to differences in the patterns of insertion and deletion (indel) mutations among organisms.
Abstract: Eukaryotic genome sizes range over five orders of magnitude. This variation cannot be explained by differences in organismic complexity (the C value paradox). To test the hypothesis that some variation in genome size can be attributed to differences in the patterns of insertion and deletion (indel) mutations among organisms, this study examines the indel spectrum in Laupala crickets, which have a genome size 11 times larger than that of Drosophila. Consistent with the hypothesis, DNA loss is more than 40 times slower in Laupala than in Drosophila.

322 citations


Journal ArticleDOI
TL;DR: An integrated physical, genetic, and transcriptional map of the WMS and flanking regions is generated using multicolor metaphase and interphase fluorescence in situ hybridization of bacterial artificial chromosomes and P1 artificial chromosomes, BAC end sequencing, PCR gene marker and microsatellite, large-scale sequencing, cDNA library, and database analyses, which establish regions and consequent gene candidates for WMS features including mental retardation, hypersociability, and facial features.
Abstract: Williams syndrome (WMS) is a most compelling model of human cognition, of human genome organization, and of evolution. Due to a deletion in chromosome band 7q11.23, subjects have cardiovascular, connective tissue, and neurodevelopmental deficits. Given the striking peaks and valleys in neurocognition including deficits in visual-spatial and global processing, preserved language and face processing, hypersociability, and heightened affect, the goal of this work has been to identify the genes that are responsible, the cause of the deletion, and its origin in primate evolution. To do this, we have generated an integrated physical, genetic, and transcriptional map of the WMS and flanking regions using multicolor metaphase and interphase fluorescence in situ hybridization (FISH) of bacterial artificial chromosomes (BACs) and P1 artificial chromosomes (PACs), BAC end sequencing, PCR gene marker and microsatellite, large-scale sequencing, cDNA library, and database analyses. The results indicate the genomic organization of the WMS region as two nested duplicated regions flanking a largely single-copy region. There are at least two common deletion breakpoints, one in the centromeric and at least two in the telomeric repeated regions. Clones anchoring the unique to the repeated regions are defined along with three new pseudogene families. Primate studies indicate an evolutionary hot spot for chromosomal inversion in the WMS region. A cognitive phenotypic map of WMS is presented, which combines previous data with five further WMS subjects and three atypical WMS subjects with deletions; two larger (deleted for D7S489L) and one smaller, deleted for genes telomeric to FZD9, through LIMK1, but not WSCR1 or telomeric. The results establish regions and consequent gene candidates for WMS features including mental retardation, hypersociability, and facial features. The approach provides the basis for defining pathways linking genetic underpinnings with the neuroanatomical, functional, and behavioral consequences that result in human cognition.

292 citations


Journal ArticleDOI
TL;DR: Neither frequent deamination of 5-methylcytosines nor interchromosomal gene conversion may account for the high mutation rate of the NF1 gene, as determined in this study.
Abstract: More than 500 unrelated patients with neurofibromatosis type 1 (NF1) were screened for mutations in the NF1 gene. For each patient, the whole coding sequence and all splice sites were studied for aberrations, either by the protein truncation test (PTT), temperature-gradient gel electrophoresis (TGGE) of genomic PCR products, or, most often, by direct genomic sequencing (DGS) of all individual exons. A total of 301 sequence variants, including 278 bona fide pathogenic mutations, were identified. As many as 216 or 183 of the genuine mutations, comprising 179 or 161 different ones, can be considered novel when compared to the recent findings of Upadhyaya and Cooper, or to the NNFF mutation database. Mutation-detection efficiencies of the various screening methods were similar: 47.1% for PTT, 53.7% for TGGE, and 54.9% for DGS. Some 224 mutations (80.2%) yielded directly or indirectly premature termination codons. These mutations showed even distribution over the whole gene from exon 1 to exon 47. Of all sequence variants determined in our study, fewer than 20% represent C-->T or G-->A transitions within a CpG dinucleotide, and only six different mutations also occur in NF1 pseudogenes, with five being typical C-->T transitions in a CpG. Thus, neither frequent deamination of 5-methylcytosines nor interchromosomal gene conversion may account for the high mutation rate of the NF1 gene. As opposed to the truncating mutations, the 28 (10.1%) missense or single-amino-acid-deletion mutations identified clustered in two distinct regions, the GAP-related domain (GRD) and an upstream gene segment comprising exons 11-17. The latter forms a so-called cysteine/serine-rich domain with three cysteine pairs suggestive of ATP binding, as well as three potential cAMP-dependent protein kinase (PKA) recognition sites obviously phosphorylated by PKA. Coincidence of mutated amino acids and those conserved between human and Drosophila strongly suggests significant functional relevance of this region, with major roles played by exons 12a and 15 and part of exon 16.

286 citations


Journal ArticleDOI
TL;DR: It is hypothesized that under relaxed selective constraints, primates would have progressively accumulated pseudogenes with the highest level seen in hominoids, which could parallel the evolution of the olfactory sensory function.
Abstract: Olfactory receptors (ORs) located in the cell membrane of olfactory sensory neurons of the nasal epithelium are responsible for odor detection by binding specific odorant ligands. Primates are thought to have a reduced sense of smell (microsmatic) with respect to other mammals such as dogs or rodents. We have previously demonstrated that over 70% of the human OR genes have become nonfunctional pseudogenes, leading us to hypothesize that the reduced sense of smell could correlate with the loss of functional genes. To extend these results, we sampled the OR gene repertoire of 10 primate species, from prosimian lemur to human, in addition to mouse. About 221 previously unidentified primate sequences and 33 mouse sequences were analyzed. These sequences encode ORs distributed in seven families and 56 subfamilies. Analysis showed a high fraction (≈50% on average) of pseudogenes in hominoids. In contrast, only ≈27% of OR genes are pseudogenes in Old World monkeys, and New World monkeys are almost free of pseudogenes. The prosimian branch seems to have evolved differently from the other primates and has ≈37% pseudogene content. No pseudogenes were found in mouse. With the exception of New World monkeys, we demonstrate that primates have a high fraction of OR pseudogenes compared with mouse. We hypothesize that under relaxed selective constraints, primates would have progressively accumulated pseudogenes with the highest level seen in hominoids. The fraction of pseudogenes in the OR gene repertoire could parallel the evolution of the olfactory sensory function.

264 citations


Journal ArticleDOI
TL;DR: Analysis of the different duplicons involved in human genomic disorders identifies features that may predispose to recombination, including large size and high sequence identity between the recombining copies, putative recombination promoting features, and the presence of multiple genes/pseudogenes that may include genes expressed in germ cells.
Abstract: Chromosome-specific low-copy repeats, or duplicons, occur in multiple regions of the human genome. Homologous recombination between different duplicon copies leads to chromosomal rearrangements, such as deletions, duplications, inversions, and inverted duplications, depending on the orientation of the recombining duplicons. When such rearrangements cause dosage imbalance of a developmentally important gene(s), genetic diseases now termed genomic disorders result, at a frequency of 0.7-1/1000 births. Duplicons can have simple or very complex structures, with variation in copy number from 2 to >10 repeats, and each varying in size from a few kilobases in length to hundreds of kilobases. Analysis of the different duplicons involved in human genomic disorders identifies features that may predispose to recombination, including large size and high sequence identity between the recombining copies, putative recombination promoting features, and the presence of multiple genes/pseudogenes that may include genes expressed in germ cells. Most of the chromosome rearrangements involve duplicons near pericentromeric regions, which may relate to the propensity of such regions to accumulate duplicons. Detailed analyses of the structure, polymorphic variation, and mechanisms of recombination in genomic disorders, as well as the evolutionary origin of various duplicons will further our understanding of the structure, function, and fluidity of the human genome.

259 citations


Journal ArticleDOI
01 May 2000-Genetics
TL;DR: The data indicate remarkably rapid evolution of R-gene homologues during diversification of plant families and the existence of a null allele within Lycopersicon.
Abstract: The presence of a single resistance (R) gene allele can determine plant disease resistance. The protein products of such genes may act as receptors that specifically interact with pathogen-derived factors. Most functionally defined R-genes are of the nucleotide binding site-leucine rich repeat (NBS-LRR) supergene family and are present as large multigene families. The specificity of R-gene interactions together with the robustness of plant-pathogen interactions raises the question of their gene number and diversity in the genome. Genomic sequences from tomato showing significant homology to genes conferring race-specific resistance to pathogens were identified by systematically "scanning" the genome using a variety of primer pairs based on ubiquitous NBS motifs. Over 70 sequences were isolated and 10% are putative pseudogenes. Mapping of the amplified sequences on the tomato genetic map revealed their organization as mixed clusters of R-gene homologues that showed in many cases linkage to genetically characterized tomato resistance loci. Interspecific examination within Lycopersicon showed the existence of a null allele. Consideration of the tomato and potato comparative genetic maps unveiled conserved syntenic positions of R-gene homologues. Phylogenetic clustering of R-gene homologues within tomato and other Solanaceae family members was observed but not with R-gene homologues from Arabidopsis thaliana. Our data indicate remarkably rapid evolution of R-gene homologues during diversification of plant families.

Journal ArticleDOI
TL;DR: Analysis of the first identification of OR sequences from a marsupial and a monotreme suggests that the ancestral mammal had a small OR repertoire, which expanded independently in all three mammalian subclasses.
Abstract: The vertebrate olfactory receptor (OR) subgenome harbors the largest known gene family, which has been expanded by the need to provide recognition capacity for millions of potential odorants. We implemented an automated procedure to identify all OR coding regions from published sequences. This led us to the identification of 831 OR coding regions (including pseudogenes) from 24 vertebrate species. The resulting dataset was subjected to neighbor-joining phylogenetic analysis and classified into 32 distinct families, 14 of which include only genes from tetrapodan species (Class II ORs). We also report here the first identification of OR sequences from a marsupial (koala) and a monotreme (platypus). Analysis of these OR sequences suggests that the ancestral mammal had a small OR repertoire, which expanded independently in all three mammalian subclasses. Classification of "fish-like" (Class I) ORs indicates that some of these ancient ORs were maintained and even expanded in mammals. A nomenclature system for the OR gene superfamily is proposed, based on a divergence evolutionary model. The nomenclature consists of the root symbol 'OR', followed by a family numeral, subfamily letter(s), and a numeral representing the individual gene within the subfamily. For example, OR3A1 is an OR gene of family 3, subfamily A, and OR7E12P is an OR pseudogene of family 7, subfamily E. The symbol is to be preceded by a species indicator. We have assigned the proposed nomenclature symbols for all 330 human OR genes in the database. A WWW tool for automated name assignment is provided.
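
The symbol structure described in this abstract (root 'OR', family numeral, subfamily letter(s), gene numeral) lends itself to a small parser. The following is a minimal sketch, not the authors' tool. The names `ORSymbol` and `parse_or_symbol` are hypothetical. Treating a trailing "P" as the pseudogene marker is an assumption inferred from the OR7E12P example. The species indicator that precedes the symbol is not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ORSymbol:
    """Parsed parts of an OR gene symbol (hypothetical helper type)."""
    family: int
    subfamily: str
    gene: int
    is_pseudogene: bool


# Root 'OR', family numeral, subfamily letter(s), gene numeral,
# and an optional trailing 'P' (assumed here to mark a pseudogene).
_OR_PATTERN = re.compile(r"^OR(\d+)([A-Z]+)(\d+)(P?)$")


def parse_or_symbol(symbol: str) -> Optional[ORSymbol]:
    """Split a symbol such as 'OR7E12P' into its parts, or return None."""
    match = _OR_PATTERN.match(symbol)
    if match is None:
        return None
    family, subfamily, gene, pseudo = match.groups()
    return ORSymbol(int(family), subfamily, int(gene), pseudo == "P")


if __name__ == "__main__":
    for name in ("OR3A1", "OR7E12P", "TLR7"):
        print(name, parse_or_symbol(name))
```

Anchoring the pattern at both ends keeps unrelated symbols such as TLR7 from matching.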

Journal ArticleDOI
TL;DR: This analysis compared the structure and expression of genes with or without known retropseudogenes, rejecting previous hypotheses that widely expressed genes are GC rich and suggesting instead that genes with a wide tissue distribution are GC poor.
Abstract: The human genome is estimated to contain 23,000 to 33,000 retropseudogenes. To study the properties of genes giving rise to these retroelements, we compared the structure and expression of genes with or without known retropseudogenes. Four main features have emerged from the analysis of 181 genes associated with retropseudogenes: Reverse-transcribed genes are (1) widely expressed, (2) highly conserved, (3) short, and (4) GC-poor. The first two properties probably reflect the fact that genes giving rise to retropseudogenes have to be expressed in the germ-line. The two latter points suggest that reverse-transcription and transposition are more efficient for short GC-poor mRNAs. In addition, this analysis allowed us to reject previous hypotheses that widely expressed genes are GC rich. Rather, globally, genes with a wide tissue distribution are GC poor.

Journal ArticleDOI
30 Dec 2000-Gene
TL;DR: A generalized evolutionary process with genes having the potential to capture neighboring sequences and use them as functional exons may be represented by the human cytochrome P450 3A (CYP3A) locus, which is mapped using a bacterial artificial chromosome clone.

Journal ArticleDOI
TL;DR: It was established that there have been at least 12 separate mtDNA integrations into P. pedestris nuclear genomes, which is the highest reported rate of horizontal transfer between organellar and nuclear genomes within a single animal species.
Abstract: Multiple copies of mitochondrial-like DNA were found in the brown mountain grasshopper, Podisma pedestris (Orthoptera: Acrididae), paralogous to COI and ND5 regions. The same was discovered using the ND5 regions of nine other grasshopper species from four separate subfamilies (Podisminae, Calliptaminae, Cyrtacanthacridinae, and Gomphocerinae). The extra ND5-like sequences were shown to be nuclear in the desert locust, Schistocerca gregaria (Cyrtacanthacridinae), and probably so in P. pedestris and an Italopodisma sp. (Podisminae). Eighty-seven different ND5-like nuclear mitochondrial pseudogenes (Numts) were sequenced from 12 grasshopper individuals. Different nuclear mitochondrial pseudogenes, if descended from the same mitochondrial immigrant, will have diverged from each other under no selective constraints because of their loss of functionality. Evidence of selective constraints in the differences between any two Numt sequences (e.g., if most differences are at third positions of codons) implies that they have separate mitochondrial origins. Through pairwise comparisons of pseudogene sequences, it was established that there have been at least 12 separate mtDNA integrations into P. pedestris nuclear genomes. This is the highest reported rate of horizontal transfer between organellar and nuclear genomes within a single animal species. The occurrence of numerous mitochondrial pseudogenes in nuclear genomes derived from separate integration events appears to be a common phenomenon among grasshoppers. More than one type of mechanism appears to have been involved in generating the observed grasshopper Numts.

Journal ArticleDOI
TL;DR: Profound deficiencies in complex II activity resulting from mutations in the Fp gene of SDH have so far presented only as LS, a striking observation in view of the ubiquitous expression of this typical housekeeping gene in humans.
Abstract: Succinate dehydrogenase (SDH) deficiency represents a minor cause of Leigh syndrome (LS). Noticeably, the first mutation in a nuclear-encoded respiratory chain component, a mutation in the 5p15 copy of the flavoprotein (Fp) subunit gene of the SDH, was reported 4 years ago in two siblings with LS and SDH deficiency. We now report a new patient with LS and SDH deficiency. Because two copies of the Fp gene are present in the human genome, we first determined the complete structure of these two copies. This allowed us to identify a 1 bp deletion creating a frameshift in the 3q29 copy, confirming that this second copy was a pseudogene. We also sequenced the promoter region of the 5p15 gene and, in addition, screened for mutations in the patient. Sequencing of the Fp SDH cDNA in the patient only allowed us to identify a heterozygous C to T transition, changing an alanine to a valine in one allele. This transition was found to be heterozygous in the patient's father but was absent from 150 controls. Transfection of the corresponding mutant cDNA into human Fp-deficient cells failed to restore normal SDH activity, confirming the deleterious effect of this mutation. The second allele, inherited from the mother, carried an A to C substitution changing the methionine translation initiation codon to a leucine. This mutant transcript represented only 10% of total Fp transcript, suggesting instability of this transcript. So far, profound deficiencies in complex II activity resulting from mutations in the Fp gene of the SDH present only as LS, a striking observation in view of the ubiquitous expression of this typical housekeeping gene in humans.

Journal ArticleDOI
TL;DR: The srh family of chemoreceptors in the nematode Caenorhabditis elegans is very large, containing 214 genes and 90 pseudogenes, and consideration of all deletions and terminal truncations of srh pseudogene reveals that large deletions are common.
Abstract: The srh family of chemoreceptors in the nematode Caenorhabditis elegans is very large, containing 214 genes and 90 pseudogenes. It is related to the str, stl, and srd families of seven-transmembrane or serpentine receptors. Like these three families, most srh genes are concentrated on chromosome V, and mapping of their chromosomal locations on a phylogenetic tree reveals 27 different movements of genes to other chromosomes. Mapping of intron gains and losses onto the phylogenetic tree reveals that the last common ancestral gene of the family had five introns, which are inferred to have been lost 70 times independently during evolution of the family. In addition, seven intron gains are revealed, three of which are fairly recent. Comparisons with 20 family members in the C. briggsae genome confirms these patterns, including two intron losses in C. briggsae since the species split. There are 14 clear C. elegans orthologs for these 20 genes, whose average amino acid divergence of 68% allows estimation of 85 gene duplications in the C. elegans lineage since the species split. The absence of six orthologs in C. elegans also indicates that gene loss occurs; consideration of all deletions and terminal truncations of srh pseudogenes reveals that large deletions are common. Together these observations provide insight into the evolutionary dynamics of this compact animal genome.

Journal ArticleDOI
TL;DR: The observations suggest that the members of the ubiquitin gene family evolve almost independently by silent nucleotide substitution and are subjected to birth-and-death evolution at the DNA level.
Abstract: Ubiquitin is a highly conserved protein that is encoded by a multigene family. It is generally believed that this gene family is subject to concerted evolution, which homogenizes the member genes of the family. However, protein homogeneity can be attained also by strong purifying selection. We therefore studied the proportion (pS) of synonymous nucleotide differences between members of the ubiquitin gene family from 28 species of fungi, plants, and animals. The results have shown that pS is generally very high and is often close to the saturation level, although the protein sequence is virtually identical for all ubiquitins from fungi, plants, and animals. A small proportion of species showed a low level of pS values, but these values appeared to be caused by recent gene duplication. It was also found that the number of repeat copies of the gene family varies considerably with species, and some species harbor pseudogenes. These observations suggest that the members of this gene family evolve almost independently by silent nucleotide substitution and are subjected to birth-and-death evolution at the DNA level.

Journal ArticleDOI
TL;DR: The striking similarity between the evolutionary patterns of the EAR genes and those of the major histocompatibility complex, immunoglobulin, and T cell receptor genes stands in strong support of the hypothesis that host-defense and generation of diversity are among the primary physiological functions of the rodent EARs.
Abstract: The mammalian RNase A superfamily comprises a diverse array of ribonucleolytic proteins that have a variety of biochemical activities and physiological functions. Two rapidly evolving RNases of higher primates are of particular interest as they are major secretory proteins of eosinophilic leukocytes and have been found to possess anti-pathogen activities in vitro. To understand how these RNases acquired this function during evolution and to develop animal models for the study of their functions in vivo, it is necessary to investigate these genes in many species. Here, we report the sequences of 38 functional genes and 23 pseudogenes of the eosinophil-associated RNase (EAR) family from 5 rodent species. Our phylogenetic analysis of these genes showed a clear pattern of evolution by a rapid birth-and-death process and gene sorting, a process characterized by rapid gene duplication and deactivation occurring differentially among lineages. This process ultimately generates distinct or only partially overlapping inventories of the genes, even in closely related species. Positive Darwinian selection also contributed to the diversification of these EAR genes. The striking similarity between the evolutionary patterns of the EAR genes and those of the major histocompatibility complex, immunoglobulin, and T cell receptor genes stands in strong support of the hypothesis that host-defense and generation of diversity are among the primary physiological functions of the rodent EARs. The discovery of a large number of divergent EARs suggests the intriguing possibility that these proteins have been specifically tailored to fight against distinct rodent pathogens.

Journal ArticleDOI
TL;DR: Examination of sequential clones of M. synoviae established that unidirectional recombination occurs between the pseudogenes and the expressed vlhA, with duplication of pseudogene sequence and loss of the corresponding region previously seen in the expressed gene, suggesting that completely distinct mechanisms have been adopted to control antigenic variation in homologous gene families.
Abstract: High-frequency phase and antigenic variation of homologous lipoprotein haemagglutinins has been seen in both the major avian mycoplasma pathogens, Mycoplasma synoviae and Mycoplasma gallisepticum. The expression and, hence, antigenic variation of the pMGA gene family (encoding these lipoproteins in M. gallisepticum) is controlled by variation in the length of a trinucleotide repeat motif 5' to the promoter of each gene. However, such a mechanism was not detected in preliminary observations on M. synoviae. Thus, the basis for control of variation in the vlhA gene family (which encodes the homologous haemagglutinin in M. synoviae) was investigated to enable comparison with its homologue in M. gallisepticum and with other lipoprotein gene families in mycoplasmas. The start point of transcription was identified 119 bp upstream of the initiation codon, but features associated with control of transcription in other mycoplasma lipoprotein genes were not seen. Comparison of three copies of vlhA revealed considerable sequence divergence at the 3' end of the gene, but conservation of the 5' end. Southern blot analysis of M. synoviae genomic DNA revealed that the promoter region and part of the conserved 5' coding sequence occurred as a single copy, whereas the remainder of the coding sequence occurred as multiple copies. A 9.7 kb fragment of the genome was found to contain eight tandemly repeated regions partially homologous to vlhA, all lacking the putative promoter region and the single-copy 5' end of vlhA, but extending over one of four distinct overlapping regions of the 3' coding sequence. Examination of sequential clones of M. synoviae established that unidirectional recombination occurs between the pseudogenes and the expressed vlhA, with duplication of pseudogene sequence and loss of the corresponding region previously seen in the expressed gene. Expression of the 5' end of two variants of the vlhA gene showed that they differed in their reaction with monoclonal antibodies specific for this region. These data suggest that the control of vlhA antigenic variation in M. synoviae is achieved by multiple gene conversion events using a repertoire of coding sequences to generate a chimeric expressed gene, with the greatest potential for variation generated in the region encoding the haemagglutinin. Thus, completely distinct mechanisms have been adopted to control antigenic variation in homologous gene families.

Journal ArticleDOI
TL;DR: The present work characterizes adjacent genes encoding novel serine proteases, termed γ-tryptases, and generates a refined map of the multitryptase locus; the unique features of γ-tryptases suggest possibly novel functions.
Abstract: Previously, this laboratory identified clusters of α-, β-, and mast cell protease-7-like tryptase genes on human chromosome 16p13.3. The present work characterizes adjacent genes encoding novel serine proteases, termed γ-tryptases, and generates a refined map of the multitryptase locus. Each γ gene lies between an α1H Ca2+ channel gene (CACNA1H) and a βII- or βIII-tryptase gene and is ∼30 kb from polymorphic minisatellite MS205. The tryptase locus also contains at least four tryptase-like pseudogenes, including mastin, a gene expressed in dogs but not in humans. Genomic DNA blotting results suggest that γI- and γII-tryptases are alleles at the same site. βII- and βIII-tryptases appear to be alleles at a neighboring site, and αII- and βI-tryptases appear to be alleles at a third site. γ-Tryptases are transcribed in lung, intestine, and in several other tissues and in a mast cell line (HMC-1) that also expresses γ-tryptase protein. Immunohistochemical analysis suggests that γ-tryptase is expressed by airway mast cells. γ-Tryptase catalytic domains are ∼48% identical with those of known mast cell tryptases and possess mouse homologues. We predict that γ-tryptases are glycosylated oligomers with tryptic substrate specificity and a distinct mode of activation. A feature not found in described tryptases is a C-terminal hydrophobic domain, which may be a membrane anchor. Although the catalytic domains contain tryptase-like features, the hydrophobic segment and intron-exon organization are more closely related to another recently described protease, prostasin. In summary, this work describes γ-tryptases, which are novel members of chromosome 16p tryptase/prostasin gene families. Their unique features suggest possibly novel functions.

Journal ArticleDOI
23 Dec 2000-Gene
TL;DR: In eukaryote genomes, many examples of current gene families suggest that both drift and selection are at work on their evolution, including those with uniform members and those with diverse functions.

Journal ArticleDOI
15 Mar 2000-Blood
TL;DR: A sequence analysis of 28 unrelated, racially diverse A47° CGD patients and 37 healthy individuals concluded that recombination events between the p47-phox gene and its highly homologous pseudogenes result in the incorporation of ΔGT into the p47-phox gene, thereby leading to the high frequency of GT deletion in A47° CGD patients.

Journal ArticleDOI
15 Jan 2000-Genomics
TL;DR: The olfactory receptor (OR) gene cluster on human chromosome 17p13.3 was subjected to mixed shotgun automated DNA sequencing and revealed a common gene structure with an intronless coding region and at least one upstream noncoding exon.

Journal ArticleDOI
17 Oct 2000-Gene
TL;DR: Several protease genes expressed in skin show higher expression levels in psoriatic lesion samples than in non-lesional skin samples from the same patient, suggesting that the imbalance of a complex protease cascade in skin may contribute to the pathology of disease.

Journal ArticleDOI
TL;DR: Two distal enhancers that specify apolipoprotein (apo) E gene expression in isolated macrophages and adipose tissue were identified in transgenic mice that were generated with constructs of the human apoE/C-I/C-I′/C-IV/C-II gene cluster, suggesting that their activity in specific cell types may be determined by common regulatory elements.

Journal ArticleDOI
TL;DR: Transcription, or its lack, is consistently associated with particular variants of the KIR2DL5 gene, and substitution within a putative binding site for the transcription factor acute myeloid leukemia gene 1 could determine the lack of expression for some KIR2DL5 variants.
Abstract: Two variants of the novel KIR2DL5 gene (KIR2DL5.1 and .2) were identified in genomic DNA of a single donor. However, only the KIR2DL5.1 variant was transcribed in PBMC. In this study, analysis of seven additional donors reveals two new variants of the KIR2DL5 gene and indicates that transcription, or its lack, is consistently associated with particular variants of this gene. Comparison of the complete nucleotide sequences of the exons and introns of KIR2DL5.1 and KIR2DL5.2 reveals no structural abnormalities, but similar open reading frames for both variants. In contrast, the promoter region of KIR2DL5 shows a high degree of sequence polymorphism that is likely relevant for expression. Substitution within a putative binding site for the transcription factor acute myeloid leukemia gene 1 could determine the lack of expression for some KIR2DL5 variants.

Journal ArticleDOI
TL;DR: Pseudogene transcription was supported by CAT reporter gene assays showing substantial promoter activity of 5'-flanking regions of two TPT1 pseudogenes; the differing quantities and ratios of the two TCTP mRNAs across tissues indicate extensive transcriptional control and involvement of tissue-specific factors.
Abstract: In humans and rabbits, the TPT1 gene encoding the translationally controlled tumour protein TCTP generates two mRNAs (TCTP mRNA1 and TCTP mRNA2) which differ in the length of their 3′ untranslated regions. The distribution of these mRNAs was investigated in 10 rabbit and 50 human tissues. They were transcribed in all tissues investigated, but differed considerably in their quantity and ratio of expression. This indicates an extensive transcriptional control and involvement of tissue-specific factors. In the rabbit genome numerous processed, intronless pseudogenes were detected. Four, corresponding to both types of mRNAs, were sequenced and analysed in detail; all displayed only few mutations and were either preserved completely in the original amino acid sequence of the intron containing gene, or contained only minor mutations in the coding region which did not interrupt the open reading frame. In the mRNA population of rabbit reticulocytes two additional TCTP RNAs of the TCTP mRNA2 type were detected, which have the characteristics of pseudogene transcripts. Pseudogene transcription was supported further by CAT reporter gene assays showing substantial promoter activity of 5′-flanking regions of two TPT1 pseudogenes.


Journal ArticleDOI
TL;DR: The findings support a model in which the bodies of the genes are predominantly methylated and thus insulated from the interaction with DNA-binding proteins; on this basis, the authors propose using DNA methylation studies in conjunction with large-scale sequencing approaches as a tool for the prediction of cis-acting genomic regions, for the identification of cryptic and potentially active CpG islands, and for the preliminary distinction of genes and pseudogenes.
Abstract: Cytosine in CpG dinucleotides is frequently found to be methylated in the DNA of higher eukaryotes and differential methylation has been proposed to be a key element in the organization of gene expression in man. To address this question systematically, we used bisulfite genomic sequencing to study the methylation patterns of three X-linked genes and one autosomal pseudogene in two adult individuals and across nine different tissues. Two of the genes, SLC6A8 and MSSK1, are tissue-specifically expressed. CDM is expressed ubiquitously. The pseudogene, ψSLC6A8, is exclusively expressed in the testis. The promoter regions of the SLC6A8, MSSK1 and CDM genes were found to be essentially unmethylated in all tissues, regardless of their relative expression level. In contrast, the pseudogene ψSLC6A8 shows high methylation of the CpG islands in all somatic tissues but complete demethylation in testis. Methylation profiles in different tissues are similar in shape but not identical. The data for the two investigated individuals suggest that methylation profiles of individual genes are tissue specific. Taken together, our findings support a model in which the bodies of the genes are predominantly methylated and thus insulated from the interaction with DNA-binding proteins. Only unmethylated promoter regions are accessible for binding and interaction. Based on this model we propose to use DNA methylation studies in conjunction with large-scale sequencing approaches as a tool for the prediction of cis-acting genomic regions, for the identification of cryptic and potentially active CpG islands and for the preliminary distinction of genes and pseudogenes.