Showing papers on "Pseudogene" published in 2001


Journal Article
22 Feb 2001-Nature
TL;DR: Comparing the 3.27-megabase genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis provides clear explanations for these properties and reveals an extreme case of reductive evolution.
Abstract: Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

1,620 citations


Journal Article
TL;DR: Methods for avoiding Numts have now been tested, and several recent studies demonstrate the potential utility of Numt DNA sequences in evolutionary studies.
Abstract: Nuclear copies of mitochondrial DNA (mtDNA) have contaminated PCR-based mitochondrial studies of over 64 different animal species. Since the last review of these nuclear mitochondrial pseudogenes (Numts) in animals, Numts have been found in 53 of the species studied. The recent evidence suggests that Numts are not equally abundant in all species, for example they are more common in plants than in animals, and also more numerous in humans than in Drosophila. Methods for avoiding Numts have now been tested, and several recent studies demonstrate the potential utility of Numt DNA sequences in evolutionary studies. As relics of ancient mtDNA, these pseudogenes can be used to infer ancestral states or root mitochondrial phylogenies. Where they are numerous and selectively unconstrained, Numts are ideal for the study of spontaneous mutation in nuclear genomes.

1,055 citations


Journal Article
TL;DR: The evidence showing that deletional bias is a major force that shapes bacterial genomes is discussed, in which dramatic reductions in genome size can result not from selection to lose DNA, but from decreased selection to maintain gene functionality.

770 citations


Journal Article
TL;DR: The results of this analysis suggest the following genome expansion history: first, the generation of a "tetrapod-specific" Class II OR cluster on chromosome 11 by local duplication, then a single-step duplication of this cluster to chromosome 1, and finally an avalanche of duplication events out of chromosome 1 to most other chromosomes.
Abstract: Olfactory receptors likely constitute the largest gene superfamily in the vertebrate genome. Here we present the nearly complete human olfactory subgenome elucidated by mining the genome draft with gene discovery algorithms. Over 900 olfactory receptor genes and pseudogenes (ORs) were identified, two-thirds of which were not annotated previously. The number of extrapolated ORs is in good agreement with previous theoretical predictions. The sequence of at least 63% of the ORs is disrupted by what appears to be a random process of pseudogene formation. ORs constitute 17 gene families, 4 of which contain more than 100 members each. "Fish-like" Class I ORs, previously considered a relic in higher tetrapods, constitute as much as 10% of the human repertoire, all in one large cluster on chromosome 11. Their lower pseudogene fraction suggests a functional significance. ORs are disposed on all human chromosomes except 20 and Y, and nearly 80% are found in clusters of 6-138 genes. A novel comparative cluster analysis was used to trace the evolutionary path that may have led to OR proliferation and diversification throughout the genome. The results of this analysis suggest the following genome expansion history: first, the generation of a "tetrapod-specific" Class II OR cluster on chromosome 11 by local duplication, then a single-step duplication of this cluster to chromosome 1, and finally an avalanche of duplication events out of chromosome 1 to most other chromosomes. The results of the data mining and characterization of ORs can be accessed at the Human Olfactory Receptor Data Exploratorium Web site (http://bioinfo.weizmann.ac.il/HORDE).

659 citations


Journal Article
TL;DR: A model is proposed suggesting that Hyal-2 and Hyal-1 are the major mammalian hyaluronidases in somatic tissues, and that they act in concert to degrade high molecular weight hyaluronan to the tetrasaccharide.

596 citations


Journal Article
TL;DR: It is shown that a hitherto unknown L1Hs antisense promoter (ASP) drives the transcription of adjacent genes, and this type of transcriptional control may be widespread.
Abstract: In the human genome, retrotranspositionally competent long interspersed nuclear elements (L1Hs) are involved in the generation of processed pseudogenes and mobilization of unrelated sequences into existing genes. Transcription of each L1Hs is initiated from its internal promoter but may also be driven from the promoters of adjacent cellular genes. Here I show that a hitherto unknown L1Hs antisense promoter (ASP) drives the transcription of adjacent genes. The ASP is located in the L1Hs 5' untranslated region (5'UTR) and works in the opposite direction. Fifteen cDNAs, isolated from a human NTera2D1 cDNA library by a differential screening method, contained L1Hs 5'UTRs spliced to the sequences of known genes or non-protein-coding sequences. Four of these chimeric transcripts, selected for detailed analysis, were detected in total RNA of different cell lines. Their abundance accounted for roughly 1 to 500% of the transcripts of four known genes, suggesting a large variation in the efficiency of L1Hs ASP-driven transcription. ASP-directed transcription was also revealed from expressed sequence tag sequences and confirmed by using an RNA dot blot analysis. Nine of the 15 randomly selected genomic L1Hs 5'UTRs had ASP activities about 7- to 50-fold higher than background in transient transfection assays. ASP was assigned to the L1Hs 5'UTR between nucleotides 400 to 600 by deletion and mutation analysis. These results indicate that many L1Hs contain active ASPs which are capable of interfering with normal gene expression, and this type of transcriptional control may be widespread.

387 citations


Journal Article
TL;DR: The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.
Abstract: The mammalian olfactory apparatus is able to recognize and distinguish thousands of structurally diverse volatile chemicals. This chemosensory function is mediated by a very large family of seven-transmembrane olfactory (odorant) receptors encoded by approximately 1,000 genes, the majority of which are believed to be pseudogenes in humans. The strategy of our sequence database mining for full-length, functional candidate odorant receptor genes was based on the high overall sequence similarity and presence of a number of conserved sequence motifs in all known mammalian odorant receptors as well as the absence of introns in their coding sequences. We report here the identification and physical cloning of 347 putative human full-length odorant receptor genes. Comparative sequence analysis of the predicted gene products allowed us to identify and define a number of consensus sequence motifs and structural features of this vast family of receptors. A new nomenclature for human odorant receptors based on their chromosomal localization and phylogenetic analysis is proposed. We believe that these sequences represent the essentially complete repertoire of functional human odorant receptors. The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.

327 citations


Journal Article
TL;DR: It is concluded that the functional homology of human KIR and mouse Ly49 genes arose by convergent evolution and has interesting parallels with the major histocompatibility complex (MHC) in which some of the polymorphic genes are ligands for NK molecules.
Abstract: The two sets of inhibitory and activating natural killer (NK) receptor genes belong either to the Ig or to the C-type lectin superfamilies. Both are extensive and diverse, comprising genes of varying degrees of relatedness, indicative of a process of iterative duplication. We have constructed gene maps to help understand how and when NK receptor genes developed and the nature of their polymorphism. A cluster of over 15 C-type lectin genes, the natural killer complex is located on human chromosome 12p13.1, syntenic with a region in mouse that borders multiple Ly49 loci. The equivalent locus in man is occupied by a single pseudogene, LY49L. The immunoglobulin superfamily of loci, the leukocyte receptor complex (LRC), on chromosome 19q13.4, contains many polymorphic killer cell immunoglobulin-like receptor (KIR) genes as well as multiple related sequences. These include immunoglobulin-like transcript (ILT) (or leukocyte immunoglobulin-like receptor genes), leukocyte-associated inhibitory receptor genes (LAIR), NKp46, Fc alphaR and the platelet glycoprotein receptor VI locus, which encodes a collagen-binding molecule. KIRs are expressed mostly on NK cells and some T cells. The other LRC loci are more widely expressed. Further centromeric of the LRC are sets of additional loci with weak sequence similarity to the KIRs, including the extensive CD66(CEA) and Siglec families. The LRC-syntenic region in mice contains no orthologues of KIRs. Some of the KIR genes are highly polymorphic in terms of sequence as well as for presence/absence of genes on different haplotypes. Some anchor loci, such as KIR2DL4, are present on most haplotypes. A few ILT loci, such as ILT5 and ILT8, are polymorphic, but only ILT6 exhibits presence/absence variation. This knowledge of the genomic organisation of the extensive NK superfamilies underpins efforts to understand the functions of the encoded NK receptor molecules. It leads to the conclusion that the functional homology of human KIR and mouse Ly49 genes arose by convergent evolution. NK receptor immunogenetics has interesting parallels with the major histocompatibility complex (MHC) in which some of the polymorphic genes are ligands for NK molecules. There are hints of an ancient genetic relationship between NK receptor genes and MHC-paralogous regions on chromosomes 1, 9 and 19. The picture that emerges from both complexes is of eternal evolutionary restlessness, presumably in response to resistance to disease.

309 citations


Journal Article
TL;DR: Assessment of cognate EST numbers suggests that r-protein gene family members are differentially expressed, and inspection of genes in the vicinity of r-protein gene family members confirms extensive duplications of large chromosome fragments and sheds light on the evolutionary history of the Arabidopsis genome.
Abstract: Eukaryotic ribosomes are made of two components, four ribosomal RNAs, and approximately 80 ribosomal proteins (r-proteins). The exact number of r-proteins and r-protein genes in higher plants is not known. The strong conservation in eukaryotic r-protein primary sequence allowed us to use the well-characterized rat (Rattus norvegicus) r-protein set to identify orthologues on the five haploid chromosomes of Arabidopsis. By use of the numerous expressed sequence tag (EST) accessions and the complete genomic sequence of this species, we identified 249 genes (including some pseudogenes) corresponding to 80 (32 small subunit and 48 large subunit) cytoplasmic r-protein types. None of the r-protein genes are single copy and most are encoded by three or four expressed genes, indicative of the internal duplication of the Arabidopsis genome. The r-proteins are distributed throughout the genome. Inspection of genes in the vicinity of r-protein gene family members confirms extensive duplications of large chromosome fragments and sheds light on the evolutionary history of the Arabidopsis genome. Examination of large duplicated regions indicated that a significant fraction of the r-protein genes have been either lost from one of the duplicated fragments or inserted after the initial duplication event. Only 52 r-protein genes lack a matching EST accession, and 19 of these contain incomplete open reading frames, confirming that most genes are expressed. Assessment of cognate EST numbers suggests that r-protein gene family members are differentially expressed.

308 citations


Journal Article
TL;DR: The limits of the analysis are described, the striking unevenness of pseudogene derivation in the IF multigene family is discussed, and it is proposed that the nomenclature of Moll and colleagues be extended to any novel keratin.
Abstract: We screened the draft sequence of the human genome for genes that encode intermediate filament (IF) proteins in general, and keratins in particular. The draft covers nearly all previously established IF genes including the recent cDNA and gene additions, such as pancreatic keratin 23, synemin and the novel muscle protein syncoilin. In the draft, seven novel type II keratins were identified, presumably expressed in the hair follicle/epidermal appendages. In summary, 65 IF genes were detected, placing IF among the 100 largest gene families in humans. All functional keratin genes map to the two known keratin clusters on chromosomes 12 (type II plus keratin 18) and 17 (type I), whereas other IF genes are not clustered. Of the 208 keratin-related DNA sequences, only 49 reflect true keratin genes, whereas the majority describe inactive gene fragments and processed pseudogenes. Surprisingly, nearly 90% of these inactive genes relate specifically to the genes of keratins 8 and 18. Other keratin genes, as well as those that encode non-keratin IF proteins, lack either gene fragments/pseudogenes or have only a few derivatives. As parasitic derivatives of mature mRNAs, the processed pseudogenes of keratins 8 and 18 have invaded most chromosomes, often at several positions. We describe the limits of our analysis and discuss the striking unevenness of pseudogene derivation in the IF multigene family. Finally, we propose to extend the nomenclature of Moll and colleagues to any novel keratin.

295 citations


Journal Article
10 Jan 2001-Gene
TL;DR: Gene organization appears to be a useful tool in the study of the regulation, the physiological role and the function of these P450s, and there is a relatively good correlation between intron conservation and phylogenetic relationship among members of the P450 subfamilies.

Journal Article
TL;DR: This work combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set and identified two novel NR sequences, both of which contain stop codons within their coding regions, indicating that both are pseudogenes.
Abstract: The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily. Our analysis of the human genome identified two novel NR sequences. Both these contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily. This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.
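
The pseudogene call in this abstract rests on stop codons found within the coding region. As a minimal sketch of that kind of check (not the authors' actual pipeline), the Python function below scans a coding sequence for in-frame stop codons that occur before the final codon. The function name `premature_stops`, the use of the standard nuclear stop codons, and the assumption that the input is already in frame and intron-free are all illustrative choices.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stops before the last codon.

    Assumes `cds` starts at the first base of the reading frame and
    contains no introns (hypothetical input convention).
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The final complete codon is allowed to be the normal terminator.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


if __name__ == "__main__":
    intact = "ATGAAACCCTAA"      # ATG AAA CCC TAA
    disrupted = "ATGTGACCCTAA"   # ATG TGA CCC TAA
    print(premature_stops(intact))
    print(premature_stops(disrupted))
```

An empty result does not show that a sequence is functional; frameshifts, lost start codons, or missing regulatory elements would not be caught by this check.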

Journal Article
TL;DR: The identification and classification of all nuclear receptor genes in the human genome are reported, and corresponding transcriptome and proteome diversity are discussed.

Journal Article
TL;DR: The identification of a new member of the CYP3A family and the characterization of the full CYP3A locus will aid efforts to identify the genetic variants underlying its variable expression, which will lead to a better optimization of therapies involving the numerous substrates of CYP3A proteins.
Abstract: Proteins encoded by the human CYP3A genes metabolize every second drug currently in use. The activity of CYP3A gene products in the general population is highly variable and may affect the efficacy and safety of drugs metabolized by these enzymes. The mechanisms underlying this variability are poorly understood, but they include gene induction, protein inhibition and unknown genetic polymorphisms. To better understand the regulation of CYP3A expression and to provide a basis for a screen of genetic polymorphisms, we determined and analysed the sequence of the human CYP3A locus. The 231 kb locus sequence contains the three CYP3A genes described previously (CYP3A4, CYP3A5 and CYP3A7), three pseudogenes as well as a novel CYP3A gene termed CYP3A43. The gene encodes a putative protein with between 71.5% and 75.8% identity to the other CYP3A proteins. The highest expression level of CYP3A43 mRNA is observed in the prostate, an organ with extensive steroid metabolism. CYP3A43 is also expressed in several other tissues including liver, where it can be induced by rifampicin. CYP3A43 transcripts undergo extensive splicing. The identification of a new member of the CYP3A family and the characterization of the full CYP3A locus will aid efforts to identify the genetic variants underlying its variable expression. This, in turn, will lead to a better optimization of therapies involving the numerous substrates of CYP3A proteins.

Journal Article
30 Aug 2001-Nature
TL;DR: The results support the idea that gene conversion and somatic hypermutation constitute distinct pathways for processing a common lesion in the immunoglobulin V gene.
Abstract: After gene rearrangement, immunoglobulin V genes are further diversified by either somatic hypermutation or gene conversion. Hypermutation (in man and mouse) occurs by the fixation of individual, non-templated nucleotide substitutions. Gene conversion (in chicken) is templated by a set of upstream V pseudogenes. Here we show that if the RAD51 paralogues XRCC2, XRCC3 or RAD51B are ablated the pattern of diversification of the immunoglobulin V gene in the chicken DT40 B-cell lymphoma line exhibits a marked shift from one of gene conversion to one of somatic hypermutation. Non-templated, single-nucleotide substitutions are incorporated at high frequency specifically into the V domain, largely at G/C and with a marked hotspot preference. These mutant DT40 cell lines provide a tractable model for the genetic dissection of immunoglobulin hypermutation and the results support the idea that gene conversion and somatic hypermutation constitute distinct pathways for processing a common lesion in the immunoglobulin V gene. The marked induction of somatic hypermutation that is achieved by ablating the RAD51 paralogues is probably a consequence of modifying the recombination-mediated repair of such initiating lesions.

Journal Article
05 Sep 2001-Gene
TL;DR: The identification, cloning and tissue distributions of ten novel human genes encoding G protein-coupled receptors (GPCRs) and a pseudogene, psi GPR79, are reported, which can now be used in assays to determine endogenous and pharmacological ligands.

Journal Article
TL;DR: It is found that the defect caused by the murine paucity of lymph node T cell (plt) mutation is due to the loss of both secondary lymphoid-organ chemokine (SLC) and EBI-1 ligand chemokine (ELC) expression in secondary lymphoid organs.
Abstract: The murine paucity of lymph node T cell (plt) mutation leads to abnormalities in leukocyte migration and immune response. The causative defect is thought to be a loss of secondary lymphoid-organ chemokine (SLC) expression in lymphoid tissues. We now find that the plt defect is due to the loss of both SLC and EBI-1 ligand chemokine (ELC) expression in secondary lymphoid organs. In an examination of the plt locus, we find that commonly used inbred mouse strains demonstrate at least three different haplotypes. Polymorphism at this locus is due to duplications of at least four genes, three of them encoding chemokines. At least two cutaneous T cell-attracting chemokine (CTACK), three SLC, and four ELC genes or pseudogenes are present in some haplotypes. All haplotypes share a duplication that includes two SLC genes, which demonstrate different expression patterns, a single functional ELC gene, and an ELC pseudogene. The plt mutation represents a deletion that includes the SLC gene expressed in secondary lymphoid organs and the single functional ELC gene, leaving only an SLC gene that is expressed in lymphatic endothelium and an ELC pseudogene. This lack of CCR7 ligands in the secondary lymphoid organs of plt mice provides a basis for their severe abnormalities in leukocyte migration and immune response.

Journal Article
01 Apr 2001-Blood
TL;DR: Results indicate that allelic loss and mutation of a gene within the MDR is an unlikely pathogenetic mechanism for B-CLL; however, haplo-insufficiency of one of the identified genes may contribute to tumorigenesis.

Journal Article
TL;DR: It is proposed that the influx of dangerous genetic elements such as transposons and bacteriophages selects for the maintenance of relatively high deletion rates in most bacteria; the sheltered lifestyle of intracellular parasites removes this threat, leading to reduced deletion rates and larger pseudogene loads.

Journal Article
TL;DR: It is shown that multiple DNA sequences, similar to the mitochondrial cytochrome oxidase I (COI) gene, occur within single individuals in at least 10 species of the snapping shrimp genus Alpheus, and that genetic material has been repeatedly transferred from the mtDNA to the nuclear genome of snapping shrimp.
Abstract: Here we show that multiple DNA sequences, similar to the mitochondrial cytochrome oxidase I (COI) gene, occur within single individuals in at least 10 species of the snapping shrimp genus Alpheus. Cloning of amplified products revealed the presence of copies that differed in length and (more frequently) in base substitutions. Although multiple copies were amplified in individual shrimp from total genomic DNA (gDNA), only one sequence was amplified from cDNA. These results are best explained by the presence of nonfunctional duplications of a portion of the mtDNA, probably located in the nuclear genome, since transfer into the nuclear gene would render the COI gene nonfunctional due to differences in the nuclear and mitochondrial genetic codes. Analysis of codon variation suggests that there have been 21 independent transfer events in the 10 species examined. Within a single animal, differences between the sequences of these pseudogenes ranged from 0.2% to 20.6%, and those between the real mtDNA and pseudogene sequences ranged from 0.2% to 18.8% (uncorrected). The large number of integration events and the large range of divergences between pseudogenes and mtDNA sequences suggest that genetic material has been repeatedly transferred from the mtDNA to the nuclear genome of snapping shrimp. Unrecognized pseudogenes in phylogenetic or population studies may result in spurious results, although previous estimates of rates of molecular evolution based on Alpheus sister taxa separated by the Isthmus of Panama appear to remain valid. Especially worrisome for researchers are those pseudogenes that are not obviously recognizable as such. An effective solution may be to amplify transcribed copies of protein-coding mitochondrial genes from cDNA rather than using genomic DNA.
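
The divergences quoted in this abstract are uncorrected pairwise differences. A minimal sketch of how such a value could be computed from two pre-aligned sequences is given below; the function name `p_distance` and the decision to skip columns containing gaps or ambiguous bases are assumptions made for illustration, not details taken from the paper.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only alignment columns where both sequences carry an unambiguous base.
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not columns:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in columns) / len(columns)


if __name__ == "__main__":
    mtdna_copy = "ATGGCATTAG"
    nuclear_copy = "ATGGCGTTAG"  # one substitution in ten sites
    print(f"{100 * p_distance(mtdna_copy, nuclear_copy):.1f}% uncorrected divergence")
```

Because no substitution model is applied, values of this kind underestimate the true number of changes when sequences are highly diverged.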

Journal Article
TL;DR: A patient with mild Gaucher disease but impaired horizontal saccadic eye movements who developed a tremor at age 42, followed by rapid deterioration of her gait is described, which progressed despite enzyme replacement therapy.

Journal Article
TL;DR: Sequencing and PCR-RFLP analyses of individual COX-negative muscle fibres from a patient with a previously described heteroplasmic COX II (T7587C) mutation indicate that mutant loads as low as 30% can be reliably detected by sequencing.
Abstract: Studies of single cells have previously shown intracellular clonal expansion of mitochondrial DNA (mtDNA) mutations to levels that can cause a focal cytochrome c oxidase (COX) defect. Whilst techniques are available to study mtDNA rearrangements at the level of the single cell, recent interest has focused on the possible role of somatic mtDNA point mutations in ageing, neurodegenerative disease and cancer. We have therefore developed a method that permits the reliable determination of the entire mtDNA sequence from single cells without amplifying contaminating, nuclear-embedded pseudogenes. Sequencing and PCR–RFLP analyses of individual COX-negative muscle fibres from a patient with a previously described heteroplasmic COX II (T7587C) mutation indicate that mutant loads as low as 30% can be reliably detected by sequencing. This technique will be particularly useful in identifying the mtDNA mutational spectra in age-related COX-negative cells and will increase our understanding of the pathogenetic mechanisms by which they occur.

Journal Article
TL;DR: The hypothesis that the contrasting phylogenetic histories drawn from Quercus using ITS data are not strictly related to technical differences between laboratories, but that they have rather been generated from the analysis of paralogous sequences, best reconciles the available data.

Journal Article
TL;DR: The results suggest that inactivated genetic material in the Rickettsia genomes deteriorates spontaneously due to a mutation bias for deletions and that the noncoding sequences represent DNA in the final stages of this degenerative process.
Abstract: Studies of neutrally evolving sequences suggest that differences in eukaryotic genome sizes result from different rates of DNA loss. However, very few pseudogenes have been identified in microbial species, and the processes whereby genes and genomes deteriorate in bacteria remain largely unresolved. The typhus-causing agent, Rickettsia prowazekii, is exceptional in that as much as 24% of its 1.1-Mb genome consists of noncoding DNA and pseudogenes. To test the hypothesis that the noncoding DNA in the R. prowazekii genome represents degraded remnants of ancestral genes, we systematically examined all of the identified pseudogenes and their flanking sequences in three additional Rickettsia species. Consistent with the hypothesis, we observe sequence similarities between genes and pseudogenes in one species and intergenic DNA in another species. We show that the frequencies and average sizes of deletions are larger than insertions in neutrally evolving pseudogene sequences. Our results suggest that inactivated genetic material in the Rickettsia genomes deteriorates spontaneously due to a mutation bias for deletions and that the noncoding sequences represent DNA in the final stages of this degenerative process.
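
The comparison of deletion and insertion frequencies and sizes described in this abstract can be illustrated with a small tally over a pairwise alignment. The sketch below is not the authors' method: it assumes the alignment already exists, uses `-` for gaps, and treats the functional gene as the ancestral state, so a gap run in the pseudogene counts as a deletion and a gap run in the gene counts as an insertion. The name `indel_summary` is hypothetical.

```python
import re


def indel_summary(gene: str, pseudogene: str) -> dict[str, list[int]]:
    """List the sizes of deletion and insertion events in an aligned pseudogene.

    Assumes `gene` approximates the ancestral sequence (an illustrative
    simplification); each run of '-' is counted as one event.
    """
    if len(gene) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    return {
        # Bases present in the gene but missing from the pseudogene.
        "deletions": [len(m.group()) for m in re.finditer(r"-+", pseudogene)],
        # Bases present in the pseudogene but missing from the gene.
        "insertions": [len(m.group()) for m in re.finditer(r"-+", gene)],
    }


if __name__ == "__main__":
    gene_aln = "ATGC--GGTACC"
    pseudo_aln = "ATGCAAGG---C"
    summary = indel_summary(gene_aln, pseudo_aln)
    for kind, sizes in summary.items():
        mean = sum(sizes) / len(sizes) if sizes else 0.0
        print(f"{kind}: {len(sizes)} event(s), mean size {mean:.1f} bp")
```

Without an outgroup, the direction of each event cannot be established from a pairwise alignment alone, which is why the ancestral-state assumption is stated explicitly.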

Journal Article
TL;DR: The results add to the growing body of work indicating that under some circumstances duplicated mitochondrial control regions are retained through evolutionary time rather than degenerating and being lost, presumably due to selection for a small mitochondrial genome.
Abstract: We report a duplication and rearrangement of the mitochondrial genome involving the control region of parrots in the genus Amazona. This rearrangement results in a gene order of cytochrome b/tRNA(Thr)/pND6/pGlu/CR1/tRNA(Pro)/NADH dehydrogenase 6/tRNA(Glu)/CR2/tRNA(Phe)/12s rRNA, where CR1 and CR2 refer to duplicate control regions, and pND6 and pGlu indicate presumed pseudogenes. In contrast to previous reports of duplications involving the control regions of birds, neither copy of the parrot control region shows any indications of degeneration. Rather, both copies contain many of the conserved sequence features typically found in avian control regions, including the goose hairpin, TASs, the F, C, and D boxes, conserved sequence box 1 (CSB1), and an apparent homolog to the mammalian CSB3. We conducted a phylogenetic analysis of homologous portions of the duplicate control regions from 21 individuals representing four species of Amazona (A. ochrocephala, A. autumnalis, A. farinosa, and A. amazonica) and Pionus chalcopterus. This analysis revealed that an individual's two control region copies (i.e., the paralogous copies) were typically more closely related to one another than to corresponding segments of other individuals (i.e., the orthologous copies). The average sequence divergence of the paralogous control region copies within an individual was 1.4%, versus a mean value of 4.1% between control region orthologs representing nearest phylogenetic neighbors. No differences were found between the paralogous copies in either the rate or the pattern in which the two copies accumulated base pair changes. This pattern suggests concerted evolution of the two control regions, perhaps through occasional gene conversion events. We estimated that gene conversion events occurred on average every 34,670 ± 18,400 years based on pairwise distances between the paralogous control region sequences of each individual. Our results add to the growing body of work indicating that under some circumstances duplicated mitochondrial control regions are retained through evolutionary time rather than degenerating and being lost, presumably due to selection for a small mitochondrial genome.

Journal Article
10 Jan 2001-Gene
TL;DR: A detailed analysis of the RUNX1 locus is presented, showing the transition from a ~1 Mb gene-poor region containing only pseudogenes to a gene-rich region containing several functional genes, and the large repertoire of RUNX1 proteins generated through usage of alternatively spliced exons, some of which contain in-frame stop codons.

Journal Article
05 Sep 2001-Gene
TL;DR: The results provide new insights into host-pathogen interactions and a basis for further functional characterization of the gene family and resolve discrepancies in annotation between gene family members.

Journal Article
TL;DR: Results from in vitro phosphorylation indicate that the absence of arginines at positions 362 and 376 completely abolishes phosphorylation in the connexin43 channel regulation domain, suggesting a possible mechanism for the pathologies associated with HLHS.
Abstract: Gap junction channels formed by the connexin43 protein are considered to play crucial roles in development and function because they allow the direct cell-to-cell exchange of molecules that mediate multiple signaling events. Previous results have shown that connexin43 channels are intricately gated by phosphorylation and that disruption of this regulation gives rise to severe heart malformations and defects of laterality in human, chick and frog. Here we report the identification of connexin43 gene mutations that represent a minor population of connexin43 alleles, which could be reliably detected by using denaturing gradient gel electrophoresis (DGGE) to visualize normal and mutant DNAs that were separately sequenced. In contrast, sequencing of total PCR products without DGGE-pre-selection failed to consistently identify these mutations. Forty-six controls and 20 heart transplant recipients were examined in this study. In the latter group, 14 children had hypoplastic left heart syndrome (HLHS) in which connexin43 gene defects were detected in eight. The remaining six transplant patients with HLHS and all controls showed no defects. All eight HLHS children with gene defects had the same four substitutions: two that were silent polymorphisms, and two that were missense, replacing arginine codons at positions 362 and 376 with codons for glutamines. All four of these substitutions are identical to the nucleotide sequence of the connexin43 pseudogene, suggesting the possibility of an illicit recombination. A breakpoint region was identified 5' to the mutation site in a 63bp domain that is 100% identical in the gene and pseudogene. Results from in vitro phosphorylation indicate that the absence of arginines 362 and 376 completely abolishes phosphorylation in the connexin43 channel regulation domain suggesting a possible mechanism for the pathologies associated with HLHS.

Journal ArticleDOI
TL;DR: In Anaplasma marginale, pseudogenes for two antigenically variable gene families, msp2 and msp3, appear in concert, which would allow these two gene families to act synergistically to evade the host immune response.
Abstract: Ehrlichiae are responsible for important tick-transmitted diseases, including anaplasmosis, the most prevalent tick-borne infection of livestock worldwide, and the emerging human diseases monocytic and granulocytic ehrlichiosis. Antigenic variation of major surface proteins is a key feature of these pathogens that allows persistence in the mammalian host, a requisite for subsequent tick transmission. In Anaplasma marginale pseudogenes for two antigenically variable gene families, msp2 and msp3, appear in concert. These pseudogenes can be recombined into the functional expression site to generate new antigenic variants. Coordinated control of the recombination of these genes would allow these two gene families to act synergistically to evade the host immune response.

Journal ArticleDOI
TL;DR: Comparison between physical and genetic maps revealed a striking difference in recombination rates between the sexes with a lower recombination frequency in males than females, which may enable a chromosomal misalignment at proximal and distal CMT1A-REPs and promote unequal crossing over, which occurs 10 times more frequently in male meiosis.
Abstract: Duplication and deletion of the 1.4-Mb region in 17p12 that is delimited by two 24-kb low copy number repeats (CMT1A–REPs) represent frequent genomic rearrangements resulting in two common inherited peripheral neuropathies, Charcot-Marie-Tooth disease type 1A (CMT1A) and hereditary neuropathy with liability to pressure palsy (HNPP). CMT1A and HNPP exemplify a paradigm for genomic disorders wherein unique genome architectural features result in susceptibility to DNA rearrangements that cause disease. A gene within the 1.4-Mb region, PMP22, is responsible for these disorders through a gene-dosage effect in the heterozygous duplication or deletion. However, the genomic structure of the 1.4-Mb region, including other genes contained within the rearranged genomic segment, remains essentially uncharacterized. To delineate genomic structural features, investigate higher-order genomic architecture, and identify genes in this region, we constructed PAC and BAC contigs and determined the complete nucleotide sequence. This CMT1A/HNPP genomic segment contains 1,421,129 bp of DNA. A low copy number repeat (LCR) was identified, with one copy inside and two copies outside of the 1.4-Mb region. Comparison between physical and genetic maps revealed a striking difference in recombination rates between the sexes with a lower recombination frequency in males (0.67 cM/Mb) versus females (5.5 cM/Mb). Hypothetically, this low recombination frequency in males may enable a chromosomal misalignment at proximal and distal CMT1A–REPs and promote unequal crossing over, which occurs 10 times more frequently in male meiosis. In addition to three previously described genes, five new genes (TEKT3, HS3ST3B1, NPD008/CGI-148, CDRT1, and CDRT15) and 13 predicted genes were identified. Most of these predicted genes are expressed only in embryonic stages. Analyses of the genomic region adjacent to proximal CMT1A–REP indicated an evolutionary mechanism for the formation of proximal CMT1A–REP and the creation of novel genes by DNA rearrangement during primate speciation.