
Showing papers on "Pseudogene" published in 2007


Journal ArticleDOI
TL;DR: It is shown that pseudogene formation and gene loss are the principal forces shaping the different genomes of Leishmania, and genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
Abstract: Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader–associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

721 citations


Journal ArticleDOI
TL;DR: It is shown that the vast majority of nonconserved ORFs present by chance in RNA transcripts are random occurrences, and the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
Abstract: Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of ≈24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs—specifically, their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to ≈20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.

616 citations


Journal ArticleDOI
TL;DR: The MITOMAP data system for the human mitochondrial genome has been greatly enhanced by the addition of a navigable mutational mitochondrial DNA (mtDNA) phylogenetic tree of ∼3000 mtDNA coding region sequences plus expanded pathogenic mutation tables and a nuclear-mtDNA pseudogene (NUMT) data base.
Abstract: The MITOMAP (http://www.mitomap.org) data system for the human mitochondrial genome has been greatly enhanced by the addition of a navigable mutational mitochondrial DNA (mtDNA) phylogenetic tree of 3000 mtDNA coding region sequences plus expanded pathogenic mutation tables and a nuclear-mtDNA pseudogene (NUMT) data base. The phylogeny reconstructs the entire mutational history of the human mtDNA, thus defining the mtDNA haplogroups and differentiating ancient from recent mtDNA mutations. Pathogenic mutations are classified by both genotype and phenotype, and the NUMT sequences permit detection of spurious inclusion of pseudogene variants during mutation analysis. These additions position MITOMAP for the implementation of our automated mtDNA sequence analysis system, Mitomaster.

578 citations


Journal ArticleDOI
TL;DR: A comprehensive survey of homeobox genes and pseudogenes in the human genome is conducted, many new loci are described, and the classification and nomenclature of homeobox genes are revised.
Abstract: The homeobox genes are a large and diverse group of genes, many of which play important roles in the embryonic development of animals. Increasingly, homeobox genes are being compared between genomes in an attempt to understand the evolution of animal development. Despite their importance, the full diversity of human homeobox genes has not previously been described. We have identified all homeobox genes and pseudogenes in the euchromatic regions of the human genome, finding many unannotated, incorrectly annotated, unnamed, misnamed or misclassified genes and pseudogenes. We describe 300 human homeobox loci, which we divide into 235 probable functional genes and 65 probable pseudogenes. These totals include 3 genes with partial homeoboxes and 13 pseudogenes that lack homeoboxes but are clearly derived from homeobox genes. These figures exclude the repetitive DUX1 to DUX5 homeobox sequences of which we identified 35 probable pseudogenes, with many more expected in heterochromatic regions. Nomenclature is established for approximately 40 formerly unnamed loci, reflecting their evolutionary relationships to other loci in human and other species, and nomenclature revisions are proposed for around 30 other loci. We use a classification that recognizes 11 homeobox gene 'classes' subdivided into 102 homeobox gene 'families'. We have conducted a comprehensive survey of homeobox genes and pseudogenes in the human genome, described many new loci, and revised the classification and nomenclature of homeobox genes. The classification scheme may be widely applicable to homeobox genes in other animal genomes and will facilitate comparative genomics of this important gene superclass.

377 citations


Journal ArticleDOI
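TL;DR: A model is described by which selection continuously favors both maintenance of the duplicate copy and divergence of that copy from the parent gene, addressing the problem (Ohno's dilemma) that maintaining a duplicated gene by selection for the original function would restrict its freedom to diverge.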
Abstract: New genes with novel functions arise by duplication and divergence, but the process poses a problem. After duplication, an extra gene copy must rise to sufficiently high frequency in the population and remain free of common inactivating lesions long enough to acquire the rare mutations that provide a new selectable function. Maintaining a duplicated gene by selection for the original function would restrict the freedom to diverge. (We refer to this problem as Ohno's dilemma). A model is described by which selection continuously favors both maintenance of the duplicate copy and divergence of that copy from the parent gene. Before duplication, the original gene has a trace side activity (the innovation) in addition to its original function. When an altered ecological niche makes the minor innovation valuable, selection favors increases in its level (the amplification), which is most frequently conferred by increased dosage of the parent gene. Selection for the amplified minor function maintains the extra copies and raises the frequency of the amplification in the population. The same selection favors mutational improvement of any of the extra copies, which are not constrained to maintain their original function (the divergence). The rate of mutations (per genome) that improve the new function is increased by the multiplicity of target copies within a genome. Improvement of some copies relaxes selection on others and allows their loss by mutation (becoming pseudogenes). Ultimately one of the extra copies is able to provide all of the new activity.

345 citations


Journal ArticleDOI
08 Aug 2007-PLOS ONE
TL;DR: It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes, aided by the stochastic nature of OR gene expression.
Abstract: Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR genes in platypuses, opossums, cows, dogs, rats, and macaques and studied the evolutionary change of the genes together with those of humans and mice. We found that platypuses and primates have <400 functional OR genes while the other species have 800–1,200 functional OR genes. We then estimated the numbers of gains and losses of OR genes for each branch of the phylogenetic tree of mammals. This analysis showed that (i) gene expansion occurred in the placental lineage each time after it diverged from monotremes and from marsupials and (ii) hundreds of gains and losses of OR genes have occurred in an order-specific manner, making the gene repertoires highly variable among different orders. It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes. This fluctuation seems to have been aided by the stochastic nature of OR gene expression.

301 citations


Journal ArticleDOI
TL;DR: Data obtained from sequencing of bacterially cloned rDNA genes can substantially exaggerate the level of eukaryotic microbial diversity inferred from natural samples if appropriate controls are not applied.
Abstract: Molecular approaches have revolutionized our ability to study the ecology and evolution of micro-organisms. Among the most widely used genetic markers for these studies are genes and spacers of the rDNA operon. However, the presence of intragenomic rDNA variation, especially among eukaryotes, can potentially confound estimates of microbial diversity. To test this hypothesis, bacterially cloned PCR products of the internal transcribed spacer (ITS) region from clonal isolates of Symbiodinium, a large genus of dinoflagellates that live in symbiosis with many marine protists and invertebrate metazoa, were sequenced and analysed. We found widely differing levels of intragenomic sequence variation and divergence in representatives of Symbiodinium clades A to E, with only a small number of variants attributed to Taq polymerase/bacterial cloning error or PCR chimeras. Analyses of 5.8S-rDNA and ITS2 secondary structure revealed that some variants possessed base substitutions and/or indels that destabilized the folded form of these molecules; given the vital nature of secondary structure to the function of these molecules, these likely represent pseudogenes. When similar controls were applied to bacterially cloned ITS sequences from a recent survey of Symbiodinium diversity in Hawaiian Porites spp., most variants (~87.5%) possessed unstable secondary structures, had unprecedented mutations, and/or were PCR chimeras. Thus, data obtained from sequencing of bacterially cloned rDNA genes can substantially exaggerate the level of eukaryotic microbial diversity inferred from natural samples if appropriate controls are not applied. These considerations must be taken into account when interpreting sequence data generated by bacterial cloning of multicopy genes such as rDNA.

273 citations


Journal ArticleDOI
31 Oct 2007-PLOS ONE
TL;DR: The genome of strain ET3-1, a representative isolate of a common bovine mastitis-causing S. aureus clone, is sequenced, revealing a set of molecular genetic features that distinguish highly successful bovine-associated S. aureus clones; the results suggest that these clones diverged from an ancestor resembling human-associated S. aureus through a combination of foreign DNA acquisition and gene decay.
Abstract: Background. The majority of Staphylococcus aureus isolates that are recovered from either serious infections in humans or from mastitis in cattle represent genetically distinct sets of clonal groups. Moreover, population genetic analyses have provided strong evidence of host specialization among S. aureus clonal groups associated with human and ruminant infection. However, the molecular basis of host specialization in S. aureus is not understood. Methodology/Principal Findings. We sequenced the genome of strain ET3-1, a representative isolate of a common bovine mastitis-causing S. aureus clone. Strain ET3-1 encodes several genomic elements that have not been previously identified in S. aureus, including homologs of virulence factors from other Gram-positive pathogens. Relative to the other sequenced S. aureus associated with human infection, allelic variation in ET3-1 was high among virulence and surface-associated genes involved in host colonization, toxin production, iron metabolism, antibiotic resistance, and gene regulation. Interestingly, a number of well-characterized S. aureus virulence factors, including protein A and clumping factor A, exist as pseudogenes in ET3-1. Whole-genome DNA microarray hybridization revealed considerable similarity in the gene content of highly successful S. aureus clones associated with bovine mastitis, but not among those clones that are only infrequently recovered from bovine hosts. Conclusions/Significance. Whole genome sequencing and comparative genomic analyses revealed a set of molecular genetic features that distinguish clones of highly successful bovine-associated S. aureus optimized for mastitis pathogenesis in cattle from those that infect human hosts or are only infrequently recovered from bovine sources. Further, the results suggest that modern bovine specialist clones diverged from a common ancestor resembling human-associated S. aureus clones through a combination of foreign DNA acquisition and gene decay.

246 citations


Journal ArticleDOI
TL;DR: Zscan4 seems to be essential for preimplantation development, as reduction of Zscan4 transcript levels by siRNAs delays the progression from the 2-cell to the 4-cell stage and produces blastocysts that fail to implant or proliferate in blastocyst outgrowth culture.

239 citations


Journal ArticleDOI
TL;DR: This work extensively examined the transcriptional activity of the ENCODE pseudogenes and performed a systematic series of pseudogene-specific RACE analyses, demonstrating that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
Abstract: Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

214 citations


Journal ArticleDOI
01 Nov 2007-Genetics
TL;DR: The resistance (R) gene Pi37, present in the rice cultivar St. No. 1, was isolated by an in silico map-based cloning procedure and complementation analysis revealed Pi37-3 to be the functional gene, while -1, -2, and -4 are probably pseudogenes.
Abstract: The resistance (R) gene Pi37, present in the rice cultivar St. No. 1, was isolated by an in silico map-based cloning procedure. The equivalent genetic region in Nipponbare contains four nucleotide binding site–leucine-rich repeat (NBS–LRR) type loci. These four candidates for Pi37 (Pi37-1, -2, -3, and -4) were amplified separately from St. No. 1 via long-range PCR, and cloned into a binary vector. Each construct was individually transformed into the highly blast susceptible cultivar Q1063. The subsequent complementation analysis revealed Pi37-3 to be the functional gene, while -1, -2, and -4 are probably pseudogenes. Pi37 encodes a 1290 peptide NBS–LRR product, and the presence of substitutions at two sites in the NBS region (V239A and I247M) is associated with the resistance phenotype. Semiquantitative expression analysis showed that in St. No. 1, Pi37 was constitutively expressed and only slightly induced by blast infection. Transient expression experiments indicated that the Pi37 product is restricted to the cytoplasm. Pi37-3 is thought to have evolved recently from -2, which in turn was derived from an ancestral -1 sequence. Pi37-4 is likely the most recently evolved member of the cluster and probably represents a duplication of -3. The four Pi37 paralogs are more closely related to maize rp1 than to any of the currently isolated rice blast R genes Pita, Pib, Pi9, Pi2, Piz-t, and Pi36.

Journal ArticleDOI
TL;DR: Human hyperosmia to isovaleric acid is a complex trait, contributed to by both receptor and other mechanisms in the olfactory signaling pathway; a separate mechanism affecting general olfactory acuity results in an overrepresentation of individuals who are hyperosmic to several odorants.
Abstract: The genetic basis of odorant-specific variations in human olfactory thresholds, and in particular of enhanced odorant sensitivity (hyperosmia), remains largely unknown. Olfactory receptor (OR) segregating pseudogenes, displaying both functional and nonfunctional alleles in humans, are excellent candidates to underlie these differences in olfactory sensitivity. To explore this hypothesis, we examined the association between olfactory detection threshold phenotypes of four odorants and segregating pseudogene genotypes of 43 ORs genome-wide. A strong association signal was observed between the single nucleotide polymorphism variants in OR11H7P and sensitivity to the odorant isovaleric acid. This association was largely due to the low frequency of homozygous pseudogenized genotype in individuals with specific hyperosmia to this odorant, implying a possible functional role of OR11H7P in isovaleric acid detection. This predicted receptor–ligand functional relationship was further verified using the Xenopus oocyte expression system, whereby the intact allele of OR11H7P exhibited a response to isovaleric acid. Notably, we also uncovered another mechanism affecting general olfactory acuity that manifested as a significant inter-odorant threshold concordance, resulting in an overrepresentation of individuals who were hyperosmic to several odorants. An involvement of polymorphisms in other downstream transduction genes is one possible explanation for this observation. Thus, human hyperosmia to isovaleric acid is a complex trait, contributed to by both receptor and other mechanisms in the olfactory signaling pathway.

Journal ArticleDOI
01 Aug 2007-Genetics
TL;DR: An RT–PCR analysis showed that Pi36 is constitutively expressed in Kasalath, and this gene is more closely related to the barley powdery mildew resistance genes Mla1 and Mla6 than to the rice blast R genes Pita, Pib, Pi9, and Piz-t.
Abstract: The indica rice variety Kasalath carries Pi36, a gene that determines resistance to Chinese isolates of rice blast and that has been located to a 17-kb interval on chromosome 8. The genomic sequence of the reference japonica variety Nipponbare was used for an in silico prediction of the resistance (R) gene content of the interval and hence for the identification of candidate gene(s) for Pi36. Three such sequences, which all had both a nucleotide-binding site and a leucine-rich repeat motif, were present. The three candidate genes were amplified from the genomic DNA of a number of varieties by long-range PCR, and the resulting amplicons were inserted into pCAMBIA1300 and/or pYLTAC27 vectors to determine sequence polymorphisms correlated to the resistance phenotype and to perform transgenic complementation tests. Constructs containing each candidate gene were transformed into the blast-susceptible variety Q1063, which allowed the identification of Pi36-3 as the functional gene, with the other two candidates being probable pseudogenes. The Pi36-encoded protein is composed of 1056 amino acids, with a single substitution event (Asp to Ser) at residue 590 associated with the resistant phenotype. Pi36 is a single-copy gene in rice and is more closely related to the barley powdery mildew resistance genes Mla1 and Mla6 than to the rice blast R genes Pita, Pib, Pi9, and Piz-t. An RT-PCR analysis showed that Pi36 is constitutively expressed in Kasalath.

Journal ArticleDOI
15 Jun 2007-Science
TL;DR: Improved methods revealed that more than 77% of this heterochromatin sequence, including introns and intergenic regions, is composed of fragmented and nested transposable elements and other repeated DNAs.
Abstract: The repetitive DNA that constitutes most of the heterochromatic regions of metazoan genomes has hindered the comprehensive analysis of gene content and other functions. We have generated a detailed computational and manual annotation of 24 megabases of heterochromatic sequence in the Release 5 Drosophila melanogaster genome sequence. The heterochromatin contains a minimum of 230 to 254 protein-coding genes, which are conserved in other Drosophilids and more diverged species, as well as 32 pseudogenes and 13 noncoding RNAs. Improved methods revealed that more than 77% of this heterochromatin sequence, including introns and intergenic regions, is composed of fragmented and nested transposable elements and other repeated DNAs. Drosophila heterochromatin contains “islands” of highly conserved genes embedded in these “oceans” of complex repeats, which may require special expression and splicing mechanisms.

Journal ArticleDOI
TL;DR: The largest diversification is seen for GPCRs that respond to exogenous stimuli indicating that the variation in their repertoires reflects to a large extent the adaptation of the species to their environment.
Abstract: The superfamily of G protein-coupled receptors (GPCRs) is one of the largest within most mammals. GPCRs are important targets for pharmaceuticals and the rat is one of the most widely used model organisms in biological research. Accurate comparisons of protein families in rat, mouse and human are thus important for interpretation of many physiological and pharmacological studies. However, current automated protein predictions and annotations are limited and error prone. We searched the rat genome for GPCRs and obtained 1867 full-length genes and 739 pseudogenes. We identified 1277 new full-length rat GPCRs, whereof 1235 belong to the large group of olfactory receptors. Moreover, we updated the datasets of GPCRs from the human and mouse genomes with 1 and 43 new genes, respectively. The total numbers of full-length genes (and pseudogenes) identified were 799 (583) for human and 1783 (702) for mouse. The rat, human and mouse GPCRs were classified into 7 families named the Glutamate, Rhodopsin, Adhesion, Frizzled, Secretin, Taste2 and Vomeronasal1 families. We performed comprehensive phylogenetic analyses of these families and provide detailed information about orthologues and species-specific receptors. We found that 65 human Rhodopsin family GPCRs are orphans and 56 of these have an orthologue in rat. Interestingly, we found that the proportion of one-to-one GPCR orthologues was only 58% between rats and humans and only 70% between the rat and mouse, which is much lower than stated for the entire set of all genes. This is mainly related to the sensory GPCRs. The average protein sequence identity of the GPCR orthologue pairs is also lower than for the whole genomes. We found these to be 80% for the rat and human pairs and 90% for the rat and mouse pairs. However, the proportions of orthologous and species-specific genes vary significantly between the different GPCR families. The largest diversification is seen for GPCRs that respond to exogenous stimuli indicating that the variation in their repertoires reflects to a large extent the adaptation of the species to their environment. This report provides the first overall roadmap of the GPCR repertoire in rat and detailed comparisons with the mouse and human repertoires.
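
The gene and pseudogene counts quoted in this abstract can be turned into a pseudogene fraction per species. The short Python sketch below does only that arithmetic; the function name `pseudogene_fraction` and the dictionary layout are illustrative choices, not anything taken from the paper.

```python
# Counts quoted in the abstract: (full-length genes, pseudogenes) per species.
gpcr_counts = {
    "rat": (1867, 739),
    "human": (799, 583),
    "mouse": (1783, 702),
}


def pseudogene_fraction(full_length: int, pseudogenes: int) -> float:
    """Share of identified GPCR loci that are pseudogenes."""
    total = full_length + pseudogenes
    if total == 0:
        raise ValueError("no loci counted")
    return pseudogenes / total


# Hypothetical helper structure: species name -> pseudogene fraction.
fractions = {
    species: pseudogene_fraction(*counts)
    for species, counts in gpcr_counts.items()
}

for species, frac in fractions.items():
    print(f"{species}: {frac:.1%} of GPCR loci are pseudogenes")
```

On these counts the human fraction comes out at roughly 42%, against roughly 28% for rat and for mouse.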

Journal ArticleDOI
TL;DR: A subset of VSG-related genes is found that differs from VSGs in genomic environment and expression patterns and is predicted to have distinct function; the small subfamily structure of the VSG archive appears to be fundamental in providing the interacting donors for mosaic formation.
Abstract: Trypanosoma brucei evades host acquired immunity through differential activation of its large archive of silent variant surface glycoprotein (VSG) genes, most of which are pseudogenes in subtelomeric arrays. We have analyzed 940 VSGs, representing one half to two thirds of the arrays. Sequence types A and B of the VSG N-terminal domains were confirmed, while type C was found to be a constituent of type A. Two new C-terminal domain types were found. Nearly all combinations of domain types occurred, with some bias to particular combinations. One-third of encoded N-terminal domains, but only 13% of C-terminal domains, are intact, indicating a particular need for silent VSGs to gain a functional C-terminal domain to be expressed. About 60% of VSGs are unique, the rest occurring in subfamilies of two to four close homologs (>50%–52% peptide identity). We found a subset of VSG-related genes, differing from VSGs in genomic environment and expression patterns, and predict they have distinct function. Almost all (92%) full-length array VSGs have the partially conserved flanks associated with the duplication mechanism that activates silent genes, and these sequences have also contributed to archive evolution, mediating most of the conversions of segments, containing ≥1 VSG, within and between arrays. During infection, intact array genes became activated by duplication after two weeks, and mosaic VSGs assembled from pseudogenes became expressed by week three and predominated by week four. The small subfamily structure of the archive appears to be fundamental in providing the interacting donors for mosaic formation.

Journal ArticleDOI
TL;DR: The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation, including a collection of human annotations compiled from 16 sources, and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community.
Abstract: The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current 'consensus' set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with a minimal knowledge of programming. At the present time, the database contains more than 100,000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.
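
The idea of deriving a 'consensus' set from several annotation sources can be illustrated with plain set operations. The sketch below is not the Pseudogene.org interface; the function `consensus`, the source names, and the identifiers are all hypothetical, and the support threshold is an assumed way to express "layered" unions.

```python
from collections import Counter


def consensus(annotation_sets, min_support):
    """Return identifiers reported by at least `min_support` of the input sets."""
    support = Counter()
    for ids in annotation_sets.values():
        support.update(set(ids))  # each source votes once per identifier
    return {pid for pid, n in support.items() if n >= min_support}


# Hypothetical source names and pseudogene identifiers, for illustration only.
sources = {
    "source_a": {"pg1", "pg2", "pg3"},
    "source_b": {"pg2", "pg3", "pg4"},
    "source_c": {"pg3", "pg5"},
}

union_all = consensus(sources, min_support=1)  # plain union of all sources
majority = consensus(sources, min_support=2)   # reported by two or more sources
strict = consensus(sources, min_support=3)     # intersection of all three
```

Raising `min_support` trades coverage for agreement between sources, which is one simple way to think about how differing pseudogene definitions lead to different sets.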

Journal ArticleDOI
TL;DR: Genotype–phenotype analysis failed to reveal a significant correlation between the types of mutations identified or their predicted effect on the expression of the protein and the age of onset and severity of the disease, but the principal role of ABCC6 mutations is emphasised.
Abstract: Aims: Pseudoxanthoma elasticum (PXE), an autosomal recessive disorder with considerable phenotypic variability, mainly affects the eyes, skin and cardiovascular system, characterized by dystrophic mineralization of connective tissues. It is caused by mutations in the ABCC6 gene (ATP binding cassette family C member 6) which encodes MRP6 (multidrug resistance-associated protein 6). This study aimed to investigate the mutation spectrum of ABCC6 and possible genotype-phenotype correlations. Patients and methods: Mutation data were collected on an international case series of 270 PXE patients (239 probands; 31 affected family members). A dHPLC-based assay was developed to screen for mutations in all 31 exons eliminating pseudogene co-amplification. In 134 patients with a known phenotype and both mutations identified, genotype-phenotype correlations were assessed. Results: Overall 316 mutant alleles in ABCC6 were identified in 239 probands, including 39 novel mutations. Mutations were found to cluster in exons 24 and 28 corresponding to the second nucleotide binding fold and the last intracellular domain of the protein. Together with the recurrent R1141X and del23-29 mutations, these mutations accounted for 71.5% of the total individual mutations identified. Genotype-phenotype analysis failed to reveal a significant correlation between the types of mutations identified or their predicted effect on the expression of the protein and the age of onset and severity of the disease. Conclusions: This study emphasizes the principal role of ABCC6 mutations in the pathogenesis of PXE, while the reasons for phenotypic variability remain to be explored.

Journal ArticleDOI
TL;DR: The Rickettsia genus is a group of obligate intracellular α-proteobacteria representing a paradigm of reductive evolution. The propensity of gene loss was variable across genes during this process, and the results suggest a major role of enhanced background mutation rates in the fast protein divergence of obligate intracellular α-proteobacteria.
Abstract: The Rickettsia genus is a group of obligate intracellular α-proteobacteria representing a paradigm of reductive evolution. Here, we investigate the evolutionary processes that shaped the genomes of the genus. The reconstruction of ancestral genomes indicates that their last common ancestor contained more genes, but already possessed most traits associated with cellular parasitism. The differences in gene repertoires across modern Rickettsia are mainly the result of differential gene losses from the ancestor. We demonstrate using computer simulation that the propensity of loss was variable across genes during this process. We also analyzed the ratio of nonsynonymous to synonymous changes (Ka/Ks) calculated as an average over large sets of genes to assay the strength of selection acting on the genomes of Rickettsia, Anaplasmataceae, and free-living γ-proteobacteria. As a general trend, Ka/Ks were found to decrease with increasing divergence between genomes. The high Ka/Ks for closely related genomes are probably due to a lag in the removal of slightly deleterious nonsynonymous mutations by natural selection. Interestingly, we also observed a decrease of the rate of gene loss with increasing divergence, suggesting a similar lag in the removal of slightly deleterious pseudogene alleles. For larger divergence (Ks > 0.2), Ka/Ks converge toward similar values indicating that the levels of selection are roughly equivalent between intracellular α-proteobacteria and their free-living relatives. This contrasts with the view that obligate endocellular microorganisms tend to evolve faster as a consequence of reduced effectiveness of selection, and suggests a major role of enhanced background mutation rates on the fast protein divergence in the obligate intracellular α-proteobacteria.
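
The Ka/Ks averaging described in this abstract can be sketched as follows. This is a minimal illustration that assumes per-gene Ka and Ks estimates are already available; the abstract does not say whether the average is a mean of per-gene ratios or a ratio of means, so the mean of ratios used here is an assumption, and the input values are invented. Only the Ks > 0.2 cutoff comes from the text.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous to synonymous change for one gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


def mean_ka_ks(pairs, min_ks=0.0):
    """Mean of per-gene Ka/Ks over pairs with Ks strictly above `min_ks`.

    Assumption: averaging per-gene ratios, not dividing summed Ka by summed Ks.
    """
    ratios = [ka_ks_ratio(ka, ks) for ka, ks in pairs if ks > min_ks]
    if not ratios:
        raise ValueError("no gene pairs pass the Ks threshold")
    return sum(ratios) / len(ratios)


# Invented (Ka, Ks) values for three gene pairs, for illustration only.
example_pairs = [(0.02, 0.10), (0.03, 0.30), (0.05, 0.50)]

all_pairs_mean = mean_ka_ks(example_pairs)
diverged_mean = mean_ka_ks(example_pairs, min_ks=0.2)  # cutoff from the abstract
```

A ratio well below 1 is conventionally read as purifying selection, so a drop in the mean as the low-Ks pair is excluded mirrors the trend the abstract describes for more diverged genomes.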

Journal ArticleDOI
TL;DR: The V2R genes are expressed in the mammalian vomeronasal organ, and their products are involved in detecting pheromones, and it is found that the human, chimpanzee, macaque, cow and dog V2R gene families have completely degenerated.

Journal ArticleDOI
TL;DR: A DNA microarray is developed that contains probes for most predicted human OR loci and used to examine OR gene expression profiles in olfactory epithelium tissues from three individuals, finding that a large number of putative human OR genes are expressed in non-olfactory tissues, sometimes exclusively so.
Abstract: Background: Olfactory receptor (OR) genes were discovered more than a decade ago, when Buck and Axel observed that, in rats, certain G-protein coupled receptors are expressed exclusively in the olfactory epithelium. Subsequently, protein sequence similarity was used to identify entire OR gene repertoires of a number of mammalian species, but only in mouse were these predictions followed up by expression studies in olfactory epithelium. To rectify this, we have developed a DNA microarray that contains probes for most predicted human OR loci and used that array to examine OR gene expression profiles in olfactory epithelium tissues from three individuals.

Journal ArticleDOI
TL;DR: Investigation of CNVs in sensory receptor genes among 270 healthy humans indicated that olfactory receptor, taste receptor type 2, and vomeronasal receptor type 1 genes show a high level of intraspecific CNVs, and found that genomic drift plays an important role in generating intra- and interspecific CNVs of sensory receptor genes.
Abstract: The number of sensory receptor genes varies extensively among different mammalian species. This variation is believed to be caused partly by physiological requirements of animals and partly by genomic drift due to random duplication and deletion of genes. If the contribution of genomic drift is substantial, each species should contain a significant amount of copy number variation (CNV). We therefore investigated CNVs in sensory receptor genes among 270 healthy humans by using published CNV data. The results indicated that olfactory receptor (OR), taste receptor type 2, and vomeronasal receptor type 1 genes show a high level of intraspecific CNVs. In particular, >30% of the ≈800 OR gene loci in humans were polymorphic with respect to copy number, and two randomly chosen individuals showed a copy number difference of ≈11 in functional OR genes on average. There was no significant difference in the amount of CNVs between functional and nonfunctional OR genes. Because pseudogenes are expected to evolve in a neutral fashion, this observation suggests that functional OR genes also have evolved in a similar manner with respect to copy number change. In addition, we found that the evolutionary change of copy number of OR genes approximately follows the Gaussian process in probability theory, and the copy number divergence between populations has increased with evolutionary time. We therefore conclude that genomic drift plays an important role for generating intra- and interspecific CNVs of sensory receptor genes. Similar results were obtained when all annotated genes were analyzed.

Journal ArticleDOI
TL;DR: The results showed that these genes were expressed primarily in mature larvae and the adult moth, suggesting silkworm CSPs may be involved in development.

Journal ArticleDOI
TL;DR: Expression analysis of zebrafish and stickleback TAAR genes revealed that many TAARs in these fishes were expressed in the olfactory organ, suggesting the relatively high importance of TAARs as chemosensory receptors in fishes.
Abstract: The trace amine-associated receptors (TAARs) form a specific family of G protein-coupled receptors in vertebrates. TAARs were initially considered neurotransmitter receptors, but a recent study showed that mouse TAARs function as chemosensory receptors in the olfactory epithelium. To clarify the evolutionary dynamics of the TAAR gene family in vertebrates, near-complete repertoires of TAAR genes and pseudogenes were identified from the genomic assemblies of 4 teleost fishes (zebrafish, fugu, stickleback, and medaka), western clawed frogs, chickens, 3 mammals (humans, mice, and opossum), and sea lampreys. Database searches revealed that fishes had many putatively functional TAAR genes (13-109 genes), whereas relatively small numbers of TAAR genes (3-22 genes) were identified in tetrapods. Phylogenetic analysis of these genes indicated that the TAAR gene family was subdivided into 5 subfamilies that diverged before the divergence of ray-finned fishes and tetrapods. In tetrapods, virtually all TAAR genes were located in 1 specific region of their genomes as a gene cluster; however, in fishes, TAAR genes were scattered throughout more than 2 genomic locations. This possibly reflects a whole-genome duplication that occurred in the common ancestor of ray-finned fishes. Expression analysis of zebrafish and stickleback TAAR genes revealed that many TAARs in these fishes were expressed in the olfactory organ, suggesting the relatively high importance of TAARs as chemosensory receptors in fishes. A possible evolutionary history of the vertebrate TAAR gene family was inferred from the phylogenetic and comparative genomic analyses.

Journal ArticleDOI
TL;DR: A comprehensive analysis provides a defining framework for the classification of animal TALE homeobox genes and the understanding of their evolution and a novel class is identified, termed MOHAWK (MKX).
Abstract: TALE homeodomain proteins are an ancient subgroup within the group of homeodomain transcription factors that play important roles in animal, plant, and fungal development. We have extracted the full complement of TALE superclass homeobox genes from the genome projects of seven protostomes, seven deuterostomes, and Nematostella. This was supplemented with TALE homeobox genes from additional species and phylogenetic analyses were carried out with 276 sequences. We found 20 homeobox genes and 4 pseudogenes in humans, 21 genes in mouse, 8 genes in Drosophila, and 5 genes plus one truncated gene in Caenorhabditis elegans. Apart from the previously identified TALE classes MEIS, PBC, IRO, and TGIF, a novel class is identified, termed MOHAWK (MKX). Further, we show that the MEIS class can be divided into two families, PREP and MEIS. Prep genes have previously only been described in vertebrates but are lacking in Drosophila. Here we identify orthologues in other insect taxa as well as in the cnidarian Nematostella. In C. elegans, a divergent Prep protein has lost the homeodomain. Full-length multiple sequence alignment of the protostome and deuterostome sequences allowed us to identify several novel conserved motifs within the MKX, TGIF, and MEIS classes. Phylogenetic analyses revealed fast-evolving PBC class genes; in particular, some X-linked PBC genes in nematodes are subject to rapid evolution. In addition, several instances of gene loss were identified. In conclusion, our comprehensive analysis provides a defining framework for the classification of animal TALE homeobox genes and the understanding of their evolution.

Journal ArticleDOI
TL;DR: This work describes the first large-scale supernetwork for the Brassicaceae built from gene trees for 5 loci (adh, chs, matK, trnL-F, and ITS) and reports multiple independent origins for trnF pseudogenes in crucifers.
Abstract: The occurrence of nonfunctional trnF pseudogenes has been rarely described in flowering plants. However, we describe the first large-scale supernetwork for the Brassicaceae built from gene trees for 5 loci (adh, chs, matK, trnL-F, and ITS) and report multiple independent origins for trnF pseudogenes in crucifers. The duplicated regions of the original trnF gene are composed of its anticodon domain and several other highly structured motifs not related to the original gene. Length variation of the trnL-F intergenic spacer region in different taxa ranges from 219 to 900 bp as a result of differences in pseudocopy number (1-14). It is speculated that functional constraints favor 2-3 or 5-6 copies, as found in Arabidopsis and Boechera. The phylogenetic distribution of microstructural changes for the trnL-F region supports ancient patterns of divergence in crucifer evolution for some but not all gene loci.

Journal ArticleDOI
TL;DR: It is hypothesized that Or genes conferred the basic olfactory repertoire to ancestral flies before the speciation of the Drosophila and Sophophora subgenera about 40 Mya, whereas lineage-specific gene duplication seems to have led to additional specialization in some species in response to specific ecological conditions.
Abstract: A total of 752 odorant receptor (Or) genes, including pseudogenes, were identified in 11 Drosophila species and named after their orthologs in Drosophila melanogaster. The 813 Or genes, including 61 from D. melanogaster, were classified into 59 orthologous groups that are well supported by gene phylogeny. By reconciling with the gene family phylogeny, we estimated the number of gene duplication/loss events and intron gain/loss events in the species phylogeny. We found that these events are particularly frequent in Drosophila grimshawi, Drosophila willistoni, and the obscura group. More than half of the duplicated genes stay as tandem arrays, whose sizes range from 2 to 8. These genes vary in sequence and some likely underwent positive selection, indicating that the gene duplication was important for flies to acquire new olfactory functions. We hypothesize that Or genes conferred the basic olfactory repertoire to ancestral flies before the speciation of the Drosophila and Sophophora subgenera about 40 Mya. This repertoire has been largely maintained in the current species, whereas lineage-specific gene duplication seems to have led to additional specialization in some species in response to specific ecological conditions.

Journal ArticleDOI
TL;DR: Analysis of complementation assays and immunoblotting suggest that, unlike the knockout mouse model, total absence of FANCD2 does not exist in FA-D2 patients, because of constraints on viable combinations of FANCD2 mutations.
Abstract: FANCD2 is an evolutionarily conserved Fanconi anemia (FA) gene that plays a key role in DNA double-strand–type damage responses. Using complementation assays and immunoblotting, a consortium of American and European groups assigned 29 patients with FA from 23 families and 4 additional unrelated patients to complementation group FA-D2. This amounts to 3%–6% of FA-affected patients registered in various data sets. Malformations are frequent in FA-D2 patients, and hematological manifestations appear earlier and progress more rapidly when compared with all other patients combined (FA–non-D2) in the International Fanconi Anemia Registry. FANCD2 is flanked by two pseudogenes. Mutation analysis revealed the expected total of 66 mutated alleles, 34 of which result in aberrant splicing patterns. Many mutations are recurrent and have ethnic associations and shared allelic haplotypes. There were no biallelic null mutations; residual FANCD2 protein of both isotypes was observed in all available patient cell lines. These analyses suggest that, unlike the knockout mouse model, total absence of FANCD2 does not exist in FA-D2 patients, because of constraints on viable combinations of FANCD2 mutations. Although hypomorphic mutations are involved, clinically, these patients have a relatively severe form of FA.

Journal ArticleDOI
01 Apr 2007-Genomics
TL;DR: Motifs specific for species or families were found in OR and V1R genes, which may result in the differential pheromone-dependent behaviors and perception of odors between mouse and rat.

Journal ArticleDOI
TL;DR: The position of the genes that were lost in the ancestor's genome revealed that the process of function loss and degradation mainly took place through a gene-to-gene inactivation process, followed by the gradual loss of their DNA, suggesting a scenario of massive genome reduction through many nearly simultaneous pseudogenization events, leading to a highly specialized pathogen.
Abstract: We have reconstructed the gene content and order of the last common ancestor of the human pathogens Mycobacterium leprae and Mycobacterium tuberculosis. During the reductive evolution of M. leprae, 1537 of 2977 ancestral genes were lost, among which we found 177 previously unnoticed pseudogenes. We find evidence that a massive gene inactivation took place very recently in the M. leprae lineage, leading to the loss of hundreds of ancestral genes. A large proportion of their nucleotide content (∼89%) still remains in the genome, which allowed us to characterize and date them. The age of the pseudogenes was computed using a new methodology based on the rates and patterns of substitution in the pseudogenes and functional orthologous genes of closely related genomes. The position of the genes that were lost in the ancestor’s genome revealed that the process of function loss and degradation mainly took place through a gene-to-gene inactivation process, followed by the gradual loss of their DNA. This suggests a scenario of massive genome reduction through many nearly simultaneous pseudogenization events, leading to a highly specialized pathogen.
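To put the reported counts in proportion, the short sketch below works out what share of the reconstructed ancestral gene set was lost in the M. leprae lineage. Only the two counts (1537 lost of 2977 ancestral genes) come from the abstract; the variable names are illustrative, and the "retained" figure counts ancestral genes only, so it should not be read as the total gene count of the modern genome.

```python
# Counts reported in the abstract.
ANCESTRAL_GENES = 2977
LOST_GENES = 1537

# Ancestral genes still retained (ignores any genes gained later).
retained = ANCESTRAL_GENES - LOST_GENES

# Share of the ancestral gene set that was lost.
fraction_lost = LOST_GENES / ANCESTRAL_GENES

print(f"retained ancestral genes: {retained}")
print(f"fraction lost: {fraction_lost:.1%}")
```

By this arithmetic, slightly more than half of the ancestral genes were lost.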