
Showing papers on "Pseudogene" published in 2002


Journal ArticleDOI
06 Dec 2002-Science
TL;DR: The protein kinase complement of the human genome is catalogued using public and proprietary genomic, complementary DNA, and expressed sequence tag sequences to provide a starting point for comprehensive analysis of protein phosphorylation in normal and disease states and a detailed view of the current state of human genome analysis through a focus on one large gene family.
Abstract: We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.

7,486 citations


Journal ArticleDOI
22 Feb 2002-Cell
TL;DR: Recent progress has revealed that many of the steps in the pathway from gene sequence to active protein are connected, suggesting a unified theory of gene expression.

936 citations


Journal ArticleDOI
TL;DR: Human ORs cover a similar 'receptor space' as the mouse ORs, suggesting that the human olfactory system has retained the ability to recognize a broad spectrum of chemicals even though humans have lost nearly two-thirds of the OR genes as compared to mice.
Abstract: Olfactory receptor (OR) genes are the largest gene superfamily in vertebrates. We have identified the mouse OR genes from the nearly complete Celera mouse genome by a comprehensive data mining strategy. We found 1,296 mouse OR genes (including ∼20% pseudogenes), which can be classified into 228 families. OR genes are distributed in 27 clusters on all mouse chromosomes except 12 and Y. One OR gene cluster matches a known locus mediating a specific anosmia, indicating the anosmia may be due directly to the loss of receptors. A large number of apparently functional 'fish-like' Class I OR genes in the mouse genome may have important roles in mammalian olfaction. Human ORs cover a similar 'receptor space' as the mouse ORs, suggesting that the human olfactory system has retained the ability to recognize a broad spectrum of chemicals even though humans have lost nearly two-thirds of the OR genes as compared to mice.

899 citations


Journal ArticleDOI
TL;DR: It is determined that Borrelia burgdorferi strain B31 MI carries 21 extrachromosomal DNA elements, the largest number known for any bacterium, and the nucleotide sequence of three linear and seven circular plasmids in this infectious isolate is reported.
Abstract: We have determined that Borrelia burgdorferi strain B31 MI carries 21 extrachromosomal DNA elements, the largest number known for any bacterium. Among these are 12 linear and nine circular plasmids, whose sequences total 610 694 bp. We report here the nucleotide sequence of three linear and seven circular plasmids (comprising 290 546 bp) in this infectious isolate. This completes the genome sequencing project for this organism; its genome size is 1 521 419 bp (plus about 2000 bp of undetermined telomeric sequences). Analysis of the sequence implies that there has been extensive and sometimes rather recent DNA rearrangement among a number of the linear plasmids. Many of these events appear to have been mediated by recombinational processes that formed duplications. These many regions of similarity are reflected in the fact that most plasmid genes are members of one of the genome's 161 paralogous gene families; 107 of these gene families, which vary in size from two to 41 members, contain at least one plasmid gene. These rearrangements appear to have contributed to a surprisingly large number of apparently non-functional pseudogenes, a very unusual feature for a prokaryotic genome. The presence of these damaged genes suggests that some of the plasmids may be in a period of rapid evolution. The sequence predicts 535 plasmid genes ≥300 bp in length that may be intact and 167 apparently mutationally damaged and/or unexpressed genes (pseudogenes). The large majority, over 90%, of genes on these plasmids have no convincing similarity to genes outside Borrelia, suggesting that they perform specialized functions.

811 citations


Journal ArticleDOI
TL;DR: The studies support a model for KIR haplotype diversity based on six basic gene compositions, and suggest that the centromeric half of the KIR genomic region is comprised of three major combinations, while the telomeric half can assume a short form with either 2DS4 or KIR1D or a long form with multiple combinations of several stimulatory KIR genes.
Abstract: Killer Ig-like receptor (KIR) genes constitute a multigene family whose genomic diversity is achieved through differences in gene content and allelic polymorphism. KIR haplotypes containing a single activating KIR gene (A-haplotypes), and KIR haplotypes with multiple activating receptor genes (B-haplotypes) have been described. We report the evaluation of KIR gene content in extended families, sibling pairs, and an unrelated Caucasian panel through identification of the presence or absence of 14 KIR genes and 2 pseudogenes. Haplotype definition included subtyping for the expressed and nonexpressed KIR2DL5 variants, for two alleles of pseudogene 3DP1, and for two alleles of 2DS4, including a novel 2DS4 allele, KIR1D. KIR1D appears functionally homologous to the rhesus monkey KIR1D and likely arose as a consequence of a 22 nucleotide deletion in the coding sequence of 2DS4, leading to disruption of Ig-domain 2D and a premature termination codon following the first amino acid in the putative transmembrane domain. Our investigations identified 11 haplotypes within 12 families. From 49 sibling pairs and 17 consanguineous DNA samples, an additional 12 haplotypes were predicted. Our studies support a model for KIR haplotype diversity based on six basic gene compositions. We suggest that the centromeric half of the KIR genomic region is comprised of three major combinations, while the telomeric half can assume a short form with either 2DS4 or KIR1D or a long form with multiple combinations of several stimulatory KIR genes. Additional rare haplotypes can be identified, and may have arisen by gene duplication, intergenic recombination, or deletions.

387 citations


Journal ArticleDOI
TL;DR: Phylogenetic analyses highlight events in the divergence of the TPS paralogs and suggest orthologous genes and a model for the evolution of the TPS gene family.
Abstract: A family of 40 terpenoid synthase genes (AtTPS) was discovered by genome sequence analysis in Arabidopsis thaliana. This is the largest and most diverse group of TPS genes currently known for any species. AtTPS genes cluster into five phylogenetic subfamilies of the plant TPS superfamily. Surprisingly, thirty AtTPS closely resemble, in all aspects of gene architecture, sequence relatedness and phylogenetic placement, the genes for plant monoterpene synthases, sesquiterpene synthases or diterpene synthases of secondary metabolism. Rapid evolution of these AtTPS resulted from repeated gene duplication and sequence divergence with minor changes in gene architecture. In contrast, only two AtTPS genes have known functions in basic (primary) metabolism, namely gibberellin biosynthesis. This striking difference in rates of gene diversification in primary and secondary metabolism is relevant for an understanding of the evolution of terpenoid natural product diversity. Eight AtTPS genes are interrupted and are likely to be inactive pseudogenes. The localization of AtTPS genes on all five chromosomes reflects the dynamics of the Arabidopsis genome; however, several AtTPS genes are clustered and organized in tandem repeats. Furthermore, some AtTPS genes are localized with prenyltransferase genes (AtGGPPS, geranylgeranyl diphosphate synthase) in contiguous genomic clusters encoding consecutive steps in terpenoid biosynthesis. The clustered organization may have implications for TPS gene evolution and the evolution of pathway segments for the synthesis of terpenoid natural products. Phylogenetic analyses highlight events in the divergence of the TPS paralogs and suggest orthologous genes and a model for the evolution of the TPS gene family.

368 citations


Journal ArticleDOI
TL;DR: This study provides the basis for functional studies of the transcriptional regulation and ligand-binding capabilities of the OR gene family, and finds orthologous clusters at syntenic human locations for most mouse genes, indicating that most OR gene clusters predate primate-rodent divergence.
Abstract: We report a comprehensive comparative analysis of human and mouse olfactory receptor (OR) genes. The OR family is the largest mammalian gene family known. We identify ∼93% of an estimated 1500 mouse ORs, exceeding previous estimates and the number of human ORs by 50%. Only 20% are pseudogenes, giving a functional OR repertoire in mice that is three times larger than that of human. The proteins encoded by intact human ORs are less highly conserved than those of mouse, in patterns that suggest that even some apparently intact human OR genes may encode non-functional proteins. Mouse ORs are clustered in 46 genomic locations, compared to a much more dispersed pattern in human. We find orthologous clusters at syntenic human locations for most mouse genes, indicating that most OR gene clusters predate primate–rodent divergence. However, many recent local OR duplications in both genomes obscure one-to-one orthologous relationships, thereby complicating cross-species inferences about OR–ligand interactions. Local duplications are the major force shaping the gene family. Recent interchromosomal duplications of ORs have also occurred, but much more frequently in human than in mouse. In addition to clarifying the evolutionary forces shaping this gene family, our study provides the basis for functional studies of the transcriptional regulation and ligand-binding capabilities of the OR gene family.

320 citations


Journal ArticleDOI
TL;DR: The studies indicate that the CMAH gene was inactivated shortly before the time when brain expansion began in humankind's ancestry, ≈2.1–2.2 mya.
Abstract: Humans are genetically deficient in the common mammalian sialic acid N-glycolylneuraminic acid (Neu5Gc) because of an Alu-mediated inactivating mutation of the gene encoding the enzyme CMP-N-acetylneuraminic acid (CMP-Neu5Ac) hydroxylase (CMAH). This mutation occurred after our last common ancestor with bonobos and chimpanzees, and before the origin of present-day humans. Here, we take multiple approaches to estimate the timing of this mutation in relationship to human evolutionary history. First, we have developed a method to extract and identify sialic acids from bones and bony fossils. Two Neandertal fossils studied had clearly detectable Neu5Ac but no Neu5Gc, indicating that the CMAH mutation predated the common ancestor of humans and Neandertals, ≈0.5–0.6 million years ago (mya). Second, we date the insertion event of the inactivating human-specific sahAluY element that replaced the ancestral AluSq element found adjacent to exon 6 of the CMAH gene in the chimpanzee genome. Assuming Alu source genes based on a phylogenetic tree of human-specific Alu elements, we estimate the sahAluY insertion time at ≈2.7 mya. Third, we apply molecular clock analysis to chimpanzee and other great ape CMAH genes and the corresponding human pseudogene to estimate an inactivation time of ≈2.8 mya. Taken together, these studies indicate that the CMAH gene was inactivated shortly before the time when brain expansion began in humankind's ancestry, ≈2.1–2.2 mya. In this regard, it is of interest that although Neu5Gc is the major sialic acid in most organs of the chimpanzee, its expression is selectively down-regulated in the brain, for as yet unknown reasons.

303 citations


Journal ArticleDOI
TL;DR: Physical map data and phylogenetic analysis indicated that multiple genomic duplication events have increased the numbers of TX and TN genes in Arabidopsis, suggesting that these genes encode functional proteins rather than resulting from degradation or deletion of TNL genes.
Abstract: The Toll/interleukin-1 receptor (TIR) domain is found in one of the two large families of homologues of plant disease resistance proteins (R proteins) in Arabidopsis and other dicotyledonous plants. In addition to these TIR-NBS-LRR (TNL) R proteins, we identified two families of TIR-containing proteins encoded in the Arabidopsis Col-0 genome. The TIR-X (TX) family of proteins lacks both the nucleotide-binding site (NBS) and the leucine rich repeats (LRRs) that are characteristic of the R proteins, while the TIR-NBS (TN) proteins contain much of the NBS, but lack the LRR. In Col-0, the TX family is encoded by 27 genes and three pseudogenes; the TN family is encoded by 20 genes and one pseudogene. Using massively parallel signature sequencing (MPSS), expression was detected at low levels for approximately 85% of the TN-encoding genes. Expression was detected for only approximately 40% of the TX-encoding genes, again at low levels. Physical map data and phylogenetic analysis indicated that multiple genomic duplication events have increased the numbers of TX and TN genes in Arabidopsis. Genes encoding TX, TN and TNL proteins were demonstrated in conifers; TX and TN genes are present in very low numbers in grass genomes. The expression, prevalence, and diversity of TX and TN genes suggest that these genes encode functional proteins rather than resulting from degradation or deletions of TNL genes. These TX and TN proteins could be plant analogues of small TIR-adapter proteins that function in mammalian innate immune responses such as MyD88 and Mal.

258 citations


Journal ArticleDOI
TL;DR: This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae, and suggests a new model for the evolution of sex.
Abstract: This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. 
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
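The "GC3s" statistic in the abstract above can be illustrated with a minimal Python sketch. It assumes the usual definition (G+C frequency at the third position of codons that have synonymous alternatives, so Met, Trp, and stop codons are excluded) and the standard genetic code. The function name `gc3s` and the example sequences are hypothetical, and this is not the study's own analysis code.

```python
from typing import Optional

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc3s(cds: str) -> Optional[float]:
    """Fraction of G or C at the third position of synonymous codons.

    Assumption: Met (ATG), Trp (TGG), and stop codons are skipped because
    they have no synonymous alternatives. Codons containing ambiguous
    bases are skipped as well. Returns None if no codon qualifies.
    """
    cds = cds.upper()
    gc = total = 0
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "*MW":
            continue
        total += 1
        if codon[2] in "GC":
            gc += 1
    return gc / total if total else None


# ATG and TGG and the stop are skipped; GCC ends in C, AAA ends in A.
print(gc3s("ATGGCCAAATGGTAA"))
```

Correlating this per-gene value with a local recombination estimate would be a separate step that this sketch does not attempt.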

251 citations


Journal ArticleDOI
01 Jun 2002-Genomics
TL;DR: The homology, expression profile, and functional similarity of the receptors in the dog, ferret, and rhesus to that of human support the potential use of these species as preclinical animal models in the development of therapeutic agents for obesity or other MCH-mediated disorders.

Journal ArticleDOI
TL;DR: Because of the limited window of opportunity for mtDNA transfer to the germline, sperm mtDNA, which is released from degenerating mitochondria after fertilization, could be an important source of nuclear mtDNA pseudogenes.
Abstract: Mitochondrial pseudogenes in the human nuclear genome have been previously described, mostly as a source of artifacts during the analysis of the mitochondrial genome. With the availability of the complete human genome sequence, we performed a comprehensive analysis of mtDNA insertions into the nucleus. We found 612 independent integrations that are evenly distributed among all chromosomes as well as within each individual chromosome. The identified pseudogenes account for a content of at least 0.016% of the human nuclear DNA. Up to 30% of a chromosome's mtDNA pseudogene content is composed of fragments that encompass two or more adjacent mitochondrial genes, and we found no correlation between the abundance of mitochondrial transcripts and the multiplicity of integrations. These observations indicate that the migrations of mitochondrial DNA sequences to the nucleus were predominantly DNA mediated. Phylogenetic analysis of the mtDNA pseudogenes and mtDNA sequences of primates indicate a continuous transfer into the nucleus. Because of the limited window of opportunity for mtDNA transfer to the germline, sperm mtDNA, which is released from degenerating mitochondria after fertilization, could be an important source of nuclear mtDNA pseudogenes.

Journal ArticleDOI
TL;DR: The distribution of repeated elements, especially LINE repeats, in the mouse Xic region compared with the rest of the genome does not support a role for these repeats in the spreading of X inactivation; in addition, an asymmetric distribution of LINE elements on the two DNA strands was observed in all three species.
Abstract: We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5' of Xist that was recently shown to attract histone modification early after the onset of X inactivation.

Journal ArticleDOI
TL;DR: The large-scale distribution of RP pseudogenes across the genome appears to result chiefly from random insertion, so the number on each chromosome is proportional to its size; density is highest in GC-intermediate regions of the genome.
Abstract: Mammals have 79 ribosomal proteins (RP). Using a systematic procedure based on sequence-homology, we have comprehensively identified pseudogenes of these proteins in the human genome. Our assignments are available at http://www.pseudogene.org or http://bioinfo.mbb.yale.edu/genome/pseudogene. In total, we found 2090 processed pseudogenes and 16 duplications of RP genes. In relation to the matching parent protein, each of the processed pseudogenes has an average relative sequence length of 97% and an average sequence identity of 76%. A small number (258) of them do not contain obvious disablements (stop codons or frameshifts) and, therefore, could be mistaken as functional genes, and 178 are disrupted by one or more repetitive elements. On average, processed pseudogenes have a longer truncation at the 5' end than the 3' end, consistent with the target-primed-reverse-transcription (TPRT) mechanism. Interestingly, on chromosome 16, an RPL26 processed pseudogene was found in the intron region of a functional RPS2 gene. The large-scale distribution of RP pseudogenes throughout the genome appears to result, chiefly, from random insertions with the numbers on each chromosome, consequently, proportional to its size. In contrast to RP genes, the RP pseudogenes have the highest density in GC-intermediate regions (41%-46%) of the genome, with the density pattern being between that of LINEs and Alus. This can be explained by a negative selection theory as we observed that GC-rich RP pseudogenes decay faster in GC-poor regions. Also, we observed a correlation between the number of processed pseudogenes and the GC content of the associated functional gene, i.e., relatively GC-poor RPs have more processed pseudogenes. This ranges from 145 pseudogenes for RPL21 down to 3 pseudogenes for RPL14. We were able to date the RP pseudogenes based on their sequence divergence from present-day RP genes, finding an age distribution similar to that for Alus. 
The distribution is consistent with a decline in retrotransposition activity in the hominid lineage during the last 40 Myr. We discuss the implications for retrotransposon stability and genome dynamics based on these new findings.

Journal ArticleDOI
TL;DR: Sequence and copy number polymorphisms in OR genes have been described, which may account for interindividual differences in odorant detection thresholds.
Abstract: Olfactory receptor (OR) proteins interact with odorant molecules in the nose, initiating a neuronal response that triggers the perception of a smell. The OR family is one of the largest known mammalian gene families, with around 900 genes in human and 1500 in mouse. After discounting pseudogenes, the functional repertoire in mouse is more than three times larger than that of human. OR genes encode G-protein-coupled receptors containing seven transmembrane domains. ORs are arranged in clusters of up to 100 genes dispersed in 40-100 genomic locations. Each neuron in the olfactory epithelium expresses only one allele of one OR gene. The mechanism of gene choice is still unknown, but must involve locus, gene, and allele selection. The gene family has expanded mainly by tandem duplications, many of which have occurred since the divergence of the rodent and primate lineages. Interchromosomal segmental duplications including OR genes have also occurred, but more commonly in the human than the mouse family. As a result, many human OR genes have several possible mouse orthologs, and vice versa. Sequence and copy number polymorphisms in OR genes have been described, which may account for interindividual differences in odorant detection thresholds.

Journal ArticleDOI
TL;DR: The main populations and clusters of pseudogenes on chromosomes 21 and 22 are determined, and the chromosome 22 pseudogene population is found to be dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
Abstract: Pseudogenes are disabled copies of genes that do not produce a functional, full-length copy of a protein (Mighell et al. 2000; Vanin 1985). They are of two types: First, processed pseudogenes result from reverse transcription of messenger RNA transcripts followed by reintegration into genomic DNA (presumably in germ-line cells) and subsequent degradation with disablements (premature stop codons and frameshifts) (Vanin 1985). Second, nonprocessed pseudogenes result from duplication of a gene, followed by an initial disablement if the duplicated copy is not “useful” (Mighell et al. 2000). These then also accumulate further coding disablements. The extent of the pseudogene population in the human genome is unclear. Estimates for the number of human genes range from ∼22,000 to ∼75,000 (Crollius et al. 2000; Ewing and Green 2000; Lander et al. 2001; Venter et al. 2001; Wright et al. 2001). From previous reports, it is thought that up to 22% of these gene predictions may be pseudogenic (Lander et al. 2001; Yeh et al. 2001). It is important to characterize the human processed and nonprocessed pseudogene populations as their existence interferes with gene identification and prediction (particularly nonprocessed pseudogenes or individual pseudogenic exons). They are also an important resource for the study of the evolution of protein families (see, e.g., studies on the human olfactory receptor subgenome [e.g. Glusman et al. 2001]). Here, we have performed a detailed analysis of the pseudogene populations of human chromosomes 21 and 22, which have been sequenced contiguously to high quality. This is similar in spirit to previous surveys we have performed on pseudogenes and other genomic features in other organisms (Harrison et al. 2001; Gerstein 1997, 1998; Hegyi and Gerstein 1999). We have examined the main populations and clusters of pseudogenes for the two chromosomes. 
Patterns of distribution of both nonprocessed and processed pseudogenes indicate the existence of pseudogenic hot-spots in the human genome. In addition, we have estimated the total numbers and proportions of processed and nonprocessed pseudogenes in the whole human genome.
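The two kinds of disablement named in the abstract above (premature stop codons and frameshifts) can be sketched in Python as follows. The sketch assumes a pairwise nucleotide alignment of the parent coding sequence and the candidate pseudogene is already available as two equal-length gapped strings. The function names and example sequences are hypothetical, and reading the pseudogene strictly in the parent's frame is a simplification, not the authors' pipeline.

```python
import re
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frameshift_gaps(parent_aln: str, pseudo_aln: str) -> List[Tuple[int, int]]:
    """Return (alignment_start, length) for gap runs not divisible by 3.

    A gap run in either sequence whose length is not a multiple of three
    shifts the reading frame downstream of it.
    """
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("aligned sequences must have equal length")
    shifts = []
    for aln in (parent_aln, pseudo_aln):
        for m in re.finditer(r"-+", aln):
            if len(m.group()) % 3:
                shifts.append((m.start(), len(m.group())))
    return sorted(shifts)


def premature_stops(parent_aln: str, pseudo_aln: str) -> List[int]:
    """Return 0-based parent codon indices where the pseudogene has a stop.

    Pseudogene bases are read in the parent's frame; columns where the
    parent has a gap (insertions in the pseudogene) are ignored. A stop
    at the final parent codon is the normal terminator and is not reported.
    """
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("aligned sequences must have equal length")
    total_codons = sum(c != "-" for c in parent_aln) // 3
    stops, codon, n_parent = [], [], 0
    for p, q in zip(parent_aln.upper(), pseudo_aln.upper()):
        if p == "-":
            continue
        codon.append(q)
        n_parent += 1
        if n_parent % 3 == 0:
            idx = n_parent // 3 - 1
            if "".join(codon) in STOP_CODONS and idx < total_codons - 1:
                stops.append(idx)
            codon = []
    return stops


parent = "ATGAAACGATAA"
print(premature_stops(parent, "ATGTAACGATAA"))  # AAA replaced by TAA at codon 1
print(frameshift_gaps(parent, "ATGA-ACGATAA"))  # single-base deletion
```

A sequence with no hits from either function would correspond to the "no obvious disablements" case, which still does not establish that the copy is functional.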

Journal ArticleDOI
TL;DR: A first global draft of the mouse V1r repertoire is obtained, including eight new and extremely isolated families in addition to the four families previously identified, which reflects a specialization of different receptor classes in the detection of specific types of chemicals.
Abstract: Seven-transmembrane-domain proteins encoded by the vomeronasal receptor V1r and V2r gene superfamilies, and expressed by vomeronasal sensory neurons, are believed to be pheromone receptors in rodents. Four V1r gene families have been described in the mouse (V1ra, V1rb, V1rc and V3r). Here we have screened near-complete mouse genomic databases to obtain a first global draft of the mouse V1r repertoire, including 104 new V1r genes. It comprises eight new and extremely isolated families in addition to the four families previously identified. Members of these new families were expressed in vomeronasal sensory neurons. The genome-wide view revealed great sequence diversity within the V1r superfamily. Phylogenetic analyses suggested an ancient original radiation, followed by the isolation, divergence and expansion of families by extensive gene duplications and frequent gene loss. The isolated nature of these gene families probably reflects a specialization of different receptor classes in the detection of specific types of chemicals.

Journal ArticleDOI
TL;DR: Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts, and there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome.

Journal ArticleDOI
TL;DR: Analysis of genomic regions surrounding the Siglec-11 gene suggests that it is actually a chimeric molecule that arose from relatively recent gene duplication and recombination events, involving the extracellular domain of a closely related ancestral Siglec gene and a transmembrane and cytosolic tail derived from another ancestral Siglec.

Journal ArticleDOI
TL;DR: Two particularly exciting new ideas are that SINEs may help cells survive physiological stress, and that the evolution of SINEs and LINEs has been shaped by the forces of RNA interference.

Journal ArticleDOI
TL;DR: This genomic study supports the hypothesis that COP1 acts as a repressor of photomorphogenesis, possibly by controlling the degradation of transcription factors and their target gene expression.
Abstract: Microarray gene expression profiling was used to examine the role of COP1 in the light control of Arabidopsis genome expression. Qualitatively similar gene expression profiles were observed between wild-type seedlings grown in white light and multiple cop1 mutant alleles grown in the dark. Furthermore, overexpression of the dominant-negative-acting N terminus of COP1 (N282) in darkness produced a genome expression profile similar to those produced by white light and the cop1 mutations. Different cop1 mutant alleles, N282, and light treatment also resulted in distinct expression profiles in a small fraction of the genes examined. In the light, the genome expression of cop1 mutations displayed an exaggerated light response. COP1-regulated genes in the dark were estimated to account for >20% of the genome. Analysis of these COP1-regulated genes revealed that >28 cellular pathways are coordinately but antagonistically regulated by light and COP1. Interestingly, the gene expression regulation attributable to HY5 in the light is included largely within those genes regulated by COP1 in the dark. Thus, this genomic study supports the hypothesis that COP1 acts as a repressor of photomorphogenesis, possibly by controlling the degradation of transcription factors and their target gene expression. The majority of light-controlled genome expression could be accounted for by the negative regulation of COP1 activity.

Journal ArticleDOI
TL;DR: While the knockout mice appear outwardly normal, a number of important findings have been discovered using these mice and these will be covered in this review.

Journal ArticleDOI
01 Dec 2002-Genomics
TL;DR: The multiple duplications of GLUT genes suggest that the GLUT family probably emerged by gene duplications and mutations during evolution in different lineages.

Journal ArticleDOI
TL;DR: An upgraded PCR-SSP method for KIR genotyping that integrates recent achievements in the research of the diversity of this gene family and permits detection of all known KIR genes and pseudogenes in a 16-reaction set.
Abstract: Killer-cell Immunoglobulin-like Receptors (KIR) help human natural killer (NK) cells counteract infections by pathogens that evade the immune system by inducing down-regulation of HLA class I molecules in infected cells. KIRs are structurally and functionally diverse receptors encoded by a family of polymorphic genes. The most extreme aspect of KIR polymorphism is the varying content of KIR genes in the genome of different individuals, as first demonstrated by KIR genotyping using the PCR-SSP method. Knowledge of the KIR gene family has recently been expanded by the identification of new genes, pseudogenes and multiple gene variants, several of which escaped detection by the original genotyping technique. We present here an upgraded PCR-SSP method for KIR genotyping that integrates recent achievements in the research of the diversity of this gene family. Our method permits detection of all known KIR genes and pseudogenes in a 16-reaction set. Furthermore, an additional set of six reactions permits subtyping of KIR2DL5 variants, each of which shows well-differentiated functional and genetic features.

Book
01 Sep 2002
TL;DR: The history of molecular phylogenetics, including the Smith-Waterman algorithm, the nature of chemical bonds, and strategies for faster searches are reviewed.
Abstract: I. MOLECULAR BIOLOGY AND BIOLOGICAL CHEMISTRY. The genetic material. Nucleotides. Orientation. Base pairing. The central dogma of molecular biology. Gene structure and information content. Promoter sequences. The genetic code. Open reading frames. Introns and exons. Protein structure and function. Primary structure. Secondary, tertiary and quaternary structure. The nature of chemical bonds. Anatomy of an atom. Valence. Electronegativity. Hydrophilicity and hydrophobicity. Molecular biology tools. Restriction enzymes. Gel electrophoresis. Blotting, hybridization and microarrays. Cloning. Polymerase chain reaction (PCR). DNA sequencing. Genomic information content. C value paradox. Reassociation kinetics. II. DATA SEARCHES AND PAIRWISE ALIGNMENTS. Dot plots. Simple alignments. Scoring. Gaps. Simple gap penalties. Origination and length penalties. Scoring matrices. Dynamic programming: The Needleman and Wunsch algorithm. Local and global alignments. Global and Semi-global alignments. The Smith-Waterman algorithm. Database searches. BLAST and its relatives. Other algorithms. Multiple sequence alignments. III. SUBSTITUTION PATTERNS. Patterns of substitutions within genes. Mutation rates. Functional constraint. Synonymous vs. nonsynonymous changes. Indels and pseudogenes. Substitutions vs. mutations. Fixation. Estimating substitution numbers. Jukes/Cantor model. Transitions and transversions. Kimura's two-parameter model. Models with even more parameters. Substitutions between protein sequences. Variations in substitution rates between genes. Molecular clocks. Relative rate tests. Causes of rate variation in lineages. Evolution in organelles. IV. DISTANCE-BASED METHODS OF PHYLOGENETICS. History of molecular phylogenetics. Advantages to molecular phylogenies. Phylogenetic trees. Terminology of tree reconstruction. Rooted and unrooted trees. Gene vs. species trees. Character and distance data. Distance matrix methods. UPGMA. Estimation of branch lengths. Transformed distance method. Neighbor's relation method. Neighbor-joining methods. Maximum likelihood approaches. Multiple sequence alignments. V. CHARACTER-BASED APPROACHES TO PHYLOGENETICS. Parsimony. Informative and uninformative sites. Unweighted parsimony. Weighted parsimony. Inferred ancestral sequences. Strategies for faster searches. Branch and bound. Heuristic. Consensus trees. Tree confidence. Bootstrapping. Parametric tests. Comparison of phylogenetic methods. Molecular phylogenies. The tree of life. Human origins. VI. GENOMICS AND GENE RECOGNITION. Prokaryotic genomes. Prokaryotic gene structure. Promoter elements. Open reading frames. Conceptual translation. Termination sequences. GC-content. Prokaryotic gene density. Eukaryotic genomes. Eukaryotic gene structure. Promoter elements. Regulatory protein binding sites. Open reading frames. Introns and exons. Alternative splicing. CpG islands. GC-content. Isochores. Codon usage bias. Gene expression. cDNAs and ESTs. Serial analysis of gene expression (SAGE). Microarrays. Transposition. Repetitive elements. Eukaryotic gene density. VII. PROTEIN FOLDING. Polypeptide composition. Amino acids. Backbone flexibility, phi and psi. Secondary structure. Accuracy of predictions. Chou-Fasman/GOR method. Tertiary and quaternary structure. Hydrophobicity. Disulfide bonds. Active structures vs. most stable structures. Protein folding. Lattice models. Off-lattice models. Energy functions and optimization. Structure prediction. Comparative modeling. Threading: Reverse protein folding. Predicting RNA secondary structures. VIII. PROTEOMICS. From genomes to proteomes. Protein classification. Enzyme nomenclature. Families and superfamilies. Folds. Experimental techniques. 2D electrophoresis. Mass spectrometry. Protein microarrays. Inhibitors and drug design. Ligand screening. Docking. Database screening. X-ray crystal structures. Empirical methods and prediction techniques. Posttranslational modification prediction. Protein sorting. Proteolytic cleavage. Glycosylation. Phosphorylation and sulfation. Appendix 1: A gentle introduction to programming and data structures. Introduction. The basics. Creating and compiling computer programs. Variables and values. Data typing. Basic operations. Program control. Statements and blocks. Conditional execution. Loops. Readability. Structured programming. Comments. Descriptive variable names. Data structures. Arrays. Pointers and dynamic memory allocation. Strings in PERL. Input and output. Appendix 2: Enzyme kinetics. Enzymes as biological catalysts. The Henri-Michaelis-Menten equation. Vmax and Km. Direct plot. Lineweaver-Burk reciprocal plot. Eadie-Hofstee plot. Simple inhibition systems. Competitive inhibition. Noncompetitive inhibition. Reversible and irreversible inhibition. Effects of pH and temperature. Appendix 3: Sample programs in PERL and worksets. Conceptual translation. Dot matrix. Relative rate test. UPGMA. Common ancestor. Splice junction recognition. Hydrophobicity calculator. DNA binding domains. Lineweaver-Burk plot.

Journal ArticleDOI
21 Aug 2002-Gene
TL;DR: Most SUMO-1/2/3 proteins were shown to be localized on the nuclear membrane, nuclear bodies and cytoplasm, respectively, while many cellular proteins of high molecular weight were covalently modified by sumoylation.

Journal ArticleDOI
TL;DR: It is demonstrated that recombination of a whole pseudogene is followed by a second level of variation in which small segments of pseudogenes recombine into the expression site by gene conversion, providing for a combinatorial number of possible expressed MSP2 variants sufficient for lifelong persistence.
Abstract: The rickettsial pathogen Anaplasma marginale establishes lifelong persistent infection in the mammalian reservoir host, during which time immune escape variants continually arise in part because of variation in the expressed copy of the immunodominant outer membrane protein MSP2. A key question is how the small 1.2 Mb A. marginale genome generates sufficient variants to allow long-term persistence in an immunocompetent reservoir host. The recombination of whole pseudogenes into the single msp2 expression site has been previously identified as one method of generating variants, but is inadequate to generate the number of variants required for persistent infection. In the present study, we demonstrate that recombination of a whole pseudogene is followed by a second level of variation in which small segments of pseudogenes recombine into the expression site by gene conversion. Evidence for four short sequential changes in the hypervariable region of msp2 coupled with the identification of nine pseudogenes from a single strain of A. marginale provides for a combinatorial number of possible expressed MSP2 variants sufficient for lifelong persistence.

Journal ArticleDOI
01 May 2002-Genetica
TL;DR: Here the robustness of indel bias measurements in Drosophila is demonstrated by comparing indel patterns in different types of nonfunctional sequences, both euchromatic and heterochromatic, transposable and non-transposable, repetitive and unique.
Abstract: Mutation is often said to be random. Although it must be true that mutation is ignorant about the adaptive needs of the organism and thus is random relative to them as a rule, mutation is not truly random in other respects. Nucleotide substitutions, deletions, insertions, inversions, duplications and other types of mutation occur at different rates and are effected by different mechanisms. Moreover, the rates of different mutations vary from organism to organism. Differences in mutational biases, along with natural selection, could impact gene and genome evolution in important ways. For instance, several recent studies have suggested that differences in insertion/deletion biases lead to profound differences in the rate of DNA loss in animals and that this difference per se can lead to significant changes in genome size. In particular, Drosophila melanogaster appears to have a very high rate of deletions, a correspondingly high rate of DNA loss, and a very compact genome. To assess the validity of these studies we must first assess the validity of the measurements of indel biases themselves. Here I demonstrate the robustness of indel bias measurements in Drosophila by comparing indel patterns in different types of nonfunctional sequences. The indel pattern and the high rate of DNA loss appear to be shared by all known nonfunctional sequences, both euchromatic and heterochromatic, transposable and non-transposable, repetitive and unique. Unfortunately, all available nonfunctional sequences are untranscribed, and thus the effects of transcription on indel bias cannot be assessed. I also discuss in detail why it is unlikely that natural selection for or against DNA loss significantly affects current estimates of indel biases.

Journal ArticleDOI
01 Jul 2002-Genomics
TL;DR: Using a BLAST approach, 1105 DNA sequences homologous to mitochondrial DNA (mtDNA) in the August 2001 Goldenpath human genome database are found and assembled manually into 286 pseudogenes on the basis of single insertion events and a chromosomal map of these Numts is constructed.

Journal ArticleDOI
TL;DR: Comparisons of orthologous regions indicated that gene density in wheat is about one-half that of rice, mainly because of amplification of the gene-poor regions, and insertional inactivation by adjoining retro-elements and selection seem to have played a major role in stabilizing genomes.
Abstract: Deletion line-based high-density physical maps revealed that the wheat (Triticum aestivum) genome is partitioned into gene-rich and gene-poor compartments. Available deletion lines have bracketed the gene-containing regions to about 10% of the genome. Emerging sequence data suggest that these may be further partitioned into "mini" gene-rich and gene-poor regions. On average, about 10% of each gene-rich region seems to contain genes. Sequence analyses in various species suggest that uneven distribution of genes may be a characteristic of all grasses and perhaps all higher organisms. Comparison of the physical maps with genetic linkage maps showed that recombination in wheat and barley (Hordeum vulgare) is confined to the gene-containing regions. The number of genes, gene density, and the extent of recombination vary greatly among the gene-rich regions. The gene order, relative region size, and recombination are highly conserved within the tribe Triticeae and moderately conserved within the family. Gene-poor regions are composed of retrotransposon-like non-transcribing repeats and pseudogenes. Direct comparisons of orthologous regions indicated that gene density in wheat is about one-half that of rice (Oryza sativa). The genome size difference between wheat and rice is, therefore, mainly due to amplification of the gene-poor regions. The presence of species-, genus-, and family-specific repeats reveals a repeated invasion of the genomes by different retrotransposons over time. Preferential transposition to adjacent locations and the presence of vital genes flanking a gene-rich region may have restricted retrotransposon amplification to gene-poor regions, resulting in tandem blocks of non-transcribing repeats. Insertional inactivation by adjoining retro-elements and selection seem to have played a major role in stabilizing genomes.