
Showing papers on "Intron" published in 2004


Journal ArticleDOI
TL;DR: It is strongly suggested that miRNAs are transcribed in parallel with their host transcripts, and that the two different transcription classes of miRNAs ('exonic' and 'intronic') identified here may require slightly different mechanisms of biogenesis.
Abstract: To derive a global perspective on the transcription of microRNAs (miRNAs) in mammals, we annotated the genomic position and context of this class of noncoding RNAs (ncRNAs) in the human and mouse genomes. Of the 232 known mammalian miRNAs, we found that 161 overlap with 123 defined transcription units (TUs). We identified miRNAs within introns of 90 protein-coding genes with a broad spectrum of molecular functions, and in both introns and exons of 66 mRNA-like noncoding RNAs (mlncRNAs). In addition, novel families of miRNAs based on host gene identity were identified. The transcription patterns of all miRNA host genes were curated from a variety of sources illustrating spatial, temporal, and physiological regulation of miRNA expression. These findings strongly suggest that miRNAs are transcribed in parallel with their host transcripts, and that the two different transcription classes of miRNAs ('exonic' and 'intronic') identified here may require slightly different mechanisms of biogenesis.

2,043 citations


Journal ArticleDOI
28 May 2004-Science
TL;DR: There are 481 segments longer than 200 base pairs that are absolutely conserved between orthologous regions of the human, rat, and mouse genomes, which represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins.
Abstract: There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.

1,690 citations


Journal ArticleDOI
TL;DR: SSRs within genes evolve through mutational processes similar to those for SSRs located in other genomic regions including replication slippage, point mutation, and recombination and may provide a molecular basis for fast adaptation to environmental changes in both prokaryotes and eukaryotes.
Abstract: Recently, increasingly more microsatellites, or simple sequence repeats (SSRs), have been found and characterized within protein-coding genes and their untranslated regions (UTRs). These data provide useful information to study possible SSR functions. Here, we review SSR distributions within expressed sequence tags (ESTs) and genes including protein-coding, 3'-UTRs and 5'-UTRs, and introns; and discuss the consequences of SSR repeat-number changes in those regions of both prokaryotes and eukaryotes. Strong evidence shows that SSRs are nonrandomly distributed across protein-coding regions, UTRs, and introns. Substantial data indicate that SSR expansions and/or contractions in protein-coding regions can lead to a gain or loss of gene function via frameshift mutation or expanded toxic mRNA. SSR variations in 5'-UTRs could regulate gene expression by affecting transcription and translation. The SSR expansions in the 3'-UTRs cause transcription slippage and produce expanded mRNA, which can be accumulated as nuclear foci, and which can disrupt splicing and, possibly, disrupt other cellular function. Intronic SSRs can affect gene transcription, mRNA splicing, or export to cytoplasm. Triplet SSRs located in the UTRs or intron can also induce heterochromatin-mediated-like gene silencing. All these effects caused by SSR expansions or contractions within genes can eventually lead to phenotypic changes. SSRs within genes evolve through mutational processes similar to those for SSRs located in other genomic regions including replication slippage, point mutation, and recombination. These mutational processes generate DNA changes that should be corrected by the DNA mismatch repair (MMR) system. Mutation that has escaped from the MMR system correction would become new alleles at the SSR loci, and then regulate and/or change gene products, and eventually lead to phenotype changes. Therefore, SSRs within genes should be subjected to stronger selective pressure than other genomic regions because of their functional importance. These SSRs may provide a molecular basis for fast adaptation to environmental changes in both prokaryotes and eukaryotes.

1,039 citations


Journal ArticleDOI
TL;DR: Intramolecular pairs of Alu elements are identified as a major target for editing in the human transcriptome and it is suggested that modification of repetitive elements is a predominant activity for RNA editing with significant implications for cellular gene expression.
Abstract: RNA editing by adenosine deamination generates RNA and protein diversity through the posttranscriptional modification of single nucleotides in RNA sequences. Few mammalian A-to-I edited genes have been identified despite evidence that many more should exist. Here we identify intramolecular pairs of Alu elements as a major target for editing in the human transcriptome. An experimental demonstration in 43 genes was extended by a broader computational analysis of more than 100,000 human mRNAs. We find that 1,445 human mRNAs (1.4%) are subject to RNA editing at more than 14,500 sites, and our data further suggest that the vast majority of pre-mRNAs (greater than 85%) are targeted in introns by the editing machinery. The editing levels of Alu-containing mRNAs correlate with distance and homology between inverted repeats and vary in different tissues. Alu-mediated RNA duplexes targeted by RNA editing are formed intramolecularly, whereas editing due to intermolecular base-pairing appears to be negligible. We present evidence that these editing events can lead to the posttranscriptional creation or elimination of splice signals affecting alternatively spliced Alu-derived exons. The analysis suggests that modification of repetitive elements is a predominant activity for RNA editing with significant implications for cellular gene expression.

697 citations


Journal ArticleDOI
17 Dec 2004-Cell
TL;DR: ExonScan and related bioinformatic analyses suggest that these ESS motifs play important roles in suppression of pseudoexons, in splice site definition, and in AS.

683 citations


Journal ArticleDOI
TL;DR: An increasing amount of evidence indicates that genomic variants in both coding and non-coding sequences can have unexpected deleterious effects on the splicing of the gene transcript.
Abstract: When genome variants are identified in genomic DNA, especially during routine analysis of disease-associated genes, their functional implications might not be immediately evident. Distinguishing between a genomic variant that changes the phenotype and one that does not is a difficult task. An increasing amount of evidence indicates that genomic variants in both coding and non-coding sequences can have unexpected deleterious effects on the splicing of the gene transcript. So how can benign polymorphisms be distinguished from disease-associated splicing mutations?

573 citations


Journal ArticleDOI
TL;DR: A Gateway vector, pANDA, is developed for RNA interference of rice genes to help identify the functions of genes whose tagged mutants are not available at present and complement existing methods for functional genomics of rice.
Abstract: Since the recent sequencing of the rice genome, the functional identification of rice genes has become increasingly important. Various tagged lines have been generated; however, the number of tagged genes available is not sufficient for extensive study of gene function. To help identify the functions of genes in rice, we developed a Gateway vector, pANDA, for RNA interference of rice genes. This vector can be used for Agrobacterium transformation of rice and allows easy and fast construction of efficient RNAi vectors. In the construct, hairpin RNA derived from a given gene is transcribed from a strong maize ubiquitin promoter, and an intron is placed 5' upstream of inverted repeats to enhance RNA expression. Analysis of rice genes using this vector showed that suppression of mRNA expression was observed in more than 90% of transgenic plants examined, and short interfering RNA indicative of RNA silencing was detected in each silenced plant. A similar vector, pANDA-mini, was also developed for direct transfer into leaf cells or protoplasts. This vector can be used for transient suppression of gene function in rice. These vectors should help identify the functions of rice genes whose tagged mutants are not available at present and complement existing methods for functional genomics of rice.

503 citations


Journal ArticleDOI
TL;DR: Alu-associated RNA editing may be a mechanism for marking nonstandard transcripts, not destined for translation, that are primarily associated with retained introns, extended UTRs, or with transcripts that have no corresponding known gene.
Abstract: More than one million copies of the approximately 300-bp Alu element are interspersed throughout the human genome, with up to 75% of all known genes having Alu insertions within their introns and/or UTRs. Transcribed Alu sequences can alter splicing patterns by generating new exons, but other impacts of intragenic Alu elements on their host RNA are largely unexplored. Recently, repeat elements present in the introns or 3'-UTRs of 15 human brain RNAs have been shown to be targets for multiple adenosine to inosine (A-to-I) editing. Using a statistical approach, we find that editing of transcripts with embedded Alu sequences is a global phenomenon in the human transcriptome, observed in 2674 (approximately 2%) of all publicly available full-length human cDNAs (n = 128,406), from >250 libraries and >30 tissue sources. In the vast majority of edited RNAs, A-to-I substitutions are clustered within transcribed sense or antisense Alu sequences. Edited bases are primarily associated with retained introns, extended UTRs, or with transcripts that have no corresponding known gene. Therefore, Alu-associated RNA editing may be a mechanism for marking nonstandard transcripts, not destined for translation.

503 citations


Journal ArticleDOI
01 Oct 2004-RNA
TL;DR: The presence of transcription factors in the spliceosome and the existence of proteins with dual activities in splicing and transcription can explain the links between both processes and add a new level of complexity to the regulation of gene expression in eukaryotes.
Abstract: Transcription and pre-mRNA splicing are extremely complex multimolecular processes that involve protein-DNA, protein-RNA, and protein-protein interactions. Splicing occurs in the close vicinity of genes and is frequently cotranscriptional. This is consistent with evidence that both processes are coordinated and, in some cases, functionally coupled. This review focuses on the roles of cis- and trans-acting factors that regulate transcription, on constitutive and alternative splicing. We also discuss possible functions in splicing of the C-terminal domain (CTD) of the RNA polymerase II (pol II) largest subunit, whose participation in other key pre-mRNA processing reactions (capping and cleavage/polyadenylation) is well documented. Recent evidence indicates that transcriptional elongation and splicing can be influenced reciprocally: Elongation rates control alternative splicing and splicing factors can, in turn, modulate pol II elongation. The presence of transcription factors in the spliceosome and the existence of proteins, such as the coactivator PGC-1, with dual activities in splicing and transcription can explain the links between both processes and add a new level of complexity to the regulation of gene expression in eukaryotes.

501 citations


Journal ArticleDOI
TL;DR: It is demonstrated that MBNL proteins regulate alternative splicing of two pre‐mRNAs that are misregulated in DM, cardiac troponin T (cTNT) and insulin receptor (IR).
Abstract: Although the muscleblind (MBNL) protein family has been implicated in myotonic dystrophy (DM), a specific function for these proteins has not been reported. A key feature of the RNA-mediated pathogenesis model for DM is the disrupted splicing of specific pre-mRNA targets. Here we demonstrate that MBNL proteins regulate alternative splicing of two pre-mRNAs that are misregulated in DM, cardiac troponin T (cTNT) and insulin receptor (IR). Alternative cTNT and IR exons are also regulated by CELF proteins, which were previously implicated in DM pathogenesis. MBNL proteins promote opposite splicing patterns for cTNT and IR alternative exons, both of which are antagonized by CELF proteins. CELF- and MBNL-binding sites are distinct and regulation by MBNL does not require the CELF-binding site. The results are consistent with a mechanism for DM pathogenesis in which expanded repeats cause a loss of MBNL and/or gain of CELF activities, leading to misregulation of alternative splicing of specific pre-mRNA targets.

468 citations


Journal ArticleDOI
01 Jul 2004-Nature
TL;DR: The X-ray crystal structure of a complete group I bacterial intron in complex with both the 5′- and the 3′-exons is reported, representing the first splicing complex to include a complete intron, both exons and an organized active site occupied with metal ions.
Abstract: The discovery of the RNA self-splicing group I intron provided the first demonstration that not all enzymes are proteins. Here we report the X-ray crystal structure (3.1-Å resolution) of a complete group I bacterial intron in complex with both the 5'- and the 3'-exons. This complex corresponds to the splicing intermediate before the exon ligation step. It reveals how the intron uses structurally unprecedented RNA motifs to select the 5'- and 3'-splice sites. The 5'-exon's 3'-OH is positioned for inline nucleophilic attack on the conformationally constrained scissile phosphate at the intron-3'-exon junction. Six phosphates from three disparate RNA strands converge to coordinate two metal ions that are asymmetrically positioned on opposing sides of the reactive phosphate. This structure represents the first splicing complex to include a complete intron, both exons and an organized active site occupied with metal ions.

Journal ArticleDOI
TL;DR: The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
Abstract: We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.

Journal ArticleDOI
TL;DR: The experimental basis for the current understanding of group II intron mobility mechanisms is discussed, beginning with genetic observations in yeast mitochondria, and culminating with a detailed understanding of molecular mechanisms shared by organellar and bacterial group II introns.
Abstract: Mobile group II introns, found in bacterial and organellar genomes, are both catalytic RNAs and retrotransposable elements. They use an extraordinary mobility mechanism in which the excised intron RNA reverse splices directly into a DNA target site and is then reverse transcribed by the intron-encoded protein. After DNA insertion, the introns remove themselves by protein-assisted, autocatalytic RNA splicing, thereby minimizing host damage. Here we discuss the experimental basis for our current understanding of group II intron mobility mechanisms, beginning with genetic observations in yeast mitochondria, and culminating with a detailed understanding of molecular mechanisms shared by organellar and bacterial group II introns. We also discuss recently discovered links between group II intron mobility and DNA replication, new insights into group II intron evolution arising from bacterial genome sequencing, and the evolutionary relationship between group II introns and both eukaryotic spliceosomal introns and non-LTR-retrotransposons. Finally, we describe the development of mobile group II introns into gene-targeting vectors, "targetrons," which have programmable target specificity.

Journal ArticleDOI
TL;DR: The results demonstrate an essential requirement for ADAR1 in embryogenesis and suggest that it functions to promote survival of numerous tissues by editing one or more double-stranded RNAs required for protection against stress-induced apoptosis.

Journal ArticleDOI
TL;DR: The reason why so many factors are needed reflects the observation that exon recognition can be affected by many pre-mRNA features such as exon length, the presence of enhancer and silencer elements, the strength of splicing signals, the promoter architecture, and the rate of RNA processivity.
Abstract: Pre-mRNA splicing in eukaryotes requires joining together the nucleotides of the various mRNA-coding regions (exons) after recognizing them from the normally vastly superior number of non-mRNA-coding sequences (introns). For three excellent reviews on general splicing and its regulation, refer to references 14, 62, and 70. In eukaryotes, the vast majority of splicing processes are catalyzed by the spliceosome, a very complex RNA-protein aggregate which has been estimated to contain several hundred different proteins in addition to five spliceosomal snRNAs (1, 54, 62, 63, 81, 109). These factors are responsible for the accurate positioning of the spliceosome on the 5′ and 3′ splice site sequences. The reason why so many factors are needed reflects the observation that exon recognition can be affected by many pre-mRNA features such as exon length (5, 97), the presence of enhancer and silencer elements (8, 62), the strength of splicing signals (45), the promoter architecture (29, 55), and the rate of RNA processivity (86). In addition, the general cellular environment also exerts an effect, as recent observations suggest the existence of extensive coupling between splicing and many other gene expression steps (69) and even its modification by external stimuli (96).

Journal ArticleDOI
TL;DR: Through the detailed comparisons of several chloroplast genomes, evolutionary hotspots predominated by the inversion end points, indel mutation events, and high frequencies of base substitutions were identified.
Abstract: The nucleotide sequence of Korean ginseng (Panax schinseng Nees) chloroplast genome has been completed (AY582139). The circular double-stranded DNA, which consists of 156,318 bp, contains a pair of inverted repeat regions (IRa and IRb) with 26,071 bp each, which are separated by large and small single copy regions of 86,106 bp and 18,070 bp, respectively. The inverted repeat region is further extended into a large single copy region which includes the 5' parts of the rps19 gene. Four short inversions associated with short palindromic sequences that form stem-loop structures were also observed in the chloroplast genome of P. schinseng compared to that of Nicotiana tabacum. The genome content and the relative positions of 114 genes (75 peptide-encoding genes, 30 tRNA genes, 4 rRNA genes, and 5 conserved open reading frames [ycfs]), however, are identical with the chloroplast DNA of N. tabacum. Sixteen genes contain one intron while two genes have two introns. Of these introns, only one (trnL-UAA) belongs to the self-splicing group I; all remaining introns have the characteristics of six domains belonging to group II. Eighteen simple sequence repeats have been identified from the chloroplast genome of Korean ginseng. Several of these SSR loci show infra-specific variations. A detailed comparison of 17 known completed chloroplast genomes from the vascular plants allowed the identification of evolutionary modes of coding segments and intron sequences, as well as the evaluation of the phylogenetic utilities of chloroplast genes. Furthermore, through the detailed comparisons of several chloroplast genomes, evolutionary hotspots predominated by the inversion end points, indel mutation events, and high frequencies of base substitutions were identified. Large-sized indels were often associated with direct repeats at the end of the sequences facilitating intra-molecular recombination.
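The region sizes and gene counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustrative consistency check: the variable names are made up for the example, and assigning 86,106 bp to the large single-copy region (and 18,070 bp to the small one) is an assumption based on the relative sizes.

```python
# Figures quoted in the abstract (base pairs).
total_bp = 156_318        # complete circular chloroplast genome
ir_bp = 26_071            # each inverted repeat (IRa and IRb)
lsc_bp = 86_106           # assumed to be the large single-copy region
ssc_bp = 18_070           # assumed to be the small single-copy region

# A quadripartite plastid genome is two inverted repeats plus the two
# single-copy regions that separate them.
reconstructed_bp = 2 * ir_bp + lsc_bp + ssc_bp

# Gene categories quoted in the abstract.
gene_categories = {"peptide": 75, "tRNA": 30, "rRNA": 4, "ycf": 5}
gene_total = sum(gene_categories.values())

print(reconstructed_bp == total_bp)  # region sizes add up to the genome size
print(gene_total)                    # should match the 114 genes reported
```

Both checks agree with the totals stated in the abstract, which suggests the individual figures were transcribed consistently.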

Journal ArticleDOI
TL;DR: Using inverse PCR, it is demonstrated that HIV-1 genomes reside within actively transcribed host genes in resting CD4+ T cells in vivo, and RT-PCR experiments confirmed the presence of HIV-1 sequences within transcripts initiating upstream of the HIV-1 transcription start site.
Abstract: Resting CD4+ T-cell populations from human immunodeficiency virus type 1 (HIV-1)-infected individuals include cells with integrated HIV-1 DNA. In individuals showing suppression of viremia during highly active antiretroviral therapy (HAART), resting CD4+ T-cell populations do not produce virus without cellular activation. To determine whether the nonproductive nature of the infection in resting CD4+ T cells is due to retroviral integration into chromosomal regions that are repressive for transcription, we used inverse PCR to characterize the HIV-1 integration sites in vivo in resting CD4+ T cells from patients on HAART. Of 74 integration sites from 16 patients, 93% resided within transcription units, usually within introns. Integration was random with respect to transcriptional orientation relative to the host gene and with respect to position within the host gene. Of integration sites within well-characterized genes, 91% (51 of 56) were in genes that were actively expressed in resting CD4+ T cells, as directly demonstrated by reverse transcriptase PCR (RT-PCR). These results predict that HIV-1 sequences may be included in the primary transcripts of host genes as part of rapidly degraded introns. RT-PCR experiments confirmed the presence of HIV-1 sequences within transcripts initiating upstream of the HIV-1 transcription start site. Taken together, these results demonstrate that HIV-1 genomes reside within actively transcribed host genes in resting CD4+ T cells in vivo.

Journal ArticleDOI
TL;DR: The results of a systematic search for interspecies orthologues of miRNA precursors are reported, leading to the identification of 35 human and 45 mouse new putative miRNA genes.
Abstract: Conservation of microRNAs (miRNAs) among species suggests that they bear conserved biological functions. However, sequencing of new miRNAs has not always been accompanied by a search for orthologues in other species. I report herein the results of a systematic search for interspecies orthologues of miRNA precursors, leading to the identification of 35 human and 45 mouse new putative miRNA genes. MicroRNA tracks were written to visualize miRNAs in human and mouse genomes on the UCSC Genome Browser. Based on their localization, miRNA precursors can be excised either from introns or exons of mRNAs. When intronic miRNAs are antisense to the apparent host gene, they appear to originate from ill-characterized antisense transcription units. Exonic miRNAs are, in general, nonprotein-coding, poorly conserved genes in sense orientation. In three cases, the excision of an miRNA from a protein-coding mRNA might lead to the degradation of the rest of the transcript. Moreover, three new examples of miRNAs fully complementary to an mRNA are reported. Among these, miR135a might control the stability and/or translation of an alternative form of the glycerate kinase mRNA by RNA interference. I also discuss the presence of human miRNAs in introns of paralogous genes and in miRNA clusters.

Journal ArticleDOI
TL;DR: This work used RNA interference to examine the role of >70% of the Drosophila RNA-binding proteins in regulating alternative splicing and identified 47 proteins as splicing regulators, 26 of which have not previously been implicated in alternative splicing.
Abstract: Alternative splicing is thought to be regulated by nonspliceosomal RNA binding proteins that modulate the association of core components of the spliceosome with the pre-mRNA. Although the majority of metazoan genes encode pre-mRNAs that are alternatively spliced, remarkably few splicing regulators are currently known. Here, we used RNA interference to examine the role of >70% of the Drosophila RNA-binding proteins in regulating alternative splicing. We identified 47 proteins as splicing regulators, 26 of which have not previously been implicated in alternative splicing. Many of the regulators we identified are nonspliceosomal RNA-binding proteins. However, our screen unexpectedly revealed that altering the concentration of certain core components of the spliceosome specifically modulates alternative splicing. These results significantly expand the number of known splicing regulators and reveal an extraordinary richness in the mechanisms that regulate alternative splicing.

Journal ArticleDOI
TL;DR: It is concluded that a significant portion of cassette exons evident in EST databases is not functional, and might result from aberrant rather than regulated splicing.

Journal Article
TL;DR: The human genome is revisited using exon and intron distribution profiles to suggest constraints on the splicing machinery to splice out very long or very short introns and provide insight to optimal intron length selection.
Abstract: The human genome is revisited using exon and intron distribution profiles. The 26,564 annotated genes in the human genome (build October, 2003) contain 233,785 exons and 207,344 introns. On average, there are 8.8 exons and 7.8 introns per gene. About 80% of the exons on each chromosome are < 200 bp in length. < 0.01% of the introns are < 20 bp in length and < 10% of introns are more than 11,000 bp in length. These results suggest constraints on the splicing machinery to splice out very long or very short introns and provide insight to optimal intron length selection. Interestingly, the total length in introns and intergenic DNA on each chromosome is significantly correlated to the determined chromosome size with a coefficient of correlation r = 0.95 and r = 0.97, respectively. These results suggest their implication in genome design.
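The per-gene averages in this abstract follow directly from the quoted totals. The sketch below reproduces them; the variable names are illustrative, and the final comparison rests on a simplifying assumption (one linear transcript per gene, so each gene has exactly one more exon than introns) that the abstract itself does not state.

```python
# Totals quoted in the abstract (human genome build of October 2003).
genes = 26_564
exons = 233_785
introns = 207_344

# Average number of exons and introns per annotated gene.
exons_per_gene = exons / genes
introns_per_gene = introns / genes
print(round(exons_per_gene, 1), round(introns_per_gene, 1))

# Under the one-transcript-per-gene assumption, exons - introns would
# equal the number of genes. The gap between the two is printed so any
# departure from that simple model is visible.
exon_intron_gap = exons - introns
print(exon_intron_gap, genes - exon_intron_gap)
```

The rounded averages match the 8.8 exons and 7.8 introns per gene reported above. The exon-minus-intron difference falls slightly short of the gene count, which the simple model does not account for and the abstract does not discuss.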

Journal ArticleDOI
TL;DR: Interestingly, the functional distribution of the transcripts with retained introns is skewed towards stress and external/internal stimuli-related functions, and as such may play a regulatory function.
Abstract: Alternative splicing (AS) combines different transcript splice junctions that result in transcripts with shuffled exons, alternative 5' or 3' splicing sites, retained introns and different transcript termini. In this way, multiple mRNA species and proteins can be created from a single gene expanding the potential informational content of eukaryotic genomes. Search algorithms of AS forms in a variety of Arabidopsis databases showed they contained an unusually high fraction of retained introns (above 30%), compared with 10% that was reported for humans. The preponderance of retained introns (65%) were either part of open reading frames, present in the UTR region or present as the last intron in the transcript, indicating that their occurrence would not participate in non-sense-mediated decay. Interestingly, the functional distribution of the transcripts with retained introns is skewed towards stress and external/internal stimuli-related functions. A sampling of the alternative transcripts with retained introns were confirmed by RT-PCR and were shown to co-purify with polyribosomes, indicating their nuclear export. Thus, retained introns are a prominent feature of AS in Arabidopsis and as such may play a regulatory function.

Journal ArticleDOI
22 Oct 2004-Science
TL;DR: A maskless photolithography method is used to produce DNA oligonucleotide microarrays with unique probe sequences tiled throughout the genome of Drosophila melanogaster and across predicted splice junctions to provide a draft expression map for the entire nonrepetitive genome, which reveals a much more extensive and diverse set of expressed sequences than was previously predicted.
Abstract: We used a maskless photolithography method to produce DNA oligonucleotide microarrays with unique probe sequences tiled throughout the genome of Drosophila melanogaster and across predicted splice junctions. RNA expression of protein coding and nonprotein coding sequences was determined for each major stage of the life cycle, including adult males and females. We detected transcriptional activity for 93% of annotated genes and RNA expression for 41% of the probes in intronic and intergenic sequences. Comparison to genome-wide RNA interference data and to gene annotations revealed distinguishable levels of expression for different classes of genes and higher levels of expression for genes with essential cellular functions. Differential splicing was observed in about 40% of predicted genes, and 5440 previously unknown splice forms were detected. Genes within conserved regions of synteny with D. pseudoobscura had highly correlated expression; these regions ranged in length from 10 to 900 kilobase pairs. The expressed intergenic and intronic sequences are more likely to be evolutionarily conserved than nonexpressed ones, and about 15% of them appear to be developmentally regulated. Our results provide a draft expression map for the entire nonrepetitive genome, which reveals a much more extensive and diverse set of expressed sequences than was previously predicted.

Journal ArticleDOI
TL;DR: A survey of RNA editing in human brain by comparing sequences of clones from a human brain cDNA library to the reference human genome sequence and to genomic DNA from the same individual strongly supports the idea that formation of intramolecular double-stranded RNA with an inverted copy underlies most A-->I editing.
Abstract: We have conducted a survey of RNA editing in human brain by comparing sequences of clones from a human brain cDNA library to the reference human genome sequence and to genomic DNA from the same individual. In the RNA sample from which the library was constructed, approximately 1:2000 nucleotides were edited out of >3 Mb surveyed. All edits were adenosine to inosine (A-->I) and were predominantly in intronic and in intergenic RNAs. No edits were found in translated exons and few in untranslated exons. Most edits were in high-copy-number repeats, usually Alus. Analysis of the genome in the vicinity of edited sequences strongly supports the idea that formation of intramolecular double-stranded RNA with an inverted copy underlies most A-->I editing. The likelihood of editing is increased by the presence of two inverted copies of a sequence within the same intron, proximity of the two sequences to each other (preferably within 2 kb), and by a high density of inverted copies in the vicinity. Editing exhibits sequence preferences and is less likely at an adenosine 3' to a guanosine and more likely at an adenosine 5' to a guanosine. Simulation by BLAST alignment of the double-stranded RNA molecules that underlie known edits indicates that there is a greater likelihood of A-->I editing at A:C mismatches than editing at other mismatches or at A:U matches. However, because A:U matches in double-stranded RNA are more common than all mismatches, overall the likely effect of editing is to increase the number of mismatches in double-stranded RNA.
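The neighbour preference described in this abstract (editing less likely at an adenosine 3' to a guanosine, more likely at an adenosine 5' to a guanosine) can be made concrete with a small tally of adenosine contexts. This is a minimal sketch, not the authors' method: the function name, the labels, and the example sequence are all hypothetical, and the code only counts contexts rather than predicting editing.

```python
from collections import Counter


def adenosine_contexts(rna: str) -> Counter:
    """Tally internal adenosines by whether a G sits 5' or 3' of them.

    The sequence is read 5' to 3'. Labels are illustrative only.
    """
    rna = rna.upper()
    counts = Counter()
    for i in range(1, len(rna) - 1):
        if rna[i] != "A":
            continue
        counts["adenosines"] += 1
        if rna[i - 1] == "G":
            # This A lies 3' to a guanosine: reported as less likely to be edited.
            counts["g_upstream"] += 1
        if rna[i + 1] == "G":
            # This A lies 5' to a guanosine: reported as more likely to be edited.
            counts["g_downstream"] += 1
    return counts


# Hypothetical sequence: one A preceded by G, one A followed by G.
example = adenosine_contexts("CGAUAGC")
print(example["adenosines"], example["g_upstream"], example["g_downstream"])
```

Adenosines at the very ends of the sequence are skipped because one neighbour is missing; a real analysis would also need the double-stranded context that the abstract identifies as the main determinant of editing.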

Journal ArticleDOI
Volker Knoop
TL;DR: The slow sequence evolution and a variable occurrence of introns in plant mtDNA provide an attractive reservoir of phylogenetic information to trace the phylogeny of older land plant clades, which is as yet not fully resolved.
Abstract: Land plants exhibit a significant evolutionary plasticity in their mitochondrial DNA (mtDNA), which contrasts with the more conservative evolution of their chloroplast genomes. Frequent genomic rearrangements, the incorporation of foreign DNA from the nuclear and chloroplast genomes, an ongoing transfer of genes to the nucleus in recent evolutionary times and the disruption of gene continuity in introns or exons are the hallmarks of plant mtDNA, at least in flowering plants. Peculiarities of gene expression, most notably RNA editing and trans-splicing, are significantly more pronounced in land plant mitochondria than in chloroplasts. At the same time, mtDNA is generally the most slowly evolving of the three plant cell genomes on the sequence level, with unique exceptions in only some plant lineages. The slow sequence evolution and a variable occurrence of introns in plant mtDNA provide an attractive reservoir of phylogenetic information to trace the phylogeny of older land plant clades, which is as yet not fully resolved. This review attempts to summarize the unique aspects of land plant mitochondrial evolution from a phylogenetic perspective.

Journal ArticleDOI
TL;DR: Two types of RGC2 genes (Type I and Type II) were initially distinguished based on the pattern of sequence identities between their 3′ regions, and the high frequency of sequence exchange and the presence of numerous chimeric RGC2 genes in nature were confirmed.
Abstract: Resistance Gene Candidate2 (RGC2) genes belong to a large, highly duplicated family of nucleotide binding site–leucine rich repeat (NBS-LRR) encoding disease resistance genes located at a single locus in lettuce (Lactuca sativa). To investigate the genetic events occurring during the evolution of this locus, ∼1.5- to 2-kb 3′ fragments of 126 RGC2 genes from seven genotypes were sequenced from three species of Lactuca, and 107 additional RGC2 sequences were obtained from 40 wild accessions of Lactuca spp. The copy number of RGC2 genes varied from 12 to 32 per genome in the seven genotypes studied extensively. LRR number varied from 40 to 47; most of this variation had resulted from 13 events duplicating two to five LRRs because of unequal crossing-over within or between RGC2 genes at one of two recombination hot spots. Two types of RGC2 genes (Type I and Type II) were initially distinguished based on the pattern of sequence identities between their 3′ regions. The existence of two types of RGC2 genes was further supported by intron similarities, the frequency of sequence exchange, and their prevalence in natural populations. Type I genes are extensive chimeras caused by frequent sequence exchanges. Frequent sequence exchanges between Type I genes homogenized intron sequences, but not coding sequences, and obscured allelic/orthologous relationships. Sequencing of Type I genes from additional wild accessions confirmed the high frequency of sequence exchange and the presence of numerous chimeric RGC2 genes in nature. Unlike Type I genes, Type II genes exhibited infrequent sequence exchange between paralogous sequences. Type II genes from different genotype/species within the genus Lactuca showed obvious allelic/orthologous relationships. Trans-specific polymorphism was observed for different groups of orthologs, suggesting balancing selection. Unequal crossover, insertion/deletion, and point mutation events were distributed unequally through the gene. Different evolutionary forces have impacted different parts of the LRR.

Journal ArticleDOI
TL;DR: The activity of replicative nidoviral uridylate-specific endoribonuclease (NendoU) is established and characterized, and substitution of D6408 by Ala is shown to abolish viral RNA synthesis, demonstrating that NendoU has critical functions in viral replication and transcription.
Abstract: Coronaviruses are important pathogens that cause acute respiratory diseases in humans. Replication of the ≈30-kb positive-strand RNA genome of coronaviruses and discontinuous synthesis of an extensive set of subgenome-length RNAs (transcription) are mediated by the replicase-transcriptase, a barely characterized protein complex that comprises several cellular proteins and up to 16 viral subunits. The coronavirus replicase-transcriptase was recently predicted to contain RNA-processing enzymes that are extremely rare or absent in other RNA viruses. Here, we established and characterized the activity of one of these enzymes, replicative nidoviral uridylate-specific endoribonuclease (NendoU). It is considered a major genetic marker that discriminates nidoviruses (Coronaviridae, Arteriviridae, and Roniviridae) from all other RNA virus families. Bacterially expressed forms of NendoU of severe acute respiratory syndrome coronavirus and human coronavirus 229E were revealed to cleave single-stranded and double-stranded RNA in a Mn2+-dependent manner. Single-stranded RNA was cleaved less specifically and effectively, suggesting that double-stranded RNA is the biologically relevant NendoU substrate. Double-stranded RNA substrates were cleaved upstream and downstream of uridylates at GUU or GU sequences to produce molecules with 2′-3′ cyclic phosphate ends. 2′-O-ribose-methylated RNA substrates proved to be resistant to cleavage by NendoU, indicating a functional link with the 2′-O-ribose methyltransferase located adjacent to NendoU in the coronavirus replicative polyprotein. A mutagenesis study verified potential active-site residues and allowed us to inactivate NendoU in the full-length human coronavirus 229E clone. Substitution of D6408 by Ala was shown to abolish viral RNA synthesis, demonstrating that NendoU has critical functions in viral replication and transcription.

Journal ArticleDOI
TL;DR: The observations imply that splicing in fungi may be different from that in vertebrates and may require additional proteins that interact with polypyrimidine tracts upstream of the branch point.
Abstract: Genomic sequences and expressed sequence tag data for a diverse group of fungi (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Aspergillus nidulans, Neurospora crassa, and Cryptococcus neoformans) provided the opportunity to accurately characterize conserved intronic elements. An examination of large intron data sets revealed that fungal introns in general are short, that 98% or more of them belong to the canonical splice site (ss) class (5′GU...AG3′), and that they have polypyrimidine tracts predominantly in the region between the 5′ ss and the branch point. Information content is high in the 5′ ss, branch site, and 3′ ss regions of the introns but low in the exon regions adjacent to the introns in the fungi examined. The two yeasts have broader intron length ranges and correspondingly higher intron information content than the other fungi. Generally, as intron length increases in the fungi, so does intron information content. Homologs of U2AF spliceosomal proteins were found in all species except for S. cerevisiae, suggesting a nonconventional role for U2AF in the absence of canonical polypyrimidine tracts in the majority of introns. Our observations imply that splicing in fungi may be different from that in vertebrates and may require additional proteins that interact with polypyrimidine tracts upstream of the branch point. Theoretical protein homologs for Nam8p and TIA-1, two proteins that require U-rich regions upstream of the branch point to function, were found. There appear to be sufficient differences between S. cerevisiae and S. pombe introns and the introns of two filamentous members of the Ascomycota and one member of the Basidiomycota to warrant the development of new model organisms for studying the splicing mechanisms of fungi.
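The "information content" of a splice site region is commonly measured per alignment column as 2 bits minus the Shannon entropy of the base frequencies in that column. The sketch below shows that standard calculation on a toy alignment. The function name and example sequences are hypothetical, no small-sample correction is applied, and whether the study used exactly this measure is an assumption.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Per-column information content in bits: 2 - H, with H the Shannon entropy.

    Assumptions: a four-letter RNA alphabet (maximum 2 bits per column),
    equal-length gap-free sequences, and no small-sample correction.
    """
    if not aligned_sites:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sequences must have the same length")

    bits = []
    for column in zip(*aligned_sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Toy alignment: the first column is invariant, the second is variable.
toy_sites = ["GU", "GU", "GA", "GC"]
print(information_content(toy_sites))  # [2.0, 0.5]
```

An invariant column scores the full 2 bits, and a column with uniformly distributed bases scores 0, so summing the column values over a region gives a single figure for how strongly that region is conserved.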

Journal ArticleDOI
TL;DR: The genes encoding the 18 known GABR subunits, plus one now located here, are characterized for their precise locations, sizes, and exon/intron structures, and a dicysteine loop and its exon show remarkable constancy between all GABR subunits and species, of deduced functional significance.

Journal ArticleDOI
TL;DR: Understanding the role of 5BSL3.2 and determining how this new CRE functions in the context of previously identified elements at the 5′ and 3′ ends of the RNA genome should provide new insights into HCV RNA replication.
Abstract: RNA structures play key roles in the replication of RNA viruses. Sequence alignment software, thermodynamic RNA folding programs, and classical comparative phylogenetic analysis were used to build models of six RNA elements in the coding region of the hepatitis C virus (HCV) RNA-dependent RNA polymerase, NS5B. The importance of five of these elements was evaluated by site-directed mutagenesis of a subgenomic HCV replicon. Mutations disrupting one of the predicted stem-loop structures, designated 5BSL3.2, blocked RNA replication, implicating it as an essential cis-acting replication element (CRE). 5BSL3.2 is about 50 bases in length and is part of a larger predicted cruciform structure (5BSL3). As confirmed by RNA structure probing, 5BSL3.2 consists of an 8-bp lower helix, a 6-bp upper helix, a 12-base terminal loop, and an 8-base internal loop. Mutational analysis and structure probing were used to explore the importance of these features. Primary sequences in the loops were shown to be important for HCV RNA replication, and the upper helix appears to serve as an essential scaffold that helps maintain the overall RNA structure. Unlike certain picornavirus CREs, whose function is position independent, 5BSL3.2 function appears to be context dependent. Understanding the role of 5BSL3.2 and determining how this new CRE functions in the context of previously identified elements at the 5' and 3' ends of the RNA genome should provide new insights into HCV RNA replication.