
Showing papers by "Alistair R. R. Forrest" published in 2006


Journal Article
TL;DR: These tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini.
Abstract: Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.

1,324 citations


Journal Article
TL;DR: This work identifies many novel short proteins, including a “dark matter” subset containing ones that lack detectable homology to other known proteins, and confirms that some of these novel proteins can be translated and localised to the secretory pathway.
Abstract: Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a “dark matter” subset containing ones that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.

217 citations


Journal Article
TL;DR: The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
Abstract: The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

193 citations


Journal Article
TL;DR: Transcriptional evidence of widespread alternate splicing in the Toll-like receptor signaling pathway is provided from a systematic analysis of the FANTOM3 mouse data set, suggesting a surprisingly common role for variant proteins in diversification/repression of inflammatory signaling.
Abstract: Background: Alternate splicing of key signaling molecules in the Toll-like receptor (Tlr) cascade has been shown to dramatically alter the signaling capacity of inflammatory cells, but it is not known how common this mechanism is. We provide transcriptional evidence of widespread alternate splicing in the Toll-like receptor signaling pathway, derived from a systematic analysis of the FANTOM3 mouse data set. Functional annotation of variant proteins was assessed in light of inflammatory signaling in mouse primary macrophages, and the expression of each variant transcript was assessed by splicing arrays. Results: A total of 256 variant transcripts were identified, including novel variants of Tlr4, Ticam1, Tollip, Rac1, Irak1, 2 and 4, Mapk14/p38, Atf2 and Stat1. The expression of variant transcripts was assessed using custom-designed splicing arrays. We functionally tested the expression of Tlr4 transcripts under a range of cytokine conditions via northern and quantitative real-time polymerase chain reaction. The effects of variant Mapk14/p38 protein expression on macrophage survival were demonstrated. Conclusion: Members of the Toll-like receptor signaling pathway are highly alternatively spliced, producing a large number of novel proteins with the potential to functionally alter inflammatory outcomes. These variants are expressed in primary mouse macrophages in response to inflammatory mediators such as interferon-γ and lipopolysaccharide. Our data suggest a surprisingly common role for variant proteins in diversification/repression of inflammatory signaling.

80 citations


Journal Article
TL;DR: A large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways are surveyed, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.
Abstract: The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.

67 citations


Journal Article
TL;DR: These findings suggest that alternative transcripts of protein kinases and phosphatases are produced that encode different domain structures, and that these variants are likely to play important roles in phosphorylation-dependent signaling pathways.
Abstract: Background: Alternative transcripts of protein kinases and protein phosphatases are known to encode peptides with altered substrate affinities, subcellular localizations, and activities. We undertook a systematic study to catalog the variant transcripts of every protein kinase-like and phosphatase-like locus of mouse (http://variant.imb.uq.edu.au). Results: By reviewing all available transcript evidence, we found that at least 75% of kinase and phosphatase loci in mouse generate alternative splice forms, and that 44% of these loci have well supported alternative 5' exons. In a further analysis of full-length cDNAs, we identified 69% of loci as generating more than one peptide isoform. The 1,469 peptide isoforms generated from these loci correspond to 1,080 unique Interpro domain combinations, many of which lack catalytic or interaction domains. We also report on the existence of likely dominant negative forms for many of the receptor kinases and phosphatases, including some 26 secreted decoys (seven known and 19 novel: Alk, Csf1r, Egfr, Epha1, 3, 5, 7 and 10, Ephb1, Flt1, Flt3, Insr, Insrr, Kdr, Met, Ptk7, Ptprc, Ptprd, Ptprg, Ptprl, Ptprn, Ptprn2, Ptpro, Ptprr, Ptprs, and Ptprz1) and 13 transmembrane forms (four known and nine novel: Axl, Bmpr1a, Csf1r, Epha4, 5, 6 and 7, Ntrk2, Ntrk3, Pdgfra, Ptprk, Ptprm, Ptpru). Finally, by mining public gene expression data (MPSS and microarrays), we confirmed tissue-specific expression of ten of the novel isoforms. Conclusion: These findings suggest that alternative transcripts of protein kinases and phosphatases are produced that encode different domain structures, and that these variants are likely to play important roles in phosphorylation-dependent signaling pathways.

56 citations


Journal Article
01 May 2006-Traffic
TL;DR: This approach combines mining of published literature to identify subcellular localization data and a high-throughput, polymerase chain reaction (PCR)-based approach to experimentally characterize subcellular localization of type II membrane proteins.
Abstract: Application of a computational membrane organization prediction pipeline, MemO, identified putative type II membrane proteins as proteins predicted to encode a single alpha-helical transmembrane domain (TMD) and no signal peptides. MemO was applied to RIKEN's mouse isoform protein set to identify 1436 non-overlapping genomic regions or transcriptional units (TUs), which encode exclusively type II membrane proteins. Proteins with overlapping predicted InterPro and TMDs were reviewed to discard false positive predictions, resulting in a dataset comprising 1831 transcripts in 1408 TUs. This dataset was used to develop a systematic protocol to document subcellular localization of type II membrane proteins. This approach combines mining of published literature to identify subcellular localization data and a high-throughput, polymerase chain reaction (PCR)-based approach to experimentally characterize subcellular localization. These approaches have provided localization data for 244 and 169 proteins, respectively. Type II membrane proteins are localized to all major organelle compartments; however, some biases were observed towards the early secretory pathway and punctate structures. Collectively, this study reports the subcellular localization of 26% of the defined dataset. All reported localization data are presented in the LOCATE database (http://www.locate.imb.uq.edu.au).

23 citations


Journal Article
TL;DR: Together these data demonstrate that cell-type-specific systems exist to regulate protein phosphorylation and that, for accurate modelling and for determination of enzyme-substrate relationships, the co-location of components needs to be considered.
Abstract: Protein kinases and protein phosphatases are the fundamental components of phosphorylation-dependent protein regulatory systems. We have created a database for the protein kinase-like and phosphatase-like loci of mouse (http://phosphoreg.imb.uq.edu.au) that integrates protein sequence, interaction, classification and pathway information with the results of a systematic screen of their sub-cellular localization and tissue-specific expression data mined from the GNF tissue atlas of mouse. The database lets users query where a specific kinase or phosphatase is expressed at both the tissue and sub-cellular levels. Similarly, the interface allows the user to query by tissue, pathway or sub-cellular localization, to reveal which components are co-expressed or co-localized. A review of their expression reveals 30% of these components are detected in all tissues tested while 70% show some level of tissue restriction. Hierarchical clustering of the expression data reveals that expression of these genes can be used to separate the samples into tissues of related lineage, including 3 larger clusters of nervous tissue, developing embryo and cells of the immune system. By overlaying the expression, sub-cellular localization and classification data we examine correlations between class, specificity and tissue restriction and show that tyrosine kinases are more generally expressed in fewer tissues than serine/threonine kinases. Together these data demonstrate that cell-type-specific systems exist to regulate protein phosphorylation and that, for accurate modelling and for determination of enzyme-substrate relationships, the co-location of components needs to be considered.

20 citations