
Showing papers on "Gene published in 2008"


Journal ArticleDOI
23 Oct 2008-Nature
TL;DR: The interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, demonstrating that TCGA can rapidly expand knowledge of the molecular basis of cancer.
Abstract: Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas - the most common type of primary adult brain cancer - and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer.

6,761 citations


Journal ArticleDOI
Li Ding, Gad Getz, David A. Wheeler, Elaine R. Mardis, Michael D. McLellan, Kristian Cibulskis, Carrie Sougnez, Heidi Greulich, Donna M. Muzny, Margaret Morgan, Lucinda Fulton, Robert S. Fulton, Qunyuan Zhang, Michael C. Wendl, Michael S. Lawrence, David E. Larson, Ken Chen, David J. Dooling, Aniko Sabo, Alicia Hawes, Hua Shen, Shalini N. Jhangiani, Lora Lewis, Otis Hall, Yiming Zhu, Tittu Mathew, Yanru Ren, Jiqiang Yao, Steven E. Scherer, Kerstin Clerc, Ginger A. Metcalf, Brian Ng, Aleksandar Milosavljevic, Manuel L. Gonzalez-Garay, John R. Osborne, Rick Meyer, Xiaoqi Shi, Yuzhu Tang, Daniel C. Koboldt, Ling Lin, Rachel Abbott, Tracie L. Miner, Craig Pohl, Ginger A. Fewell, Carrie A. Haipek, Heather Schmidt, Brian H. Dunford-Shore, Aldi T. Kraja, Seth D. Crosby, Christopher S. Sawyer, Tammi L. Vickery, Sacha N. Sander, Jody S. Robinson, Wendy Winckler, Jennifer Baldwin, Lucian R. Chirieac, Amit Dutt, Timothy Fennell, Megan Hanna, Bruce E. Johnson, Robert C. Onofrio, Roman K. Thomas, Giovanni Tonon, Barbara A. Weir, Xiaojun Zhao, Liuda Ziaugra, Michael C. Zody, Thomas J. Giordano, Mark B. Orringer, Jack A. Roth, Margaret R. Spitz, Ignacio I. Wistuba, Bradley A. Ozenberger, Peter J. Good, Andrew C. Chang, David G. Beer, Mark A. Watson, Marc Ladanyi, Stephen R. Broderick, Akihiko Yoshizawa, William D. Travis, William Pao, Michael A. Province, George M. Weinstock, Harold E. Varmus, Stacey Gabriel, Eric S. Lander, Richard A. Gibbs, Matthew Meyerson, Richard K. Wilson
23 Oct 2008-Nature
TL;DR: Sequencing of 623 genes in 188 human lung adenocarcinomas provides evidence of somatic mutations in several tumour suppressor genes involved in other cancers, and of sequence changes in PTPRD and the frequently deleted gene LRP1B.
Abstract: Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers--including NF1, APC, RB1 and ATM--and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.

2,615 citations


Journal ArticleDOI
06 Jun 2008-Science
TL;DR: A quantitative sequencing-based method is developed for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome, and it is demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed.
Abstract: The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

2,506 citations
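
The "fraction of the genome transcribed" figure reported above can be illustrated with a small calculation: merge the genomic intervals covered by mapped reads and divide the covered length by the length of the sequence considered. The sketch below is only an illustration under stated assumptions. The function name `covered_fraction`, the half-open interval convention and the toy coordinates are hypothetical, and the abstract does not describe the authors' actual mapping or thresholding procedure.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of positions covered by at least one half-open interval [start, end).

    `intervals` is an iterable of (start, end) pairs, e.g. mapped read
    positions (a hypothetical input format, not the paper's).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clamp each interval to the genome so stray coordinates are ignored.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # A gap: close the running merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Toy example: two overlapping reads and one separate read on a 100 bp sequence.
reads = [(0, 10), (5, 20), (30, 40)]
fraction = covered_fraction(reads, 100)
```

Merging before summing matters because overlapping reads would otherwise be counted more than once.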


Journal ArticleDOI
TL;DR: The most comprehensive list so far of human p53-regulated genes and their experimentally validated, functional binding sites that confer p53 regulation is presented.
Abstract: The p53 protein regulates the transcription of many different genes in response to a wide variety of stress signals. Following DNA damage, p53 regulates key processes, including DNA repair, cell-cycle arrest, senescence and apoptosis, in order to suppress cancer. This Analysis article provides an overview of the current knowledge of p53-regulated genes in these pathways and others, and the mechanisms of their regulation. In addition, we present the most comprehensive list so far of human p53-regulated genes and their experimentally validated, functional binding sites that confer p53 regulation.

1,799 citations


Journal ArticleDOI
TL;DR: This work incorporates several different evidence sources into the gene finder AUGUSTUS. Using only ESTs, at least one splice form is predicted exactly in 57% of human genes; adding evidence from other species and human mRNAs raises this to 77%.
Abstract: Motivation: Computational annotation of protein coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and do poorly with certain types of genes. Including additional sources of evidence of the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated, and aligned to the target genome provide evidence of existence and structure of genes. Results: We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are gene and transcript annotations from related species syntenically mapped to the target genome using TransMap, evolutionary conservation of DNA, mRNA and ESTs of the target species, and retroposed genes. The predictions include alternative splice variants where evidence supports it. Using only ESTs we were able to correctly predict at least one splice form exactly correct in 57% of human genes. Also using evidence from other species and human mRNAs, this number rises to 77%. Syntenic mapping is well-suited to annotate genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than independent positionwise information. Availability: AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu) Contact: mstanke@gwdg.de Supplementary information: Supplementary data are available at Bioinformatics online.

1,364 citations
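
The accuracy figure quoted above ("at least one splice form exactly correct") is a gene-level measure: a gene counts as correct if any predicted transcript matches any annotated transcript exon for exon. A minimal sketch of such a measure follows. It is not the AUGUSTUS evaluation code; the function name `gene_level_sensitivity`, the dictionary layout and the matching of genes by shared identifier are all assumptions made for illustration.

```python
def gene_level_sensitivity(annotated, predicted):
    """Share of annotated genes with at least one exactly matching predicted transcript.

    Both arguments map a gene identifier to a list of transcripts; each
    transcript is a sequence of (start, end) exon coordinates. This layout
    is a hypothetical convention, not a format defined by AUGUSTUS.
    """
    if not annotated:
        return 0.0

    def normalise(transcripts):
        # Convert to hashable tuples so transcripts can be compared as sets.
        return {tuple(tuple(exon) for exon in t) for t in transcripts}

    hits = 0
    for gene, ref_transcripts in annotated.items():
        ref = normalise(ref_transcripts)
        pred = normalise(predicted.get(gene, []))
        if ref & pred:  # at least one splice form reproduced exactly
            hits += 1
    return hits / len(annotated)


# Toy example: geneA has one of its two splice forms predicted exactly,
# geneB is predicted with a shifted exon boundary.
annotated = {
    "geneA": [[(100, 200), (300, 400)], [(100, 200), (350, 400)]],
    "geneB": [[(10, 50), (80, 120)]],
}
predicted = {
    "geneA": [[(100, 200), (300, 400)]],
    "geneB": [[(10, 50), (85, 120)]],
}
sensitivity = gene_level_sensitivity(annotated, predicted)
```

Requiring an exact match of every exon boundary makes this a strict criterion: a single shifted splice site, as in the hypothetical `geneB`, counts as a miss.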


Journal ArticleDOI
27 Mar 2008-Nature
TL;DR: An extensive transcriptional network constructed from human adipose data exhibits significant overlap with similar network modules constructed from mouse adipose data; a core network module enriched for genes involved in the inflammatory and immune response is found to be causally associated with obesity-related traits.
Abstract: Common human diseases result from the interplay of many genes and environmental factors. Therefore, a more integrative biology approach is needed to unravel the complexity and causes of such diseases. To elucidate the complexity of common human diseases such as obesity, we have analysed the expression of 23,720 transcripts in large population-based blood and adipose tissue cohorts comprehensively assessed for various phenotypes, including traits related to clinical obesity. In contrast to the blood expression profiles, we observed a marked correlation between gene expression in adipose tissue and obesity-related traits. Genome-wide linkage and association mapping revealed a highly significant genetic component to gene expression traits, including a strong genetic effect of proximal (cis) signals, with 50% of the cis signals overlapping between the two tissues profiled. Here we demonstrate an extensive transcriptional network constructed from the human adipose data that exhibits significant overlap with similar network modules constructed from mouse adipose data. A core network module in humans and mice was identified that is enriched for genes involved in the inflammatory and immune response and has been found to be causally associated to obesity-related traits.

1,309 citations


Journal ArticleDOI
TL;DR: The number of well-supported cases of transfer from both prokaryotes and eukaryotes, many with significant functional implications, is now expanding rapidly. Major recent trends include the important role of HGT in adaptation to certain specialized niches and the highly variable impact of HGT in different lineages.
Abstract: Horizontal gene transfer (HGT; also known as lateral gene transfer) has had an important role in eukaryotic genome evolution, but its importance is often overshadowed by the greater prevalence and our more advanced understanding of gene transfer in prokaryotes. Recurrent endosymbioses and the generally poor sampling of most nuclear genes from diverse lineages have also complicated the search for transferred genes. Nevertheless, the number of well-supported cases of transfer from both prokaryotes and eukaryotes, many with significant functional implications, is now expanding rapidly. Major recent trends include the important role of HGT in adaptation to certain specialized niches and the highly variable impact of HGT in different lineages.

1,185 citations


Journal ArticleDOI
29 Feb 2008-Science
TL;DR: The methods described here will be generally useful for constructing large DNA molecules from chemically synthesized pieces and also from combinations of natural and synthetic DNA segments.
Abstract: We have synthesized a 582,970 bp Mycoplasma genitalium genome. This synthetic genome, named M. genitalium JCVI-1.0, contains all the genes of wild-type M. genitalium G37 except MG408, which was disrupted by an antibiotic marker to block pathogenicity and to allow for selection. To identify the genome as synthetic, we inserted “watermarks” at intergenic sites known to tolerate transposon insertions. Overlapping “cassettes” of 5 to 7 kb, assembled from chemically synthesized oligonucleotides, were joined by in vitro recombination to produce intermediate assemblies of approximately 24 kb, 72 kb (“1/8 genome”), and 144 kb (“1/4 genome”), which were all cloned as bacterial artificial chromosomes (BACs) in Escherichia coli. Most of these intermediate clones were sequenced, and clones of all four 1/4 genomes with the correct sequence were identified. The complete synthetic genome was assembled by transformation-associated recombination (TAR) cloning in the yeast Saccharomyces cerevisiae, then isolated and sequenced. A clone with the correct sequence was identified. The methods described here will be generally useful for constructing large DNA molecules from chemically synthesized pieces and also from combinations of natural and synthetic DNA segments. M. genitalium is a bacterium with the smallest genome of any independently replicating cell that has been grown in pure

1,139 citations


Journal ArticleDOI
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine, Nov 4-8, 2007, Singapore (PL02-05). All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

1,093 citations


Journal ArticleDOI
24 Apr 2008-Nature
TL;DR: Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica’s distinguishing morpho-physiological, medicinal and nutritional properties.
Abstract: Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3x draft genome sequence of 'SunUp' papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.

1,028 citations


Journal ArticleDOI
18 Apr 2008-Science
TL;DR: It is found that 97% of gene deletions exhibited a measurable growth phenotype, suggesting that nearly all genes are essential for optimal growth in at least one condition.
Abstract: Genetics aims to understand the relation between genotype and phenotype. However, because complete deletion of most yeast genes (∼80%) has no obvious phenotypic consequence in rich medium, it is difficult to study their functions. To uncover phenotypes for this nonessential fraction of the genome, we performed 1144 chemical genomic assays on the yeast whole-genome heterozygous and homozygous deletion collections and quantified the growth fitness of each deletion strain in the presence of chemical or environmental stress conditions. We found that 97% of gene deletions exhibited a measurable growth phenotype, suggesting that nearly all genes are essential for optimal growth in at least one condition.
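
The summary statistic above (the share of deletion strains showing a phenotype in at least one of many conditions) can be sketched as a simple calculation over a table of per-condition fitness scores. The code below is only an illustration: the function name `fraction_with_phenotype`, the score scale and the cutoff of 2.0 are hypothetical, and the abstract does not state the statistical criteria the authors used to call a phenotype.

```python
def fraction_with_phenotype(fitness_scores, threshold=2.0):
    """Share of strains whose score magnitude reaches `threshold` in any condition.

    `fitness_scores` maps a strain name to a list of per-condition scores
    (for example, standardised fitness defects). Both the layout and the
    default threshold are assumptions made for this sketch.
    """
    if not fitness_scores:
        return 0.0
    hits = sum(
        1
        for scores in fitness_scores.values()
        if any(abs(score) >= threshold for score in scores)
    )
    return hits / len(fitness_scores)


# Toy example with four hypothetical strains and up to two conditions each.
scores = {
    "strain_a": [0.1, 2.5],   # phenotype in the second condition only
    "strain_b": [0.0, 0.3],   # no phenotype in either condition
    "strain_c": [-3.0],       # strong phenotype in its single condition
    "strain_d": [],           # never assayed, so no phenotype is called
}
fraction = fraction_with_phenotype(scores)
```

Because a strain needs only one qualifying condition, the fraction can only grow as conditions are added, which is consistent with most deletions appearing neutral in rich medium alone yet showing a phenotype somewhere across many assays.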

Journal ArticleDOI
26 Jun 2008-Nature
TL;DR: High-throughput sequencing of complementary DNAs (RNA-Seq) and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation.
Abstract: Until recently, it was thought that much of a genome sequence is silent for much of the time. Now a study in the fission yeast Schizosaccharomyces pombe, using recently developed DNA sequencing technologies, shows that almost all of the yeast genome is genetically active. More than 90% of the genome is transcribed into RNA, including more than 450 newly discovered transcripts, many of them non-coding, with regulatory or other unknown roles. Using recently developed DNA sequencing technologies, nucleic acid transcripts are characterized in unprecedented detail from the yeast Schizosaccharomyces pombe. The sequences definitively demonstrate that 90% or more of the genome is transcribed into RNA, and show a previously unseen link between transcription and splicing efficiency at different points in the cell's growth. Recent data from several organisms indicate that the transcribed portions of genomes are larger and more complex than expected, and that many functional properties of transcripts are based not on coding sequences but on regulatory sequences in untranslated regions or non-coding RNAs [1-9]. Alternative start and polyadenylation sites and regulation of intron splicing add additional dimensions to the rich transcriptional output [10,11]. This transcriptional complexity has been sampled mainly using hybridization-based methods under one or few experimental conditions. Here we applied direct high-throughput sequencing of complementary DNAs (RNA-Seq), supplemented with data from high-density tiling arrays, to globally sample transcripts of the fission yeast Schizosaccharomyces pombe, independently from available gene annotations. We interrogated transcriptomes under multiple conditions, including rapid proliferation, meiotic differentiation and environmental stress, as well as in RNA processing mutants to reveal the dynamic plasticity of the transcriptional landscape as a function of environmental, developmental and genetic factors.
High-throughput sequencing proved to be a powerful and quantitative method to sample transcriptomes deeply at maximal resolution. In contrast to hybridization, sequencing showed little, if any, background noise and was sensitive enough to detect widespread transcription in >90% of the genome, including traces of RNAs that were not robustly transcribed or rapidly degraded. The combined sequencing and strand-specific array data provide rich condition-specific information on novel, mostly non-coding transcripts, untranslated regions and gene structures, thus improving the existing genome annotation. Sequence reads spanning exon–exon or exon–intron junctions give unique insight into a surprising variability in splicing efficiency across introns, genes and conditions. Splicing efficiency was largely coordinated with transcript levels, and increased transcription led to increased splicing in test genes. Hundreds of introns showed such regulated splicing during cellular proliferation or differentiation.

Journal ArticleDOI
TL;DR: Futile cycles of TCR at naturally occurring non-canonical DNA structures might contribute to genomic instability and genetic disease.
Abstract: Expressed genes are scanned by translocating RNA polymerases, which sensitively detect DNA damage and initiate transcription-coupled repair (TCR), a subpathway of nucleotide excision repair that removes lesions from the template DNA strands of actively transcribed genes. Human hereditary diseases that present a deficiency only in TCR are characterized by sunlight sensitivity without enhanced skin cancer. Although multiple gene products are implicated in TCR, we still lack an understanding of the precise signals that can trigger this pathway. Futile cycles of TCR at naturally occurring non-canonical DNA structures might contribute to genomic instability and genetic disease.

Journal ArticleDOI
03 Jul 2008-Nature
TL;DR: It is suggested that signal-induced ncRNAs localized to regulatory regions of transcription units can act cooperatively as selective ligands, recruiting and modulating the activities of distinct classes of RNA-binding co-regulators in response to specific signals, providing an unexpected ncRNA/RNA-binding protein-based strategy to integrate transcriptional programmes.
Abstract: With the recent recognition of non-coding RNAs (ncRNAs) flanking many genes, a central issue is to obtain a full understanding of their potential roles in regulated gene transcription programmes, possibly through different mechanisms. Here we show that an RNA-binding protein, TLS (for translocated in liposarcoma), serves as a key transcriptional regulatory sensor of DNA damage signals that, on the basis of its allosteric modulation by RNA, specifically binds to and inhibits CREB-binding protein (CBP) and p300 histone acetyltransferase activities on a repressed gene target, cyclin D1 (CCND1) in human cell lines. Recruitment of TLS to the CCND1 promoter to cause gene-specific repression is directed by single-stranded, low-copy-number ncRNA transcripts tethered to the 5' regulatory regions of CCND1 that are induced in response to DNA damage signals. Our data suggest that signal-induced ncRNAs localized to regulatory regions of transcription units can act cooperatively as selective ligands, recruiting and modulating the activities of distinct classes of RNA-binding co-regulators in response to specific signals, providing an unexpected ncRNA/RNA-binding protein-based strategy to integrate transcriptional programmes.

Journal ArticleDOI
25 Apr 2008-Science
TL;DR: New genome sequences and improved analytical approaches are clarifying angiosperm evolution and revealing patterns of differential gene loss after genome duplication and differential gene retention associated with evolution of some morphological complexity.
Abstract: Correlated gene arrangements among taxa provide a valuable framework for inference of shared ancestry of genes and for the utilization of findings from model organisms to study less-well-understood systems. In angiosperms, comparisons of gene arrangements are complicated by recurring polyploidy and extensive genome rearrangement. New genome sequences and improved analytical approaches are clarifying angiosperm evolution and revealing patterns of differential gene loss after genome duplication and differential gene retention associated with evolution of some morphological complexity. Because of variability in DNA substitution rates among taxa and genes, deviation from collinearity might be a more reliable phylogenetic character.

Journal ArticleDOI
27 Mar 2008-Nature
TL;DR: Application of this method to liver and adipose gene expression data generated from a segregating mouse population results in the identification of a macrophage-enriched network supported as having a causal relationship with disease traits associated with metabolic syndrome.
Abstract: Identifying variations in DNA that increase susceptibility to disease is one of the primary aims of genetic studies using a forward genetics approach. However, identification of disease-susceptibility genes by means of such studies provides limited functional information on how genes lead to disease. In fact, in most cases there is an absence of functional information altogether, preventing a definitive identification of the susceptibility gene or genes. Here we develop an alternative to the classic forward genetics approach for dissecting complex disease traits where, instead of identifying susceptibility genes directly affected by variations in DNA, we identify gene networks that are perturbed by susceptibility loci and that in turn lead to disease. Application of this method to liver and adipose gene expression data generated from a segregating mouse population results in the identification of a macrophage-enriched network supported as having a causal relationship with disease traits associated with metabolic syndrome. Three genes in this network, lipoprotein lipase (Lpl), lactamase β (Lactb) and protein phosphatase 1-like (Ppm1l), are validated as previously unknown obesity genes, strengthening the association between this network and metabolic disease traits. Our analysis provides direct experimental support that complex traits such as obesity are emergent properties of molecular networks that are modulated by complex genetic loci and environmental factors. Complex human diseases result from the interplay of many genetic and environmental factors. To build up a picture of the factors contributing to one such disease, obesity, gene expression was evaluated as a quantitative trait in blood and adipose tissue samples from hundreds of Icelandic subjects aged 18 to 85. 
The results reveal a tendency to certain characteristic patterns of gene activation in the fatty tissues — though to a much lesser extent in the blood — of people with a higher body mass index. A transcriptional network constructed from the adipose tissue data has significant overlap with a network based on mouse adipose tissue data. Experimental support for the idea that complex diseases are emergent properties of molecular networks influenced by genes and environment comes from a study in mice. Mice were examined for disturbances in genetic expression networks that correlate with metabolic traits associated with obesity, diabetes and atherosclerosis. Three genes — Lpl, Lactb and Ppm1l — were identified as previously unknown obesity genes. This 'molecular network' approach raises the prospect that therapies might be directed at whole 'disease networks', rather than at one or two specific genes. Standard approaches to identify the genetic changes that lead to disease are reversed by examination of genetic networks for perturbations that are associated with disease states, and following up candidate genes from there. This begins with three genes in mice that lead to obesity when mutated, demonstrating that complex genetic–environmental traits can be dissected with this new approach.

Journal ArticleDOI
19 Dec 2008-Science
TL;DR: Evidence of widespread divergent transcription at protein-encoding gene promoters is presented, suggesting that divergent transcription over short distances is common for active promoters and may help promoter regions maintain a state poised for subsequent regulation.
Abstract: Transcription initiation by RNA polymerase II (RNAPII) is thought to occur unidirectionally from most genes. Here, we present evidence of widespread divergent transcription at protein-encoding gene promoters. Transcription start site-associated RNAs (TSSa-RNAs) nonrandomly flank active promoters, with peaks of antisense and sense short RNAs at 250 nucleotides upstream and 50 nucleotides downstream of TSSs, respectively. Northern analysis shows that TSSa-RNAs are subsets of an RNA population 20 to 90 nucleotides in length. Promoter-associated RNAPII and H3K4-trimethylated histones, transcription initiation hallmarks, colocalize at sense and antisense TSSa-RNA positions; however, H3K79-dimethylated histones, characteristic of elongating RNAPII, are only present downstream of TSSs. These results suggest that divergent transcription over short distances is common for active promoters and may help promoter regions maintain a state poised for subsequent regulation.

Journal ArticleDOI
17 Jan 2008-Nature
TL;DR: It is found that partial loss of function of the ribosomal subunit protein RPS14 phenocopies the disease in normal haematopoietic progenitor cells, and also that forced expression of RPS14 rescues the disease phenotype in patient-derived bone marrow cells.
Abstract: Somatic chromosomal deletions in cancer are thought to indicate the location of tumour suppressor genes, by which a complete loss of gene function occurs through biallelic deletion, point mutation or epigenetic silencing, thus fulfilling Knudson's two-hit hypothesis. In many recurrent deletions, however, such biallelic inactivation has not been found. One prominent example is the 5q- syndrome, a subtype of myelodysplastic syndrome characterized by a defect in erythroid differentiation. Here we describe an RNA-mediated interference (RNAi)-based approach to discovery of the 5q- disease gene. We found that partial loss of function of the ribosomal subunit protein RPS14 phenocopies the disease in normal haematopoietic progenitor cells, and also that forced expression of RPS14 rescues the disease phenotype in patient-derived bone marrow cells. In addition, we identified a block in the processing of pre-ribosomal RNA in RPS14-deficient cells that is functionally equivalent to the defect in Diamond-Blackfan anaemia, linking the molecular pathophysiology of the 5q- syndrome to a congenital syndrome causing bone marrow failure. These results indicate that the 5q- syndrome is caused by a defect in ribosomal protein function and suggest that RNAi screening is an effective strategy for identifying causal haploinsufficiency disease genes.

Journal ArticleDOI
TL;DR: In this paper, DNA methylation and Piwi-interacting small RNA (piRNA) expression were analyzed in wild-type, MILI-null, and MIWI2-null male fetal germ cells.
Abstract: Silencing of transposable elements occurs during fetal gametogenesis in males via de novo DNA methylation of their regulatory regions. The loss of MILI (miwi-like) and MIWI2 (mouse piwi 2), two mouse homologs of Drosophila Piwi, activates retrotransposon gene expression by impairing DNA methylation in the regulatory regions of the retrotransposons. However, as it is unclear whether the defective DNA methylation in the mutants is due to the impairment of de novo DNA methylation, we analyze DNA methylation and Piwi-interacting small RNA (piRNA) expression in wild-type, MILI-null, and MIWI2-null male fetal germ cells. We reveal that defective DNA methylation of the regulatory regions of the Line-1 (long interspersed nuclear elements) and IAP (intracisternal A particle) retrotransposons in the MILI-null and MIWI2-null male germ cells takes place at the level of de novo methylation. Comprehensive analysis shows that the piRNAs of fetal germ cells are distinct from those previously identified in neonatal and adult germ cells. The expression of piRNAs is reduced under MILI- and MIWI2-null conditions in fetal germ cells, although the extent of the reduction differs significantly between the two mutants. Our data strongly suggest that MILI and MIWI2 play essential roles in establishing de novo DNA methylation of retrotransposons in fetal male germ cells.

Journal ArticleDOI
28 Mar 2008-Science
TL;DR: Common principles of transcription factor– and microRNA-mediated gene regulatory events are reviewed and conceptual differences in how these factors control gene expression are discussed.
Abstract: The properties of a cell are determined by the genetic information encoded in its genome. Understanding how such information is differentially and dynamically retrieved to define distinct cell types and cellular states is a major challenge facing molecular biology. Gene regulatory factors that control the expression of genomic information come in a variety of flavors, with transcription factors and microRNAs representing the most numerous gene regulatory factors in multicellular genomes. Here, I review common principles of transcription factor- and microRNA-mediated gene regulatory events and discuss conceptual differences in how these factors control gene expression.

Journal ArticleDOI
TL;DR: A genome-wide analysis of LEA proteins and their encoding genes in Arabidopsis thaliana indicates a wide range of sequence diversity, intracellular localizations, and expression patterns and indicates that they confer an evolutionary advantage for an organism under varying stressful environmental conditions.
Abstract: LEA (late embryogenesis abundant) proteins were first described about 25 years ago as accumulating late in plant seed development. They were later found in vegetative plant tissues following environmental stress and also in desiccation-tolerant bacteria and invertebrates. Although they are widely assumed to play crucial roles in cellular dehydration tolerance, their physiological and biochemical functions are largely unknown. We present a genome-wide analysis of LEA proteins and their encoding genes in Arabidopsis thaliana. We identified 51 LEA protein encoding genes in the Arabidopsis genome that could be classified into nine distinct groups. Expression studies were performed on all genes at different developmental stages, in different plant organs and under different stress and hormone treatments using quantitative RT-PCR. We found evidence of expression for all 51 genes. There was little overlap between genes expressed in vegetative tissues and in seeds, and expression levels were generally higher in seeds. Most genes encoding LEA proteins had abscisic acid response (ABRE) and/or low temperature response (LTRE) elements in their promoters and many genes containing the respective promoter elements were induced by abscisic acid, cold or drought. We also found that 33% of all Arabidopsis LEA protein encoding genes are arranged in tandem repeats and that 43% are part of homeologous pairs. The majority of LEA proteins were predicted to be highly hydrophilic and natively unstructured, but some were predicted to be folded. The analyses indicate a wide range of sequence diversity, intracellular localizations, and expression patterns. The high fraction of retained duplicate genes and the inferred functional diversification indicate that they confer an evolutionary advantage for an organism under varying stressful environmental conditions. This comprehensive analysis will be an important starting point for future efforts to elucidate the functional role of these enigmatic proteins.

Journal ArticleDOI
TL;DR: In this article, the authors conducted a genome-scale siRNA screen, revealing more than 311 host factors, including 267 that were not previously linked to HIV, and found that there was little overlap between these genes and the HIV dependency factors described recently.

Journal ArticleDOI
TL;DR: Global analysis of expressed genes in a naturally occurring microbial community revealed not only indigenous gene- and taxon-specific expression patterns but also gene categories undetected in previous DNA-based metagenomic surveys.
Abstract: Metagenomics is expanding our knowledge of the gene content, functional significance, and genetic variability in natural microbial communities. Still, there exists limited information concerning the regulation and dynamics of genes in the environment. We report here global analysis of expressed genes in a naturally occurring microbial community. We first adapted RNA amplification technologies to produce large amounts of cDNA from small quantities of total microbial community RNA. The fidelity of the RNA amplification procedure was validated with Prochlorococcus cultures and then applied to a microbial assemblage collected in the oligotrophic Pacific Ocean. Microbial community cDNAs were analyzed by pyrosequencing and compared with microbial community genomic DNA sequences determined from the same sample. Pyrosequencing-based estimates of microbial community gene expression compared favorably to independent assessments of individual gene expression using quantitative PCR. Genes associated with key metabolic pathways in open ocean microbial species—including genes involved in photosynthesis, carbon fixation, and nitrogen acquisition—and a number of genes encoding hypothetical proteins were highly represented in the cDNA pool. Genes present in the variable regions of Prochlorococcus genomes were among the most highly expressed, suggesting these encode proteins central to cellular processes in specific genotypes. Although many transcripts detected were highly similar to genes previously detected in ocean metagenomic surveys, a significant fraction (≈50%) were unique. Thus, microbial community transcriptomic analyses revealed not only indigenous gene- and taxon-specific expression patterns but also gene categories undetected in previous DNA-based metagenomic surveys.

Journal ArticleDOI
28 Nov 2008-Cell
TL;DR: It is shown that mammalian Sir2, SIRT1, represses repetitive DNA and a functionally diverse set of genes across the mouse genome, and that DNA damage-induced redistribution of SIRT1 and other chromatin-modifying proteins may be a conserved mechanism of aging in eukaryotes.

Journal ArticleDOI
TL;DR: Combining single-transcript measurements with computational modeling indicates that low expression variation is achieved by transcribing genes using single transcription-initiation events that are clearly separated in time, rather than by transcriptional bursts.
Abstract: Understanding the kinetics of gene expression involves accurate quantitation of gene expression. This is now undertaken by quantifying nascent-RNA levels and relating this indication of transcriptional activity to mRNA abundance in single yeast cells. Combining these measurements with computational modeling indicates that the tested yeast housekeeping genes are probably expressed through single initiation events, whereas a SAGA-transcribed gene shows behavior consistent with transcriptional bursting. Proper execution of transcriptional programs is a key requirement of gene expression regulation, demanding accurate control of timing and amplitude. How precisely the transcription machinery fulfills this task is not known. Using an in situ hybridization approach that detects single mRNA molecules, we measured mRNA abundance and transcriptional activity within single Saccharomyces cerevisiae cells. We found that expression levels for particular genes are higher than initially reported and can vary substantially among cells. However, variability for most constitutively expressed genes is unexpectedly small. Combining single-transcript measurements with computational modeling indicates that low expression variation is achieved by transcribing genes using single transcription-initiation events that are clearly separated in time, rather than by transcriptional bursts. In contrast, PDR5, a gene regulated by the transcription coactivator complex SAGA, is expressed using transcription bursts, resulting in larger variation. These data directly demonstrate the existence of multiple expression modes used to modulate the transcriptome.

Journal ArticleDOI
TL;DR: This work reports the discovery of a short ORF embedded within the P3 cistron of the polyprotein but translated in the +2 reading-frame, which suggests that other short overlapping genes may remain hidden even in well studied virus genomes and demonstrates the utility of the software package MLOGD as a tool for identifying such genes.
Abstract: The family Potyviridae includes >30% of known plant virus species, many of which are of great agricultural significance. These viruses have a positive sense RNA genome that is ≈10 kb long and contains a single long ORF. The ORF is translated into a large polyprotein, which is cleaved into ≈10 mature proteins. We report the discovery of a short ORF embedded within the P3 cistron of the polyprotein but translated in the +2 reading-frame. The ORF, termed pipo, is conserved and has a strong bioinformatic coding signature throughout the large and diverse Potyviridae family. Mutations that knock out expression of the PIPO protein in Turnip mosaic potyvirus but leave the polyprotein amino acid sequence unaltered are lethal to the virus. Immunoblotting with antisera raised against two nonoverlapping 14-aa antigens, derived from the PIPO amino acid sequence, reveals the expression of an ≈25-kDa PIPO fusion product in planta. This is consistent with expression of PIPO as a P3-PIPO fusion product via ribosomal frameshifting or transcriptional slippage at a highly conserved G₁₋₂A₆₋₇ motif at the 5′ end of pipo. This discovery suggests that other short overlapping genes may remain hidden even in well studied virus genomes (as well as cellular organisms) and demonstrates the utility of the software package MLOGD as a tool for identifying such genes.

Journal ArticleDOI
TL;DR: A gene expression atlas is generated that provides a global view of gene expression in all major organ systems of this species, with special emphasis on nodule and seed development, and indicates that phylogenetic analysis alone is insufficient to predict the function of orthologs in different species.
Abstract: Legumes played central roles in the development of agriculture and civilization, and today account for approximately one-third of the world's primary crop production. Unfortunately, most cultivated legumes are poor model systems for genomic research. Therefore, Medicago truncatula, which has a relatively small diploid genome, has been adopted as a model species for legume genomics. To enhance its value as a model, we have generated a gene expression atlas that provides a global view of gene expression in all major organ systems of this species, with special emphasis on nodule and seed development. The atlas reveals massive differences in gene expression between organs that are accompanied by changes in the expression of key regulatory genes, such as transcription factor genes, which presumably orchestrate genetic reprogramming during development and differentiation. Interestingly, many legume-specific genes are preferentially expressed in nitrogen-fixing nodules, indicating that evolution endowed them with special roles in this unique and important organ. Comparative transcriptome analysis of Medicago versus Arabidopsis revealed significant divergence in developmental expression profiles of orthologous genes, which indicates that phylogenetic analysis alone is insufficient to predict the function of orthologs in different species. The data presented here represent an unparalleled resource for legume functional genomics, which will accelerate discoveries in legume biology.

Journal ArticleDOI
TL;DR: It is proposed that global transcription is a hallmark of pluripotent ESCs, contributing to their plasticity, and that lineage specification is driven by reduction of the transcribed portion of the genome.

Journal ArticleDOI
TL;DR: The isolation and functional analysis of the rice GIF1 (GRAIN INCOMPLETE FILLING 1) gene that encodes a cell-wall invertase required for carbon partitioning during early grain-filling suggest that GIF1 is a potential domestication gene and that such a domestication-selected gene can be used for further crop improvement.
Abstract: Grain-filling, an important trait that contributes greatly to grain weight, is regulated by quantitative trait loci and is associated with crop domestication syndrome. However, the genes and underlying molecular mechanisms controlling crop grain-filling remain elusive. Here we report the isolation and functional analysis of the rice GIF1 (GRAIN INCOMPLETE FILLING 1) gene that encodes a cell-wall invertase required for carbon partitioning during early grain-filling. The cultivated GIF1 gene shows a restricted expression pattern during grain-filling compared to the wild rice allele, probably a result of accumulated mutations in the gene's regulatory sequence through domestication. Fine mapping with introgression lines revealed that the wild rice GIF1 is responsible for grain weight reduction. Ectopic expression of the cultivated GIF1 gene with the 35S or rice Waxy promoter resulted in smaller grains, whereas overexpression of GIF1 driven by its native promoter increased grain production. These findings, together with the domestication signature that we identified by comparing nucleotide diversity of the GIF1 loci between cultivated and wild rice, strongly suggest that GIF1 is a potential domestication gene and that such a domestication-selected gene can be used for further crop improvement.

Journal ArticleDOI
14 Nov 2008-Cell
TL;DR: Arabidopsis RNA polymerase IVb/Pol V, a multisubunit nuclear enzyme required for siRNA-mediated gene silencing of transposons and other repeats, transcribes intergenic and noncoding sequences, thereby facilitating heterochromatin formation and silencing of overlapping and adjacent genes.