Showing papers on "Gene" published in 2011
••
TL;DR: It is reported that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes.
Abstract: A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients' lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology.
5,878 citations
••
TL;DR: Using a quantitative model, the first genome-scale prediction of synthesis rates of mRNAs and proteins is obtained and it is found that the cellular abundance of proteins is predominantly controlled at the level of translation.
Abstract: Gene expression is a multistep process that involves the transcription, translation and turnover of messenger RNAs and proteins. Although it is one of the most fundamental processes of life, the entire cascade has never been quantified on a genome-wide scale. Here we simultaneously measured absolute mRNA and protein abundance and turnover by parallel metabolic pulse labelling for more than 5,000 genes in mammalian cells. Whereas mRNA and protein levels correlated better than previously thought, corresponding half-lives showed no correlation. Using a quantitative model we have obtained the first genome-scale prediction of synthesis rates of mRNAs and proteins. We find that the cellular abundance of proteins is predominantly controlled at the level of translation. Genes with similar combinations of mRNA and protein stability shared functional properties, indicating that half-lives evolved under energetic and dynamic constraints. Quantitative information about all stages of gene expression provides a rich resource and helps to provide a greater understanding of the underlying design principles.
5,635 citations
••
Beijing Institute of Genomics; Cayetano Heredia University; Indian Council of Agricultural Research; Russian Academy of Sciences; University of Dundee; Huazhong Agricultural University; Hunan Agricultural University; Imperial College London; Polish Academy of Sciences; International Potato Center; J. Craig Venter Institute; National University of La Plata; Michigan State University; James Hutton Institute; Teagasc; Plant & Food Research; Aalborg University; University of Wisconsin-Madison; Virginia Tech; Wageningen University and Research Centre
TL;DR: The potato genome sequence, with 39,031 predicted protein-coding genes and evidence for at least two genome duplication events indicative of a palaeopolyploid origin, provides a platform for genetic improvement of this vital crop.
Abstract: Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.
1,813 citations
••
Civil Aviation Authority of Singapore; Rothamsted Research; Beijing Institute of Genomics; University of Copenhagen; Rural Development Administration; John Innes Centre; University of Georgia; North China University of Science and Technology; University of California, Berkeley; University of Missouri; University of Queensland; Australian Research Council; National Research Council; Bielefeld University; Australian Centre for Plant Functional Genomics; University of Rennes; Wageningen University and Research Centre; Agriculture and Agri-Food Canada; Huazhong Agricultural University; French Alternative Energies and Atomic Energy Commission; Chungnam National University; Norwich Research Park
TL;DR: The annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, are reported, with Arabidopsis thaliana used as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution.
Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
1,811 citations
••
TL;DR: HOTTIP RNA binds the adaptor protein WDR5 directly and targets WDR5/MLL complexes across HOXA, driving histone H3 lysine 4 trimethylation and gene transcription.
Abstract: A major question in developmental biology is how functionally related groups of genes are switched on at the right time and in the right place. Long intergenic non-coding RNAs (lincRNAs) have been implicated in both gene silencing and activation, and could be a means of long-range control of gene expression. A lincRNA termed HOTTIP that coordinates the activation of multiple 5′ HOXA regulatory genes has now been identified at the 5′ tip of the HOXA locus. Chromosomal looping brings HOTTIP close to its target genes, where it facilitates histone H3 lysine 4 trimethylation and gene transcription. The genome is extensively transcribed into long intergenic noncoding RNAs (lincRNAs), many of which are implicated in gene silencing. Potential roles of lincRNAs in gene activation are much less understood. Development and homeostasis require coordinate regulation of neighbouring genes through a process termed locus control. Some locus control elements and enhancers transcribe lincRNAs, hinting at possible roles in long-range control. In vertebrates, 39 Hox genes, encoding homeodomain transcription factors critical for positional identity, are clustered in four chromosomal loci; the Hox genes are expressed in nested anterior-posterior and proximal-distal patterns colinear with their genomic position from 3′ to 5′ of the cluster. Here we identify HOTTIP, a lincRNA transcribed from the 5′ tip of the HOXA locus that coordinates the activation of several 5′ HOXA genes in vivo. Chromosomal looping brings HOTTIP into close proximity to its target genes. HOTTIP RNA binds the adaptor protein WDR5 directly and targets WDR5/MLL complexes across HOXA, driving histone H3 lysine 4 trimethylation and gene transcription. Induced proximity is necessary and sufficient for HOTTIP RNA activation of its target genes. Thus, by serving as key intermediates that transmit information from higher order chromosomal looping into chromatin modifications, lincRNAs may organize chromatin domains to coordinate long-range gene activation.
1,782 citations
••
TL;DR: High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.
Abstract: Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.
1,538 citations
••
TL;DR: These sequences provide a starting point for a new era in the functional analysis of a key model organism and show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus.
Abstract: We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
1,453 citations
••
University of Connecticut Health Center; University of California, Berkeley; Lawrence Berkeley National Laboratory; National Institutes of Health; Washington University in St. Louis; Indiana University; Cold Spring Harbor Laboratory; Amgen; Life Technologies; University of Kansas; Stowers Institute for Medical Research; University of California, Santa Cruz; Howard Hughes Medical Institute; Affymetrix
TL;DR: 111,195 new elements are identified, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches.
Abstract: Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
1,427 citations
••
Indiana University; Utah State University; University of Notre Dame; University of New Hampshire; University of California, Santa Barbara; University of Tokyo; United States Department of Energy; Ludwig Maximilian University of Munich; J. Craig Venter Institute; National Institutes of Health; University of Illinois at Urbana–Champaign; Hebrew University of Jerusalem; University of North Texas; Harvard University; University of Geneva; Research Institute of Molecular Pathology; Oregon State University; Utrecht University; University of California, Davis; Hoffmann-La Roche; University of Iowa; University of Strasbourg; University of Washington; University of Texas at Arlington; University of California, Santa Cruz; Life Technologies; New York University; University of Guelph; Imperial College London; University of California, Berkeley
TL;DR: The Daphnia genome reveals a multitude of genes and shows adaptation through gene family expansions, and the coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random.
Abstract: We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.
1,204 citations
••
TL;DR: ChIRP-seq of three lncRNAs reveals that RNA occupancy sites in the genome are focal, sequence-specific, and numerous; the method is generally applicable to illuminate the intersection of RNA and chromatin with newfound precision genome-wide.
1,095 citations
••
TL;DR: Global gene expression analysis demonstrated that exogenous IRF5 upregulated or downregulated expression of established phenotypic markers of M1 or M2 macrophages, respectively, suggesting a critical role for IRF5 in M1 macrophage polarization and defining a previously unknown function for IRF5 as a transcriptional repressor.
Abstract: Polymorphisms in the gene encoding the transcription factor IRF5 that lead to higher mRNA expression are associated with many autoimmune diseases. Here we show that IRF5 expression in macrophages was reversibly induced by inflammatory stimuli and contributed to the plasticity of macrophage polarization. High expression of IRF5 was characteristic of M1 macrophages, in which it directly activated transcription of the genes encoding interleukin 12 subunit p40 (IL-12p40), IL-12p35 and IL-23p19 and repressed the gene encoding IL-10. Consequently, those macrophages set up the environment for a potent T helper type 1 (T(H)1)-T(H)17 response. Global gene expression analysis demonstrated that exogenous IRF5 upregulated or downregulated expression of established phenotypic markers of M1 or M2 macrophages, respectively. Our data suggest a critical role for IRF5 in M1 macrophage polarization and define a previously unknown function for IRF5 as a transcriptional repressor.
••
TL;DR: It is proposed that TET1 fine-tunes transcription, opposes aberrant DNA methylation at CpG-rich sequences and thereby contributes to the regulation of DNA methylation fidelity.
Abstract: Enzymes catalysing the methylation of the 5-position of cytosine (mC) have essential roles in regulating gene expression and maintaining cellular identity. Recently, TET1 was found to hydroxylate the methyl group of mC, converting it to 5-hydroxymethyl cytosine (hmC). Here we show that TET1 binds throughout the genome of embryonic stem cells, with the majority of binding sites located at transcription start sites (TSSs) of CpG-rich promoters and within genes. The hmC modification is found in gene bodies and in contrast to mC is also enriched at CpG-rich TSSs. We provide evidence further that TET1 has a role in transcriptional repression. TET1 binds a significant proportion of Polycomb group target genes. Furthermore, TET1 associates and colocalizes with the SIN3A co-repressor complex. We propose that TET1 fine-tunes transcription, opposes aberrant DNA methylation at CpG-rich sequences and thereby contributes to the regulation of DNA methylation fidelity.
••
TL;DR: Comprising tandem, polymorphic amino acid repeats that individually specify contiguous nucleotides in DNA, this domain is being deployed in DNA targeting for applications ranging from understanding gene function in model organisms to improving traits in crop plants to treating genetic disorders in people.
Abstract: Generating and applying new knowledge from the wealth of available genomic information is hindered, in part, by the difficulty of altering nucleotide sequences and expression of genes in living cells in a targeted fashion. Progress has been made in engineering DNA binding domains to direct proteins to particular sequences for mutagenesis or manipulation of transcription; however, achieving the requisite specificities has been challenging. Transcription activator-like (TAL) effectors of plant pathogenic bacteria contain a modular DNA binding domain that appears to overcome this challenge. Comprising tandem, polymorphic amino acid repeats that individually specify contiguous nucleotides in DNA, this domain is being deployed in DNA targeting for applications ranging from understanding gene function in model organisms to improving traits in crop plants to treating genetic disorders in people.
••
TL;DR: This method uses the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC, a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types.
Abstract: In contrast to 5-methylcytosine (5-mC), which has been studied extensively, little is known about 5-hydroxymethylcytosine (5-hmC), a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types. Here we present a method for determining the genome-wide distribution of 5-hmC. We use the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC. The azide group can be chemically modified with biotin for detection, affinity enrichment and sequencing of 5-hmC-containing DNA fragments in mammalian genomes. Using this method, we demonstrate that 5-hmC is present in human cell lines beyond those previously recognized. We also find a gene expression level-dependent enrichment of intragenic 5-hmC in mouse cerebellum and an age-dependent acquisition of this modification in specific gene bodies linked to neurodegenerative disorders.
••
TL;DR: By combining next-generation sequencing and copy number analysis, it is shown that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case and novel dysregulated pathways underlying its pathogenesis are identified.
Abstract: Diffuse large B-cell lymphoma (DLBCL) is the most common form of human lymphoma. Although a number of structural alterations have been associated with the pathogenesis of this malignancy, the full spectrum of genetic lesions that are present in the DLBCL genome, and therefore the identity of dysregulated cellular pathways, remains unknown. By combining next-generation sequencing and copy number analysis, we show that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case. This analysis also revealed mutations in genes not previously implicated in DLBCL pathogenesis, including those regulating chromatin methylation (MLL2; 24% of samples) and immune recognition by T cells. These results provide initial data on the complexity of the DLBCL coding genome and identify novel dysregulated pathways underlying its pathogenesis.
01 Jun 2011
TL;DR: This work uses an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations and identifies 216 transcribed regions that encode putative lncRNAs, many with RT-PCR–validated periodic expression during the cell cycle.
Abstract: Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.
••
TL;DR: This work provides the most comprehensive genetic characterization of a sterol catabolic pathway to date, suggests putative roles for uncharacterized virulence genes, and precisely maps genes encoding potential drug targets.
Abstract: The pathways that comprise cellular metabolism are highly interconnected, and alterations in individual enzymes can have far-reaching effects. As a result, global profiling methods that measure gene expression are of limited value in predicting how the loss of an individual function will affect the cell. In this work, we employed a new method of global phenotypic profiling to directly define the genes required for the growth of Mycobacterium tuberculosis. A combination of high-density mutagenesis and deep-sequencing was used to characterize the composition of complex mutant libraries exposed to different conditions. This allowed the unambiguous identification of the genes that are essential for Mtb to grow in vitro, and proved to be a significant improvement over previous approaches. To further explore functions that are required for persistence in the host, we defined the pathways necessary for the utilization of cholesterol, a critical carbon source during infection. Few of the genes we identified had previously been implicated in this adaptation by transcriptional profiling, and only a fraction were encoded in the chromosomal region known to encode sterol catabolic functions. These genes comprise an unexpectedly large percentage of those previously shown to be required for bacterial growth in mouse tissue. Thus, this single nutritional change accounts for a significant fraction of the adaption to the host. This work provides the most comprehensive genetic characterization of a sterol catabolic pathway to date, suggests putative roles for uncharacterized virulence genes, and precisely maps genes encoding potential drug targets.
••
TL;DR: This review compares the MYB and bHLH gene families from structural, evolutionary and functional perspectives and suggests that the next few years are likely to witness an increasing understanding of the extent to which conserved transcription factors participate at similar positions in gene regulatory networks across plant species.
Abstract: The expansion of gene families encoding regulatory proteins is typically associated with the increase in complexity characteristic of multi-cellular organisms. The MYB and basic helix-loop-helix (bHLH) families provide excellent examples of how gene duplication and divergence within particular groups of transcription factors are associated with, if not driven by, the morphological and metabolic diversity that characterize the higher plants. These gene families expanded dramatically in higher plants; for example, there are approximately 339 and 162 MYB and bHLH genes, respectively, in Arabidopsis, and approximately 230 and 111, respectively, in rice. In contrast, the Chlamydomonas genome has only 38 MYB genes and eight bHLH genes. In this review, we compare the MYB and bHLH gene families from structural, evolutionary and functional perspectives. The knowledge acquired on the role of many of these factors in Arabidopsis provides an excellent reference to explore sequence-function relationships in crops and other plants. The physical interaction and regulatory synergy between particular sub-classes of MYB and bHLH factors is perhaps one of the best examples of combinatorial plant gene regulation. However, members of the MYB and bHLH families also interact with a number of other regulatory proteins, forming complexes that either activate or repress the expression of sets of target genes that are increasingly being identified through a diversity of high-throughput genomic approaches. The next few years are likely to witness an increasing understanding of the extent to which conserved transcription factors participate at similar positions in gene regulatory networks across plant species.
••
TL;DR: This work established various gene trap cell lines and transgenic cell lines expressing a short-lived luciferase protein from an unstable mRNA, and recorded bioluminescence in real time in single cells, demonstrating that bursting kinetics are highly gene-specific.
Abstract: In prokaryotes and eukaryotes, most genes appear to be transcribed during short periods called transcriptional bursts, interspersed by silent intervals. We describe how such bursts generate gene-specific temporal patterns of messenger RNA (mRNA) synthesis in mammalian cells. To monitor transcription at high temporal resolution, we established various gene trap cell lines and transgenic cell lines expressing a short-lived luciferase protein from an unstable mRNA, and recorded bioluminescence in real time in single cells. Mathematical modeling identified gene-specific on- and off-switching rates in transcriptional activity and mean numbers of mRNAs produced during the bursts. Transcriptional kinetics were markedly altered by cis-regulatory DNA elements. Our analysis demonstrated that bursting kinetics are highly gene-specific, reflecting refractory periods during which genes stay inactive for a certain time before switching on again.
••
TL;DR: The 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47, based on 8.3× dideoxy sequence coverage, is reported; comparison with the much smaller genome of the self-fertilizing A. thaliana suggests pervasive selection for a smaller genome in that species.
Abstract: We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.
•
01 Jan 2011
TL;DR: Achilles cleavage actin and actin homologs AIDS AIDS HIV enzymes Alzheimer's disease amino acid synthesis annexins antibody molecules antisense oligonucleotides arabidopsis genome autoantibodies and autoimmunity automation in genome research.
Abstract: Achilles cleavage actin and actin homologs AIDS AIDS HIV enzymes Alzheimer's disease amino acid synthesis annexins antibody molecules antisense oligonucleotides arabidopsis genome autoantibodies and autoimmunity automation in genome research bacterial growth and division bacterial pathogenesis bacteriorhodopsin biochemical genetics biodegradation of organic wastes bioelectronics bioenergetics of the cell bioinorganic chemistry biomaterials for organ regeneration biomolecular electronics and applications bioorganic chemistry bioprocess engineering bioreactor transport processes biosensors biotechnology breast cancer calcium biochemistry/cancer carbohydrate analysis carbohydrate antigens cardiovascular diseases cell-cell interactions cell death and ageing chaperones chemiluminescence and bioluminescence repressor-operator recognition restriction endonucleases and methyltransferases for the modification of DNA restriction landmark genomic scanning method retinoblastoma retinoids ribosome preparations and protein synthesis techniques ribozyme chemistry RNA scanning tunnelling microscopy in sequencing of DNA sequence alignment of proteins and nucleic acids sequence analysis sequence divergence estimation steroid hormones and receptors superantigens synthetic peptide libraries theoretical molecular biology transgenic animal patents transgenic fish/transgenic mammals translation of RNA protein transport proteins transposons in the human genome triple-helix forming oligonucleotides tumour suppressor genes ultraviolet radiation damage to DNA vaccine biotechnology viral envelope assembly and budding viruses vitamins X-ray diffraction of biomolecules yeast artificial chromosomes techniques yeast genetics zinc finger DNA binding motifs. (Part contents).
••
TL;DR: The results indicate that 5hmC has a probable role in transcriptional regulation, and suggest a model in which 5hmC contributes to the ‘poised’ chromatin signature found at developmentally-regulated genes in ES cells.
Abstract: 5-hydroxymethylcytosine (5hmC) is a modified base present at low levels in diverse cell types in mammals. 5hmC is generated by the TET family of Fe(II) and 2-oxoglutarate-dependent enzymes through oxidation of 5-methylcytosine (5mC). 5hmC and TET proteins have been implicated in stem cell biology and cancer, but information on the genome-wide distribution of 5hmC is limited. Here we describe two novel and specific approaches to profile the genomic localization of 5hmC. The first approach, termed GLIB (glucosylation, periodate oxidation, biotinylation) uses a combination of enzymatic and chemical steps to isolate DNA fragments containing as few as a single 5hmC. The second approach involves conversion of 5hmC to cytosine 5-methylenesulphonate (CMS) by treatment of genomic DNA with sodium bisulphite, followed by immunoprecipitation of CMS-containing DNA with a specific antiserum to CMS. High-throughput sequencing of 5hmC-containing DNA from mouse embryonic stem (ES) cells showed strong enrichment within exons and near transcriptional start sites. 5hmC was especially enriched at the start sites of genes whose promoters bear dual histone 3 lysine 27 trimethylation (H3K27me3) and histone 3 lysine 4 trimethylation (H3K4me3) marks. Our results indicate that 5hmC has a probable role in transcriptional regulation, and suggest a model in which 5hmC contributes to the 'poised' chromatin signature found at developmentally-regulated genes in ES cells.
••
TL;DR: A strong genetic component to inter-individual variation in DNA methylation profiles is demonstrated, and there was an enrichment of SNPs that affect both methylation and gene expression, providing evidence for shared mechanisms in a fraction of genes.
Abstract: DNA methylation is an essential epigenetic mechanism involved in gene regulation and disease, but little is known about the mechanisms underlying inter-individual variation in methylation profiles. Here we measured methylation levels at 22,290 CpG dinucleotides in lymphoblastoid cell lines from 77 HapMap Yoruba individuals, for which genome-wide gene expression and genotype data were also available. Association analyses of methylation levels with more than three million common single nucleotide polymorphisms (SNPs) identified 180 CpG-sites in 173 genes that were associated with nearby SNPs (putatively in cis, usually within 5 kb) at a false discovery rate of 10%. The most intriguing trans signal was obtained for SNP rs10876043 in the disco-interacting protein 2 homolog B gene (DIP2B, previously postulated to play a role in DNA methylation), that had a genome-wide significant association with the first principal component of patterns of methylation; however, we found only modest signal of trans-acting associations overall. As expected, we found significant negative correlations between promoter methylation and gene expression levels measured by RNA-sequencing across genes. Finally, there was a significant overlap of SNPs that were associated with both methylation and gene expression levels. Our results demonstrate a strong genetic component to inter-individual variation in DNA methylation profiles. Furthermore, there was an enrichment of SNPs that affect both methylation and gene expression, providing evidence for shared mechanisms in a fraction of genes.
••
TL;DR: A draft genomic sequence of the CHO-K1 ancestral cell line is presented and it is discussed how the availability of this genome sequence may facilitate genome-scale science for the optimization of biopharmaceutical protein production.
Abstract: Chinese hamster ovary (CHO)-derived cell lines are the preferred host cells for the production of therapeutic proteins. Here we present a draft genomic sequence of the CHO-K1 ancestral cell line. The assembly comprises 2.45 Gb of genomic sequence, with 24,383 predicted genes. We associate most of the assembled scaffolds with 21 chromosomes isolated by microfluidics to identify chromosomal locations of genes. Furthermore, we investigate genes involved in glycosylation, which affect therapeutic protein quality, and viral susceptibility genes, which are relevant to cell engineering and regulatory concerns. Homologs of most human glycosylation-associated genes are present in the CHO-K1 genome, although 141 of these homologs are not expressed under exponential growth conditions. Many important viral entry genes are also present in the genome but not expressed, which may explain the unusual viral resistance property of CHO cell lines. We discuss how the availability of this genome sequence may facilitate genome-scale science for the optimization of biopharmaceutical protein production.
••
Purdue University; Kanazawa University; Graduate University for Advanced Studies; National Institutes of Natural Sciences, Japan; University of California, Davis; Monash University; Pennsylvania State University; University at Buffalo; New York Botanical Garden; University of Regina; University of Arizona; University of Georgia; University of Potsdam; Salk Institute for Biological Studies; Charles University in Prague; College of William & Mary; University of California, San Diego; École normale supérieure de Lyon; Carnegie Institution for Science; Hokkaido University; University of Jena; Martin Luther University of Halle-Wittenberg; University of Copenhagen; University of Tokyo; Nagoya University; Free University of Berlin; University of Tsukuba; University of Tübingen; University of Rostock; Nara Institute of Science and Technology; Mayo Clinic; University of California, Berkeley; Rutgers University; National Institute of Genetics; Max Planck Society; University of Tennessee Health Science Center; University of Washington; Dalhousie University; University of Oxford; University of Freiburg; University of Los Andes; University of Rhode Island; Joint BioEnergy Institute; Ruhr University Bochum; Texas A&M University; Osaka University; Cornell University; Cold Spring Harbor Laboratory; University of Burgundy; Utah State University; United States Department of Energy
TL;DR: The genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first nonseed vascular plant genome reported, shows that the transition from a gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the transition from a nonseed vascular to a flowering plant.
Abstract: Vascular plants appeared ~410 million years ago, then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes. We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first nonseed vascular plant genome reported. By comparing gene content in evolutionarily diverse taxa, we found that the transition from a gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the transition from a nonseed vascular to a flowering plant, whereas secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in posttranscriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the trans-acting small interfering RNA pathway, and extensive RNA editing of organellar genes.
••
TL;DR: In a comparison of genome-wide DNA methylation among 10 A. thaliana lines, differentially methylated sites were farther from transposable elements and showed less association with short interfering RNA expression than invariant positions; the biased distribution and frequent reversion of epimutations have important implications for the potential contribution of sequence-independent epialleles to plant evolution.
Abstract: Heritable epigenetic polymorphisms, such as differential cytosine methylation, can underlie phenotypic variation. Moreover, wild strains of the plant Arabidopsis thaliana differ in many epialleles, and these can influence the expression of nearby genes. However, to understand their role in evolution, it is imperative to ascertain the emergence rate and stability of epialleles, including those that are not due to structural variation. We have compared genome-wide DNA methylation among 10 A. thaliana lines, derived 30 generations ago from a common ancestor. Epimutations at individual positions were easily detected, and close to 30,000 cytosines in each strain were differentially methylated. In contrast, larger regions of contiguous methylation were much more stable, and the frequency of changes was in the same low range as that of DNA mutations. Like individual positions, the same regions were often affected by differential methylation in independent lines, with evidence for recurrent cycles of forward and reverse mutations. Transposable elements and short interfering RNAs have been causally linked to DNA methylation. In agreement, differentially methylated sites were farther from transposable elements and showed less association with short interfering RNA expression than invariant positions. The biased distribution and frequent reversion of epimutations have important implications for the potential contribution of sequence-independent epialleles to plant evolution.
••
TL;DR: Genetic differences between Arabidopsis thaliana accessions underlie the plant’s extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0.
Abstract: Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
••
TL;DR: Using chromatin immunoprecipitation linked to high throughput sequencing, HIF-binding sites across the genome are identified, indicating that these sites operate over long genomic intervals, and epigenetic regulation of chromatin may have an important role in defining the response to hypoxia.
••
TL;DR: It is proposed that the universal bias in gene loss between the genomes of this ancient tetraploid, and perhaps all tetraploids, is the result of selection against loss of the gene responsible for the majority of total expression for a duplicate gene pair.
Abstract: Ancient tetraploidies are found throughout the eukaryotes. After duplication, one copy of each duplicate gene pair tends to be lost (fractionate). For all studied tetraploidies, the loss of duplicated genes, known as homeologs, homoeologs, ohnologs, or syntenic paralogs, is uneven between duplicate regions. In maize, a species that experienced a tetraploidy 5–12 million years ago, we show that in addition to uneven ancient gene loss, the two complete genomes contained within maize are differentiated by ongoing fractionation among diverse inbreds as well as by a pattern of overexpression of genes from the genome that has experienced less gene loss. These expression differences are consistent over a range of experiments quantifying RNA abundance in different tissues. We propose that the universal bias in gene loss between the genomes of this ancient tetraploid, and perhaps all tetraploids, is the result of selection against loss of the gene responsible for the majority of total expression for a duplicate gene pair. Although the tetraploidy of maize is ancient, biased gene loss and expression continue today and explain, at least in part, the remarkable genetic diversity found among modern maize cultivars.