Showing papers on "Gene" published in 2010


Journal Article
TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Abstract: High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

13,337 citations


01 Jan 2010
TL;DR: A robust gene expression-based molecular classification of glioblastoma multiforme (GBM) into Proneural, Neural, Classical, and Mesenchymal subtypes is described, and multidimensional genomic data are integrated to establish patterns of somatic mutations and DNA copy number.
Abstract: The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.

4,464 citations


Journal Article
TL;DR: DEGseq, an R package to identify differentially expressed genes or isoforms for RNA-seq data from different samples, is presented; it integrates three existing methods and introduces two novel methods based on MA-plot to detect and visualize gene expression difference.
Abstract: Summary: High-throughput RNA sequencing (RNA-seq) is rapidly emerging as a major quantitative transcriptome profiling platform. Here, we present DEGseq, an R package to identify differentially expressed genes or isoforms for RNA-seq data from different samples. In this package, we integrated three existing methods, and introduced two novel methods based on MA-plot to detect and visualize gene expression difference. Availability: The R package and a quick-start vignette is available at http://bioinfo.au.tsinghua.edu.cn/software/degseq Contact: xwwang@tsinghua.edu.cn; zhangxg@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

3,235 citations
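
The DEGseq abstract above refers to methods based on the MA-plot. As a rough illustration of the quantities such a plot displays, the sketch below computes the conventional M (log ratio) and A (mean log intensity) values for one gene from read counts in two samples. It is written in Python rather than R, the function name `ma_values` and the `pseudocount` parameter are hypothetical, and nothing here is DEGseq's own code or API.

```python
import math


def ma_values(count_a, count_b, pseudocount=0.5):
    """Return (M, A) for one gene from its read counts in two samples.

    M is the log2 ratio of the two counts and A is the mean of their log2
    values. The pseudocount is an assumption added here so that zero counts
    do not break the logarithm; it is not taken from the DEGseq abstract.
    """
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    m = log_a - log_b        # log fold change between the samples
    a = (log_a + log_b) / 2  # average log expression
    return m, a
```

Genes with similar expression in both samples sit near M = 0; how far from zero a gene must lie to be called differentially expressed is the statistical question the package's methods address, and this sketch does not attempt it.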


Journal Article
02 Jul 2010-Science
TL;DR: The design, synthesis, and assembly of the 1.08–mega–base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new cells that are controlled only by the synthetic chromosome are reported.
Abstract: We report the design, synthesis, and assembly of the 1.08-mega-base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new M. mycoides cells that are controlled only by the synthetic chromosome. The only DNA in the cells is the designed synthetic DNA sequence, including "watermark" sequences and other designed gene deletions and polymorphisms, and mutations acquired during the building process. The new cells have expected phenotypic properties and are capable of continuous self-replication.

2,256 citations


Journal Article
22 Jan 2010-Science
TL;DR: A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function.
Abstract: A genome-scale genetic interaction map was constructed by examining 5.4 million gene-gene pairs for synthetic genetic interactions, generating quantitative genetic interaction profiles for ~75% of all genes in the budding yeast, Saccharomyces cerevisiae. A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function. The global network identifies functional cross-connections between all bioprocesses, mapping a cellular wiring diagram of pleiotropy. Genetic interaction degree correlated with a number of different gene attributes, which may be informative about genetic network hubs in other organisms. We also demonstrate that extensive and unbiased mapping of the genetic landscape provides a key for interpretation of chemical-genetic interactions and drug target identification.

2,225 citations


Journal Article
24 Jun 2010-Nature
TL;DR: It is found that PTENP1 is biologically active, as it can regulate cellular levels of PTEN and exert a growth-suppressive role; the analysis was extended to other cancer-related genes that possess pseudogenes, and the findings reveal a non-coding function for mRNAs.
Abstract: The canonical role of messenger RNA (mRNA) is to deliver protein-coding information to sites of protein synthesis. However, given that microRNAs bind to RNAs, we hypothesized that RNAs could possess a regulatory role that relies on their ability to compete for microRNA binding, independently of their protein-coding function. As a model for the protein-coding-independent role of RNAs, we describe the functional relationship between the mRNAs produced by the PTEN tumour suppressor gene and its pseudogene PTENP1 and the critical consequences of this interaction. We find that PTENP1 is biologically active as it can regulate cellular levels of PTEN and exert a growth-suppressive role. We also show that the PTENP1 locus is selectively lost in human cancer. We extended our analysis to other cancer-related genes that possess pseudogenes, such as oncogenic KRAS. We also demonstrate that the transcripts of protein-coding genes such as PTEN are biologically active. These findings attribute a novel biological role to expressed pseudogenes, as they can regulate coding gene expression, and reveal a non-coding function for mRNAs.

2,107 citations


Journal Article
30 Jul 2010-Science
TL;DR: System-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size.
Abstract: Protein and messenger RNA (mRNA) copy numbers vary from cell to cell in isogenic bacterial populations. However, these molecules often exist in low copy numbers and are difficult to detect in single cells. We carried out quantitative system-wide analyses of protein and mRNA expression in individual cells with single-molecule sensitivity using a newly constructed yellow fluorescent protein fusion library for Escherichia coli. We found that almost all protein number distributions can be described by the gamma distribution with two fitting parameters which, at low expression levels, have clear physical interpretations as the transcription rate and protein burst size. At high expression levels, the distributions are dominated by extrinsic noise. We found that a single cell's protein and mRNA copy numbers for any given gene are uncorrelated.

1,970 citations
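
The abstract above states that protein number distributions are described by a gamma distribution with two fitting parameters. One simple way to obtain two such parameters from single-cell copy numbers is a method-of-moments fit, sketched below. For a gamma distribution with shape a and scale b, the mean is a·b and the variance is a·b², so a = mean²/variance and b = variance/mean. The function name is hypothetical, and the use of moment matching is an assumption; the abstract does not say how the authors performed their fits.

```python
from statistics import mean, pvariance


def gamma_moments_fit(copy_numbers):
    """Estimate gamma shape and scale from per-cell copy numbers.

    Uses moment matching: shape = mean^2 / variance, scale = variance / mean.
    This is an illustrative estimator, not the fitting procedure of the paper.
    """
    mu = float(mean(copy_numbers))
    var = float(pvariance(copy_numbers, mu))
    if mu <= 0 or var <= 0:
        raise ValueError("need a positive mean and non-zero variance")
    shape = mu * mu / var
    scale = var / mu
    return shape, scale
```

For example, copy numbers with mean 5 and population variance 4 give a shape of 6.25 and a scale of 0.8; the product of the two recovers the mean.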


Journal Article
06 Aug 2010-Cell
TL;DR: A model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes is proposed.

1,768 citations


01 Aug 2010
TL;DR: The identification of lincRNAs regulated by p53 is reported; one of them, lincRNA-p21, serves as a repressor in p53-dependent transcriptional responses, and this repression is mediated through physical association with hnRNP-K.
Abstract: Recently, more than 1000 large intergenic noncoding RNAs (lincRNAs) have been reported. These RNAs are evolutionarily conserved in mammalian genomes and thus presumably function in diverse biological processes. Here, we report the identification of lincRNAs that are regulated by p53. One of these lincRNAs (lincRNA-p21) serves as a repressor in p53-dependent transcriptional responses. Inhibition of lincRNA-p21 affects the expression of hundreds of gene targets enriched for genes normally repressed by p53. The observed transcriptional repression by lincRNA-p21 is mediated through the physical association with hnRNP-K. This interaction is required for proper genomic localization of hnRNP-K at repressed genes and regulation of p53 mediates apoptosis. We propose a model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes.

1,593 citations


Journal Article
01 Apr 2010-Nature
TL;DR: It is demonstrated that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites.
Abstract: Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.

1,325 citations


Journal Article
Stephen Richards1, Richard A. Gibbs1, Nicole M. Gerardo2, Nancy A. Moran3, +220 more authors; 58 institutions
TL;DR: The genome of the pea aphid shows remarkable levels of gene duplication and equally remarkable gene absences that shed light on aspects of aphid biology, most especially its symbiosis with Buchnera.
Abstract: Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

Journal Article
TL;DR: The mixture-of-isoforms (MISO) model is proposed, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates.
Abstract: Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking-immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.

Journal Article
TL;DR: TranslatorX is presented, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations, with a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments.
Abstract: We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
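
The core idea in the TranslatorX abstract, aligning nucleotides according to an alignment of their amino acid translations, can be illustrated with a minimal sketch. It assumes the protein alignment has already been produced by some alignment program, and it threads each sequence's codons onto its gapped protein row. The function name `thread_codons` is hypothetical, and this is not TranslatorX's implementation; it ignores genetic-code selection, ambiguous codons, and alignment cleaning.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Map one gapped protein row back onto its ungapped coding sequence.

    Each residue consumes the next codon (three nucleotides); each protein
    gap becomes a three-character gap in the nucleotide row.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * n_residues:
        raise ValueError("coding sequence length must be 3 x residue count")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Applied to the protein row `M-K` and the coding sequence `ATGAAA`, the function returns `ATG---AAA`, so gaps fall only between codons and the reading frame is preserved.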

Journal Article
TL;DR: An algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities is described and its accuracy assessed on known prokaryotic genomes split into short sequences; applying it could add several thousand new genes to existing annotations of several human and mouse gut metagenomes.
Abstract: We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. Original version of the method was proposed in 1999 and has been used since for (i) reconstructing codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousands of new genes could be added to existing annotations of several human and mouse gut metagenomes.

Journal Article
TL;DR: A comprehensive classification of the models that are relevant to all stages of the evolution of gene duplications is presented, each of which predicts a unique combination of evolutionary dynamics and functional properties.
Abstract: Gene duplications and their subsequent divergence play an important part in the evolution of novel gene functions. Several models for the emergence, maintenance and evolution of gene copies have been proposed. However, a clear consensus on how gene duplications are fixed and maintained in genomes is lacking. Here, we present a comprehensive classification of the models that are relevant to all stages of the evolution of gene duplications. Each model predicts a unique combination of evolutionary dynamics and functional properties. Setting out these predictions is an important step towards identifying the main mechanisms that are involved in the evolution of gene duplications.

Journal Article
08 Oct 2010-Science
TL;DR: The nature and pattern of the mutations suggest that PPP2R1A functions as an oncogene and ARID1A as a tumor-suppressor gene, and that aberrant chromatin remodeling contributes to the pathogenesis of OCCC.
Abstract: Ovarian clear cell carcinoma (OCCC) is an aggressive human cancer that is generally resistant to therapy. To explore the genetic origin of OCCC, we determined the exomic sequences of eight tumors after immunoaffinity purification of cancer cells. Through comparative analyses of normal cells from the same patients, we identified four genes that were mutated in at least two tumors. PIK3CA, which encodes a subunit of phosphatidylinositol-3 kinase, and KRAS, which encodes a well-known oncoprotein, had previously been implicated in OCCC. The other two mutated genes were previously unknown to be involved in OCCC: PPP2R1A encodes a regulatory subunit of serine/threonine phosphatase 2, and ARID1A encodes adenine-thymine (AT)-rich interactive domain-containing protein 1A, which participates in chromatin remodeling. The nature and pattern of the mutations suggest that PPP2R1A functions as an oncogene and ARID1A as a tumor-suppressor gene. In a total of 42 OCCCs, 7% had mutations in PPP2R1A and 57% had mutations in ARID1A. These results suggest that aberrant chromatin remodeling contributes to the pathogenesis of OCCC.

Journal Article
TL;DR: The evidence for an important role of rare gene variants of major effect in common diseases is evaluated and discovery strategies for their identification are outlined.
Abstract: Although genome-wide association (GWA) studies for common variants have thus far succeeded in explaining only a modest fraction of the genetic components of human common diseases, recent advances in next-generation sequencing technologies could rapidly facilitate substantial progress. This outcome is expected if much of the missing genetic control is due to gene variants that are too rare to be picked up by GWA studies and have relatively large effects on risk. Here, we evaluate the evidence for an important role of rare gene variants of major effect in common diseases and outline discovery strategies for their identification.


Journal Article
Sushmita Roy1, Jason Ernst1, Peter V. Kharchenko2, Pouya Kheradpour1, Nicolas Nègre3, Matthew L. Eaton4, Jane M. Landolin5, Christopher A. Bristow1, Lijia Ma3, Michael F. Lin1, Stefan Washietl6, Bradley I. Arshinoff7, Ferhat Ay8, Patrick E. Meyer9, Nicolas Robine10, Nicole L. Washington5, Luisa Di Stefano2, Eugene Berezikov11, Christopher D. Brown3, Rogerio Candeias6, Joseph W. Carlson5, Adrian Carr12, Irwin Jungreis1, Daniel Marbach1, Rachel Sealfon1, Michael Y. Tolstorukov2, Sebastian Will6, Artyom A. Alekseyenko2, Carlo G. Artieri13, Benjamin W. Booth5, Angela N. Brooks14, Qi Dai10, Carrie A. Davis15, Michael O. Duff16, X. Feng, Andrey A. Gorchakov2, Tingting Gu17, Jorja G. Henikoff10, Philipp Kapranov18, Renhua Li13, Heather K. MacAlpine4, John H. Malone13, Aki Minoda5, Jared T. Nordman6, Katsutomo Okamura10, Marc D. Perry7, Sara K. Powell4, Nicole C. Riddle17, Akiko Sakai2, Anastasia Samsonova2, Jeremy E. Sandler5, Yuri B. Schwartz2, Noa Sher6, Rebecca Spokony3, David Sturgill13, Marijke J. van Baren17, Kenneth H. Wan5, Li Yang16, Charles Yu5, Elise A. Feingold13, Peter J. Good13, Mark S. Guyer13, Rebecca F. Lowdon13, Kami Ahmad2, Justen Andrews19, Bonnie Berger1, Steven E. Brenner14, Michael R. Brent17, Lucy Cherbas19, Sarah C. R. Elgin17, Thomas R. Gingeras18, Robert L. Grossman3, Roger A. Hoskins5, Thomas C. Kaufman19, W. J. Kent20, Mitzi I. Kuroda2, Terry L. Orr-Weaver6, Norbert Perrimon2, Vincenzo Pirrotta21, James W. Posakony22, Bing Ren22, Steven Russell12, Peter Cherbas19, Brenton R. Graveley16, Suzanna E. Lewis5, Gos Micklem12, Brian Oliver13, Peter J. Park2, Susan E. Celniker5, Steven Henikoff23, Gary H. Karpen14, Eric C. Lai10, David M. MacAlpine4, Lincoln Stein7, Kevin P. White3, Manolis Kellis1 
24 Dec 2010-Science
TL;DR: The Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project comprehensively maps transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines.
Abstract: To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Journal Article
11 Mar 2010-Nature
TL;DR: A novel differential approach selective for the 5′ end of primary transcripts is presented, establishing a paradigm for mapping and annotating the primary transcriptomes of many living species and discovering hundreds of transcriptional start sites within operons, and opposite to annotated genes.
Abstract: Genome sequencing of Helicobacter pylori has revealed the potential proteins and genetic diversity of this prevalent human pathogen, yet little is known about its transcriptional organization and noncoding RNA output. Massively parallel cDNA sequencing (RNA-seq) has been revolutionizing global transcriptomic analysis. Here, using a novel differential approach (dRNA-seq) selective for the 5' end of primary transcripts, we present a genome-wide map of H. pylori transcriptional start sites and operons. We discovered hundreds of transcriptional start sites within operons, and opposite to annotated genes, indicating that complexity of gene expression from the small H. pylori genome is increased by uncoupling of polycistrons and by genome-wide antisense transcription. We also discovered an unexpected number of approximately 60 small RNAs including the epsilon-subdivision counterpart of the regulatory 6S RNA and associated RNA products, and potential regulators of cis- and trans-encoded target messenger RNAs. Our approach establishes a paradigm for mapping and annotating the primary transcriptomes of many living species.

01 Nov 2010
TL;DR: The mixture-of-isoforms (MISO) model is developed, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates, providing a probabilistic framework for RNA-seq analysis and functional insights into pre-mRNA processing.
Abstract: Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking-immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.

Journal Article
30 Apr 2010-Science
TL;DR: Family-based genome analysis narrowed the candidate genes for two Mendelian disorders, Miller syndrome and primary ciliary dyskinesia, to only four, demonstrating the value of complete genome sequencing in families.
Abstract: We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in > 99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 × 10^-8 per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.

Journal Article
TL;DR: A whole-genome comparative view of DNA methylation using bisulfite sequencing of three cultured cell types representing progressive stages of differentiation highlights the value of high-resolution methylation maps, in conjunction with other systems-level analyses, for investigation of previously undetectable developmental regulatory mechanisms.
Abstract: DNA methylation is a critical epigenetic regulator in mammalian development. Here, we present a whole-genome comparative view of DNA methylation using bisulfite sequencing of three cultured cell types representing progressive stages of differentiation: human embryonic stem cells (hESCs), a fibroblastic differentiated derivative of the hESCs, and neonatal fibroblasts. As a reference, we compared our maps with a methylome map of a fully differentiated adult cell type, mature peripheral blood mononuclear cells (monocytes). We observed many notable common and cell-type-specific features among all cell types. Promoter hypomethylation (both CG and CA) and higher levels of gene body methylation were positively correlated with transcription in all cell types. Exons were more highly methylated than introns, and sharp transitions of methylation occurred at exon-intron boundaries, suggesting a role for differential methylation in transcript splicing. Developmental stage was reflected in both the level of global methylation and extent of non-CpG methylation, with hESC highest, fibroblasts intermediate, and monocytes lowest. Differentiation-associated differential methylation profiles were observed for developmentally regulated genes, including the HOX clusters, other homeobox transcription factors, and pluripotence-associated genes such as POU5F1, TCF3, and KLF4. Our results highlight the value of high-resolution methylation maps, in conjunction with other systems-level analyses, for investigation of previously undetectable developmental regulatory mechanisms.

Journal Article
20 May 2010-Nature
TL;DR: A method to globally capture intra- and inter-chromosomal interactions is developed and applied to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae, which recapitulates known features of genome organization, thereby validating the method, and identifies new features.
Abstract: Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or 'factories' for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.

Journal ArticleDOI
Mark Gerstein1, Zhi John Lu1, Eric L. Van Nostrand2, Chao Cheng1, Bradley I. Arshinoff3, Tao Liu4, Kevin Y. Yip1, R. Robilotto1, Andreas Rechtsteiner5, Kohta Ikegami6, P. Alves1, A. Chateigner, Marc D. Perry7, Mitzi Morris8, Raymond K. Auerbach1, X. Feng9, Jing Leng1, A. Vielle10, Wei Niu1, Kahn Rhrissorrakrai8, Ashish Agarwal1, Roger P. Alexander1, Galt P. Barber5, Cathleen M. Brdlik2, J. Brennan6, Jeremy Brouillet2, Adrian Carr, Ming Sin Cheung10, Hiram Clawson5, Sergio Contrino, Luke Dannenberg11, Abby F. Dernburg12, Arshad Desai13, L. Dick14, Andréa C. Dosé12, Jiang Du1, Thea A. Egelhofer5, Sevinc Ercan6, Ghia Euskirchen1, Brent Ewing15, Elise A. Feingold16, Reto Gassmann13, Peter J. Good16, Philip Green15, Francois Gullier, M. Gutwein8, Mark S. Guyer16, Lukas Habegger1, Ting Han17, Jorja G. Henikoff18, Stefan R. Henz19, Angie S. Hinrichs5, H. Holster11, Tony Hyman19, A. Leo Iniguez11, J. Janette1, M. Jensen6, Masaomi Kato1, W. James Kent5, E. Kephart7, Vishal Khivansara17, Ekta Khurana1, John Kim17, P. Kolasinska-Zwierz10, Eric C. Lai20, Isabel J. Latorre10, Amber Leahey15, Suzanna E. Lewis12, Paul Lloyd7, Lucas Lochovsky1, Rebecca F. Lowdon16, Yaniv Lubling21, Rachel Lyne, Michael J. MacCoss15, Sebastian D. Mackowiak22, Marco Mangone8, Sheldon J. McKay23, D. Mecenas8, Gennifer E. Merrihew15, David M. Miller24, A. Muroyama13, John I. Murray15, Siew Loon Ooi18, Hoang Pham12, T. Phippen5, Elicia Preston15, Nikolaus Rajewsky22, Gunnar Rätsch19, Heidi Rosenbaum11, Joel Rozowsky1, Kim Rutherford, P. Ruzanov7, Mihail Sarov19, Rajkumar Sasidharan1, Andrea Sboner1, P. Scheid8, Eran Segal21, Hyunjin Shin4, C. Shou1, Frank J. Slack1, C. Slightam2, Richard J.H. Smith, William C. Spencer24, Eo Stinson12, S. Taing4, Teruaki Takasaki5, D. Vafeados15, Ksenia Voronina13, Guilin Wang1, Nicole L. Washington12, Christina M. Whittle6, Beijing Wu2, Koon-Kiu Yan1, Georg Zeller, Z. Zha7, Mei Zhong1, Xingliang Zhou6, Julie Ahringer10, Susan Strome5, Kristin C. Gunsalus25, Gos Micklem, X. Shirley Liu4, Valerie Reinke1, Stuart K. Kim2, LaDeana W. Hillier15, Steven Henikoff18, Fabio Piano25, Michael Snyder1, Lincoln Stein23, Jason D. Lieb6, Robert H. Waterston15
24 Dec 2010-Science
TL;DR: These studies identified highly occupied target (HOT) regions in the nematode and fly genomes, where DNA was bound by more than 15 of the transcription factors analyzed, and characterized the expression of related genes, providing insights into the organization, structure, and function of the two genomes.
Abstract: We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

Journal ArticleDOI
15 Jan 2010-Science
TL;DR: These studies illustrate how major expression and morphological changes can arise from single mutational leaps in natural populations, producing new adaptive alleles via recurrent regulatory alterations in a key developmental control gene.
Abstract: The molecular mechanisms underlying major phenotypic changes that have evolved repeatedly in nature are generally unknown. Pelvic loss in different natural populations of threespine stickleback fish has occurred through regulatory mutations deleting a tissue-specific enhancer of the Pituitary homeobox transcription factor 1 (Pitx1) gene. The high prevalence of deletion mutations at Pitx1 may be influenced by inherent structural features of the locus. Although Pitx1 null mutations are lethal in laboratory animals, Pitx1 regulatory mutations show molecular signatures of positive selection in pelvic-reduced populations. These studies illustrate how major expression and morphological changes can arise from single mutational leaps in natural populations, producing new adaptive alleles via recurrent regulatory alterations in a key developmental control gene.

Journal ArticleDOI
TL;DR: A RIP-seq method is developed to capture the PRC2 transcriptome and identify a genome-wide pool of >9000 PRC2-interacting RNAs in embryonic stem cells, some of which may be used as biomarkers and therapeutic targets for human disease.

Journal ArticleDOI
25 Mar 2010-Nature
TL;DR: The authors used massively parallel sequencing to identify selective sweeps of favorable alleles and candidate mutations that have had a prominent role in the domestication of chickens and their subsequent specialization into broiler (meat-producing) and layer (egg-producing) chickens.
Abstract: Domestic animals are excellent models for genetic studies of phenotypic evolution. They have evolved genetic adaptations to a new environment, the farm, and have been subjected to strong human-driven selection leading to remarkable phenotypic changes in morphology, physiology and behaviour. Identifying the genetic changes underlying these developments provides new insight into general mechanisms by which genetic variation shapes phenotypic diversity. Here we describe the use of massively parallel sequencing to identify selective sweeps of favourable alleles and candidate mutations that have had a prominent role in the domestication of chickens (Gallus gallus domesticus) and their subsequent specialization into broiler (meat-producing) and layer (egg-producing) chickens. We have generated 44.5-fold coverage of the chicken genome using pools of genomic DNA representing eight different populations of domestic chickens as well as red jungle fowl (Gallus gallus), the major wild ancestor. We report more than 7,000,000 single nucleotide polymorphisms, almost 1,300 deletions and a number of putative selective sweeps. One of the most striking selective sweeps found in all domestic chickens occurred at the locus for thyroid stimulating hormone receptor (TSHR), which has a pivotal role in metabolic regulation and photoperiod control of reproduction in vertebrates. Several of the selective sweeps detected in broilers overlapped genes associated with growth, appetite and metabolic regulation. We found little evidence that selection for loss-of-function mutations had a prominent role in chicken domestication, but we detected two deletions in coding sequences that we suggest are functionally important. This study has direct application to animal breeding and enhances the importance of the domestic chicken as a model organism for biomedical research.

Journal ArticleDOI
21 Oct 2010-Nature
TL;DR: The endosymbiosis that gave rise to mitochondria restructured the distribution of DNA in relation to bioenergetic membranes, permitting a remarkable 200,000-fold expansion in the number of genes expressed.
Abstract: All complex life is composed of eukaryotic (nucleated) cells. The eukaryotic cell arose from prokaryotes just once in four billion years, and otherwise prokaryotes show no tendency to evolve greater complexity. Why not? Prokaryotic genome size is constrained by bioenergetics. The endosymbiosis that gave rise to mitochondria restructured the distribution of DNA in relation to bioenergetic membranes, permitting a remarkable 200,000-fold expansion in the number of genes expressed. This vast leap in genomic capacity was strictly dependent on mitochondrial power, and prerequisite to eukaryote complexity: the key innovation en route to multicellular life.

Journal ArticleDOI
12 Feb 2010-Science
TL;DR: The results provide a molecular basis for the distribution of meiotic recombination in mammals, in which the binding of PRDM9 to specific DNA sequences targets the initiation of recombination at specific locations in the genome.
Abstract: Meiotic recombination events cluster into narrow segments of the genome, defined as hotspots. Here, we demonstrate that a major player for hotspot specification is the Prdm9 gene. First, two mouse strains that differ in hotspot usage are polymorphic for the zinc finger DNA binding array of PRDM9. Second, the human consensus PRDM9 allele is predicted to recognize the 13-mer motif enriched at human hotspots; this DNA binding specificity is verified by in vitro studies. Third, allelic variants of PRDM9 zinc fingers are significantly associated with variability in genome-wide hotspot usage among humans. Our results provide a molecular basis for the distribution of meiotic recombination in mammals, in which the binding of PRDM9 to specific DNA sequences targets the initiation of recombination at specific locations in the genome.