Showing papers by "Michael Snyder" published in 2014


Journal Article
Feng Yue1, Feng Yue2, Yong Cheng3, Alessandra Breschi, Jeff Vierstra4, Weisheng Wu5, Weisheng Wu1, Tyrone Ryba6, Tyrone Ryba7, Richard Sandstrom4, Zhihai Ma3, Carrie A. Davis8, Benjamin D. Pope6, Yin Shen2, Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman4, Rajinder Kaul4, Eric Rynes4, Anthony Kirilusha9, Georgi K. Marinov9, Brian A. Williams9, Diane Trout9, Henry Amrhein9, Katherine I. Fisher-Aylor9, Igor Antoshechkin9, Gilberto DeSalvo9, Lei Hoon See8, Meagan Fastuca8, Jorg Drenkow8, Chris Zaleski8, Alexander Dobin8, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer10, Olgert Denas11, Kanwei Li11, M. A. Bender12, M. A. Bender4, Miaohua Zhang12, Rachel Byron12, Mark Groudine4, Mark Groudine12, David McCleary2, Long Pham2, Zhen Ye2, Samantha Kuan2, Lee Edsall2, Yi-Chieh Wu13, Matthew D. Rasmussen13, Mukul S. Bansal13, Manolis Kellis13, Manolis Kellis14, Cheryl A. Keller1, Christapher S. Morrissey1, Tejaswini Mishra1, Deepti Jain1, Nergiz Dogan1, Robert S. Harris1, Philip Cayting3, Trupti Kawli3, Alan P. Boyle3, Alan P. Boyle5, Ghia Euskirchen3, Anshul Kundaje3, Shin Lin3, Yiing Lin3, Camden Jansen15, Venkat S. Malladi3, Melissa S. Cline16, Drew T. Erickson3, Vanessa M. Kirkup16, Katrina Learned16, Cricket A. Sloan3, Kate R. Rosenbloom16, Beatriz Lacerda de Sousa17, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian18, Tamer Kahveci19, Dongwon Lee20, W. James Kent16, Miguel Santos17, Javier Herrero21, Cedric Notredame, Audra K. Johnson4, Shinny Vong4, Kristen Lee4, Daniel Bates4, Fidencio Neri4, Morgan Diegel4, Theresa K. Canfield4, Peter J. Sabo4, Matthew S. Wilken4, Thomas A. Reh4, Erika Giste4, Anthony Shafer4, Tanya Kutyavin4, Eric Haugen4, Douglas Dunn4, Alex Reynolds4, Shane Neph4, Richard Humbert4, R. Scott Hansen4, Marella F. T. R. de Bruijn22, Licia Selleri23, Alexander Y. Rudensky24, Steven Z. Josefowicz24, Robert M. Samstein24, Evan E. Eichler4, Stuart H. Orkin25, Dana N. 
Levasseur26, Thalia Papayannopoulou4, Kai Hsin Chang4, Arthur I. Skoultchi27, Srikanta Gosh27, Christine M. Disteche4, Piper M. Treuting4, Yanli Wang1, Mitchell J. Weiss, Gerd A. Blobel28, Xiaoyi Cao2, Sheng Zhong2, Ting Wang29, Peter J. Good30, Rebecca F. Lowdon30, Rebecca F. Lowdon29, Leslie B. Adams30, Leslie B. Adams31, Xiao Qiao Zhou30, Michael J. Pazin30, Elise A. Feingold30, Barbara J. Wold9, James Taylor11, Ali Mortazavi15, Sherman M. Weissman18, John A. Stamatoyannopoulos4, Michael Snyder3, Roderic Guigó, Thomas R. Gingeras8, David M. Gilbert6, Ross C. Hardison1, Michael A. Beer20, Bing Ren2 
20 Nov 2014-Nature
TL;DR: The Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

1,335 citations


Journal Article
18 Sep 2014-Nature
TL;DR: Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
Abstract: Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and perform integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. Messenger RNA transcript abundance did not reliably predict protein abundance differences between tumours. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA 'microsatellite instability/CpG island methylation phenotype' transcriptomic subtype, but had distinct mutation, methylation and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34) and SRC (SRC proto-oncogene, non-receptor tyrosine kinase). Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.

1,183 citations


Journal Article
20 Nov 2014-Nature
TL;DR: It is demonstrated that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment.
Abstract: Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. 
Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.

783 citations


Journal Article
TL;DR: The strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies are reviewed.
Abstract: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

691 citations


Journal Article
30 Jan 2014-Nature
TL;DR: The initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child) is reported, which provides a comprehensive RSS map of human coding and non-coding RNAs.
Abstract: In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of 'riboSNitches' versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3' untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.

512 citations


Journal Article
12 Mar 2014-JAMA
TL;DR: The use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings.
Abstract: Importance Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. Objectives To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. Design, Setting, and Participants An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Main Outcomes and Measures Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Results Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. 
Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24). Conclusions and Relevance In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

413 citations


Journal Article
TL;DR: Transposable elements have significantly and continuously shaped gene regulatory networks during mammalian evolution, and are an important driving force for regulatory innovation.
Abstract: Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF–TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

388 citations


Journal Article
31 Jul 2014-Cell
TL;DR: It is shown that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type.

388 citations


Journal Article
TL;DR: High-throughput sequencing assays on the transcriptome and epigenome reveal that, in general, differences dominate similarities between the two species, and indicate that there is considerable RNA expression diversity between humans and mice.
Abstract: Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.

313 citations


Journal Article
TL;DR: This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus, identifying cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and showing how the distal chromatin state affects gene transcription.
Abstract: Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and revealed how the distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type-specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions, while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.

269 citations


Journal Article
20 Nov 2014-Nature
TL;DR: Using the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged are deduced.
Abstract: To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

Journal Article
TL;DR: This work sequenced the lymphoblastoid transcriptomes of three family members by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that.
Abstract: Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV—in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

Feng Yue1, Feng Yue2, Yong Cheng3, Alessandra Breschi, Jeff Vierstra4, Weisheng Wu2, Weisheng Wu5, Tyrone Ryba6, Tyrone Ryba7, Richard Sandstrom4, Zhihai Ma3, Carrie A. Davis8, Benjamin D. Pope7, Yin Shen1, Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman4, Rajinder Kaul4, Eric Rynes4, Anthony Kirilusha9, Georgi K. Marinov9, Brian A. Williams9, Diane Trout9, Henry Amrhein9, Katherine I. Fisher-Aylor9, Igor Antoshechkin9, Gilberto DeSalvo9, Lei Hoon See8, Meagan Fastuca8, Jorg Drenkow8, Chris Zaleski8, Alexander Dobin8, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer10, Olgert Denas11, Kanwei Li11, M. A. Bender4, M. A. Bender12, Miaohua Zhang12, Rachel Byron12, Mark Groudine12, Mark Groudine4, David McCleary1, Long Pham1, Zhen Ye1, Samantha Kuan1, Lee Edsall1, Yi-Chieh Wu13, Matthew D. Rasmussen13, Mukul S. Bansal13, Manolis Kellis14, Manolis Kellis13, Cheryl A. Keller2, Christapher S. Morrissey2, Tejaswini Mishra2, Deepti Jain2, Nergiz Dogan2, Robert S. Harris2, Philip Cayting3, Trupti Kawli3, Alan P. Boyle3, Alan P. Boyle5, Ghia Euskirchen3, Anshul Kundaje3, Shin Lin3, Yiing Lin3, Camden Jansen15, Venkat S. Malladi3, Melissa S. Cline16, Drew T. Erickson3, Vanessa M. Kirkup16, Katrina Learned16, Cricket A. Sloan3, Kate R. Rosenbloom16, Beatriz Lacerda de Sousa17, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian18, Tamer Kahveci19, Dongwon Lee20, W. James Kent16, Miguel Santos17, Javier Herrero21, Cedric Notredame, Audra K. Johnson4, Shinny Vong4, Kristen Lee4, Daniel Bates4, Fidencio Neri4, Morgan Diegel4, Theresa K. Canfield4, Peter J. Sabo4, Matthew S. Wilken4, Thomas A. Reh4, Erika Giste4, Anthony Shafer4, Tanya Kutyavin4, Eric Haugen4, Douglas Dunn4, Alex Reynolds4, Shane Neph4, Richard Humbert4, R. Scott Hansen4, Marella F. T. R. de Bruijn22, Licia Selleri23, Alexander Y. Rudensky24, Steven Z. Josefowicz24, Robert M. Samstein24, Evan E. Eichler4, Stuart H. Orkin25, Dana N. 
Levasseur26, Thalia Papayannopoulou4, Kai Hsin Chang4, Arthur I. Skoultchi27, Srikanta Gosh27, Christine M. Disteche4, Piper M. Treuting4, Yanli Wang2, Mitchell J. Weiss, Gerd A. Blobel28, Xiaoyi Cao1, Sheng Zhong1, Ting Wang29, Peter J. Good30, Rebecca F. Lowdon29, Rebecca F. Lowdon30, Leslie B. Adams31, Leslie B. Adams30, Xiao Qiao Zhou30, Michael J. Pazin30, Elise A. Feingold30, Barbara J. Wold9, James Taylor11, Ali Mortazavi15, Sherman M. Weissman18, John A. Stamatoyannopoulos4, Michael Snyder3, Roderic Guigó, Thomas R. Gingeras8, David M. Gilbert7, Ross C. Hardison2, Michael A. Beer20, Bing Ren1 
01 Nov 2014
TL;DR: By comparing with the human genome, this work not only confirms substantial conservation in the newly annotated potential functional sequences, but also finds a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

Journal Article
TL;DR: This work repurposes a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA.
Abstract: Repurposing a DNA sequencing instrument for high-throughput analysis of RNA-protein interactions enables detailed analysis of sequence-function relationships.

Journal Article
TL;DR: Using statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing, this work phases 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2–1 Mbp in length.
Abstract: Haplotyping of human genomes is improved by augmenting experimental dilution-based haplotyping with statistical analyses, a strategy known until now only as 'Moleculo.'

Journal Article
28 Aug 2014-Nature
TL;DR: The results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections.
Abstract: Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Journal Article
TL;DR: This work presents Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE).
Abstract: Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, yet a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly-customizable, publication-ready, multi-panel figures from common genomic data formats including BED, bedGraph, and BEDPE. Sushi.R is open source and made publicly available through GitHub (https://github.com/dphansti/Sushi) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/Sushi.html).

01 Aug 2014
TL;DR: In this article, the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors were mapped for a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time.

Journal ArticleDOI
Vinicius Tragante1, Michael R. Barnes2, Santhi K. Ganesh3, Matthew B. Lanktree4, Wei Guo5, Nora Franceschini6, Erin N. Smith7, Toby Johnson2, Michael V. Holmes8, Sandosh Padmanabhan9, Konrad J. Karczewski10, Berta Almoguera8, John Barnard11, Jens Baumert, Yen Pei C. Chang12, Clara C. Elbers1, Martin Farrall13, Mary E. Fischer14, Tom R. Gaunt15, Johannes M.I.H. Gho1, Christian Gieger, Anuj Goel13, Yan Gong16, Aaron Isaacs17, Marcus E. Kleber18, Irene Mateo Leach19, Caitrin W. McDonough16, Matthijs F.L. Meijs1, Olle Melander20, Christopher P. Nelson21, Christopher P. Nelson22, Ilja M. Nolte19, Nathan Pankratz23, Thomas S. Price, Jonathan A. Shaffer24, Sonia Shah25, Maciej Tomaszewski21, Peter J. van der Most19, Erik P A Van Iperen, Judith M. Vonk19, Kate Witkowska2, Caroline O. L. Wong2, Li Zhang11, Amber L. Beitelshees12, Gerald S. Berenson26, Deepak L. Bhatt27, Morris Brown28, Amber A. Burt29, Rhonda M. Cooper-DeHoff16, John M. C. Connell30, Karen J. Cruickshanks14, Sean P. Curtis31, George Davey-Smith15, Christian Delles9, Ron T. Gansevoort19, Xiuqing Guo32, Shen Haiqing12, Claire E. Hastie9, Marten H. Hofker19, Marten H. Hofker1, G. Kees Hovingh, Daniel Seung Kim29, Susan Kirkland33, Barbara E.K. Klein14, Ronald Klein14, Yun Li8, Steffi Maiwald, Christopher Newton-Cheh27, Eoin O'Brien34, N. Charlotte Onland-Moret1, Walter Palmas24, Afshin Parsa12, Brenda W.J.H. Penninx35, Mary Pettinger36, Ramachandran S. Vasan37, Jane E. Ranchalis29, Paul M. Ridker27, Lynda M. Rose27, Peter S. Sever38, Daichi Shimbo24, Laura Steele8, Ronald P. Stolk19, Barbara Thorand, Mieke D. Trip, Cornelia M. van Duijn17, W M Monique Verschuren1, Cisca Wijmenga19, Sharon B. Wyatt39, J. Hunter Young40, Aeilko H. Zwinderman, Connie R. Bezzina41, Eric Boerwinkle42, Juan P. Casas43, Mark J. Caulfield2, Aravinda Chakravarti40, Daniel I. Chasman27, Karina W. Davidson24, Pieter A. Doevendans1, Anna F. Dominiczak9, Garret A. FitzGerald8, John G. Gums16, Myriam Fornage42, Hakon Hakonarson8, Indrani Halder44, Hans L. Hillege19, Thomas Illig45, Gail P. Jarvik38, Julie A. Johnson16, John J.P. Kastelein, Wolfgang Koenig46, Meena Kumari25, Winfried März47, Sarah S. Murray7, Jeffrey R. O'Connell12, Albertine J. Oldehinkel19, James S. Pankow23, Daniel J. Rader8, Susan Redline27, Muredach P. Reilly8, Eric E. Schadt48, Kandice Kottke-Marchant11, Harold Snieder19, Michael Snyder10, Alice Stanton49, Martin D. Tobin21, André G. Uitterlinden17, Pim van der Harst19, Yvonne T. van der Schouw1, Nilesh J. Samani22, Nilesh J. Samani21, Hugh Watkins13, Andrew D. Johnson, Alexander P. Reiner36, Xiaofeng Zhu5, Paul I.W. de Bakker50, Daniel Levy, Folkert W. Asselbergs1, Folkert W. Asselbergs25, Patricia B. Munroe2, Brendan J. Keating8
TL;DR: The findings extend the understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification and provide support for a putative role in hypertension of several genes.
Abstract: Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10⁻⁷) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification.

Journal ArticleDOI
TL;DR: A systems framework involving the interactome, gene expression and genome sequencing is developed to identify a protein interaction module with members strongly enriched for autism candidate genes that delineates a natural network involved in autism.
Abstract: Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.

Journal ArticleDOI
28 Aug 2014-Nature
TL;DR: This work determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments and produced a spatiotemporally resolved metazoan transcription factor binding map.
Abstract: Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.

Journal ArticleDOI
30 Jan 2014-PLOS ONE
TL;DR: Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals, and a novel, unique role for STAT5A is found in binding to genes involved in neural development and function, while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding.
Abstract: Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals. STAT5 refers to two highly related proteins, STAT5A and STAT5B, with critical function: their complete deficiency is lethal in mice; in humans, STAT5B deficiency alone leads to endocrine and immunological problems, while STAT5A deficiency has not been reported. STAT5A and STAT5B show peptide sequence similarities greater than 90%, but subtle structural differences suggest possible non-redundant roles in gene regulation. However, these roles remain unclear in humans. We applied chromatin immunoprecipitation followed by DNA sequencing using human CD4+ T cells to detect candidate genes regulated by STAT5A and/or STAT5B, and quantitative-PCR in STAT5A or STAT5B knock-down (KD) human CD4+ T cells to validate the findings. Our data show STAT5A and STAT5B play redundant roles in cell proliferation and apoptosis via SGK1 interaction. Interestingly, we found a novel, unique role for STAT5A in binding to genes involved in neural development and function (NDRG1, DNAJC6, and SSH2), while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding. Our results also suggest that one or more co-activators for STAT5A and/or STAT5B may play important roles in establishing different binding abilities and gene regulation behaviors. The new identification of these genes regulated by STAT5A and/or STAT5B has major implications for understanding the pathophysiology of cancer progression, neural disorders, and immune abnormalities.

Journal ArticleDOI
TL;DR: The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription in a yeast system, and the authors propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.
Abstract: Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains were first identified from mammalian proteins that bind lipid/sterol ligands via a hydrophobic pocket. In plants, predicted START domains are predominantly found in homeodomain leucine zipper (HD-Zip) transcription factors that are master regulators of cell-type differentiation in development. Here we utilized studies of Arabidopsis in parallel with heterologous expression of START domains in yeast to investigate the hypothesis that START domains are versatile ligand-binding motifs that can modulate transcription factor activity. Our results show that deletion of the START domain from Arabidopsis Glabra2 (GL2), a representative HD-Zip transcription factor involved in differentiation of the epidermis, results in a complete loss-of-function phenotype, although the protein is correctly localized to the nucleus. Despite low sequence similarity, the mammalian START domain from StAR can functionally replace the HD-Zip-derived START domain. Embedding the START domain within a synthetic transcription factor in yeast, we found that several mammalian START domains from StAR, MLN64 and PCTP stimulated transcription factor activity, as did START domains from two Arabidopsis HD-Zip transcription factors. Mutation of ligand-binding residues within StAR START reduced this activity, consistent with the yeast assay monitoring ligand-binding. The D182L missense mutation in StAR START was shown to affect GL2 transcription factor activity in maintenance of the leaf trichome cell fate. Analysis of in vivo protein–metabolite interactions by mass spectrometry provided direct evidence for analogous lipid-binding activity in mammalian and plant START domains in the yeast system. Structural modeling predicted similarly sized ligand-binding cavities of a subset of plant START domains in comparison to mammalian counterparts.
The START domain is required for transcription factor activity in HD-Zip proteins from plants, although it is not strictly necessary for the protein’s nuclear localization. START domains from both mammals and plants are modular in that they can bind lipid ligands to regulate transcription factor function in a yeast system. The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription. We propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.

Journal ArticleDOI
TL;DR: The potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants is demonstrated, exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues.
Abstract: Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

Journal ArticleDOI
TL;DR: This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery, and anticipates its role in the future of medicine.
Abstract: The potential for personalized sequencing to individually optimize medical treatment in diseases such as cancer and for pharmacogenomic application is just beginning to be realized, and the utility of sequencing healthy individuals for managing health is also being explored. The data produced requires additional advancements in interpretation of variants of unknown significance to maximize clinical benefit. Nevertheless, personalized sequencing, only recently applied to clinical medicine, has already been broadly applied to the discovery and study of disease. It is poised to enable the earlier and more accurate diagnosis of disease risk and occurrence, guide prevention and individualized intervention as well as facilitate monitoring of healthy and treated patients, and play a role in the prevention and recurrence of future disease. This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery and anticipates its role in the future of ...

Journal ArticleDOI
TL;DR: Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.
Abstract: Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. 
Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

Journal ArticleDOI
TL;DR: The proposed omics metadata checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications and allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era.
Abstract: Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

Journal ArticleDOI
TL;DR: It is shown that FAT10 knockout prevents the development of age-associated obesity in mice while extending lifespan and vigor without the appearance of deleterious developmental effects; the findings suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.
Abstract: The HLA-F adjacent transcript 10 (FAT10) is a member of the ubiquitin-like gene family that alters protein function/stability through covalent ligation. Although FAT10 is induced by inflammatory mediators and implicated in immunity, the physiological functions of FAT10 are poorly defined. We report the discovery that FAT10 regulates lifespan through pleiotropic actions on metabolism and inflammation. Median and overall lifespan are increased 20% in FAT10ko mice, coincident with elevated metabolic rate, preferential use of fat as fuel, and dramatically reduced adiposity. This phenotype is associated with metabolic reprogramming of skeletal muscle (i.e., increased AMP kinase activity, β-oxidation and -uncoupling, and decreased triglyceride content). Moreover, knockout mice have reduced circulating glucose and insulin levels and enhanced insulin sensitivity in metabolic tissues, consistent with elevated IL-10 in skeletal muscle and serum. These observations suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.

Journal ArticleDOI
TL;DR: An improved mapping of targets is created by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments, which identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations.
Abstract: Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.