Showing papers by "Mark Gerstein" published in 2014

Journal Article•DOI•

Guidelines for investigating causality of sequence variants in human disease

Daniel G. MacArthur¹, Teri A. Manolio², David Dimmock³, Heidi L. Rehm¹, Jay Shendure⁴, Gonçalo R. Abecasis⁵, David R. Adams², Russ B. Altman⁶, Stylianos E. Antonarakis⁷, Euan A. Ashley⁶, Jeffrey C. Barrett⁸, Leslie G. Biesecker², Donald F. Conrad⁹, Gregory M. Cooper, Nancy J. Cox¹⁰, Mark J. Daly¹, Mark Gerstein¹¹, David Goldstein¹², Joel N. Hirschhorn¹³, Suzanne M. Leal¹⁴, Len A. Pennacchio¹⁵, John A. Stamatoyannopoulos⁴, Shamil R. Sunyaev¹, David Valle¹⁶, Benjamin F. Voight¹⁷, Wendy Winckler¹⁸, Chris Gunter•Institutions (18)

Harvard University¹, National Institutes of Health², Medical College of Wisconsin³, University of Washington⁴, University of Michigan⁵, Stanford University⁶, University of Geneva⁷, Wellcome Trust Sanger Institute⁸, Washington University in St. Louis⁹, University of Chicago¹⁰, Yale University¹¹, Duke University¹², Boston Children's Hospital¹³, Baylor College of Medicine¹⁴, Lawrence Berkeley National Laboratory¹⁵, Johns Hopkins University¹⁶, University of Pennsylvania¹⁷, Broad Institute¹⁸

24 Apr 2014-Nature

TL;DR: The key challenges of assessing sequence variants in human disease are discussed, integrating both gene-level and variant-level support for causality and guidelines for summarizing confidence in variant pathogenicity are proposed.

Abstract: The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

1,165 citations

Journal Article•DOI•

Transcriptional landscape of the prenatal human brain

Jeremy A. Miller¹, Songlin Ding¹, Susan M. Sunkin¹, Kimberly A. Smith¹, Lydia Ng¹, Aaron Szafer¹, Amanda Ebbert¹, Zackery L. Riley¹, Joshua J. Royall¹, Kaylynn Aiona¹, James M. Arnold¹, Crissa Bennet¹, Darren Bertagnolli¹, Krissy Brouner¹, Stephanie Butler¹, Shiella Caldejon¹, Anita Carey¹, Christine Cuhaciyan¹, Rachel A. Dalley¹, Nick Dee¹, Tim A. Dolbeare¹, Benjamin A.C. Facer¹, David Feng¹, Tim P. Fliss¹, Garrett Gee¹, Jeff Goldy¹, Lindsey Gourley¹, Benjamin W. Gregor¹, Guangyu Gu¹, Robert Howard¹, Jayson M. Jochim¹, Chihchau L. Kuan¹, Christopher Lau¹, Changkyu Lee¹, Felix Lee¹, Tracy Lemon¹, Phil Lesnar¹, Bergen McMurray¹, Naveed Mastan¹, Nerick Mosqueda¹, Theresa Naluai-Cecchini², Nhan Kiet Ngo¹, Julie Nyhus¹, Aaron Oldre¹, Eric Olson¹, Jody Parente¹, Patrick D. Parker¹, Sheana Parry¹, Allison Stevens³, Mihovil Pletikos⁴, Melissa Reding¹, Kate Roll¹, David Sandman¹, Melaine Sarreal¹, Sheila Shapouri¹, Nadiya V. Shapovalova¹, Elaine H. Shen¹, Nathan Sjoquist¹, Clifford R. Slaughterbeck¹, Michael W. Smith¹, Andy J. Sodt¹, Derric Williams¹, Lilla Zöllei³, Bruce Fischl⁵, Mark Gerstein⁴, Daniel H. Geschwind⁶, Ian A. Glass², Michael Hawrylycz¹, Robert F. Hevner², Hao Huang⁷, Allan R. Jones¹, James A. Knowles⁸, Pat Levitt⁸, John W. Phillips¹, Nenad Sestan⁴, Paul Wohnoutka¹, Chinh Dang¹, Amy Bernard¹, John G. Hohmann¹, Ed S. Lein¹•Institutions (8)

Allen Institute for Brain Science¹, University of Washington², Harvard University³, Yale University⁴, Massachusetts Institute of Technology⁵, University of California, Los Angeles⁶, University of Texas Southwestern Medical Center⁷, University of Southern California⁸

10 Apr 2014-Nature

TL;DR: An anatomically comprehensive atlas of the mid-gestational human brain is described, including de novo reference atlases, in situ hybridization, ultra-high-resolution magnetic resonance imaging (MRI) and microarray analysis on highly discrete laser-microdissected brain regions.

Abstract: The anatomical and functional architecture of the human brain is mainly determined by prenatal transcriptional processes. We describe an anatomically comprehensive atlas of the mid-gestational human brain, including de novo reference atlases, in situ hybridization, ultra-high-resolution magnetic resonance imaging (MRI) and microarray analysis on highly discrete laser-microdissected brain regions. In developing cerebral cortex, transcriptional differences are found between different proliferative and post-mitotic layers, wherein laminar signatures reflect cellular composition and developmental processes. Cytoarchitectural differences between human and mouse have molecular correlates, including species differences in gene expression in subplate, although surprisingly we find minimal differences between the inner and outer subventricular zones even though the outer zone is expanded in humans. Both germinal and post-mitotic cortical layers exhibit fronto-temporal gradients, with particular enrichment in the frontal lobe. Finally, many neurodevelopmental disorder and human-evolution-related genes show patterned expression, potentially underlying unique features of human cortical formation. These data provide a rich, freely-accessible resource for understanding human brain development.

1,114 citations

Journal Article•DOI•

Defining functional DNA elements in the human genome

Manolis Kellis¹, Barbara J. Wold², Michael Snyder³, Bradley E. Bernstein⁴, Anshul Kundaje⁵, Georgi K. Marinov², Lucas D. Ward⁵, Ewan Birney, Gregory E. Crawford⁶, Job Dekker⁷, Ian Dunham, Laura Elnitski⁸, Peggy J. Farnham⁹, Elise A. Feingold⁸, Mark Gerstein¹⁰, Morgan C. Giddings, David M. Gilbert¹¹, Thomas R. Gingeras¹², Eric D. Green⁸, Roderic Guigó, Tim Hubbard¹³, Jim Kent¹⁴, Jason D. Lieb¹⁵, Richard M. Myers, Michael J. Pazin⁸, Bing Ren¹⁶, John A. Stamatoyannopoulos¹⁷, Zhiping Weng⁷, Kevin P. White¹⁸, Ross C. Hardison¹⁹•Institutions (19)

Massachusetts Institute of Technology¹, California Institute of Technology², Stanford University³, Harvard University⁴, Broad Institute⁵, Duke University⁶, University of Massachusetts Medical School⁷, National Institutes of Health⁸, University of Southern California⁹, Yale University¹⁰, Florida State University¹¹, Cold Spring Harbor Laboratory¹², Wellcome Trust Sanger Institute¹³, University of California, Santa Cruz¹⁴, Princeton University¹⁵, University of California, San Diego¹⁶, University of Washington¹⁷, University of Chicago¹⁸, Pennsylvania State University¹⁹

29 Apr 2014-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies are reviewed.

Abstract: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

691 citations

Journal Article•DOI•

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer.

Yao Fu¹, Zhu Liu², Shaoke Lou³, Jason Bedford¹, Xinmeng Jasmine Mu⁴, Xinmeng Jasmine Mu¹, Kevin Y. Yip³, Ekta Khurana¹, Ekta Khurana⁵, Mark Gerstein¹•Institutions (5)

Yale University¹, Fudan University², The Chinese University of Hong Kong³, Broad Institute⁴, Cornell University⁵

02 Oct 2014-Genome Biology

TL;DR: FunSeq2 is a computational framework to annotate and prioritize noncoding drivers from thousands of somatic alterations in a typical tumor; it combines an adjustable data context integrating large-scale genomics and cancer resources with a streamlined variant-prioritization pipeline.

Abstract: Identification of noncoding drivers from thousands of somatic alterations in a typical tumor is a difficult and unsolved problem. We report a computational framework, FunSeq2, to annotate and prioritize these mutations. The framework combines an adjustable data context integrating large-scale genomics and cancer resources with a streamlined variant-prioritization pipeline. The pipeline has a weighted scoring system combining: inter- and intra-species conservation; loss- and gain-of-function events for transcription-factor binding; enhancer-gene linkages and network centrality; and per-element recurrence across samples. We further highlight putative drivers with information specific to a particular sample, such as differential expression. FunSeq2 is available from funseq2.gersteinlab.org.
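
The weighted scoring idea in this abstract can be illustrated with a minimal sketch. It assumes each variant is described by yes/no annotations and that the score is the sum of the weights of the annotations it carries. The feature names, the weight values, and the functions `score_variant` and `prioritize` are hypothetical placeholders chosen for illustration; they are not FunSeq2's actual features, weights, or interface.

```python
def score_variant(features, weights):
    """Sum the weights of the annotations that are present for one variant."""
    return sum(weights[name] for name, present in features.items() if present)


def prioritize(variants, weights):
    """Return (variant_id, score) pairs sorted from highest to lowest score."""
    scores = {vid: score_variant(feats, weights) for vid, feats in variants.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights, loosely mirroring the categories named in the abstract.
WEIGHTS = {
    "conserved": 1.0,          # inter- or intra-species conservation
    "tf_binding_change": 0.75, # loss or gain of a transcription-factor binding event
    "linked_to_hub_gene": 0.5, # enhancer-gene linkage to a network-central gene
    "recurrent": 0.25,         # element recurrently mutated across samples
}

# Hypothetical variants with made-up annotations.
VARIANTS = {
    "variant_A": {"conserved": True, "tf_binding_change": True,
                  "linked_to_hub_gene": False, "recurrent": False},
    "variant_B": {"conserved": False, "tf_binding_change": False,
                  "linked_to_hub_gene": True, "recurrent": True},
}

ranking = prioritize(VARIANTS, WEIGHTS)
print(ranking)
```

In this toy input, `variant_A` ranks above `variant_B` because the annotations it carries have the larger combined weight.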

314 citations

Journal Article•DOI•

Comparative analysis of the transcriptome across distant species

Mark Gerstein¹, Joel Rozowsky¹, Koon-Kiu Yan¹, Daifeng Wang¹, Chao Cheng², James B. Brown³, James B. Brown⁴, Carrie A. Davis⁵, LaDeana W. Hillier⁶, Cristina Sisu¹, Jingyi Jessica Li⁷, Jingyi Jessica Li⁴, Baikang Pei¹, Arif Harmanci¹, Michael O. Duff⁸, Sarah Djebali⁹, Roger P. Alexander¹, Burak H. Alver¹⁰, Raymond K. Auerbach¹, Kimberly Bell⁵, Peter J. Bickel⁴, Max E. Boeck⁶, Nathan Boley⁴, Nathan Boley³, Benjamin W. Booth³, Lucy Cherbas¹¹, Peter Cherbas¹¹, Chao Di¹², Alexander Dobin⁵, Jorg Drenkow⁵, Brent Ewing⁶, Gang Fang¹, Megan Fastuca⁵, Elise A. Feingold¹³, Adam Frankish¹⁴, Guanjun Gao¹², Peter J. Good¹³, Roderic Guigó⁹, Ann S. Hammonds³, Jen Harrow¹⁴, Roger A. Hoskins³, Cédric Howald¹⁵, Cédric Howald¹⁶, Long Hu¹², Haiyan Huang⁴, Tim Hubbard¹⁴, Tim Hubbard¹⁷, Chau Huynh⁶, Sonali Jha⁵, Dionna M. Kasper¹, Masaomi Kato¹, Thomas C. Kaufman¹¹, Robert R. Kitchen¹, Erik Ladewig¹⁸, Julien Lagarde⁹, Eric C. Lai¹⁸, Jing Leng¹, Zhi Lu¹², Michael J. MacCoss⁶, Gemma E. May⁸, Gemma E. May¹⁹, Rebecca McWhirter²⁰, Gennifer E. Merrihew⁶, David M. Miller²⁰, Ali Mortazavi²¹, Rabi Murad²¹, Brian Oliver¹³, Sara Olson⁸, Peter J. Park¹⁰, Michael J. Pazin¹³, Norbert Perrimon¹⁰, Norbert Perrimon²², Dmitri D. Pervouchine⁹, Valerie Reinke¹, Alexandre Reymond¹⁶, Garrett Robinson⁴, Anastasia Samsonova²², Anastasia Samsonova¹⁰, Gary Saunders¹⁴, Gary Saunders²³, Felix Schlesinger⁵, Anurag Sethi¹, Frank J. Slack¹, William C. Spencer²⁰, Marcus H. Stoiber⁴, Marcus H. Stoiber³, Pnina Strasbourger⁶, Andrea Tanzer²⁴, Andrea Tanzer⁹, Owen Thompson⁶, Kenneth H. Wan³, Guilin Wang¹, Huaien Wang⁵, Kathie L. Watkins²⁰, Jiayu Wen¹⁸, Kejia Wen¹², Chenghai Xue⁵, Li Yang⁸, Li Yang²⁵, Kevin Y. Yip²⁶, Chris Zaleski⁵, Yan Zhang¹, Henry Zheng¹, Steven E. Brenner⁴, Brenton R. Graveley⁸, Susan E. Celniker³, Thomas R. Gingeras⁵, Robert H. Waterston⁶•Institutions (26)

28 Aug 2014-Nature

TL;DR: It is found in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a ‘universal model’ based on a single set of organism-independent parameters.

Abstract: The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

284 citations

Journal Article•DOI•

Characterization of stress-responsive lncRNAs in Arabidopsis thaliana by integrating expression, epigenetic and structural features.

Chao Di¹, Jiapei Yuan¹, Yue Wu¹, Jingrui Li¹, Huixin Lin², Long Hu¹, Ting Zhang², Yijun Qi¹, Mark Gerstein³, Yan Guo², Zhi John Lu¹•Institutions (3)

Tsinghua University¹, University of Minnesota², Yale University³

01 Dec 2014-Plant Journal

TL;DR: It is found that poly(A)- lncRNAs tend to have shorter transcripts and lower expression levels, and they show significant expression specificity in response to stresses, and their differential expression is significantly enriched in drought condition and depleted in heat condition.

Abstract: Recently, in addition to poly(A)+ long non-coding RNAs (lncRNAs), many lncRNAs without poly(A) tails have been characterized in mammals. However, the non-poly(A) lncRNAs and their conserved motifs, especially those associated with environmental stresses, have not been fully investigated in plant genomes. We performed poly(A)− RNA-seq for seedlings of Arabidopsis thaliana under four stress conditions, and predicted lncRNA transcripts. We classified the lncRNAs into three confidence levels according to their expression patterns, epigenetic signatures and RNA secondary structures. Then, we further classified the lncRNAs into poly(A)+ and poly(A)− transcripts. Compared with poly(A)+ lncRNAs and coding genes, we found that poly(A)− lncRNAs tend to have shorter transcripts and lower expression levels, and they show significant expression specificity in response to stresses. In addition, their differential expression is significantly enriched under drought conditions and depleted under heat conditions. Overall, we identified 245 poly(A)+ and 58 poly(A)− lncRNAs that are differentially expressed under various stress stimuli. The differential expression was validated by qRT-PCR, and the signaling pathways involved were supported by specific binding of transcription factors (TFs), phytochrome-interacting factor 4 (PIF4) and PIF5. Moreover, we found many conserved sequence and structural motifs of lncRNAs from different functional groups (e.g. a UUC motif responding to salt and an AU-rich stem-loop responding to cold), indicating that the conserved elements might be responsible for the stress-responsive functions of lncRNAs.

243 citations

Journal Article•DOI•

Comparative analysis of regulatory information and circuits across distant species

Alan P. Boyle¹, Carlos L. Araya¹, Cathleen M. Brdlik¹, Philip Cayting¹, Chao Cheng², Yong Cheng¹, Kathryn E. Gardner², LaDeana W. Hillier³, J. Janette², Lixia Jiang¹, Dionna M. Kasper², Trupti Kawli¹, Pouya Kheradpour⁴, Anshul Kundaje⁴, Anshul Kundaje¹, Jingyi Jessica Li⁵, Jingyi Jessica Li⁶, Lijia Ma³, Wei Niu², E. Jay Rehm², Joel Rozowsky⁷, Matthew Slattery², Rebecca Spokony⁷, Robert Terrell⁷, D. Vafeados³, Daifeng Wang², Peter Weisdepp³, Yi-Chieh Wu⁴, Dan Xie¹, Koon-Kiu Yan², Elise A. Feingold⁸, Peter J. Good⁸, Michael J. Pazin⁸, Haiyan Huang⁵, Peter J. Bickel⁵, Steven E. Brenner⁵, Valerie Reinke², Robert H. Waterston³, Mark Gerstein², Kevin P. White⁷, Manolis Kellis⁴, Michael Snyder¹•Institutions (8)

Stanford University¹, Yale University², University of Washington³, Massachusetts Institute of Technology⁴, University of California, Berkeley⁵, University of California, Los Angeles⁶, University of Chicago⁷, National Institutes of Health⁸

28 Aug 2014-Nature

TL;DR: The results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections.

Abstract: Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

184 citations

Comparative analysis of regulatory information and circuits across distant species

01 Aug 2014

TL;DR: In this article, the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors were mapped for a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time.

167 citations

Journal Article•DOI•

Identification of a major determinant for serine-threonine kinase phosphoacceptor specificity.

Catherine Chen¹, Byung Hak Ha¹, Anastasia F. Thévenin¹, Hua Jane Lou¹, Rong Zhang¹, Kevin Y. Yip¹, Kevin Y. Yip², Jeffrey R. Peterson³, Mark Gerstein¹, Philip M. Kim⁴, Panagis Filippakopoulos⁵, Panagis Filippakopoulos⁶, Stefan Knapp⁶, Titus J. Boggon¹, Benjamin E. Turk¹•Institutions (6)

Yale University¹, The Chinese University of Hong Kong², Fox Chase Cancer Center³, University of Toronto⁴, Ludwig Institute for Cancer Research⁵, Structural Genomics Consortium⁶

09 Jan 2014-Molecular Cell

TL;DR: It is shown that a residue located in the kinase activation segment, which is termed the “DFG+1” residue, acts as a major determinant for serine-threonine phosphorylation site specificity.

96 citations

Journal Article•DOI•

Comparative analysis of pseudogenes across three phyla

Cristina Sisu¹, Baikang Pei, Jing Leng, Adam Frankish², Yan Zhang, Suganthi Balasubramanian¹, Rachel A. Harte³, Daifeng Wang, Michael Rutenberg-Schoenberg, Wyatt T. Clark, Mark Diekhans³, Joel Rozowsky¹, Tim Hubbard², Jennifer Harrow², Mark Gerstein¹•Institutions (3)

Yale University¹, Wellcome Trust Sanger Institute², University of California, Santa Cruz³

16 Sep 2014-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Overall, a broad spectrum of biochemical activity for pseudogenes is identified, with the majority in each organism exhibiting varying degrees of partial activity, suggesting a uniform degradation process.

Abstract: Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism’s genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

66 citations

Journal Article•DOI•

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework

Arif Harmanci¹, Joel Rozowsky¹, Mark Gerstein¹•Institutions (1)

Yale University¹

08 Oct 2014-Genome Biology

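TL;DR: MUSIC is a signal processing approach that corrects the ChIP-Seq read-depth signal for non-uniform mappability and then identifies enriched regions at multiple length scales, which is useful given the wide range of scales probed in ChIP-Seq assays; applied to RNA polymerase II data, it reveals a clear distinction between the stalled and elongating forms of the polymerase.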

Abstract: We present MUSIC, a signal processing approach for identification of enriched regions in ChIP-Seq data, available at http://www.music.gersteinlab.org . MUSIC first filters the ChIP-Seq read-depth signal for systematic noise from non-uniform mappability, which fragments enriched regions. Then it performs a multiscale decomposition, using median filtering, identifying enriched regions at multiple length scales. This is useful given the wide range of scales probed in ChIP-Seq assays. MUSIC performs favorably in terms of accuracy and reproducibility compared with other methods. In particular, analysis of RNA polymerase II data reveals a clear distinction between the stalled and elongating forms of the polymerase.
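
To make the idea of a multiscale decomposition by median filtering concrete, here is a minimal sketch. It assumes a plain sliding-window median applied at several window sizes, followed by a fixed threshold; it omits the mappability correction and is not the MUSIC implementation. The function names, the toy signal, and the threshold are all hypothetical.

```python
from statistics import median


def median_filter(signal, half_window):
    """Sliding median over a window of 2*half_window + 1 bins, truncated at the edges."""
    n = len(signal)
    return [
        median(signal[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def enriched_regions(signal, half_window, threshold):
    """Half-open (start, end) intervals where the smoothed signal exceeds the threshold."""
    smoothed = median_filter(signal, half_window)
    regions, start = [], None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i                      # a region opens here
        elif value <= threshold and start is not None:
            regions.append((start, i))     # the region closes just before bin i
            start = None
    if start is not None:                  # region runs to the end of the signal
        regions.append((start, len(smoothed)))
    return regions


def multiscale_regions(signal, half_windows, threshold):
    """Call regions once per scale; larger windows suppress narrower features."""
    return {w: enriched_regions(signal, w, threshold) for w in half_windows}


# Toy read-depth signal: a one-bin spike followed by a five-bin plateau.
toy_signal = [0, 0, 9, 0, 0, 5, 5, 5, 5, 5, 0, 0]
calls = multiscale_regions(toy_signal, half_windows=[0, 1], threshold=1)
print(calls)
```

With no smoothing (half-window 0) both the spike and the plateau are called; with a three-bin window the isolated spike is filtered out and only the plateau remains, which illustrates why calling regions at several scales captures features of different widths.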

Journal Article•DOI•

Decoding neuroproteomics: integrating the genome, translatome and functional anatomy

Robert R. Kitchen¹, Joel Rozowsky¹, Mark Gerstein¹, Angus C. Nairn¹•Institutions (1)

Yale University¹

01 Nov 2014-Nature Neuroscience

TL;DR: This article discusses how proteomics should be exploited to enhance high-throughput functional genomic analysis by tighter integration of data analyses, along with experimental strategies to achieve finer cellular and subcellular resolution in transcriptomic and proteomic studies of neural tissues.

Abstract: The immense intercellular and intracellular heterogeneity of the CNS presents major challenges for high-throughput omic analyses. Transcriptional, translational and post-translational regulatory events are localized to specific neuronal cell types or subcellular compartments, resulting in discrete patterns of protein expression and activity. A spatial and quantitative knowledge of the neuroproteome is therefore critical to understanding both normal and pathological aspects of the functional genomics and anatomy of the CNS. Improvements in mass spectrometry allow the profiling of proteins at a sufficient depth to complement results from high-throughput genomic and transcriptomic assays. However, there are challenges in integrating proteomic data with other data modalities and even greater challenges in obtaining comprehensive neuroproteomic data with cell-type specificity. Here we discuss how proteomics should be exploited to enhance high-throughput functional genomic analysis by tighter integration of data analyses. We also discuss experimental strategies to achieve finer cellular and subcellular resolution in transcriptomic and proteomic studies of neural tissues.

Journal Article•DOI•

OrthoClust: an orthology-based network framework for clustering data across multiple species

Koon-Kiu Yan¹, Daifeng Wang¹, Joel Rozowsky¹, Henry Zheng¹, Chao Cheng², Mark Gerstein¹•Institutions (2)

Yale University¹, Dartmouth College²

28 Aug 2014-Genome Biology

TL;DR: OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species and outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific.

Abstract: Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.
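
One way to picture how per-species networks and orthology can be combined in a single clustering objective is the sketch below. It assumes the score is the sum of each species' network modularity plus a bonus, scaled by a coupling constant `kappa`, for every orthologous gene pair placed in the same module. This modularity-plus-coupling form, the function names, and the toy gene names are assumptions made for illustration and may differ from OrthoClust's actual formulation; the sketch only scores a given assignment and does not search for the best one.

```python
def modularity(edges, labels):
    """Newman modularity of an undirected, unweighted network for a given labeling."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges whose two endpoints share a module.
    inside = sum(1 for u, v in edges if labels[u] == labels[v]) / m
    # Expected fraction under a degree-preserving random network.
    degree_by_module = {}
    for node, k in degree.items():
        degree_by_module[labels[node]] = degree_by_module.get(labels[node], 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_by_module.values())
    return inside - expected


def cross_species_score(networks, ortholog_pairs, labels, kappa):
    """Sum of per-species modularities plus kappa per ortholog pair sharing a module.

    Gene names must be unique across species because all genes share one labels dict.
    """
    score = sum(modularity(edges, labels) for edges in networks)
    score += kappa * sum(1 for a, b in ortholog_pairs if labels[a] == labels[b])
    return score


# Toy co-association networks for two species (hypothetical gene names).
worm_edges = [("w1", "w2"), ("w3", "w4")]
fly_edges = [("f1", "f2"), ("f3", "f4")]
orthologs = [("w1", "f1"), ("w3", "f3")]

# Two candidate module assignments: orthologs grouped together vs. split apart.
aligned = {"w1": 0, "w2": 0, "f1": 0, "f2": 0, "w3": 1, "w4": 1, "f3": 1, "f4": 1}
split = {"w1": 0, "w2": 0, "f1": 1, "f2": 1, "w3": 1, "w4": 1, "f3": 0, "f4": 0}

aligned_score = cross_species_score([worm_edges, fly_edges], orthologs, aligned, kappa=0.5)
split_score = cross_species_score([worm_edges, fly_edges], orthologs, split, kappa=0.5)
print(aligned_score, split_score)
```

Both assignments cluster each species' network equally well, so the difference in score comes entirely from the orthology term; the coupling constant controls how strongly cross-species agreement is rewarded relative to within-species structure.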

Journal Article•DOI•

Cellular Superspreaders: An Epidemiological Perspective on HIV Infection inside the Body

Kristina Talbert-Slagle¹, Katherine E. Atkins¹, Koon-Kiu Yan¹, Ekta Khurana¹, Mark Gerstein¹, Elizabeth H. Bradley¹, David N. Berg¹, Alison P. Galvani¹, Jeffrey P. Townsend¹•Institutions (1)

Yale University¹

08 May 2014-PLOS Pathogens

TL;DR: Using an epidemiological framework, it is suggested that heterogeneity among CD4+ T cells in the genital mucosa could help explain the low infection-to-exposure ratio and selection of the founder strain after sexual exposure to HIV.

Abstract: Worldwide, more than 250 people become infected with HIV every hour [1], yet an individual's chance of becoming infected after a single sexual exposure, the predominant mode of HIV transmission, is often lower than one in 100 [2]. When sexually transmitted HIV-1 infection does occur, it is usually initiated by a single virus, called the founder strain, despite the presence of thousands of genetically diverse viral strains in the transmitting partner [3]. Here we review evidence from molecular biology and virology suggesting that heterogeneity among CD4+ T cells could yield wide variation in the capability of individual cells to become infected and transmit HIV to other cells. Using an epidemiological framework, we suggest that such heterogeneity among CD4+ T cells in the genital mucosa could help explain the low infection-to-exposure ratio and selection of the founder strain after sexual exposure to HIV. During sexual transmission, founder viral strains preferentially infect CD4+ T cells using the CCR5 coreceptor [4], [5]. At the time of initial exposure to HIV, these CD4+ T cells exhibit baseline heterogeneity due to stochasticity in cellular gene expression [6] and dynamic variation in immunological status (activated, resting, etc.) [7]. In addition, because CD4+ T cells are mobile, they are heterogeneously distributed in the genital mucosa, with varying degrees of clustering and contact [8]–[11]. In other contexts, it is well-known that heterogeneity among isogeneic cells inside the body can affect many cellular behaviors and outcomes, including infection dynamics [12], [13]. Epidemiological analyses of disease outbreaks among people indicate that heterogeneity in the ability of individuals in a population to spread disease can have a significant impact on whether a local outbreak becomes an epidemic [14]. 
Heterogeneity among a population of CD4+ T cells may play a similarly critical role in the establishment and spread of HIV in the genital mucosa after sexual exposure.

Journal Article•DOI•

Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease

Manolis Kellis¹, Manolis Kellis², Barbara J. Wold³, Michael Snyder⁴, Bradley E. Bernstein⁵, Anshul Kundaje¹, Anshul Kundaje⁴, Anshul Kundaje², Georgi K. Marinov³, Lucas D. Ward², Lucas D. Ward¹, Ewan Birney, Gregory E. Crawford⁶, Job Dekker⁷, Ian Dunham, Laura Elnitski⁸, Peggy J. Farnham⁹, Elise A. Feingold⁸, Mark Gerstein¹⁰, Morgan C. Giddings, David M. Gilbert¹¹, Thomas R. Gingeras¹², Eric D. Green⁸, Roderic Guigó, Tim Hubbard¹³, Jim Kent¹⁴, Jason D. Lieb¹⁵, Richard M. Myers, Michael J. Pazin⁸, Bing Ren¹⁶, John A. Stamatoyannopoulos¹⁷, Zhiping Weng⁷, Kevin P. White¹⁵, Ross C. Hardison¹⁸•Institutions (18)

Massachusetts Institute of Technology¹, Broad Institute², California Institute of Technology³, Stanford University⁴, Harvard University⁵, Duke University⁶, University of Massachusetts Medical School⁷, National Institutes of Health⁸, University of Southern California⁹, Yale University¹⁰, Florida State University¹¹, Cold Spring Harbor Laboratory¹², Wellcome Trust Sanger Institute¹³, University of California, Santa Cruz¹⁴, University of Chicago¹⁵, University of California, San Diego¹⁶, University of Washington¹⁷, Pennsylvania State University¹⁸

19 Aug 2014-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The Encyclopedia of DNA Elements (ENCODE) catalog and similar data resources are viewed as important foundations for understanding the DNA elements and molecular mechanisms underlying human biology and disease.

Abstract: We agree with Brunet and Doolittle (1) on the utility of distinguishing the evolutionarily selected effects (SE) of some genomic elements from the causal roles (CR) of other elements that lack signatures of selection (1–4). DNA sequences identified by biochemical approaches include both SE and CR elements, and genetic variation in both has been implicated in human traits and disease susceptibility. We thus view the Encyclopedia of DNA Elements (ENCODE) catalog and similar data resources as important foundations for understanding the DNA elements and molecular mechanisms underlying human biology and disease.

Posted Content•DOI•

Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes

Dmitri D. Pervouchine¹, Sarah Djebali, Alessandra Breschi, Carrie A. Davis², Pablo Prieto Barja, Alexander Dobin², Andrea Tanzer³, Julien Lagarde, Chris Zaleski², Lei Hoon See², Meagan Fastuca², Jorg Drenkow², Huaien Wang², Giovanni Bussotti, Baikang Pei⁴, Suganthi Balasubramanian⁴, Jean Monlong⁵, Arif Harmanci⁴, Mark Gerstein⁴, Michael A. Beer⁶, Cedric Notredame, Roderic Guigó, Thomas R. Gingeras²•Institutions (6)

Moscow State University¹, Cold Spring Harbor Laboratory², University of Vienna³, Yale University⁴, McGill University⁵, Johns Hopkins University⁶

30 Oct 2014-bioRxiv

TL;DR: This article characterizes the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates, and reveals a distinct class of genes whose expression levels across cell types and species were constrained early in vertebrate evolution.

Abstract: We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs, and uncovers a distinct class of genes whose expression levels across cell types and species were constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but it is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles.

Proceedings Article•DOI•

Proposed social and technological solutions to issues of data privacy in personal genomics

Dov Greenbaum, Arif Harmanci¹, Mark Gerstein¹•Institutions (1)

Yale University¹

23 May 2014

TL;DR: This paper proposes and outlines a licensing scheme, similar to those used by professional organizations, that not only enforces a code of conduct and punishes those who fail to live up to that code, but also mandates continuing education to limit the possibility that the code will be violated inadvertently.

Abstract: The issues of privacy and disclosure are two sides of a weighty coin. Computational biologists and other scientists involved in genomic research need to be constantly cognizant of the push and pull of these two important concepts. Clinical genomics research in particular raises a number of poignant concerns as society struggles between invasions of privacy, such as recent efforts by the FBI and the NSA, and our own (surprisingly) personal disclosures on social media sites or via apathetic acquiescence to large data collection efforts. With regard to privacy, there are numerous computational efforts that have heretofore offered to provide both the robustness of protection and the ease of use to be effective in manipulating the terabytes of data before the genomics researcher. Unfortunately, algorithms alone have thus far failed to provide either the necessary strength to foil those intent on obtaining information or the promised agility to manipulate the vast datasets. While technical solutions advance, they cannot stand on their own, and this paper proposes and outlines a licensing scheme, similar to those used by professional organizations, that not only enforces a code of conduct and punishes those who fail to live up to that code, but also mandates continuing education to limit the possibility that the code will be violated inadvertently. It is the use of the social and the technological advances together that will likely create not only an environment that fosters research and innovation, but also one that is responsive to privacy needs and norms.

Journal Article•DOI•

MetaSV: an accurate method-aware merging algorithm for structural variations

Marghoob Mohiyuddin, John C. Mu, Jian Li, Narges Bani Asadi, Mark Gerstein, Alexej Abyzov, Wing Hung Wong, Hugo Y. K. Lam

25 Jul 2014-F1000Research