Showing papers by "Manolis Kellis" published in 2015


Journal ArticleDOI
Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J. Ziller, Viren Amin, John W. Whitaker, Matthew D. Schultz, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Richard Sandstrom, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Cristian Coarfa, R. Alan Harris, Noam Shoresh, Charles B. Epstein, Elizabeta Gjoneska, Danny Leung, Wei Xie, R. David Hawkins, Ryan Lister, Chibo Hong, Philippe Gascard, Andrew J. Mungall, Richard A. Moore, Eric Chuah, Angela Tam, Theresa K. Canfield, R. Scott Hansen, Rajinder Kaul, Peter J. Sabo, Mukul S. Bansal, Annaick Carles, Jesse R. Dixon, Kai How Farh, Soheil Feizi, Rosa Karlic, Ah Ram Kim, Ashwinikumar Kulkarni, Daofeng Li, Rebecca F. Lowdon, Ginell Elliott, Tim R. Mercer, Shane Neph, Vitor Onuchic, Paz Polak, Nisha Rajagopal, Pradipta R. Ray, Richard C Sallari, Kyle Siebenthall, Nicholas A Sinnott-Armstrong, Michael Stevens, Robert E. Thurman, Jie Wu, Bo Zhang, Xin Zhou, Arthur E. Beaudet, Laurie A. Boyer, Philip L. De Jager, Peggy J. Farnham, Susan J. Fisher, David Haussler, Steven J.M. Jones, Wei Li, Marco A. Marra, Michael T. McManus, Shamil R. Sunyaev, James A. Thomson, Thea D. Tlsty, Li-Huei Tsai, Wei Wang, Robert A. Waterland, Michael Q. Zhang, Lisa Helbling Chadwick, Bradley E. Bernstein, Joseph F. Costello, Joseph R. Ecker, Martin Hirst, Alexander Meissner, Aleksandar Milosavljevic, Bing Ren, John A. Stamatoyannopoulos, Ting Wang, Manolis Kellis
19 Feb 2015-Nature
TL;DR: It is shown that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
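The enrichment claim above can be illustrated with a generic overlap test. The abstract does not say which statistical procedure the consortium used, so the following is only a minimal sketch, assuming a one-sided hypergeometric test; the function name, its parameters and the counts in the usage note are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_variants, in_annotation, trait_variants, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    Hypothetical inputs (not taken from the paper):
      total_variants  - size of the background variant set
      in_annotation   - background variants inside the epigenomic annotation
      trait_variants  - variants associated with the trait
      overlap         - trait variants that fall inside the annotation
    """
    upper = min(in_annotation, trait_variants)
    # Count the ways to draw at least `overlap` annotated variants
    # when sampling `trait_variants` variants without replacement.
    favourable = sum(
        comb(in_annotation, k) * comb(total_variants - in_annotation, trait_variants - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_variants, trait_variants)
```

Calling `hypergeom_enrichment_p(10, 5, 2, 2)` on these toy counts gives 10/45; a small p-value on real counts would suggest that trait variants fall in the annotation more often than expected by chance.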

5,037 citations


Journal ArticleDOI
Kristin G. Ardlie, David S. DeLuca, Ayellet V. Segrè, Timothy J. Sullivan, Taylor Young, Ellen Gelfand, Casandra A. Trowbridge, Julian Maller, Taru Tukiainen, Monkol Lek, Lucas D. Ward, Pouya Kheradpour, Benjamin Iriarte, Yan Meng, Cameron D. Palmer, Tõnu Esko, Wendy Winckler, Joel N. Hirschhorn, Manolis Kellis, Daniel G. MacArthur, Gad Getz, Andrey A. Shabalin, Gen Li, Yi-Hui Zhou, Andrew B. Nobel, Ivan Rusyn, Fred A. Wright, Tuuli Lappalainen, Pedro G. Ferreira, Halit Ongen, Manuel A. Rivas, Alexis Battle, Sara Mostafavi, Jean Monlong, Michael Sammeth, Marta Melé, Ferran Reverter, Jakob M. Goldmann, Daphne Koller, Roderic Guigó, Mark I. McCarthy, Emmanouil T. Dermitzakis, Eric R. Gamazon, Hae Kyung Im, Anuar Konkashbaev, Dan L. Nicolae, Nancy J. Cox, Timothée Flutre, Xiaoquan Wen, Matthew Stephens, Jonathan K. Pritchard, Zhidong Tu, Bin Zhang, Tao Huang, Quan Long, Luan Lin, Jialiang Yang, Jun Zhu, Jun Liu, Amanda Brown, Bernadette Mestichelli, Denee Tidwell, Edmund Lo, Mike Salvatore, Saboor Shad, Jeffrey A. Thomas, John T. Lonsdale, Michael T. Moser, Bryan Gillard, Ellen Karasik, Kimberly Ramsey, Christopher Choi, Barbara A. Foster, John Syron, Johnell Fleming, Harold Magazine, Rick Hasz, Gary Walters, Jason Bridge, Mark Miklos, Susan L. Sullivan, Laura Barker, Heather M. Traino, Maghboeba Mosavel, Laura A. Siminoff, Dana R. Valley, Daniel C. Rohrer, Scott D. Jewell, Philip A. Branton, Leslie H. Sobin, Mary Barcus, Liqun Qi, Jeffrey McLean, Pushpa Hariharan, Ki Sung Um, Shenpei Wu, David Tabor, Charles Shive, Anna M. Smith, Stephen A. Buia, Anita H. Undale, Karna Robinson, Nancy Roche, Kimberly M. Valentino, Angela Britton, Robin Burges, Debra Bradbury, Kenneth W. Hambright, John Seleski, Greg E. Korzeniewski, Kenyon Erickson, Yvonne Marcus, Jorge Tejada, Mehran Taherian, Chunrong Lu, Margaret J. Basile, Deborah C. Mash, Simona Volpi, Jeffery P. Struewing, Gary F. Temple, Joy T. Boyer, Deborah Colantuoni, Roger Little, Susan E. Koester, Latarsha J. Carithers, Helen M. Moore, Ping Guan, Carolyn C. Compton, Sherilyn Sawyer, Joanne P. Demchok, Jimmie B. Vaught, Chana A. Rabiner, Nicole C. Lockhart
08 May 2015-Science
TL;DR: The landscape of gene expression across tissues is described, thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants are cataloged, complex network relationships are described, and signals from genome-wide association studies explained by eQTLs are identified.
Abstract: Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysi...

4,418 citations


Journal ArticleDOI
TL;DR: The data indicate that the FTO allele associated with obesity represses mitochondrial thermogenesis in adipocyte precursor cells in a tissue-autonomous manner, and point to a pathway for adipocyte thermogenesis regulation involving ARID5B, rs1421085, IRX3, and IRX5, which, when manipulated, had pronounced pro-obesity and anti-obesity effects.
Abstract: Background: Genomewide association studies can be used to identify disease-relevant genomic regions, but interpretation of the data is challenging. The FTO region harbors the strongest genetic association with obesity, yet the mechanistic basis of this association remains elusive. Methods: We examined epigenomic data, allelic activity, motif conservation, regulator expression, and gene coexpression patterns, with the aim of dissecting the regulatory circuitry and mechanistic basis of the association between the FTO region and obesity. We validated our predictions with the use of directed perturbations in samples from patients and from mice and with endogenous CRISPR–Cas9 genome editing in samples from patients. Results: Our data indicate that the FTO allele associated with obesity represses mitochondrial thermogenesis in adipocyte precursor cells in a tissue-autonomous manner. The rs1421085 T-to-C single-nucleotide variant disrupts a conserved motif for the ARID5B repressor, which leads to derepression of a pot

1,097 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach was used to define credible sets for the T1D-associated SNPs, which localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells.
Abstract: Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions, finding major pathways contributing to risk, with some loci shared across immune disorders. To make genetic comparisons across autoimmune disorders as informative as possible, a dense genotyping array, the Immunochip, was developed, from which we identified four new T1D-associated regions (P < 5 × 10⁻⁸). A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci. Using a Bayesian approach, we defined credible sets for the T1D-associated SNPs. The associated SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal.
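The abstract does not spell out how the credible sets were built. A common construction, shown here only as a sketch, assumes a single causal SNP per region and equal prior probability for every SNP: per-SNP Bayes factors are converted to posterior probabilities, and SNPs are added in decreasing order until the requested coverage is reached. The function name, the coverage value and the example Bayes factors are hypothetical.

```python
import math


def credible_set(log_bayes_factors, coverage=0.99):
    """Return (snps_in_set, posteriors) for one associated region.

    log_bayes_factors maps SNP id -> natural-log Bayes factor.
    Assumption (not from the source): one causal SNP, flat prior.
    """
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_bayes_factors.values())
    weights = {snp: math.exp(lbf - top) for snp, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    posteriors = {snp: w / total for snp, w in weights.items()}

    # Add SNPs from most to least probable until coverage is reached.
    chosen, cumulative = [], 0.0
    for snp, prob in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen, posteriors


# Toy region with made-up Bayes factors of 90, 9 and 1.
toy = {"snpA": math.log(90), "snpB": math.log(9), "snpC": math.log(1)}
snps, post = credible_set(toy, coverage=0.95)
```

With these toy values the posteriors are about 0.90, 0.09 and 0.01, so the 95% set contains `snpA` and `snpB`. Smaller credible sets mean the association signal is localized to fewer candidate variants.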

562 citations


Journal ArticleDOI
19 Feb 2015-Nature
TL;DR: This work profiles transcriptional and chromatin state dynamics across early and late pathology in the hippocampus of an inducible mouse model of AD-like neurodegeneration and establishes the mouse as a useful model for functional studies of AD regulatory regions.
Abstract: Analysis of transcriptional and epigenomic changes in the hippocampus of a mouse model of Alzheimer’s disease shows that immune function genes and regulatory regions are upregulated, whereas genes and regulatory regions involved in synaptic plasticity, learning and memory are downregulated; genetic variants associated with Alzheimer’s disease are only enriched in orthologues of upregulated immune regions, suggesting that dysregulation of immune processes may underlie Alzheimer’s disease predisposition. Recent genome-wide association studies have shown substantial genetic variation in non-coding regions associated with Alzheimer's disease, suggesting the involvement of aberrant gene regulation. However, the functional significance of these variants remained unclear. By profiling transcriptional and chromatin state dynamics in a mouse model, Elizabeta Gjoneska and colleagues now show that the immune response genes and their regulatory regions are upregulated, whereas those involved in synaptic plasticity and learning and memory are downregulated. These changes are highly conserved between the mouse model and the human disease. Surprisingly, Alzheimer's disease-associated genetic variants are mainly enriched in higher-activity, immune-related enhancers, and are depleted in lower-activity, neural enhancers. This suggests that genetic predisposition to Alzheimer's may be primarily associated with immune functions, while neuronal plasticity may be affected primarily by non-genetic effects. Alzheimer’s disease (AD) is a severe age-related neurodegenerative disorder characterized by accumulation of amyloid-β plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline. Several genes have been implicated in AD, but chromatin state alterations during neurodegeneration remain uncharacterized. Here we profile transcriptional and chromatin state dynamics across early and late pathology in the hippocampus of an inducible mouse model of AD-like neurodegeneration. We find a coordinated downregulation of synaptic plasticity genes and regulatory regions, and upregulation of immune response genes and regulatory regions, which are targeted by factors that belong to the ETS family of transcriptional regulators, including PU.1. Human regions orthologous to increasing-level enhancers show immune-cell-specific enhancer signatures as well as immune cell expression quantitative trait loci, while decreasing-level enhancer orthologues show fetal-brain-specific enhancer activity. Notably, AD-associated genetic variants are specifically enriched in increasing-level enhancer orthologues, implicating immune processes in AD predisposition. Indeed, increasing enhancers overlap known AD loci lacking protein-altering variants, and implicate additional loci that do not reach genome-wide significance. Our results reveal new insights into the mechanisms of neurodegeneration and establish the mouse as a useful model for functional studies of AD regulatory regions.

509 citations


Journal ArticleDOI
18 Jun 2015-Cell
TL;DR: It is reported that neuronal activity stimulation triggers the formation of DNA double strand breaks (DSBs) in the promoters of a subset of early-response genes, including Fos, Npas4, and Egr1.

499 citations


Journal ArticleDOI
Daniel E. Neafsey1, Robert M. Waterhouse, Mohammad Reza Abai2, Sergey Aganezov3, Max A. Alekseyev3, James E. Allen4, James Amon, Bruno Arcà5, Peter Arensburger6, Gleb N. Artemov7, Lauren A. Assour8, Hamidreza Basseri2, Aaron M. Berlin1, Bruce W. Birren1, Stéphanie Blandin9, Stéphanie Blandin10, Andrew I. Brockman11, Thomas R. Burkot12, Austin Burt11, Clara S. Chan13, Cedric Chauve14, Joanna C. Chiu15, Mikkel B. Christensen4, Carlo Costantini16, Victoria L.M. Davidson17, Elena Deligianni18, Tania Dottorini11, Vicky Dritsou19, Stacey Gabriel1, Wamdaogo M. Guelbeogo, Andrew Brantley Hall20, Mira V. Han21, Thaung Hlaing, Daniel S.T. Hughes4, Daniel S.T. Hughes22, Adam M. Jenkins23, Xiaofang Jiang20, Irwin Jungreis13, Evdoxia G. Kakani19, Evdoxia G. Kakani24, Maryam Kamali20, Petri Kemppainen25, Ryan C. Kennedy26, Ioannis K. Kirmitzoglou27, Ioannis K. Kirmitzoglou11, Lizette L. Koekemoer28, Njoroge Laban, Nicholas Langridge4, Mara K. N. Lawniczak11, Manolis Lirakis29, Neil F. Lobo8, Ernesto Lowy4, Robert M. MacCallum11, Chunhong Mao20, Gareth Maslen4, Charles Mbogo30, Jenny McCarthy6, Kristin Michel17, Sara N. Mitchell24, Wendy Moore31, Katherine A. Murphy15, Anastasia N. Naumenko20, Tony Nolan11, Eva Maria Novoa13, Samantha M. O’Loughlin11, Chioma Oringanje31, Mohammad Ali Oshaghi2, Nazzy Pakpour15, Philippos Aris Papathanos19, Philippos Aris Papathanos11, Ashley Peery20, Michael Povelones32, Anil Prakash33, David P. Price34, Ashok Rajaraman14, Lisa J. Reimer35, David C. Rinker36, Antonis Rokas37, Tanya L. Russell12, N’Fale Sagnon, Maria V. Sharakhova20, Terrance Shea1, Felipe A. Simão38, Felipe A. Simão39, Frédéric Simard16, Michel A. Slotman40, Pradya Somboon41, V. N. Stegniy7, Claudio J. Struchiner42, Claudio J. Struchiner43, Gregg W.C. Thomas44, Marta Tojo45, Pantelis Topalis18, Jose M. C. Tubio46, Maria F. Unger8, John Vontas29, Catherine Walton25, Craig S. Wilding47, Judith H. Willis48, Yi-Chieh Wu13, Yi-Chieh Wu49, Guiyun Yan50, Evgeny M. Zdobnov38, Evgeny M. Zdobnov39, Xiaofan Zhou37, Flaminia Catteruccia19, Flaminia Catteruccia24, George K. Christophides11, Frank H. Collins8, Robert S. Cornman48, Andrea Crisanti19, Andrea Crisanti11, Martin J. Donnelly46, Martin J. Donnelly35, Scott J. Emrich8, Michael C. Fontaine51, Michael C. Fontaine8, William M. Gelbart24, Matthew W. Hahn44, Immo A. Hansen34, Paul I. Howell52, Fotis C. Kafatos11, Manolis Kellis13, Daniel Lawson4, Christos Louis18, Shirley Luckhart15, Marc A. T. Muskavitch53, Marc A. T. Muskavitch23, José M. C. Ribeiro, Michael A. Riehle31, Igor V. Sharakhov20, Zhijian Tu20, Laurence J. Zwiebel37, Nora J. Besansky8
Broad Institute1, Tehran University of Medical Sciences2, George Washington University3, European Bioinformatics Institute4, Sapienza University of Rome5, Temple University6, Tomsk State University7, University of Notre Dame8, French Institute of Health and Medical Research9, Centre national de la recherche scientifique10, Imperial College London11, James Cook University12, Massachusetts Institute of Technology13, Simon Fraser University14, University of California, Davis15, Institut de recherche pour le développement16, Kansas State University17, Foundation for Research & Technology – Hellas18, University of Perugia19, Virginia Tech20, University of Nevada, Las Vegas21, Baylor College of Medicine22, Boston College23, Harvard University24, University of Manchester25, University of California, San Francisco26, University of Cyprus27, National Health Laboratory Service28, University of Crete29, Kenya Medical Research Institute30, University of Arizona31, University of Pennsylvania32, Indian Council of Medical Research33, New Mexico State University34, Liverpool School of Tropical Medicine35, Vanderbilt University Medical Center36, Vanderbilt University37, Swiss Institute of Bioinformatics38, University of Geneva39, Texas A&M University40, Chiang Mai University41, Rio de Janeiro State University42, Oswaldo Cruz Foundation43, Indiana University44, University of Santiago de Compostela45, Wellcome Trust Sanger Institute46, Liverpool John Moores University47, University of Georgia48, Harvey Mudd College49, University of California, Irvine50, University of Groningen51, Centers for Disease Control and Prevention52, Biogen Idec53
02 Jan 2015-Science
TL;DR: To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, the authors sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila.
Abstract: Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.

476 citations



Journal ArticleDOI
TL;DR: In this paper, the authors show that BRCA1 is recruited to R-loops that form normally over a subset of transcription termination regions, where it mediates the recruitment of a specific, physiological binding partner, senataxin.

328 citations


Journal ArticleDOI
TL;DR: In this article, an ensemble of regression trees is used to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets.
Abstract: With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

Journal ArticleDOI
Jialiang Yang, Tao Huang, Francesca Petralia, Quan Long, Bin Zhang, Carmen Argmann, Yong Zhao, Charles V. Mobbs, Eric E. Schadt, Jun Zhu, Zhidong Tu, Kristin G. Ardlie, David S. DeLuca, Ayellet V. Segrè, Timothy J. Sullivan, Taylor Young, Ellen Gelfand, Casandra A. Trowbridge, Julian Maller, Taru Tukiainen, Monkol Lek, Lucas D. Ward, Pouya Kheradpour, Benjamin Iriarte, Yan Meng, Cameron D. Palmer, Wendy Winckler, Joel N. Hirschhorn, Manolis Kellis, Daniel G. MacArthur, Gad Getz, Andrey A. Shablin, Gen Li, Yi-Hui Zhou, Andrew B. Nobel, Ivan Rusyn, Fred A. Wright, Tuuli Lappalainen, Pedro G. Ferreira, Halit Ongen, Manuel A. Rivas, Alexis Battle, Sara Mostafavi, Jean Monlong, Michael Sammeth, Marta Melé, Ferran Reverter, Jakob Goldman, Daphne Koller, Roderic Guigó, Mark I. McCarthy, Emmanouil T. Dermitzakis, Eric R. Gamazon, Anuar Konkashbaev, Dan L. Nicolae, Nancy J. Cox, Timothée Flutre, Xiaoquan Wen, Matthew Stephens, Jonathan K. Pritchard, Luan Lin, Jun Liu, Amanda M. V. Brown, Bernadette Mestichelli, Denee Tidwell, Edmund Lo, Mike Salvatore, Saboor Shad, Jeffrey A. Thomas, John T. Lonsdale, Christopher Choi, Ellen Karasik, Kimberly Ramsey, Michael T. Moser, Barbara A. Foster, Bryan Gillard, John Syron, Johnelle Fleming, Harold Magazine, Rick Hasz, Gary Walters, Jason Bridge, Mark Miklos, Susan L. Sullivan, Laura Barker, Heather M. Traino, Magboeba Mosavel, Laura A. Siminoff, Dana R. Valley, Daniel C. Rohrer, Scott Jewel, Philip A. Branton, Leslie H. Sobin, Liqun Qi, Pushpa Hariharan, Shenpei Wu, David Tabor, Charles Shive, Anna M. Smith, Stephen A. Buia, Anita H. Undale, Karna Robinson, Nancy Roche, Kimberly M. Valentino, Angela Britton, Robin Burges, Debra Bradbury, Kenneth W. Hambright, John Seleski, Greg E. Korzeniewski, Kenyon Erickson, Yvonne Marcus, Jorge Tejada, Mehran Taherian, Chunrong Lu, Barnaby E. Robles, Margaret J. Basile, Deborah C. Mash, Simona Volpi, Jeff Struewing, Gary F. Temple, Joy T. Boyer, Deborah Colantuoni, Roger Little, Susan E. Koester, Latarsha J. Carithers, Helen M. Moore, Ping Guan, Carolyn C. Compton, Sherilyn Sawyer, Joanne P. Demchok, Jimmie B. Vaught, Chana A. Rabiner, Nicole C. Lockhart
TL;DR: In this article, the authors find that aging gene expression signatures are very tissue specific, although enrichment for some well-known aging components such as mitochondrial biology is observed in many tissues. Different levels of cross-tissue synchronization of age-related gene expression changes are observed, and some essential tissues (e.g., heart and lung) show much stronger "co-aging" than other tissues based on principal component analysis.
Abstract: Aging is one of the most important biological processes and is a known risk factor for many age-related diseases in humans. Studying age-related transcriptomic changes in tissues across the whole body can provide valuable information for a holistic understanding of this fundamental process. In this work, we catalogue age-related gene expression changes in nine tissues from nearly two hundred individuals collected by the Genotype-Tissue Expression (GTEx) project. In general, we find the aging gene expression signatures are very tissue specific. However, enrichment for some well-known aging components such as mitochondrial biology is observed in many tissues. Different levels of cross-tissue synchronization of age-related gene expression changes are observed, and some essential tissues (e.g., heart and lung) show much stronger "co-aging" than other tissues based on a principal component analysis. The aging gene signatures and complex disease genes show a complex overlapping pattern, and only in some cases do they overlap significantly in the tissues affected by the corresponding diseases. In summary, our analyses provide novel insights into the co-regulation of age-related gene expression in multiple tissues; they also present a tissue-specific view of the link between aging and age-related diseases.

Journal ArticleDOI
TL;DR: This work infers tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues, and shows that modules conserved across tissues are especially likely to have functions common to all tissues.
Abstract: To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue specific gene regulatory networks on the basis of co-expression between genes. However, a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, which allows exploration of gene function and regulation in a tissue-specific manner.

Journal ArticleDOI
TL;DR: Computational modeling of DNA and RNA targets of regulatory proteins is improved by a deep-learning approach and shows good results in terms of uniformity, accuracy, and efficiency.
Abstract: Computational modeling of DNA and RNA targets of regulatory proteins is improved by a deep-learning approach.

01 Mar 2015
TL;DR: A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci.
Abstract: Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions, finding major pathways contributing to risk, with some loci shared across immune disorders. To make genetic comparisons across autoimmune disorders as informative as possible, a dense genotyping array, the Immunochip, was developed, from which we identified four new T1D-associated regions (P < 5 × 10⁻⁸). A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci. Using a Bayesian approach, we defined credible sets for the T1D-associated SNPs. The associated SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal.

Journal ArticleDOI
TL;DR: It is concluded that intermediate DNA methylation is a conserved signature of gene regulation and exon usage, highlighting gene context-dependent functions.
Abstract: The role of intermediate methylation states in DNA is unclear. Here, to comprehensively identify regions of intermediate methylation and their quantitative relationship with gene activity, we apply integrative and comparative epigenomics to 25 human primary cell and tissue samples. We report 18,452 intermediate methylation regions located near 36% of genes and enriched at enhancers, exons and DNase I hypersensitivity sites. Intermediate methylation regions average 57% methylation, are predominantly allele-independent and are conserved across individuals and between mouse and human, suggesting a conserved function. These regions have an intermediate level of active chromatin marks and their associated genes have intermediate transcriptional activity. Exonic intermediate methylation correlates with exon inclusion at a level between that of fully methylated and unmethylated exons, highlighting gene context-dependent functions. We conclude that intermediate DNA methylation is a conserved signature of gene regulation and exon usage.

Journal ArticleDOI
TL;DR: ChromDiff is presented, a group-wise chromatin state comparison method that generates an information-theoretic representation of epigenomes and corrects for external covariate factors to better isolate relevant chromatin state changes.
Abstract: Epigenomic data sets provide critical information about the dynamic role of chromatin states in gene regulation, but a key question of how chromatin state segmentations vary under different conditions across the genome has remained unaddressed. Here we present ChromDiff, a group-wise chromatin state comparison method that generates an information-theoretic representation of epigenomes and corrects for external covariate factors to better isolate relevant chromatin state changes. By applying ChromDiff to the 127 epigenomes from the Roadmap Epigenomics and ENCODE projects, we provide novel group-wise comparative analyses across sex, tissue type, state and developmental age. Remarkably, we find that distinct sets of epigenomic features are maximally discriminative for different group-wise comparisons, in each case revealing distinct enriched pathways, many of which do not show gene expression differences. Our methodology should be broadly applicable for epigenomic comparisons and provides a powerful new tool for studying chromatin state differences at the genome scale.

Journal ArticleDOI
TL;DR: A new and highly effective method for gene tree error correction in the presence of horizontal gene transfer is introduced and it is shown that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that this method dramatically improves gene tree accuracy.
Abstract: Motivation: The accurate inference of gene trees is a necessary step in many evolutionary studies. While the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. Results: In this work, we introduce a new and highly effective method for gene tree error-correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications, and losses, and uses a statistical hypothesis testing framework (Shimodaira-Hasegawa test) to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. Availability: An implementation of our method is available at


Journal ArticleDOI
TL;DR: The role of DNA methylation in Alzheimer's disease is explored and causal models to assess its role in the pathology of AD are examined.
Abstract: Objective: We explore the role of DNA methylation in Alzheimer’s disease (AD). To elucidate where DNA methylation falls along the causal pathway linking risk factors to disease, we examine causal models to assess its role in the pathology of AD. Methods: DNA methylation profiles were generated in 740 brain samples using the Illumina HumanMet450K beadset. We focused our analysis on CpG sites from 11 AD susceptibility gene regions. The primary outcome was a quantitative measure of neuritic amyloid plaque (NP), a key early element of AD pathology. We tested four causal models: (1) independent associations, (2) CpG mediating the association of a variant, (3) reverse causality, and (4) genetic variant by CpG interaction. Results: Six gene regions (17 CpGs) showed evidence of CpG associations with NP, independent of genetic variation – BIN1 (5), CLU (5), MS4A6A (3), ABCA7 (2), CD2AP (1), and APOE (1). Together they explained 16.8% of the variability in NP. An interaction effect was seen in the CR1 region for two CpGs, cg10021878 (P = 0.01) and cg05922028 (P = 0.001), in relation to NP. In both cases, among subjects with the rs6656401 AT/AA risk genotype, more methylation is associated with greater NP burden, whereas among subjects with the rs6656401 TT protective genotype the association is inverse, with more methylation associated with less NP. Interpretation: These observations suggest that, within known AD susceptibility loci, methylation is related to pathologic processes of AD and may play a largely independent role by influencing gene expression in AD susceptibility loci.

01 Apr 2015
TL;DR: This article explores the role of DNA methylation in Alzheimer's disease (AD), examining causal models of its role in AD pathology with DNA methylation profiles generated in 740 brain samples using the Illumina HumanMet450K beadset.

01 Aug 2015
TL;DR: It is shown that PRC2 is required to maintain expression of maternal microRNAs and long non-coding RNAs from the Gtl2-Rian-Mirg locus, which is essential for full pluripotency of iPSCs.
Abstract: Polycomb Repressive Complex 2 (PRC2) function and DNA methylation (DNAme) are typically correlated with gene repression. Here, we show that PRC2 is required to maintain expression of maternal microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) from the Gtl2-Rian-Mirg locus, which is essential for full pluripotency of iPSCs. In the absence of PRC2, the entire locus becomes transcriptionally repressed due to gain of DNAme at the intergenic differentially methylated regions (IG-DMRs). Furthermore, we demonstrate that the IG-DMR serves as an enhancer of the maternal Gtl2-Rian-Mirg locus. Further analysis reveals that PRC2 interacts physically with Dnmt3 methyltransferases and reduces recruitment to and subsequent DNAme at the IG-DMR, thereby allowing for proper expression of the maternal Gtl2-Rian-Mirg locus. Our observations are consistent with a mechanism through which PRC2 counteracts the action of Dnmt3 methyltransferases at an imprinted locus required for full pluripotency.

Journal ArticleDOI
TL;DR: A phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses.
Abstract: Background: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. Results: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. Conclusions: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.

Journal ArticleDOI
TL;DR: This work comprehensively examines TALE–DNA interactions by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs), and develops a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE.
Abstract: Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
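To make the "seemingly simple DNA-binding code" concrete, the sketch below applies the commonly cited canonical mapping from repeat-variable diresidues (RVDs) to preferred bases and ranks candidate sites by mismatch count. It is not SIFTED and deliberately ignores the protein context effects the abstract reports, so it illustrates the baseline that the paper argues is incomplete. The function names and the example RVD array and sites are hypothetical.

```python
# Canonical RVD -> preferred base mapping. NN is listed as G here although it
# is also known to tolerate A; that simplification is an assumption of this
# sketch.
CANONICAL_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def predicted_target(rvds):
    """Translate a list of RVDs into the DNA sequence the simple code predicts."""
    return "".join(CANONICAL_CODE[rvd] for rvd in rvds)


def mismatches(rvds, site):
    """Count positions where `site` differs from the predicted target."""
    if len(site) != len(rvds):
        raise ValueError("site length must equal the number of RVDs")
    target = predicted_target(rvds)
    return sum(1 for expected, observed in zip(target, site)
               if expected != observed)


def rank_sites(rvds, sites):
    """Order candidate sites from fewest to most mismatches."""
    return sorted(sites, key=lambda site: mismatches(rvds, site))


# Hypothetical five-repeat TALE and three candidate sites.
rvds = ["NI", "HD", "NG", "NN", "HD"]
candidates = ["GGTGC", "ACTAC", "ACTGC"]
print(predicted_target(rvds))
print(rank_sites(rvds, candidates))
```

A mismatch count treats every position and every substitution as equally costly, which is the kind of simplification that makes off-target binding hard to predict from the code alone.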

Posted Content
TL;DR: This paper proposes a network alignment framework that uses an orthogonal relaxation of the underlying QAP in a maximum weight bipartite matching optimization, and generalizes the objective function of the network alignment problem to consider both matched and mismatched interactions in a standard QAP formulation.
Abstract: Network alignment refers to the problem of finding a bijective mapping across vertices of two graphs to maximize the number of overlapping edges and/or to minimize the number of mismatched interactions across networks. This problem arises in many fields such as computational biology, social sciences and computer vision and is often cast as an expensive quadratic assignment problem (QAP). Although spectral methods have received significant attention in different network science problems such as network clustering, the use of spectral techniques in the network alignment problem has been limited partially owing to the lack of principled connections between spectral methods and relaxations of the network alignment optimization. In this paper, we propose a network alignment framework that uses an orthogonal relaxation of the underlying QAP in a maximum weight bipartite matching optimization. Our method takes into account the ellipsoidal level sets of the quadratic objective function by exploiting eigenvalues and eigenvectors of (transformations of) adjacency graphs. Our framework not only can be employed to provide a theoretical justification for existing heuristic spectral network alignment methods, but it also leads to a new scalable network alignment algorithm which outperforms existing ones over various synthetic and real networks. Moreover, we generalize the objective function of the network alignment problem to consider both matched and mismatched interactions in a standard QAP formulation. This can be critical in applications where networks have low similarity and therefore we expect more mismatches than matches. We assess the effectiveness of our proposed method theoretically for certain classes of networks, through simulations over various synthetic network models, and in two real-data applications: in comparative analysis of gene regulatory networks across human, fly and worm, and in user de-anonymization over Twitter follower subgraphs.
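The objective described above, rewarding overlapping edges and optionally penalizing mismatched interactions under a bijective mapping, can be illustrated on toy graphs. The sketch below scores a mapping and finds the best one by exhaustive search over permutations. Exhaustive search is only a stand-in that is feasible for a handful of nodes; it is not the orthogonal relaxation or the spectral algorithm the paper proposes. The function names, the `mismatch_penalty` parameter, and the example graphs are hypothetical.

```python
from itertools import combinations, permutations


def alignment_score(edges_a, edges_b, mapping, mismatch_penalty=0.0):
    """Score a bijection `mapping` from nodes of graph A to nodes of graph B.

    A node pair is a match if it is an edge in both graphs, and a mismatch if
    it is an edge in exactly one of them.
    """
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    matches = 0
    mismatched = 0
    for u, v in combinations(list(mapping), 2):
        in_a = frozenset((u, v)) in set_a
        in_b = frozenset((mapping[u], mapping[v])) in set_b
        if in_a and in_b:
            matches += 1
        elif in_a != in_b:
            mismatched += 1
    return matches - mismatch_penalty * mismatched


def best_alignment(nodes_a, nodes_b, edges_a, edges_b, mismatch_penalty=0.0):
    """Exhaustively search all bijections; factorial time, toy sizes only."""
    best_score, best_mapping = None, None
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        score = alignment_score(edges_a, edges_b, mapping, mismatch_penalty)
        if best_score is None or score > best_score:
            best_score, best_mapping = score, mapping
    return best_score, best_mapping


# Two four-node path graphs with different node labels.
nodes_a = [0, 1, 2, 3]
edges_a = [(0, 1), (1, 2), (2, 3)]
nodes_b = ["w", "x", "y", "z"]
edges_b = [("y", "w"), ("w", "z"), ("z", "x")]

score, mapping = best_alignment(nodes_a, nodes_b, edges_a, edges_b,
                                mismatch_penalty=1.0)
print(score, mapping)
```

Because the search visits every permutation, its cost grows factorially with the number of nodes, which is why relaxations such as the one described in the abstract are needed for networks of realistic size.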

Posted Content
TL;DR: Questions about how to promote high-risk new research directions and broaden the reach of information theory, while continuing to be true to its ideals and insisting on the intellectual rigor that makes its breakthroughs so powerful are explored.
Abstract: Information theory is rapidly approaching its 70th birthday. What are promising future directions for research in information theory? Where will information theory be having the most impact in 10-20 years? What new and emerging areas are ripe for the most impact, of the sort that information theory has had on the telecommunications industry over the last 60 years? How should the IEEE Information Theory Society promote high-risk new research directions and broaden the reach of information theory, while continuing to be true to its ideals and insisting on the intellectual rigor that makes its breakthroughs so powerful? These are some of the questions that an ad hoc committee (composed of the present authors) explored over the past two years. We have discussed and debated these questions, and solicited detailed inputs from experts in fields including genomics, biology, economics, and neuroscience. This report is the result of these discussions.

Journal ArticleDOI
TL;DR: Six numbers pertaining to the performance of network deconvolution (ND) relative to other methods were incorrect and the corrected numbers show that the method performs better than had been reported.
Abstract: Nat. Biotechnol. 31, 726–733 (2013); published online 14 July 2013; corrected after print 7 April 2015. In the version of this article initially published, six numbers pertaining to the performance of network deconvolution (ND) relative to other methods were incorrect. The corrected numbers (see table below) show that our method performs better than had been reported.

Journal ArticleDOI
TL;DR: The 4th edition of the Joint RECOMB Conference on Systems Biology, Regulatory Genomics, and DREAM Challenges was held in Barcelona, Spain on October 14–19, 2011 and brought together computational and experimental scientists to discuss current research directions and latest findings, and to establish new collaborations towards a systems-level understanding of gene regulation and modeling of biological systems.
Abstract: Over the past 10 years, the study of cell regulatory processes and their integration within complex “systems-level” models of cell physiology and cell pathology has flourished, with a geometric increase in scientific publications and impact on biology. Within the broad spectrum of molecular biology disciplines, systems biology and regulatory genomics are perhaps the ones that have been most characterized by the seamless and unique integration of computational and experimental sciences, allowing the rapid transformation of high-throughput data into complex computational models, of models into testable hypotheses, and finally of hypotheses into knowledge via experimental validation. Today, these disciplines are achieving maturity, as also demonstrated by the creation of several university departments, centers, and institutes dedicated to their study and by the popularity and growth of meetings such as the RECOMB Conference on Systems Biology, Regulatory Genomics, and DREAM Challenges. This event, which is currently in its fourth edition as a joint meeting, is particularly relevant as it combines unique computational and experimental perspectives, while also establishing a unique frame of reference, via the DREAM challenges, to objectively gauge the progress of our ability to dissect regulatory networks and to model biological processes. The 4th edition of the Joint RECOMB Conference on Systems Biology, Regulatory Genomics, and DREAM Challenges was held in Barcelona, Spain on October 14–19, 2011. The conference brought together computational and experimental scientists to discuss current research directions and latest findings, and to establish new collaborations towards a systems-level understanding of gene regulation and modeling of biological systems. The conference included oral presentations from accepted full-length manuscripts and from a few high-quality abstracts, as well as invited presentations from thought leaders in the field. Accepted full-length manuscripts that constitute significant theoretical advances to the fields of systems biology and regulatory genomics have been combined in a collection that is presented in the current issue of the Journal of Computational Biology.

Posted ContentDOI
03 Nov 2015-bioRxiv
TL;DR: This work used computational machine learning algorithms on five histone modifications to predict gene expression in a variety of samples, revealing a high predictive accuracy, especially in cell cultures, with predictive ability dependent on sample type and anatomy.
Abstract: Here, we predict gene expression from epigenetic features based on public data available through the Epigenome Roadmap Project. This rich new dataset includes samples from primary tissues, which to our knowledge have not previously been studied in this context. Specifically, we used computational machine learning algorithms on five histone modifications to predict gene expression in a variety of samples. Our models reveal a high predictive accuracy, especially in cell cultures, with predictive ability dependent on sample type and anatomy. The relative importance of each histone mark feature varied across samples. We localized each histone mark signal to its relevant region, revealing that chromatin state enrichment varies greatly between histone marks. Our results provide several novel insights into epigenetic regulation of transcription in new contexts.