Showing papers in "Nature Genetics in 2011"


Journal ArticleDOI
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations


Journal ArticleDOI
Xiaowu Wang1, Hanzhong Wang, Jun Wang2, Jun Wang3, Jun Wang4, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai2, Jeong-Hwan Mun5, Ian Bancroft6, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang2, Xiyin Wang7, Xiyin Wang8, Michael Freeling9, J. Chris Pires10, Andrew H. Paterson8, Boulos Chalhoub, Bo Wang2, Alice Hayward11, Alice Hayward12, Andrew G. Sharpe13, Beom-Seok Park5, Bernd Weisshaar14, Binghang Liu2, Bo Li2, Bo Liu, Chaobo Tong, Chi Song2, Chris Duran15, Chris Duran11, Chunfang Peng2, Geng Chunyu2, Chushin Koh13, Chuyu Lin2, David Edwards11, David Edwards15, Desheng Mu2, Di Shen, Eleni Soumpourou6, Fei Li, Fiona Fraser6, Gavin C. Conant10, Gilles Lassalle16, Graham J.W. King3, Guusje Bonnema17, Haibao Tang9, Haiping Wang, Harry Belcram, Heling Zhou2, Hideki Hirakawa, Hiroshi Abe, Hui Guo8, Hui Wang, Huizhe Jin8, Isobel A. P. Parkin18, Jacqueline Batley12, Jacqueline Batley11, Jeong-Sun Kim5, Jérémy Just, Jianwen Li2, Jiaohui Xu2, Jie Deng, Jin A Kim5, Jingping Li8, Jingyin Yu, Jinling Meng19, Jinpeng Wang7, Jiumeng Min2, Julie Poulain20, Katsunori Hatakeyama, Kui Wu2, Li Wang7, Lu Fang, Martin Trick6, Matthew G. Links18, Meixia Zhao, Mina Jin5, Nirala Ramchiary21, Nizar Drou22, Paul J. Berkman15, Paul J. Berkman11, Qingle Cai2, Quanfei Huang2, Ruiqiang Li2, Satoshi Tabata, Shifeng Cheng2, Shu Zhang2, Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon5, Su-Ryun Choi21, Tae-Ho Lee8, Wei Fan2, Xiang Zhao2, Xu Tan8, Xun Xu2, Yan Wang, Yang Qiu, Ye Yin2, Yingrui Li2, Yongchen Du, Yongcui Liao, Yong Pyo Lim21, Yoshihiro Narusaka, Yupeng Wang7, Zhenyi Wang7, Zhenyu Li2, Zhiwen Wang2, Zhiyong Xiong10, Zhonghua Zhang 
TL;DR: The authors report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, and use Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.
Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations


Journal ArticleDOI
Paul Hollingworth1, Denise Harold1, Rebecca Sims1, Amy Gerrish1 +174 more (59 institutions)
TL;DR: Meta-analyses of all data provided compelling evidence that ABCA7 and the MS4A gene cluster are new Alzheimer's disease susceptibility loci and independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance.
Abstract: We sought to identify new susceptibility loci for Alzheimer's disease through a staged association study (GERAD+) and by testing suggestive loci reported by the Alzheimer's Disease Genetic Consortium (ADGC) in a companion paper. We undertook a combined analysis of four genome-wide association datasets (stage 1) and identified ten newly associated variants with P ≤ 1 × 10−5. We tested these variants for association in an independent sample (stage 2). Three SNPs at two loci replicated and showed evidence for association in a further sample (stage 3). Meta-analyses of all data provided compelling evidence that ABCA7 (rs3764650, meta P = 4.5 × 10−17; including ADGC data, meta P = 5.0 × 10−21) and the MS4A gene cluster (rs610932, meta P = 1.8 × 10−14; including ADGC data, meta P = 1.2 × 10−16) are new Alzheimer's disease susceptibility loci. We also found independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance: CD2AP (GERAD+, P = 8.0 × 10−4; including ADGC data, meta P = 8.6 × 10−9), CD33 (GERAD+, P = 2.2 × 10−4; including ADGC data, meta P = 1.6 × 10−9) and EPHA1 (GERAD+, P = 3.4 × 10−4; including ADGC data, meta P = 6.0 × 10−10).

1,771 citations


Journal ArticleDOI
Adam C. Naj1, Gyungah Jun2, Gary W. Beecham1, Li-San Wang3 +153 more (38 institutions)
TL;DR: The Alzheimer Disease Genetics Consortium performed a genome-wide association study of late-onset Alzheimer disease using a three-stage design consisting of a discovery stage (stage 1) and two replication stages (stages 2 and 3), analyzed with both joint analysis and meta-analysis approaches.
Abstract: The Alzheimer Disease Genetics Consortium (ADGC) performed a genome-wide association study of late-onset Alzheimer disease using a three-stage design consisting of a discovery stage (stage 1) and two replication stages (stages 2 and 3). Both joint analysis and meta-analysis approaches were used. We obtained genome-wide significant results at MS4A4A (rs4938933; stages 1 and 2, meta-analysis P (P(M)) = 1.7 × 10(-9), joint analysis P (P(J)) = 1.7 × 10(-9); stages 1, 2 and 3, P(M) = 8.2 × 10(-12)), CD2AP (rs9349407; stages 1, 2 and 3, P(M) = 8.6 × 10(-9)), EPHA1 (rs11767557; stages 1, 2 and 3, P(M) = 6.0 × 10(-10)) and CD33 (rs3865444; stages 1, 2 and 3, P(M) = 1.6 × 10(-9)). We also replicated previous associations at CR1 (rs6701713; P(M) = 4.6 × 10(-10), P(J) = 5.2 × 10(-11)), CLU (rs1532278; P(M) = 8.3 × 10(-8), P(J) = 1.9 × 10(-8)), BIN1 (rs7561528; P(M) = 4.0 × 10(-14), P(J) = 5.2 × 10(-14)) and PICALM (rs561655; P(M) = 7.0 × 10(-11), P(J) = 1.0 × 10(-10)), but not at EXOC3L2, to late-onset Alzheimer's disease susceptibility.

1,743 citations


Journal ArticleDOI
TL;DR: This paper performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals.
Abstract: We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10(-8) and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.

1,705 citations
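
Note: the abstract above combines evidence from 14 studies but does not say which meta-analysis model was used. As a minimal sketch, assuming a fixed-effect inverse-variance-weighted model (a common choice for GWAS, not confirmed by this listing), per-study effect estimates can be combined as follows. The effect sizes and standard errors are hypothetical illustration values, not data from the paper; only the 5 × 10(-8) threshold comes from the abstract.

```python
import math

# Genome-wide significance threshold quoted in the abstract.
GENOME_WIDE_P = 5e-8


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis (assumed model).

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-study log odds ratios and standard errors.
betas = [0.10, 0.12, 0.08]
ses = [0.04, 0.05, 0.03]

beta, se, z, p = inverse_variance_meta(betas, ses)
is_genome_wide = p < GENOME_WIDE_P
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}, p={p:.2e}")
```

The pooled standard error is smaller than that of any single study, which is why combining cohorts can push a modest signal toward the genome-wide threshold.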


Journal ArticleDOI
TL;DR: The authors examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects.
Abstract: We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 x 10(-11)) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 x 10(-9)), ANK3 (rs10994359, P = 2.5 x 10(-8)) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 x 10(-9)).

1,671 citations


Journal ArticleDOI
Pamela Sklar1, Pamela Sklar2, Stephan Ripke3, Stephan Ripke2 +189 more (51 institutions)
TL;DR: An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4, and a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals was identified.
Abstract: We conducted a combined genome-wide association study (GWAS) of 7,481 individuals with bipolar disorder (cases) and 9,250 controls as part of the Psychiatric GWAS Consortium. Our replication study tested 34 SNPs in 4,496 independent cases with bipolar disorder and 42,422 independent controls and found that 18 of 34 SNPs had P < 0.05, with 31 of 34 SNPs having signals with the same direction of effect (P = 3.8 × 10−7). An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4. We identified a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals. Finally, a combined GWAS analysis of schizophrenia and bipolar disorder yielded strong association evidence for SNPs in CACNA1C and in the region of NEK4-ITIH1-ITIH3-ITIH4. Our replication results imply that increasing sample sizes in bipolar disorder will confirm many additional loci.

1,312 citations
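
Note: the abstract above reports that 31 of 34 replication SNPs had the same direction of effect (P = 3.8 × 10−7) without naming the test. As a worked check, assuming a one-sided binomial sign test with a 50% chance of agreement per SNP under the null (an assumption, not something the listing states), the tail probability can be computed exactly:

```python
from math import comb


def sign_test_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5): one-sided sign test."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Counts taken from the abstract: 31 of 34 SNPs share the direction of effect.
p_sign = sign_test_upper_tail(31, 34)
print(f"{p_sign:.2e}")  # prints 3.83e-07
```

Under that assumption the result agrees with the reported value to two significant figures.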


Journal ArticleDOI
Carl A. Anderson1, Gabrielle Boucher2, Charlie W. Lees3, Andre Franke4, Mauro D'Amato5, Kent D. Taylor6, James Lee7, Philippe Goyette2, Marcin Imielinski8, Anna Latiano9, Caroline Lagacé2, Regan Scott10, Leila Amininejad11, Suzannah Bumpstead1, Leonard Baidoo10, Robert N. Baldassano8, Murray L. Barclay12, Theodore M. Bayless13, Stephan Brand14, Carsten Büning15, Jean-Frederic Colombel16, Lee A. Denson17, Martine De Vos18, Marla Dubinsky6, Cathryn Edwards19, David Ellinghaus4, Rudolf S N Fehrmann20, James A B Floyd1, Timothy H. Florin21, Denis Franchimont11, Lude Franke20, Michel Georges22, Jürgen Glas14, Nicole L. Glazer23, Stephen L. Guthery24, Talin Haritunians6, Nicholas K. Hayward25, Jean-Pierre Hugot26, Gilles Jobin2, Debby Laukens18, Ian C. Lawrance27, Marc Lémann26, Arie Levine28, Cécile Libioulle22, Edouard Louis22, Dermot P.B. McGovern6, Monica Milla, Grant W. Montgomery25, Katherine I. Morley1, Craig Mowat29, Aylwin Ng30, William G. Newman31, Roel A. Ophoff32, Laura Papi33, Orazio Palmieri9, Laurent Peyrin-Biroulet, Julián Panés, Anne M. Phillips29, Natalie J. Prescott34, Deborah D. Proctor35, Rebecca L. Roberts12, Richard K Russell36, Paul Rutgeerts37, Jeremy D. Sanderson38, Miquel Sans39, Philip Schumm40, Frank Seibold41, Yashoda Sharma35, Lisa A. Simms25, Mark Seielstad42, Mark Seielstad43, A. Hillary Steinhart44, Stephan R. Targan6, Leonard H. van den Berg32, Morten H. Vatn45, Hein W. Verspaget46, Thomas D. Walters44, Cisca Wijmenga20, David C. Wilson3, Harm-Jan Westra20, Ramnik J. Xavier30, Zhen Zhen Zhao25, Cyriel Y. Ponsioen47, Vibeke Andersen48, Leif Törkvist5, Maria Gazouli49, Nicholas P. Anagnou49, Tom H. Karlsen45, Limas Kupčinskas50, Jurgita Sventoraityte50, John C. Mansfield51, Subra Kugathasan52, Mark S. Silverberg44, Jonas Halfvarson53, Jerome I. Rotter6, Christopher G. Mathew34, Anne M. Griffiths44, Richard B. Gearry12, Tariq Ahmad, Steven R. Brant13, Mathias Chamaillard54, Jack Satsangi3, Judy H. Cho35, Stefan Schreiber4, Mark J. Daly30, Jeffrey C. Barrett1, Miles Parkes7, Vito Annese9, Hakon Hakonarson55, Graham L. Radford-Smith25, Richard H. Duerr10, Severine Vermeire37, Rinse K. Weersma20, John D. Rioux2
Wellcome Trust Sanger Institute1, Université de Montréal2, University of Edinburgh3, University of Kiel4, Karolinska Institutet5, Cedars-Sinai Medical Center6, University of Cambridge7, University of Pennsylvania8, Casa Sollievo della Sofferenza9, University of Pittsburgh10, Université libre de Bruxelles11, University of Otago12, Johns Hopkins University13, Ludwig Maximilian University of Munich14, Charité15, Lille University of Science and Technology16, Cincinnati Children's Hospital Medical Center17, Ghent University18, Torbay Hospital19, University of Groningen20, Mater Health Services21, University of Liège22, University of Washington23, University of Utah24, QIMR Berghofer Medical Research Institute25, University of Paris26, University of Western Australia27, Tel Aviv University28, University of Dundee29, Harvard University30, University of Manchester31, Utrecht University32, University of Florence33, King's College London34, Yale University35, Royal Hospital for Sick Children36, Katholieke Universiteit Leuven37, Guy's and St Thomas' NHS Foundation Trust38, University of Barcelona39, University of Chicago40, University of Bern41, University of California, San Francisco42, Agency for Science, Technology and Research43, University of Toronto44, University of Oslo45, Leiden University46, University of Amsterdam47, Aarhus University48, National and Kapodistrian University of Athens49, Lithuanian University of Health Sciences50, Newcastle University51, Emory University52, Örebro University53, French Institute of Health and Medical Research54, Center for Applied Genomics55
TL;DR: A meta-analysis of six ulcerative colitis genome-wide association study datasets found many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1.
Abstract: Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10(-8)), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.

1,291 citations


Journal ArticleDOI
TL;DR: This evolving CNV morbidity map, combined with exome and genome sequencing, will be critical for deciphering the genetic basis of developmental delay, intellectual disability and autism spectrum disorders.
Abstract: To understand the genetic heterogeneity underlying developmental delay, we compared copy number variants (CNVs) in 15,767 children with intellectual disability and various congenital defects (cases) to CNVs in 8,329 unaffected adult controls. We estimate that ∼14.2% of disease in these children is caused by CNVs >400 kb. We observed a greater enrichment of CNVs in individuals with craniofacial anomalies and cardiovascular defects compared to those with epilepsy or autism. We identified 59 pathogenic CNVs, including 14 new or previously weakly supported candidates, refined the critical interval for several genomic disorders, such as the 17q21.31 microdeletion syndrome, and identified 940 candidate dosage-sensitive genes. We also developed methods to opportunistically discover small, disruptive CNVs within the large and growing diagnostic array datasets. This evolving CNV morbidity map, combined with exome and genome sequencing, will be critical for deciphering the genetic basis of developmental delay, intellectual disability and autism spectrum disorders.

1,190 citations


Journal ArticleDOI
TL;DR: The results show that trio-based exome sequencing is a powerful approach for identifying new candidate genes for ASDs and suggest that de novo mutations may contribute substantially to the genetic etiology of ASDs.
Abstract: Evidence for the etiology of autism spectrum disorders (ASDs) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity. We sequenced the exomes of 20 individuals with sporadic ASD (cases) and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, 11 of which were protein altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4 out of 20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 missense variant, and we provide functional support for a multi-hit model for disease risk. Our results show that trio-based exome sequencing is a powerful approach for identifying new candidate genes for ASDs and suggest that de novo mutations may contribute substantially to the genetic etiology of ASDs.

1,116 citations
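
Note: the trio design in the abstract above rests on one core comparison: a de novo candidate is a variant called in the affected child but in neither parent. A minimal sketch of that set logic follows. The variant tuples are hypothetical, and this is not the authors' pipeline; a real analysis would also need to account for sequencing coverage and genotype quality in both parents and confirm candidates independently.

```python
def de_novo_candidates(child, mother, father):
    """Return variants called in the child but in neither parent.

    Each variant is a hashable tuple: (chromosome, position, ref, alt).
    """
    return sorted(set(child) - set(mother) - set(father))


# Hypothetical variant calls for one trio.
child_calls = {
    ("chr1", 1000, "A", "G"),  # inherited from mother
    ("chr2", 2000, "C", "T"),  # absent from both parents
    ("chr3", 3000, "G", "A"),  # inherited from father
}
mother_calls = {("chr1", 1000, "A", "G")}
father_calls = {("chr3", 3000, "G", "A")}

candidates = de_novo_candidates(child_calls, mother_calls, father_calls)
print(candidates)
```

Only the chr2 variant survives the filter here, since the other two child calls each appear in one parent.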


Journal ArticleDOI
TL;DR: New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted, and macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes.
Abstract: The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

Journal ArticleDOI
TL;DR: Stochastic methylation variation of the same cDMRs, distinguishing cancer from normal tissue, is shown in colon, lung, breast, thyroid and Wilms' tumors, with intermediate variation in adenomas.
Abstract: Tumor heterogeneity is a major barrier to effective cancer diagnosis and treatment. We recently identified cancer-specific differentially DNA-methylated regions (cDMRs) in colon cancer, which also distinguish normal tissue types from each other, suggesting that these cDMRs might be generalized across cancer types. Here we show stochastic methylation variation of the same cDMRs, distinguishing cancer from normal tissue, in colon, lung, breast, thyroid and Wilms' tumors, with intermediate variation in adenomas. Whole-genome bisulfite sequencing shows these variable cDMRs are related to loss of sharply delimited methylation boundaries at CpG islands. Furthermore, we find hypomethylation of discrete blocks encompassing half the genome, with extreme gene expression variability. Genes associated with the cDMRs and large blocks are involved in mitosis and matrix remodeling, respectively. We suggest a model for cancer involving loss of epigenetic stability of well-defined genomic domains that underlies increased methylation variability in cancer that may contribute to tumor heterogeneity.

Journal ArticleDOI
TL;DR: In this article, an ultra-high-density array that tiles the promoters of 56 cell-cycle genes was used to interrogate 108 samples representing diverse perturbations, identifying 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle.
Abstract: Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.

Journal ArticleDOI
TL;DR: The majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome are described, their effects on gene function, and the patterns of local and global linkage among these variants.
Abstract: The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.

Journal ArticleDOI
TL;DR: It is found that in some cancer cells a relatively large amount of glycolytic carbon is diverted into serine and glycine metabolism through phosphoglycerate dehydrogenase (PHGDH).
Abstract: Jason Locasale, Lewis Cantley, Matthew Vander Heiden and colleagues show that PHGDH is amplified in some human cancers and diverts a relatively large amount of glycolytic carbon into serine and glycine biosynthesis. They further show that PHGDH-amplified cancer cells become dependent on PHGDH for their growth, suggesting that the altered metabolic flux driven by this amplification contributes to oncogenesis. Most tumors exhibit increased glucose metabolism to lactate; however, the extent to which glucose-derived metabolic fluxes are used for alternative processes is poorly understood. Using a metabolomics approach with isotope labeling, we found that in some cancer cells a relatively large amount of glycolytic carbon is diverted into serine and glycine metabolism through phosphoglycerate dehydrogenase (PHGDH). An analysis of human cancers showed that PHGDH is recurrently amplified in a genomic region of focal copy number gain most commonly found in melanoma. Decreasing PHGDH expression impaired proliferation in amplified cell lines. Increased expression was also associated with breast cancer subtypes, and ectopic expression of PHGDH in mammary epithelial cells disrupted acinar morphogenesis and induced other phenotypic alterations that may predispose cells to transformation. Our findings show that the diversion of glycolytic flux into a specific alternate pathway can be selected during tumor development and may contribute to the pathogenesis of human cancer.

Journal ArticleDOI
TL;DR: By combining next-generation sequencing and copy number analysis, it is shown that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case and novel dysregulated pathways underlying its pathogenesis are identified.
Abstract: Diffuse large B-cell lymphoma (DLBCL) is the most common form of human lymphoma. Although a number of structural alterations have been associated with the pathogenesis of this malignancy, the full spectrum of genetic lesions that are present in the DLBCL genome, and therefore the identity of dysregulated cellular pathways, remains unknown. By combining next-generation sequencing and copy number analysis, we show that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case. This analysis also revealed mutations in genes not previously implicated in DLBCL pathogenesis, including those regulating chromatin methylation (MLL2; 24% of samples) and immune recognition by T cells. These results provide initial data on the complexity of the DLBCL coding genome and identify novel dysregulated pathways underlying its pathogenesis.

Journal ArticleDOI
TL;DR: Overall, GWAS results show that variations at the liguleless genes have contributed to more upright leaves, and the use of GWAS with specially designed mapping populations is effective in uncovering the basis of key agronomic traits.
Abstract: US maize yield has increased eight-fold in the past 80 years, with half of the gain attributed to selection by breeders. During this time, changes in maize leaf angle and size have altered plant architecture, allowing more efficient light capture as planting density has increased. Through a genome-wide association study (GWAS) of the maize nested association mapping panel, we determined the genetic basis of important leaf architecture traits and identified some of the key genes. Overall, we demonstrate that the genetic architecture of the leaf traits is dominated by small effects, with little epistasis, environmental interaction or pleiotropy. In particular, GWAS results show that variations at the liguleless genes have contributed to more upright leaves. These results demonstrate that the use of GWAS with specially designed mapping populations is effective in uncovering the basis of key agronomic traits.

Journal ArticleDOI
TL;DR: Up to 95% of de novo genomic binding by the glucocorticoid receptor, a paradigmatic ligand-activated transcription factor, is targeted to preexisting foci of accessible chromatin, defining a framework for understanding regulatory factor–genome interactions and providing a molecular basis for the tissue selectivity of steroid pharmaceuticals and other agents that intersect the living genome.
Abstract: Development, differentiation and response to environmental stimuli are characterized by sequential changes in cellular state initiated by the de novo binding of regulated transcriptional factors to their cognate genomic sites. The mechanism whereby a given regulatory factor selects a limited number of in vivo targets from a myriad of potential genomic binding sites is undetermined. Here we show that up to 95% of de novo genomic binding by the glucocorticoid receptor, a paradigmatic ligand-activated transcription factor, is targeted to preexisting foci of accessible chromatin. Factor binding invariably potentiates chromatin accessibility. Cell-selective glucocorticoid receptor occupancy patterns appear to be comprehensively predetermined by cell-specific differences in baseline chromatin accessibility patterns, with secondary contributions from local sequence features. The results define a framework for understanding regulatory factor-genome interactions and provide a molecular basis for the tissue selectivity of steroid pharmaceuticals and other agents that intersect the living genome.

Journal ArticleDOI
TL;DR: The results provide further evidence that a substantial proportion of heritability is captured by common SNPs, that height, BMI and QTi are highly polygenic traits, and that the additive variation explained by a part of the genome is approximately proportional to the total length of DNA contained within genes therein.
Abstract: We estimate and partition genetic variation for height, body mass index (BMI), von Willebrand factor and QT interval (QTi) using 586,898 SNPs genotyped on 11,586 unrelated individuals. We estimate that ∼45%, ∼17%, ∼25% and ∼21% of the variance in height, BMI, von Willebrand factor and QTi, respectively, can be explained by all autosomal SNPs and a further ∼0.5-1% can be explained by X chromosome SNPs. We show that the variance explained by each chromosome is proportional to its length, and that SNPs in or near genes explain more variation than SNPs between genes. We propose a new approach to estimate variation due to cryptic relatedness and population stratification. Our results provide further evidence that a substantial proportion of heritability is captured by common SNPs, that height, BMI and QTi are highly polygenic traits, and that the additive variation explained by a part of the genome is approximately proportional to the total length of DNA contained within genes therein.

Journal ArticleDOI
TL;DR: A BAP1-related cancer syndrome is identified that is characterized by mesothelioma and uveal melanoma, and it is hypothesized that other cancers may also be involved and that mesothelioma predominates upon asbestos exposure.
Abstract: Because only a small fraction of asbestos-exposed individuals develop malignant mesothelioma, and because mesothelioma clustering is observed in some families, we searched for genetic predisposing factors. We discovered germline mutations in the gene encoding BRCA1 associated protein-1 (BAP1) in two families with a high incidence of mesothelioma, and we observed somatic alterations affecting BAP1 in familial mesotheliomas, indicating biallelic inactivation. In addition to mesothelioma, some BAP1 mutation carriers developed uveal melanoma. We also found germline BAP1 mutations in 2 of 26 sporadic mesotheliomas; both individuals with mutant BAP1 were previously diagnosed with uveal melanoma. We also observed somatic truncating BAP1 mutations and aberrant BAP1 expression in sporadic mesotheliomas without germline mutations. These results identify a BAP1-related cancer syndrome that is characterized by mesothelioma and uveal melanoma. We hypothesize that other cancers may also be involved and that mesothelioma predominates upon asbestos exposure. These findings will help to identify individuals at high risk of mesothelioma who could be targeted for early intervention.

Journal ArticleDOI
TL;DR: The 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47, based on 8.3× dideoxy sequence coverage, is reported, indicating pervasive selection for a smaller genome in this outcrossing species.
Abstract: We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.

Journal ArticleDOI
David M. Evans1, C. C. A. Spencer2, J J Pointon3, Zhan Su2, D Harvey3, Grazyna Kochan2, Udo Oppermann4, Alexander T. Dilthey5, Matti Pirinen5, Millicent A. Stone6, L H Appleton3, Loukas Moutsianas2, Stephen Leslie2, T. W. H. Wordsworth3, Tony J. Kenna7, Tugce Karaderi3, Gethin P. Thomas7, Minghong Ward8, Michael H. Weisman9, C. Farrar3, Linda A. Bradbury7, Patrick Danoy7, Robert D. Inman10, Walter P. Maksymowych11, Dafna D. Gladman10, Proton Rahman12, Ann W. Morgan13, Helena Marzo-Ortega13, Paul Bowness3, Karl Gaffney14, J. S. H. Gaston15, Malcolm D. Smith15, Jácome Bruges-Armas16, A.-R. Couto17, Rosa Sorrentino17, Fabiana Paladini17, Manuel A. R. Ferreira18, Huji Xu19, Yu Liu19, L. Jiang19, Carlos López-Larrea, Roberto Díaz-Peña, Antonio López-Vázquez, Tetyana Zayats5, Céline Bellenguez2, Hannah Blackburn, Jenefer M. Blackwell20, Elvira Bramon21, Suzannah Bumpstead21, Juan P. Casas22, Aiden Corvin23, N. Craddock24, Panagiotis Deloukas21, Serge Dronov21, Audrey Duncanson25, Sarah Edkins21, Colin Freeman26, Matthew W. Gillman21, Emma Gray21, R. Gwilliam21, Naomi Hammond21, Sarah E. Hunt21, Janusz Jankowski, Alagurevathi Jayakumar21, Cordelia Langford21, Jennifer Liddle21, Hugh S. Markus27, Christopher G. Mathew28, O. T. McCann21, Mark I. McCarthy29, C. N. A. Palmer21, Leena Peltonen21, Robert Plomin28, Simon C. Potter21, Anna Rautanen21, Radhi Ravindrarajah21, Michelle Ricketts21, Nilesh J. Samani30, Stephen Sawcer31, A. Strange26, Richard C. Trembath28, Ananth C. Viswanathan32, Ananth C. Viswanathan33, Matthew Waller21, Paul A. Weston21, Pamela Whittaker21, Sara Widaa21, Nicholas W. Wood, Gil McVean26, John D. Reveille34, B P Wordsworth35, Matthew A. Brown35, Peter Donnelly26 
TL;DR: This paper reports the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied).
Abstract: Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.

Journal ArticleDOI
Dara G. Torgerson1, Dara G. Torgerson2, Elizabeth J. Ampleford3, Grace Y. Chiu4, W. James Gauderman5, Christopher R. Gignoux6, Penelope E. Graves7, Blanca E. Himes8, Albert M. Levin9, Rasika A. Mathias10, Dana B. Hancock6, Dana B. Hancock2, Dana B. Hancock11, James W. Baurley5, Celeste Eng6, Debra A. Stern7, Juan C. Celedón12, Nicholas Rafaels10, Daniel Capurso5, David V. Conti6, Lindsey A. Roth, Manuel Soto-Quiros10, Alkis Togias3, Xingnan Li1, Rachel A. Myers, Isabelle Romieu5, Isabelle Romieu13, David Van Den Berg6, Donglei Hu10, Nadia N. Hansel6, Ryan D. Hernandez8, Elliott Israel5, Muhammad T. Salam6, Joshua Galanter14, Pedro C. Avila, Lydiana Avila, Jose R. Rodriquez-Santana, R. Chapela15, William Rodríguez-Cintrón10, Gregory B. Diette10, N. Franklin Adkinson10, Rebekah A. Abel1, K. Ross1, Min Shi11, Mezbah U. Faruque16, Georgia M. Dunston16, Harold Watson17, Vito J. Mantese10, Serpil C. Ezurum18, Liming Liang8, Ingo Ruczinski10, Jean G. Ford10, Scott Huntsman6, Kian Fan Chung19, Hita Vora5, Xia Li5, William J. Calhoun20, Mario Castro21, Juan José Luis Sienra-Monge, Blanca Estela Del Río-Navarro, Klaus A. Deichmann22, Andrea Heinzmann22, Sally E. Wenzel22, William W. Busse23, William W. Busse12, James E. Gern23, Robert F. Lemanske23, Terri H. Beaty10, Eugene R. Bleecker3, Benjamin A. Raby8, Deborah A. Meyers3, Stephanie J. London10, Frank D. Gilliland5, Esteban G. Burchard6, Fernando D. Martinez7, Scott T. Weiss8, L. Keoki Williams9, Kathleen C. Barnes10, Carole Ober1, Dan L. Nicolae1 
TL;DR: The results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.
Abstract: Asthma is a common disease with a complex risk architecture including both genetic and environmental factors. We performed a meta-analysis of North American genome-wide association studies of asthma in 5,416 individuals with asthma (cases) including individuals of European American, African American or African Caribbean, and Latino ancestry, with replication in an additional 12,649 individuals from the same ethnic groups. We identified five susceptibility loci. Four were at previously reported loci on 17q21, near IL1RL1, TSLP and IL33, but we report for the first time, to our knowledge, that these loci are associated with asthma risk in three ethnic groups. In addition, we identified a new asthma susceptibility locus at PYHIN1, with the association being specific to individuals of African descent (P = 3.9 × 10⁻⁹). These results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.

Journal ArticleDOI
TL;DR: It is shown that Sox9 is expressed throughout the biliary and pancreatic ductal epithelia, which are connected to the intestinal stem-cell zone, which suggests interdependence between the structure and homeostasis of endodermal organs, with Sox9 expression being linked to progenitor status.
Abstract: The liver and exocrine pancreas share a common structure, with functioning units (hepatic plates and pancreatic acini) connected to the ductal tree. Here we show that Sox9 is expressed throughout the biliary and pancreatic ductal epithelia, which are connected to the intestinal stem-cell zone. Cre-based lineage tracing showed that adult intestinal cells, hepatocytes and pancreatic acinar cells are supplied physiologically from Sox9-expressing progenitors. Combination of lineage analysis and hepatic injury experiments showed involvement of Sox9-positive precursors in liver regeneration. Embryonic pancreatic Sox9-expressing cells differentiate into all types of mature cells, but their capacity for endocrine differentiation diminishes shortly after birth, when endocrine cells detach from the epithelial lining of the ducts and form the islets of Langerhans. We observed a developmental switch in the hepatic progenitor cell type from Sox9-negative to Sox9-positive progenitors as the biliary tree develops. These results suggest interdependence between the structure and homeostasis of endodermal organs, with Sox9 expression being linked to progenitor status.

Journal ArticleDOI
TL;DR: The identification of somatic mutations by exome sequencing in acute monocytic leukemia, the M5 subtype of acute myeloid leukemia (AML-M5), suggests a contribution of aberrant DNA methyltransferase activity to the pathogenesis of acute monocytic leukemia and provides a useful new biomarker for relevant cases.
Abstract: Abnormal epigenetic regulation has been implicated in oncogenesis. We report here the identification of somatic mutations by exome sequencing in acute monocytic leukemia, the M5 subtype of acute myeloid leukemia (AML-M5). We discovered mutations in DNMT3A (encoding DNA methyltransferase 3A) in 23 of 112 (20.5%) cases. The DNMT3A mutants showed reduced enzymatic activity or aberrant affinity to histone H3 in vitro. Notably, there were alterations of DNA methylation patterns and/or gene expression profiles (such as HOXB genes) in samples with DNMT3A mutations as compared with those without such changes. Leukemias with DNMT3A mutations constituted a group of poor prognosis with elderly disease onset and of promonocytic as well as monocytic predominance among AML-M5 individuals. Screening other leukemia subtypes showed Arg882 alterations in 13.6% of acute myelomonocytic leukemia (AML-M4) cases. Our work suggests a contribution of aberrant DNA methyltransferase activity to the pathogenesis of acute monocytic leukemia and provides a useful new biomarker for relevant cases.

Journal ArticleDOI
TL;DR: Next-generation sequencing is used to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls to identify new, rare and probably functional variants that could aid functional experiments and predictive models.
Abstract: More than 1,000 susceptibility loci have been identified through genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings have not yet been defined. Here we used pooled next-generation sequencing to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls. Through follow-up genotyping of 70 rare and low-frequency protein-altering variants in nine independent case-control series (16,054 Crohn's disease cases, 12,153 ulcerative colitis cases and 17,575 healthy controls), we identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a protective splice variant in CARD9 (P < 1 × 10⁻¹⁶, odds ratio ≈ 0.29) and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by identifying new, rare and probably functional variants that could aid functional experiments and predictive models.

Journal ArticleDOI
TL;DR: It is shown that the quantitative trait locus GS5 in rice controls grain size by regulating grain width, filling and weight and functions as a positive regulator of grain size, such that higher expression of GS5 is correlated with larger grain size.
Abstract: Increasing crop yield is one of the most important goals of plant science research. Grain size is a major determinant of grain yield in cereals and is a target trait for both domestication and artificial breeding(1). We showed that the quantitative trait locus (QTL) GS5 in rice controls grain size by regulating grain width, filling and weight. GS5 encodes a putative serine carboxypeptidase and functions as a positive regulator of grain size, such that higher expression of GS5 is correlated with larger grain size. Sequencing of the promoter region in 51 rice accessions from a wide geographic range identified three haplotypes that seem to be associated with grain width. The results suggest that natural variation in GS5 contributes to grain size diversity in rice and may be useful in improving yield in rice and, potentially, other crops(2).

Journal ArticleDOI
TL;DR: It is shown that FOXA1 is a key determinant that can influence differential interactions between ER and chromatin, and that CTCF is an upstream negative regulator of FOXA1-chromatin interactions.
Abstract: Estrogen receptor-α (ER) is the key feature of most breast cancers and binding of ER to the genome correlates with expression of the Forkhead protein FOXA1 (also called HNF3α). Here we show that FOXA1 is a key determinant that can influence differential interactions between ER and chromatin. Almost all ER-chromatin interactions and gene expression changes depended on the presence of FOXA1 and FOXA1 influenced genome-wide chromatin accessibility. Furthermore, we found that CTCF was an upstream negative regulator of FOXA1-chromatin interactions. In estrogen-responsive breast cancer cells, the dependency on FOXA1 for tamoxifen-ER activity was absolute; in tamoxifen-resistant cells, ER binding was independent of ligand but depended on FOXA1. Expression of FOXA1 in non-breast cancer cells can alter ER binding and function. As such, FOXA1 is a major determinant of estrogen-ER activity and endocrine response in breast cancer cells.

Journal ArticleDOI
TL;DR: New studies reveal that 20% of individuals with acute myeloid leukemia harbor somatic mutations in DNMT3A (encoding DNA methyltransferase 3A); although these leukemias have some gene expression and DNA methylation changes, a direct link between mutant DNMT3A, epigenetic changes and pathogenesis remains to be established.
Abstract: New studies reveal that 20% of individuals with acute myeloid leukemia harbor somatic mutations in DNMT3A (encoding DNA methyltransferase 3A). Although these leukemias have some gene expression and DNA methylation changes, a direct link between mutant DNMT3A, epigenetic changes and pathogenesis remains to be established.

Journal ArticleDOI
TL;DR: The complex genetic architecture of the risk regions for celiac disease is defined and the risk signals are refined, providing the next step toward uncovering the causal mechanisms of the disease.
Abstract: Using variants from the 1000 Genomes Project pilot European CEU dataset and data from additional resequencing studies, we densely genotyped 183 non-HLA risk loci previously associated with immune-mediated diseases in 12,041 individuals with celiac disease (cases) and 12,228 controls. We identified 13 new celiac disease risk loci reaching genome-wide significance, bringing the number of known loci (including the HLA locus) to 40. We found multiple independent association signals at over one-third of these loci, a finding that is attributable to a combination of common, low-frequency and rare genetic variants. Compared to previously available data such as those from HapMap3, our dense genotyping in a large sample collection provided a higher resolution of the pattern of linkage disequilibrium and suggested localization of many signals to finer scale regions. In particular, 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements. Altogether, we define the complex genetic architecture of the risk regions of and refine the risk signals for celiac disease, providing the next step toward uncovering the causal mechanisms of the disease.