Showing papers in "Nature Genetics in 2011"
••
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
10,056 citations
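Step (3) of the pipeline in the abstract above, base quality score recalibration, rests on the Phred scale, where a quality score Q corresponds to an error probability p via Q = −10 log10(p). The following is a minimal Python sketch of the underlying idea only, not GATK's implementation: it groups base observations by their reported quality and derives an empirical quality from the observed mismatch rate. The function names, the input format and the +1/+2 smoothing are assumptions made for illustration.

```python
import math
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def empirical_qualities(observations):
    """Map each reported quality to an empirically observed quality.

    observations: iterable of (reported_q, is_mismatch) pairs, one per
    sequenced base at a site assumed (hypothetically) to be non-variant.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)

    table = {}
    for q, (n_bases, n_mismatches) in counts.items():
        # Assumed smoothing: add-one pseudocounts avoid log10(0)
        # when a quality bin contains no mismatches.
        table[q] = error_to_phred((n_mismatches + 1) / (n_bases + 2))
    return table
```

In this sketch, a bin of bases reported at Q30 that actually mismatches the reference about 1% of the time would be mapped to roughly Q20, which is the kind of over-reported quality that recalibration is meant to correct.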
••
Civil Aviation Authority of Singapore1, Beijing Institute of Genomics2, Rothamsted Research3, University of Copenhagen4, Rural Development Administration5, John Innes Centre6, North China University of Science and Technology7, University of Georgia8, University of California, Berkeley9, University of Missouri10, University of Queensland11, Australian Research Council12, National Research Council13, Bielefeld University14, Australian Centre for Plant Functional Genomics15, University of Rennes16, Wageningen University and Research Centre17, Agriculture and Agri-Food Canada18, Huazhong Agricultural University19, French Alternative Energies and Atomic Energy Commission20, Chungnam National University21, Norwich Research Park22
TL;DR: This paper reports the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, and uses Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.
Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
1,811 citations
••
TL;DR: Meta-analyses of all data provided compelling evidence that ABCA7 and the MS4A gene cluster are new Alzheimer's disease susceptibility loci; the study also found independent evidence for association at three loci reported by the ADGC, which, when combined, showed genome-wide significance.
Abstract: We sought to identify new susceptibility loci for Alzheimer's disease through a staged association study (GERAD+) and by testing suggestive loci reported by the Alzheimer's Disease Genetic Consortium (ADGC) in a companion paper. We undertook a combined analysis of four genome-wide association datasets (stage 1) and identified ten newly associated variants with P ≤ 1 × 10−5. We tested these variants for association in an independent sample (stage 2). Three SNPs at two loci replicated and showed evidence for association in a further sample (stage 3). Meta-analyses of all data provided compelling evidence that ABCA7 (rs3764650, meta P = 4.5 × 10−17; including ADGC data, meta P = 5.0 × 10−21) and the MS4A gene cluster (rs610932, meta P = 1.8 × 10−14; including ADGC data, meta P = 1.2 × 10−16) are new Alzheimer's disease susceptibility loci. We also found independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance: CD2AP (GERAD+, P = 8.0 × 10−4; including ADGC data, meta P = 8.6 × 10−9), CD33 (GERAD+, P = 2.2 × 10−4; including ADGC data, meta P = 1.6 × 10−9) and EPHA1 (GERAD+, P = 3.4 × 10−4; including ADGC data, meta P = 6.0 × 10−10).
1,771 citations
••
TL;DR: The Alzheimer Disease Genetics Consortium performed a genome-wide association study of late-onset Alzheimer disease using a three-stage design consisting of a discovery stage (stage 1) and two replication stages (stages 2 and 3), applying both joint analysis and meta-analysis approaches.
Abstract: The Alzheimer Disease Genetics Consortium (ADGC) performed a genome-wide association study of late-onset Alzheimer disease using a three-stage design consisting of a discovery stage (stage 1) and two replication stages (stages 2 and 3). Both joint analysis and meta-analysis approaches were used. We obtained genome-wide significant results at MS4A4A (rs4938933; stages 1 and 2, meta-analysis P (P(M)) = 1.7 × 10(-9), joint analysis P (P(J)) = 1.7 × 10(-9); stages 1, 2 and 3, P(M) = 8.2 × 10(-12)), CD2AP (rs9349407; stages 1, 2 and 3, P(M) = 8.6 × 10(-9)), EPHA1 (rs11767557; stages 1, 2 and 3, P(M) = 6.0 × 10(-10)) and CD33 (rs3865444; stages 1, 2 and 3, P(M) = 1.6 × 10(-9)). We also replicated previous associations at CR1 (rs6701713; P(M) = 4.6 × 10(-10), P(J) = 5.2 × 10(-11)), CLU (rs1532278; P(M) = 8.3 × 10(-8), P(J) = 1.9 × 10(-8)), BIN1 (rs7561528; P(M) = 4.0 × 10(-14), P(J) = 5.2 × 10(-14)) and PICALM (rs561655; P(M) = 7.0 × 10(-11), P(J) = 1.0 × 10(-10)), but not at EXOC3L2, to late-onset Alzheimer's disease susceptibility.
1,743 citations
••
Heribert Schunkert1, Inke R. König1, Sekar Kathiresan2, Muredach P. Reilly3 +163 more•Institutions (59)
TL;DR: This paper performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals.
Abstract: We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10−8 and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.
1,705 citations
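The P < 5 × 10−8 cutoff used in the abstract above is the conventional genome-wide significance level. It is commonly motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The short Python sketch below illustrates that arithmetic; the function names and the default of one million tests are assumptions for illustration and are not taken from the paper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """True if p_value falls below the Bonferroni-corrected threshold.

    The default n_tests is the conventional (assumed) number of
    independent common-variant tests, giving a cutoff near 5e-8.
    """
    return p_value < bonferroni_threshold(alpha, n_tests)
```

Under these assumptions, a signal with P = 1.6 × 10−11 passes the cutoff, while one with P = 3.4 × 10−4 does not and would need additional data, such as a combined meta-analysis, to reach genome-wide significance.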
••
Stephan Ripke1, Alan R. Sanders2, Kenneth S. Kendler3, Douglas F. Levinson4 +207 more•Institutions (71)
TL;DR: The authors examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects.
Abstract: We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (P = 1.6 x 10(-11)) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, P = 7.0 x 10(-9)), ANK3 (rs10994359, P = 2.5 x 10(-8)) and the ITIH3-ITIH4 region (rs2239547, P = 7.8 x 10(-9)).
1,671 citations
••
TL;DR: An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4, and a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals was identified.
Abstract: We conducted a combined genome-wide association study (GWAS) of 7,481 individuals with bipolar disorder (cases) and 9,250 controls as part of the Psychiatric GWAS Consortium. Our replication study tested 34 SNPs in 4,496 independent cases with bipolar disorder and 42,422 independent controls and found that 18 of 34 SNPs had P < 0.05, with 31 of 34 SNPs having signals with the same direction of effect (P = 3.8 × 10−7). An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4. We identified a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals. Finally, a combined GWAS analysis of schizophrenia and bipolar disorder yielded strong association evidence for SNPs in CACNA1C and in the region of NEK4-ITIH1-ITIH3-ITIH4. Our replication results imply that increasing sample sizes in bipolar disorder will confirm many additional loci.
1,312 citations
••
Wellcome Trust Sanger Institute1, Université de Montréal2, University of Edinburgh3, University of Kiel4, Karolinska Institutet5, Cedars-Sinai Medical Center6, University of Cambridge7, University of Pennsylvania8, Casa Sollievo della Sofferenza9, University of Pittsburgh10, Université libre de Bruxelles11, University of Otago12, Johns Hopkins University13, Ludwig Maximilian University of Munich14, Charité15, Lille University of Science and Technology16, Cincinnati Children's Hospital Medical Center17, Ghent University18, Torbay Hospital19, University of Groningen20, Mater Health Services21, University of Liège22, University of Washington23, University of Utah24, QIMR Berghofer Medical Research Institute25, University of Paris26, University of Western Australia27, Tel Aviv University28, University of Dundee29, Harvard University30, University of Manchester31, Utrecht University32, University of Florence33, King's College London34, Yale University35, Royal Hospital for Sick Children36, Katholieke Universiteit Leuven37, Guy's and St Thomas' NHS Foundation Trust38, University of Barcelona39, University of Chicago40, University of Bern41, University of California, San Francisco42, Agency for Science, Technology and Research43, University of Toronto44, University of Oslo45, Leiden University46, University of Amsterdam47, Aarhus University48, National and Kapodistrian University of Athens49, Lithuanian University of Health Sciences50, Newcastle University51, Emory University52, Örebro University53, French Institute of Health and Medical Research54, Center for Applied Genomics55
TL;DR: A meta-analysis of six ulcerative colitis genome-wide association study datasets found many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1.
Abstract: Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10(-8)), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.
1,291 citations
••
TL;DR: This evolving CNV morbidity map, combined with exome and genome sequencing, will be critical for deciphering the genetic basis of developmental delay, intellectual disability and autism spectrum disorders.
Abstract: To understand the genetic heterogeneity underlying developmental delay, we compared copy number variants (CNVs) in 15,767 children with intellectual disability and various congenital defects (cases) to CNVs in 8,329 unaffected adult controls. We estimate that ∼14.2% of disease in these children is caused by CNVs >400 kb. We observed a greater enrichment of CNVs in individuals with craniofacial anomalies and cardiovascular defects compared to those with epilepsy or autism. We identified 59 pathogenic CNVs, including 14 new or previously weakly supported candidates, refined the critical interval for several genomic disorders, such as the 17q21.31 microdeletion syndrome, and identified 940 candidate dosage-sensitive genes. We also developed methods to opportunistically discover small, disruptive CNVs within the large and growing diagnostic array datasets. This evolving CNV morbidity map, combined with exome and genome sequencing, will be critical for deciphering the genetic basis of developmental delay, intellectual disability and autism spectrum disorders.
1,190 citations
••
TL;DR: The results show that trio-based exome sequencing is a powerful approach for identifying new candidate genes for ASDs and suggest that de novo mutations may contribute substantially to the genetic etiology of ASDs.
Abstract: Evidence for the etiology of autism spectrum disorders (ASDs) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity. We sequenced the exomes of 20 individuals with sporadic ASD (cases) and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, 11 of which were protein altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4 out of 20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 missense variant, and we provide functional support for a multi-hit model for disease risk. Our results show that trio-based exome sequencing is a powerful approach for identifying new candidate genes for ASDs and suggest that de novo mutations may contribute substantially to the genetic etiology of ASDs.
1,116 citations
••
University of North Texas1, East Malling Research Station2, Plant & Food Research3, Oregon State University4, University of Maryland, College Park5, Indiana University6, Virginia Tech7, Georgia Institute of Technology8, University of New Hampshire9, United States Department of Agriculture10, Hoffmann-La Roche11, University of Auckland12, Rutgers University13, University of the Western Cape14, University of Florida15, University of Chile16, Andrés Bello National University17, Weizmann Institute of Science18, University of Pittsburgh19, University of Georgia20, Technische Universität München21, University of Illinois at Urbana–Champaign22, Institut national de la recherche agronomique23
TL;DR: New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted, and macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes.
Abstract: The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.
••
TL;DR: Stochastic methylation variation of the same cDMRs, distinguishing cancer from normal tissue, is shown in colon, lung, breast, thyroid and Wilms' tumors, with intermediate variation in adenomas.
Abstract: Tumor heterogeneity is a major barrier to effective cancer diagnosis and treatment. We recently identified cancer-specific differentially DNA-methylated regions (cDMRs) in colon cancer, which also distinguish normal tissue types from each other, suggesting that these cDMRs might be generalized across cancer types. Here we show stochastic methylation variation of the same cDMRs, distinguishing cancer from normal tissue, in colon, lung, breast, thyroid and Wilms' tumors, with intermediate variation in adenomas. Whole-genome bisulfite sequencing shows these variable cDMRs are related to loss of sharply delimited methylation boundaries at CpG islands. Furthermore, we find hypomethylation of discrete blocks encompassing half the genome, with extreme gene expression variability. Genes associated with the cDMRs and large blocks are involved in mitosis and matrix remodeling, respectively. We suggest a model for cancer involving loss of epigenetic stability of well-defined genomic domains that underlies increased methylation variability in cancer that may contribute to tumor heterogeneity.
••
TL;DR: In this article, an ultra-high-density array that tiles the promoters of 56 cell-cycle genes was used to interrogate 108 samples representing diverse perturbations, identifying 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle.
Abstract: Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.
••
TL;DR: The majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome are described, their effects on gene function, and the patterns of local and global linkage among these variants.
Abstract: The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.
••
TL;DR: It is found that in some cancer cells a relatively large amount of glycolytic carbon is diverted into serine and glycine metabolism through phosphoglycerate dehydrogenase (PHGDH).
Abstract: Jason Locasale, Lewis Cantley, Matthew Vander Heiden and colleagues show that PHGDH is amplified in some human cancers and diverts a relatively large amount of glycolytic carbon into serine and glycine biosynthesis. They further show that PHGDH-amplified cancer cells become dependent on PHGDH for their growth, suggesting that the altered metabolic flux driven by this amplification contributes to oncogenesis. Most tumors exhibit increased glucose metabolism to lactate; however, the extent to which glucose-derived metabolic fluxes are used for alternative processes is poorly understood. Using a metabolomics approach with isotope labeling, we found that in some cancer cells a relatively large amount of glycolytic carbon is diverted into serine and glycine metabolism through phosphoglycerate dehydrogenase (PHGDH). An analysis of human cancers showed that PHGDH is recurrently amplified in a genomic region of focal copy number gain most commonly found in melanoma. Decreasing PHGDH expression impaired proliferation in amplified cell lines. Increased expression was also associated with breast cancer subtypes, and ectopic expression of PHGDH in mammary epithelial cells disrupted acinar morphogenesis and induced other phenotypic alterations that may predispose cells to transformation. Our findings show that the diversion of glycolytic flux into a specific alternate pathway can be selected during tumor development and may contribute to the pathogenesis of human cancer.
••
TL;DR: By combining next-generation sequencing and copy number analysis, it is shown that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case and novel dysregulated pathways underlying its pathogenesis are identified.
Abstract: Diffuse large B-cell lymphoma (DLBCL) is the most common form of human lymphoma. Although a number of structural alterations have been associated with the pathogenesis of this malignancy, the full spectrum of genetic lesions that are present in the DLBCL genome, and therefore the identity of dysregulated cellular pathways, remains unknown. By combining next-generation sequencing and copy number analysis, we show that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case. This analysis also revealed mutations in genes not previously implicated in DLBCL pathogenesis, including those regulating chromatin methylation (MLL2; 24% of samples) and immune recognition by T cells. These results provide initial data on the complexity of the DLBCL coding genome and identify novel dysregulated pathways underlying its pathogenesis.
••
TL;DR: Overall, GWAS results show that variations at the liguleless genes have contributed to more upright leaves, and the use of GWAS with specially designed mapping populations is effective in uncovering the basis of key agronomic traits.
Abstract: US maize yield has increased eight-fold in the past 80 years, with half of the gain attributed to selection by breeders. During this time, changes in maize leaf angle and size have altered plant architecture, allowing more efficient light capture as planting density has increased. Through a genome-wide association study (GWAS) of the maize nested association mapping panel, we determined the genetic basis of important leaf architecture traits and identified some of the key genes. Overall, we demonstrate that the genetic architecture of the leaf traits is dominated by small effects, with little epistasis, environmental interaction or pleiotropy. In particular, GWAS results show that variations at the liguleless genes have contributed to more upright leaves. These results demonstrate that the use of GWAS with specially designed mapping populations is effective in uncovering the basis of key agronomic traits.
••
TL;DR: Up to 95% of de novo genomic binding by the glucocorticoid receptor, a paradigmatic ligand-activated transcription factor, is targeted to preexisting foci of accessible chromatin, defining a framework for understanding regulatory factor–genome interactions and providing a molecular basis for the tissue selectivity of steroid pharmaceuticals and other agents that intersect the living genome.
Abstract: Development, differentiation and response to environmental stimuli are characterized by sequential changes in cellular state initiated by the de novo binding of regulated transcriptional factors to their cognate genomic sites. The mechanism whereby a given regulatory factor selects a limited number of in vivo targets from a myriad of potential genomic binding sites is undetermined. Here we show that up to 95% of de novo genomic binding by the glucocorticoid receptor, a paradigmatic ligand-activated transcription factor, is targeted to preexisting foci of accessible chromatin. Factor binding invariably potentiates chromatin accessibility. Cell-selective glucocorticoid receptor occupancy patterns appear to be comprehensively predetermined by cell-specific differences in baseline chromatin accessibility patterns, with secondary contributions from local sequence features. The results define a framework for understanding regulatory factor-genome interactions and provide a molecular basis for the tissue selectivity of steroid pharmaceuticals and other agents that intersect the living genome.
••
QIMR Berghofer Medical Research Institute1, National Institutes of Health2, Harvard University3, University of Texas Health Science Center at Houston4, Mayo Clinic5, Statens Serum Institut6, University of Pittsburgh7, Northwestern University8, University of Edinburgh9, University of Minnesota10, Université de Montréal11, Washington University in St. Louis12, Johns Hopkins University13, University of Washington14, Government of Victoria15, University of Melbourne16
TL;DR: The results provide further evidence that a substantial proportion of heritability is captured by common SNPs, that height, BMI and QTi are highly polygenic traits, and that the additive variation explained by a part of the genome is approximately proportional to the total length of DNA contained within genes therein.
Abstract: We estimate and partition genetic variation for height, body mass index (BMI), von Willebrand factor and QT interval (QTi) using 586,898 SNPs genotyped on 11,586 unrelated individuals. We estimate that ∼45%, ∼17%, ∼25% and ∼21% of the variance in height, BMI, von Willebrand factor and QTi, respectively, can be explained by all autosomal SNPs and a further ∼0.5-1% can be explained by X chromosome SNPs. We show that the variance explained by each chromosome is proportional to its length, and that SNPs in or near genes explain more variation than SNPs between genes. We propose a new approach to estimate variation due to cryptic relatedness and population stratification. Our results provide further evidence that a substantial proportion of heritability is captured by common SNPs, that height, BMI and QTi are highly polygenic traits, and that the additive variation explained by a part of the genome is approximately proportional to the total length of DNA contained within genes therein.
••
TL;DR: A BAP1-related cancer syndrome characterized by mesothelioma and uveal melanoma is identified; the authors hypothesize that other cancers may also be involved and that mesothelioma predominates upon asbestos exposure.
Abstract: Because only a small fraction of asbestos-exposed individuals develop malignant mesothelioma, and because mesothelioma clustering is observed in some families, we searched for genetic predisposing factors. We discovered germline mutations in the gene encoding BRCA1 associated protein-1 (BAP1) in two families with a high incidence of mesothelioma, and we observed somatic alterations affecting BAP1 in familial mesotheliomas, indicating biallelic inactivation. In addition to mesothelioma, some BAP1 mutation carriers developed uveal melanoma. We also found germline BAP1 mutations in 2 of 26 sporadic mesotheliomas; both individuals with mutant BAP1 were previously diagnosed with uveal melanoma. We also observed somatic truncating BAP1 mutations and aberrant BAP1 expression in sporadic mesotheliomas without germline mutations. These results identify a BAP1-related cancer syndrome that is characterized by mesothelioma and uveal melanoma. We hypothesize that other cancers may also be involved and that mesothelioma predominates upon asbestos exposure. These findings will help to identify individuals at high risk of mesothelioma who could be targeted for early intervention.
••
TL;DR: The 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47, based on 8.3× dideoxy sequence coverage, is reported, indicating pervasive selection for a smaller genome in this outcrossing species.
Abstract: We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.
••
Medical Research Council1, University of Oxford2, National Institute for Health Research3, Structural Genomics Consortium4, University of Bristol5, University of Bath6, University of Queensland7, National Institutes of Health8, Cedars-Sinai Medical Center9, University of Toronto10, University of Alberta11, Memorial University of Newfoundland12, University of Leeds13, Norfolk and Norwich University Hospital14, Repatriation General Hospital15, University of Porto16, Sapienza University of Rome17, QIMR Berghofer Medical Research Institute18, Second Military Medical University19, Telethon Institute for Child Health Research20, Wellcome Trust Sanger Institute21, University of London22, Trinity College, Dublin23, Cardiff University24, Wellcome Trust25, Wellcome Trust Centre for Human Genetics26, St George's, University of London27, King's College London28, Churchill Hospital29, University of Leicester30, University of Cambridge31, Moorfields Eye Hospital32, University College London33, University of Texas Health Science Center at Houston34, Princess Alexandra Hospital35
TL;DR: This paper reports the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied).
Abstract: Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.
••
University of Chicago1, RTI International2, Wake Forest University3, Westat4, University of Southern California5, University of California, San Francisco6, University of Arizona7, Harvard University8, Henry Ford Health System9, Johns Hopkins University10, National Institutes of Health11, University of Pittsburgh12, International Agency for Research on Cancer13, Northwestern University14, United States Department of Veterans Affairs15, Howard University16, University of the West Indies17, Cleveland Clinic18, Imperial College London19, University of Texas Medical Branch20, Washington University in St. Louis21, University of Freiburg22, University of Wisconsin-Madison23
TL;DR: The results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.
Abstract: Asthma is a common disease with a complex risk architecture including both genetic and environmental factors. We performed a meta-analysis of North American genome-wide association studies of asthma in 5,416 individuals with asthma (cases) including individuals of European American, African American or African Caribbean, and Latino ancestry, with replication in an additional 12,649 individuals from the same ethnic groups. We identified five susceptibility loci. Four were at previously reported loci on 17q21, near IL1RL1, TSLP and IL33, but we report for the first time, to our knowledge, that these loci are associated with asthma risk in three ethnic groups. In addition, we identified a new asthma susceptibility locus at PYHIN1, with the association being specific to individuals of African descent (P = 3.9 × 10⁻⁹). These results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.
••
TL;DR: It is shown that Sox9 is expressed throughout the biliary and pancreatic ductal epithelia, which are connected to the intestinal stem-cell zone; the results suggest interdependence between the structure and homeostasis of endodermal organs, with Sox9 expression being linked to progenitor status.
Abstract: The liver and exocrine pancreas share a common structure, with functioning units (hepatic plates and pancreatic acini) connected to the ductal tree. Here we show that Sox9 is expressed throughout the biliary and pancreatic ductal epithelia, which are connected to the intestinal stem-cell zone. Cre-based lineage tracing showed that adult intestinal cells, hepatocytes and pancreatic acinar cells are supplied physiologically from Sox9-expressing progenitors. Combination of lineage analysis and hepatic injury experiments showed involvement of Sox9-positive precursors in liver regeneration. Embryonic pancreatic Sox9-expressing cells differentiate into all types of mature cells, but their capacity for endocrine differentiation diminishes shortly after birth, when endocrine cells detach from the epithelial lining of the ducts and form the islets of Langerhans. We observed a developmental switch in the hepatic progenitor cell type from Sox9-negative to Sox9-positive progenitors as the biliary tree develops. These results suggest interdependence between the structure and homeostasis of endodermal organs, with Sox9 expression being linked to progenitor status.
••
TL;DR: The identification of somatic mutations by exome sequencing in acute monocytic leukemia, the M5 subtype of acute myeloid leukemia (AML-M5), suggests a contribution of aberrant DNA methyltransferase activity to the pathogenesis of acute monocytic leukemia and provides a useful new biomarker for relevant cases.
Abstract: Abnormal epigenetic regulation has been implicated in oncogenesis. We report here the identification of somatic mutations by exome sequencing in acute monocytic leukemia, the M5 subtype of acute myeloid leukemia (AML-M5). We discovered mutations in DNMT3A (encoding DNA methyltransferase 3A) in 23 of 112 (20.5%) cases. The DNMT3A mutants showed reduced enzymatic activity or aberrant affinity to histone H3 in vitro. Notably, there were alterations of DNA methylation patterns and/or gene expression profiles (such as HOXB genes) in samples with DNMT3A mutations as compared with those without such changes. Leukemias with DNMT3A mutations constituted a group of poor prognosis with elderly disease onset and of promonocytic as well as monocytic predominance among AML-M5 individuals. Screening other leukemia subtypes showed Arg882 alterations in 13.6% of acute myelomonocytic leukemia (AML-M4) cases. Our work suggests a contribution of aberrant DNA methyltransferase activity to the pathogenesis of acute monocytic leukemia and provides a useful new biomarker for relevant cases.
••
Harvard University1, Broad Institute2, University of Oxford3, Montreal Heart Institute4, Yale University5, Casa Sollievo della Sofferenza6, Örebro University7, Cedars-Sinai Medical Center8, University of Chicago9, Karolinska Institutet10, Johns Hopkins University11, University of Toronto12, University of Pittsburgh13
TL;DR: Next-generation sequencing is used to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls to identify new, rare and probably functional variants that could aid functional experiments and predictive models.
Abstract: More than 1,000 susceptibility loci have been identified through genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings have not yet been defined. Here we used pooled next-generation sequencing to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls. Through follow-up genotyping of 70 rare and low-frequency protein-altering variants in nine independent case-control series (16,054 Crohn's disease cases, 12,153 ulcerative colitis cases and 17,575 healthy controls), we identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a protective splice variant in CARD9 (P < 1 × 10⁻¹⁶, odds ratio ≈ 0.29) and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by identifying new, rare and probably functional variants that could aid functional experiments and predictive models.
••
TL;DR: It is shown that the quantitative trait locus GS5 in rice controls grain size by regulating grain width, filling and weight and functions as a positive regulator of grain size, such that higher expression of GS5 is correlated with larger grain size.
Abstract: Increasing crop yield is one of the most important goals of plant science research. Grain size is a major determinant of grain yield in cereals and is a target trait for both domestication and artificial breeding(1). We showed that the quantitative trait locus (QTL) GS5 in rice controls grain size by regulating grain width, filling and weight. GS5 encodes a putative serine carboxypeptidase and functions as a positive regulator of grain size, such that higher expression of GS5 is correlated with larger grain size. Sequencing of the promoter region in 51 rice accessions from a wide geographic range identified three haplotypes that seem to be associated with grain width. The results suggest that natural variation in GS5 contributes to grain size diversity in rice and may be useful in improving yield in rice and, potentially, other crops(2).
••
TL;DR: It is shown that FOXA1 is a key determinant that can influence differential interactions between ER and chromatin and that CTCF is an upstream negative regulator of FOXA1-chromatin interactions.
Abstract: Estrogen receptor-α (ER) is the key feature of most breast cancers and binding of ER to the genome correlates with expression of the Forkhead protein FOXA1 (also called HNF3α). Here we show that FOXA1 is a key determinant that can influence differential interactions between ER and chromatin. Almost all ER-chromatin interactions and gene expression changes depended on the presence of FOXA1 and FOXA1 influenced genome-wide chromatin accessibility. Furthermore, we found that CTCF was an upstream negative regulator of FOXA1-chromatin interactions. In estrogen-responsive breast cancer cells, the dependency on FOXA1 for tamoxifen-ER activity was absolute; in tamoxifen-resistant cells, ER binding was independent of ligand but depended on FOXA1. Expression of FOXA1 in non-breast cancer cells can alter ER binding and function. As such, FOXA1 is a major determinant of estrogen-ER activity and endocrine response in breast cancer cells.
••
TL;DR: New studies reveal that 20% of individuals with acute myeloid leukemia harbor somatic mutations in DNMT3A (encoding DNA methyltransferase 3A); although these leukemias have some gene expression and DNA methylation changes, a direct link between mutant DNMT3A, epigenetic changes and pathogenesis remains to be established.
Abstract: New studies reveal that 20% of individuals with acute myeloid leukemia harbor somatic mutations in DNMT3A (encoding DNA methyltransferase 3A). Although these leukemias have some gene expression and DNA methylation changes, a direct link between mutant DNMT3A, epigenetic changes and pathogenesis remains to be established.
••
University of Groningen1, Queen Mary University of London2, VU University Amsterdam3, University of Milan4, University of Pittsburgh5, Wellcome Trust Sanger Institute6, University of Naples Federico II7, Radboud University Nijmegen Medical Centre8, Sapienza University of Rome9, University of Maribor10, University of Cambridge11, University of Virginia12, University College London13, University of Delhi14, Hospital Clínico San Carlos15, University Medical Center Utrecht16, Leiden University17, University of Milano-Bicocca18
TL;DR: The complex genetic architecture of the risk regions for celiac disease is defined and the risk signals are refined, providing the next step toward uncovering the causal mechanisms of the disease.
Abstract: Using variants from the 1000 Genomes Project pilot European CEU dataset and data from additional resequencing studies, we densely genotyped 183 non-HLA risk loci previously associated with immune-mediated diseases in 12,041 individuals with celiac disease (cases) and 12,228 controls. We identified 13 new celiac disease risk loci reaching genome-wide significance, bringing the number of known loci (including the HLA locus) to 40. We found multiple independent association signals at over one-third of these loci, a finding that is attributable to a combination of common, low-frequency and rare genetic variants. Compared to previously available data such as those from HapMap3, our dense genotyping in a large sample collection provided a higher resolution of the pattern of linkage disequilibrium and suggested localization of many signals to finer scale regions. In particular, 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements. Altogether, we define the complex genetic architecture of the risk regions of and refine the risk signals for celiac disease, providing the next step toward uncovering the causal mechanisms of the disease.