
Showing papers by "Wellcome Trust Centre for Human Genetics" published in 2013


Journal Article
TL;DR: It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Abstract: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.

5,683 citations
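
The abstract's claim that low power reduces the likelihood that a significant result reflects a true effect can be illustrated with the positive predictive value (PPV). The following is a minimal sketch, assuming the standard formulation PPV = (1 − β)R / ((1 − β)R + α), where R is the pre-study odds that a tested effect is real. The function name and the numeric values are illustrative assumptions, not figures from this listing.

```python
def positive_predictive_value(power, alpha, prior_odds):
    """Probability that a statistically significant finding is a true effect.

    power      -- 1 - beta, the chance of detecting a true effect
    alpha      -- type I error rate (significance threshold)
    prior_odds -- R, the assumed pre-study odds that a tested effect is real
    """
    true_positive_mass = power * prior_odds
    false_positive_mass = alpha
    return true_positive_mass / (true_positive_mass + false_positive_mass)


if __name__ == "__main__":
    # Illustrative values only: same alpha and prior odds, different power.
    for power in (0.8, 0.2):
        ppv = positive_predictive_value(power, alpha=0.05, prior_odds=0.25)
        print(f"power={power:.1f} -> PPV={ppv:.2f}")
```

With these assumed inputs, dropping power from 0.8 to 0.2 lowers the PPV from 0.80 to 0.50, even though the significance threshold is unchanged.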


Journal Article
TL;DR: This unit describes how to use BWA and the Genome Analysis Toolkit to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses.
Abstract: This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

5,150 citations


Journal Article
26 Sep 2013-Nature
TL;DR: Sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project (the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences) reveals extremely widespread genetic variation affecting the regulation of most genes.
Abstract: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

1,892 citations


Journal Article
TL;DR: Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
Abstract: Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered gene expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.

1,627 citations


Journal Article
Improved whole-chromosome phasing for disease and population genetic studies
TL;DR: SHAPEIT2 outperformed other phasing methods in terms of switch error rate (SER), with SER decreasing as sample size increases; the model can adapt to data sets with very high SNP density, and its computational performance is competitive with other methods.
Abstract: SHAPEIT2 uses multithreading so that multiple cores can be used to phase whole chromosomes, allowing users to make the best use of their computational resources. We tested SHAPEIT2 on several large-sample, whole-chromosome data sets from a range of SNP genotyping chips (Supplementary Note 1). SHAPEIT2 outperforms other methods (Fig. 1a–c) in terms of switch error rate (SER) and the mean distance between switch errors (Supplementary Figs. 1 and 2). As compared to SHAPEIT1, SHAPEIT2 reduced SER by as much as 45% on these data sets. For example, on 1,229 Vietnamese samples assayed on the Illumina 660K chip on chromosome 22, the SERs of SHAPEIT2 (K = 100, W = 2 Mb), SHAPEIT1 (K = 100) (ref. 2), HAPI-UR (v1.01) (ref. 4), Beagle (v3.3) (ref. 5), Impute2 v2.1.2 (K = 100) (ref. 3), MaCH v1.0.18 (K = 100) (ref. 6) and fastPHASE (v1.4) (ref. 7) were 2.87%, 4.64%, 4.75%, 5.14%, 5.57%, 6.05% and 6.34%, respectively. In general, SHAPEIT2 with low values of K outperformed SHAPEIT1 with high values of K (Fig. 1a–c). As the number of samples increased (up to ~9,000 samples in our tests), we found that SHAPEIT2 outperformed other methods and had the property that SER decreases as sample size increases (Fig. 1d). We assessed accuracy on sequence data by phasing 381 European samples from the 1000 Genomes Project (TGP) together with genotypes from two trio parents sequenced at high coverage. We found that SHAPEIT2 (K = 100, W = 0.3 Mb) reduced SER by 38% compared to Beagle (Supplementary Table 1 and Supplementary Note 2), illustrating that the SHAPEIT2 model can adapt to data sets with very high SNP density. The computational performance of SHAPEIT2 is competitive compared to other methods. Figure 1e shows the computational…

1,242 citations
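
The switch error rate (SER) used to compare phasing methods above can be sketched as follows. This is a minimal illustration, assuming the common definition of SER as the fraction of adjacent heterozygous-site pairs whose relative phase differs from the truth; the function name and the toy haplotypes are hypothetical and are not taken from SHAPEIT2 or its evaluation scripts.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs phased inconsistently.

    Each argument is a sequence of 0/1 alleles carried by ONE haplotype of the
    same individual, restricted to heterozygous sites (so the second haplotype
    is simply the complement and does not need to be passed in).
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # At each site, record whether the inferred haplotype matches the truth.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error occurs wherever that agreement flips between neighbours.
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (len(agrees) - 1)


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0]
    inferred = [0, 1, 0, 1, 0, 0]  # toy example with two switches
    print(switch_error_rate(truth, inferred))
```

Because only flips in agreement are counted, an inferred haplotype that is the exact complement of the truth scores zero switch errors, which reflects the fact that haplotype labels are arbitrary.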


Journal Article
TL;DR: This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR.
Abstract: RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.

1,029 citations


Journal Article
TL;DR: A mendelian randomization study based on data from multiple cohorts conducted by Karani Santhanakrishnan Vimaleswaran and colleagues re-examines the causal nature of the relationship between vitamin D levels and obesity.
Abstract: BACKGROUND: Obesity is associated with vitamin D deficiency, and both are areas of active public health concern. We explored the causality and direction of the relationship between body mass index (BMI) and 25-hydroxyvitamin D [25(OH)D] using genetic markers as instrumental variables (IVs) in bi-directional Mendelian randomization (MR) analysis. METHODS AND FINDINGS: We used information from 21 adult cohorts (up to 42,024 participants) with 12 BMI-related SNPs (combined in an allelic score) to produce an instrument for BMI and four SNPs associated with 25(OH)D (combined in two allelic scores, separately for genes encoding its synthesis or metabolism) as an instrument for vitamin D. Regression estimates for the IVs (allele scores) were generated within-study and pooled by meta-analysis to generate summary effects. Associations between vitamin D scores and BMI were confirmed in the Genetic Investigation of Anthropometric Traits (GIANT) consortium (n = 123,864). Each 1 kg/m² higher BMI was associated with 1.15% lower 25(OH)D (p = 6.52×10⁻²⁷). The BMI allele score was associated both with BMI (p = 6.30×10⁻⁶²) and 25(OH)D (-0.06% [95% CI -0.10 to -0.02], p = 0.004) in the cohorts that underwent meta-analysis. The two vitamin D allele scores were strongly associated with 25(OH)D (p≤8.07×10⁻⁵⁷ for both scores) but not with BMI (synthesis score, p = 0.88; metabolism score, p = 0.08) in the meta-analysis. A 10% higher genetically instrumented BMI was associated with 4.2% lower 25(OH)D concentrations (IV ratio: -4.2 [95% CI -7.1 to -1.3], p = 0.005). No association was seen for genetically instrumented 25(OH)D with BMI, a finding that was confirmed using data from the GIANT consortium (p≥0.57 for both vitamin D scores). CONCLUSIONS: On the basis of a bi-directional genetic approach that limits confounding, our study suggests that a higher BMI leads to lower 25(OH)D, while any effects of lower 25(OH)D increasing BMI are likely to be small. Population level interventions to reduce BMI are expected to decrease the prevalence of vitamin D deficiency.

851 citations
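
The "IV ratio" reported in the abstract above is an instrumental-variable estimate. A minimal sketch of the simplest such estimator, the Wald ratio, is given below. It assumes a single instrument (for example, an allele score), and it uses a first-order approximation for the standard error that ignores uncertainty in the instrument-exposure association. The function name and input values are hypothetical and do not reproduce the study's pooled meta-analysis.

```python
def wald_ratio(beta_instrument_outcome, beta_instrument_exposure,
               se_instrument_outcome):
    """Causal effect estimate of exposure on outcome from one instrument.

    beta_instrument_outcome  -- association of the instrument with the outcome
    beta_instrument_exposure -- association of the instrument with the exposure
    se_instrument_outcome    -- standard error of the instrument-outcome estimate
    """
    if beta_instrument_exposure == 0:
        raise ValueError("instrument must be associated with the exposure")
    estimate = beta_instrument_outcome / beta_instrument_exposure
    # First-order approximation: treats the exposure association as fixed.
    standard_error = se_instrument_outcome / abs(beta_instrument_exposure)
    return estimate, standard_error


if __name__ == "__main__":
    # Hypothetical per-allele associations, chosen only for illustration.
    estimate, se = wald_ratio(-0.06, 0.15, 0.02)
    print(f"IV estimate = {estimate:.2f} (SE {se:.2f})")
```

A bi-directional analysis would apply the same calculation twice, once with an instrument for each trait, and compare the two estimates.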


Journal Article
TL;DR: A recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain, predicted to cause a defect in the correction of mispaired bases inserted during DNA replication.
Abstract: Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.

818 citations


Journal Article
28 Feb 2013-Nature
TL;DR: Evidence for impaired replication fork progression and increased DNA replication stress in CIN+ colorectal cancer (CRC) cells relative to CIN− CRC cells is found, with structural chromosome abnormalities precipitating chromosome missegregation in mitosis.
Abstract: Cancer chromosomal instability (CIN) results in an increased rate of change of chromosome number and structure and generates intratumour heterogeneity. CIN is observed in most solid tumours and is associated with both poor prognosis and drug resistance. Understanding a mechanistic basis for CIN is therefore paramount. Here we find evidence for impaired replication fork progression and increased DNA replication stress in CIN(+) colorectal cancer (CRC) cells relative to CIN(-) CRC cells, with structural chromosome abnormalities precipitating chromosome missegregation in mitosis. We identify three new CIN-suppressor genes (PIGN (also known as MCD4), MEX3C (RKHD2) and ZNF516 (KIAA0222)) encoded on chromosome 18q that are subject to frequent copy number loss in CIN(+) CRC. Chromosome 18q loss was temporally associated with aneuploidy onset at the adenoma-carcinoma transition. CIN-suppressor gene silencing leads to DNA replication stress, structural chromosome abnormalities and chromosome missegregation. Supplementing cells with nucleosides, to alleviate replication-associated damage, reduces the frequency of chromosome segregation errors after CIN-suppressor gene silencing, and attenuates segregation errors and DNA damage in CIN(+) cells. These data implicate a central role for replication stress in the generation of structural and numerical CIN, which may inform new therapeutic approaches to limit intratumour heterogeneity.

724 citations


Journal Article
Veryan Codd, Christopher P. Nelson, Eva Albrecht, Massimo Mangino, Joris Deelen, Jessica L. Buxton, Jouke-Jan Hottenga, Krista Fischer, Tõnu Esko, Ida Surakka, Linda Broer, Dale R. Nyholt, Irene Mateo Leach, Perttu Salo, Sara Hägg, Mary K. Matthews, Jutta Palmen, Giuseppe Danilo Norata, Paul F. O'Reilly, Danish Saleheen, Najaf Amin, Anthony J. Balmforth, Marian Beekman, Rudolf A. de Boer, Stefan Böhringer, Peter S. Braund, Paul Burton, Anton J. M. de Craen, Matthew Denniff, Yanbin Dong, Konstantinos Douroudis, Elena Dubinina, Johan G. Eriksson, Katia Garlaschelli, Dehuang Guo, Anna-Liisa Hartikainen, Anjali K. Henders, Jeanine J. Houwing-Duistermaat, Laura Kananen, Lennart C. Karssen, Johannes Kettunen, Norman Klopp, Vasiliki Lagou, Elisabeth M. van Leeuwen, Pamela A. F. Madden, Reedik Mägi, Patrik K. E. Magnusson, Satu Männistö, Mark I. McCarthy, Sarah E. Medland, Evelin Mihailov, Grant W. Montgomery, Ben A. Oostra, Aarno Palotie, Annette Peters, Helen Pollard, Anneli Pouta, Inga Prokopenko, Samuli Ripatti, Veikko Salomaa, H. Eka D. Suchiman, Ana M. Valdes, Niek Verweij, Ana Viñuela, Xiaoling Wang, H-Erich Wichmann, Elisabeth Widen, Gonneke Willemsen, Margaret J. Wright, Kai Xia, Xiangjun Xiao, Dirk J. van Veldhuisen, Alberico L. Catapano, Martin D. Tobin, Alistair S. Hall, Alexandra I. F. Blakemore, Wiek H. van Gilst, Haidong Zhu, Jeanette Erdmann, Muredach P. Reilly, Sekar Kathiresan, Heribert Schunkert, Philippa J. Talmud, Nancy L. Pedersen, Markus Perola, Willem H. Ouwehand, Jaakko Kaprio, Nicholas G. Martin, Cornelia M. van Duijn, Iiris Hovatta, Christian Gieger, Andres Metspalu, Dorret I. Boomsma, Marjo-Riitta Järvelin, P. Eline Slagboom, John R Thompson, Tim D. Spector, Pim van der Harst, Nilesh J. Samani
TL;DR: A genome-wide meta-analysis of 37,684 individuals, with replication of selected variants in an additional 10,739 individuals, identified seven loci, including five new loci, associated with mean leukocyte telomere length (LTL) (P < 5 × 10⁻⁸).
Abstract: Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10⁻⁸). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5-35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases.

703 citations
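
The genetic risk score mentioned in the abstract above combines lead variants across loci into one number per individual. A minimal sketch follows, assuming the usual weighted-sum construction in which each variant's allele dosage (0, 1 or 2 copies) is multiplied by a per-allele effect weight. The function name, dosages and weights are hypothetical and are not the study's values.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of allele dosages across variants for one individual.

    dosages -- copies of the scored allele at each variant (0, 1 or 2)
    weights -- per-allele effect size assigned to each variant
    """
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


if __name__ == "__main__":
    # Seven hypothetical lead variants, one per locus; values are made up.
    dosages = [0, 1, 2, 1, 0, 2, 1]
    weights = [0.08, 0.07, 0.10, 0.05, 0.06, 0.04, 0.09]
    print(f"score = {genetic_risk_score(dosages, weights):.2f}")
```

The score can then be tested for association with an outcome such as disease status, which is the kind of analysis the abstract describes for coronary artery disease.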


Journal Article
25 Apr 2013-Cell
TL;DR: This data indicates that suppression of miRNAs inducible regions of the eukaryotes through “silencing” other mRNAs using a “spatially aggregating” mechanism is likely to be a viable strategy for combating infectious disease.

Journal Article
Sonja I. Berndt, Stefan Gustafsson, Reedik Mägi, +382 more (117 institutions)
TL;DR: A genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry finds a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.
Abstract: Approaches exploiting trait distribution extremes may be used to identify loci associated with common traits, but it is unknown whether these loci are generalizable to the broader population. In a genome-wide search for loci associated with the upper versus the lower 5th percentiles of body mass index, height and waist-to-hip ratio, as well as clinical classes of obesity, including up to 263,407 individuals of European ancestry, we identified 4 new loci (IGFBP4, H6PD, RSRC1 and PPP2R2A) influencing height detected in the distribution tails and 7 new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3 and ZZZ3) for clinical classes of obesity. Further, we find a large overlap in genetic structure and the distribution of variants between traits based on extremes and the general population and little etiological heterogeneity between obesity subgroups.

Journal Article
18 Jul 2013-Cell
TL;DR: A program, the Sanger Institute Mouse Genetics Project, that provides a step toward the aim of knocking out all genes and screening each line for a broad range of traits is described and it is found that hitherto unpublished genes were as likely to reveal phenotypes as known genes, suggesting that novel genes represent a rich resource for investigating the molecular basis of disease.

Journal Article
TL;DR: An analysis of genome variation in 825 P. falciparum samples from Asia and Africa identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia, and provides a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations.
Abstract: We describe an analysis of genome variation in 825 P. falciparum samples from Asia and Africa that identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia. Within this relatively small geographic area, we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in transporter proteins and DNA mismatch repair proteins. These data provide a population-level genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist in its elimination.

Journal Article
TL;DR: Presents a genetic-pleiotropy-informed method for improving gene discovery with the use of GWAS summary-statistics data, and shows enrichment of SNPs associated with schizophrenia (SCZ) as a function of the association with several CVD risk factors, with a corresponding reduction in false discovery rate.
Abstract: Several lines of evidence suggest that genome-wide association studies (GWASs) have the potential to explain more of the "missing heritability" of common complex phenotypes. However, reliable methods for identifying a larger proportion of SNPs are currently lacking. Here, we present a genetic-pleiotropy-informed method for improving gene discovery with the use of GWAS summary-statistics data. We applied this methodology to identify additional loci associated with schizophrenia (SCZ), a highly heritable disorder with significant missing heritability. Epidemiological and clinical studies suggest comorbidity between SCZ and cardiovascular-disease (CVD) risk factors, including systolic blood pressure, triglycerides, low- and high-density lipoprotein, body mass index, waist-to-hip ratio, and type 2 diabetes. Using stratified quantile-quantile plots, we show enrichment of SNPs associated with SCZ as a function of the association with several CVD risk factors and a corresponding reduction in false discovery rate (FDR). We validate this "pleiotropic enrichment" by demonstrating increased replication rate across independent SCZ substudies. Applying the stratified FDR method, we identified 25 loci associated with SCZ at a conditional FDR level of 0.01. Of these, ten loci are associated with both SCZ and CVD risk factors, mainly triglycerides and low- and high-density lipoproteins but also waist-to-hip ratio, systolic blood pressure, and body mass index. Together, these findings suggest the feasibility of using genetic-pleiotropy-informed methods for improving gene discovery in SCZ and identifying potential mechanistic relationships with various CVD risk factors.
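
The stratified (conditional) FDR idea in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming a simple empirical estimator in which a SNP's primary-trait p-value is divided by the empirical cumulative distribution of primary-trait p-values among SNPs that are at least as strongly associated with the secondary trait. The function name and toy p-values are hypothetical, and the estimator is a simplification rather than the authors' implementation.

```python
def conditional_fdr(p_primary, p_conditional, index):
    """Empirical conditional FDR for the SNP at `index`.

    p_primary     -- p-values for the primary trait (e.g. SCZ), one per SNP
    p_conditional -- p-values for the secondary trait (e.g. a CVD risk factor)
    """
    p_i = p_primary[index]
    p_j = p_conditional[index]
    # Stratum: SNPs at least as strongly associated with the secondary trait.
    stratum = [p for p, q in zip(p_primary, p_conditional) if q <= p_j]
    # Empirical CDF of primary p-values within the stratum, evaluated at p_i.
    # The SNP itself is always in the stratum, so this is never zero.
    ecdf = sum(p <= p_i for p in stratum) / len(stratum)
    return min(1.0, p_i / ecdf)


if __name__ == "__main__":
    scz = [0.001, 0.5, 0.8, 0.2]     # toy primary-trait p-values
    cvd = [0.01, 0.9, 0.005, 0.6]    # toy secondary-trait p-values
    print(conditional_fdr(scz, cvd, 0))
```

When small primary p-values are over-represented within a stratum, the empirical CDF is large relative to the p-value and the estimated conditional FDR falls, which is the enrichment effect the stratified Q-Q plots are meant to reveal.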

Journal Article
TL;DR: The SHAPEIT2 method is extended to use phase-informative sequencing reads to improve phasing accuracy and is primarily designed for high-coverage sequence data or data sets that already have genotypes called.
Abstract: High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5–20 kb read with 4%–15% error per base), phasing performance was substantially improved.
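
The notion of a phase-informative read from the abstract above (a read that spans two or more heterozygous genotypes) can be sketched as a simple interval check. This is a minimal illustration that treats a read as a single contiguous interval; paired-end fragments, base qualities and the probabilistic model are not represented. The function name and coordinates are hypothetical.

```python
def is_phase_informative(read_start, read_end, het_positions):
    """Return True if the read covers at least two heterozygous sites.

    read_start, read_end -- inclusive alignment coordinates of the read
    het_positions        -- positions of the sample's heterozygous genotypes
    """
    covered = sum(read_start <= pos <= read_end for pos in het_positions)
    return covered >= 2


if __name__ == "__main__":
    hets = [150, 180, 400]  # toy heterozygous sites
    print(is_phase_informative(100, 199, hets))  # covers 150 and 180
    print(is_phase_informative(300, 399, hets))  # covers none
```

Longer reads or wider insert sizes increase the chance that two heterozygous sites fall within one fragment, which is consistent with the abstract's observation that mixtures of insert sizes and long reads improve phasing.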


Journal Article
TL;DR: Adipose DNA methylation shows widespread population invariability yet sequence dependence, and incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.
Abstract: Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (median h² = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when overlapping them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed widespread population invariability yet sequence dependence of adipose DNA methylation, and that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.

Journal Article
TL;DR: It is proposed that gene chromatin ancestrally designates hot spots within eukaryotes, that PRDM9 is a derived state within vertebrates, and that H2A.Z may promote the formation or processing of meiotic DNA double-strand breaks.
Abstract: PRDM9 directs human meiotic crossover hot spots to intergenic sequence motifs, whereas budding yeast hot spots overlap regions of low nucleosome density (LND) in gene promoters. To investigate hot spots in plants, which lack PRDM9, we used coalescent analysis of genetic variation in Arabidopsis thaliana. Crossovers increased toward gene promoters and terminators, and hot spots were associated with active chromatin modifications, including H2A.Z, histone H3 Lys4 trimethylation (H3K4me3), LND and low DNA methylation. Hot spot-enriched A-rich and CTT-repeat DNA motifs occurred upstream and downstream, respectively, of transcriptional start sites. Crossovers were asymmetric around promoters and were most frequent over CTT-repeat motifs and H2A.Z nucleosomes. Pollen typing, segregation and cytogenetic analysis showed decreased numbers of crossovers in the arp6 H2A.Z deposition mutant at multiple scales. During meiosis, H2A.Z forms overlapping chromosomal foci with the DMC1 and RAD51 recombinases. As arp6 reduced the number of DMC1 or RAD51 foci, H2A.Z may promote the formation or processing of meiotic DNA double-strand breaks. We propose that gene chromatin ancestrally designates hot spots within eukaryotes and PRDM9 is a derived state within vertebrates.

Journal Article
TL;DR: It is found that missense POLE EDMs with good evidence of pathogenic effects are present in 7% of a set of 173 endometrial cancers, although POLD1 EDMs are uncommon.
Abstract: Accurate duplication of DNA prior to cell division is essential to suppress mutagenesis and tumour development. The high fidelity of eukaryotic DNA replication is due to a combination of accurate incorporation of nucleotides into the nascent DNA strand by DNA polymerases, the recognition and removal of mispaired nucleotides (proofreading) by the exonuclease activity of DNA polymerases δ and ε, and post-replication surveillance and repair of newly synthesized DNA by the mismatch repair (MMR) apparatus. While the contribution of defective MMR to neoplasia is well recognized, evidence that faulty DNA polymerase activity is important in cancer development has been limited. We have recently shown that germline POLE and POLD1 exonuclease domain mutations (EDMs) predispose to colorectal cancer (CRC) and, in the latter case, to endometrial cancer (EC). Somatic POLE mutations also occur in 5-10% of sporadic CRCs and underlie a hypermutator, microsatellite-stable molecular phenotype. We hypothesized that sporadic ECs might also acquire somatic POLE and/or POLD1 mutations. Here, we have found that missense POLE EDMs with good evidence of pathogenic effects are present in 7% of a set of 173 endometrial cancers, although POLD1 EDMs are uncommon. The POLE mutations localized to highly conserved residues and were strongly predicted to affect proofreading. Consistent with this, POLE-mutant tumours were hypermutated, with a high frequency of base substitutions, and an especially large relative excess of G:C>T:A transversions. All POLE EDM tumours were microsatellite stable, suggesting that defects in either DNA proofreading or MMR provide alternative mechanisms to achieve genomic instability and tumourigenesis.

Journal Article
TL;DR: The whole genome-based antimicrobial resistance prediction in clinical isolates of Escherichia coli and Klebsiella pneumoniae was as sensitive and specific as routinely deployed phenotypic methods.
Abstract: …sensitivity of genome-based resistance prediction across all antibiotics for both species was 0.96 (95% CI: 0.94‐0.98) and the specificity was 0.97 (95% CI: 0.95‐0.98). Very major and major error rates were 1.2% and 2.1%, respectively. Conclusions: Our method was as sensitive and specific as routinely deployed phenotypic methods. Validation against larger datasets and formal assessments of cost and turnaround time in a routine laboratory setting are warranted.
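
The sensitivity and specificity quoted above compare genome-based predictions against phenotypic results. A minimal sketch of that calculation follows, assuming the phenotypic test is treated as the reference and that "resistant" is the positive class. The function name and toy calls are hypothetical; the very major and major error rates are not computed here because their denominators are not stated in this excerpt.

```python
def sensitivity_specificity(predicted_resistant, phenotype_resistant):
    """Sensitivity and specificity of predictions against a reference.

    Both arguments are equal-length sequences of booleans, one entry per
    isolate-antibiotic combination (True means resistant).
    """
    if len(predicted_resistant) != len(phenotype_resistant):
        raise ValueError("need one prediction per reference result")
    tp = fn = tn = fp = 0
    for predicted, reference in zip(predicted_resistant, phenotype_resistant):
        if reference and predicted:
            tp += 1   # resistant, correctly predicted resistant
        elif reference:
            fn += 1   # resistant, but predicted susceptible
        elif predicted:
            fp += 1   # susceptible, but predicted resistant
        else:
            tn += 1   # susceptible, correctly predicted susceptible
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    predicted = [True, True, False, False, True]   # toy genotype calls
    reference = [True, False, False, False, True]  # toy phenotype results
    sens, spec = sensitivity_specificity(predicted, reference)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```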

Journal Article
29 Mar 2013-Science
TL;DR: Findings indicate that ancient balancing selection has shaped human variation and point to genes involved in host-pathogen interactions as common targets.
Abstract: Instances in which natural selection maintains genetic variation in a population over millions of years are thought to be extremely rare. We conducted a genome-wide scan for long-lived balancing selection by looking for combinations of SNPs shared between humans and chimpanzees. In addition to the major histocompatibility complex (MHC), we identified 125 regions in which the same haplotypes are segregating in the two species, all but two of which are non-coding. In six cases, there is evidence for an ancestral polymorphism that persisted to the present in humans and chimpanzees. Regions with shared haplotypes are significantly enriched for membrane glycoproteins, and a similar trend is seen among shared coding polymorphisms. These findings indicate that ancient balancing selection has shaped human variation and point to genes involved in host-pathogen interactions as common targets.

Journal Article
TL;DR: Skin showed the most age-related gene expression changes of all the tissues investigated, with many of the genes being previously implicated in fatty acid metabolism, mitochondrial activity, cancer and splicing.
Abstract: Background: Previous studies have demonstrated that gene expression levels change with age. These changes are hypothesized to influence the aging rate of an individual. We analyzed gene expression changes with age in abdominal skin, subcutaneous adipose tissue and lymphoblastoid cell lines in 856 female twins in the age range of 39-85 years. Additionally, we investigated genotypic variants involved in genotype-by-age interactions to understand how the genomic regulation of gene expression alters with age. Results: Using a linear mixed model, differential expression with age was identified in 1,672 genes in skin and 188 genes in adipose tissue. Only two genes expressed in lymphoblastoid cell lines showed significant changes with age. Genes significantly regulated by age were compared with expression profiles in 10 brain regions from 100 postmortem brains aged 16 to 83 years. We identified only one age-related gene common to the three tissues. There were 12 genes that showed differential expression with age in both skin and brain tissue and three common to adipose and brain tissues. Conclusions: Skin showed the most age-related gene expression changes of all the tissues investigated, with many of the genes being previously implicated in fatty acid metabolism, mitochondrial activity, cancer and splicing. A significant proportion of age-related changes in gene expression appear to be tissue-specific with only a few genes sharing an age effect in expression across tissues. More research is needed to improve our understanding of the genetic influences on aging and the relationship with age-related diseases.

Journal ArticleDOI
TL;DR: The evidence is presented for POLE and POLD1 as important contributors to the pathogenesis of CRC and EC, and some of the key questions in this emerging field are highlighted.
Abstract: Polymerases ϵ and δ are the main enzymes that replicate eukaryotic DNA. Accurate replication occurs through Watson–Crick base pairing and also through the action of the polymerases' exonuclease (proofreading) domains. We have recently shown that germline exonuclease domain mutations (EDMs) of POLE and POLD1 confer a high risk of multiple colorectal adenomas and carcinoma (CRC). POLD1 mutations also predispose to endometrial cancer (EC). These mutations are associated with high penetrance and dominant inheritance, although the phenotype can be variable. We have named the condition polymerase proofreading-associated polyposis (PPAP). Somatic POLE EDMs have also been found in sporadic CRCs and ECs, although very few somatic POLD1 EDMs have been detected. Both the germline and the somatic DNA polymerase EDMs cause an ‘ultramutated’, apparently microsatellite-stable, type of cancer, sometimes leading to over a million base substitutions per tumour. Here, we present the evidence for POLE and POLD1 as important contributors to the pathogenesis of CRC and EC, and highlight some of the key questions in this emerging field. Copyright © 2013 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd

Journal ArticleDOI
TL;DR: It is shown that missense mutations of AP2 σ subunit (AP2S1) affecting Arg15, which forms key contacts with dileucine-based motifs of CCV cargo proteins, result in familial hypocalciuric hypercalcemia type 3 (FHH3), an extracellular calcium homeostasis disorder affecting the parathyroids, kidneys and bone.
Abstract: Adaptor protein-2 (AP2), a central component of clathrin-coated vesicles (CCVs), is pivotal in clathrin-mediated endocytosis, which internalizes plasma membrane constituents such as G protein-coupled receptors (GPCRs). AP2, a heterotetramer of α, β, μ and σ subunits, links clathrin to vesicle membranes and binds to tyrosine- and dileucine-based motifs of membrane-associated cargo proteins. Here we show that missense mutations of AP2 σ subunit (AP2S1) affecting Arg15, which forms key contacts with dileucine-based motifs of CCV cargo proteins, result in familial hypocalciuric hypercalcemia type 3 (FHH3), an extracellular calcium homeostasis disorder affecting the parathyroids, kidneys and bone. We found AP2S1 mutations in >20% of cases of FHH without mutations in calcium-sensing GPCR (CASR), which cause FHH1. AP2S1 mutations decreased the sensitivity of CaSR-expressing cells to extracellular calcium and reduced CaSR endocytosis, probably through loss of interaction with a C-terminal CaSR dileucine-based motif, whose disruption also decreased intracellular signaling. Thus, our results identify a new role for AP2 in extracellular calcium homeostasis.

Journal ArticleDOI
TL;DR: It is found that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection than SNPs, and the causal variant underlying some of these associations may be indels.
Abstract: Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.

Journal ArticleDOI
TL;DR: In this paper, the authors characterized the clinicopathological features of BAN CRCs and interrogated their genomes using mutation profiling and high-density single nucleotide polymorphism (SNP) arrays and compared findings to CAU CRCs.
Abstract: Background: Prevalence of colorectal cancer (CRC) in the British Bangladeshi population (BAN) is low compared to British Caucasians (CAU). Genetic background may influence mutations and disease features. Methods: We characterized the clinicopathological features of BAN CRCs and interrogated their genomes using mutation profiling and high-density single nucleotide polymorphism (SNP) arrays and compared findings to CAU CRCs. Results: Age of onset of BAN CRC was significantly lower than for CAU patients (p = 3.0 × 10⁻⁵) and this difference was not due to Lynch syndrome or the polyposis syndromes. KRAS mutations in BAN microsatellite stable (MSS) CRCs were rare (5.4%) compared to CAU MSS CRCs (25%; p=0.04), which correlates with the high percentage of mucinous histotype observed (31%) in the BAN samples. No BRAF mutations were seen in our BAN MSS CRCs (CAU CRCs, 12%; p=0.08). Array data revealed similar patterns of gains (chromosome 7 and 8q), losses (8p, 17p and 18q) and LOH (4q, 17p and 18q) in BAN and CAU CRCs. A small deletion on chromosome 16p13.2 involving the alternative splicing factor RBFOX1 only was found in significantly more BAN (50%) than CAU (15%) CRC cases (p=0.04). Focal deletions targeting the 5’ end of the gene were also identified. Novel RBFOX1 mutations were found in CRC cell lines and tumours; mRNA and protein expression was reduced in tumours. Conclusions: KRAS mutations were rare in BAN MSS CRC and a mucinous histotype common. Loss of RBFOX1 may explain the anomalous splicing activity associated with CRC.

Journal ArticleDOI
13 Jun 2013-Nature
TL;DR: The data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.
Abstract: A search for variants in coding exons of 25 genome-wide association study risk genes in a large cohort of autoimmune patients finds that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility, arguing against the previously proposed rare-variant synthetic genome-wide association hypothesis. Although many common variants of modest-effect size have been identified in genome-wide association studies (GWAS), much of the heritability of complex traits remains unexplained. These authors looked for variants in coding exons of 25 GWAS risk genes in a large cohort of subjects with six autoimmune disease phenotypes and controls, and show that rare coding-region variants at known loci have at most a minor role in common autoimmune disease susceptibility. These results do not support the theory that the missing heritability for common autoimmune diseases is attributable to rare coding mutations at known loci, but are consistent with disease caused by many common-variant loci of weak effect. Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute [1,2]. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set [3,4,5]. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis [6] (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect [7,8,9,10].

Journal ArticleDOI
17 Jan 2013-Nature
TL;DR: It is shown that rare PTVs in the p53-inducible protein phosphatase PPM1D are associated with predisposition to breast cancer and ovarian cancer, and functional studies demonstrate that the mutations result in enhanced suppression of p53 in response to ionizing radiation exposure.
Abstract: Improved sequencing technologies offer unprecedented opportunities for investigating the role of rare genetic variation in common disease. However, there are considerable challenges with respect to study design, data analysis and replication. Using pooled next-generation sequencing of 507 genes implicated in the repair of DNA in 1,150 samples, an analytical strategy focused on protein-truncating variants (PTVs) and a large-scale sequencing case-control replication experiment in 13,642 individuals, here we show that rare PTVs in the p53-inducible protein phosphatase PPM1D are associated with predisposition to breast cancer and ovarian cancer. PPM1D PTV mutations were present in 25 out of 7,781 cases versus 1 out of 5,861 controls (P = 1.12 × 10⁻⁵), including 18 mutations in 6,912 individuals with breast cancer (P = 2.42 × 10⁻⁴) and 12 mutations in 1,121 individuals with ovarian cancer (P = 3.10 × 10⁻⁹). Notably, all of the identified PPM1D PTVs were mosaic in lymphocyte DNA and clustered within a 370-base-pair region in the final exon of the gene, carboxy-terminal to the phosphatase catalytic domain. Functional studies demonstrate that the mutations result in enhanced suppression of p53 in response to ionizing radiation exposure, suggesting that the mutant alleles encode hyperactive PPM1D isoforms. Thus, although the mutations cause premature protein truncation, they do not result in the simple loss-of-function effect typically associated with this class of variant, but instead probably have a gain-of-function effect. Our results have implications for the detection and management of breast and ovarian cancer risk. More generally, these data provide new insights into the role of rare and of mosaic genetic variants in common conditions, and the use of sequencing in their identification.
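
For readers who want to see how a case-control carrier comparison of this kind can be tested, the sketch below applies a two-sided Fisher's exact test to the counts quoted in the abstract (25 carriers among 7,781 cases versus 1 among 5,861 controls). It is an illustration only: the abstract does not state which test the authors used, and the function name `fisher_exact_two_sided` and the table layout are assumptions made here.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
    rows = cases / controls, columns = carriers / non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric log-probability of x carriers among the cases,
        # with all table margins held fixed.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    observed = log_p(a)

    # Sum the probabilities of every table that is no more likely
    # than the observed one.
    total = 0.0
    for x in range(lowest, highest + 1):
        lp = log_p(x)
        if lp <= observed + 1e-9:
            total += exp(lp)
    return min(1.0, total)


# Counts taken from the abstract; non-carriers are derived by subtraction.
cases_carriers, cases_total = 25, 7781
controls_carriers, controls_total = 1, 5861

p_value = fisher_exact_two_sided(
    cases_carriers,
    cases_total - cases_carriers,
    controls_carriers,
    controls_total - controls_carriers,
)
odds_ratio = (cases_carriers * (controls_total - controls_carriers)) / (
    (cases_total - cases_carriers) * controls_carriers
)

print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.3g}")
```

The log-gamma form avoids computing very large binomial coefficients directly. A library routine such as SciPy's `scipy.stats.fisher_exact` could be used instead when SciPy is available.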

Journal ArticleDOI
TL;DR: The number of susceptibility loci with genome-wide significant association with allergic sensitization was increased from three to ten, including SNPs in or near TLR6, C11orf30, STAT6, SLC25A46, HLA-DQB1, IL1RL1, LPP, MYC, IL2 and HLA-B, to provide new insights into the etiology of allergic disease.
Abstract: Allergen-specific immunoglobulin E (present in allergic sensitization) has a central role in the pathogenesis of allergic disease. We performed the first large-scale genome-wide association study (GWAS) of allergic sensitization in 5,789 affected individuals and 10,056 controls and followed up the top SNP at each of 26 loci in 6,114 affected individuals and 9,920 controls. We increased the number of susceptibility loci with genome-wide significant association with allergic sensitization from three to ten, including SNPs in or near TLR6, C11orf30, STAT6, SLC25A46, HLA-DQB1, IL1RL1, LPP, MYC, IL2 and HLA-B. All the top SNPs were associated with allergic symptoms in an independent study. Risk-associated variants at these ten loci were estimated to account for at least 25% of allergic sensitization and allergic rhinitis. Understanding the molecular mechanisms underlying these associations may provide new insights into the etiology of allergic disease.