Showing papers by "Wellcome Trust Sanger Institute published in 2011"


Journal ArticleDOI
10 Aug 2011-Nature
TL;DR: A collaborative GWAS involving 9,772 cases of European descent, collected by 23 research groups working in 15 different countries, replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci.
Abstract: Multiple sclerosis is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability. Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals, and systematic attempts to identify linkage in multiplex families have confirmed that variation within the major histocompatibility complex (MHC) exerts the greatest individual effect on risk. Modestly powered genome-wide association studies (GWAS) have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects have a key role in disease susceptibility. Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9,772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the HLA-DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the class I region. Immunologically relevant genes are significantly overrepresented among those mapping close to the identified loci and particularly implicate T-helper-cell differentiation in the pathogenesis of multiple sclerosis.

2,511 citations


Journal ArticleDOI
TL;DR: With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.
Abstract: COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

2,270 citations



Journal ArticleDOI
13 Jul 2011-Nature
TL;DR: A more detailed history of human population sizes between approximately ten thousand and a million years ago is presented, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male, a Korean male, three European individuals, and two Yoruba males.
Abstract: The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.

1,943 citations


Journal ArticleDOI
Georg Ehret, Patricia B. Munroe, +388 more; Institutions (110)
06 Oct 2011-Nature
TL;DR: A genetic risk score based on 29 genome-wide significant variants was associated with hypertension, left ventricular wall thickness, stroke and coronary artery disease, but not kidney disease or kidney function, and these findings suggest potential novel therapeutic pathways for cardiovascular disease prevention.
Abstract: Blood pressure is a heritable trait(1) influenced by several biological pathways and responsive to environmental stimuli. Over one billion people worldwide have hypertension (>= 140 mm Hg systolic blood pressure or >= 90 mm Hg diastolic blood pressure)(2). Even small increments in blood pressure are associated with an increased risk of cardiovascular events(3). This genome-wide association study of systolic and diastolic blood pressure, which used a multi-stage design in 200,000 individuals of European descent, identified sixteen novel loci: six of these loci contain genes previously known or suspected to regulate blood pressure (GUCY1A3-GUCY1B3, NPR3-C5orf23, ADM, FURIN-FES, GOSR2, GNAS-EDN3); the other ten provide new clues to blood pressure physiology. A genetic risk score based on 29 genome-wide significant variants was associated with hypertension, left ventricular wall thickness, stroke and coronary artery disease, but not kidney disease or kidney function. We also observed associations with blood pressure in East Asian, South Asian and African ancestry individuals. Our findings provide new insights into the genetics and biology of blood pressure, and suggest potential novel therapeutic pathways for cardiovascular disease prevention.
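The abstract above does not say how its genetic risk score was computed. As a rough illustration only, a common form of such a score is a weighted sum of risk-allele counts per variant; the sketch below assumes that form. The function name, variant identifiers and weights are all hypothetical and are not taken from the study.

```python
from typing import Mapping


def genetic_risk_score(dosages: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    Assumption: the score is additive across variants. This is a generic
    formulation, not necessarily the one used in the paper.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical variants and per-allele weights, for illustration only.
weights = {"variant_a": 0.5, "variant_b": 1.0, "variant_c": 0.25}
person = {"variant_a": 2, "variant_b": 0, "variant_c": 1}

score = genetic_risk_score(person, weights)
print(score)
```

In a real analysis the weights would come from per-variant effect estimates, and the score would then be tested for association with each outcome.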

1,829 citations


Journal ArticleDOI
Paul Hollingworth, Denise Harold, Rebecca Sims, Amy Gerrish, +174 more; Institutions (59)
TL;DR: Meta-analyses of all data provided compelling evidence that ABCA7 and the MS4A gene cluster are new Alzheimer's disease susceptibility loci and independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance.
Abstract: We sought to identify new susceptibility loci for Alzheimer's disease through a staged association study (GERAD+) and by testing suggestive loci reported by the Alzheimer's Disease Genetic Consortium (ADGC) in a companion paper. We undertook a combined analysis of four genome-wide association datasets (stage 1) and identified ten newly associated variants with P ≤ 1 × 10−5. We tested these variants for association in an independent sample (stage 2). Three SNPs at two loci replicated and showed evidence for association in a further sample (stage 3). Meta-analyses of all data provided compelling evidence that ABCA7 (rs3764650, meta P = 4.5 × 10−17; including ADGC data, meta P = 5.0 × 10−21) and the MS4A gene cluster (rs610932, meta P = 1.8 × 10−14; including ADGC data, meta P = 1.2 × 10−16) are new Alzheimer's disease susceptibility loci. We also found independent evidence for association for three loci reported by the ADGC, which, when combined, showed genome-wide significance: CD2AP (GERAD+, P = 8.0 × 10−4; including ADGC data, meta P = 8.6 × 10−9), CD33 (GERAD+, P = 2.2 × 10−4; including ADGC data, meta P = 1.6 × 10−9) and EPHA1 (GERAD+, P = 3.4 × 10−4; including ADGC data, meta P = 6.0 × 10−10).

1,771 citations


Journal ArticleDOI
TL;DR: This paper performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals.
Abstract: We performed a meta-analysis of 14 genome-wide association studies of coronary artery disease (CAD) comprising 22,233 individuals with CAD (cases) and 64,762 controls of European descent followed by genotyping of top association signals in 56,682 additional individuals. This analysis identified 13 loci newly associated with CAD at P < 5 × 10−8 and confirmed the association of 10 of 12 previously reported CAD loci. The 13 new loci showed risk allele frequencies ranging from 0.13 to 0.91 and were associated with a 6% to 17% increase in the risk of CAD per allele. Notably, only three of the new loci showed significant association with traditional CAD risk factors and the majority lie in gene regions not previously implicated in the pathogenesis of CAD. Finally, five of the new CAD risk loci appear to have pleiotropic effects, showing strong association with various other human diseases or traits.

1,705 citations


Journal ArticleDOI
16 Jun 2011-Nature
TL;DR: High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.
Abstract: Gene targeting in embryonic stem cells has become the principal technology for manipulation of the mouse genome, offering unrivalled accuracy in allele design and access to conditional mutagenesis. To bring these advantages to the wider research community, large-scale mouse knockout programmes are producing a permanent resource of targeted mutations in all protein-coding genes. Here we report the establishment of a high-throughput gene-targeting pipeline for the generation of reporter-tagged, conditional alleles. Computational allele design, 96-well modular vector construction and high-efficiency gene-targeting strategies have been combined to mutate genes on an unprecedented scale. So far, more than 12,000 vectors and 9,000 conditional targeted alleles have been produced in highly germline-competent C57BL/6N embryonic stem cells. High-throughput genome engineering highlighted by this study is broadly applicable to rat and human stem cells and provides a foundation for future genome-wide efforts aimed at deciphering the function of all genes encoded by the mammalian genome.

1,538 citations


Journal ArticleDOI
15 Sep 2011-Nature
TL;DR: These sequences provide a starting point for a new era in the functional analysis of a key model organism and show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus.
Abstract: We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.

1,453 citations


Journal ArticleDOI
TL;DR: An overview of the ENCODE project and the resources it is generating is provided, together with an illustration of how ENCODE data can be applied to interpret the human genome.
Abstract: The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

1,446 citations


Journal ArticleDOI
TL;DR: Time courses obtained by targeted qPCR revealed that ‘blooms’ in specific bacterial groups occurred rapidly after a dietary change, and these were rapidly reversed by the subsequent diet.
Abstract: The populations of dominant species within the human colonic microbiota can potentially be modified by dietary intake with consequences for health. Here we examined the influence of precisely controlled diets in 14 overweight men. Volunteers were provided successively with a control diet, diets high in resistant starch (RS) or non-starch polysaccharides (NSPs) and a reduced carbohydrate weight loss (WL) diet, over 10 weeks. Analysis of 16S rRNA sequences in stool samples of six volunteers detected 320 phylotypes (defined at >98% identity) of which 26, including 19 cultured species, each accounted for >1% of sequences. Although samples clustered more strongly by individual than by diet, time courses obtained by targeted qPCR revealed that 'blooms' in specific bacterial groups occurred rapidly after a dietary change. These were rapidly reversed by the subsequent diet. Relatives of Ruminococcus bromii (R-ruminococci) increased in most volunteers on the RS diet, accounting for a mean of 17% of total bacteria compared with 3.8% on the NSP diet, whereas the uncultured Oscillibacter group increased on the RS and WL diets. Relatives of Eubacterium rectale increased on RS (to mean 10.1%) but decreased, along with Collinsella aerofaciens, on WL. Inter-individual variation was marked, however, with >60% of RS remaining unfermented in two volunteers on the RS diet, compared to <4% in the other 12 volunteers; these two individuals also showed low numbers of R-ruminococci (<1%). Dietary non-digestible carbohydrate can produce marked changes in the gut microbiota, but these depend on the initial composition of an individual's gut microbiota.

Journal ArticleDOI
07 Jul 2011-Nature
TL;DR: The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease.
Abstract: Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.

Journal ArticleDOI
Carl A. Anderson1, Gabrielle Boucher2, Charlie W. Lees3, Andre Franke4, Mauro D'Amato5, Kent D. Taylor6, James Lee7, Philippe Goyette2, Marcin Imielinski8, Anna Latiano9, Caroline Lagacé2, Regan Scott10, Leila Amininejad11, Suzannah Bumpstead1, Leonard Baidoo10, Robert N. Baldassano8, Murray L. Barclay12, Theodore M. Bayless13, Stephan Brand14, Carsten Büning15, Jean-Frederic Colombel16, Lee A. Denson17, Martine De Vos18, Marla Dubinsky6, Cathryn Edwards19, David Ellinghaus4, Rudolf S N Fehrmann20, James A B Floyd1, Timothy H. Florin21, Denis Franchimont11, Lude Franke20, Michel Georges22, Jürgen Glas14, Nicole L. Glazer23, Stephen L. Guthery24, Talin Haritunians6, Nicholas K. Hayward25, Jean-Pierre Hugot26, Gilles Jobin2, Debby Laukens18, Ian C. Lawrance27, Marc Lémann26, Arie Levine28, Cécile Libioulle22, Edouard Louis22, Dermot P.B. McGovern6, Monica Milla, Grant W. Montgomery25, Katherine I. Morley1, Craig Mowat29, Aylwin Ng30, William G. Newman31, Roel A. Ophoff32, Laura Papi33, Orazio Palmieri9, Laurent Peyrin-Biroulet, Julián Panés, Anne M. Phillips29, Natalie J. Prescott34, Deborah D. Proctor35, Rebecca L. Roberts12, Richard K Russell36, Paul Rutgeerts37, Jeremy D. Sanderson38, Miquel Sans39, Philip Schumm40, Frank Seibold41, Yashoda Sharma35, Lisa A. Simms25, Mark Seielstad42, Mark Seielstad43, A. Hillary Steinhart44, Stephan R. Targan6, Leonard H. van den Berg32, Morten H. Vatn45, Hein W. Verspaget46, Thomas D. Walters44, Cisca Wijmenga20, David C. Wilson3, Harm-Jan Westra20, Ramnik J. Xavier30, Zhen Zhen Zhao25, Cyriel Y. Ponsioen47, Vibeke Andersen48, Leif Törkvist5, Maria Gazouli49, Nicholas P. Anagnou49, Tom H. Karlsen45, Limas Kupčinskas50, Jurgita Sventoraityte50, John C. Mansfield51, Subra Kugathasan52, Mark S. Silverberg44, Jonas Halfvarson53, Jerome I. Rotter6, Christopher G. Mathew34, Anne M. Griffiths44, Richard B. Gearry12, Tariq Ahmad, Steven R. Brant13, Mathias Chamaillard54, Jack Satsangi3, Judy H. Cho35, Stefan Schreiber4, Mark J. Daly30, Jeffrey C. Barrett1, Miles Parkes7, Vito Annese9, Hakon Hakonarson55, Graham L. Radford-Smith25, Richard H. Duerr10, Severine Vermeire37, Rinse K. Weersma20, John D. Rioux2
Wellcome Trust Sanger Institute1, Université de Montréal2, University of Edinburgh3, University of Kiel4, Karolinska Institutet5, Cedars-Sinai Medical Center6, University of Cambridge7, University of Pennsylvania8, Casa Sollievo della Sofferenza9, University of Pittsburgh10, Université libre de Bruxelles11, University of Otago12, Johns Hopkins University13, Ludwig Maximilian University of Munich14, Charité15, Lille University of Science and Technology16, Cincinnati Children's Hospital Medical Center17, Ghent University18, Torbay Hospital19, University of Groningen20, Mater Health Services21, University of Liège22, University of Washington23, University of Utah24, QIMR Berghofer Medical Research Institute25, University of Paris26, University of Western Australia27, Tel Aviv University28, University of Dundee29, Harvard University30, University of Manchester31, Utrecht University32, University of Florence33, King's College London34, Yale University35, Royal Hospital for Sick Children36, Katholieke Universiteit Leuven37, Guy's and St Thomas' NHS Foundation Trust38, University of Barcelona39, University of Chicago40, University of Bern41, Agency for Science, Technology and Research42, University of California, San Francisco43, University of Toronto44, University of Oslo45, Leiden University46, University of Amsterdam47, Aarhus University48, National and Kapodistrian University of Athens49, Lithuanian University of Health Sciences50, Newcastle University51, Emory University52, Örebro University53, French Institute of Health and Medical Research54, Center for Applied Genomics55
TL;DR: A meta-analysis of six ulcerative colitis genome-wide association study datasets found many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1.
Abstract: Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10−8), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.

Journal ArticleDOI
27 Jan 2011-Nature
TL;DR: The protein coding exome is sequenced in a series of primary ccRCC, and the SWI/SNF chromatin remodelling complex gene PBRM1 is identified as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases.
Abstract: The genetics of renal cancer is dominated by inactivation of the VHL tumour suppressor gene in clear cell carcinoma (ccRCC), the commonest histological subtype. A recent large-scale screen of ∼3,500 genes by PCR-based exon re-sequencing identified several new cancer genes in ccRCC including UTX (also known as KDM6A), JARID1C (also known as KDM5C) and SETD2 (ref. 2). These genes encode enzymes that demethylate (UTX, JARID1C) or methylate (SETD2) key lysine residues of histone H3. Modification of the methylation state of these lysine residues of histone H3 regulates chromatin structure and is implicated in transcriptional control. However, together these mutations are present in fewer than 15% of ccRCC, suggesting the existence of additional, currently unidentified cancer genes. Here, we have sequenced the protein coding exome in a series of primary ccRCC and report the identification of the SWI/SNF chromatin remodelling complex gene PBRM1 (ref. 4) as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data further elucidate the somatic genetic architecture of ccRCC and emphasize the marked contribution of aberrant chromatin biology.

Journal ArticleDOI
TL;DR: Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes and were associated with down-regulation of key gene networks, including core mitochondrial pathways.
Abstract: BACKGROUND Myelodysplastic syndromes are a diverse and common group of chronic hematologic cancers. The identification of new genetic lesions could facilitate new diagnostic and therapeutic strategies. METHODS We used massively parallel sequencing technology to identify somatically acquired point mutations across all protein-coding exons in the genome in 9 patients with low-grade myelodysplasia. Targeted resequencing of the gene encoding RNA splicing factor 3B, subunit 1 (SF3B1), was also performed in a cohort of 2087 patients with myeloid or other cancers. RESULTS We identified 64 point mutations in the 9 patients. Recurrent somatically acquired mutations were identified in SF3B1. Follow-up revealed SF3B1 mutations in 72 of 354 patients (20%) with myelodysplastic syndromes, with particularly high frequency among patients whose disease was characterized by ring sideroblasts (53 of 82 [65%]). The gene was also mutated in 1 to 5% of patients with a variety of other tumor types. The observed mutations were less deleterious than was expected on the basis of chance, suggesting that the mutated protein retains structural integrity with altered function. SF3B1 mutations were associated with down-regulation of key gene networks, including core mitochondrial pathways. Clinically, patients with SF3B1 mutations had fewer cytopenias and longer event-free survival than patients without SF3B1 mutations. CONCLUSIONS Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes. (Funded by the Wellcome Trust and others.).
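The two frequencies quoted in the abstract above can be re-derived from the reported counts. The snippet below is a minimal arithmetic check and nothing more; the helper name `percent` is an illustrative choice, and rounding to the nearest whole percent is an assumption about how the abstract reports its figures.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Counts as stated in the abstract.
mds_overall = percent(72, 354)       # SF3B1-mutated among all MDS patients
ring_sideroblasts = percent(53, 82)  # SF3B1-mutated among ring-sideroblast cases

print(mds_overall, ring_sideroblasts)  # prints: 20 65
```

Both values match the percentages given in the abstract (20% and 65%).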

Journal ArticleDOI
TL;DR: The Catalogue Of Somatic Mutations In Cancer (COSMIC), one of the largest repositories of information on somatic mutations in human cancer, curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.
Abstract: The Catalogue Of Somatic Mutations In Cancer (COSMIC) [1] is one of the largest repositories of information on somatic mutations in human cancer. The project has been running for more than ten years as part of the Cancer Genome Project (CGP) at the Wellcome Trust Sanger Institute in the UK. The data in COSMIC are curated from a variety of sources, primarily the scientific literature and large international consortia. The project includes information from the CGP, along with data from other consortia such as the International Cancer Genome Consortium and The Cancer Genome Atlas. In addition, COSMIC is regularly updated with the genes highlighted in the Cancer Gene Census, which curates the scientific literature for known cancer genes [2]. With the advent of whole exome and genome sequencing technology, the amount of data in COSMIC is increasing rapidly. The recent COSMIC release (version 53; 18 May 2011) contains 608,042 tumor and cell line samples, annotating 176,856 mutations across 19,439 genes, with 352 full exomes, 43 whole genome rearrangement screens and 4 full genomes now available. The data are updated regularly, with new releases scheduled every two months. COSMIC provides a large number of graphical and tabular views for interpreting and mining the large quantity of information, as well as the facility to export the relevant data in various formats. The website can be navigated in many ways to examine mutation patterns on the basis of genes, samples and phenotypes, which are the main entry points to COSMIC. COSMIC also provides various options to browse the data in a genomic context. Integration with the Ensembl genome browser allows the visualization of full genome annotations, together with COSMIC data, on the GRCh37 genome coordinates. 
COSMIC also contains its own genome browser, which facilitates data analysis by combining genome-wide gene structures and sequences with rearrangement breakpoints, copy number variations and all somatic substitutions, deletions, insertions and complex gene mutations. The main COSMIC website [1] encompasses all of the available data. However, within COSMIC, the Cancer Cell Line Project [3] is a specialized component, which provides details of the genotyping of almost 800 commonly used cancer cell lines, through the set of known cancer genes. Its focus is to identify driver mutations, or those likely to be implicated in the oncogenesis of each tumor. This information forms the basis for integrating COSMIC with the Genomics of Drug Sensitivity in Cancer project [4], which is a joint effort with the Massachusetts General Hospital [5] to screen this panel of cancer cell lines against potential anticancer therapeutic compounds to investigate correlations between somatic mutations and drug sensitivity. Data on somatic mutations in cancer are being produced at a rapidly increasing rate, and the combined analysis of large distributed datasets is becoming ever more difficult. However, COSMIC curates and standardizes this information in a single database, providing user-friendly browsing tools and analytical functions, thus ensuring its role as a key resource in human cancer genetics.

Journal ArticleDOI
01 Sep 2011-Nature
TL;DR: A comprehensive analysis of genotype-dependent metabolic phenotypes using a genome-wide association study with non-targeted metabolomics to identify genetic loci associated with blood metabolite concentrations and generates many new hypotheses for biomedical and pharmaceutical research.
Abstract: Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.

Journal ArticleDOI
28 Jan 2011-Science
TL;DR: How genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short time scales is detailed.
Abstract: Epidemiological studies of the naturally transformable bacterial pathogen Streptococcus pneumoniae have previously been confounded by high rates of recombination. Sequencing 240 isolates of the PMEN1 (Spain23F-1) multidrug-resistant lineage enabled base substitutions to be distinguished from polymorphisms arising through horizontal sequence transfer. More than 700 recombinations were detected, with genes encoding major antigens frequently affected. Among these were 10 capsule-switching events, one of which accompanied a population shift as vaccine-escape serotype 19A isolates emerged in the USA after the introduction of the conjugate polysaccharide vaccine. The evolution of resistance to fluoroquinolones, rifampicin, and macrolides was observed to occur on multiple occasions. This study details how genomic plasticity within lineages of recombinogenic bacteria can permit adaptation to clinical interventions over remarkably short time scales.

Journal ArticleDOI
TL;DR: IDH1 and IDH2 mutations represent the first common genetic abnormalities to be identified in conventional central and periosteal cartilaginous tumours, and the authors speculate that a mosaic pattern of IDH‐mutation‐bearing cells explains the reports of diverse tumours (gliomas, AML, multiple cartilaginous neoplasms, haemangiomas) occurring in the same patient.
Abstract: Somatic mutations in isocitrate dehydrogenase 1 (IDH1) and IDH2 occur in gliomas and acute myeloid leukaemia (AML). Since patients with multiple enchondromas have occasionally been reported to have these conditions, we hypothesized that the same mutations would occur in cartilaginous neoplasms. Approximately 1200 mesenchymal tumours, including 220 cartilaginous tumours, 222 osteosarcomas and another ∼750 bone and soft tissue tumours, were screened for IDH1 R132 mutations, using Sequenom® mass spectrometry. Cartilaginous tumours and chondroblastic osteosarcomas, wild-type for IDH1 R132, were analysed for IDH2 (R172, R140) mutations. Validation was performed by capillary sequencing and restriction enzyme digestion. Heterozygous somatic IDH1/IDH2 mutations, which result in the production of a potential oncometabolite, 2-hydroxyglutarate, were only detected in central and periosteal cartilaginous tumours, and were found in at least 56% of these, ∼40% of which were represented by R132C. IDH1 R132H mutations were confirmed by immunoreactivity for this mutant allele. The ratio of IDH1:IDH2 mutation was 10.6:1. No IDH2 R140 mutations were detected. Mutations were detected in enchondromas through to conventional central and dedifferentiated chondrosarcomas, in patients with both solitary and multiple neoplasms. No germline mutations were detected. No mutations were detected in peripheral chondrosarcomas and osteochondromas. In conclusion, IDH1 and IDH2 mutations represent the first common genetic abnormalities to be identified in conventional central and periosteal cartilaginous tumours. As in gliomas and AML, the mutations appear to occur early in tumourigenesis. We speculate that a mosaic pattern of IDH-mutation-bearing cells explains the reports of diverse tumours (gliomas, AML, multiple cartilaginous neoplasms, haemangiomas) occurring in the same patient.

Journal ArticleDOI
TL;DR: The authors report the discovery of a strain of S aureus isolated from bulk milk that was phenotypically resistant to meticillin but tested negative for the mecA gene, and suggest that new diagnostic guidelines for the detection of MRSA should consider including tests for mecA LGA251.
Abstract: Background: Animals can act as a reservoir and source for the emergence of novel meticillin-resistant Staphylococcus aureus (MRSA) clones in human beings. Here, we report the discovery of a strain of S aureus (LGA251) isolated from bulk milk that was phenotypically resistant to meticillin but tested negative for the mecA gene and a preliminary investigation of the extent to which such strains are present in bovine and human populations. Methods: Isolates of bovine MRSA were obtained from the Veterinary Laboratories Agency in the UK, and isolates of human MRSA were obtained from diagnostic or reference laboratories (two in the UK and one in Denmark). From these collections, we searched for mecA PCR-negative bovine and human S aureus isolates showing phenotypic meticillin resistance. We used whole-genome sequencing to establish the genetic basis for the observed antibiotic resistance. Findings: A divergent mecA homologue (mecA LGA251) was discovered in the LGA251 genome located in a novel staphylococcal cassette chromosome mec element, designated type-XI SCCmec. The mecA LGA251 was 70% identical to S aureus mecA homologues and was initially detected in 15 S aureus isolates from dairy cattle in England. These isolates were from three different multilocus sequence type lineages (CC130, CC705, and ST425); spa type t843 (associated with CC130) was identified in 60% of bovine isolates. When human mecA-negative MRSA isolates were tested, the mecA LGA251 homologue was identified in 12 of 16 isolates from Scotland, 15 of 26 from England, and 24 of 32 from Denmark. As in cows, t843 was the most common spa type detected in human beings. Interpretation: Although routine culture and antimicrobial susceptibility testing will identify S aureus isolates with this novel mecA homologue as meticillin resistant, present confirmatory methods will not identify them as MRSA. New diagnostic guidelines for the detection of MRSA should consider the inclusion of tests for mecA LGA251. Funding: Department for Environment, Food and Rural Affairs, Higher Education Funding Council for England, Isaac Newton Trust (University of Cambridge), and the Wellcome Trust.

Journal ArticleDOI
David M. Evans1, C. C. A. Spencer2, J J Pointon3, Zhan Su2, D Harvey3, Grazyna Kochan2, Udo Oppermann4, Alexander T. Dilthey5, Matti Pirinen5, Millicent A. Stone6, L H Appleton3, Loukas Moutsianas2, Stephen Leslie2, T. W. H. Wordsworth3, Tony J. Kenna7, Tugce Karaderi3, Gethin P. Thomas7, Minghong Ward8, Michael H. Weisman9, C. Farrar3, Linda A. Bradbury7, Patrick Danoy7, Robert D. Inman10, Walter P. Maksymowych11, Dafna D. Gladman10, Proton Rahman12, Ann W. Morgan13, Helena Marzo-Ortega13, Paul Bowness3, Karl Gaffney14, J. S. H. Gaston15, Malcolm D. Smith15, Jácome Bruges-Armas16, A.-R. Couto17, Rosa Sorrentino17, Fabiana Paladini17, Manuel A. R. Ferreira18, Huji Xu19, Yu Liu19, L. Jiang19, Carlos López-Larrea, Roberto Díaz-Peña, Antonio López-Vázquez, Tetyana Zayats5, Céline Bellenguez2, Hannah Blackburn, Jenefer M. Blackwell20, Elvira Bramon21, Suzannah Bumpstead21, Juan P. Casas22, Aiden Corvin23, N. Craddock24, Panagiotis Deloukas21, Serge Dronov21, Audrey Duncanson25, Sarah Edkins21, Colin Freeman26, Matthew W. Gillman21, Emma Gray21, R. Gwilliam21, Naomi Hammond21, Sarah E. Hunt21, Janusz Jankowski, Alagurevathi Jayakumar21, Cordelia Langford21, Jennifer Liddle21, Hugh S. Markus27, Christopher G. Mathew28, O. T. McCann21, Mark I. McCarthy29, C. N. A. Palmer21, Leena Peltonen21, Robert Plomin28, Simon C. Potter21, Anna Rautanen21, Radhi Ravindrarajah21, Michelle Ricketts21, Nilesh J. Samani30, Stephen Sawcer31, A. Strange26, Richard C. Trembath28, Ananth C. Viswanathan32,33, Matthew Waller21, Paul A. Weston21, Pamela Whittaker21, Sara Widaa21, Nicholas W. Wood, Gil McVean26, John D. Reveille34, B P Wordsworth35, Matthew A. Brown35, Peter Donnelly26
TL;DR: The paper reports the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied).
Abstract: Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.

Journal ArticleDOI
TL;DR: The presence of the HLA-A*3101 allele was associated with carbamazepine-induced hypersensitivity reactions among subjects of Northern European ancestry.
Abstract: Background: Carbamazepine causes various forms of hypersensitivity reactions, ranging from maculopapular exanthema to severe blistering reactions. The HLA-B*1502 allele has been shown to be strongly correlated with carbamazepine-induced Stevens–Johnson syndrome and toxic epidermal necrolysis (SJS–TEN) in the Han Chinese and other Asian populations but not in European populations. Methods: We performed a genomewide association study of samples obtained from 22 subjects with carbamazepine-induced hypersensitivity syndrome, 43 subjects with carbamazepine-induced maculopapular exanthema, and 3987 control subjects, all of European descent. We tested for an association between disease and HLA alleles through proxy single-nucleotide polymorphisms and imputation, confirming associations by high-resolution sequence-based HLA typing. We replicated the associations in samples from 145 subjects with carbamazepine-induced hypersensitivity reactions. Results: The HLA-A*3101 allele, which has a prevalence of 2 to 5% in Nor...

Journal ArticleDOI
TL;DR: The study defines the complex genetic architecture of the risk regions for celiac disease and refines the risk signals, providing the next step toward uncovering the causal mechanisms of the disease.
Abstract: Using variants from the 1000 Genomes Project pilot European CEU dataset and data from additional resequencing studies, we densely genotyped 183 non-HLA risk loci previously associated with immune-mediated diseases in 12,041 individuals with celiac disease (cases) and 12,228 controls. We identified 13 new celiac disease risk loci reaching genome-wide significance, bringing the number of known loci (including the HLA locus) to 40. We found multiple independent association signals at over one-third of these loci, a finding that is attributable to a combination of common, low-frequency and rare genetic variants. Compared to previously available data such as those from HapMap3, our dense genotyping in a large sample collection provided a higher resolution of the pattern of linkage disequilibrium and suggested localization of many signals to finer scale regions. In particular, 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements. Altogether, we define the complex genetic architecture of the risk regions of and refine the risk signals for celiac disease, providing the next step toward uncovering the causal mechanisms of the disease.

Journal ArticleDOI
10 Aug 2011-Nature
TL;DR: The genome sequence of Atlantic cod is presented, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared to other sequenced vertebrates.
Abstract: The genome of the Atlantic cod has been sequenced, and genomic analysis reveals an immune system that differs significantly from that in other vertebrates. The major histocompatibility complex (MHC) II has been lost, as have some other genes that are essential for MHC II function. But there is an expansion in the number of MHC I genes and a unique composition for its toll-like receptor family. These compensatory changes in both adaptive and innate immunity mean that cod is no more susceptible to disease than most other vertebrates. These findings challenge current models of vertebrate immune evolution, and may facilitate the development of targeted vaccines for disease management in aquaculture. Atlantic cod (Gadus morhua) is a large, cold-adapted teleost that sustains long-standing commercial fisheries and incipient aquaculture [1,2]. Here we present the genome sequence of Atlantic cod, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared to other sequenced vertebrates. The genome assembly was obtained exclusively by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. The major histocompatibility complex (MHC) II is a conserved feature of the adaptive immune system of jawed vertebrates [3,4], but we show that Atlantic cod has lost the genes for MHC II, CD4 and invariant chain (Ii) that are essential for the function of this pathway. Nevertheless, Atlantic cod is not exceptionally susceptible to disease under natural conditions [5]. We find a highly expanded number of MHC I genes and a unique composition of its Toll-like receptor (TLR) families. This indicates how the Atlantic cod immune system has evolved compensatory mechanisms in both adaptive and innate immunity in the absence of MHC II. These observations affect fundamental assumptions about the evolution of the adaptive immune system and its components in vertebrates.

Journal ArticleDOI
25 Mar 2011-Science
TL;DR: An overview of what exhaustive sequencing of cancer genomes across a wide range of human tumors has revealed about the origin and behavioral features of cancer cells and how this genomic information is being exploited to improve diagnosis and therapy of the disease.
Abstract: The description and interpretation of genomic abnormalities in cancer cells have been at the heart of cancer research for more than a century. With exhaustive sequencing of cancer genomes across a wide range of human tumors well under way, we are now entering the end game of this mission. In the forthcoming decade, essentially complete catalogs of somatic mutations will be generated for tens of thousands of human cancers. Here, I provide an overview of what these efforts have revealed to date about the origin and behavioral features of cancer cells and how this genomic information is being exploited to improve diagnosis and therapy of the disease.

Journal ArticleDOI
20 Oct 2011-Nature
TL;DR: This work shows that a combination of zinc finger nucleases (ZFNs) and piggyBac technology in human iPSCs can achieve biallelic correction of a point mutation in the α1-antitrypsin (A1AT, also known as SERPINA1) gene that is responsible for α1-antitrypsin deficiency.
Abstract: Human induced pluripotent stem cells (iPSCs) represent a unique opportunity for regenerative medicine because they offer the prospect of generating unlimited quantities of cells for autologous transplantation, with potential application in treatments for a broad range of disorders. However, the use of human iPSCs in the context of genetically inherited human disease will require the correction of disease-causing mutations in a manner that is fully compatible with clinical applications. The methods currently available, such as homologous recombination, lack the necessary efficiency and also leave residual sequences in the targeted genome. Therefore, the development of new approaches to edit the mammalian genome is a prerequisite to delivering the clinical promise of human iPSCs. Here we show that a combination of zinc finger nucleases (ZFNs) and piggyBac technology in human iPSCs can achieve biallelic correction of a point mutation (Glu342Lys) in the α1-antitrypsin (A1AT, also known as SERPINA1) gene that is responsible for α1-antitrypsin deficiency. Genetic correction of human iPSCs restored the structure and function of A1AT in subsequently derived liver cells in vitro and in vivo. This approach is significantly more efficient than any other gene-targeting technology that is currently available and crucially prevents contamination of the host genome with residual non-human sequences. Our results provide the first proof of principle, to our knowledge, for the potential of combining human iPSCs with genetic correction to generate clinically relevant cells for autologous cell-based therapies.

Journal ArticleDOI
John F. Peden1, Jemma C. Hopewell1, Danish Saleheen2, John C. Chambers3, Jorg Hager4, Nicole Soranzo5, Rory Collins1, John Danesh2, Paul Elliott3, Martin Farrall1, Kathy Stirrups5, Weihua Zhang3, Anders Hamsten6,7, Sarah Parish1, Mark Lathrop4, Hugh Watkins1, Robert Clarke1, Panos Deloukas5, Jaspal S. Kooner3, Anuj Goel1, Halit Ongen1, Rona J. Strawbridge6,7, Simon Heath4, Anders Mälarstig6,7, Anna Helgadottir1, John Öhrvik6,7, Muhammed Murtaza5, Simon C. Potter5, Sarah E. Hunt5, Marc Delepine4, Shapour Jalilzadeh1, Tomas Axelsson8, Ann-Christine Syvänen8, Rhian Gwilliam5, Suzannah Bumpstead5, Emma Gray5, Sarah Edkins5, Lasse Folkersen6,7, Theodosios Kyriakou1, Anders Franco-Cereceda7, Anders Gabrielsen7, Udo Seedorf9, Per Eriksson6,7, Alison Offer1, Louise Bowman1, Peter Sleight1, Jane Armitage1, Richard Peto1, Gonçalo R. Abecasis10, Nabeel Ahmed, Mark J. Caulfield11, Peter Donnelly1, Philippe Froguel3, Angad S. Kooner, Mark I. McCarthy1, Nilesh J. Samani12, James Scott3, Joban Sehmi3, Angela Silveira6,7, Mai-Lis Hellénius7, Ferdinand M. van't Hooft6,7, Gunnar O Olsson13, Stephan Rust9, Gerd Assmann9, Simona Barlera, Gianni Tognoni, Maria Grazia Franzosi, Pamela Linksted1, Fiona Green14, Asif Rasheed, Moazzam Zaidi, Nabi Shah, Maria Samuel, Nadeem Hayat Mallick, Muhammad Azhar, Khan Shah Zaman, Abdus Samad, M. Ishaq, Ali Raza Gardezi, Fazal-ur-Rehman Memon, Philippe M. Frossard, Tim D. Spector, Leena Peltonen5,15, Markku S. Nieminen, Juha Sinisalo, Veikko Salomaa, Samuli Ripatti15, Derrick A Bennett1, Karin Leander7, Bruna Gigante7, Ulf de Faire7, Silvia Pietri, Francesca Gori, Roberto Marchioli, Suthesh Sivapalaratnam16, John J.P. Kastelein16, Mieke D. Trip16, Eirini V. Theodoraki17, George V. Dedoussis17, J. C. Engert18, Salim Yusuf19, Sonia S. Anand19
TL;DR: Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD), a modest number considering the apparent heritability of CAD.
Abstract: Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD) [1-7], a modest number considering the apparent heritability of CAD [8]. ...

Journal ArticleDOI
14 Oct 2011-Cell
TL;DR: This study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAF V600E to promote melanomagenesis.

Journal ArticleDOI
22 Sep 2011-Nature
TL;DR: It is shown here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and several transcontinental transmission events are identified.
Abstract: Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3–5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and ongoing pandemic [1]. Only isolates in serogroup O1 (consisting of two biotypes known as ‘classical’ and ‘El Tor’) and the derivative O139 (refs 2, 3) can cause epidemic cholera [2]. It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic [1]. Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.

Journal ArticleDOI
TL;DR: A unique hyperactive piggyBac transposase is generated with 17-fold and ninefold increases in excision and integration, respectively, and its applicability is shown by demonstrating an increased efficiency of generation of transgene-free mouse induced pluripotent stem cells.
Abstract: DNA transposons have been widely used for transgenesis and insertional mutagenesis in various organisms. Among the transposons active in mammalian cells, the moth-derived transposon piggyBac is most promising with its highly efficient transposition, large cargo capacity, and precise repair of the donor site. Here we report the generation of a hyperactive piggyBac transposase. The active transposition of piggyBac in multiple organisms allowed us to screen a transposase mutant library in yeast for hyperactive mutants and then to test candidates in mouse ES cells. We isolated 18 hyperactive mutants in yeast, among which five were also hyperactive in mammalian cells. By combining all mutations, a total of 7 aa substitutions, into a single reading frame, we generated a unique hyperactive piggyBac transposase with 17-fold and ninefold increases in excision and integration, respectively. We showed its applicability by demonstrating an increased efficiency of generation of transgene-free mouse induced pluripotent stem cells. We also analyzed whether this hyperactive piggyBac transposase affects the genomic integrity of the host cells. The frequency of footprints left by the hyperactive piggyBac transposase was as low as WT transposase (~1%) and we found no evidence that the expression of the transposase affects genomic integrity. This hyperactive piggyBac transposase expands the utility of the piggyBac transposon for applications in mammalian genetics and gene therapy.