
Showing papers by "Wellcome Trust Sanger Institute" published in 2010


Journal Article
04 Mar 2010-Nature
TL;DR: The Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals are described, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species.
Abstract: To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

9,268 citations
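As a rough check on the "~150 times larger than the human gene complement" comparison in the gut metagenome abstract, dividing the non-redundant gene count by that ratio gives the implied size of the human gene set. This is back-of-envelope arithmetic on the rounded figures quoted, not a number reported by the authors:

```latex
\frac{3.3 \times 10^{6}\ \text{microbial genes}}{150} \approx 2.2 \times 10^{4}\ \text{genes}
```

That is roughly 22,000 genes, consistent with the approximate size of the human protein-coding gene set.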


Journal Article
28 Oct 2010-Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype; this paper presents results of the project's pilot phase, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations
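To make the trio-based mutation rate in the 1000 Genomes pilot abstract more concrete, it can be scaled to a whole diploid genome. This sketch assumes a haploid genome length G of about 3.1 × 10⁹ bp and that both parental genomes mutate at the same per-base rate μ; both G and that simplification are assumptions, not figures from the abstract:

```latex
\mathbb{E}[\text{de novo substitutions per child}]
  \approx \mu \times 2G
  = 10^{-8} \times 2 \times 3.1 \times 10^{9}
  \approx 62
```

On these assumptions each child carries on the order of 60 new base substitutions; the exact figure depends on the precise rate estimate and on how much of the genome is accessible to variant calling.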


Journal Article
05 Aug 2010-Nature
TL;DR: The results identify several novel loci associated with plasma lipids that are also associated with CAD and provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.
Abstract: Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 × 10⁻⁸), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.

3,469 citations
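Several abstracts on this page use the genome-wide significance threshold P < 5 × 10⁻⁸. A common way to motivate that value, given here as general background rather than as these authors' own derivation, is a Bonferroni correction of a 5% family-wise error rate α over roughly one million effectively independent common-variant tests m in European-ancestry genomes:

```latex
\alpha_{\text{genome-wide}} = \frac{\alpha}{m} \approx \frac{0.05}{1 \times 10^{6}} = 5 \times 10^{-8}
```

The value of m is an approximation; the effective number of independent tests varies with ancestry and with the density of the variants tested.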


Journal Article
02 Sep 2010-Nature
TL;DR: An expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
Abstract: Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

2,863 citations


Journal Article
TL;DR: Some genetic loci associated with body mass index map near key hypothalamic regulators of energy balance and one lies near GIPR, an incretin receptor; genes in other newly associated loci may provide new insights into human body weight regulation.
Abstract: Obesity is globally prevalent and highly heritable, but its underlying genetic factors remain largely elusive. To identify genetic loci for obesity susceptibility, we examined associations between body mass index and ~2.8 million SNPs in up to 123,865 individuals, with targeted follow-up of 42 SNPs in up to 125,931 additional individuals. We confirmed 14 known obesity susceptibility loci and identified 18 new loci associated with body mass index (P < 5 × 10⁻⁸), one of which includes a copy number variant near GPRC5B. Some loci (at MC4R, POMC, SH2B1 and BDNF) map near key hypothalamic regulators of energy balance, and one of these loci is near GIPR, an incretin receptor. Furthermore, genes in other newly associated loci may provide new insights into human body weight regulation.

2,632 citations


Journal Article
Andre Franke, Dermot P.B. McGovern, Jeffrey C. Barrett, Kai Wang, Graham L. Radford-Smith, Tariq Ahmad, Charlie W. Lees, Tobias Balschun, James Lee, Rebecca L. Roberts, Carl A. Anderson, Joshua C. Bis, Suzanne Bumpstead, David Ellinghaus, Eleonora M. Festen, Michel Georges, Todd Green, Talin Haritunians, Luke Jostins, Anna Latiano, Christopher G. Mathew, Grant W. Montgomery, Natalie J. Prescott, Soumya Raychaudhuri, Jerome I. Rotter, Philip Schumm, Yashoda Sharma, Lisa A. Simms, Kent D. Taylor, David C. Whiteman, Cisca Wijmenga, Robert N. Baldassano, Murray L. Barclay, Theodore M. Bayless, Stephan Brand, Carsten Büning, Albert Cohen, Jean Frederick Colombel, Mario Cottone, Laura Stronati, Ted Denson, Martine De Vos, Renata D'Incà, Marla Dubinsky, Cathryn Edwards, Timothy H. Florin, Denis Franchimont, Richard B. Gearry, Jürgen Glas, André Van Gossum, Stephen L. Guthery, Jonas Halfvarson, Hein W. Verspaget, Jean-Pierre Hugot, Amir Karban, Debby Laukens, Ian C. Lawrance, Marc Lémann, Arie Levine, Cécile Libioulle, Edouard Louis, Craig Mowat, William G. Newman, Julián Panés, Anne M. Phillips, Deborah D. Proctor, Miguel Regueiro, Richard K. Russell, Paul Rutgeerts, Jeremy D. Sanderson, Miquel Sans, Frank Seibold, A. Hillary Steinhart, Pieter C. F. Stokkers, Leif Törkvist, Gerd A. Kullak-Ublick, David C. Wilson, Thomas D. Walters, Stephan R. Targan, Steven R. Brant, John D. Rioux, Mauro D'Amato, Rinse K. Weersma, Subra Kugathasan, Anne M. Griffiths, John C. Mansfield, Severine Vermeire, Richard H. Duerr, Mark S. Silverberg, Jack Satsangi, Stefan Schreiber, Judy H. Cho, Vito Annese, Hakon Hakonarson, Mark J. Daly, Miles Parkes
TL;DR: A meta-analysis of six Crohn's disease genome-wide association studies identified 30 new susceptibility loci, and in silico analyses together with manual curation implicated functionally interesting candidate genes within them, including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP.
Abstract: We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.

2,482 citations


Journal Article
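TL;DR: Chromosomal microarray (CMA) has a much higher diagnostic yield (15%–20%) than G-banded karyotyping (~3%) for individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD) or multiple congenital anomalies (MCA), and available evidence supports its use as the first-tier cytogenetic diagnostic test for these patients.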
Abstract: Chromosomal microarray (CMA) is increasingly utilized for genetic testing of individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD), or multiple congenital anomalies (MCA). Performing CMA and G-banded karyotyping on every patient substantially increases the total cost of genetic testing. The International Standard Cytogenomic Array (ISCA) Consortium held two international workshops and conducted a literature review of 33 studies, including 21,698 patients tested by CMA. We provide an evidence-based summary of clinical cytogenetic testing comparing CMA to G-banded karyotyping with respect to technical advantages and limitations, diagnostic yield for various types of chromosomal aberrations, and issues that affect test interpretation. CMA offers a much higher diagnostic yield (15%–20%) for genetic testing of individuals with unexplained DD/ID, ASD, or MCA than a G-banded karyotype (~3%, excluding Down syndrome and other recognizable chromosomal syndromes), primarily because of its higher sensitivity for submicroscopic deletions and duplications. Truly balanced rearrangements and low-level mosaicism are generally not detectable by arrays, but these are relatively infrequent causes of abnormal phenotypes in this population (<1%). Available evidence strongly supports the use of CMA in place of G-banded karyotyping as the first-tier cytogenetic diagnostic test for patients with DD/ID, ASD, or MCA. G-banded karyotype analysis should be reserved for patients with obvious chromosomal syndromes (e.g., Down syndrome), a family history of chromosomal rearrangement, or a history of multiple miscarriages.

2,294 citations
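The CMA versus karyotype yield comparison can be made concrete per 1,000 patients tested. This is simple arithmetic on the percentages quoted in the abstract (the ~3% karyotype figure excludes Down syndrome and other recognizable chromosomal syndromes), not data taken from the reviewed studies:

```latex
\begin{aligned}
\text{CMA:} &\quad 1000 \times (0.15\ \text{to}\ 0.20) = 150\ \text{to}\ 200\ \text{diagnoses} \\
\text{G-banded karyotype:} &\quad 1000 \times 0.03 \approx 30\ \text{diagnoses} \\
\text{Ratio:} &\quad \tfrac{0.15}{0.03} = 5 \quad \text{to} \quad \tfrac{0.20}{0.03} \approx 6.7
\end{aligned}
```

On these figures, first-tier CMA would be expected to yield roughly five to seven times as many diagnoses as karyotyping in this patient population.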


Journal Article
Thomas J. Hudson, Warwick Anderson, Axel Aretz and 270 others (92 institutions)
15 Apr 2010
TL;DR: Systematic studies of more than 25,000 cancer genomes will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
Abstract: The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

2,041 citations


Journal Article
Josée Dupuis, Claudia Langenberg, Inga Prokopenko and 336 others (82 institutions)
TL;DR: It is demonstrated that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.
Abstract: Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.

2,022 citations


Journal Article
01 Apr 2010-Nature
TL;DR: 30 loci with CNVs are identified as candidates for influencing disease susceptibility, but the authors conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Abstract: Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

1,892 citations


Journal Article
TL;DR: By combining genome-wide association data from 8,130 individuals with type 2 diabetes and 38,987 controls of European descent and following up previously unidentified meta-analysis signals, 12 new T2D association signals are identified with combined P < 5 × 10⁻⁸.
Abstract: By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.

Journal Article
Hana Lango Allen, Karol Estrada, Guillaume Lettre, Sonja I. Berndt and 341 others (90 institutions)
14 Oct 2010-Nature
TL;DR: Hundreds of genetic variants, in at least 180 loci, are shown to influence adult height, a highly heritable and classic polygenic trait, indicating that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Abstract: Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
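The last two percentages in the height abstract also imply a heritability figure. If about 16% of phenotypic variation corresponds to about 20% of heritable variation, then, writing h² for the heritable share of phenotypic variance (a symbol introduced here, not used in the abstract):

```latex
h^{2} \approx \frac{0.16}{0.20} = 0.80,
\qquad
\frac{0.10}{h^{2}} \approx \frac{0.10}{0.80} = 0.125
```

That is, a height heritability of about 80%, with the variants identified so far accounting for roughly 12.5% of the heritable variation. This is an inference from the rounded figures quoted, not a value the authors state in this abstract.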

Journal Article
14 Jan 2010-Nature
TL;DR: The genomes of a malignant melanoma and a lymphoblastoid cell line from the same person are sequenced, providing the first comprehensive catalogue of somatic mutations from an individual cancer.
Abstract: All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

Journal Article
TL;DR: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable for prioritizing and categorizing such variants; in Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.
Abstract:
Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.
Availability: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (see http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.
Contact: wm2@ebi.ac.uk; fiona@ebi.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.

Journal Article
Thomas J. Wang, Feng Zhang, J. Brent Richards, Bryan Kestenbaum, Joyce B. J. van Meurs, Diane J. Berry, Douglas P. Kiel, Elizabeth A. Streeten, Claes Ohlsson, Daniel L. Koller, Leena Peltonen, Jason D. Cooper, Paul F. O'Reilly, Denise K. Houston, Nicole L. Glazer, Liesbeth Vandenput, Munro Peacock, Julia Shi, Fernando Rivadeneira, Mark I. McCarthy, Anneli Pouta, Ian H. de Boer, Massimo Mangino, Bernet S. Kato, Deborah J. Smyth, Sarah L. Booth, Paul F. Jacques, Greg L. Burke, Mark O. Goodarzi, Ching-Lung Cheung, Myles Wolf, Kenneth Rice, David Goltzman, Nick Hidiroglou, Martin Ladouceur, Nicholas J. Wareham, Lynne J. Hocking, Deborah J. Hart, Nigel K. Arden, Cyrus Cooper, Suneil Malik, William D. Fraser, Anna Liisa Hartikainen, Guangju Zhai, Helen M. Macdonald, Nita G. Forouhi, Ruth J. F. Loos, David M. Reid, Alan Hakim, Elaine M. Dennison, Yongmei Liu, Chris Power, Helen Stevens, Jaana Laitinen, Ramachandran S. Vasan, Nicole Soranzo, Jörg Bojunga, Bruce M. Psaty, Mattias Lorentzon, Tatiana Foroud, Tamara B. Harris, Albert Hofman, John-Olov Jansson, Jane A. Cauley, André G. Uitterlinden, Quince Gibson, Marjo-Riitta Järvelin, David Karasik, David S. Siscovick, Michael J. Econs, Stephen B. Kritchevsky, Jose C. Florez, John A. Todd, Josée Dupuis, Elina Hyppönen, Tim D. Spector
TL;DR: A genome-wide association study of 25-hydroxyvitamin D concentrations in 33,996 individuals of European descent from 15 cohorts was conducted to identify common genetic variants affecting vitamin D concentrations and risk of insufficiency.

Journal Article
28 Oct 2010-Nature
TL;DR: It is found that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2–M checkpoint, and phylogenetic trees across metastases show organ-specific branches.
Abstract: Christine Iacobuzio-Donahue and colleagues use exome sequencing to analyse primary pancreatic cancers and one or more metastases from the same patients, and find that tumours are composed of distinct subclones. The authors also determine the evolutionary maps by which metastatic cancer clones have evolved within the primary tumour, and estimate the timescales of tumour progression. On the basis of these data, they estimate a mean period of 11.8 years between the initiation of pancreatic tumorigenesis and the formation of the parental, non-metastatic tumour, and a further 6.8 years for the index metastasis clone to arise. These data point to a potentially large window of opportunity during which it might be possible to detect the cancer in a relatively early form.

Peter Campbell and colleagues use next-generation sequencing to detect chromosomal rearrangements in 13 patients with pancreatic cancer. The results reveal considerable inter-patient heterogeneity and indicate ongoing genomic instability and evolution during the development of metastases. For most of the patients studied, however, more than half of the genetic rearrangements found were present in all metastases and the primary tumour, making them potential targets for therapeutic intervention at early and late stages of the disease.

Pancreatic cancer is highly aggressive, usually because of widespread metastasis. Here, next-generation DNA sequencing has been used to detect genomic rearrangements in 13 patients with pancreatic cancer and to explore clonal relationships among metastases. The results reveal not only considerable inter-patient heterogeneity, but also ongoing genomic instability and evolution during the development of metastases.

Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97–98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations[1–5], but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours[6,7], including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites[7], and how the tumour disseminates. Here we harness advances in DNA sequencing[8–12] to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2–M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.

Journal Article
21 Jan 2010-Nature
TL;DR: Sequencing of 101 clear cell renal cell carcinomas (ccRCC) through 3,544 protein-coding genes identifies inactivating mutations in two genes encoding histone-modifying enzymes (SETD2 and JARID1C), NF2 mutations in ccRCC without VHL mutation and several other probable cancer genes, indicating that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene.
Abstract: Clear cell renal cell carcinoma (ccRCC) is the most common form of adult kidney cancer, characterized by the presence of inactivating mutations in the VHL gene in most cases, and by infrequent somatic mutations in known cancer genes. To determine further the genetics of ccRCC, we have sequenced 101 cases through 3,544 protein-coding genes. Here we report the identification of inactivating mutations in two genes encoding enzymes involved in histone modification, namely SETD2, a histone H3 lysine 36 methyltransferase, and JARID1C (also known as KDM5C), a histone H3 lysine 4 demethylase, as well as mutations in the histone H3 lysine 27 demethylase UTX (KDM6A), which we recently reported. The results highlight the role of mutations in components of the chromatin modification machinery in human cancer. Furthermore, NF2 mutations were found in non-VHL mutated ccRCC, and several other probable cancer genes were identified. These results indicate that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene, and that systematic screens will be key to fully determining the somatic genetic architecture of cancer.

Journal Article
22 Jan 2010-Science
TL;DR: A high-throughput genomics approach is used to show that isolates of methicillin-resistant Staphylococcus aureus are precisely differentiated into a global geographic structure and suggest that intercontinental transmission has occurred for nearly four decades.
Abstract: Current methods for differentiating isolates of predominant lineages of pathogenic bacteria often do not provide sufficient resolution to define precise relationships. Here, we describe a high-throughput genomics approach that provides a high-resolution view of the epidemiology and microevolution of a dominant strain of methicillin-resistant Staphylococcus aureus (MRSA). This approach reveals the global geographic structure within the lineage, its intercontinental transmission through four decades, and the potential to trace person-to-person transmission within a hospital environment. The ability to interrogate and resolve bacterial populations is applicable to a range of infectious diseases, as well as microbial ecology.

Journal Article
14 Jan 2010-Nature
TL;DR: Massively parallel sequencing of the small-cell lung cancer cell line NCI-H209, undertaken to explore the mutational burden associated with tobacco smoking, identifies a tandem duplication that duplicates exons 3–8 of CHD7 in frame; two further cell lines carry PVT1–CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease.
Abstract: Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and another two lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.

Journal ArticleDOI
TL;DR: The experiences with the leading target-enrichment technologies, the optimizations that are performed, and typical results that can be obtained using each are described and detailed protocols for each are provided so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.
Abstract: We have not yet reached a point at which routine sequencing of large numbers of whole eukaryotic genomes is feasible, and so it is often necessary to select genomic regions of interest and to enrich these regions before sequencing. There are several enrichment approaches, each with unique advantages and disadvantages. Here we describe our experiences with the leading target-enrichment technologies, the optimizations that we have performed and typical results that can be obtained using each. We also provide detailed protocols for each technology so that end users can find the best compromise between sensitivity, specificity and uniformity for their particular project.

Journal ArticleDOI
TL;DR: Individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) data show that hnRNP C recognizes uridine tracts with a defined long-range spacing consistent with hnRNP particle organization, and integration of transcriptome-wide iCLIP data and alternative splicing profiles into an 'RNA map' indicates how the positioning of hnRNP particles determines their effect on the inclusion of alternative exons.
Abstract: In the nucleus of eukaryotic cells, nascent transcripts are associated with heterogeneous nuclear ribonucleoprotein (hnRNP) particles that are nucleated by hnRNP C. Despite their abundance, however, it remained unclear whether these particles control pre-mRNA processing. Here, we developed individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) to study the role of hnRNP C in splicing regulation. iCLIP data show that hnRNP C recognizes uridine tracts with a defined long-range spacing consistent with hnRNP particle organization. hnRNP particles assemble on both introns and exons but remain generally excluded from splice sites. Integration of transcriptome-wide iCLIP data and alternative splicing profiles into an 'RNA map' indicates how the positioning of hnRNP particles determines their effect on the inclusion of alternative exons. The ability of high-resolution iCLIP data to provide insights into the mechanism of this regulation holds promise for studies of other higher-order ribonucleoprotein complexes.

Journal ArticleDOI
Amy Strange, Francesca Capon, Chris C. A. Spencer, Jo Knight, Michael E. Weale, Michael H. Allen, Anne Barton, Gavin Band, Céline Bellenguez, Judith G.M. Bergboer, Jenefer M. Blackwell, Elvira Bramon, Suzannah Bumpstead, Juan P. Casas, Michael J. Cork, Aiden Corvin, Panos Deloukas, Alexander T. Dilthey, Audrey Duncanson, Sarah Edkins, Xavier Estivill, Oliver FitzGerald, Colin Freeman, Emiliano Giardina, Emma Gray, Angelika Hofer, Ulrike Hüffmeier, Sarah E. Hunt, Alan D. Irvine, Janusz Jankowski, Brian Kirby, Cordelia Langford, Jesús Lascorz, Joyce Leman, Stephen Leslie, Lotus Mallbris, Hugh S. Markus, Christopher G. Mathew, W.H. Irwin McLean, Ross McManus, Rotraut Mössner, Loukas Moutsianas, Åsa Torinsson Naluai, Frank O. Nestle, Giuseppe Novelli, Alexandros Onoufriadis, Colin N. A. Palmer, Carlo Perricone, Matti Pirinen, Robert Plomin, Simon C. Potter, Ramon M. Pujol, Anna Rautanen, Eva Riveira-Muñoz, Anthony W. Ryan, Wolfgang Salmhofer, Lena Samuelsson, Stephen Sawcer, Joost Schalkwijk, Catherine H. Smith, Mona Ståhle, Zhan Su, Rachid Tazi-Ahnini, Heiko Traupe, Ananth C. Viswanathan, Richard B. Warren, Wolfgang Weger, Katarina Wolk, Nicholas W. Wood, Jane Worthington, Helen S. Young, Patrick L.J.M. Zeeuwen, Adrian Hayday, A. David Burden, Christopher E.M. Griffiths, Juha Kere, André Reis, Gilean McVean, David M. Evans, Matthew A. Brown, Jonathan Barker, Leena Peltonen, Peter Donnelly, Richard C. Trembath
TL;DR: These findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis and report compelling evidence for an interaction between the HLA-C and ERAP1 loci.
Abstract: To identify new susceptibility loci for psoriasis, we undertook a genome-wide association study of 594,224 SNPs in 2,622 individuals with psoriasis and 5,667 controls. We identified associations at eight previously unreported genomic loci. Seven loci harbored genes with recognized immune functions (IL28RA, REL, IFIH1, ERAP1, TRAF3IP2, NFKBIA and TYK2). These associations were replicated in 9,079 European samples (six loci with a combined P < 5 × 10⁻⁸ and two loci with a combined P < 5 × 10⁻⁷). We also report compelling evidence for an interaction between the HLA-C and ERAP1 loci (combined P = 6.95 × 10⁻⁶). ERAP1 plays an important role in MHC class I peptide processing. ERAP1 variants only influenced psoriasis susceptibility in individuals carrying the HLA-C risk allele. Our findings implicate pathways that integrate epidermal barrier dysfunction with innate and adaptive immune dysregulation in psoriasis pathogenesis.

Journal ArticleDOI
TL;DR: This work identifies p53-binding protein 1 (53BP1) as an essential factor for sustaining the growth arrest induced by Brca1 deletion, and finds reduced 53BP1 expression in subsets of sporadic triple-negative and BRCA-associated breast cancers, indicating the potential clinical implications of these findings.
Abstract: Germ-line mutations in breast cancer 1, early onset (BRCA1) result in predisposition to breast and ovarian cancer. BRCA1-mutated tumors show genomic instability, mainly as a consequence of impaired recombinatorial DNA repair. Here we identify p53-binding protein 1 (53BP1) as an essential factor for sustaining the growth arrest induced by Brca1 deletion. Depletion of 53BP1 abrogates the ATM-dependent checkpoint response and G2 cell-cycle arrest triggered by the accumulation of DNA breaks in Brca1-deleted cells. This effect of 53BP1 is specific to BRCA1 function, as 53BP1 depletion did not alleviate proliferation arrest or checkpoint responses in Brca2-deleted cells. Notably, loss of 53BP1 partially restores the homologous-recombination defect of Brca1-deleted cells and reverts their hypersensitivity to DNA-damaging agents. We find reduced 53BP1 expression in subsets of sporadic triple-negative and BRCA-associated breast cancers, indicating the potential clinical implications of our findings.

Journal ArticleDOI
TL;DR: A second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects was performed, and 113 selected SNPs with P(GWAS) < 10⁻⁴ and 18 SNPs from 14 known loci were genotyped in a further 4,918 cases and 5,684 controls.
Abstract: We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10⁻⁴ and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 × 10⁻⁸); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.

Journal ArticleDOI
Iris M. Heid, Anne U. Jackson, Joshua C. Randall, Thomas W. Winkler, +352 more (90 institutions)
TL;DR: A meta-analysis of genome-wide association studies for WHR adjusted for body mass index provides evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.
Abstract: Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10⁻⁹ to P = 1.8 × 10⁻⁴⁰) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10⁻³ to P = 1.2 × 10⁻¹³). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.

Journal ArticleDOI
TL;DR: TriTrypDB is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs, utilizing a sophisticated search strategy system.
Abstract: TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.

Journal ArticleDOI
01 Apr 2010-Nature
TL;DR: This analysis shows that high-throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.
Abstract: Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second-generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (single nucleotide polymorphisms) leads to the discovery of more eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts, indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high-throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.

Journal ArticleDOI
01 Apr 2010-Nature
TL;DR: This work shows that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets and shows evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience.
Abstract: The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken (the only bird with a sequenced genome until now). Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.

Journal ArticleDOI
01 Apr 2010-Nature
TL;DR: This study carried out genome-wide phenotypic profiling of each of the ∼21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes, identifying hundreds of human genes involved in diverse biological functions including cell division, migration and survival.
Abstract: Despite our rapidly growing knowledge about the human genome, we do not know all of the genes required for some of the most basic functions of life. To start to fill this gap we developed a high-throughput phenotypic screening platform combining potent gene silencing by RNA interference, time-lapse microscopy and computational image processing. We carried out a genome-wide phenotypic profiling of each of the approximately 21,000 human protein-coding genes by two-day live imaging of fluorescently labelled chromosomes. Phenotypes were scored quantitatively by computational image processing, which allowed us to identify hundreds of human genes involved in diverse biological functions including cell division, migration and survival. As part of the Mitocheck consortium, this study provides an in-depth analysis of cell division phenotypes and makes the entire high-content data set available as a resource to the community.