Showing papers by "Wellcome Trust Sanger Institute" published in 2009
••
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Availability: http://samtools.sourceforge.net
Contact: [email protected]
45,957 citations
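As an illustration of the tab-separated layout that SAM-processing tools work with, the following is a minimal sketch of parsing one alignment line. The eleven mandatory column names follow the SAM specification rather than the abstract above; the `SamRecord` class, the helper functions and the example read are hypothetical and are not part of SAMtools.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory columns of a SAM alignment line (hypothetical container)."""
    qname: str   # read (query) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into its mandatory fields; optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2], pos=int(fields[3]),
        mapq=int(fields[4]), cigar=fields[5], rnext=fields[6], pnext=int(fields[7]),
        tlen=int(fields[8]), seq=fields[9], qual=fields[10],
    )


def is_unmapped(rec: SamRecord) -> bool:
    """Flag bit 0x4 marks a read that did not map."""
    return bool(rec.flag & 0x4)


def is_reverse(rec: SamRecord) -> bool:
    """Flag bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


# Made-up example record: a 5-base read on the reverse strand of "chr1".
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
rec = parse_sam_line(example)
```

In practice a maintained library would normally be preferable to hand-written parsing; the sketch only shows how the columns are laid out.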
••
TL;DR: Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
Availability: http://maq.sourceforge.net
Contact: [email protected]
43,862 citations
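The backward search mentioned in the abstract can be sketched for the exact-match case as follows. This is a toy illustration under stated assumptions, not BWA's implementation: the suffix array is built by naive sorting, the occurrence table is stored in full, and mismatches and gaps are not handled. The function names `bwt_index` and `backward_search` are hypothetical.

```python
from collections import Counter


def bwt_index(text: str):
    """Build a suffix array, C table and Occ table for text + '$' (naive construction)."""
    text = text + "$"  # '$' is a terminator that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in the text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern: str, sa, c_table, occ):
    """Return sorted 0-based start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern from its last character
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, c_table, occ = bwt_index("GATTACATTA")
hits = backward_search("TTA", sa, c_table, occ)
```

Each pattern character narrows the suffix-array interval in constant time, so the search cost depends on the read length rather than on the reference length, which is the property that makes the approach attractive for large genomes.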
••
[...]
TL;DR: It will become possible to obtain the complete DNA sequence of large numbers of cancer genomes, and these studies will provide a detailed and comprehensive perspective on how individual cancers have developed.
Abstract: All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.
3,156 citations
••
Cardiff University1, Medical Research Council2, University of Bristol3, National Institute for Health Research4, King's College5, Trinity College, Dublin6, University of Cambridge7, University of Nottingham8, Queen's University Belfast9, University of Southampton10, University of Manchester11, John Radcliffe Hospital12, UCL Institute of Neurology13, University of Bonn14, University of Hamburg15, Charité16, University of Erlangen-Nuremberg17, University of Duisburg-Essen18, Ludwig Maximilian University of Munich19, Heidelberg University20, University College Dublin21, University of Freiburg22, Washington University in St. Louis23, Brigham Young University24, University of Antwerp25, University College London26, Wellcome Trust Sanger Institute27, King's College London28, Aristotle University of Thessaloniki29, National Institutes of Health30, Mayo Clinic31
TL;DR: A two-stage genome-wide association study of Alzheimer's disease involving over 16,000 individuals replicated the established APOE association and found genome-wide significant associations at two new loci, CLU and PICALM, with compelling evidence for association in the combined dataset.
Abstract: We undertook a two-stage genome-wide association study (GWAS) of Alzheimer's disease (AD) involving over 16,000 individuals, the most powerful AD GWAS to date. In stage 1 (3,941 cases and 7,848 controls), we replicated the established association with the apolipoprotein E (APOE) locus (most significant SNP, rs2075650, P = 1.8 × 10(-157)) and observed genome-wide significant association with SNPs at two loci not previously associated with the disease: at the CLU (also known as APOJ) gene (rs11136000, P = 1.4 × 10(-9)) and 5' to the PICALM gene (rs3851179, P = 1.9 × 10(-8)). These associations were replicated in stage 2 (2,023 cases and 2,340 controls), producing compelling evidence for association with Alzheimer's disease in the combined dataset (rs11136000, P = 8.5 × 10(-10), odds ratio = 0.86; rs3851179, P = 1.3 × 10(-9), odds ratio = 0.86).
2,956 citations
••
TL;DR: It is shown that the individual PB insertions can be removed from established iPS cell lines, providing an invaluable tool for discovery, and the traceless removal of reprogramming factors joined with viral 2A sequences delivered by a single transposon from murine iPS lines is demonstrated.
Abstract: Transgenic expression of just four defined transcription factors (c-Myc, Klf4, Oct4 and Sox2) is sufficient to reprogram somatic cells to a pluripotent state. The resulting induced pluripotent stem (iPS) cells resemble embryonic stem cells in their properties and potential to differentiate into a spectrum of adult cell types. Current reprogramming strategies involve retroviral, lentiviral, adenoviral and plasmid transfection to deliver reprogramming factor transgenes. Although the latter two methods are transient and minimize the potential for insertion mutagenesis, they are currently limited by diminished reprogramming efficiencies. piggyBac (PB) transposition is host-factor independent, and has recently been demonstrated to be functional in various human and mouse cell lines. The PB transposon/transposase system requires only the inverted terminal repeats flanking a transgene and transient expression of the transposase enzyme to catalyse insertion or excision events. Here we demonstrate successful and efficient reprogramming of murine and human embryonic fibroblasts using doxycycline-inducible transcription factors delivered by PB transposition. Stable iPS cells thus generated express characteristic pluripotency markers and succeed in a series of rigorous differentiation assays. By taking advantage of the natural propensity of the PB system for seamless excision, we show that the individual PB insertions can be removed from established iPS cell lines, providing an invaluable tool for discovery. In addition, we have demonstrated the traceless removal of reprogramming factors joined with viral 2A sequences delivered by a single transposon from murine iPS lines. We anticipate that the unique properties of this virus-independent simplification of iPS cell production will accelerate this field further towards full exploration of the reprogramming process and future cell-based therapies.
1,884 citations
••
Cristen J. Willer, Elizabeth K. Speliotes, Ruth J. F. Loos and 163 more authors (36 institutions)
TL;DR: Several of the likely causal genes are highly expressed or known to act in the central nervous system (CNS), emphasizing, as in rare monogenic forms of obesity, the role of the CNS in predisposition to obesity.
Abstract: Common variants at only two loci, FTO and MC4R, have been reproducibly associated with body mass index (BMI) in humans. To identify additional loci, we conducted meta-analysis of 15 genome-wide association studies for BMI (n > 32,000) and followed up top signals in 14 additional cohorts (n > 59,000). We strongly confirm FTO and MC4R and identify six additional loci (P < 5 x 10(-8)): TMEM18, KCTD15, GNPDA2, SH2B1, MTCH2 and NEGR1 (where a 45-kb deletion polymorphism is a candidate causal variant). Several of the likely causal genes are highly expressed or known to act in the central nervous system (CNS), emphasizing, as in rare monogenic forms of obesity, the role of the CNS in predisposition to obesity.
1,710 citations
••
deCODE genetics1, Maastricht University Medical Centre2, University of California, Los Angeles3, Utrecht University4, University of Oslo5, University of Bonn6, Ludwig Maximilian University of Munich7, Copenhagen University Hospital8, Wellcome Trust Sanger Institute9, Aarhus University Hospital10, Aarhus University11, University of Iceland12, University of Helsinki13, Bispebjerg Hospital14, Glostrup Hospital15, Heidelberg University16, Semmelweis University17, University of Verona18, Radboud University Nijmegen Medical Centre19, Russian Academy20, University of Valencia21, King's College London22, Royal Cornhill Hospital23, Duke University24, University of Santiago de Compostela25, Hospital General Universitario Gregorio Marañón26, Karolinska Institutet27, Hammersmith Hospital28, GlaxoSmithKline29, Sichuan University30
TL;DR: Findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.
Abstract: Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.
1,625 citations
••
TL;DR: DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) is an interactive web-based database that incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations.
Abstract: Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.
1,569 citations
••
TL;DR: Rather than one or two domestication events leading to the extant baker’s yeasts, the population structure of S. cerevisiae consists of a few well-defined, geographically isolated lineages and many different mosaics of these lineages, supporting the idea that human influence provided the opportunity for cross-breeding and production of new combinations of pre-existing variations.
Abstract: Since the completion of the genome sequence of Saccharomyces cerevisiae in 1996 (refs 1, 2), there has been a large increase in complete genome sequences, accompanied by great advances in our understanding of genome evolution. Although little is known about the natural and life histories of yeasts in the wild, there are an increasing number of studies looking at ecological and geographic distributions, population structure and sexual versus asexual reproduction. Less well understood at the whole genome level are the evolutionary processes acting within populations and species that lead to adaptation to different environments, phenotypic differences and reproductive isolation. Here we present one- to fourfold or more coverage of the genome sequences of over seventy isolates of the baker's yeast S. cerevisiae and its closest relative, Saccharomyces paradoxus. We examine variation in gene content, single nucleotide polymorphisms, nucleotide insertions and deletions, copy numbers and transposable elements. We find that phenotypic variation broadly correlates with global genome-wide phylogenetic relationships. S. paradoxus populations are well delineated along geographic boundaries, whereas the variation among worldwide S. cerevisiae isolates shows less differentiation and is comparable to a single S. paradoxus population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae consists of a few well-defined, geographically isolated lineages and many different mosaics of these lineages, supporting the idea that human influence provided the opportunity for cross-breeding and production of new combinations of pre-existing variations.
1,425 citations
••
Massachusetts Institute of Technology1, Boston University2, Harvard University3, University of Michigan4, Merck & Co.5, University of Oxford6, National Institutes of Health7, French Institute of Health and Medical Research8, University of Eastern Finland9, University of Southern California10, National Institute for Health and Welfare11, Imperial College London12, Lund University13, University of Helsinki14, Wellcome Trust Sanger Institute15, Tufts University16, University of North Carolina at Chapel Hill17
TL;DR: The results suggest that the cumulative effect of multiple common variants contributes to polygenic dyslipidemia.
Abstract: Blood low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglyceride levels are risk factors for cardiovascular disease. To dissect the polygenic basis of these traits, we conducted genome-wide association screens in 19,840 individuals and replication in up to 20,623 individuals. We identified 30 distinct loci associated with lipoprotein concentrations (each with P < 5 x 10(-8)), including 11 loci that reached genome-wide significance for the first time. The 11 newly defined loci include common variants associated with LDL cholesterol near ABCG8, MAFB, HNF1A and TIMD4; with HDL cholesterol near ANGPTL4, FADS1-FADS2-FADS3, HNF4A, LCAT, PLTP and TTC39B; and with triglycerides near AMAC1L2, FADS1-FADS2-FADS3 and PLTP. The proportion of individuals exceeding clinical cut points for high LDL cholesterol, low HDL cholesterol and high triglycerides varied according to an allelic dosage score (P < 10(-15) for each trend). These results suggest that the cumulative effect of multiple common variants contributes to polygenic dyslipidemia.
1,358 citations
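The abstract does not say how the allelic dosage score was constructed. As a hedged sketch, one common form is a sum of trait-associated allele counts (0, 1 or 2 per locus), optionally weighted by per-locus effect size. The function name, SNP identifiers and weights below are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional


def allelic_dosage_score(genotypes: Dict[str, int],
                         weights: Optional[Dict[str, float]] = None) -> float:
    """Sum trait-associated allele counts across loci, optionally weighted.

    genotypes maps a SNP identifier to the number of trait-associated alleles
    carried (0, 1 or 2). With no weights, every locus counts equally.
    """
    if weights is None:
        weights = {snp: 1.0 for snp in genotypes}
    score = 0.0
    for snp, dosage in genotypes.items():
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2, got {dosage}")
        score += weights[snp] * dosage
    return score


# Hypothetical individual genotyped at three made-up loci.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
unweighted = allelic_dosage_score(person)
weighted = allelic_dosage_score(person, {"snp_a": 0.5, "snp_b": 1.0, "snp_c": 0.25})
```

Individuals can then be grouped by score to compare the proportion exceeding a clinical cut point across groups, which is the kind of trend the abstract reports.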
••
Christopher Newton-Cheh, Toby Johnson and 359 more authors (64 institutions)
TL;DR: Common variants in eight regions, near the CYP17A1, CYP1A2, FGF5, SH2B3, MTHFR, c10orf107, ZNF652 and PLCD3 genes, were associated with systolic or diastolic blood pressure, and all of them were also associated with dichotomous hypertension.
Abstract: Elevated blood pressure is a common, heritable cause of cardiovascular disease worldwide. To date, identification of common genetic variants influencing blood pressure has proven challenging. We tested 2.5 million genotyped and imputed SNPs for association with systolic and diastolic blood pressure in 34,433 subjects of European ancestry from the Global BPgen consortium and followed up findings with direct genotyping (N ≤ 71,225 European ancestry, N ≤ 12,889 Indian Asian ancestry) and in silico comparison (CHARGE consortium, N = 29,136). We identified association between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1 (P = 7 × 10(-24)), CYP1A2 (P = 1 × 10(-23)), FGF5 (P = 1 × 10(-21)), SH2B3 (P = 3 × 10(-18)), MTHFR (P = 2 × 10(-13)), c10orf107 (P = 1 × 10(-9)), ZNF652 (P = 5 × 10(-9)) and PLCD3 (P = 1 × 10(-8)) genes. All variants associated with continuous blood pressure were associated with dichotomous hypertension. These associations between common variants and blood pressure and hypertension offer mechanistic insights into the regulation of blood pressure and may point to novel targets for interventions to prevent cardiovascular disease.
••
TL;DR: It is discovered that the interferon-inducible transmembrane proteins IFITM1, 2, and 3 restrict an early step in influenza A viral replication.
••
Christine G. Elsik, Ross L. Tellam and 325 more authors (65 institutions)
TL;DR: The cattle genome, sequenced to about sevenfold coverage to understand the biology and evolution of ruminants, provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Abstract: To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
••
Harvard University1, Broad Institute2, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico3, McMaster University4, McGill University5, University of Leicester6, University of Lübeck7, University of Pennsylvania8, Vanderbilt University9, University of Missouri–Kansas City10, University of Münster11, University of Verona12, Queen's University Belfast13, University of Washington14, Boston University15, University of Helsinki16, National Institute for Health and Welfare17, Lund University18, University of Cambridge19, Vita-Salute San Raffaele University20, University of Ferrara21, University of Turin22, Hebrew University of Jerusalem23, University of Girona24, University of Milan25, University of Leeds26, University of Regensburg27, Ludwig Maximilian University of Munich28, University of Kiel29, Wellcome Trust Sanger Institute30, University of Paris31, MedStar Washington Hospital Center32, deCODE genetics33, University of Iceland34
TL;DR: SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.
Abstract: We conducted a genome-wide association study testing single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) for association with early-onset myocardial infarction in 2,967 cases and 3,075 controls. We carried out replication in an independent sample with an effective sample size of up to 19,492. SNPs at nine loci reached genome-wide significance: three are newly identified (21q22 near MRPS6-SLC5A3-KCNE2, 6p24 in PHACTR1 and 2q33 in WDR12) and six replicated prior observations (refs 1, 2, 3, 4): 9p21, 1p13 near CELSR2-PSRC1-SORT1, 10q11 near CXCL12, 1q41 in MIA3, 19p13 near LDLR and 1p32 near PCSK9. We tested 554 common copy number polymorphisms (>1% allele frequency) and none met the pre-specified threshold for replication (P < 10(-3)). We identified 8,065 rare CNVs but did not detect a greater CNV burden in cases compared to controls, in genes compared to the genome as a whole, or at any individual locus. SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.
••
Wellcome Trust Sanger Institute1, Broad Institute2, J. Craig Venter Institute3, University of Texas Health Science Center at San Antonio4, University of York5, University of Maryland, College Park6, University of California, San Francisco7, Oswaldo Cruz Foundation8, University of Texas at San Antonio9, Universidade Federal de Minas Gerais10, University College Cork11, Iowa State University12, University of São Paulo13, University of Pittsburgh14, Kyoto University15, Natural History Museum16, University of Southampton17, Lille University of Science and Technology18, John Innes Centre19, Leiden University20, University of Göttingen21, University of Maryland, Baltimore22, Rush University Medical Center23, Illinois State University24, University at Buffalo25
TL;DR: Analysis of the 363 megabase nuclear genome of the blood fluke Schistosoma mansoni, the first sequenced flatworm and a representative of the Lophotrochozoa, offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry and the development of tissues into organs.
Abstract: Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
••
University College Dublin1, Massachusetts Institute of Technology2, Broad Institute3, University of Aveiro4, University of Aberdeen5, Boston University6, University of Minnesota7, Duke University8, Imperial College London9, Stanford University10, University of Exeter11, Leibniz Association12, University of Amsterdam13, Wellcome Trust Sanger Institute14, University of Illinois at Urbana–Champaign15, Stony Brook University16, Newcastle University17, University of Iowa18, University of Sheffield19, University of Texas Health Science Center at Houston20
TL;DR: There are significant expansions of cell wall, secreted and transporter gene families in pathogenic Candida species, suggesting adaptations associated with virulence.
Abstract: Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
••
TL;DR: A paired-end sequencing strategy is used to identify somatic rearrangements in breast cancer genomes, providing a new perspective on cancer genomes that highlights the diversity of somatic rearrangements and their potential contribution to cancer development.
Abstract: Multiple somatic rearrangements are often found in cancer genomes; however, the underlying processes of rearrangement and their contribution to cancer development are poorly characterized. Here we use a paired-end sequencing strategy to identify somatic rearrangements in breast cancer genomes. There are more rearrangements in some breast cancers than previously appreciated. Rearrangements are more frequent over gene footprints and most are intrachromosomal. Multiple rearrangement architectures are present, but tandem duplications are particularly common in some cancers, perhaps reflecting a specific defect in DNA maintenance. Short overlapping sequences at most rearrangement junctions indicate that these have been mediated by non-homologous end-joining DNA repair, although varying sequence patterns indicate that multiple processes of this type are operative. Several expressed in-frame fusion genes were identified but none was recurrent. The study provides a new perspective on cancer genomes, highlighting the diversity of somatic rearrangements and their potential contribution to cancer development.
••
University of Oulu1, University of Helsinki2, University of California, Los Angeles3, Imperial College London4, Finnish Institute of Occupational Health5, Southampton General Hospital6, Broad Institute7, Churchill Hospital8, Wellcome Trust Centre for Human Genetics9, University of Oxford10, Semel Institute for Neuroscience and Human Behavior11, Wellcome Trust Sanger Institute12
TL;DR: The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
Abstract: Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
••
TL;DR: Gene expression profiling of three cell types from 75 individuals identifies multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene, and suggests that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.
Abstract: Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.
••
Baylor College of Medicine1, University of Missouri2, United States Department of Agriculture3, University of New England (United States)4, Commonwealth Scientific and Industrial Research Organisation5, Texas A&M University6, Norwegian University of Life Sciences7, George Mason University8, AgResearch9, Catholic University of the Sacred Heart10, International Atomic Energy Agency11, Empresa Brasileira de Pesquisa Agropecuária12, Sao Paulo State University13, International Livestock Research Institute14, Parco Tecnologico Padano15, University of Edinburgh16, Ethiopian Institute of Agricultural Research17, Livestock Improvement Corporation18, Cornell University19, University of Alberta20, Tuscia University21, Wellcome Trust Sanger Institute22, University of Melbourne23, Government of Victoria24, Trinity College, Dublin25, Simon Fraser University26
TL;DR: Data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation.
Abstract: The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.
••
TL;DR: DNAPlotter is an interactive Java application for generating circular and linear representations of genomes that filters features of interest to display on separate user-definable tracks.
Abstract: Summary: DNAPlotter is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages.
Availability: DNAPlotter is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/circular/
Contact: artemis@sanger.ac.uk
••
Wageningen University and Research Centre1, University of Cambridge2, University of Edinburgh3, University of Illinois at Urbana–Champaign4, Aarhus University5, Wellcome Trust Sanger Institute6, Institut national de la recherche agronomique7, Illumina8, Iowa State University9, Agricultural Research Service10, University of Missouri11, United States Department of Agriculture12
TL;DR: The results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs and demonstrate that the PorcineSNP60 Beadchip is an excellent tool that will likely be used in a variety of future studies in pigs.
Abstract: Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay. Methodology/Principal Findings: A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina’s Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274. Conclusions/Significance: Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.
••
TL;DR: UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes, identifying UTX as a new human cancer gene.
Abstract: Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase gene UTX, pointing to histone H3 lysine methylation deregulation in multiple tumor types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.
••
Broad Institute1, University of Pavia2, Uppsala University3, University of Bologna4, University of Kentucky5, University of Adelaide6, University of Tampa7, University of Veterinary Medicine Vienna8, University of Bern9, University of California, Davis10, Wellcome Trust Sanger Institute11, Cornell University12, Royal Veterinary College13, Institut national de la recherche agronomique14, Japan Racing Association15, University College Dublin16, Genetic Information Research Institute17, Swedish University of Agricultural Sciences18, University of Minnesota19, University of Bari20, Texas A&M University21, Animal Health Trust22, Massachusetts Institute of Technology23
TL;DR: The analysis reveals an evolutionarily new centromere on equine chromosome 11 that displays properties of an immature but fully functioning centromere and is devoid of centromeric satellite sequence, suggesting that centromeric function may arise before satellite repeat accumulation.
Abstract: We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evolutionarily new centromere devoid of centromeric satellite DNA, suggesting that centromeric function may arise before satellite repeat accumulation. Linkage disequilibrium, showing the influences of early domestication of large herds of female horses, is intermediate in length between dog and human, and there is long-range haplotype sharing among breeds.
••
Wellcome Trust Centre for Human Genetics1, Medical Research Council2, Harvard University3, Broad Institute4, Wellcome Trust Sanger Institute5, King's College London6, deCODE genetics7, Boston University8, University of Michigan9, Erasmus University Rotterdam10, National Institutes of Health11, VU University Amsterdam12, University of Oulu13, Lund University14, University of Virginia15, University Hospital of Lausanne16, University of Lausanne17, University of Southern California18, Imperial College London19, Ninewells Hospital20, University of California, Los Angeles21, University of Düsseldorf22, Novartis23, Swiss Institute of Bioinformatics24, European Bioinformatics Institute25, University of Eastern Finland26, GlaxoSmithKline27, University of North Carolina at Chapel Hill28, Oulu University Hospital29, University Medical Center Groningen30, University of Helsinki31, Ludwig Maximilian University of Munich32, University of Cambridge33, VU University Medical Center34, Leiden University Medical Center35, Brigham and Women's Hospital36, Massachusetts Institute of Technology37, University of Iceland38, University of Oxford39
TL;DR: Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten genome-wide association scans, and previous associations of fasting glucose with variants at the G6PC2 and GCK loci are confirmed.
Abstract: To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 × 10⁻⁵⁰) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 × 10⁻¹⁵). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 × 10⁻⁷) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 × 10⁻⁵⁷) and GCK (rs4607517, P = 1.0 × 10⁻²⁵) loci.
••
TL;DR: The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.
Abstract: Bone mineral density (BMD) is a heritable complex trait used in the clinical diagnosis of osteoporosis and the assessment of fracture risk. We performed meta-analysis of five genome-wide association studies of femoral neck and lumbar spine BMD in 19,195 subjects of Northern European descent. We identified 20 BMD loci that reached genome-wide significance (GWS; P < 5 × 10⁻⁸), of which 13 map to regions not previously associated with this trait: 1p31.3 (GPR177), 2p21 (SPTBN1), 3p22 (CTNNB1), 4q21.1 (MEPE), 5q14 (MEF2C), 7p14 (STARD3NL), 7q21.3 (FLJ42280), 11p11.2 (LRP4, ARHGAP1, F2), 11p14.1 (DCDC5), 11p15 (SOX6), 16q24 (FOXL1), 17q21 (HDAC5) and 17q12 (CRHR1). The meta-analysis also confirmed at GWS level seven known BMD loci on 1p36 (ZBTB40), 6q25 (ESR1), 8q24 (TNFRSF11B), 11q13.4 (LRP5), 12q13 (SP7), 13q14 (TNFSF11) and 18q21 (TNFRSF11A). The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.
••
Cecilia M. Lindgren, Iris M. Heid, Joshua C. Randall, Claudia Lamina and 152 more authors (36 institutions)
TL;DR: By focusing on anthropometric measures of central obesity and fat distribution, a meta-analysis of 16 genome-wide association studies informative for adult waist circumference and waist–hip ratio identified three loci implicated in the regulation of human adiposity.
Abstract: To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9 × 10⁻¹¹) and MSRA (WC, P = 8.9 × 10⁻⁹). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6 × 10⁻⁸). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.
••
TL;DR: An amplification-free method of library preparation is presented, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly.
Abstract: Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of these sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. As a consequence, these unfavorable features result in difficulties in genome assembly and variation analysis from the short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low G+C content. Here we present an amplification-free method of library preparation, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly. We illustrate this by generating and analyzing DNA sequences from extremely (G+C)-poor (Plasmodium falciparum), (G+C)-neutral (Escherichia coli) and (G+C)-rich (Bordetella pertussis) genomes.
••
TL;DR: The versatility of the bacteria in the genus Stenotrophomonas is discussed, along with the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to the understanding of the adaptation of this genus to various niches.
Abstract: The genus Stenotrophomonas comprises at least eight species. These bacteria are found throughout the environment, particularly in close association with plants. Strains of the most predominant species, Stenotrophomonas maltophilia, have an extraordinary range of activities that include beneficial effects for plant growth and health, the breakdown of natural and man-made pollutants that are central to bioremediation and phytoremediation strategies and the production of biomolecules of economic value, as well as detrimental effects, such as multidrug resistance, in human pathogenic strains. Here, we discuss the versatility of the bacteria in the genus Stenotrophomonas and the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to our understanding of the adaptation of this genus to various niches.
••
TL;DR: piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs, and transgene-free iPSCs could be identified by negative selection.
Abstract: Induced pluripotent stem cells (iPSCs) have been generated from somatic cells by transgenic expression of Oct4 (Pou5f1), Sox2, Klf4 and Myc. A major difficulty in the application of this technology for regenerative medicine, however, is the delivery of reprogramming factors. Whereas retroviral transduction increases the risk of tumorigenicity, transient expression methods have considerably lower reprogramming efficiencies. Here we describe an efficient piggyBac transposon-based approach to generate integration-free iPSCs. Transposons carrying 2A peptide-linked reprogramming factors induced reprogramming of mouse embryonic fibroblasts with equivalent efficiencies to retroviral transduction. We removed transposons from these primary iPSCs by re-expressing transposase. Transgene-free iPSCs could be identified by negative selection. piggyBac excised without a footprint, leaving the iPSC genome without any genetic alteration. iPSCs fulfilled all criteria of pluripotency, such as pluripotency gene expression, teratoma formation and contribution to chimeras. piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs.