Showing papers by "Wellcome Trust Sanger Institute published in 2009"


Journal ArticleDOI
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations


Journal ArticleDOI
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net
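
Illustrative sketch (not part of the abstract): the BWA abstract above names backward search over a Burrows–Wheeler Transform as its core technique. The toy Python below shows only the exact-match form of that idea on a short string. It is not BWA's implementation: the function names `bwt_index` and `backward_search` are hypothetical, the suffix array is built naively, and the mismatch and gap handling that BWA adds is left out.

```python
def bwt_index(text):
    """Build a toy BWT index: suffix array, C table and Occ counts."""
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    # Naive suffix array: fine for a toy string, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character of each row is the character preceding that suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: no suffix starts with this string
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = bwt_index("GATTACATTA")
print(backward_search("TTA", sa, c_table, occ))  # [2, 7]
```

Each pattern character narrows the suffix array interval with two table lookups, so the search cost depends on the read length rather than the reference length. That property is what makes the approach attractive for aligning many short reads against one large genome.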

43,862 citations


Journal ArticleDOI
09 Apr 2009-Nature
TL;DR: It will soon be possible to obtain the complete DNA sequence of large numbers of cancer genomes, which will provide a detailed and comprehensive perspective on how individual cancers have developed.
Abstract: All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed.

3,156 citations


Journal ArticleDOI
Denise Harold1, Richard Abraham2, Paul Hollingworth2, Rebecca Sims2, Amy Gerrish2, Marian L. Hamshere3, Jaspreet Singh Pahwa2, Valentina Moskvina2, Kimberley Dowzell2, Amy L. Williams2, Nicola L. Jones2, Charlene Thomas2, Alexandra Stretton2, Angharad R. Morgan2, Simon Lovestone4, John Powell5, Petroula Proitsi5, Michelle K. Lupton5, Carol Brayne6, David C. Rubinsztein7, Michael Gill6, Brian A. Lawlor6, Aoibhinn Lynch6, Kevin Morgan8, Kristelle Brown8, Peter Passmore9, David Craig9, Bernadette McGuinness9, Stephen Todd9, Clive Holmes10, David M. A. Mann11, A. David Smith12, Seth Love3, Patrick G. Kehoe3, John Hardy, Simon Mead13, Nick C. Fox13, Martin N. Rossor13, John Collinge13, Wolfgang Maier14, Frank Jessen14, Britta Schürmann14, Hendrik van den Bussche15, Isabella Heuser16, Johannes Kornhuber17, Jens Wiltfang18, Martin Dichgans19, Lutz Frölich20, Harald Hampel21, Harald Hampel19, Michael Hüll22, Dan Rujescu19, Alison Goate23, John S. K. Kauwe24, Carlos Cruchaga23, Petra Nowotny23, John C. Morris23, Kevin Mayo23, Kristel Sleegers25, Karolien Bettens25, Sebastiaan Engelborghs25, Peter Paul De Deyn25, Christine Van Broeckhoven25, Gill Livingston26, Nicholas Bass26, Hugh Gurling26, Andrew McQuillin26, Rhian Gwilliam27, Panagiotis Deloukas27, Ammar Al-Chalabi28, Christopher Shaw28, Magda Tsolaki29, Andrew B. Singleton30, Rita Guerreiro30, Thomas W. Mühleisen14, Markus M. Nöthen14, Susanne Moebus18, Karl-Heinz Jöckel18, Norman Klopp, H-Erich Wichmann19, Minerva M. Carrasquillo31, V. Shane Pankratz31, Steven G. Younkin31, Peter Holmans2, Michael Conlon O'Donovan2, Michael John Owen2, Julie Williams2 
TL;DR: A two-stage genome-wide association study of Alzheimer's disease involving over 16,000 individuals, the most powerful AD GWAS to date, found genome-wide significant associations at the CLU and PICALM loci and replicated them, producing compelling evidence for association in the combined dataset.
Abstract: We undertook a two-stage genome-wide association study (GWAS) of Alzheimer's disease (AD) involving over 16,000 individuals, the most powerful AD GWAS to date. In stage 1 (3,941 cases and 7,848 controls), we replicated the established association with the apolipoprotein E (APOE) locus (most significant SNP, rs2075650, P = 1.8 × 10(-157)) and observed genome-wide significant association with SNPs at two loci not previously associated with the disease: at the CLU (also known as APOJ) gene (rs11136000, P = 1.4 × 10(-9)) and 5' to the PICALM gene (rs3851179, P = 1.9 × 10(-8)). These associations were replicated in stage 2 (2,023 cases and 2,340 controls), producing compelling evidence for association with Alzheimer's disease in the combined dataset (rs11136000, P = 8.5 × 10(-10), odds ratio = 0.86; rs3851179, P = 1.3 × 10(-9), odds ratio = 0.86).

2,956 citations


Journal ArticleDOI
09 Apr 2009-Nature
TL;DR: It is shown that the individual PB insertions can be removed from established iPS cell lines, providing an invaluable tool for discovery, and the traceless removal of reprogramming factors joined with viral 2A sequences delivered by a single transposon from murine iPS lines is demonstrated.
Abstract: Transgenic expression of just four defined transcription factors (c-Myc, Klf4, Oct4 and Sox2) is sufficient to reprogram somatic cells to a pluripotent state. The resulting induced pluripotent stem (iPS) cells resemble embryonic stem cells in their properties and potential to differentiate into a spectrum of adult cell types. Current reprogramming strategies involve retroviral, lentiviral, adenoviral and plasmid transfection to deliver reprogramming factor transgenes. Although the latter two methods are transient and minimize the potential for insertion mutagenesis, they are currently limited by diminished reprogramming efficiencies. piggyBac (PB) transposition is host-factor independent, and has recently been demonstrated to be functional in various human and mouse cell lines. The PB transposon/transposase system requires only the inverted terminal repeats flanking a transgene and transient expression of the transposase enzyme to catalyse insertion or excision events. Here we demonstrate successful and efficient reprogramming of murine and human embryonic fibroblasts using doxycycline-inducible transcription factors delivered by PB transposition. Stable iPS cells thus generated express characteristic pluripotency markers and succeed in a series of rigorous differentiation assays. By taking advantage of the natural propensity of the PB system for seamless excision, we show that the individual PB insertions can be removed from established iPS cell lines, providing an invaluable tool for discovery. In addition, we have demonstrated the traceless removal of reprogramming factors joined with viral 2A sequences delivered by a single transposon from murine iPS lines. We anticipate that the unique properties of this virus-independent simplification of iPS cell production will accelerate this field further towards full exploration of the reprogramming process and future cell-based therapies.

1,884 citations


Journal ArticleDOI
TL;DR: Several of the likely causal genes are highly expressed or known to act in the central nervous system (CNS), emphasizing, as in rare monogenic forms of obesity, the role of the CNS in predisposition to obesity.
Abstract: Common variants at only two loci, FTO and MC4R, have been reproducibly associated with body mass index (BMI) in humans. To identify additional loci, we conducted meta-analysis of 15 genome-wide association studies for BMI (n > 32,000) and followed up top signals in 14 additional cohorts (n > 59,000). We strongly confirm FTO and MC4R and identify six additional loci (P < 5 x 10(-8)): TMEM18, KCTD15, GNPDA2, SH2B1, MTCH2 and NEGR1 (where a 45-kb deletion polymorphism is a candidate causal variant). Several of the likely causal genes are highly expressed or known to act in the central nervous system (CNS), emphasizing, as in rare monogenic forms of obesity, the role of the CNS in predisposition to obesity.

1,710 citations


Journal ArticleDOI
Hreinn Stefansson1, Hreinn Stefansson2, Roel A. Ophoff3, Roel A. Ophoff4, Roel A. Ophoff2, Stacy Steinberg2, Stacy Steinberg1, Ole A. Andreassen5, Sven Cichon6, Dan Rujescu7, Thomas Werge8, Olli Pietilainen9, Ole Mors10, Preben Bo Mortensen11, Engilbert Sigurdsson12, Omar Gustafsson1, Mette Nyegaard11, Annamari Tuulio-Henriksson13, Andres Ingason1, Thomas Hansen8, Jaana Suvisaari13, Jouko Lönnqvist13, Tiina Paunio, Anders D. Børglum10, Anders D. Børglum11, Annette M. Hartmann7, Anders Fink-Jensen8, Merete Nordentoft14, David M. Hougaard, Bent Nørgaard-Pedersen, Yvonne Böttcher1, Jes Olesen15, René Breuer16, Hans-Jürgen Möller7, Ina Giegling7, Henrik B. Rasmussen8, Sally Timm8, Manuel Mattheisen6, István Bitter17, János Réthelyi17, Brynja B. Magnusdottir12, Thordur Sigmundsson12, Pall I. Olason1, Gisli Masson1, Jeffrey R. Gulcher1, Magnús Haraldsson12, Ragnheidur Fossdal1, Thorgeir E. Thorgeirsson1, Unnur Thorsteinsdottir1, Unnur Thorsteinsdottir12, Mirella Ruggeri18, Sarah Tosato18, Barbara Franke19, Eric Strengman4, Lambertus A. Kiemeney19, Ingrid Melle5, Srdjan Djurovic5, Lilia I. Abramova20, Kaleda Vg20, Julio Sanjuán21, Rosa de Frutos21, Elvira Bramon22, Evangelos Vassos22, Gillian Fraser23, Ulrich Ettinger22, Marco Picchioni22, Nicholas Walker, T. Toulopoulou22, Anna C. Need24, Dongliang Ge24, Joeng Lim Yoon3, Kevin V. Shianna24, Nelson B. Freimer3, Rita M. Cantor3, Robin M. Murray22, Augustine Kong1, Vera Golimbet20, Angel Carracedo25, Celso Arango26, Javier Costas, Erik G. Jönsson27, Lars Terenius27, Ingrid Agartz27, Hannes Petursson12, Markus M. Nöthen6, Marcella Rietschel16, Paul M. Matthews28, Pierandrea Muglia29, Leena Peltonen9, David St Clair23, David Goldstein24, Kari Stefansson12, Kari Stefansson1, David A. Collier30, David A. Collier22 
06 Aug 2009-Nature
TL;DR: Findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.
Abstract: Schizophrenia is a complex disorder, caused by both genetic and environmental factors and their interactions. Research on pathogenesis has traditionally focused on neurotransmitter systems in the brain, particularly those involving dopamine. Schizophrenia has been considered a separate disease for over a century, but in the absence of clear biological markers, diagnosis has historically been based on signs and symptoms. A fundamental message emerging from genome-wide association studies of copy number variations (CNVs) associated with the disease is that its genetic basis does not necessarily conform to classical nosological disease boundaries. Certain CNVs confer not only high relative risk of schizophrenia but also of other psychiatric disorders. The structural variations associated with schizophrenia can involve several genes and the phenotypic syndromes, or the 'genomic disorders', have not yet been characterized. Single nucleotide polymorphism (SNP)-based genome-wide association studies with the potential to implicate individual genes in complex diseases may reveal underlying biological pathways. Here we combined SNP data from several large genome-wide scans and followed up the most significant association signals. We found significant association with several markers spanning the major histocompatibility complex (MHC) region on chromosome 6p21.3-22.1, a marker located upstream of the neurogranin gene (NRGN) on 11q24.2 and a marker in intron four of transcription factor 4 (TCF4) on 18q21.2. Our findings implicating the MHC region are consistent with an immune component to schizophrenia risk, whereas the association with NRGN and TCF4 points to perturbation of pathways involved in brain development, memory and cognition.

1,625 citations


Journal ArticleDOI
TL;DR: DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) is an interactive web-based database that incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations.
Abstract: Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.

1,569 citations


Journal ArticleDOI
19 Mar 2009-Nature
TL;DR: Rather than one or two domestication events leading to the extant baker’s yeasts, the population structure of S. cerevisiae consists of a few well-defined, geographically isolated lineages and many different mosaics of these lineages, supporting the idea that human influence provided the opportunity for cross-breeding and production of new combinations of pre-existing variations.
Abstract: Since the completion of the genome sequence of Saccharomyces cerevisiae in 1996 (refs 1, 2), there has been a large increase in complete genome sequences, accompanied by great advances in our understanding of genome evolution. Although little is known about the natural and life histories of yeasts in the wild, there are an increasing number of studies looking at ecological and geographic distributions, population structure and sexual versus asexual reproduction. Less well understood at the whole genome level are the evolutionary processes acting within populations and species that lead to adaptation to different environments, phenotypic differences and reproductive isolation. Here we present one- to fourfold or more coverage of the genome sequences of over seventy isolates of the baker's yeast S. cerevisiae and its closest relative, Saccharomyces paradoxus. We examine variation in gene content, single nucleotide polymorphisms, nucleotide insertions and deletions, copy numbers and transposable elements. We find that phenotypic variation broadly correlates with global genome-wide phylogenetic relationships. S. paradoxus populations are well delineated along geographic boundaries, whereas the variation among worldwide S. cerevisiae isolates shows less differentiation and is comparable to a single S. paradoxus population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae consists of a few well-defined, geographically isolated lineages and many different mosaics of these lineages, supporting the idea that human influence provided the opportunity for cross-breeding and production of new combinations of pre-existing variations.

1,425 citations


Journal ArticleDOI
TL;DR: The results suggest that the cumulative effect of multiple common variants contributes to polygenic dyslipidemia.
Abstract: Blood low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglyceride levels are risk factors for cardiovascular disease. To dissect the polygenic basis of these traits, we conducted genome-wide association screens in 19,840 individuals and replication in up to 20,623 individuals. We identified 30 distinct loci associated with lipoprotein concentrations (each with P < 5 x 10(-8)), including 11 loci that reached genome-wide significance for the first time. The 11 newly defined loci include common variants associated with LDL cholesterol near ABCG8, MAFB, HNF1A and TIMD4; with HDL cholesterol near ANGPTL4, FADS1-FADS2-FADS3, HNF4A, LCAT, PLTP and TTC39B; and with triglycerides near AMAC1L2, FADS1-FADS2-FADS3 and PLTP. The proportion of individuals exceeding clinical cut points for high LDL cholesterol, low HDL cholesterol and high triglycerides varied according to an allelic dosage score (P < 10(-15) for each trend). These results suggest that the cumulative effect of multiple common variants contributes to polygenic dyslipidemia.

1,358 citations


Journal ArticleDOI
TL;DR: This study identified associations between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1, CYP1A2, FGF5, SH2B3, MTHFR, c10orf107, ZNF652 and PLCD3 genes.
Abstract: Elevated blood pressure is a common, heritable cause of cardiovascular disease worldwide. To date, identification of common genetic variants influencing blood pressure has proven challenging. We tested 2.5 million genotyped and imputed SNPs for association with systolic and diastolic blood pressure in 34,433 subjects of European ancestry from the Global BPgen consortium and followed up findings with direct genotyping (N ≤ 71,225 European ancestry, N ≤ 12,889 Indian Asian ancestry) and in silico comparison (CHARGE consortium, N = 29,136). We identified association between systolic or diastolic blood pressure and common variants in eight regions near the CYP17A1 (P = 7 × 10(-24)), CYP1A2 (P = 1 × 10(-23)), FGF5 (P = 1 × 10(-21)), SH2B3 (P = 3 × 10(-18)), MTHFR (P = 2 × 10(-13)), c10orf107 (P = 1 × 10(-9)), ZNF652 (P = 5 × 10(-9)) and PLCD3 (P = 1 × 10(-8)) genes. All variants associated with continuous blood pressure were associated with dichotomous hypertension. These associations between common variants and blood pressure and hypertension offer mechanistic insights into the regulation of blood pressure and may point to novel targets for interventions to prevent cardiovascular disease.


Journal ArticleDOI
24 Apr 2009-Science
TL;DR: The cattle genome was sequenced to about sevenfold coverage to understand the biology and evolution of ruminants; the sequence provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Abstract: To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

Journal ArticleDOI
Sekar Kathiresan1, Benjamin F. Voight1, Shaun Purcell2, Kiran Musunuru1, Diego Ardissino, Pier Mannuccio Mannucci3, Sonia S. Anand4, James C. Engert5, Nilesh J. Samani6, Heribert Schunkert7, Jeanette Erdmann7, Muredach P. Reilly8, Daniel J. Rader8, Thomas M. Morgan9, John A. Spertus10, Monika Stoll11, Domenico Girelli12, Pascal P. McKeown13, Christopher Patterson13, David S. Siscovick14, Christopher J. O'Donnell15, Roberto Elosua, Leena Peltonen16, Veikko Salomaa17, Stephen M. Schwartz14, Olle Melander18, David Altshuler1, Pier Angelica Merlini, Carlo Berzuini19, Luisa Bernardinelli19, Flora Peyvandi3, Marco Tubaro, Patrizia Celli, Maurizio Ferrario, Raffaela Fetiveau, Nicola Marziliano, Giorgio Casari20, Michele Galli, Flavio Ribichini12, Marco Rossi, Francesco Bernardi21, Pietro Zonzin, Alberto Piazza22, Jean Yee14, Yechiel Friedlander23, Jaume Marrugat, Gavin Lucas, Isaac Subirana, Joan Sala24, Rafael Ramos, James B. Meigs1, Gordon H. Williams1, David M. Nathan1, Calum A. MacRae1, Aki S. Havulinna17, Göran Berglund18, Joel N. Hirschhorn1, Rosanna Asselta, Stefano Duga, Marta Spreafico25, Mark J. Daly1, James Nemesh2, Joshua M. Korn1, Steven A. McCarroll1, Aarti Surti2, Candace Guiducci2, Lauren Gianniny2, Daniel B. Mirel2, Melissa Parkin2, Noël P. Burtt2, Stacey Gabriel2, John R. Thompson6, Peter S. Braund6, Benjamin J. Wright6, Anthony J. Balmforth26, Stephen G. Ball26, Alistair S. Hall26, Patrick Linsel-Nitschke7, Wolfgang Lieb7, Andreas Ziegler7, Inke R. König7, Christian Hengstenberg27, Marcus Fischer27, Klaus Stark27, Anika Grosshennig7, Michael Preuss7, H-Erich Wichmann28, Stefan Schreiber29, Willem H. Ouwehand19, Panos Deloukas30, Michael Scholz, François Cambien31, Mingyao Li8, Zhen Chen8, Robert L. Wilensky8, William H. Matthai8, Atif Qasim8, Hakon Hakonarson8, Joe Devaney32, Mary-Susan Burnett32, Augusto D. Pichard32, Kenneth M. Kent32, Lowell F. Satler32, Joseph M. Lindsay32, Ron Waksman32, Stephen E. Epstein32, Thomas Scheffold, Klaus Berger11, Andreas Huge11, Nicola Martinelli12, Oliviero Olivieri12, Roberto Corrocher12, Hilma Holm33, Gudmar Thorleifsson33, Unnur Thorsteinsdottir34, Kari Stefansson34, Ron Do5, Changchun Xie4, David S. Siscovick14 
TL;DR: SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.
Abstract: We conducted a genome-wide association study testing single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) for association with early-onset myocardial infarction in 2,967 cases and 3,075 controls. We carried out replication in an independent sample with an effective sample size of up to 19,492. SNPs at nine loci reached genome-wide significance: three are newly identified (21q22 near MRPS6-SLC5A3-KCNE2, 6p24 in PHACTR1 and 2q33 in WDR12) and six replicated prior observations (refs 1, 2, 3, 4) (9p21, 1p13 near CELSR2-PSRC1-SORT1, 10q11 near CXCL12, 1q41 in MIA3, 19p13 near LDLR and 1p32 near PCSK9). We tested 554 common copy number polymorphisms (>1% allele frequency) and none met the pre-specified threshold for replication (P < 10(-3)). We identified 8,065 rare CNVs but did not detect a greater CNV burden in cases compared to controls, in genes compared to the genome as a whole, or at any individual locus. SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.

Journal ArticleDOI
16 Jul 2009-Nature
TL;DR: Analysis of the 363 megabase nuclear genome of the blood fluke, the first sequenced flatworm and a representative of the Lophotrochozoa, offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry and the development of tissues into organs.
Abstract: Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

Journal ArticleDOI
04 Jun 2009-Nature
TL;DR: There are significant expansions of cell wall, secreted and transporter gene families in pathogenic Candida species, suggesting adaptations associated with virulence.
Abstract: Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.

Journal ArticleDOI
24 Dec 2009-Nature
TL;DR: A paired-end sequencing strategy is used to identify somatic rearrangements in breast cancer genomes, providing a new perspective on cancer genomes and highlighting the diversity of somatic rearrangements and their potential contribution to cancer development.
Abstract: Multiple somatic rearrangements are often found in cancer genomes; however, the underlying processes of rearrangement and their contribution to cancer development are poorly characterized. Here we use a paired-end sequencing strategy to identify somatic rearrangements in breast cancer genomes. There are more rearrangements in some breast cancers than previously appreciated. Rearrangements are more frequent over gene footprints and most are intrachromosomal. Multiple rearrangement architectures are present, but tandem duplications are particularly common in some cancers, perhaps reflecting a specific defect in DNA maintenance. Short overlapping sequences at most rearrangement junctions indicate that these have been mediated by non-homologous end-joining DNA repair, although varying sequence patterns indicate that multiple processes of this type are operative. Several expressed in-frame fusion genes were identified but none was recurrent. The study provides a new perspective on cancer genomes, highlighting the diversity of somatic rearrangements and their potential contribution to cancer development.

Journal ArticleDOI
TL;DR: The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
Abstract: Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.

Journal ArticleDOI
04 Sep 2009-Science
TL;DR: The data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity, and identifies multiple expressive quantitative trait loci per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene.
Abstract: Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expressive quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

Journal ArticleDOI
Richard A. Gibbs1, Jeremy F. Taylor2, Curtis P. Van Tassell3, William Barendse4,5, Kellye Eversole, Clare A. Gill6, Ronnie D. Green3, Debora L. Hamernik3, Steven M. Kappes3, Sigbjørn Lien7, Lakshmi K. Matukumalli3,8, John C. McEwan9, Lynne V. Nazareth1, Robert D. Schnabel2, George M. Weinstock1, David A. Wheeler1, Paolo Ajmone-Marsan10, Paul Boettcher11, Alexandre Rodrigues Caetano12, José Fernando Garcia13,11, Olivier Hanotte14, Paola Mariani15, Loren C. Skow6, Tad S. Sonstegard3, John L. Williams15,16, Boubacar Diallo, Lemecha Hailemariam17, Mário Luiz Martinez12, C. A. Morris9, Luiz Otávio Campos da Silva12, Richard J. Spelman18, Woudyalew Mulatu14, Keyan Zhao19, Colette A. Abbey6, Morris Agaba14, Flábio R. Araújo12, Rowan J. Bunch5,4, James O. Burton16, C. Gorni15, Hanotte Olivier15, Blair E. Harrison5,4, Bill Luff, Marco Antonio Machado12, Joel Mwakaya14, Graham Plastow20, Warren Sim5,4, Timothy P. L. Smith3, Merle B Thomas5,4, Alessio Valentini21, Paul D. Williams5, James E. Womack6, John Woolliams16, Yue Liu1, Xiang Qin1, Kim C. Worley1, Chuan Gao6, Huaiyang Jiang1, Stephen S. Moore20, Yanru Ren1, Xingzhi Song1, Carlos Bustamante19, Ryan D. Hernandez19, Donna M. Muzny1, Shobha Patil1, Anthony San Lucas1, Qing Fu1, Matthew Peter Kent7, Richard Vega1, Aruna Matukumalli3, Sean McWilliam4,5, Gert Sclep15, Katarzyna Bryc19, Jung-Woo Choi6, Hong Gao19, John J. Grefenstette8, Brenda M. Murdoch20, Alessandra Stella15, Rafael Villa-Angulo8, Mark G. Wright19, Jan Aerts16,22, Oliver C. Jann16, Riccardo Negrini10, Michael E. Goddard23,24, Ben J. Hayes24, Daniel G. Bradley25, Marcos V.B. da Silva12,3, Lilian P.L. Lau25, George E. Liu3, David J. Lynn25,26, Francesca Panzitta15, Ken G. Dodds9
24 Apr 2009-Science
TL;DR: Data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation.
Abstract: The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.

Journal ArticleDOI
TL;DR: DNAPlotter is an interactive Java application for generating circular and linear representations of genomes that filters features of interest to display on separate user-definable tracks.
Abstract: Summary: DNAPlotter is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages. Availability: DNAPlotter is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/circular/ Contact: artemis@sanger.ac.uk

Journal ArticleDOI
05 Aug 2009-PLOS ONE
TL;DR: The results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs and demonstrate that the PorcineSNP60 Beadchip is an excellent tool that will likely be used in a variety of future studies in pigs.
Abstract: Background: The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay. Methodology/Principal Findings: A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina’s Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274. Conclusions/Significance: Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.
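
The SNP conversion success rate quoted in this abstract follows directly from the two reported counts. A minimal sketch of that arithmetic in Python; the variable and function names are illustrative and do not come from the paper:

```python
# Counts as reported in the abstract above.
scorable_loci = 62_621      # loci that could be reliably scored
polymorphic_loci = 58_994   # scorable loci that were polymorphic


def conversion_rate(polymorphic: int, scorable: int) -> float:
    """Fraction of scorable loci that turned out to be polymorphic."""
    if scorable <= 0:
        raise ValueError("scorable must be positive")
    return polymorphic / scorable


rate = conversion_rate(polymorphic_loci, scorable_loci)
print(f"SNP conversion success rate: {rate:.1%}")
```

Rounded to the nearest whole percent, this reproduces the 94% stated in the abstract.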

Journal ArticleDOI
TL;DR: UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes, identifying UTX as a new human cancer gene.
Abstract: Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase gene UTX, pointing to histone H3 lysine methylation deregulation in multiple tumor types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.

Journal ArticleDOI
06 Nov 2009-Science
TL;DR: The analysis reveals an evolutionarily new centromere on equine chromosome 11 that displays properties of an immature but fully functioning centromere and is devoid of centromeric satellite sequence, suggesting that centromeric function may arise before satellite repeat accumulation.
Abstract: We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evolutionarily new centromere devoid of centromeric satellite DNA, suggesting that centromeric function may arise before satellite repeat accumulation. Linkage disequilibrium, showing the influences of early domestication of large herds of female horses, is intermediate in length between dog and human, and there is long-range haplotype sharing among breeds.

Journal ArticleDOI
Inga Prokopenko1, Claudia Langenberg2, Jose C. Florez3,4, Richa Saxena4,3, Nicole Soranzo5,6, Gudmar Thorleifsson7, Ruth J. F. Loos2, Alisa K. Manning8, Anne U. Jackson9, Yurii S. Aulchenko10, Simon C. Potter5, Michael R. Erdos11, Serena Sanna, Jouke-Jan Hottenga12, Eleanor Wheeler5, Marika Kaakinen13, Valeriya Lyssenko14, Wei-Min Chen15, Kourosh R. Ahmadi6, Jacques S. Beckmann16,17, Richard N. Bergman18, Murielle Bochud16, Lori L. Bonnycastle11, Thomas A. Buchanan18, Antonio Cao, Alessandra C. L. Cervino6, Lachlan J. M. Coin19, Francis S. Collins11, Laura Crisponi, Eco J. C. de Geus12, Abbas Dehghan10, Panos Deloukas5, Alex S. F. Doney20, Paul Elliott19, Nelson B. Freimer21, Vesela Gateva9, Christian Herder22, Albert Hofman10, Thomas Edward Hughes23, Sarah E. Hunt5, Thomas Illig, Michael Inouye5, Bo Isomaa, Toby Johnson16,17,24, Augustine Kong7, Maria Krestyaninova25, Johanna Kuusisto26, Markku Laakso26, Noha Lim27, Ulf Lindblad14, Cecilia M. Lindgren1, O. T. McCann5, Karen L. Mohlke28, Andrew D. Morris20, Silvia Naitza, Marco Orru, Colin N. A. Palmer20, Anneli Pouta29, Joshua C. Randall1, Wolfgang Rathmann22, Jouko Saramies, Paul Scheet9, Laura J. Scott9, Angelo Scuteri11, Stephen J. Sharp2, Eric J.G. Sijbrands10, Jan H. Smit30, Kijoung Song27, Valgerdur Steinthorsdottir7, Heather M. Stringham9, Tiinamaija Tuomi31, Jaakko Tuomilehto, André G. Uitterlinden10, Benjamin F. Voight3,4, Dawn M. Waterworth27, H-Erich Wichmann32, Gonneke Willemsen12, Jacqueline C.M. Witteman10, Xin Yuan27, Jing Hua Zhao2, Eleftheria Zeggini1, David Schlessinger11, Manjinder S. Sandhu33,2, Dorret I. Boomsma12, Manuela Uda, Tim D. Spector6, Brenda W.J.H. Penninx33,34,35, David Altshuler3,4, Peter Vollenweider16, Marjo-Riitta Järvelin13,19, Edward G. Lakatta11, Gérard Waeber16, Caroline S. Fox36,11, Leena Peltonen5,37, Leif Groop14, Vincent Mooser27, L. Adrienne Cupples8, Unnur Thorsteinsdottir38,7, Michael Boehnke9, Inês Barroso5, Cornelia M. van Duijn10, Josée Dupuis8, Richard M. Watanabe18, Kari Stefansson7,38, Mark I. McCarthy39,1, Nicholas J. Wareham2, James B. Meigs3, Gonçalo R. Abecasis9
TL;DR: Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten genome-wide association scans, and previous associations of fasting glucose with variants at the G6PC2 and GCK loci are confirmed.
Abstract: To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 × 10^-50) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 × 10^-15). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 × 10^-7) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 × 10^-57) and GCK (rs4607517, P = 1.0 × 10^-25) loci.

Journal ArticleDOI
TL;DR: The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.
Abstract: Bone mineral density (BMD) is a heritable complex trait used in the clinical diagnosis of osteoporosis and the assessment of fracture risk. We performed meta-analysis of five genome-wide association studies of femoral neck and lumbar spine BMD in 19,195 subjects of Northern European descent. We identified 20 BMD loci that reached genome-wide significance (GWS; P < 5 × 10^-8), of which 13 map to regions not previously associated with this trait: 1p31.3 (GPR177), 2p21 (SPTBN1), 3p22 (CTNNB1), 4q21.1 (MEPE), 5q14 (MEF2C), 7p14 (STARD3NL), 7q21.3 (FLJ42280), 11p11.2 (LRP4, ARHGAP1, F2), 11p14.1 (DCDC5), 11p15 (SOX6), 16q24 (FOXL1), 17q21 (HDAC5) and 17q12 (CRHR1). The meta-analysis also confirmed at GWS level seven known BMD loci on 1p36 (ZBTB40), 6q25 (ESR1), 8q24 (TNFRSF11B), 11q13.4 (LRP5), 12q13 (SP7), 13q14 (TNFSF11) and 18q21 (TNFRSF11A). The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.

Journal ArticleDOI
Cecilia M. Lindgren1, Iris M. Heid2, Joshua C. Randall1, Claudia Lamina3  +152 moreInstitutions (36)
TL;DR: By focusing on anthropometric measures of central obesity and fat distribution, a meta-analysis of 16 genome-wide association studies informative for adult waist circumference and waist–hip ratio identified three loci implicated in the regulation of human adiposity.
Abstract: To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9 × 10^-11) and MSRA (WC, P = 8.9 × 10^-9). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6 × 10^-8). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.

Journal ArticleDOI
TL;DR: An amplification-free method of library preparation is presented, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly.
Abstract: Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of these sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. As a consequence, these unfavorable features result in difficulties in genome assembly and variation analysis from the short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low G+C content. Here we present an amplification-free method of library preparation, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly. We illustrate this by generating and analyzing DNA sequences from extremely (G+C)-poor (Plasmodium falciparum), (G+C)-neutral (Escherichia coli) and (G+C)-rich (Bordetella pertussis) genomes.

Journal ArticleDOI
TL;DR: The versatility of the bacteria in the genus Stenotrophomonas is discussed, along with the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to the understanding of the adaptation of this genus to various niches.
Abstract: The genus Stenotrophomonas comprises at least eight species. These bacteria are found throughout the environment, particularly in close association with plants. Strains of the most predominant species, Stenotrophomonas maltophilia, have an extraordinary range of activities that include beneficial effects for plant growth and health, the breakdown of natural and man-made pollutants that are central to bioremediation and phytoremediation strategies and the production of biomolecules of economic value, as well as detrimental effects, such as multidrug resistance, in human pathogenic strains. Here, we discuss the versatility of the bacteria in the genus Stenotrophomonas and the insight that comparative genomic analysis of clinical and endophytic isolates of S. maltophilia has brought to our understanding of the adaptation of this genus to various niches.

Journal ArticleDOI
TL;DR: piggyBac transposon–based reprogramming may be used to generate therapeutically applicable iPSCs, and transgene-free iPSCs could be identified by negative selection.
Abstract: Induced pluripotent stem cells (iPSCs) have been generated from somatic cells by transgenic expression of Oct4 (Pou5f1), Sox2, Klf4 and Myc. A major difficulty in the application of this technology for regenerative medicine, however, is the delivery of reprogramming factors. Whereas retroviral transduction increases the risk of tumorigenicity, transient expression methods have considerably lower reprogramming efficiencies. Here we describe an efficient piggyBac transposon-based approach to generate integration-free iPSCs. Transposons carrying 2A peptide-linked reprogramming factors induced reprogramming of mouse embryonic fibroblasts with equivalent efficiencies to retroviral transduction. We removed transposons from these primary iPSCs by re-expressing transposase. Transgene-free iPSCs could be identified by negative selection. piggyBac excised without a footprint, leaving the iPSC genome without any genetic alteration. iPSCs fulfilled all criteria of pluripotency, such as pluripotency gene expression, teratoma formation and contribution to chimeras. piggyBac transposon-based reprogramming may be used to generate therapeutically applicable iPSCs.