Institution

Wellcome Trust Sanger Institute

Nonprofit • Cambridge, United Kingdom

About: The Wellcome Trust Sanger Institute is a nonprofit organization based in Cambridge, United Kingdom. It is known for research contributions on the topics Population and Genome. The organization has 4,009 authors who have published 9,671 publications receiving 1,224,479 citations.

Topics: Population, Genome, Gene, Genome-wide association study, Genomics

Papers

Journal Article

Orphan CpG islands identify numerous conserved promoters in the mammalian genome

Robert S. Illingworth¹, Ulrike Gruenewald-Schneider¹, Shaun Webb¹, Alastair R.W. Kerr¹, Keith D. James², Daniel J. Turner², Colin Smith¹, David J. Harrison³, Robert Andrews², Adrian Bird¹

University of Edinburgh¹, Wellcome Trust Sanger Institute², Western General Hospital³

23 Sep 2010-PLOS Genetics

TL;DR: Contrary to previous estimates, CGI abundance in humans and mice is very similar, and many CGIs are at conserved locations relative to genes. In each species, CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent.

Abstract: CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Querying their apparent importance, the number of CGIs is reported to vary widely in different species and many do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are “orphans” that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.

554 citations

Journal Article

Genome-Wide Associations of Gene Expression Variation in Humans

Barbara E. Stranger¹, Matthew S. Forrest¹, Andrew G. Clark¹,², Mark J. Minichiello¹, Samuel Deutsch¹,³, Robert Lyle¹,³, Sarah E. Hunt¹, Brenda Kahl¹,⁴, Stylianos E. Antonarakis¹,³, Simon Tavaré¹,⁵,⁶, Panagiotis Deloukas¹, Emmanouil T. Dermitzakis¹

Wellcome Trust Sanger Institute¹, Cornell University², University of Geneva³, Illumina⁴, University of Cambridge⁵, University of Southern California⁶

16 Dec 2005-PLOS Genetics

TL;DR: The results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans.

Abstract: The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

553 citations
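
Two of the multiple-test corrections named in the abstract above, Bonferroni and false discovery rate, can be illustrated with a short sketch. This is not the authors' code: the function names, the 0.05 significance level, and the example p-values are assumptions made for illustration, and the false discovery rate step uses the Benjamini-Hochberg procedure, which is one common way to control it. The permutation approach is not shown.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / m (m = number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per SNP-expression association test.
example_pvalues = [0.01, 0.04, 0.03, 0.005]
bonferroni_calls = bonferroni(example_pvalues)
bh_calls = benjamini_hochberg(example_pvalues)
```

On these made-up values Bonferroni keeps only the two smallest p-values, while Benjamini-Hochberg keeps all four, which shows why the choice of correction affects how many associations are reported.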

Journal Article

COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer

Simon A. Forbes¹, Gurpreet Tang¹, Nidhi Bindal¹, Sally Bamford¹, Elisabeth Dawson¹, Charlotte G. Cole¹, Chai Yin Kok¹, Mingming Jia¹, Rebecca Ewing¹, Andrew Menzies¹, Jon W. Teague¹, Michael R. Stratton¹, P. Andrew Futreal

Wellcome Trust Sanger Institute¹

01 Jan 2010-Nucleic Acids Research

TL;DR: Examination of COSMIC’s data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

Abstract: The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is available freely without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5-million experiments performed through 13 423 genes in almost 370 000 tumours, describing over 90 000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world’s literature on point mutations in human cancer has now been curated into COSMIC and while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer and COSMIC is now displaying details of these analyses also. Examination of COSMIC’s data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

553 citations

Journal Article

Multilocus Sequence Typing as a Replacement for Serotyping in Salmonella enterica

Mark Achtman¹, John Wain²,³, François-Xavier Weill⁴, Satheesh Nair²,³, Zhemin Zhou⁵, Vartul Sangal¹, Mary G. Krauland⁶, James Hale⁵, Heather Harbottle⁷, Alexandra Uesbeck⁸, Gordon Dougan², Lee H. Harrison⁶, Sylvain Brisse⁴

Max Planck Society¹, Wellcome Trust Sanger Institute², Health Protection Agency³, Pasteur Institute⁴, University College Cork⁵, University of Pittsburgh⁶, Food and Drug Administration⁷, University of Cologne⁸

21 Jun 2012-PLOS Pathogens

TL;DR: It is recommended that Salmonella classification by serotyping should be replaced by MLST or its equivalents as it confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings.

Abstract: Salmonella enterica subspecies enterica is traditionally subdivided into serovars by serological and nutritional characteristics. We used Multilocus Sequence Typing (MLST) to assign 4,257 isolates from 554 serovars to 1092 sequence types (STs). The majority of the isolates and many STs were grouped into 138 genetically closely related clusters called eBurstGroups (eBGs). Many eBGs correspond to a serovar, for example most Typhimurium are in eBG1 and most Enteritidis are in eBG4, but many eBGs contained more than one serovar. Furthermore, most serovars were polyphyletic and are distributed across multiple unrelated eBGs. Thus, serovar designations confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings. An inability of serotyping to correctly group isolates was most apparent for Paratyphi B and its variant Java. Most Paratyphi B were included within a sub-cluster of STs belonging to eBG5, which also encompasses a separate sub-cluster of Java STs. However, diphasic Java variants were also found in two other eBGs and monophasic Java variants were in four other eBGs or STs, one of which is in subspecies salamae and a second of which includes isolates assigned to Enteritidis, Dublin and monophasic Paratyphi B. Similarly, Choleraesuis was found in eBG6 and is closely related to Paratyphi C, which is in eBG20. However, Choleraesuis var. Decatur consists of isolates from seven other, unrelated eBGs or STs. The serological assignment of these Decatur isolates to Choleraesuis likely reflects lateral gene transfer of flagellar genes between unrelated bacteria plus purifying selection. By confounding multiple evolutionary groups, serotyping can be misleading about the disease potential of S. enterica. Unlike serotyping, MLST recognizes evolutionary groupings and we recommend that Salmonella classification by serotyping should be replaced by MLST or its equivalents.

552 citations

Journal Article

Genomic analysis of human microRNA transcripts

Harpreet K. Saini¹, Sam Griffiths-Jones, Anton J. Enright

Wellcome Trust Sanger Institute¹

06 Nov 2007-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: A large-scale analysis of transcription start sites, polyadenylation signals, CpG islands, EST data, transcription factor-binding sites, and expression ditag data surrounding intergenic miRNAs in the human genome is performed to improve understanding of the structure of their primary transcripts.

Abstract: MicroRNAs (miRNAs) are important genetic regulators of development, differentiation, growth, and metabolism. The mammalian genome encodes ≈500 known miRNA genes. Approximately 50% are expressed from non-protein-coding transcripts, whereas the rest are located mostly in the introns of coding genes. Intronic miRNAs are generally transcribed coincidentally with their host genes. However, the nature of the primary transcript of intergenic miRNAs is largely unknown. We have performed a large-scale analysis of transcription start sites, polyadenylation signals, CpG islands, EST data, transcription factor-binding sites, and expression ditag data surrounding intergenic miRNAs in the human genome to improve our understanding of the structure of their primary transcripts. We show that a significant fraction of primary transcripts of intergenic miRNAs are 3–4 kb in length, with clearly defined 5′ and 3′ boundaries. We provide strong evidence for the complete transcript structure of a small number of human miRNAs.

552 citations

Authors

Showing all 4058 results

Name	H-index	Papers	Citations
Nicholas J. Wareham	212	1657	204896
Gonçalo R. Abecasis	179	595	230323
Panos Deloukas	162	410	154018
Michael R. Stratton	161	443	142586
David W. Johnson	160	2714	140778
Michael John Owen	160	1110	135795
Naveed Sattar	155	1326	116368
Robert E. W. Hancock	152	775	88481
Julian Parkhill	149	759	104736
Nilesh J. Samani	149	779	113545
Michael Conlon O'Donovan	142	736	118857
Jian Yang	142	1818	111166
Christof Koch	141	712	105221
Andrew G. Clark	140	823	123333
Stylianos E. Antonarakis	138	746	93605
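
The H-index column in the table above is conventionally defined as the largest number h such that an author has h papers with at least h citations each. The sketch below computes it under that standard definition; the page does not state how its own values were derived, and the function name and the per-paper citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # The top `rank` papers all have at least `rank` citations.
        else:
            break  # Later papers have even fewer citations, so stop here.
    return h


# Hypothetical per-paper citation counts for one author.
example_citations = [25, 8, 5, 3, 3]
example_h = h_index(example_citations)
```

Because the index depends on how citations are spread across papers, two authors with the same total in the Citations column can have very different H-index values.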