Institution

Wellcome Trust Sanger Institute

Nonprofit•Cambridge, United Kingdom•

About: Wellcome Trust Sanger Institute is a nonprofit organization based out in Cambridge, United Kingdom. It is known for research contribution in the topics: Population & Genome. The organization has 4009 authors who have published 9671 publications receiving 1224479 citations.

...read moreread less

Topics: Population, Genome, Gene, Genome-wide association study, Genomics ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

A haplotype map of the human genome

[...]

John W. Belmont¹, Andrew Boudreau, Suzanne M. Leal¹, Paul Hardenbol +229 more•Institutions (40)

27 Oct 2005

TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.

...read moreread less

Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

...read moreread less

5,479 citations

Journal Article•DOI•

InterProScan 5: genome-scale protein function classification

[...]

Philip Jones¹, David Binns¹, Hsin-Yu Chang¹, Matthew Fraser¹, Weizhong Li¹, Craig McAnulla¹, Hamish McWilliam¹, John Maslen¹, Alex L. Mitchell¹, Gift Nuka¹, Sebastien Pesseat¹, Antony F. Quinn¹, Amaia Sangrador-Vegas¹, Maxim Scheremetjew¹, Siew-Yit Yong¹, Rodrigo Lopez¹, Sarah Hunter¹ - Show less +13 more•Institutions (1)

Wellcome Trust Sanger Institute¹

01 May 2014-Bioinformatics

TL;DR: A new Java-based architecture for the widely used protein function prediction software package InterProScan is described, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis.

...read moreread less

Abstract: Motivation: Robust, large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterise many millions of sequences. Here we describe a new Java-based architecture for the widely-used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete re-implementation of the software framework, resulting in a flexible and stable system that is able to utilise both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBl-EBI FTP site and the (open) source code is hosted at Google Code. Availability: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/. Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk

...read moreread less

5,434 citations

Journal Article•DOI•

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

[...]

Ewan Birney, John A. Stamatoyannopoulos¹, Anindya Dutta², Roderic Guigó³ +317 more•Institutions (44)

14 Jun 2007-Nature

TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.

...read moreread less

Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

...read moreread less

5,091 citations

Journal Article•DOI•

The mutational constraint spectrum quantified from variation in 141,456 humans

[...]

Konrad J. Karczewski¹, Laurent C. Francioli¹, Grace Tiao¹, Beryl B. Cummings¹, Jessica Alföldi¹, Qingbo Wang¹, Ryan L. Collins¹, Kristen M. Laricchia¹, Andrea Ganna¹, Daniel P. Birnbaum¹, Laura D. Gauthier¹, Harrison Brand¹, Matthew Solomonson¹, Nicholas A. Watts¹, Daniel R. Rhodes², Moriel Singer-Berk¹, Eleina M. England¹, Eleanor G. Seaby¹, Jack A. Kosmicki¹, Raymond K. Walters¹, Katherine Tashman¹, Yossi Farjoun¹, Eric Banks¹, Timothy Poterba¹, Arcturus Wang¹, Cotton Seed¹, Nicola Whiffin¹, Jessica X. Chong³, Kaitlin E. Samocha⁴, Emma Pierce-Hoffman¹, Zachary Zappala¹, Anne H. O’Donnell-Luria¹, Eric Vallabh Minikel¹, Ben Weisburd¹, Monkol Lek⁵, James S. Ware¹, Christopher Vittal⁶, Irina M. Armean¹, Louis Bergelson¹, Kristian Cibulskis¹, Kristen M. Connolly¹, Miguel Covarrubias¹, Stacey Donnelly¹, Steven Ferriera¹, Stacey Gabriel¹, Jeff Gentry¹, Namrata Gupta¹, Thibault Jeandet¹, Diane Kaplan¹, Christopher Llanwarne¹, Ruchi Munshi¹, Sam Novod¹, Nikelle Petrillo¹, David Roazen¹, Valentin Ruano-Rubio¹, Andrea Saltzman¹, Molly Schleicher¹, Jose Soto¹, Kathleen Tibbetts¹, Charlotte Tolonen¹, Gordon Wade¹, Michael E. Talkowski¹, Benjamin M. Neale¹, Mark J. Daly¹, Daniel G. MacArthur¹ - Show less +61 more•Institutions (6)

Broad Institute¹, Queen Mary University of London², University of Washington³, Wellcome Trust Sanger Institute⁴, Yale University⁵, Harvard University⁶

27 May 2020-Nature

TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

...read moreread less

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

...read moreread less

4,913 citations

Journal Article•DOI•

The Pfam protein families database: towards a more sustainable future

[...]

Robert D. Finn¹, Penelope Coggill¹, Ruth Y. Eberhardt¹, Ruth Y. Eberhardt², Sean R. Eddy³, Sean R. Eddy⁴, Jaina Mistry¹, Alex L. Mitchell¹, Simon C. Potter¹, Marco Punta⁵, Marco Punta¹, Matloob Qureshi¹, Amaia Sangrador-Vegas¹, Gustavo A. Salazar¹, John Tate¹, John Tate², Alex Bateman¹ - Show less +13 more•Institutions (5)

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², Howard Hughes Medical Institute³, Harvard University⁴, University of Paris⁵

04 Jan 2016-Nucleic Acids Research

TL;DR: Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set, and the facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

...read moreread less

Abstract: In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

...read moreread less

4,906 citations

Collapse

Authors

Showing all 4058 results

Name	H-index	Papers	Citations
Nicholas J. Wareham	212	1657	204896
Gonçalo R. Abecasis	179	595	230323
Panos Deloukas	162	410	154018
Michael R. Stratton	161	443	142586
David W. Johnson	160	2714	140778
Michael John Owen	160	1110	135795
Naveed Sattar	155	1326	116368
Robert E. W. Hancock	152	775	88481
Julian Parkhill	149	759	104736
Nilesh J. Samani	149	779	113545
Michael Conlon O'Donovan	142	736	118857
Jian Yang	142	1818	111166
Christof Koch	141	712	105221
Andrew G. Clark	140	823	123333
Stylianos E. Antonarakis	138	746	93605