
Wellcome Trust Sanger Institute

Nonprofit, Cambridge, United Kingdom
About: Wellcome Trust Sanger Institute is a nonprofit organization based in Cambridge, United Kingdom. It is known for research contributions in the topics Population and Genome. The organization has 4,009 authors who have published 9,671 papers that have received 1,224,479 citations.


Papers
Journal Article
TL;DR: This work analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes, offering a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
Abstract: The genetic basis of gene expression variation has long been studied with the aim of understanding the landscape of regulatory variants and, more recently, of assisting in the interpretation and elucidation of disease signals. To date, many studies have looked at specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asian, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs to the transcription start site (TSS) as sharing of eQTLs among populations increases, indicating that variants close to the TSS have stronger effects and are therefore more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
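The abstract describes correlating each gene's expression level with SNPs located in cis. A minimal sketch of that per-SNP test follows, under simplifying assumptions that are ours, not the study's: a ±1 Mb cis window, plain least-squares regression of expression on alt-allele dosage, no covariates for the non-genetic factors or admixture that the study controlled for, and two-sided p-values from a normal approximation. The names `Snp`, `regress`, and `cis_eqtl_scan` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Assumed cis window of +/- 1 Mb around the TSS (illustrative, not the study's setting).
CIS_WINDOW_BP = 1_000_000


@dataclass
class Snp:
    snp_id: str
    position: int         # coordinate on the same chromosome as the gene
    dosages: list[float]  # alt-allele count (0, 1 or 2) per individual


def regress(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit y = a + b*x; returns the slope b and its t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, 0.0  # monomorphic SNP: nothing to test
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return b, (math.copysign(math.inf, b) if b != 0 else 0.0)
    return b, b / se


def cis_eqtl_scan(tss: int, expression: list[float], snps: list[Snp]):
    """Test each SNP inside the cis window; return (id, beta, t, p) sorted by p."""
    hits = []
    for snp in snps:
        if abs(snp.position - tss) > CIS_WINDOW_BP:
            continue  # outside the cis window (would be a trans test)
        beta, t = regress(snp.dosages, expression)
        p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
        hits.append((snp.snp_id, beta, t, p))
    return sorted(hits, key=lambda h: h[3])


# Toy data (hypothetical) for one gene measured in 10 individuals.
tss = 10_000_000
eqtl_dosage = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1]
noise = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015, 0.005, -0.005, 0.01, -0.01]
expression = [1.0 + 0.5 * d + e for d, e in zip(eqtl_dosage, noise)]
snps = [
    Snp("snp_cis_eqtl", tss + 20_000, eqtl_dosage),
    Snp("snp_cis_null", tss - 300_000, [1, 0, 1, 2, 2, 0, 1, 0, 2, 1]),
    Snp("snp_trans", tss + 5_000_000, [2, 2, 1, 0, 1, 0, 2, 1, 0, 1]),
]
results = cis_eqtl_scan(tss, expression, snps)
```

In the toy data, the SNP 20 kb from the TSS is reported as the strongest association with an effect size near 0.5, the unlinked cis SNP is not significant, and the SNP 5 Mb away is skipped because it falls outside the cis window. A real analysis would add covariates and correct for the number of tests.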

501 citations

Journal Article
TL;DR: This study confirms the global dispersal of a single E. coli ST131 clone and demonstrates the role of MGEs and recombination in the evolution of this important MDR pathogen.
Abstract: Escherichia coli sequence type 131 (ST131) is a globally disseminated, multidrug resistant (MDR) clone responsible for a high proportion of urinary tract and bloodstream infections. The rapid emergence and successful spread of E. coli ST131 is strongly associated with several factors, including resistance to fluoroquinolones, high virulence gene content, the possession of the type 1 fimbriae FimH30 allele, and the production of the CTX-M-15 extended spectrum β-lactamase (ESBL). Here, we used genome sequencing to examine the molecular epidemiology of a collection of E. coli ST131 strains isolated from six distinct geographical locations across the world spanning 2000-2011. The global phylogeny of E. coli ST131, determined from whole-genome sequence data, revealed a single lineage of E. coli ST131 distinct from other extraintestinal E. coli strains within the B2 phylogroup. Three closely related E. coli ST131 sublineages were identified, with little association to geographic origin. The majority of single-nucleotide variants associated with each of the sublineages were due to recombination in regions adjacent to mobile genetic elements (MGEs). The most prevalent sublineage of ST131 strains was characterized by fluoroquinolone resistance, and a distinct virulence factor and MGE profile. Four different variants of the CTX-M ESBL-resistance gene were identified in our ST131 strains, with acquisition of CTX-M-15 representing a defining feature of a discrete but geographically dispersed ST131 sublineage. This study confirms the global dispersal of a single E. coli ST131 clone and demonstrates the role of MGEs and recombination in the evolution of this important MDR pathogen.

499 citations

Journal Article
TL;DR: Embryonic stem cell culture conditions are important for maintaining long-term self-renewal, and they influence cellular pluripotency state, with 2i being the most similar to blastocyst cells and including a subpopulation resembling the two-cell embryo state.

499 citations

Journal Article
TL;DR: A set of 2001 potential non-coding genes is described, based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments.
Abstract: Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.

499 citations

Journal Article
TL;DR: This work describes a new statistical method, EmptyDrops, that calls cells by detecting significant deviations from the expression profile of the ambient solution; in several real data sets, it retains distinct cell types that existing methods would have discarded.
Abstract: Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Here, we describe a new statistical method, EmptyDrops, for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that EmptyDrops has greater power than existing approaches while controlling the false discovery rate among detected cells. Our method also retains distinct cell types that would have been discarded by existing methods in several real data sets.
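The core idea in the abstract, testing whether a barcode's counts deviate significantly from the ambient profile, can be sketched with the Python standard library. This is a simplified illustration, not the published EmptyDrops implementation: it assumes a plain multinomial ambient model with a pseudocount (the published method uses a Dirichlet-multinomial model and more careful ambient-profile estimation), treats barcodes with at most an assumed 100 total counts as empty droplets, estimates p-values by Monte Carlo simulation at the same total count, and controls the false discovery rate with Benjamini-Hochberg. All function names and toy data are hypothetical.

```python
import math
import random


def ambient_profile(barcodes, lower=100, pseudocount=1.0):
    """Pool counts from barcodes with total <= lower (assumed empty droplets)
    into per-gene ambient proportions. The pseudocount keeps every gene's
    probability above zero (a simplification, not the published method)."""
    n_genes = len(next(iter(barcodes.values())))
    pooled = [pseudocount] * n_genes
    for counts in barcodes.values():
        if sum(counts) <= lower:
            for g, c in enumerate(counts):
                pooled[g] += c
    total = sum(pooled)
    return [c / total for c in pooled]


def multinomial_loglik(counts, probs):
    """Log-probability of a count vector under a multinomial with given proportions."""
    ll = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        ll += c * math.log(p) - math.lgamma(c + 1)
    return ll


def empty_drops_sketch(barcodes, lower=100, n_iter=999, seed=0):
    """Monte Carlo p-value for each barcode above `lower`: how often a droplet of the
    same total count, drawn from the ambient profile, is at least as unlikely."""
    probs = ambient_profile(barcodes, lower)
    genes = range(len(probs))
    rng = random.Random(seed)
    pvalues = {}
    for bc, counts in barcodes.items():
        total = sum(counts)
        if total <= lower:
            continue  # only used to estimate the ambient profile
        observed = multinomial_loglik(counts, probs)
        as_extreme = 0
        for _ in range(n_iter):
            draw = [0] * len(probs)
            for g in rng.choices(genes, weights=probs, k=total):
                draw[g] += 1
            if multinomial_loglik(draw, probs) <= observed:
                as_extreme += 1
        pvalues[bc] = (as_extreme + 1) / (n_iter + 1)
    return pvalues


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values for a dict of p-values."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    for rank in range(m, 0, -1):
        bc, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[bc] = running_min
    return adjusted


# Toy data: 5 genes; the first three barcodes fall below `lower` and define the ambient pool.
barcodes = {
    "empty_1": [30, 10, 5, 3, 2],
    "empty_2": [36, 12, 6, 3, 3],
    "empty_3": [24, 8, 4, 2, 2],
    "cell_A": [10, 10, 10, 150, 120],       # enriched for genes rare in the ambient pool
    "ambient_like": [180, 60, 30, 15, 15],  # same total, ambient-like profile
}
pvalues = empty_drops_sketch(barcodes)
fdr = benjamini_hochberg(pvalues)
```

Both tested barcodes have 300 counts, so total count alone would not separate them. The barcode whose counts concentrate on genes that are rare in the ambient pool receives the smallest attainable Monte Carlo p-value, while the ambient-like barcode does not, which is the kind of distinction a profile-based test is meant to capture.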

499 citations


Authors

Showing all 4058 results

Name | H-index | Papers | Citations
Nicholas J. Wareham | 212 | 1,657 | 204,896
Gonçalo R. Abecasis | 179 | 595 | 230,323
Panos Deloukas | 162 | 410 | 154,018
Michael R. Stratton | 161 | 443 | 142,586
David W. Johnson | 160 | 2,714 | 140,778
Michael John Owen | 160 | 1,110 | 135,795
Naveed Sattar | 155 | 1,326 | 116,368
Robert E. W. Hancock | 152 | 775 | 88,481
Julian Parkhill | 149 | 759 | 104,736
Nilesh J. Samani | 149 | 779 | 113,545
Michael Conlon O'Donovan | 142 | 736 | 118,857
Jian Yang | 142 | 1,818 | 111,166
Christof Koch | 141 | 712 | 105,221
Andrew G. Clark | 140 | 823 | 123,333
Stylianos E. Antonarakis | 138 | 746 | 93,605

Network Information
Related Institutions (5)
Broad Institute
11.6K papers, 1.5M citations

96% related

Howard Hughes Medical Institute
34.6K papers, 5.2M citations

95% related

Laboratory of Molecular Biology
24.2K papers, 2.1M citations

94% related

Salk Institute for Biological Studies
13.1K papers, 1.6M citations

93% related

National Institutes of Health
297.8K papers, 21.3M citations

93% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 17
2022 | 70
2021 | 836
2020 | 810
2019 | 854
2018 | 764