Open access | Journal Article | DOI: 10.1038/S41588-021-00800-7

Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics.

04 Mar 2021-Nature Genetics (Springer Science and Business Media LLC)-Vol. 53, Iss: 3, pp 313-321
Abstract: Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.

Citations

16 results found


Open access | Journal Article | DOI: 10.1038/S41588-021-00801-6
Julie Jerber, Daniel D Seaton, Anna S E Cuomo, Natsuhiko Kumasaka, +15 more | Institutions (4)
04 Mar 2021-Nature Genetics
Abstract: Studying the function of common genetic variants in primary human tissues and during development is challenging. To address this, we use an efficient multiplexing strategy to differentiate 215 human induced pluripotent stem cell (iPSC) lines toward a midbrain neural fate, including dopaminergic neurons, and use single-cell RNA sequencing (scRNA-seq) to profile over 1 million cells across three differentiation time points. The proportion of neurons produced by each cell line is highly reproducible and is predictable by robust molecular markers expressed in pluripotent cells. Expression quantitative trait loci (eQTL) were characterized at different stages of neuronal development and in response to rotenone-induced oxidative stress. Of these, 1,284 eQTL colocalize with known neurological trait risk loci, and 46% are not found in the Genotype-Tissue Expression (GTEx) catalog. Our study illustrates how coupling scRNA-seq with long-term iPSC differentiation enables mechanistic studies of human trait-associated genetic variants in otherwise inaccessible cell states.

38 Citations


Open access | Journal Article | DOI: 10.7554/ELIFE.67077
14 May 2021-eLife
Abstract: The activity of the genes in a cell depends on the type of cell they are in, the interactions with other genes, the environment and genetics. Active genes produce a greater number of mRNA molecules, which act as messenger molecules to instruct the cell to produce proteins. The amount of mRNA molecules in cells can be measured to assess the levels of gene activity. Genes produce mRNAs through a process called transcription, and the collection of all the mRNA molecules in a cell is called the transcriptome. Cells obtained from human samples can be grown in the lab under different conditions, and this can be used to transform them into different types of cells. These cells can then be exposed to different treatments – such as specific chemicals – to understand how the environment affects them. Cells derived from different people may respond differently to the same treatment based on their unique genetics. Exposing different types of cells from many people to different treatments can help explain how genetics, the environment and cell type affect gene activity. Findley et al. grew three different types of cells from six different people in the lab. The cells were exposed to 28 different treatments, which reflect different environmental changes. Studying all these different factors together allowed Findley et al. to understand how genetics, cell type and environment affect the activity of over 53,000 genes. Around half of the effects due to an interaction between genetics and the environment had not been seen in other larger studies of the transcriptome. Many of these newly observed changes are in genes that have connections to different diseases, including heart disease. The results of Findley et al. provide evidence indicating the extent to which lifestyle and the environment can interact with an individual’s genetic makeup to impact gene activity and long-term health.
The more researchers can understand these factors, the more useful they can be in helping to predict, detect and treat illnesses. The findings also show how genes and the environment interact, which may be relevant to understanding disease development. There is more work to be done to understand a wider range of environmental factors across more cell types. It will also be important to establish how this work on cells grown in the lab translates to human health.

Topics: Cell type (52%), Transcriptome (52%)

6 Citations


Open access | Journal Article | DOI: 10.1186/S13059-021-02407-X
24 Jun 2021-Genome Biology
Abstract: Background: Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease. Results: While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. Here, we evaluate the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches. Conclusion: We provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.

5 Citations


Open access | Journal Article | DOI: 10.1111/FEBS.15885
30 Apr 2021-FEBS Journal
Abstract: The study of human evolution, long constrained by a lack of experimental model systems, has been transformed by the emergence of the induced pluripotent stem cell (iPSC) field. iPSCs can be readily established from noninvasive tissue sources, from both humans and other primates; they can be maintained in the laboratory indefinitely, and they can be differentiated into other tissue types. These qualities mean that iPSCs are rapidly becoming established as viable and powerful model systems with which it is possible to address questions in human evolution that were until now logistically and ethically intractable, especially in the quest to understand humans' place among the great apes, and the genetic basis of human uniqueness. In this review, we discuss the key lessons and takeaways of this nascent field, from the types of research iPSCs make possible to lingering challenges and likely future directions. We provide a comprehensive overview of how the seemingly unlikely combination of iPSCs and explicit evolutionary frameworks is transforming what is possible in our understanding of humanity's past and present.

4 Citations


Open access | Posted Content | DOI: 10.1101/2021.01.20.427401
21 Jan 2021-bioRxiv
Abstract: Single-cell RNA-sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states, and promises to improve our understanding of genetic regulation across tissues in both health and disease. While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimise sc-eQTL mapping. Here, we evaluate the role of different normalisation and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches and provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.

3 Citations


References

70 results found


Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTP352
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, +5 more | Institutions (4)
01 Aug 2009-Bioinformatics
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

Topics: Variant Call Format (62%), Stockholm format (61%), FASTQ format (56%)

35,747 Citations


Open access | Journal Article | DOI: 10.1086/519795
Shaun Purcell, Benjamin M. Neale, +14 more | Institutions (4)
Abstract: Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

22,115 Citations


Open access | Journal Article | DOI: 10.1038/NATURE15393
Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin, +514 more | Institutions (90)
01 Oct 2015-Nature
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Topics: 1000 Genomes Project (62%), Exome sequencing (59%), Genome-wide association study (59%)

9,821 Citations


Open access | Journal Article | DOI: 10.1073/PNAS.1530509100
John D. Storey, Robert Tibshirani | Institutions (1)
Abstract: With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.

Topics: False positive paradox (65%), False positive rate (62%), Multiple comparisons problem (60%)

8,626 Citations


Open access | Journal Article | DOI: 10.1038/NATURE09270
05 Aug 2010-Nature
Abstract: Plasma concentrations of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with plasma lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 × 10⁻⁸), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (for example, CYP7A1, NPC1L1 and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and have an impact on lipid traits in three non-European populations (East Asians, South Asians and African Americans). Our results identify several novel loci associated with plasma lipids that are also associated with CAD. Finally, we validated three of the novel genes (GALNT2, PPP1R3B and TTC39B) with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.

Topics: Lipid metabolism (58%), Blood lipids (55%), SCARB1 (54%)

3,188 Citations


Performance
Metrics
No. of citations received by the paper in previous years
Year    Citations
2021    16