Showing papers by "David R. Kelley" published in 2021


Journal Article

Semisupervised adversarial neural networks for single-cell classification.

24 Feb 2021-Genome Research

TL;DR: In this article, a semisupervised, adversarial neural network is proposed to transfer cell identity annotations from one experiment to another by taking advantage of information in both labeled data sets and new, unlabeled data sets.

Abstract: Annotating cell identities is a common bottleneck in the analysis of single-cell genomics experiments. Here, we present scNym, a semisupervised, adversarial neural network that learns to transfer cell identity annotations from one experiment to another. scNym takes advantage of information in both labeled data sets and new, unlabeled data sets to learn rich representations of cell identity that enable effective annotation transfer. We show that scNym effectively transfers annotations across experiments despite biological and technical differences, achieving performance superior to existing methods. We also show that scNym models can synthesize information from multiple training and target data sets to improve performance. We show that in addition to high accuracy, scNym models are well calibrated and interpretable with saliency methods.

43 citations

Journal Article

Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs.

Qingbo Wang¹, David R. Kelley, Jacob C. Ulirsch¹,², Masahiro Kanai, Shuvom Sadhuka¹,², Ran Cui¹,², Carlos Albors¹,², Nathan Cheng¹,², Yukinori Okada³, François Aguet¹, Kristin G. Ardlie¹, Daniel G. MacArthur⁴, Hilary K. Finucane¹,²

Broad Institute¹, Harvard University², Osaka University³, Garvan Institute of Medical Research⁴

07 Jun 2021-Nature Communications

TL;DR: In this paper, the expression modifier score (EMS) is used as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putative causal eQTLs, and is incorporated into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.

Abstract: The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expressions in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.

29 citations

Journal Article

The landscape of alternative polyadenylation in single cells of the developing mouse embryo

Vikram Agarwal, Sereno Lopez-Darwin¹, David R. Kelley, Jay Shendure

University of Washington¹

24 Aug 2021-Nature Communications

TL;DR: In this article, the authors examined a dataset comprising ~2 million nuclei spanning E9.5-E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA).

Abstract: 3′ untranslated regions (3′ UTRs) post-transcriptionally regulate mRNA stability, localization, and translation rate. While 3′-UTR isoforms have been globally quantified in limited cell types using bulk measurements, their differential usage among cell types during mammalian development remains poorly characterized. In this study, we examine a dataset comprising ~2 million nuclei spanning E9.5–E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA). We observe a global lengthening of 3′ UTRs across embryonic stages in all cell types, although we detect shorter 3′ UTRs in hematopoietic lineages and longer 3′ UTRs in neuronal cell types within each stage. An analysis of RNA-binding protein (RBP) dynamics identifies ELAV-like family members, which are concomitantly induced in neuronal lineages and developmental stages experiencing 3′-UTR lengthening, as putative regulators of APA. By measuring 3′-UTR isoforms in an expansive single cell dataset, our work provides a transcriptome-wide and organism-wide map of the dynamic landscape of alternative polyadenylation during mammalian organogenesis. Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single cell RNA sequencing data from mouse embryos and identify 3′-UTR isoforms that are regulated across cell types and developmental time.

23 citations

Posted Content

Effective gene expression prediction from sequence by integrating long-range interactions

Z. Avsec¹, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam¹, Agnieszka Grabska-Barwinska, K. R. Taylor, Yannis M. Assael, John M. Jumper, Pushmeet Kohli, David R. Kelley

Google¹

08 Apr 2021-bioRxiv

TL;DR: In this article, a new deep learning architecture called Enformer is proposed that integrates long-range interactions (up to 100 kb away) in the genome to predict gene expression from DNA sequence.

Abstract: The next phase of genome biology research requires understanding how DNA sequence encodes phenotypes, from the molecular to organismal levels. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequence through the use of a new deep learning architecture called Enformer that is able to integrate long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Notably, Enformer outperformed the best team on the critical assessment of genome interpretation (CAGI5) challenge for noncoding variant interpretation with no additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of growing human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework to interpret cis-regulatory evolution. To foster these downstream applications, we have made the pre-trained Enformer model openly available, and provide pre-computed effect predictions for all common variants in the 1000 Genomes dataset. One-sentence summary: Improved noncoding variant effect prediction and candidate enhancer prioritization from a more accurate sequence to expression model driven by extended long-range interaction modelling.

20 citations

Journal Article

Differentiation reveals latent features of aging and an energy barrier in murine myogenesis.

Jacob C. Kimmel, Nelda Yi, Margaret Ann Roy, David G. Hendrickson, David R. Kelley

27 Apr 2021-Cell Reports

TL;DR: In this article, the authors performed single-cell RNA sequencing on muscle mononuclear cells from young and aged mice and profiled muscle stem cells (MuSCs) and fibro-adipose progenitors (FAPs) after differentiation.

14 citations

Journal Article

Effective gene expression prediction from sequence by integrating long-range interactions.

Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam¹, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis M. Assael, John M. Jumper, Pushmeet Kohli, David R. Kelley

Google¹

08 Apr 2021-Nature Methods

TL;DR: In this article, a deep learning architecture called Enformer was proposed to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input.

Abstract: How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution. By using a new deep learning architecture, Enformer leverages long-range information to improve prediction of gene expression on the basis of DNA sequence.

9 citations

Posted Content

scBasset: Sequence-based modeling of single cell ATAC-seq using convolutional neural networks

Han Yuan, David R. Kelley

10 Sep 2021-bioRxiv

TL;DR: In this article, a sequence-based convolutional neural network method (scBasset) is proposed to model scATAC data, leveraging the DNA sequence information underlying accessibility peaks and the expressiveness of a neural network model.

Abstract: Single cell ATAC-seq (scATAC) shows great promise for studying cellular heterogeneity in epigenetic landscapes, but there remain significant challenges in the analysis of scATAC data due to the inherent high dimensionality and sparsity. Here we introduce scBasset, a sequence-based convolutional neural network method to model scATAC data. We show that by leveraging the DNA sequence information underlying accessibility peaks and the expressiveness of a neural network model, scBasset achieves state-of-the-art performance across a variety of tasks on scATAC and single cell multiome datasets, including cell type identification, scATAC profile denoising, data integration across assays, and transcription factor activity inference.

4 citations

Posted Content

The landscape of alternative polyadenylation in single cells of the developing mouse embryo

Vikram Agarwal, Sereno Lopez-Darwin¹, David R. Kelley, Jay Shendure¹,²

University of Washington¹, Howard Hughes Medical Institute²

22 Jan 2021-bioRxiv

TL;DR: In this article, the authors examined a dataset comprising ~2 million cells spanning E9.5-E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA).

Abstract: 3′ untranslated regions (3′ UTRs) post-transcriptionally regulate mRNA stability, localization, and translation rate. While 3′-UTR isoforms have been globally quantified in limited cell types using bulk measurements, their differential usage among cell types during mammalian development remains poorly characterized. In this study, we examined a dataset comprising ~2 million cells spanning E9.5-E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA). We observe a global lengthening of 3′ UTRs across embryonic stages in all cell types, although we detect shorter 3′ UTRs in hematopoietic lineages and longer 3′ UTRs in neuronal cell types within each stage. While the majority of individual genes possess 3′ UTRs that lengthen with time, a subset appear to be spatiotemporally regulated through APA. By measuring 3′-UTR isoforms in an expansive single cell dataset, our work provides a transcriptome-wide and organism-wide map of the dynamic landscape of alternative polyadenylation during mammalian organogenesis.

1 citation

Posted Content

Revisiting the Hayflick Limit: Insights from an Integrated Analysis of Changing Transcripts, Proteins, Metabolites and Chromatin

04 May 2021-bioRxiv

TL;DR: In this article, the authors revisited Hayflick's original observation of replicative senescence (RS) in human fetal lung fibroblasts, using a battery of high-dimensional modern techniques and analytical methods to deeply profile the process of RS across each aspect of the central dogma.

Abstract: Replicative senescence (RS) as a model has become the central focus of research into cellular aging in vitro. Despite decades of study, this process through which cells cease dividing is not fully understood in culture, and even much less so in vivo during development and with aging. Here, we revisit Hayflick’s original observation of RS in WI-38 human fetal lung fibroblasts equipped with a battery of high dimensional modern techniques and analytical methods to deeply profile the process of RS across each aspect of the central dogma and beyond. We applied and integrated RNA-seq, proteomics, metabolomics, and ATAC-seq to a high resolution RS time course. We found that the transcriptional changes that underlie RS manifest early, gradually increase, and correspond to a concomitant global increase in accessibility in nucleolar and lamin associated domains. During RS WI-38 fibroblast gene expression patterns acquire a striking resemblance to those of myofibroblasts in a process similar to the epithelial to mesenchymal transition (EMT). This observation is supported at the transcriptional, proteomic, and metabolomic levels of cellular biology. In addition, we provide evidence suggesting that this conversion is regulated by the transcription factors YAP1/TEAD1 and the signaling molecule TGF-β2.

1 citation