Author

Ryan K. Dale

Other affiliations: University of Delaware
Bio: Ryan K. Dale is an academic researcher at the National Institutes of Health. The author has contributed to research in the topics Chromatin and Medicine, has an h-index of 21, and has co-authored 51 publications receiving 3,227 citations. Previous affiliations of Ryan K. Dale include the University of Delaware.
Topics: Chromatin, Medicine, Enhancer, CTCF, Gene


Papers
Journal Article
11 Jul 2013-Nature
TL;DR: A systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes, provides an unprecedented overview of RNA-binding proteins and their targets and constitutes an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.
Abstract: RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.

1,299 citations

Journal Article
TL;DR: Bioconda, a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda, improves analysis reproducibility by allowing users to define isolated environments with defined software versions.
Abstract: We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.

699 citations

Journal Article
TL;DR: pybedtools is a Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends the popular BEDTools genome arithmetic tools, allowing researchers to quickly develop simple yet powerful scripts that enable complex genomic analyses.
Abstract: Summary: pybedtools is a flexible Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends upon the popular BEDTools genome arithmetic tools. The library is well documented and efficient, and allows researchers to quickly develop simple, yet powerful scripts that enable complex genomic analyses. Availability: pybedtools is maintained under the GPL license. Stable versions of pybedtools as well as documentation are available on the Python Package Index at http://pypi.python.org/pypi/pybedtools. Contact: daler@niddk.nih.gov; arq5x@virginia.edu Supplementary Information: Supplementary data are available at Bioinformatics online.

386 citations
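The core "genome arithmetic" operation that pybedtools wraps is interval overlap. The sketch below is a minimal pure-Python illustration of the idea, not the pybedtools API: the names `Interval`, `parse_bed` and `intersect_unique` are hypothetical, and it assumes BED's 0-based, half-open coordinates. It mimics the behaviour of reporting each interval in A that overlaps at least one interval in B (in pybedtools itself this is typically written along the lines of `BedTool("a.bed").intersect("b.bed", u=True)`).

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A BED-style interval: 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int


def parse_bed(text: str) -> List[Interval]:
    """Parse the first three columns (chrom, start, end) of BED lines,
    skipping blank, comment, track and browser lines."""
    intervals: List[Interval] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect_unique(a_list: List[Interval], b_list: List[Interval]) -> List[Interval]:
    """Return each interval of a_list that overlaps any interval of b_list,
    reported once (similar in spirit to `bedtools intersect -u`)."""
    return [a for a in a_list if any(overlaps(a, b) for b in b_list)]


peaks = parse_bed("chr1\t100\t200\nchr1\t500\t600\nchr2\t0\t50\n")
genes = parse_bed("chr1\t150\t300\nchr2\t50\t80\n")
print(intersect_unique(peaks, genes))
```

Only `chr1:100-200` is reported: `chr2:0-50` and `chr2:50-80` merely touch, and under half-open coordinates they share no base. The naive nested loop is quadratic; real tools sort intervals or use interval trees to scale to genome-wide files.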

Journal Article
TL;DR: Results support a genome-wide role for CTCF/cohesin sites through loop formation that both influences transcription and contributes to cell-type-specific chromatin organization and function.
Abstract: CTCF sites are abundant in the genomes of diverse species but their function is enigmatic. We used chromosome conformation capture to determine long-range interactions among CTCF/cohesin sites over 2 Mb on human chromosome 11 encompassing the β-globin locus and flanking olfactory receptor genes. Although CTCF occupies these sites in both erythroid K562 cells and fibroblast 293T cells, the long-range interaction frequencies among the sites are highly cell type specific, revealing a more densely clustered organization in the absence of globin gene activity. Both CTCF and cohesins are required for the cell-type-specific chromatin conformation. Furthermore, loss of the organizational loops in K562 cells through reduction of CTCF with shRNA results in acquisition of repressive histone marks in the globin locus and reduces globin gene expression whereas silent flanking olfactory receptor genes are unaffected. These results support a genome-wide role for CTCF/cohesin sites through loop formation that both influences transcription and contributes to cell-type-specific chromatin organization and function.

265 citations

Journal Article
TL;DR: DSX(F) and DSX(M) were found to bind thousands of the same targets in multiple tissues in both sexes, yet these targets have sex- and tissue-specific functions.

124 citations


Cited by
Journal Article
TL;DR: This work presents HTSeq, a Python library that facilitates the rapid development of custom scripts for high-throughput sequencing data analysis, and htseq-count, a tool built with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de

15,744 citations
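To make "counting the overlap of reads with genes" concrete, here is a simplified sketch in the spirit of htseq-count's "union" resolution rule: a read that overlaps exactly one gene is counted for that gene, a read overlapping none goes to `__no_feature`, and a read overlapping several goes to `__ambiguous`. This is not HTSeq's implementation. It assumes ungapped reads, whole-gene intervals instead of exon features grouped by `gene_id`, no strand handling, and 0-based half-open coordinates; the function `count_reads` and the toy gene model are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open coordinates.
Region = Tuple[str, int, int]


def count_reads(reads: List[Region], genes: Dict[str, Region]) -> Counter:
    """Assign each ungapped read to at most one gene, union-mode style."""
    counts: Counter = Counter()
    for chrom, start, end in reads:
        # Set of genes sharing at least one base with the read.
        hits = {
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical toy gene model with two overlapping genes.
genes = {"geneA": ("chr1", 0, 1000), "geneB": ("chr1", 900, 2000)}
reads = [
    ("chr1", 100, 150),    # only geneA
    ("chr1", 950, 1000),   # geneA and geneB overlap here
    ("chr1", 1500, 1550),  # only geneB
    ("chr1", 5000, 5050),  # no gene
]
print(count_reads(reads, genes))
```

Discarding ambiguous reads rather than splitting them between genes keeps the counts as integers, which suits the count-based statistical models typically used downstream for differential expression.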

Journal Article
TL;DR: This update outlines key statistics on the current data contents and volume of downloads, and describes how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
Abstract: The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

5,735 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article
TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Both are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use.
Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Journal Article
TL;DR: This work shows that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery.
Abstract: The binding specificities of RNA- and DNA-binding proteins are determined from experimental data using a ‘deep learning’ approach.

2,352 citations
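Deep-learning models of binding specificity in this line of work typically begin with a convolutional "motif detector": a weight matrix slid along a one-hot encoded sequence, followed by rectification and max pooling. The sketch below shows a single such detector with hand-set weights; in a trained model the weights and bias are learned from experimental data and many detectors feed further layers. The helpers `one_hot`, `consensus_weights` and `motif_detector_score`, the DNA alphabet, and the example "GATA" detector are all illustrative assumptions, not the published model.

```python
from typing import List, Sequence

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence: one row per position, columns A, C, G, T.
    Bases outside ACGT (e.g. N) become all-zero rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]


def consensus_weights(motif: str) -> List[List[float]]:
    """Hypothetical hand-set detector: +1 for the consensus base, -1 otherwise."""
    return [[1.0 if base == letter else -1.0 for letter in ALPHABET] for base in motif.upper()]


def motif_detector_score(seq: str, weights: Sequence[Sequence[float]], bias: float) -> float:
    """Convolve a width x 4 detector along the sequence, add the bias,
    rectify (ReLU), then max-pool over all positions."""
    x = one_hot(seq)
    width = len(weights)
    if len(x) < width:
        return 0.0
    activations = []
    for i in range(len(x) - width + 1):
        conv = sum(weights[j][k] * x[i + j][k] for j in range(width) for k in range(4))
        activations.append(max(0.0, conv + bias))  # rectification
    return max(activations)  # max pooling over positions


detector = consensus_weights("GATA")
print(motif_detector_score("TTGATATT", detector, bias=-2.0))  # exact match present
print(motif_detector_score("CCCCCC", detector, bias=-2.0))    # no match
```

The negative bias acts as a threshold: a window must score above 2 (here, at least three of four consensus bases with no strong penalties) before the detector fires, and max pooling makes the output insensitive to where in the sequence the motif occurs.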