Showing papers by "Robert Gentleman" published in 2014


Journal Article
TL;DR: A simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets is described.
Abstract: Motivation: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naive background distribution but are of questionable biological relevance. Results: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. Availability: The motifRG package is publicly available via the Bioconductor repository. Contact: yzizhen@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online.

37 citations


Journal Article
TL;DR: Two R packages are described, gCMAP and gCMAPWeb, which provide a complete framework to construct and query connectivity maps assembled from user-defined collections of differential gene expression data, and which facilitate reproducible research through automatic generation of graphical and tabular reports.
Abstract: Connections between disease phenotypes and drug effects can be made by identifying commonalities in the associated patterns of differential gene expression. Searchable databases that record the impacts of chemical or genetic perturbations on the transcriptome (here referred to as ‘connectivity maps’) permit discovery of such commonalities. We describe two R packages, gCMAP and gCMAPWeb, which provide a complete framework to construct and query connectivity maps assembled from user-defined collections of differential gene expression data. Microarray or RNAseq data are processed in a standardized way, and results can be interrogated using various well-established gene set enrichment methods. The packages also feature an easy-to-deploy web application that facilitates reproducible research through automatic generation of graphical and tabular reports. Availability and implementation: The gCMAP and gCMAPWeb R packages are freely available for UNIX, Windows and Mac OS X operating systems at Bioconductor (http://www.bioconductor.org). Contact: bourgon.richard@gene.com Supplementary information: Supplementary data are available at Bioinformatics online.

23 citations


Journal Article
TL;DR: In the version of this article initially published, in Table 1, Steven Salzberg should have been listed as the second, and not the last, of the creators of the Cufflinks software.
Abstract: Nat. Biotechnol. 31, 894–897 (2013); published online 8 October 2013; corrected after print 9 May 2014. In the version of this article initially published, in Table 1, Steven Salzberg should have been listed as the second, and not the last, of the creators of the Cufflinks software. The error has been corrected in the HTML and PDF versions of the article.

1 citation