Open Access, Journal Article

The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices.

TLDR
The GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices are presented. The authors anticipate that the format's generalizability will lower barriers for integrated cross-assay analysis and algorithm development.
Abstract
Motivation: Facilitated by technological improvements, pharmacologic and genetic perturbational datasets have grown in recent years to include millions of experiments. Sharing and publicly distributing these diverse data creates many opportunities for discovery, but in recent years the unprecedented size of data generated and its complex associated metadata have also created data storage and integration challenges.

Results: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization and analysis of dense two-dimensional matrices. We have extensively used the format in the Connectivity Map to assemble and share massive datasets currently comprising 1.3 million experiments, and we anticipate that the format's generalizability, paired with code libraries that we provide, will lower barriers for integrated cross-assay analysis and algorithm development.

Availability and implementation: Software packages (available in Python, R, Matlab and Java) are freely available at https://github.com/cmap. Additional instructions, tutorials and datasets are available at clue.io/code.

Supplementary information: Supplementary data are available at Bioinformatics online.
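The idea of keeping a dense matrix together with its row and column annotations can be sketched in a few lines. This is a minimal illustration in plain Python under stated assumptions: it is not the GCTx on-disk layout and not the API of any cmap package. The class name `AnnotatedMatrix`, the method `subset_cols`, and every id, field and value below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotatedMatrix:
    """Hypothetical container: dense values plus per-row and per-column metadata."""
    data: List[List[float]]             # dense values, one inner list per row
    row_ids: List[str]                  # one id per row
    col_ids: List[str]                  # one id per column
    row_meta: Dict[str, Dict[str, str]]  # row id -> annotation fields
    col_meta: Dict[str, Dict[str, str]]  # column id -> annotation fields

    def __post_init__(self) -> None:
        # The matrix is dense, so every row must have one value per column id.
        if len(self.data) != len(self.row_ids):
            raise ValueError("number of rows does not match row_ids")
        if any(len(row) != len(self.col_ids) for row in self.data):
            raise ValueError("row length does not match col_ids")

    def subset_cols(self, field: str, value: str) -> "AnnotatedMatrix":
        """Return a new matrix keeping only columns whose metadata field equals value."""
        keep = [
            j for j, cid in enumerate(self.col_ids)
            if self.col_meta.get(cid, {}).get(field) == value
        ]
        kept_ids = [self.col_ids[j] for j in keep]
        return AnnotatedMatrix(
            data=[[row[j] for j in keep] for row in self.data],
            row_ids=list(self.row_ids),
            col_ids=kept_ids,
            row_meta=dict(self.row_meta),
            col_meta={cid: self.col_meta.get(cid, {}) for cid in kept_ids},
        )


# Made-up example: two features measured in three samples.
matrix = AnnotatedMatrix(
    data=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    row_ids=["gene_a", "gene_b"],
    col_ids=["s1", "s2", "s3"],
    row_meta={"gene_a": {"type": "measured"}, "gene_b": {"type": "inferred"}},
    col_meta={
        "s1": {"treatment": "drug_x"},
        "s2": {"treatment": "control"},
        "s3": {"treatment": "drug_x"},
    },
)

# Select columns by annotation; values and metadata stay aligned.
subset = matrix.subset_cols("treatment", "drug_x")
```

Slicing by annotation returns values and metadata together, so downstream analysis does not need to re-join a separate annotation table.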

Citations
Posted Content

Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19

TL;DR: Three network-based drug repurposing strategies are deployed, relying on network proximity, diffusion, and AI-based metrics, so that all approved drugs can be ranked by their likely efficacy for COVID-19 patients; aggregating all predictions yields 81 promising repurposing candidates.
Journal Article

Cas9 activates the p53 pathway and selects for p53-inactivating mutations.

TL;DR: Examining the genetic and transcriptional consequences of Cas9 expression shows that Cas9 induces DNA damage and activates the p53 pathway, which can lead to the selection of cells with p53-inactivating mutations; Cas9 is also less active in wild-type TP53 cell lines than in TP53-mutant cell lines.
Journal Article

Deep learning of pharmacogenomics resources: moving towards precision oncology.

TL;DR: This review provides an in-depth summary of state-of-the-art DL methods and up-to-date pharmacogenomics resources, and discusses future opportunities and challenges in realizing the goal of precision oncology.
References
Journal Article

Cluster analysis and display of genome-wide expression patterns

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Journal Article

The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease

TL;DR: The first installment of a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules is created, and it is demonstrated that this “Connectivity Map” resource can be used to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs.
Journal Article

Molecular signatures database (MSigDB) 3.0

TL;DR: A new version of the database, MSigDB 3.0, is reported, with over 6700 gene sets, a complete revision of the collection of canonical pathways and experimental signatures from publications, enhanced annotations and upgrades to the web site.
Journal Article

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

TL;DR: The expanded CMap is reported, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that is shown to be highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts.