Analyzing and interpreting genome data at the network level with ConsensusPathDB.

Abstract
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.

© 2016 Nature America, Inc. All rights reserved. | PROTOCOL | NATURE PROTOCOLS | VOL.11 NO.10 | 2016 | 1889
INTRODUCTION
Modern high-throughput experiments such as sequencing, microarray technology or mass spectrometry (MS) experiments generate large genome-wide data sets that provide deep insight into many different levels of molecular information, e.g., the transcriptome, proteome and metabolome, among others. Such information is used, for example, to characterize patient genomes using multiomics data[1], to describe developmental processes with temporal changes[2] or to derive predictive patterns for exogenous agents[3]. An emerging goal of data analysis is to reveal the underlying control mechanisms that govern the measured molecular phenotypes.
Typically, a key result of genome analysis is a list of statistically significant biomolecules (genes, proteins, metabolites) that contribute to the phenotypes of interest. A subsequent task is then to identify which biological functions can be associated with these molecules (over-representation analysis)[4]. This is done mainly by testing whether predefined annotation sets, for example, specific signaling pathways, are enriched in the molecules under consideration. Alternatively, such enrichments can be inferred without statistical preselection of the molecules by using the entirety of the experimental data (gene set enrichment analysis)[5]. Furthermore, data for all or a prioritized subset of molecules can be mapped onto interaction networks and analyzed with graph-theoretic approaches. These methods identify subnetworks (network module analysis) that are likely to be responsive to the experiments under analysis[6]. All three approaches aim to enrich genome analysis with mechanistic network information, which helps in understanding the underlying biological processes.
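Over-representation analysis is often scored with a one-sided hypergeometric test: out of N background molecules, K belong to an annotation set (e.g., one pathway); the p-value is the probability of seeing at least k set members in a user list of n molecules by chance. The sketch below assumes that test; the exact statistic ConsensusPathDB applies, and its multiple-testing correction, are not specified in this section, and the function name `ora_pvalue` is hypothetical.

```python
from math import comb

def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: background size (all annotated molecules)
    K: size of the annotation set within the background
    n: size of the user's priority list within the background
    k: number of list molecules that fall into the annotation set
    """
    total = comb(N, n)
    # Sum the probabilities of every overlap at least as large as the observed one.
    # math.comb returns 0 when asked for more items than exist, so impossible overlaps add nothing.
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy example: 20 background genes, a pathway of 5, a list of 4 genes, 3 of which hit the pathway.
p = ora_pvalue(N=20, K=5, n=4, k=3)
print(f"{p:.4f}")
```

In practice one such p-value is computed per annotation set, so the resulting list would normally be corrected for multiple testing before being interpreted.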
In ConsensusPathDB[7], we have implemented statistical methods for performing the above tasks by interrogating annotation sets based on molecular interaction information. We agglomerated the contents of 32 major public repositories for human molecular interactions of heterogeneous types, as well as biochemical pathways, resulting in one of the largest interactome collections available (Table 1). Furthermore, the database integrates the contents of 15 mouse and 14 yeast interaction repositories. In addition to gene ontology (GO)[8] and pathway annotations, ConsensusPathDB systematically explores the protein–protein interaction (PPI) network, as PPIs are key drivers of biological function[9]. However, only a minor fraction of the estimated ~650,000 human protein interactions has been experimentally measured so far[10]. Moreover, information on molecular interactions is scattered across >500 different databases worldwide[11], which necessitates the integration of as many resources as possible into meta-databases such as ConsensusPathDB (Box 1). Such integration provides better coverage of the interactome, which in turn improves guidance in the functional interpretation of omics data.
ConsensusPathDB has been widely adopted by the research community. Applications include over-representation analysis to characterize diverse sets of molecules[12–14], gene set enrichment analysis[15,16] and identification of upstream regulators[17], spanning various biological contexts. Furthermore, ConsensusPathDB is used as a database by other tools, for example, for enrichment analysis by Chipster[18] through web service connections or by Cytoscape[19] through a Java plugin for assessing the interaction confidence of PPIs[20]. In addition to these analyses, the tool can be used as a resource for generating molecular interaction gene sets, which themselves can serve as predictive signatures. For example, it has been shown that network modules and pathways can be derived as predictive patterns in cancer diagnostics[21], as well as in monitoring tumor progression[22]. This enables biomarker analysis of entities ranging from single molecules to entire pathways.
Ralf Herwig¹, Christopher Hardt¹, Matthias Lienhard¹ & Atanas Kamburov²⁻⁴
¹Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. ²Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts, USA. ³Harvard Medical School, Boston, Massachusetts, USA. ⁴Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. Correspondence should be addressed to R.H. (herwig@molgen.mpg.de) or A.K. (kamburov@broadinstitute.org).
Published online 8 September 2016; doi:10.1038/nprot.2016.117
Overview of the protocol
In this protocol, we review the contents of ConsensusPathDB and the different analysis scenarios it enables. All modules in this protocol aim to enable network-level interpretation and functional characterization of user-specified lists of molecules (genes, proteins and metabolites) and associated high-throughput data. ConsensusPathDB helps users working with such data to do the following:
• Infer heterogeneous interaction networks for genes, proteins, metabolites, drugs and other biomolecules
• Compute over-represented pathways, PPI networks, protein complexes and GO annotations from a priority list of genes, proteins or metabolites
• Compute enriched pathways, PPI networks, protein complexes and GO annotations from genome-wide data such as RNA-seq or microarray data
• Generate network modules that are over-represented by genes or proteins, and thereby explore heterogeneous interactions such as PPI, drug–target, gene regulatory and genetic interactions
Comparison with other tools
Several excellent tools, of which only some can be mentioned here, perform over-representation analysis (e.g., DAVID[23], IPA[24] and Enrichr[25]), gene or metabolite set enrichment analysis (e.g., GSEA[26] and MetaboAnalyst[27]) or network module analysis (e.g., Cytoscape[28] and Genes2Networks[29]). Whereas most of these tools are restricted to specific types of analysis and to a specific type of biomolecule, ConsensusPathDB offers a wider range of analysis functions and supports both gene/protein and metabolite analysis (Table 2). The statistical methods for over-representation analysis, enrichment analysis and network module analysis implemented by the individual tools differ, and thus results obtained with the same input can differ considerably. With respect to content, ConsensusPathDB focuses on molecular interactions and provides deep exploration of the interactome network, protein complexes and pathway resources, whereas other tools incorporate additional annotation sets, for example, based on genomic locus enrichment, disease associations, experimental signatures or literature-derived sets. Large collections of such annotation signatures are accessible through systems such as MSigDB[30].
Parallel efforts to collect large amounts of interaction data within a common framework include STRING[31] and PathwayCommons[32].
Limitations of the protocol
ConsensusPathDB currently supports only three organisms (human, mouse and yeast), and it therefore lacks widely used model organisms such as rat, fly and worm, among others, for which comprehensive interaction information has been collected and made available. Moreover, ConsensusPathDB does not hold information on microorganisms other than yeast, e.g., bacteria or other fungi. Where interaction information is available, the inclusion of more organisms is a key step in the future development of ConsensusPathDB.
Another limitation is the focus on annotation sets that are derived from molecular interactions and GO terminology. As stated in the previous section (Table 2), several tools incorporate additional information that allows for interpretation of data in alternative directions. However, a review of the literature shows that the large majority of applications use functional annotation sets defined by GO and pathway annotations, which justifies the current focus of ConsensusPathDB.
With regard to the web server, ConsensusPathDB has some limitations with respect to visualization components. Presumably the most widely used tool in this regard is Cytoscape, and thus we offer network downloads in a Cytoscape-compatible format, which enables easy transfer of computed network modules.
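One plain-text network format that Cytoscape imports is SIF (simple interaction format): each line holds a source node, an interaction type and one or more target nodes. This section does not say which Cytoscape-compatible format ConsensusPathDB exports, so the sketch below only illustrates SIF itself, using a toy edge list; the function name `to_sif` and the edges are hypothetical.

```python
def to_sif(edges):
    """Serialize (source, interaction_type, target) triples as tab-separated SIF lines.

    Edges that share a source node and interaction type are grouped on one line,
    which SIF allows (source, type, then one or more targets).
    """
    grouped = {}
    for source, itype, target in edges:
        grouped.setdefault((source, itype), []).append(target)
    return "".join(
        "\t".join([source, itype, *targets]) + "\n"
        for (source, itype), targets in grouped.items()
    )

# Toy module (gene symbols as node names); "pp" is a label often used for protein-protein edges.
edges = [
    ("EGFR", "pp", "GRB2"),
    ("EGFR", "pp", "SHC1"),
    ("GRB2", "pp", "SOS1"),
]
sif_text = to_sif(edges)
print(sif_text, end="")
# To open the module in Cytoscape, write sif_text to a file with a .sif extension.
```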
For some of its functionality, ConsensusPathDB already offers web services that allow the integration of analysis steps into automated workflows and stand-alone tools. However, not all steps described in this protocol are implemented as web services yet; their further development is a primary future task.
The response time of ConsensusPathDB depends on the size and complexity of the interaction network under investigation. For example, performing network analyses with many input nodes or many different types of interactions can lead to slow responses and limited visualization performance.
Experimental design
Analysis paths. ConsensusPathDB contains predefined annotation sets that hold functional information such as pathways, GO categories, protein complexes and PPI network neighborhoods derived from the integrated resources.
Depending on the user's input, ConsensusPathDB allows the following analyses (Fig. 1):
Analysis path 1. The interaction neighborhood of a single molecule can be inferred and a corresponding network can be generated, for example, to reveal network-level information (i.e., interaction partners) for biomarkers of interest.
Analysis path 2. Uploading a list of molecules (genes, proteins and metabolites) allows either performing over-representation analysis with predefined annotation sets or computing network associations among the molecules of interest by mining the integrated interaction network.
Analysis path 3. Submitting molecules together with associated experimental data allows enrichment analysis of the annotation sets; this path is less biased than analysis path 2 because it does not depend on a preselected priority list of molecules.
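For analysis path 3, one widely used way to score an annotation set against a fully ranked gene list is the running-sum statistic of gene set enrichment analysis. The sketch below implements the unweighted (Kolmogorov–Smirnov-like) variant on a toy ranking; it illustrates the idea only, is not necessarily the statistic ConsensusPathDB computes, and omits the permutation step normally used to assess significance. The gene names and the function name are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list (e.g., genes sorted by fold change); step up by
    1/n_hit at each set member and down by 1/n_miss otherwise. The score is the
    running sum's largest deviation from zero, keeping its sign.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranking = ["A", "B", "C", "D", "E", "F"]  # toy ranking, most up-regulated first
print(enrichment_score(ranking, {"A", "B"}))  # set concentrated at the top: positive score
print(enrichment_score(ranking, {"F"}))       # set at the bottom: negative score
```

Because every gene in the measured list contributes to the running sum, no significance cutoff has to be chosen beforehand, which is the practical difference from the list-based test of analysis path 2.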
To exemplify the procedures in this protocol, we use different data sets from various biological backgrounds, measured with different high-throughput technologies. For analysis path 1, we exemplify the protocol steps using the epidermal growth factor receptor (EGFR) gene, which is frequently mutated in different types of cancer and is also a primary target of cancer therapy[33]. For analysis path 2, we exemplify the over-representation analysis for genes using a list of 18 frequently mutated genes derived from whole-exome sequencing of a large lung adenocarcinoma cohort[1] (Supplementary Data 1). As a test case for over-representation analysis of metabolites, we use a list of 130 known uremic toxins that are associated with kidney dysfunction[34] (Supplementary Data 2). To demonstrate the network module analysis, we examined 691 targets of histone modification (H3K4me2), measured with ChIP-seq, that are specific to T helper type 2 (TH2) cells as compared with naive T cells (Supplementary Data 3). The goal of this analysis is to recover potential gene regulatory networks controlling these genes, as was done in the original publication[17]. For analysis path 3, we use public RNA-seq expression data derived from different stages of human embryonic development[2] (Supplementary Data 4).
These data cover a wide range of genome analysis applications and diverse biological backgrounds. The corresponding gene lists vary in size from 18 (lung adenocarcinoma driver mutations) to ~16,000 (RNA-seq data set), demonstrating the scalability of the ConsensusPathDB analysis tools.
TABLE 1 | Content of ConsensusPathDB.
Content type                  Human    Mouse    Yeast
Integrated databases             32       15       14
Unique physical entities    158,523   31,679   17,672
Unique interactions         458,570   34,064  272,094
Gene regulations             17,098    2,196      316
Protein interactions        261,085   23,488  123,842
Genetic interactions            443      194  145,151
Biochemical reactions        21,070    8,186    2,785
Drug–target interactions    158,874        0        0
Pathway gene sets             4,593    2,173    1,101
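The interaction counts in Table 1 are internally consistent: for each organism, the five interaction types sum exactly to the number of unique interactions, which suggests that each unique interaction is counted under a single type. The short check below reproduces that arithmetic from the table values.

```python
# Interaction counts per type, copied from Table 1.
TABLE1 = {
    "human": {"gene_regulations": 17_098, "protein": 261_085, "genetic": 443,
              "biochemical": 21_070, "drug_target": 158_874},
    "mouse": {"gene_regulations": 2_196, "protein": 23_488, "genetic": 194,
              "biochemical": 8_186, "drug_target": 0},
    "yeast": {"gene_regulations": 316, "protein": 123_842, "genetic": 145_151,
              "biochemical": 2_785, "drug_target": 0},
}
UNIQUE_INTERACTIONS = {"human": 458_570, "mouse": 34_064, "yeast": 272_094}

for organism, counts in TABLE1.items():
    total = sum(counts.values())
    print(f"{organism}: {total:,} (table: {UNIQUE_INTERACTIONS[organism]:,})")
    # The per-type counts add up to the unique-interaction total reported in the table.
    assert total == UNIQUE_INTERACTIONS[organism]
```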
Identifier mapping. A recurrent problem when integrating data from different resources, or when analyzing high-throughput data by comparison with existing databases, is the nonuniformity of gene/protein/metabolite identifiers. In ConsensusPathDB, we have created comprehensive identifier maps by parsing the contents of 11 major genomic, proteomic and metabolite databases such as Ensembl, UniProt and PubChem. These maps were used to match gene, protein and metabolite identifiers originating from the 32 integrated sources of interaction and pathway information. Furthermore, they are used to map identifiers from the user input to these physical entities, and hence they give users great flexibility in choosing an identifier namespace.
Box 1 | Molecular interactions
Molecular interactions are key drivers of cellular function. In the era of omics technology, an ever-increasing number of molecular interactions are measured and cataloged. For example, large numbers of PPIs have been measured by co-immunoprecipitation, tandem-affinity purification and yeast two-hybrid analysis, among others. ChIP-seq experiments allow the charting of protein–DNA interactions and histone modifications. Phosphoproteome measurements with quantitative MS approaches such as iTRAQ and SILAC provide new insights into signaling networks. Metabolomics technologies such as NMR or gas chromatography–MS measure metabolites and fluxes through metabolic networks. These technologies gave rise to multiple repositories that store and curate the experimental data along with previous literature annotation.
ConsensusPathDB is a meta-database that currently consolidates human molecular interactions from 32 different databases, mouse molecular interactions from 15 different databases and yeast molecular interactions from 14 different databases.
Interaction databases and interaction types (human)
Interaction types include the following:
• Protein interactions (BIND, Biogrid, CORUM, DIP, DrugBank, HPRD, InnateDB, Intact, MINT, MIPS-MPPI, MatrixDB, NetPath, PDB, PDZBase, PIG, PINdb, PhosphoPOINT, Reactome and Spike)
• Signaling reactions (BioCarta, INOH, InnateDB, KEGG, NetPath, PID, PhosphoPOINT, PhosphoSitePlus, Reactome, Spike and Wikipathways)
• Metabolic reactions (EHMN, HumanCyc, INOH, KEGG, Reactome and Wikipathways)
• Gene regulatory interactions (BIND, BioCarta, InnateDB, PID and Spike)
• Genetic interactions (Biogrid)
• Drug–target interactions (Chembl, DrugBank, KEGG, PharmGKB and TTD)
• Biochemical pathways (BioCarta, EHMN, HumanCyc, INOH, KEGG, NetPath, PID, PharmGKB, Reactome, SMPDB, Signalink and Wikipathways)
Interaction databases and interaction types (mouse)
Interaction types include the following:
• Protein interactions (BIND, Biogrid, DIP, InnateDB, Intact, MINT, MIPS-MPPI, MatrixDB, PDB, PDZBase and Reactome)
• Signaling reactions (InnateDB, KEGG, PhosphoSitePlus, Reactome and Wikipathways)
• Metabolic reactions (KEGG, MouseCyc, Reactome and Wikipathways)
• Gene regulatory interactions (BIND and InnateDB)
• Genetic interactions (Biogrid)
• Drug–target interactions (KEGG)
• Biochemical pathways (KEGG, MouseCyc, Reactome and Wikipathways)
Interaction databases and interaction types (yeast)
Interaction types include the following:
• Protein interactions (BIND, Biogrid, CYC2008, DIP, Intact, MINT, MIPS-MPACT, PDB, PINdb, PTM and Reactome)
• Signaling reactions (KEGG, Reactome and Wikipathways)
• Metabolic reactions (KEGG, PTM, Reactome, Wikipathways and YeastCyc)
• Gene regulatory interactions (BIND and PTM)
• Genetic interactions (Biogrid)
• Drug–target interactions (KEGG)
• Biochemical pathways (KEGG, Reactome, Wikipathways and YeastCyc)
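Conceptually, an identifier map can be viewed as a lookup from every known synonym (gene symbol, Ensembl ID, UniProt accession and so on) to one internal physical entity. The sketch below uses a tiny hand-written map; the internal IDs (`entity_1`, `entity_2`), the function name and the map contents are hypothetical and do not reflect ConsensusPathDB's actual identifiers or data structures. Unmatched input is returned separately, since in practice some submitted identifiers cannot be mapped.

```python
# Hypothetical synonym map: (namespace, identifier) -> internal entity ID.
ID_MAP = {
    ("symbol", "EGFR"): "entity_1",
    ("uniprot", "P00533"): "entity_1",
    ("ensembl", "ENSG00000146648"): "entity_1",
    ("symbol", "KRAS"): "entity_2",
    ("uniprot", "P01116"): "entity_2",
}

def map_identifiers(ids, namespace):
    """Map user identifiers from one namespace to internal entities; collect misses."""
    mapped, unmapped = {}, []
    for raw in ids:
        key = (namespace, raw.strip())
        if key in ID_MAP:
            mapped[raw] = ID_MAP[key]
        else:
            unmapped.append(raw)
    return mapped, unmapped

# "Q9XXXX" is a made-up accession that the map does not cover.
mapped, unmapped = map_identifiers(["P00533", "P01116", "Q9XXXX"], namespace="uniprot")
print(mapped)
print(unmapped)
```

Because several namespaces point to the same entity, a list submitted as UniProt accessions and one submitted as gene symbols end up on the same network nodes.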
Annotation sets. ConsensusPathDB offers four types of predefined annotation sets: neighborhood-based entity sets (NESTs), protein complexes, pathways and GO terms.
NESTs. These sets are derived from the integrated interaction network, which includes four types of biological interactions: protein–protein, biochemical, gene regulatory and genetic interactions. A NEST is defined as a central protein and its network neighbors. The size of the network neighborhood is determined by its radius, which the user can set to one or two. A radius of one adds only the direct neighbors of the central protein, whereas a radius of two additionally adds all direct neighbors of those neighbors. We recommend a radius of one; otherwise, the neighborhoods grow too large and lose specificity. There are as many NESTs as there are proteins in the integrated network.
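A NEST of radius r is the set of nodes within r interaction steps of the central protein. The sketch below computes it by breadth-first search over a toy undirected adjacency list (node names are placeholders) and shows how quickly radius two grows compared with radius one.

```python
from collections import deque

def nest(graph, center, radius=1):
    """Return the center node plus all nodes within `radius` hops (breadth-first search)."""
    seen = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return set(seen)

# Toy interaction network (undirected: each edge listed in both directions).
graph = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4", "P5"],
    "P3": ["P1", "P6"],
    "P4": ["P2"],
    "P5": ["P2", "P7"],
    "P6": ["P3"],
    "P7": ["P5"],
}
print(sorted(nest(graph, "P1", radius=1)))
print(sorted(nest(graph, "P1", radius=2)))

# One NEST per protein, as described in the text.
all_nests = {protein: nest(graph, protein) for protein in graph}
```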
Protein complexes. These sets are derived from specific databases that hold information on protein complexes. Note that most annotated protein complexes are rather small (<5 members).
Pathways. These sets comprise metabolic, signaling and gene regulatory pathways annotated by 12 source databases for human (4 each for mouse and yeast). Pathways range from very large biological processes, for example, covering the complete metabolism with >1,000 members, to very specific concepts that describe detailed processes.
GO terms. ConsensusPathDB offers four levels of GO categories, ranging from very general terms (level 2) with >1,000 members to more specific terms (level 5). In the analysis, the user can restrict the categories to specific level(s) or to specific GO branches: ‘biological process’, ‘molecular function’ and ‘cellular component’.
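GO is a directed acyclic graph, so a term can be reached from its branch root along several paths. The sketch below assumes that a term's level is its shortest path length from the root, with the root at level 1 so that its direct children are level 2; this is one common convention and an assumption here, not a description of how ConsensusPathDB assigns levels. The toy hierarchy uses placeholder term names.

```python
from collections import deque

def go_levels(children, root):
    """Assign each term its shortest-path depth from `root` (root = level 1)."""
    level = {root: 1}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in level:  # BFS reaches each term first along a shortest path
                level[child] = level[term] + 1
                queue.append(child)
    return level

def terms_at_levels(level, wanted):
    """Restrict annotation sets to the requested GO levels (e.g., {2, 3})."""
    return sorted(t for t, lv in level.items() if lv in wanted)

# Toy DAG: term_D has two parents (term_B at level 2 and term_C at level 3).
children = {
    "root": ["term_A", "term_B"],
    "term_A": ["term_C"],
    "term_B": ["term_D"],
    "term_C": ["term_D", "term_E"],
}
levels = go_levels(children, "root")
print(levels)
print(terms_at_levels(levels, {3}))
```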
Pathway annotation: specificity and redundancy. The pathway concept is essential for modern biology; a pathway usually describes a certain cellular process, for example ‘apoptosis’, in which the involved proteins or metabolites exert specific functions and are interconnected by molecular interactions of diverse types. ConsensusPathDB agglomerates 4,593 human pathway concepts (mouse: 2,173; yeast: 1,101) originating from 12 different resources (Table 1). On the one hand, these pathway concepts are partially redundant because they describe subpathways of a given pathway that are annotated by the same database. For example, the pathway ‘apoptosis’ might cover the subpathways ‘extrinsic apoptosis’ and ‘intrinsic apoptosis’, among others, with corresponding subsets of proteins. On the other hand, most generic pathways are annotated by several databases, leading to more than one annotation set referring to ‘apoptosis’.
It is worth noting that pathway concepts from different resources might involve different sets of molecules even when describing similar molecular processes. As a consequence, functional enrichment analyses (analysis paths 2 and 3) can yield different results, because the different annotation sets might overlap differently with the gene list submitted by the user. For example, comparison of the gene sets for the ‘apoptosis’ pathway in the three widely used databases KEGG[35], Reactome[36] and WikiPathways[37] reveals that 79% of the annotated proteins are specific to a single database, whereas only 21% are shared by at least two of the three databases (Fig. 2a). The reason for this is that pathway boundaries are not clearly defined and that expert opinion on the extent of crosstalk with other pathways is highly variable. In addition, pathway annotations commonly focus on specific substructures or specific cellular contexts (e.g., tissues, diseases and organisms), which might result in variations in the assembled gene lists. Consequently, in ConsensusPathDB, such overlapping pathway concepts are not merged into generalized pathways; instead, the redundancy is kept, and each annotated pathway set is always disclosed together with its source database.
TABLE 2 | Tool comparison.
Tool | Access: free; upload of user-defined background | Analysis functions: gene-based ORA; GSEA; network module analysis; metabolite-based ORA; MSEA | Content types: network neighbors; protein complexes; pathway resources; GOASs; other
ConsensusPathDB   X X X X X X X X X X X
DAVID             X X X X X X X
EnrichR           X X X X X X X X
Cytoscape         X X X X X X X X X
IPA               X X X X X X X X
GSEA/MSIGDB       X X X X X X X
Genes2Networks    X X X
MetaboAnalyst     X X X X X X
STRING            X X X X X X X
PathwayCommons    X X X X X
GOAS, gene ontology annotation set; GSEA, gene set enrichment analysis; MSEA, metabolite set enrichment analysis; ORA, over-representation analysis; other, annotation sets based on literature, experimental data, chromosomal location, disease associations, protein domains and so on.
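The percentages for Fig. 2a can be recomputed from the seven regions of its Venn diagram. The region counts below are those shown in Figure 2a, assigned to regions according to our reading of the three-set layout; this assignment reproduces the per-database totals (84, 165 and 140) and the 235/61 split stated in the figure caption.

```python
# Venn region counts for apoptosis-pathway proteins (Figure 2a).
# Keys are (in_WikiPathways, in_Reactome, in_KEGG); region assignment is our reading of the diagram.
REGIONS = {
    (1, 0, 0): 31,   # WikiPathways only
    (1, 1, 0): 2,    # WikiPathways and Reactome
    (0, 1, 0): 123,  # Reactome only
    (1, 0, 1): 19,   # WikiPathways and KEGG
    (1, 1, 1): 32,   # all three
    (0, 1, 1): 8,    # Reactome and KEGG
    (0, 0, 1): 81,   # KEGG only
}

total = sum(REGIONS.values())
per_db = [sum(n for key, n in REGIONS.items() if key[i]) for i in range(3)]
specific = sum(n for key, n in REGIONS.items() if sum(key) == 1)
shared = total - specific

print("proteins per database (WP, Reactome, KEGG):", per_db)
print(f"total={total}, single-database={specific} ({specific / total:.1%}), "
      f"shared by >=2={shared} ({shared / total:.1%})")
```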
Interaction retrieval for single biomolecules. ConsensusPathDB holds 158,523 unique physical entities (mouse: 31,679; yeast: 17,672), and it offers the possibility of retrieving interaction information for these entities. The concept of an interaction in ConsensusPathDB is very general, so that proteins can have connections not only to other proteins but also to drugs, complexes or metabolites (Box 1). By selecting specific interactions, the user can generate fairly complex interaction networks.
The source database of each interaction is tracked by a color code, which shows the user where the interaction originates. This allows for easy visualization of possible redundancy between databases, which might serve as an indicator of the confidence of a particular interaction. Figure 2b shows the distribution of the different interaction types and their origins. Most interactions are protein–protein or drug–target interactions, and most are reported by only one or a few databases.
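Tracking sources amounts to keeping, for every consolidated interaction, the set of databases that report it; the size of that set is the quantity histogrammed in Figure 2b. A minimal sketch with made-up records (the database names, interactors and assignments are illustrative only):

```python
from collections import Counter, defaultdict

# Illustrative (source_database, interactor_a, interactor_b) records, not real ConsensusPathDB content.
records = [
    ("DatabaseA", "P1", "P2"),
    ("DatabaseB", "P2", "P1"),   # same interaction, reported in reverse order
    ("DatabaseC", "P1", "P2"),
    ("DatabaseA", "P2", "P3"),
    ("DatabaseB", "P3", "P4"),
]

sources = defaultdict(set)
for db, a, b in records:
    # Treat binary interactions as undirected so (a, b) and (b, a) merge into one entry.
    sources[frozenset((a, b))].add(db)

histogram = Counter(len(dbs) for dbs in sources.values())
for n_sources in sorted(histogram):
    print(f"{n_sources} source database(s): {histogram[n_sources]} interaction(s)")
```

Directed interaction types (e.g., gene regulation) would instead keep the ordered pair as the key.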
Another level of confidence assessment is available for binary PPIs. Because many PPI resources are integrated in ConsensusPathDB, control of false-positive interactions is of utmost importance. Therefore, binary PPIs have a quality score (range [0,1]) that is displayed with a color code. This score was computed with our IntScore[43] web tool as a meta-score integrating different methods for interaction confidence assessment, including graph-based topological criteria[38–40], literature evidence and pathway co-occurrence[41], and semantic similarity[42] (Box 2).
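How IntScore combines the individual methods is not described in this section. Purely to illustrate the idea of a meta-score on [0,1], the sketch below assumes that each method already yields a score in [0,1] and takes their unweighted mean, then bins the result into color categories; the aggregation rule, the method labels and the bin thresholds are all hypothetical and are not ConsensusPathDB's.

```python
from statistics import mean

def meta_score(method_scores):
    """Hypothetical aggregation: unweighted mean of per-method scores, each in [0, 1]."""
    scores = list(method_scores.values())
    if not scores or any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("expected at least one score, all within [0, 1]")
    return mean(scores)

def color_bin(score, thresholds=(0.33, 0.66)):
    """Map a [0, 1] score to an illustrative color category (thresholds are arbitrary)."""
    low, high = thresholds
    return "red" if score < low else ("yellow" if score < high else "green")

# Made-up per-method scores for one binary PPI.
scores = {"topology": 0.8, "literature": 0.6, "pathway_cooccurrence": 0.9, "semantic_similarity": 0.7}
s = meta_score(scores)
print(round(s, 3), color_bin(s))
```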
This section starts by defining the biomolecule of interest. Next, all interactions of that molecule are shown, which the user can select and visualize based on prioritization or quality assessment. After generating the graph, the user can expand it at any given node and update the graph accordingly with further interactions.
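The select-then-expand workflow can be pictured as growing an edge set on demand. In the sketch below, a hypothetical `fetch_interactions` function stands in for the database lookup and returns (partner, quality score) pairs from a toy table with made-up scores; only interactions at or above a user-chosen quality threshold are added when a node is expanded. None of these names correspond to a ConsensusPathDB API.

```python
# Toy interaction table: node -> list of (partner, quality score in [0, 1]); scores are made up.
TOY_INTERACTIONS = {
    "EGFR": [("GRB2", 0.95), ("SHC1", 0.90), ("X1", 0.20)],
    "GRB2": [("EGFR", 0.95), ("SOS1", 0.92)],
}

def fetch_interactions(node):
    """Hypothetical stand-in for retrieving a node's interactions from the database."""
    return TOY_INTERACTIONS.get(node, [])

def expand(edges, node, min_score=0.5):
    """Add the node's interactions with score >= min_score to the undirected edge set."""
    for partner, score in fetch_interactions(node):
        if score >= min_score:
            edges.add(frozenset((node, partner)))
    return edges

edges = expand(set(), "EGFR")   # initial graph around the molecule of interest
edges = expand(edges, "GRB2")   # the user expands the graph at GRB2
print(sorted(tuple(sorted(e)) for e in edges))
```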
[Figure 1 (flowchart). Input: single molecule (analysis path 1), preselected molecule list (analysis path 2), omics data as molecules with data (analysis path 3) → identifier mapping → analysis: interaction retrieval; over-representation and network analysis; enrichment → output: interaction neighborhood; induced network module; functional enrichment results. Contents used: integrated interaction network (protein–protein, gene regulatory, drug–target, biochemical and genetic interactions) and annotation sets (interaction neighborhoods (NESTs), protein complexes, pathways, gene ontology terms (GOs)).]
Figure 1 | Outline of the protocol. Three paths of analysis are described in the protocol, depending on the user's input. The content of ConsensusPathDB (i.e., the integrated interaction graph and the predefined annotation sets) can be explored with single molecules (analysis path 1), with priority lists of molecules (genes, proteins and metabolites; analysis path 2) or with associated experimental data (analysis path 3). The web server functionality includes over-representation analysis, enrichment analysis and network module analysis. The outputs are the generated tables and graphs, which can be downloaded for further inspection.
[Figure 2. (a) Venn diagram of apoptosis-pathway proteins in Wikipathways, Reactome and KEGG. (b) Histogram; x axis: source databases per interaction (1–14); y axis: interactions; bars by interaction type (biochemical, gene regulatory, protein–protein, drug–target).]
Figure 2 | Pathways and interactions in ConsensusPathDB. (a) Annotation specificity. Venn diagram generated with Venny 2.1 showing the proteins annotated for the apoptosis signaling pathway in three different databases (Wikipathways, WP254; Reactome, R-HSA-109581; and KEGG, hsa:04210). In total, 296 different proteins are annotated for apoptosis signaling: 84 in Wikipathways, 165 in Reactome (proteins with gene symbols) and 140 in KEGG. Of these, 61 proteins (20.6%) are shared by at least two databases, whereas the vast majority (235; 79.4%) are specific to a single database. (b) Histogram of the number of contributing databases per interaction (genetic interactions have been omitted from this figure because their total number, n = 443, is too small to be visible).

References
Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Hallmarks of cancer: the next generation.
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.

For highly connected biomolecules such as EGFR, it may take 10–30 s until the full list of interactions is loaded, depending on the Internet connection. 

The time required to execute the above protocol is strongly dependent on the size of the analyzed data set, the load generated from the number of users on the local servers and in general on the network traffic. 

ConsensusPathDB allows the generation of a network that connects as many members of an input gene list (seed genes) as possible via intermediate nodes, using the induced network graph algorithm.
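As a rough illustration of the induced-network idea, the sketch below keeps direct seed–seed interactions and adds non-seed intermediate nodes that are adjacent to at least two seeds. It is a simplified, hypothetical stand-in: it does not implement the published induced network module algorithm, and it skips the Z-score filtering of intermediate nodes. Node names are placeholders.

```python
from itertools import combinations

def induced_module(graph, seeds, min_seed_links=2):
    """Return edges linking seeds directly or via intermediates touching >= min_seed_links seeds."""
    seeds = set(seeds)
    edges = set()
    # Direct seed-seed interactions.
    for a, b in combinations(sorted(seeds), 2):
        if b in graph.get(a, ()):
            edges.add((a, b))
    # Candidate intermediate nodes: non-seeds adjacent to several seeds.
    for node, neighbors in graph.items():
        if node in seeds:
            continue
        linked = seeds.intersection(neighbors)
        if len(linked) >= min_seed_links:
            edges.update(tuple(sorted((node, s))) for s in linked)
    return edges

# Toy undirected network; S* are seed genes, I* are potential intermediates.
graph = {
    "S1": {"S2", "I1"},
    "S2": {"S1"},
    "S3": {"I1", "I2"},
    "I1": {"S1", "S3"},
    "I2": {"S3"},
}
module = induced_module(graph, ["S1", "S2", "S3"])
print(sorted(module))
```

Here I1 is kept because it bridges S1 and S3, whereas I2 touches only one seed and is dropped.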

In ConsensusPathDB, all binary PPIs have an aggregated confidence score, range [0,1], that was computed as a consensus score across the six methods described above. 

As in the case of genes/proteins, submit a list of metabolites of interest and perform over-representation analysis with predefined functional sets such as pathways and GO associations.

To judge the significance of an intermediate node, a Z score is computed using a binomial proportions test as follows:
$$Z = \frac{a/c - b/d}{\sqrt{\dfrac{(b/d)\,(1 - b/d)}{d}}}$$
Here, a equals the number of links from the intermediate node being examined to nodes from the input seed list, b equals the total number of links of the intermediate node in the consolidated background reference network, c is the total number of links in the output subnetwork and d is the total number of links in the consolidated background reference network.

To map these names to database entries, it is usually necessary to convert them into the corresponding identifiers; otherwise, the system cannot match the entry. 

A WSDL file needed for connecting to the SOAP/WSDL interface is provided in the ‘download / data access’ section on the web page. 

The authors are grateful to all scientists who provided annotation of the original molecular interaction data and are allowing automated access to their databases.