How long does it take to load the full list of interactions?

For highly connected biomolecules such as EGFR, it may take 10–30 s until the full list of interactions is loaded, depending on the Internet connection.

How many databases are used to integrate the interactome?

Information on molecular interactions is scattered across >500 different databases worldwide (ref. 11), which necessitates the integration of as many resources as possible into meta-databases such as ConsensusPathDB (Box 1).

How long does it take to execute the protocol?

The time required to execute the above protocol is strongly dependent on the size of the analyzed data set, the load generated from the number of users on the local servers and in general on the network traffic.

What is the induced network graph algorithm?

ConsensusPathDB allows the generation of a network that connects as many members of an input gene list (seed genes) as possible with intermediate nodes using the induced network graph algorithm.
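
The idea can be illustrated with a minimal sketch, which is not the ConsensusPathDB implementation: starting from a background network, keep direct links between seed genes and add any non-seed node that links at least two seeds. The function name `induced_module`, the toy edge list and the restriction to a single intermediate node per connection are assumptions made for illustration.

```python
from collections import defaultdict


def induced_module(edges, seeds):
    """Connect seed nodes directly or through one shared intermediate node.

    Hypothetical helper for illustration; returns (sub_edges, intermediates).
    """
    seeds = set(seeds)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Keep direct links between two seed nodes.
    sub_edges = set()
    for u, v in edges:
        if u in seeds and v in seeds:
            sub_edges.add(frozenset((u, v)))

    # A non-seed node that touches at least two seeds becomes an intermediate.
    intermediates = set()
    for node, nbrs in neighbors.items():
        if node in seeds:
            continue
        linked_seeds = nbrs & seeds
        if len(linked_seeds) >= 2:
            intermediates.add(node)
            for seed in linked_seeds:
                sub_edges.add(frozenset((node, seed)))
    return sub_edges, intermediates


# Toy background network (made-up links, not ConsensusPathDB data).
background = [("A", "B"), ("A", "X"), ("C", "X"), ("C", "Y"), ("D", "Y")]
module_edges, module_intermediates = induced_module(background, ["A", "B", "C"])
```

In this toy network, X is kept because it links the seeds A and C, whereas Y is dropped because it reaches only one seed.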

What is the consensus score for a binary PPI?

In ConsensusPathDB, all binary PPIs have an aggregated confidence score, range [0,1], that was computed as a consensus score across the six methods described above.

What is the way to perform a metabolite set analysis?

As in the case of genes/proteins, submit a list of metabolites of interest and perform over-representation with predefined functional sets such as pathways and GO associations.

What is the Z score for a network?

To judge the significance of the intermediate node, a Z score value is computed using a binomial proportions test as follows:

$$Z = \frac{\dfrac{a}{c} - \dfrac{b}{d}}{\sqrt{\dfrac{b}{d}\left(1 - \dfrac{b}{d}\right)\dfrac{1}{c}}}$$

Here, a equals the number of links from the intermediate node being examined to nodes from the input seed list, b equals the number of total links for the intermediate node in the consolidated background reference network, c is the number of total links in the output subnetwork and d is the number of total links in the consolidated background reference network.
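
As a rough numerical illustration, the sketch below evaluates a standard one-sample binomial proportions Z statistic with the four quantities a, b, c and d defined above, comparing the observed proportion a/c with the background proportion b/d. This particular arrangement of the four quantities is an assumption, and the exact expression used by ConsensusPathDB may differ. The function name and the example numbers are made up.

```python
import math


def binomial_proportion_z(a, b, c, d):
    """One-sample binomial proportions Z statistic (assumed arrangement).

    a: links from the intermediate node to seed nodes
    b: total links of the intermediate node in the background network
    c: total links in the output subnetwork
    d: total links in the background network
    """
    p_observed = a / c      # share of subnetwork links that involve the node
    p_background = b / d    # share of background links that involve the node
    standard_error = math.sqrt(p_background * (1.0 - p_background) / c)
    return (p_observed - p_background) / standard_error


# Made-up counts: the node holds 5 of 50 subnetwork links but only
# 100 of 10,000 background links.
z = binomial_proportion_z(a=5, b=100, c=50, d=10_000)
```

A large positive value indicates that the intermediate node is linked to the seed list more often than its overall connectivity would suggest; highly connected hub nodes are penalized through b.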

What is the way to map a metabolite to a database entry?

To map these names to database entries, it is usually necessary to convert them into the corresponding identifiers; otherwise, the system cannot match the entry.

What is the WSDL file needed for connecting to the web page?

A WSDL file needed for connecting to the SOAP/WSDL interface is provided in the ‘download / data access’ section on the web page.

Whom do the authors thank for their contributions?

The authors are grateful to all scientists who provided annotation of the original molecular interaction data and are allowing automated access to their databases.

(Open Access) Analyzing and interpreting genome data at the network level with ConsensusPathDB. (2016) | Ralf Herwig

Q: What are the contributions in this paper?

In ConsensusPathDB (ref. 7), the authors have implemented statistical methods for performing the above tasks by interrogating annotation sets based on molecular interaction information. The database also integrates the contents of 15 mouse and 14 yeast interaction repositories. Furthermore, ConsensusPathDB is used as a database by other tools: for example, for enrichment analysis by Chipster (ref. 18) using web service connections or by Cytoscape (ref. 19) using a Java plugin for assessing interaction confidence of PPIs (ref. 20).

Q: What can be done to reveal network-level information for biomarkers of interest?

The interaction neighborhood of a single molecule can be inferred and a corresponding network can be generated; this can be done, for example, to reveal network-level information (i.e., interaction partners) for biomarkers of interest.

Q: What are the common pathways annotated by ConsensusPathDB?

These sets comprise metabolic, signaling and gene regulatory pathways annotated by 12 source databases for human (4 each for mouse and yeast).

PROTOCOL

NATURE PROTOCOLS

VOL.11 NO.10

2016

1889

Analyzing and interpreting genome data at the network level with ConsensusPathDB

Ralf Herwig, Christopher Hardt, Matthias Lienhard & Atanas Kamburov (2–4)

(1) Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. (2) Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts, USA. (3) Harvard Medical School, Boston, Massachusetts, USA. (4) Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. Correspondence should be addressed to R.H. (herwig@molgen.mpg.de) or A.K. (kamburov@broadinstitute.org).

Published online 8 September 2016; doi:10.1038/nprot.2016.117

ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user’s input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.

INTRODUCTION

Modern high-throughput experiments such as sequencing, microarray technology or mass spectrometry (MS) experiments generate large genome-wide data sets that provide deep insight into many different levels of molecular information, e.g., the transcriptome, proteome and metabolome, among others. Such information is used, for example, to characterize patient genomes using multiomics data, to describe developmental processes with temporal changes or to derive predictive patterns for exogenous agents. An emerging goal of data analysis is to reveal the underlying control mechanisms that govern the measured molecular phenotypes.

Typically, a key result of genome analysis is a list of statistically significant biomolecules (genes, proteins, metabolites) that contribute to the phenotypes of interest. A subsequent task then is to identify which biological functions can be associated with these molecules (over-representation analysis). This is done mainly by exploring whether predefined annotation sets (for example, specific signaling pathways) are enriched by the molecules under consideration. Independently, such enrichments can be inferred without statistical preselection of the molecules using the entirety of the experimental data ((gene set) enrichment analysis). Furthermore, data for all or a prioritized subset of molecules can be mapped onto interaction networks and analyzed with graph theoretic approaches. These methods identify subnetworks (network module analysis) that are likely to be responsive to the experiments under analysis. All three approaches aim at enriching genome analysis with mechanistic network information, which enables an understanding of the underlying biological processes.

In ConsensusPathDB (ref. 7), we have implemented statistical methods for performing the above tasks by interrogating annotation sets based on molecular interaction information. We agglomerated the contents of 32 major public repositories for human molecular interactions of heterogeneous types, as well as biochemical pathways, resulting in one of the largest interactome collections available (Table 1). Furthermore, the database integrates the contents of 15 mouse and 14 yeast interaction repositories. In addition to gene ontology (GO) and pathway annotations, ConsensusPathDB systematically explores the protein–protein interaction (PPI) network, as PPIs are key drivers of biological function. However, only a minor fraction of the estimated ~650,000 human protein interactions have yet been experimentally measured. Moreover, information on molecular interactions is scattered across >500 different databases worldwide (ref. 11), which necessitates the integration of as many resources as possible into meta-databases such as ConsensusPathDB (Box 1). Such interaction integration allows for better coverage of the interactome, which improves guidance in the functional interpretation of omics data.

ConsensusPathDB has been well adopted by the research community. Applications comprise over-representation analysis in order to characterize diverse sets of molecules (refs. 12–14), gene set enrichment analysis (refs. 15,16) and identification of upstream regulators spanning various biological contexts. Furthermore, ConsensusPathDB is used as a database by other tools: for example, for enrichment analysis by Chipster (ref. 18) using web service connections or by Cytoscape (ref. 19) using a Java plugin for assessing interaction confidence of PPIs (ref. 20). In addition to these analyses, the tool can be used as a resource for the generation of molecular interaction gene sets, which themselves can be used as predictive signatures. For example, it has been shown that network modules and pathways can be derived as predictive patterns in cancer diagnostics, as well as in tumor progression monitoring. This enables biomarker analysis of entities ranging from single molecules to entire pathways.

Overview of the protocol

In this protocol, we review the contents and the different analysis scenarios enabled by ConsensusPathDB. All modules in this protocol aim to enable network-level interpretation and functional characterization of user-specified lists of molecules (genes, proteins and metabolites) and associated high-throughput data. ConsensusPathDB helps users working with such data to do the following:

• Infer heterogeneous interaction networks for genes, proteins, metabolites, drugs and other biomolecules

• Compute over-represented pathways, PPI networks, protein complexes and GO annotations from a priority list of genes, proteins or metabolites

• Compute enriched pathways, PPI networks, protein complexes and GO annotations from genome-wide data such as RNA-seq or array technology

• Generate network modules that are over-represented by genes or proteins and thereby explore heterogeneous interactions such as PPI, drug–target, gene regulatory and genetic interactions.
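
The over-representation step listed above can be sketched in a few lines. The sketch assumes a hypergeometric test, which is a common choice for this kind of analysis; the text here does not state which statistic the web server applies, and no multiple-testing correction is included. The function names, the toy gene sets and the background size are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n items are drawn from N, of which K belong to the set."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def over_representation(input_genes, annotation_sets, background_size):
    """Rank annotation sets by hypergeometric p-value of their overlap.

    Assumes every input gene and every set member is part of the background.
    """
    input_genes = set(input_genes)
    results = []
    for name, members in annotation_sets.items():
        members = set(members)
        overlap = input_genes & members
        p = hypergeom_pvalue(
            len(overlap), len(members), len(input_genes), background_size
        )
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda row: row[2])


# Toy data (made-up sets, not ConsensusPathDB content).
toy_sets = {
    "toy_pathway_1": {"G1", "G2", "G3", "G4"},
    "toy_pathway_2": {"G7", "G8", "G9"},
}
ranking = over_representation(["G1", "G2", "G5"], toy_sets, background_size=10)
```

With this toy input, `toy_pathway_1` ranks first because two of the three input genes fall into a set of four genes out of a background of ten.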

Comparison with other tools

Several excellent tools, of which only some can be mentioned here, are available that perform either over-representation analysis (e.g., DAVID, IPA and Enrichr), gene or metabolite set enrichment analysis (e.g., GSEA and MetaboAnalyst) or network module analysis (e.g., Cytoscape and Genes2Networks). Although most of these tools are restricted to specific types of analysis and to a specific type of biomolecule, ConsensusPathDB offers a wider range of analysis functions and the option for gene/protein and metabolite analysis (Table 2). The statistical methods for over-representation analysis, enrichment analysis and network module analysis implemented by the individual tools differ, and thus results achieved with the same input can be fairly different. With respect to content, ConsensusPathDB has a focus on molecular interactions, and it provides deep exploration of the interactome network, protein complexes and pathway resources, whereas other tools incorporate additional annotation sets, for example, based on genomic locus enrichment, disease associations, experimental signatures or literature-derived sets. Huge collections of such annotation signatures are accessible through systems such as MSigDB. Parallel attempts for sampling huge amounts of interaction data within a common framework are STRING and PathwayCommons.

Limitations of the protocol

ConsensusPathDB currently supports only three organisms (human, mouse and yeast), and it is thereby missing widely used model organisms such as rat, fly and worm, among others, for which comprehensive interaction information has been collected and made available in the past. Moreover, ConsensusPathDB does not hold information on microorganisms, e.g., bacteria or fungi. In cases in which interaction information is available, the inclusion of more organisms is a key step in the future development of ConsensusPathDB.

Another limitation is the focus on annotation sets that are derived from molecular interactions and GO terminology. As stated in the previous section (Table 2), several tools incorporate additional information that allows for interpretation of data in alternative directions. However, a review of the literature shows that, by far, most applications use functional annotation sets defined by GO and pathway annotations, thus justifying the current focus of ConsensusPathDB.

With regard to the web server, ConsensusPathDB has some limitations with respect to visualization components. Presumably the most widely used tool available in this regard is Cytoscape, and thus we offer network downloads in a Cytoscape-compatible format, which enables easy transfer of computed network modules.

For some of its functionality, ConsensusPathDB already offers web services in order to allow the integration of analysis steps into automated workflows and stand-alone tools. However, not all steps described in this protocol are yet implemented as web services; their further development is a primary future task.

The response time of ConsensusPathDB depends on the size and complexity of the interaction network under investigation. For example, performing network analyses with many input nodes or many different types of interactions can lead to slow response and limited visualization performance.

Experimental design

Analysis paths. ConsensusPathDB contains predefined annotation sets that hold functional information such as pathways, GO categories, protein complexes and PPI network neighborhoods that were derived from the integrated resources.

Depending on the user’s input, ConsensusPathDB allows the following analyses (Fig. 1):

• Analysis path 1. The interaction neighborhood of a single molecule can be inferred and a corresponding network can be generated; this can be done, for example, to reveal network-level information (i.e., interaction partners) for biomarkers of interest.

• Analysis path 2. Uploading a list of molecules (genes, proteins and metabolites) allows either performing over-representation analysis with predefined annotation sets or computing network associations between the molecules of interest through mining of the integrated interaction network.

• Analysis path 3. Inserting molecules and associated experimental data allows computing enrichment analysis of the annotation sets; this path uses a more unbiased analysis compared with analysis path 2, because it is not dependent on a predefined priority list of molecules.

To exemplify the procedures in this protocol, we use different data sets from various biological backgrounds, measured using different high-throughput technologies. For analysis path 1, we exemplify the protocol steps using the epidermal growth factor receptor (EGFR) gene, which is widely mutated in different types of cancer and is also a primary target of cancer therapy. For analysis path 2, we exemplify the over-representation analysis for genes using a list of 18 frequently mutated genes derived from whole-exome sequencing of a large lung adenocarcinoma cohort (Supplementary Data 1). As a test case for over-representation analysis of metabolites, we use a list of 130 known uremic toxins that are associated with dysfunction of the kidney (Supplementary Data 2). To demonstrate the network module analysis, we examined 691 targets of histone modification (H3K4me2) measured with ChIP-seq that are specific to T helper type 2 (TH2) cells, as compared with naive T cells (Supplementary Data 3). The goal of the analysis is to recover potential gene regulatory networks controlling these genes, as was done in the original publication. For analysis path 3, we use public expression data derived from different stages of human embryonic development that were generated with RNA-seq (Supplementary Data 4).

These data cover a wide range of genome analysis applications and diverse biological backgrounds. The corresponding gene lists vary in size from 18 (lung adenocarcinoma driver mutations) to ~16,000 (RNA-seq data set), demonstrating the scalability of the ConsensusPathDB analysis tools.

TABLE 1 | Content of ConsensusPathDB.

Content type | Human | Mouse | Yeast
Integrated databases | 32 | 15 | 14
Unique physical entities | 158,523 | 31,679 | 17,672
Unique interactions | 458,570 | 34,064 | 272,094
Gene regulations | 17,098 | 2,196 | 316
Protein interactions | 261,085 | 23,488 | 123,842
Genetic interactions | 443 | 194 | 145,151
Biochemical reactions | 21,070 | 8,186 | 2,785
Drug–target interactions | 158,874 | 0 | 0
Pathway gene sets | 4,593 | 2,173 | 1,101

Box 1 | Molecular interactions

Molecular interactions are key drivers of cellular function. In the times of omics technology, an ever-increasing number of molecular interactions are measured and cataloged. For example, large numbers of PPIs have been measured by co-immunoprecipitation, tandem-affinity purification and yeast two-hybrid analysis, among others. ChIP-seq experiments allow the charting of protein–DNA interactions and histone modifications. Phosphoproteome measurements with MS such as iTRAQ and SILAC provide new insights into signaling networks. Metabolomics technologies such as NMR or gas chromatography–MS measure metabolites and fluxes through metabolic networks. These technologies gave rise to the development of multiple repositories that store and curate the experimental data along with previous literature annotation.

The ConsensusPathDB is a meta-database that currently consolidates human molecular interactions from 32 different databases, mouse molecular interactions from 15 different databases and yeast molecular interactions from 14 different databases.

Interaction databases and interaction types (human)

Interaction types include the following:

• Protein interactions (BIND, Biogrid, CORUM, DIP, DrugBank, HPRD, InnateDB, Intact, MINT, MIPS-MPPI, MatrixDB, NetPath, PDB, PDZBase, PIG, PINdb, PhosphoPOINT, Reactome and Spike)

• Signaling reactions (BioCarta, INOH, InnateDB, KEGG, NetPath, PID, PhosphoPOINT, PhosphoSitePlus, Reactome, Spike and Wikipathways)

• Metabolic reactions (EHMN, HumanCyc, INOH, KEGG, Reactome and Wikipathways)

• Gene regulatory interactions (BIND, BioCarta, InnateDB, PID and Spike)

• Genetic interactions (Biogrid)

• Drug–target interactions (Chembl, DrugBank, KEGG, PharmGKB and TTD)

• Biochemical pathways (BioCarta, EHMN, HumanCyc, INOH, KEGG, NetPath, PID, PharmGKB, Reactome, SMPDB, Signalink and Wikipathways)

Interaction databases and interaction types (mouse)

Interaction types include the following:

• Protein interactions (BIND, Biogrid, DIP, InnateDB, Intact, MINT, MIPS-MPPI, MatrixDB, PDB, PDZBase and Reactome)

• Signaling reactions (InnateDB, KEGG, PhosphoSitePlus, Reactome and Wikipathways)

• Metabolic reactions (KEGG, MouseCyc, Reactome and Wikipathways)

• Gene regulatory interactions (BIND and InnateDB)

• Genetic interactions (Biogrid)

• Drug–target interactions (KEGG)

• Biochemical pathways (KEGG, MouseCyc, Reactome and Wikipathways)

Interaction databases and interaction types (yeast)

Interaction types include the following:

• Protein interactions (BIND, Biogrid, CYC2008, DIP, Intact, MINT, MIPS-MPACT, PDB, PINdb, PTM and Reactome)

• Signaling reactions (KEGG, Reactome and Wikipathways)

• Metabolic reactions (KEGG, PTM, Reactome, Wikipathways and YeastCyc)

• Gene regulatory interactions (BIND and PTM)

• Genetic interactions (Biogrid)

• Drug–target interactions (KEGG)

• Biochemical pathways (KEGG, Reactome, Wikipathways and YeastCyc)

Identifier mapping. A recurrent problem when integrating data from different resources, or when analyzing high-throughput data by comparison with existing databases, is the nonuniformity of gene/protein/metabolite identifiers. In ConsensusPathDB, we have created comprehensive identifier maps by parsing the contents of 11 major genomic, proteomic and metabolite databases such as Ensembl, Uniprot and PubChem. These maps were used to match gene, protein and metabolite identifiers originating from the 32 integrated sources of interaction and pathway information. Furthermore, they are used to map identifiers from the user input to these physical entities, and hence they allow great flexibility with respect to what identifier namespace is chosen by the user.
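
The mapping idea described above can be pictured as a lookup table keyed by namespace and identifier. The following is a minimal sketch, not the ConsensusPathDB data model: the entity labels (`entity:1`, `entity:2`), the namespace names and both function names are hypothetical, and the two records are included only as toy input.

```python
def build_identifier_map(records):
    """Build a (namespace, identifier) -> entity lookup.

    records: iterable of (entity_id, {namespace: identifier}) pairs.
    """
    id_map = {}
    for entity_id, xrefs in records:
        for namespace, identifier in xrefs.items():
            # Store identifiers upper-cased so lookups ignore case.
            id_map[(namespace, identifier.upper())] = entity_id
    return id_map


def map_user_input(identifiers, namespace, id_map):
    """Split user identifiers into mapped entities and unmapped leftovers."""
    mapped, unmapped = {}, []
    for identifier in identifiers:
        key = (namespace, identifier.strip().upper())
        if key in id_map:
            mapped[identifier] = id_map[key]
        else:
            unmapped.append(identifier)
    return mapped, unmapped


# Toy records for illustration only.
toy_records = [
    ("entity:1", {"gene_symbol": "EGFR", "uniprot": "P00533"}),
    ("entity:2", {"gene_symbol": "KRAS", "uniprot": "P01116"}),
]
id_map = build_identifier_map(toy_records)
mapped, unmapped = map_user_input(["egfr", "NOT_A_GENE"], "gene_symbol", id_map)
```

Reporting the unmapped identifiers separately lets the user see which entries of the input list could not be matched in the chosen namespace.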

Annotation sets. ConsensusPathDB offers four types of predefined annotation sets: neighborhood-based entity sets (NESTs), protein complexes, pathways and GO terms.

• NESTs. These sets are derived from the integrated interaction network, which includes four types of biological interactions: protein–protein, biochemical, gene regulatory and genetic interactions. A NEST is defined as a central protein and its network neighbors. The size of the network neighborhood is determined by its radius. The user can choose between a radius equal to one and a radius equal to two. A radius equal to one adds only the direct neighbors to the center protein, whereas a radius equal to two adds, in addition, all direct neighbors of the direct neighbors. We recommend using a radius equal to one; otherwise, the neighborhoods grow too large and lose specificity. There are as many NESTs as proteins in the integrated network.

• Protein complexes. These sets are derived from specific databases that hold information on protein complexes. Note that most annotated protein complex sets are rather small (<5 members).

• Pathways. These sets comprise metabolic, signaling and gene regulatory pathways annotated by 12 source databases for human (4 each for mouse and yeast). Pathways range from very large biological processes (covering, for example, the complete metabolism and having >1,000 members) to very specific concepts that describe detailed processes.

• GO terms. ConsensusPathDB offers four levels of GO categories ranging from very general terms (level = 2) with >1,000 members to more specific terms (level = 5). In the analysis, the user can restrict the categories to specific level(s) or to the specific GO tree branches covering ‘biological process’, ‘molecular function’ and ‘cellular component’.
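
The NEST definition above (a center protein plus all nodes within a given radius) corresponds to a bounded breadth-first search. The sketch below assumes the network is available as an adjacency dictionary; the function name `nest` and the toy network are hypothetical.

```python
from collections import deque


def nest(adjacency, center, radius=1):
    """Return the center node plus every node within `radius` links of it."""
    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen


# Toy undirected network stored as an adjacency dictionary.
toy_network = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P4"},
    "P3": {"P1"},
    "P4": {"P2", "P5"},
    "P5": {"P4"},
}
nest_r1 = nest(toy_network, "P1", radius=1)
nest_r2 = nest(toy_network, "P1", radius=2)
```

In the toy network, the radius-one set around P1 contains P1, P2 and P3, and the radius-two set additionally pulls in P4, which shows how quickly the neighborhood grows with the radius.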

Pathway annotation: specificity and redundancy. The pathway concept is essential for modern biology, and it usually describes a certain cellular process, for example ‘apoptosis’, in which the involved proteins or metabolites exert specific functions and are interconnected by molecular interactions of diverse types. ConsensusPathDB agglomerates 4,593 human pathway concepts (mouse: 2,173 and yeast: 1,101) originating from 12 different resources (Table 1). On one hand, these pathway concepts are partially redundant because they describe subpathways of a given pathway that are annotated by the same database. For example, the pathway ‘apoptosis’ might cover the subpathways ‘extrinsic apoptosis’ and ‘intrinsic apoptosis’ among others, with corresponding subsets of proteins. On the other hand, most generic pathways are annotated by several databases, leading to more than one annotation set referring to ‘apoptosis’.

It is worth noting that pathway concepts from different resources might in fact involve different sets of molecules even when describing similar molecular processes. As a consequence, this can lead to differences in functional enrichment analyses (analysis paths 2 and 3), because the different annotation sets might have deviating overlaps with the gene list submitted by the user. For example, comparison of gene sets for the ‘apoptosis’ pathway in the three widely used databases KEGG, Reactome and WikiPathways reveals that 79% of the annotated proteins are specific to a single database, whereas only about 21% are shared by at least two of the three databases (Fig. 2a). The reason for this is that pathway boundaries are not clearly defined and that expert opinion on the extent of cross talk with other pathways is highly variable. In addition, pathway annotations are commonly focused on specific substructures or specific cellular contexts (e.g., tissues, diseases and organisms), which might result in variations of the assembled gene lists.

Consequently, in ConsensusPathDB, such overlapping pathway concepts are not merged to generalized pathways; instead, the redundancy is kept and the annotated pathway set is always disclosed together with its source database.

TABLE 2 | Tool comparison.

Column groups: Access, Analysis functions, Content types.

Columns: Tool; Free; Upload of user-defined background; Gene-based ORA; GSEA; Network module analysis; Metabolite-based ORA; MSEA; Network neighbors; Protein complexes; Pathway resources; GOASs; Other.

ConsensusPathDB: X X X X X X X X X X X
DAVID: X X X X X X X
EnrichR: X X X X X X X X
Cytoscape: X X X X X X X X X
IPA: X X X X X X X X
GSEA/MSIGDB: X X X X X X X
Genes2Networks: X X X
MetaboAnalyst: X X X X X X
STRING: X X X X X X X
PathwayCommons: X X X X X

GOAS, gene ontology annotation set; GSEA, gene set enrichment analysis; MSEA, metabolite set enrichment analysis; ORA, over-representation analysis; other, annotation sets based on literature, experimental data, chromosomal location, disease associations, protein domains and so on.
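
The kind of comparison reported for the apoptosis gene sets can be reproduced for any group of pathway annotations by counting in how many source databases each protein occurs. The sketch below uses made-up gene sets and a hypothetical function name; only the counts 235 and 296 in the last lines are taken from the Fig. 2a legend.

```python
from collections import Counter


def single_source_fraction(gene_sets):
    """Count members that occur in exactly one of the given gene sets.

    gene_sets: {database_name: iterable of gene/protein identifiers}
    Returns (specific, total, specific / total).
    """
    occurrences = Counter()
    for members in gene_sets.values():
        occurrences.update(set(members))  # count each database at most once
    total = len(occurrences)
    specific = sum(1 for n in occurrences.values() if n == 1)
    return specific, total, specific / total


# Made-up annotations of the same pathway in three hypothetical databases.
toy_sets = {
    "db_a": {"G1", "G2", "G3"},
    "db_b": {"G2", "G4"},
    "db_c": {"G3", "G5", "G6"},
}
specific, total, fraction = single_source_fraction(toy_sets)

# Counts quoted in the Fig. 2a legend: 235 of 296 proteins are database specific.
reported_percent = round(235 / 296 * 100, 1)
```

For the toy sets, four of six genes occur in a single database; the figures from the legend give 79.4%, in line with the 79% quoted in the text.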

Interaction retrieval for single biomolecules. ConsensusPathDB holds 158,523 unique physical entities (mouse: 31,679 and yeast: 17,672), and it offers the possibility of retrieving interaction information for these entities. The concept of an interaction in ConsensusPathDB is very general, so that proteins can have connections not only to other proteins but also to drugs, complexes or metabolites (Box 1). By selecting specific interactions, the user can generate fairly complex interaction networks.

The source database for each interaction is tracked by a color code, providing the user with the information on where the interaction originates. This allows for easy visualization of possible redundancy between databases, which might serve as an indicator for assessing confidence of the particular interaction. Figure 2b shows the distribution of the different interaction types and their origins. Most interactions are of the protein–protein and drug–target types, and most are reported by only one or a few databases.

Another level of confidence assessment is available for binary PPIs. Because many PPI resources are integrated in ConsensusPathDB, control of false-positive interactions is of utmost importance. Therefore, binary PPIs have a quality score (range [0,1]) that is displayed with a color code. This score was computed as a meta-score integrating different methods for interaction confidence assessment, including graph-based topological criteria (refs. 38–40), literature evidence and pathway co-occurrence, and semantic similarity using our IntScore web tool (Box 2).

This section starts by defining the biomolecule of interest. Next, all interactions of that molecule are shown, which can be selected and visualized by the user based on prioritization or quality assessment. After generating the graph, the user can expand it at any given node and update the graph accordingly with further interactions.
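
Selecting interactions by quality can be pictured as a simple filter over interaction records. The sketch below is an illustration under assumed data structures, not the web server's logic: the `Interaction` record, its field names, the thresholds and the toy entries are all hypothetical. It keeps interactions of one molecule that are supported by a minimum number of source databases and, for scored binary PPIs, by a minimum confidence score.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    partners: Tuple[str, ...]      # participating molecules
    kind: str                      # e.g. "protein-protein", "drug-target"
    sources: Tuple[str, ...]       # databases reporting the interaction
    score: Optional[float] = None  # confidence in [0, 1]; binary PPIs only


def select_interactions(interactions, molecule, min_score=0.0, min_sources=1):
    """Keep interactions of `molecule` that pass both quality filters."""
    selected = []
    for item in interactions:
        if molecule not in item.partners:
            continue
        if len(item.sources) < min_sources:
            continue
        # Unscored interaction types are not filtered by score.
        if item.score is not None and item.score < min_score:
            continue
        selected.append(item)
    return selected


# Toy records (made-up partners, sources and scores).
toy_interactions = [
    Interaction(("M1", "M2"), "protein-protein", ("db_a", "db_b"), 0.95),
    Interaction(("M1", "M3"), "protein-protein", ("db_a",), 0.20),
    Interaction(("M1", "D1"), "drug-target", ("db_c", "db_d")),
    Interaction(("M2", "M3"), "protein-protein", ("db_a", "db_b"), 0.90),
]
high_confidence = select_interactions(
    toy_interactions, "M1", min_score=0.5, min_sources=2
)
```

Here the low-scoring M1–M3 interaction is dropped, while the unscored drug–target interaction is kept because it is reported by two sources.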

Figure 1 | Outline of the protocol. Three paths of analysis are described in the protocol that depend on the user’s input. The content of the ConsensusPathDB (i.e., the integrated interaction graph and the predefined annotation sets) can be explored with single molecules (analysis path 1), with priority lists of molecules (genes, proteins and metabolites; analysis path 2) or with associated experimental data (analysis path 3). The Web server functionality includes over-representation analysis, enrichment analysis and network module analysis. The outputs are the generated tables and graphs that can be downloaded for further inspection.

[Flowchart labels. Rows: Input, Analysis, Contents, Output. Input: single molecule; preselected molecule list; molecules with data (omics data). Analysis: identifier mapping; interaction retrieval; network analysis; over-representation; enrichment. Contents: integrated interaction network (protein–protein interactions, gene regulatory interactions, drug–target interactions, biochemical reactions, genetic interactions); annotation sets (interaction neighborhoods (NESTs), protein complexes, pathways, gene ontology terms (GOs)). Output: interaction neighborhood; induced network module; functional enrichment results.]

Figure 2 | Pathways and interactions in ConsensusPathDB. (a) Annotation specificity. Venn diagram generated with Venny 2.1 showing the proteins annotated for the apoptosis signaling pathway in three different databases (Wikipathways, WP254; Reactome, R-HSA-109581; and KEGG, hsa:04210). In total, 296 different proteins are annotated for apoptosis signaling: 84 in Wikipathways, 165 in Reactome (proteins with gene symbols) and 140 in KEGG. Of these proteins, 61 (20.6%) are shared by at least two databases, whereas the vast majority (235; 79.4%) are specific for a single database. (b) Histogram of the number of contributing databases per interaction (genetic interactions have been omitted in this figure, as their total number, n = 443, is too small to be visible).

[Panel a: Venn diagram of Wikipathways, Reactome and KEGG. Panel b: x axis, source databases per interaction; y axis, interactions; series by interaction type (biochemical, gene regulatory, protein–protein, drug–target).]
