
Phenotype-Driven Transitions In Regulatory Network Structure

Q: What are the contributions in "Detecting phenotype-driven transitions in regulatory network structure" ?

Here, the authors describe ALPACA ( ALtered Partitions Across Community Architectures ), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. As an application, the authors use ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex networks, but also that these changes may be relevant for characterizing biological phenotypes.

Megha Padi¹, John Quackenbush¹•Institutions (1)

Harvard University¹

25 May 2017-bioRxiv (Cold Spring Harbor Labs Journals)-pp 142281

TL;DR: In comparing the modular structure of networks in female and male breast tissue, it is found that female breast has distinct modules enriched for genes involved in estrogen receptor and ERK signaling, indicating that not only does phenotypic change correlate with network structural changes, but also that ALPACA can identify such modules in complex networks.


Abstract: Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead arise from multiple factors that together functionally perturb the underlying molecular network. Biological networks are known to be highly modular and contain dense “communities” of genes that carry out cellular processes, but these structures change between tissues, during development, and in disease. While many methods exist for inferring networks, we lack robust methods for quantifying changes in network structure. Here, we describe ALPACA (ALtered Partitions Across Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than currently available network comparison methods. We used ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. In each case, ALPACA identified modules enriched for processes relevant to the phenotype. For example, modules specific to angiogenic ovarian tumors were enriched for genes associated with blood vessel development, interferon signaling, and flavonoid biosynthesis. In comparing the modular structure of networks in female and male breast tissue, we found that female breast has distinct modules enriched for genes involved in estrogen receptor and ERK signaling. The functional relevance of these new modules indicates that not only does phenotypic change correlate with network structural changes, but also that ALPACA can identify such modules in complex networks.


Summary (2 min read)


INTRODUCTION

Despite the increasing power and depth of sequencing studies, identifying the causal mutations and single-nucleotide polymorphisms (SNPs) that are responsible for determining heritable traits and disease susceptibility remains challenging.
Biological networks are known to have modular structure and contain closely interacting groups of nodes, or “communities”, that work together to carry out cellular functions.
One way to address these issues and find more robust differences between networks is to identify changes in groups of nodes, rather than in individual edges.
However, these methods are limited to examining pre-defined gene modules and network features, and fail to take full advantage of the network structure.

RESULTS

The modularity represents to what extent the proposed communities have more edges within them than expected in a randomly connected graph with the same degree properties; this null expectation is represented in the second term of the equation above.
Community comparison and edge subtraction: Having arrived at a pair of inferred networks corresponding to different phenotypic states, there are two straightforward ways to compare the community structures based on the modularity metric (Fig. 1).
The authors previously found that a gene signature associated with angiogenesis is able to classify ovarian cancer patients into a poor-prognosis subtype.
The authors also computed the correlation in expression among the genes in each ALPACA module.
Finally, the authors ranked the genes by their contribution to the differential modularity and used Gene Set Enrichment Analysis (GSEA) to evaluate enrichment for GO terms across the whole network (see Materials and Methods).

DISCUSSION

Biological networks have complex modular and hierarchical topologies that allow organisms to carry out the functions necessary for survival.
ALPACA differs from other community detection methods in that it compares the structure of networks to each other rather than to a random background network and is thus better able to detect subtle differences in network modular structure.
The differential modularity also incorporates increased and decreased edge weights across the entire network into a single, simple framework for module detection.
This is because network-level analysis, and ALPACA in particular, helps organize both strongly and weakly differentially expressed genes into new modules that are under common regulatory control, identifying signaling pathways that could not have been distinguished if genes were ranked purely by differential expression.
Genes annotated by the shown GO terms are labeled in large font.
... “edgetic” perturbations, in order to discover functional changes in protein complexes and signaling associated with disease.

METHODS

ALPACA algorithm ALPACA comprises the following two steps: Step 1: The input network consists of edges between regulators and target genes.
The authors first used either CONDOR or the Louvain method to find the community structure of the baseline and perturbed networks, in each case keeping only edges that had positive z-scores.
The authors evaluated the results of each method on the simulated networks by comparing the ranks of true positives (the target genes in the added module) against a background consisting of target genes not in the added module.
For the baseline network, the edges between groups A and B were set to weight 0.8 and for the perturbed network, the edges between groups A and B were set to weight 0.2.
Differential expression analysis was carried out using the R package limma, and p-values were adjusted for multiple testing using the Benjamini–Hochberg method.

AUTHOR CONTRIBUTIONS

M.P. conceived of the project, performed analysis, and wrote the paper.
J.Q. helped refine the analysis and wrote the paper.
The authors declare no competing financial interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


ARTICLE

OPEN

Detecting phenotype-driven transitions in regulatory network structure

Megha Padi¹ and John Quackenbush²,³

Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead

arise from functional changes in the underlying molecular network. Biological networks are known to be highly modular and

contain dense “communities” of genes that carry out cellular processes, but these structures change between tissues, during

development, and in disease. While many methods exist for inferring networks and analyzing their topologies separately, there is a

lack of robust methods for quantifying differences in network structure. Here, we describe ALPACA (ALtered Partitions Across

Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to

identify condition-speciﬁc modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than

currently available network comparison methods. As an application, we use ALPACA to compare transcriptional networks in three

contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human ﬁbroblasts expressing transforming viral oncogenes,

and sexual dimorphism in human breast tissue. In each case, ALPACA identiﬁes modules enriched for processes relevant to the

phenotype. For example, modules speciﬁc to angiogenic ovarian tumors are enriched for genes associated with blood vessel

development, and modules found in female breast tissue are enriched for genes involved in estrogen receptor and ERK signaling.

The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex

networks, but also that these changes may be relevant for characterizing biological phenotypes.

npj Systems Biology and Applications (2018)4:16; doi:10.1038/s41540-018-0052-5

INTRODUCTION

We tend to think of phenotypes as being characterized by

differentially expressed genes or mutations in particular genes.

However, the individual genes that show the greatest changes in

expression in a phenotype do not tend to be drivers of that

phenotype.

1,2

Despite the increasing power and depth of

sequencing studies, identifying the causal mutations and single-

nucleotide polymorphisms (SNPs) that are responsible for

determining heritable traits and disease susceptibility remains

challenging. Indeed, many studies have found thousands of

genetic variants of small effect size contribute to common traits.

3–

It has become apparent that complex regulatory interactions

between multiple genes and variants can contribute to deﬁning

the state of the cell. Modeling such phenotypes requires that we

have a clearer picture of how genes and proteins work together to

perform normal cellular functions, and how remodeling the

interactions between genes can cause changes in phenotype

including disease.

In this context, it is useful to make a subtle shift and think of a

phenotype as being deﬁned by a network of interacting genes

and gene products. It has been shown that analyzing the

mathematical properties of such networks can provide important

biological insight into phenotypic properties. For example, high-degree

“hubs” in protein–protein interaction (PPI) networks are

enriched for genes essential to growth.

Biological networks are

known to have modular structure and contain closely interacting

groups of nodes, or “communities”, that work together to carry

out cellular functions.

7–9

There are many analytical and experi-

mental methods for inferring network models associated with

different phenotypic states, and for computing topological

properties like centrality and community structure.

10–13

However,

the most significant questions we can ask of biological networks (how networks differ from each other, and how these differences in network structure drive functional changes) remain largely

unanswered. A signiﬁcant challenge in this area is the lack of

computational approaches for ﬁnding meaningful changes in the

structure of large complex networks.

Previous work on comparative analysis of biological networks

has focused on the so-called “differential network”, the set of

edges that are altered relative to a reference network.

While the

advantage of this approach is its simplicity, there are several issues

that arise in such an edge-based analysis. First, biological network

inference has a relatively high rate of false negatives due to noise

in both the experimental data that are used and in the network

inference methods themselves. Consequently, it can be difﬁcult to

determine whether the appearance or disappearance of a single

edge is “real”. The uncertainty in the estimate of the difference

between two edge weights is the sum of the uncertainties in each

individual edge, which inﬂates noise in the ﬁnal differential

network. Second, the perturbed network will in general contain

both positive and negative changes in edge weight relative to the

reference network, and it is challenging to analyze and interpret a

differential network with mixed signs. If we only consider the new edges associated with a phenotype, we would miss the functional effects of decreases in edge activity. Third, by focusing only on the altered edges and discarding common edges, the differential interactions are taken out of their functional context, making it difficult to connect them to global cellular changes. For example, adding or deleting ten scattered edges in a network may have very different consequences on the phenotype than would the same number of changes concentrated in a local functional neighborhood of the network.

Received: 14 August 2017 Revised: 29 March 2018 Accepted: 2 April 2018

Published online: 19 April 2018

¹Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA; ²Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA and ³Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA

Correspondence: Megha Padi (mpadi@email.arizona.edu)

www.nature.com/npjsba

Published in partnership with the Systems Biology Institute

One way to address these issues and ﬁnd more robust

differences between networks is to identify changes in groups

of nodes, rather than in individual edges. Computational methods

that have been developed to do this fall into several categories.

First there are methods that evaluate differences in pre-speciﬁed

network features, like user-deﬁned gene sets, small regulatory

motifs, or global topological characteristics. For example, Gambardella

et al. evaluated the statistical significance of differences in

co-expression of a user-defined gene set between two conditions.

Similarly, the coXpress method deﬁnes clusters using co-

expression in the reference condition, and tests for signiﬁcant

changes in each cluster under a new condition.

Landeghem

et al. developed a method for inferring the best differential

network that contrasts two datasets, and new measures have

been devised to test whether global modular structure and

degree characteristics are different between two networks.

17–20

However, these methods are limited to examining pre-deﬁned

gene modules and network features, and fail to take full

advantage of the network structure. As such, they lack the ability

to discover new pathways and network modules that functionally

distinguish different phenotypes.

Other methods have been developed to discover de novo gene

modules that differ between conditions. The DiffCoEx algorithm

iteratively groups genes that are differentially co-expressed to ﬁnd

new modules.

21,22

Valcarcel et al. compared metabolite correlation

networks to discover groups of metabolites that changed their

correlation pattern between normal weight and obese mice.

These methods are based on ﬁrst computing the most differential

edges and then grouping them together, which increases the

uncertainty of each edge estimate and does not incorporate

functional edges that are present in both conditions,

14,24

thus

losing network context.

Another class of methods attempts to identify “active modules”,

which are groups of genes that are differentially expressed in a

particular disease or condition and also highly connected in a

reference network, such as the PPI network.

However, the “active

modules” framework only uses differential gene expression and so

focuses on the nodes rather than accounting for changes in the

strength of regulatory edges.

We present a new graph-based approach called ALtered Partitions Across Community Architectures (ALPACA) that compares two networks and identifies de novo the gene modules that

best distinguish the networks. ALPACA is based on modularity

maximization, a technique commonly used to ﬁnd communities in

a single graph. As applied previously, modularity is a measure of

the observed edge density of the communities as compared to

their expected density in a degree-matched random graph.

Although this technique is powerful, it has a “resolution limit”

because communities can only be identiﬁed if they are larger than

the typical cluster size in random graph conﬁgurations.

This lack

of resolution is especially disadvantageous when studying

transcriptional networks, which tend to have a dense and

hierarchical structure, and whose functional units only become

evident under different environmental conditions.

A framework

based on modularity maximization has been created to ﬁnd

common community structure among multiple networks,

but

the only way to detect differences is to apply modularity

maximization to each network separately, followed by brute-

force comparison of the two resulting community structures.

In ALPACA, we adapt the modularity framework to compare

condition-speciﬁc networks to each other rather than to a random

graph null model. We deﬁne a score called the “differential

modularity” that compares the density of modules in the

“perturbed” network to the expected density in a matched

“baseline” network, allowing us to contrast, for example, networks

from disease and healthy tissue samples and partition the nodes

into optimal differential modules, without relying on predeﬁned

gene sets or pathways. In contrast to methods that simply cluster

the most differential edges, ALPACA compares the full network

structures active in each condition and reduces the noise from

individual edges by estimating an aggregated null model. And

because the null model is based on the community structure of a

known reference network rather than on a random graph, the

“resolution limit” is substantially smaller, and ALPACA can detect

small disease modules otherwise hidden within larger regulatory

programs associated with normal cellular functions.

To demonstrate the utility of ALPACA, we show that it can

identify changes in the modular structure of simulated networks,

and that it exhibits higher resolution and robustness than other

network approaches. We then apply it to compare transcriptional

networks derived from non-angiogenic and angiogenic subtypes

of ovarian cancer, normal human ﬁbroblasts and ﬁbroblasts

expressing tumor virus oncogenes, and male and female breast

tissue from the Genotype-Tissue Expression (GTEx) project. In each

case, we ﬁnd that ALPACA identiﬁes modules enriched in

biological processes relevant to the phenotypes we are

comparing.

RESULTS

Modularity maximization and detecting community structure

Many methods for determining the community structure of a

network are based on maximizing the modularity:

$$Q = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\delta(C_i, C_j). \qquad (1)$$

Here, $A_{ij}$ indicates the adjacency matrix of the network, $m$ is the number of edges, $d_i$ is the degree of node $i$, and $C_i$ is the community assignment of node $i$. The modularity represents to what extent the proposed communities have more edges within them than expected in a randomly connected graph with the same degree properties; this null expectation is represented in the second term of the equation above. The modularity is optimized over the space of all possible partitions $\{C\}$, and the partition corresponding to the maximum modularity then determines the community structure of the network. An exhaustive search is not

possible for large networks, but many methods have been

developed to ﬁnd locally optimal community structure, including

ones based on edge betweenness, label propagation, and random

walks.

13,29,30

The Louvain algorithm is a particularly efﬁcient way

to ﬁnd high-quality local optima of the modularity function.
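As a small illustration of Eq. (1), the following sketch evaluates the modularity of a given partition on a toy graph. It is only meant to make the two terms of the sum concrete; the function name `modularity` and the six-node example graph are illustrative assumptions, and the code scores a supplied partition rather than searching for the optimal one.

```python
def modularity(adj, communities):
    """Modularity Q of Eq. (1) for an undirected, unweighted graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    communities: communities[i] is the community label of node i.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # the degree sum counts every edge twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # observed edge minus the degree-matched random expectation
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (assumption): two triangles joined by one bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_merged = modularity(adj, [0, 0, 0, 0, 0, 0])  # a single community
print(q_split, q_merged)
```

Placing each triangle in its own community scores higher than merging all nodes, because a single all-inclusive community has exactly as many internal edges as the null model expects.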

Community comparison and edge subtraction

Having arrived at a pair of inferred networks corresponding to

different phenotypic states, there are two straightforward ways to

compare the community structures based on the modularity

metric (Fig. 1). One method, which we will call “community

comparison”, consists of using modularity maximization to ﬁnd

the community structure for each network individually, and then

ﬁnding the nodes that alter their community membership

between the two networks. Another method, which we will call

“edge subtraction”, is to compute the differences in the edge

weights between the two networks, and then apply modularity

maximization to the resulting subtracted weights.


Both methods can detect large, dramatic changes in network

structure. However, there are important differences in these

methods. “Community comparison” is limited in its ability to

detect structural changes smaller than the average community

size in each individual network. In contrast, “edge subtraction”

acts on the difference of the edge weights, which reduces the

density of the network and increases the resolution, but this

method is also more strongly affected by noise in the individual

edges. Further, only positive edge weight differences can be used

to run modularity maximization in the subtracted network, so

edges that are lost are not appropriately accounted for;

incorporating both positive and negative edge weight differences

requires more complex techniques.

32,33
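The "community comparison" idea can be sketched in a few lines, assuming community labels for both networks have already been computed by some modularity-maximization routine. Because labels from two separate runs are arbitrary, this sketch first matches each baseline community to the perturbed community holding most of its members; that matching rule and the name `changed_nodes` are illustrative assumptions, not the procedure described in the paper's Methods.

```python
from collections import Counter


def changed_nodes(baseline_labels, perturbed_labels):
    """Return nodes that leave their (matched) community.

    Both arguments map node -> community label. Each baseline community is
    matched to the perturbed community that contains most of its members;
    nodes that end up elsewhere are reported as changed.
    """
    overlap = {}
    for node, b_label in baseline_labels.items():
        overlap.setdefault(b_label, Counter())[perturbed_labels[node]] += 1
    best_match = {b: counts.most_common(1)[0][0] for b, counts in overlap.items()}
    return sorted(
        node for node, b_label in baseline_labels.items()
        if perturbed_labels[node] != best_match[b_label]
    )


# Hypothetical labels: gene g3 moves from the first community to the second.
baseline = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1, "g6": 1}
perturbed = {"g1": "x", "g2": "x", "g3": "y", "g4": "y", "g5": "y", "g6": "y"}
moved = changed_nodes(baseline, perturbed)
print(moved)
```

A comparison of this kind can only report changes that are large enough to alter the partition of each network separately, which is the resolution issue raised above.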

ALPACA: a new method for detecting changes in community

structure

To overcome some of the limitations of the community

comparison and edge subtraction methods, we developed

ALPACA, a new algorithm based on modularity maximization.

The unique aspect of ALPACA is that, rather than comparing edge

distributions to a random null model, we compare edges of the

“perturbed” network to a null model based on the “baseline”

network to ﬁnd differential gene modules between the two

networks (Fig. 1). ALPACA optimizes a new quantity called

“differential modularity”, which we deﬁne as:

$$D = \frac{1}{m^P}\sum_{i,j} D_{ij}\,\delta(M_i, M_j) = \frac{1}{m^P}\sum_{i,j}\left(A^P_{ij} - N_{ij}\right)\delta(M_i, M_j). \qquad (2)$$

This score compares the number of edges in a module $M$ in the perturbed network (whose adjacency matrix is given by $A^P_{ij}$ and total edge weight is $m^P$) to the expected number of edges $N_{ij}$ based on the pre-computed community structure $\{C\}$ of the baseline network. Here, $N_{ij}$ is defined as:

$$N_{ij} = \frac{\sum_{b \in C_j} \tilde{A}^B_{ib}\;\sum_{a \in C_i} \tilde{A}^B_{aj}}{\sum_{a \in C_i,\, b \in C_j} \tilde{A}^B_{ab}}, \qquad (3)$$

where $C_i$ is the community assignment of node $i$ in the baseline network, and $\tilde{A}^B_{ab}$ is the normalized weight of the edge between node $a$ and node $b$ in the baseline network: $\tilde{A}^B_{ab} = A^B_{ab}\, m^P / m^B$, with $m^B$ the total edge weight of the baseline network. For the normalization, we have chosen to globally scale the edge weights of the baseline network so that the total matches $m^P$, the sum of the edge weights in the perturbed network. This allows a fair comparison between two networks that could be derived from two datasets of differing quality or sample size and may have different global sensitivity properties. To identify the modules $\{M\}$ that maximize the differential modularity, we use the following two-step procedure. First, we determine the community structure of the baseline network using established methods.⁹,³¹ Second, we compute the differential modularity matrix $D_{ij}$ and apply the Louvain optimization algorithm to iteratively aggregate the nodes into modules.

[Figure 1: schematic of baseline and perturbed networks analyzed by community comparison, edge subtraction, and ALPACA. Flowchart steps: compute community structure {C} of baseline network; compute differential modularity matrix D_ij for perturbed network relative to {C}; apply Louvain algorithm to D_ij and find optimal assignment of nodes to differential modules {M}; GO term enrichment on top-ranked genes in each module.]

Fig. 1 Methods to compare networks and find changes in modular structure. “Community comparison” identifies communities separately in each network and looks for nodes that change their community membership. “Edge subtraction” finds communities by subtracting the networks and finding communities in the resulting differential edges (red arrows). ALPACA looks for groups of genes that are more interconnected in the perturbed network than expected given the community structure of the baseline network. Flowchart shows the major steps in the implementation of ALPACA

Note that the equation above is presented in a form that applies

to weighted bipartite networks, as we will be applying it to

analyze transcription factor (TF)–gene interactions. It can be easily

adapted to analyze other types of networks. More details about

the implementation of all three methods—community compar-

ison, edge subtraction, and differential modularity—are presented

in the Materials and Methods section.
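To make the construction concrete, here is a minimal sketch of the differential modularity matrix for a small weighted TF-by-gene network. It assumes the null term takes the form N_ij = (weight from TF i into gene j's baseline community) × (weight into gene j from TF i's baseline community) / (total weight between those two communities), computed on baseline weights rescaled to the perturbed total. The function names, the zero-denominator convention, and the toy matrices are all illustrative assumptions, and the sketch only scores a given module assignment rather than running the Louvain optimization.

```python
def differential_modularity_matrix(a_pert, a_base, tf_comm, gene_comm):
    """D[i][j] = A_pert[i][j] - N[i][j] for a TF x gene weight matrix.

    tf_comm / gene_comm give the baseline community of each TF / gene.
    """
    n_tf, n_gene = len(a_base), len(a_base[0])
    m_pert = sum(map(sum, a_pert))
    m_base = sum(map(sum, a_base))
    # Rescale baseline weights so their total matches the perturbed total.
    a_tilde = [[w * m_pert / m_base for w in row] for row in a_base]

    d = [[0.0] * n_gene for _ in range(n_tf)]
    for i in range(n_tf):
        tfs = [a for a in range(n_tf) if tf_comm[a] == tf_comm[i]]
        for j in range(n_gene):
            genes = [b for b in range(n_gene) if gene_comm[b] == gene_comm[j]]
            row_part = sum(a_tilde[i][b] for b in genes)
            col_part = sum(a_tilde[a][j] for a in tfs)
            block = sum(a_tilde[a][b] for a in tfs for b in genes)
            # Assumption: an empty baseline block contributes no expectation.
            expected = row_part * col_part / block if block > 0 else 0.0
            d[i][j] = a_pert[i][j] - expected
    return d


def differential_modularity(d, tf_mod, gene_mod, m_pert):
    """Sum D[i][j] over TF-gene pairs assigned to the same module."""
    total = sum(
        d[i][j]
        for i in range(len(d))
        for j in range(len(d[0]))
        if tf_mod[i] == gene_mod[j]
    )
    return total / m_pert


# Toy example (assumption): two baseline communities, one new edge TF0-gene2.
a_base = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
a_pert = [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
d = differential_modularity_matrix(a_pert, a_base, [0, 1], [0, 0, 1, 1])
score = differential_modularity(d, [0, 1], [1, 1, 0, 1], m_pert=5.0)
print(d[0][2], score)
```

In this toy case the new TF0-gene2 edge receives the largest entry of the matrix, because the baseline community structure gives no expectation of weight between that pair of communities.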

Evaluating the performance of ALPACA on simulated networks

We reasoned that ALPACA would be more sensitive to small

changes in modular structure than methods based on standard

community detection, because the null model is computed using

detailed properties of the baseline network rather than relying on

random graphs. We also believed that ALPACA would be less

sensitive to noise in individual edge weights than edge

subtraction, because the null model is estimated by averaging

over communities in the baseline network. We set out to test

these properties in a setting that resembles real biological

networks as much as possible, but where we have control over

the changes in modularity.

To do this, we constructed a baseline network and then created

new modules through the “addition” of new edges, resulting in a

perturbed network. For the noiseless version of this simulation, we

inferred a regulatory network by integrating known human TF-

binding sites with gene expression data in normal human

ﬁbroblasts using the algorithm PANDA

(see Materials and

Methods for further details). After thresholding the edge weights

and applying CONDOR,

a method for community detection in

bipartite networks, we found that the baseline network had ﬁve

communities of varying sizes. Next, we simulated a set of

perturbed networks by choosing a random subset of TFs and

genes and adding new edges between them, thus artiﬁcially

creating a new module. The new module consisted of between 3

and 21 TFs, and ﬁve times as many genes as TFs.
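The "addition" simulation can be sketched as follows. This is a hedged illustration only: the function name `add_module`, the choice to connect every selected TF to every selected gene, and the edge weight of 1.0 are assumptions made for the example; the source states only that a random subset of TFs and five times as many genes were linked by new edges.

```python
import random


def add_module(baseline, n_tfs, genes_per_tf=5, weight=1.0, seed=0):
    """Copy a TF x gene weight matrix and add a synthetic module.

    A random set of n_tfs TFs is connected to a random set of
    n_tfs * genes_per_tf genes; the baseline matrix is left untouched.
    """
    rng = random.Random(seed)
    n_tf, n_gene = len(baseline), len(baseline[0])
    tfs = rng.sample(range(n_tf), n_tfs)
    genes = rng.sample(range(n_gene), n_tfs * genes_per_tf)
    perturbed = [row[:] for row in baseline]
    for i in tfs:
        for j in genes:
            perturbed[i][j] = weight  # assumed weight for every new edge
    return perturbed, sorted(tfs), sorted(genes)


# Empty 10 x 60 baseline (assumption) with the smallest module: 3 TFs, 15 genes.
baseline = [[0.0] * 60 for _ in range(10)]
perturbed, module_tfs, module_genes = add_module(baseline, n_tfs=3)
print(len(module_tfs) + len(module_genes))
```

With 3 TFs the module has 18 nodes, and with 21 TFs it has 126, which brackets the module sizes discussed in the simulations.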

To these simulated networks, we applied three differential

community detection methods—community comparison, edge

subtraction, and ALPACA—and ranked the nodes by their

contribution to the ﬁnal score for each method. We then used

Kolmogorov–Smirnov and Wilcoxon tests to evaluate whether the

“true” module ranked higher than expected by chance in each

ranked list. The edge subtraction method demonstrated superior

performance for recovering modules of all sizes (Fig. 2a); this is to

be expected, since the only new edges added to the networks

were within the new modules. Examining the results from the

other two methods, we observed that ALPACA is substantially

better than community comparison at detecting smaller modules

ranging down to a size of 50 nodes.
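The ranking evaluation can be illustrated with a one-sided rank-sum test that asks whether nodes of the true module score higher than background nodes. The sketch below uses a normal approximation without tie or continuity correction, written in plain Python so that it is self-contained; it is an assumption-laden stand-in for whatever statistical routine the authors used, and the scores in the example are made up.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sum_greater(module_scores, background_scores):
    """One-sided p-value that module scores exceed background scores.

    Normal approximation to the Wilcoxon rank-sum (Mann-Whitney U) statistic,
    without tie or continuity correction.
    """
    n1, n2 = len(module_scores), len(background_scores)
    ranks = average_ranks(list(module_scores) + list(background_scores))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# Made-up node scores: module nodes all outrank the background.
p_high = rank_sum_greater([10, 11, 12, 13, 14], list(range(10)))
p_low = rank_sum_greater([0, 1, 2, 3, 4], list(range(5, 15)))
p_same = rank_sum_greater([1, 2], [1, 2])
print(p_high, p_low, p_same)
```

A small p-value means the module's nodes sit near the top of the ranked list, which is the sense in which a method "recovers" the added module.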

We then introduced edge noise into the “addition” simulation

while retaining the modular structure of the underlying network.

To do this, we made another series of perturbed networks, where,

in addition to introducing the new module as described above, we

also randomly resampled the edges from the baseline network

while retaining the inter-community and intra-community edge

density. In this more realistic set of simulations, we found that

ALPACA outperformed the other methods on modules in the

range of 18–90 nodes (Fig. 2b).

To check that these results are independent of the particular optimization algorithm used, we repeated the analysis using the Louvain method instead of CONDOR for initial community detection in the community comparison and edge subtraction methods. The results were similar in both cases, and in particular, ALPACA still outperformed the other methods on modules in the range of 18–54 nodes (Supplementary Fig. 1). This indicates that the superior performance of ALPACA is not due to the optimization method used, but rather arises directly from the definition of the differential modularity.

Fig. 2 Performance of three methods on simulated networks with added module. Network at left visualizes the regulatory network derived from normal human fibroblasts, with purple, yellow, orange, pink, and blue denoting the pre-existing community structure, and red nodes depicting the synthetically added module. Bar graphs show performance of each method (ALPACA, edge subtraction, or community comparison) on network simulations without (a) or with (b) resampling of edges among the pre-existing communities. P-values were computed using a one-sided Wilcoxon test. Bar graphs show mean of −log10 P over 20 network simulations, and error bars depict the corresponding standard deviation. Boxplots represent the same data as the bar plots. Boxplot elements are defined as follows: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. Note that the color of the boxplots for the edge subtraction method in a is not visible because the distribution is very narrow

While the edge subtraction method works well to detect

“added” modules under low noise conditions, it becomes

problematic if edges are deleted or if their weights decrease in

the perturbed state relative to the control, because most network

clustering methods are only formulated for positive edge weights.

One might suggest transformation of edge weights, but any

simple transformation of negative edge weights to make them

positive (for example, by exponentiation or a linear shift) would

bias the results. Algorithms that directly incorporate negative

edge weights are complex and involve multiple steps and

assumptions.

32,33

In contrast, ALPACA’s differential modularity matrix $D_{ij}$ contains both negative and positive values, corresponding to areas of decreasing and increasing edge density relative to the baseline network and its community structure. By optimizing over the sum of $D_{ij}$, ALPACA incorporates positive and negative

changes in edge density in a symmetric fashion.

As a simple demonstration of ALPACA’s ability to detect

community structure changes with negative weights, we created

“subtracted” simulations in which selected edges in a baseline

network are reduced in weight to produce a substantially different

perturbed network structure (Fig. 3 and Supplementary Fig. 2; see

Materials and Methods for more details). In Fig. 3, for example, the

network consists of two dense node groups, A and B, which are

more strongly connected together in the baseline condition (edge

weight 0.8) than in the perturbed condition (edge weight 0.2).

Therefore, the perturbation causes groups A and B to separate and

perform distinct functions; intuitively, this means groups A and B

characterize the change in modular structure between the two

networks. Because the only change in edge weights is the

decrease in edges between A and B, the edge subtraction method

results in a network with negative edge weights.

If instead we reverse the process and subtract the perturbed

network from the baseline network, the resulting positive edge

weight network produces two modules, one consisting of TFs in

group A linked with genes in group B, the other consisting of TFs

in group B linked with genes in group A. This does not match the

intuitive result we are looking for. The community comparison

method detects no change because both the baseline and

perturbed networks are composed of the same two node

communities. However, ALPACA correctly identifies groups A

and B as the differential modules characterizing this transition.

An example with three node groups is shown in Supplementary

Fig. 2. Again, we find that ALPACA identifies the key change in

modular structure and edge subtraction cannot. Although these

examples are simple, such areas of decreased edge density will be

locally embedded in any realistic biological network and will

strongly influence the identification of neighboring modules.
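The sign problem with edge subtraction in the two-group example can be reproduced with a small sketch. The cross-group weights 0.8 and 0.2 come from the text; the group sizes and the within-group weight of 1.0 are assumptions made only for illustration, and the helper names are hypothetical.

```python
# Toy bipartite network loosely modeled on the two-group scenario.
# Rows are transcription factors (TFs), columns are target genes.
GROUP_OF_TF = ["A", "A", "B", "B"]                 # assumed group sizes
GROUP_OF_GENE = ["A", "A", "A", "B", "B", "B"]     # assumed group sizes
WITHIN_WEIGHT = 1.0      # assumed; unchanged by the perturbation
CROSS_BASELINE = 0.8     # A-B edge weight in the baseline network
CROSS_PERTURBED = 0.2    # A-B edge weight in the perturbed network


def build_network(cross_weight):
    """Return a TF-by-gene weight matrix with the given cross-group weight."""
    return [
        [WITHIN_WEIGHT if tf_group == gene_group else cross_weight
         for gene_group in GROUP_OF_GENE]
        for tf_group in GROUP_OF_TF
    ]


def subtract(x, y):
    """Element-wise x - y, rounded to suppress floating-point noise."""
    return [
        [round(a - b, 10) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(x, y)
    ]


baseline = build_network(CROSS_BASELINE)
perturbed = build_network(CROSS_PERTURBED)

# Forward edge subtraction (perturbed minus baseline) yields negative weights.
forward = subtract(perturbed, baseline)
has_negative_weights = any(w < 0 for row in forward for w in row)

# Reversed subtraction (baseline minus perturbed) is non-negative, but the only
# surviving edges connect TFs of one group to genes of the other group.
reverse = subtract(baseline, perturbed)
surviving_pairs = sorted({
    (GROUP_OF_TF[i], GROUP_OF_GENE[j])
    for i, row in enumerate(reverse)
    for j, w in enumerate(row)
    if w > 0
})

print(has_negative_weights)
print(surviving_pairs)
```

Because the reversed difference keeps only A-to-B and B-to-A edges, any clustering of it can only group TFs from one group with genes from the other, which is the mismatch described above.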

Angiogenic vs. non-angiogenic ovarian cancer tumors

Ovarian cancer is the second most common cause of cancer death

among women in the developed world. Available treatment

options for ovarian cancer, such as platinum-based therapies,

often lead to chemoresistance and recurrence. Ovarian cancer

tumors can be stratified by gene expression profile, tissue of

origin, or other characteristics, in order to better understand

heterogeneity and predict patient-specific therapeutic strategies.

We previously found that a gene signature associated with

angiogenesis is able to classify ovarian cancer patients into a poor-

prognosis subtype.

We classified 510 ovarian cancer patients from The Cancer

Genome Atlas into 188 angiogenic and 322 non-angiogenic

tumors and used PANDA to infer separate gene regulatory

networks for the two subtypes, as previously described.

We then applied a variety of methods to look for changes in community

structure associated with the angiogenic tumors, ranked the nodes

by their contribution to the total score for each method (see

Materials and Methods), and evaluated the core genes in each set

for functional enrichment. In order to evaluate the unique

contributions of ALPACA, we first applied standard community

detection techniques to identify communities in each

subtype-specific network, using both the Louvain method and CONDOR,

and we looked for GO terms that were statistically enriched in the

angiogenic network but not in the non-angiogenic network. Next,

we applied edge subtraction, community comparison, and ALPACA

to directly identify differential modules associated with angiogenic

tumors. Finally, we also computed the differentially expressed

genes between the non-angiogenic and angiogenic cancer

subtypes. The GO term enrichment with $P_{\mathrm{adj}} < 0.05$ for each

method is presented in full in Supplementary Table 1.
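As a rough illustration of how an adjusted enrichment threshold of this kind can be applied, the sketch below runs a one-sided hypergeometric test per term and then a Benjamini-Hochberg correction. Both choices are assumptions, since this passage does not state which test or adjustment was used; the term names and gene counts are invented toy values.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the term."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, x) * comb(population - annotated, drawn - x)
        for x in range(k, upper + 1)
    )
    return favorable / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs: 1000 genes in the universe, 50 core genes in a module.
UNIVERSE_SIZE = 1000
CORE_SIZE = 50
# term name -> (genes annotated in universe, annotated genes among core genes)
toy_terms = {
    "term_a": (40, 10),
    "term_b": (100, 5),
    "term_c": (20, 1),
}

names = list(toy_terms)
raw_p = [
    hypergeom_upper_tail(toy_terms[t][1], UNIVERSE_SIZE, toy_terms[t][0], CORE_SIZE)
    for t in names
]
adj_p = benjamini_hochberg(raw_p)
significant = [t for t, p in zip(names, adj_p) if p < 0.05]
print(significant)
```

Only the term whose overlap far exceeds its expected count passes the adjusted threshold in this toy input; the other two terms overlap the core set about as often as chance would predict.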

[Figure 3 panels: Baseline and Perturbed networks (left); results of Edge subtraction (rev.), Community comparison, and ALPACA (right). Legend: Transcription factor, Gene; Group A, Group B.]

Fig. 3 Performance of three methods on perturbations that decrease edge density. Left-hand side shows a network transition involving a decrease in edge weights between nodes in groups A and B. All other edges remain the same. Right-hand side shows the results of three methods when comparing these two networks. Each method identified up to two differential modules, which are distinguished by their light blue and light pink colors in each case. Note that the “edge subtraction” method needs to be applied in the reverse manner, comparing the baseline network against the perturbed network, in order to have positive differential edge weights

