Posted Content

Phenotype-Driven Transitions In Regulatory Network Structure

25 May 2017-bioRxiv (Cold Spring Harbor Labs Journals)-pp 142281

TL;DR: In comparing the modular structure of networks in female and male breast tissue, it is found that female breast has distinct modules enriched for genes involved in estrogen receptor and ERK signaling, indicating that not only does phenotypic change correlate with network structural changes, but also that ALPACA can identify such modules in complex networks.

Abstract: Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead arise from multiple factors that together functionally perturb the underlying molecular network. Biological networks are known to be highly modular and contain dense “communities” of genes that carry out cellular processes, but these structures change between tissues, during development, and in disease. While many methods exist for inferring networks, we lack robust methods for quantifying changes in network structure. Here, we describe ALPACA (ALtered Partitions Across Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than currently available network comparison methods. We used ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. In each case, ALPACA identified modules enriched for processes relevant to the phenotype. For example, modules specific to angiogenic ovarian tumors were enriched for genes associated with blood vessel development, interferon signaling, and flavonoid biosynthesis. In comparing the modular structure of networks in female and male breast tissue, we found that female breast has distinct modules enriched for genes involved in estrogen receptor and ERK signaling. The functional relevance of these new modules indicates that not only does phenotypic change correlate with network structural changes, but also that ALPACA can identify such modules in complex networks.

Topics: Biological network (55%)

Summary (2 min read)

INTRODUCTION

  • Despite the increasing power and depth of sequencing studies, identifying the causal mutations and single-nucleotide polymorphisms (SNPs) that are responsible for determining heritable traits and disease susceptibility remains challenging.
  • Biological networks are known to have modular structure and contain closely interacting groups of nodes, or “communities”, that work together to carry out cellular functions.
  • One way to address these issues and find more robust differences between networks is to identify changes in groups of nodes, rather than in individual edges.
  • However, these methods are limited to examining pre-defined gene modules and network features, and fail to take full advantage of the network structure.

RESULTS

  • The modularity represents to what extent the proposed communities have more edges within them than expected in a randomly connected graph with the same degree properties; this null expectation is represented in the second term of the equation above.
  • Community comparison and edge subtraction: Having arrived at a pair of inferred networks corresponding to different phenotypic states, there are two straightforward ways to compare the community structures based on the modularity metric (Fig. 1).
  • The authors previously found that a gene signature associated with angiogenesis is able to classify ovarian cancer patients into a poor-prognosis subtype [35].
  • The authors also computed the correlation in expression among the genes in each ALPACA module.
  • Finally, the authors ranked the genes by their contribution to the differential modularity and used Gene Set Enrichment Analysis (GSEA) to evaluate enrichment for GO terms across the whole network (see Materials and Methods).

DISCUSSION

  • Biological networks have complex modular and hierarchical topologies that allow organisms to carry out the functions necessary for survival.
  • ALPACA differs from other community detection methods in that it compares the structure of networks to each other rather than to a random background network and is thus better able to detect subtle differences in network modular structure.
  • The differential modularity also incorporates increased and decreased edge weights across the entire network into a single, simple framework for module detection.
  • This is because network-level analysis, and ALPACA in particular, helps organize both strongly and weakly differentially expressed genes into new modules that are under common regulatory control, identifying signaling pathways that could not have been distinguished if genes were ranked purely by differential expression.
  • Genes annotated by the shown GO terms are labeled in large font. [...] “edgetic” perturbations, in order to discover functional changes in protein complexes and signaling associated with disease.

METHODS

  • ALPACA algorithm: ALPACA comprises the following two steps. Step 1: The input network consists of edges between regulators and target genes.
  • The authors first used either CONDOR or the Louvain method to find the community structure of the baseline and perturbed networks, in each case keeping only edges that had positive z-scores.
  • The authors evaluated the results of each method on the simulated networks by comparing the ranks of true positives (the target genes in the added module) against a background consisting of target genes not in the added module.
  • For the baseline network, the edges between groups A and B were set to weight 0.8 and for the perturbed network, the edges between groups A and B were set to weight 0.2.
  • Differential expression analysis was carried out using the R package limma, and p-values were adjusted for multiple testing using the Benjamini–Hochberg method [53].

AUTHOR CONTRIBUTIONS

  • M.P. conceived of the project, performed analysis, and wrote the paper.
  • J.Q. helped refine the analysis and wrote the paper.
  • The authors declare no competing financial interests.
  • Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations, also known as Publisher's note.

ARTICLE
OPEN
Detecting phenotype-driven transitions in regulatory network structure
Megha Padi (1) and John Quackenbush (2,3)
Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead
arise from functional changes in the underlying molecular network. Biological networks are known to be highly modular and
contain dense communities of genes that carry out cellular processes, but these structures change between tissues, during
development, and in disease. While many methods exist for inferring networks and analyzing their topologies separately, there is a
lack of robust methods for quantifying differences in network structure. Here, we describe ALPACA (ALtered Partitions Across
Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to
identify condition-specific modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than
currently available network comparison methods. As an application, we use ALPACA to compare transcriptional networks in three
contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes,
and sexual dimorphism in human breast tissue. In each case, ALPACA identifies modules enriched for processes relevant to the
phenotype. For example, modules specific to angiogenic ovarian tumors are enriched for genes associated with blood vessel
development, and modules found in female breast tissue are enriched for genes involved in estrogen receptor and ERK signaling.
The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex
networks, but also that these changes may be relevant for characterizing biological phenotypes.
npj Systems Biology and Applications (2018) 4:16; doi:10.1038/s41540-018-0052-5
INTRODUCTION
We tend to think of phenotypes as being characterized by differentially expressed genes or mutations in
particular genes. However, the individual genes that show the greatest changes in expression in a
phenotype do not tend to be drivers of that phenotype [1,2]. Despite the increasing power and depth of
sequencing studies, identifying the causal mutations and single-nucleotide polymorphisms (SNPs) that are
responsible for determining heritable traits and disease susceptibility remains challenging. Indeed, many
studies have found thousands of genetic variants of small effect size contribute to common traits [3–5].
It has become apparent that complex regulatory interactions between multiple genes and variants can
contribute to defining the state of the cell. Modeling such phenotypes requires that we have a clearer
picture of how genes and proteins work together to perform normal cellular functions, and how remodeling
the interactions between genes can cause changes in phenotype including disease.
In this context, it is useful to make a subtle shift and think of a phenotype as being defined by a
network of interacting genes and gene products. It has been shown that analyzing the mathematical
properties of such networks can provide important biological insight into phenotypic properties. For
example, high-degree hubs in protein–protein interaction (PPI) networks are enriched for genes essential
to growth [6]. Biological networks are known to have modular structure and contain closely interacting
groups of nodes, or “communities”, that work together to carry out cellular functions [7–9]. There are
many analytical and experimental methods for inferring network models associated with different
phenotypic states, and for computing topological properties like centrality and community structure
[10–13]. However, the most significant questions we can ask of biological networks (how networks differ
from each other, and how these differences in network structure drive functional changes) remain largely
unanswered. A significant challenge in this area is the lack of computational approaches for finding
meaningful changes in the structure of large complex networks.
Previous work on comparative analysis of biological networks has focused on the so-called “differential
network”, the set of edges that are altered relative to a reference network [14]. While the advantage of
this approach is its simplicity, there are several issues that arise in such an edge-based analysis.
First, biological network inference has a relatively high rate of false negatives due to noise in both
the experimental data that are used and in the network inference methods themselves. Consequently, it can
be difficult to determine whether the appearance or disappearance of a single edge is real. The
uncertainty in the estimate of the difference between two edge weights is the sum of the uncertainties in
each individual edge, which inflates noise in the final differential network. Second, the perturbed
network will in general contain both positive and negative changes in edge weight relative to the
reference network, and it is challenging to analyze and interpret a differential network with mixed
signs. If we only consider the new edges associated with a phenotype, we would miss the functional
effects of decreases in edge activity. Third, by focusing only on the altered edges and discarding common
edges, the differential interactions are taken out of their functional context, making it difficult to
connect them to global cellular changes. For example, adding or deleting ten scattered edges in a network
may have very different consequences on the phenotype than would the same number of changes concentrated
in a local functional neighborhood of the network.

Received: 14 August 2017 Revised: 29 March 2018 Accepted: 2 April 2018
Published online: 19 April 2018
(1) Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA;
(2) Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA and
(3) Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
Correspondence: Megha Padi (mpadi@email.arizona.edu)
www.nature.com/npjsba
Published in partnership with the Systems Biology Institute

One way to address these issues and find more robust differences between networks is to identify changes
in groups of nodes, rather than in individual edges. Computational methods that have been developed to do
this fall into several categories. First, there are methods that evaluate differences in pre-specified
network features, like user-defined gene sets, small regulatory motifs, or global topological
characteristics. For example, Gamberdella et al. evaluated the statistical significance of differences in
co-expression of a user-defined gene set between two conditions [15]. Similarly, the coXpress method
defines clusters using co-expression in the reference condition, and tests for significant changes in
each cluster under a new condition [16]. Landeghem et al. developed a method for inferring the best
differential network that contrasts two datasets, and new measures have been devised to test whether
global modular structure and degree characteristics are different between two networks [17–20]. However,
these methods are limited to examining pre-defined gene modules and network features, and fail to take
full advantage of the network structure. As such, they lack the ability to discover new pathways and
network modules that functionally distinguish different phenotypes.
Other methods have been developed to discover de novo gene modules that differ between conditions. The
DiffCoEx algorithm iteratively groups genes that are differentially co-expressed to find new modules
[21,22]. Valcarcel et al. compared metabolite correlation networks to discover groups of metabolites that
changed their correlation pattern between normal weight and obese mice [23]. These methods are based on
first computing the most differential edges and then grouping them together, which increases the
uncertainty of each edge estimate and does not incorporate functional edges that are present in both
conditions [14,24], thus losing network context.
Another class of methods attempts to identify “active modules”, which are groups of genes that are
differentially expressed in a particular disease or condition and also highly connected in a reference
network, such as the PPI network [25]. However, the active modules framework only uses differential gene
expression and so focuses on the nodes rather than accounting for changes in the strength of regulatory
edges.
We present a new graph-based approach called ALtered Partitions Across Community Architectures (ALPACA)
that compares two networks and identifies de novo the gene modules that best distinguish the networks.
ALPACA is based on modularity maximization, a technique commonly used to find communities in a single
graph. As applied previously, modularity is a measure of the observed edge density of the communities as
compared to their expected density in a degree-matched random graph. Although this technique is powerful,
it has a resolution limit because communities can only be identified if they are larger than the typical
cluster size in random graph configurations [26]. This lack of resolution is especially disadvantageous
when studying transcriptional networks, which tend to have a dense and hierarchical structure, and whose
functional units only become evident under different environmental conditions [27]. A framework based on
modularity maximization has been created to find common community structure among multiple networks [28],
but the only way to detect differences is to apply modularity maximization to each network separately,
followed by brute-force comparison of the two resulting community structures.
In ALPACA, we adapt the modularity framework to compare condition-specific networks to each other rather
than to a random graph null model. We define a score called the “differential modularity” that compares
the density of modules in the perturbed network to the expected density in a matched baseline network,
allowing us to contrast, for example, networks from disease and healthy tissue samples and partition the
nodes into optimal differential modules, without relying on predefined gene sets or pathways. In contrast
to methods that simply cluster the most differential edges, ALPACA compares the full network structures
active in each condition and reduces the noise from individual edges by estimating an aggregated null
model. And because the null model is based on the community structure of a known reference network rather
than on a random graph, the resolution limit is substantially smaller, and ALPACA can detect small
disease modules otherwise hidden within larger regulatory programs associated with normal cellular
functions.
To demonstrate the utility of ALPACA, we show that it can identify changes in the modular structure of
simulated networks, and that it exhibits higher resolution and robustness than other network approaches.
We then apply it to compare transcriptional networks derived from non-angiogenic and angiogenic subtypes
of ovarian cancer, normal human fibroblasts and fibroblasts expressing tumor virus oncogenes, and male
and female breast tissue from the Genotype-Tissue Expression (GTEx) project. In each case, we find that
ALPACA identifies modules enriched in biological processes relevant to the phenotypes we are comparing.
RESULTS
Modularity maximization and detecting community structure
Many methods for determining the community structure of a network are based on maximizing the
modularity [13]:

$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \delta(C_i, C_j). \qquad (1)$$

Here, $A_{ij}$ indicates the adjacency matrix of the network, $m$ is the number of edges, $d_i$ is the
degree of node $i$, and $C_i$ is the community assignment of node $i$. The modularity represents to what
extent the proposed communities have more edges within them than expected in a randomly connected graph
with the same degree properties; this null expectation is represented in the second term of the equation
above. The modularity is optimized over the space of all possible partitions $\{C\}$ and the value of
$C_i$ corresponding to the maximum modularity then determines the community structure of the network. An
exhaustive search is not possible for large networks, but many methods have been developed to find
locally optimal community structure, including ones based on edge betweenness, label propagation, and
random walks [13,29,30]. The Louvain algorithm is a particularly efficient way to find high-quality local
optima of the modularity function [31].
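As a minimal sketch of Eq. 1, assuming an undirected, unweighted graph stored as a symmetric 0/1 adjacency matrix, the modularity of a given partition can be evaluated directly. The function name and the toy graph (two triangles joined by one edge) are illustrative assumptions, not part of the original analysis.

```python
def modularity(adj, communities):
    """Modularity Q (Eq. 1) of a partition of an undirected, unweighted graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    communities: community label of each node.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # every edge contributes to two degrees, so this is 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(C_i, C_j)
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Splitting the graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0, so the partition that follows the dense groups is preferred.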
Community comparison and edge subtraction
Having arrived at a pair of inferred networks corresponding to different phenotypic states, there are two
straightforward ways to compare the community structures based on the modularity metric (Fig. 1). One
method, which we will call “community comparison”, consists of using modularity maximization to find the
community structure for each network individually, and then finding the nodes that alter their community
membership between the two networks. Another method, which we will call “edge subtraction”, is to compute
the differences in the edge weights between the two networks, and then apply modularity maximization to
the resulting subtracted weights.
Both methods can detect large, dramatic changes in network structure. However, there are important
differences in these methods. Community comparison is limited in its ability to detect structural changes
smaller than the average community size in each individual network. In contrast, edge subtraction acts on
the difference of the edge weights, which reduces the density of the network and increases the
resolution, but this method is also more strongly affected by noise in the individual edges. Further,
only positive edge weight differences can be used to run modularity maximization in the subtracted
network, so edges that are lost are not appropriately accounted for; incorporating both positive and
negative edge weight differences requires more complex techniques [32,33].
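To make the last limitation concrete, the following minimal sketch subtracts a baseline weight matrix from a perturbed one and keeps only the positive differences, as edge subtraction requires before modularity maximization. The function name and the toy weights are illustrative assumptions; this is not the implementation used in the paper.

```python
def edge_subtraction(baseline, perturbed):
    """Return (positive_part, discarded_loss) of perturbed - baseline.

    baseline, perturbed: equal-shaped lists of lists of edge weights
    (rows = TFs, columns = genes).
    """
    positive = []
    discarded = 0.0
    for base_row, pert_row in zip(baseline, perturbed):
        out_row = []
        for b, p in zip(base_row, pert_row):
            diff = p - b
            if diff > 0:
                out_row.append(diff)   # gained weight is kept
            else:
                out_row.append(0.0)    # lost weight is zeroed out...
                discarded += -diff     # ...and tallied to show what is ignored
        positive.append(out_row)
    return positive, discarded


# Hypothetical 2x2 example: two edges weaken (0.8 -> 0.2), one strengthens (1.0 -> 1.5).
baseline = [[1.0, 0.8], [0.8, 1.0]]
perturbed = [[1.0, 0.2], [0.2, 1.5]]
positive, discarded = edge_subtraction(baseline, perturbed)
print(positive, discarded)
```

In this toy case only the single strengthened edge survives in the subtracted network, while the weight lost on the two weakened edges never reaches the clustering step.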
ALPACA: a new method for detecting changes in community structure
To overcome some of the limitations of the community comparison and edge subtraction methods, we
developed ALPACA, a new algorithm based on modularity maximization. The unique aspect of ALPACA is that,
rather than comparing edge distributions to a random null model, we compare edges of the perturbed
network to a null model based on the baseline network to find differential gene modules between the two
networks (Fig. 1). ALPACA optimizes a new quantity called “differential modularity”, which we define as:

$$D = \frac{1}{m^P} \sum_{i,j} D_{ij}\, \delta(M_i, M_j) = \frac{1}{m^P} \sum_{i,j} \left( A^P_{ij} - N_{ij} \right) \delta(M_i, M_j). \qquad (2)$$

This score compares the number of edges in a module $M$ in the perturbed network (whose adjacency matrix
is given by $A^P_{ij}$ and total edge weight is $m^P$) to the expected number of edges $N_{ij}$ based on
the pre-computed community structure $\{C\}$ of the baseline network. Here, $N_{ij}$ is defined as:

$$N_{ij} = \frac{\sum_{b \in C_j} \tilde{w}_{ib} \, \sum_{a \in C_i} \tilde{w}_{aj}}{\sum_{a \in C_i,\, b \in C_j} \tilde{w}_{ab}}, \qquad (3)$$

where $C_i$ is the community assignment of node $i$ in the baseline network, and $\tilde{w}_{ab}$ is the
normalized weight of the edge between node $a$ and node $b$ in the baseline network:
$\tilde{w}_{ab} = \frac{m^P}{m} w_{ab}$. For the normalization, we have chosen to globally scale the edge
weights of the baseline network so that the total matches $m^P$, the sum of the edge weights in the
perturbed network. This allows a fair comparison between two networks that could be derived from two
datasets of differing quality or sample size and may have different global sensitivity properties. To
identify the modules $\{M\}$ that maximize the differential modularity, we use the following two-step
procedure. First, we determine the community structure of the baseline network using established methods
[9,31]. Second, we compute the differential modularity matrix $D_{ij}$ and apply the Louvain optimization
algorithm to iteratively aggregate the nodes into modules [31].

[Figure 1: schematic of baseline and perturbed TF–gene networks, with panels labeled Community comparison, Edge subtraction, and ALPACA, and a flowchart.]
Fig. 1 Methods to compare networks and find changes in modular structure. Community comparison identifies
communities separately in each network and looks for nodes that change their community membership. Edge
subtraction finds communities by subtracting the networks and finding communities in the resulting
differential edges (red arrows). ALPACA looks for groups of genes that are more interconnected in the
perturbed network than expected given the community structure of the baseline network. Flowchart shows
the major steps in the implementation of ALPACA: (1) compute community structure $\{C\}$ of baseline
network; (2) compute differential modularity matrix $D_{ij}$ for perturbed network relative to $\{C\}$;
(3) apply Louvain algorithm to $D_{ij}$ and find optimal assignment of nodes to differential modules
$\{M\}$; (4) GO term enrichment on top-ranked genes in each module.

Note that the equation above is presented in a form that applies to weighted bipartite networks, as we
will be applying it to analyze transcription factor (TF)–gene interactions. It can be easily adapted to
analyze other types of networks. More details about the implementation of all three methods (community
comparison, edge subtraction, and differential modularity) are presented in the Materials and Methods
section.
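The computation of the differential modularity matrix in Eqs. 2 and 3 can be sketched as follows for a small weighted bipartite network. This is an illustration under stated assumptions rather than the published implementation: the function names and toy matrices are hypothetical, $m$ is taken to be the total edge weight of the baseline network, community pairs with no baseline weight are given a null expectation of zero, and the Louvain optimization step is omitted (two candidate module assignments are simply scored and compared).

```python
def differential_modularity_matrix(baseline, perturbed, tf_comm, gene_comm):
    """Return (D, m_pert), with D[i][j] = A^P_ij - N_ij as in Eqs. 2 and 3.

    baseline, perturbed: TF-by-gene weight matrices (lists of lists).
    tf_comm, gene_comm: baseline community label of each TF and each gene.
    """
    n_tf, n_gene = len(baseline), len(baseline[0])
    m_base = sum(map(sum, baseline))
    m_pert = sum(map(sum, perturbed))
    scale = m_pert / m_base  # assumed meaning of m^P / m in the normalization
    w = [[scale * x for x in row] for row in baseline]

    tf_to_comm = [dict() for _ in range(n_tf)]      # sum over b in C_j of w_ib
    comm_to_gene = [dict() for _ in range(n_gene)]  # sum over a in C_i of w_aj
    block = {}                                      # sum over a in C_i, b in C_j of w_ab
    for i in range(n_tf):
        for j in range(n_gene):
            ci, cj = tf_comm[i], gene_comm[j]
            tf_to_comm[i][cj] = tf_to_comm[i].get(cj, 0.0) + w[i][j]
            comm_to_gene[j][ci] = comm_to_gene[j].get(ci, 0.0) + w[i][j]
            block[(ci, cj)] = block.get((ci, cj), 0.0) + w[i][j]

    d = []
    for i in range(n_tf):
        row = []
        for j in range(n_gene):
            ci, cj = tf_comm[i], gene_comm[j]
            total = block[(ci, cj)]
            # Assumption: community pairs with no baseline weight get a null of 0.
            null = tf_to_comm[i][cj] * comm_to_gene[j][ci] / total if total > 0 else 0.0
            row.append(perturbed[i][j] - null)
        d.append(row)
    return d, m_pert


def differential_modularity(d, m_pert, tf_module, gene_module):
    """Score D (Eq. 2) of a candidate assignment of TFs and genes to modules."""
    total = sum(d[i][j]
                for i in range(len(tf_module))
                for j in range(len(gene_module))
                if tf_module[i] == gene_module[j])
    return total / m_pert


# Hypothetical toy data: two baseline communities, plus two new cross-community edges.
baseline = [[1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 1.0, 1.0]]
perturbed = [row[:] for row in baseline]
perturbed[1][2] = 1.0
perturbed[2][1] = 1.0
tf_comm = gene_comm = [0, 0, 1, 1]

d, m_pert = differential_modularity_matrix(baseline, perturbed, tf_comm, gene_comm)
new_module = ["Y", "X", "X", "Y"]  # module X groups the nodes touched by the new edges
score_new = differential_modularity(d, m_pert, new_module, new_module)
score_old = differential_modularity(d, m_pert, tf_comm, gene_comm)
print(score_new, score_old)
```

In this toy case the assignment that groups the nodes around the added edges scores 0.1, while reusing the baseline communities as modules scores −0.2. The entries of D sum to zero overall because the baseline weights are rescaled to the perturbed total.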
Evaluating the performance of ALPACA on simulated networks
We reasoned that ALPACA would be more sensitive to small
changes in modular structure than methods based on standard
community detection, because the null model is computed using
detailed properties of the baseline network rather than relying on
random graphs. We also believed that ALPACA would be less
sensitive to noise in individual edge weights than edge
subtraction, because the null model is estimated by averaging
over communities in the baseline network. We set out to test
these properties in a setting that resembles real biological
networks as much as possible, but where we have control over
the changes in modularity.
To do this, we constructed a baseline network and then created new modules through the addition of new
edges, resulting in a perturbed network. For the noiseless version of this simulation, we inferred a
regulatory network by integrating known human TF-binding sites with gene expression data in normal human
fibroblasts using the algorithm PANDA [34] (see Materials and Methods for further details). After
thresholding the edge weights and applying CONDOR [9], a method for community detection in bipartite
networks, we found that the baseline network had five communities of varying sizes. Next, we simulated a
set of perturbed networks by choosing a random subset of TFs and genes and adding new edges between them,
thus artificially creating a new module. The new module consisted of between 3 and 21 TFs, and five times
as many genes as TFs.
To these simulated networks, we applied three differential community detection methods (community
comparison, edge subtraction, and ALPACA) and ranked the nodes by their contribution to the final score
for each method. We then used Kolmogorov–Smirnov and Wilcoxon tests to evaluate whether the true module
ranked higher than expected by chance in each ranked list. The edge subtraction method demonstrated
superior performance for recovering modules of all sizes (Fig. 2a); this is to be expected, since the
only new edges added to the networks were within the new modules. Examining the results from the other
two methods, we observed that ALPACA is substantially better than community comparison at detecting
smaller modules ranging down to a size of 50 nodes.
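The rank-based evaluation described above can be sketched as a one-sided Wilcoxon rank-sum (Mann–Whitney) test that asks whether nodes in the true module score higher than background nodes. The version below is a pure-Python illustration using a normal approximation without tie or continuity correction; the function name and toy scores are assumptions, and in practice a statistics library would normally be used.

```python
import math


def rank_sum_p_value(scores_true, scores_background):
    """One-sided p-value that scores_true tend to exceed scores_background.

    Uses the Mann-Whitney U statistic with a normal approximation
    (no tie or continuity correction), so it is only a rough sketch.
    """
    pooled = sorted([(s, 1) for s in scores_true] +
                    [(s, 0) for s in scores_background])
    rank_sum_true = 0.0
    k = 0
    while k < len(pooled):
        end = k
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[k][0]:
            end += 1
        avg_rank = (k + end) / 2 + 1  # average 1-based rank of this tie group
        rank_sum_true += avg_rank * sum(flag for _, flag in pooled[k:end + 1])
        k = end + 1

    n1, n2 = len(scores_true), len(scores_background)
    u = rank_sum_true - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# Hypothetical node scores: module members versus background genes.
p_enriched = rank_sum_p_value([10, 11, 12], [1, 2, 3, 4, 5, 6, 7, 8, 9])
p_depleted = rank_sum_p_value([1, 2, 3], [4, 5, 6, 7, 8, 9, 10, 11, 12])
p_neutral = rank_sum_p_value([2], [1, 3])
print(p_enriched, p_depleted, p_neutral)
```

A small p-value indicates that the module members sit near the top of the ranked list, which is the sense in which a method is said to recover the added module.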
We then introduced edge noise into the addition simulation
while retaining the modular structure of the underlying network.
To do this, we made another series of perturbed networks, where,
in addition to introducing the new module as described above, we
also randomly resampled the edges from the baseline network
while retaining the inter-community and intra-community edge
density. In this more realistic set of simulations, we found that
ALPACA outperformed the other methods on modules in the
range of 18–90 nodes (Fig. 2b).
To check that these results are independent of the particular optimization algorithm used, we repeated
the analysis using the Louvain method instead of CONDOR for initial community detection in the community
comparison and edge subtraction methods. The results were similar in both cases, and in particular,
ALPACA still outperformed the other methods on modules in the range of 18–54 nodes (Supplementary Fig. 1).
This indicates that the superior performance of ALPACA is not due to the optimization method used, but
rather arises directly from the definition of the differential modularity.

Fig. 2 Performance of three methods on simulated networks with added module. Network at left visualizes
the regulatory network derived from normal human fibroblasts, with purple, yellow, orange, pink, and blue
denoting the pre-existing community structure, and red nodes depicting the synthetically added module.
Bar graphs show performance of each method (ALPACA, edge subtraction, or community comparison) on network
simulations without (a) or with (b) resampling of edges among the pre-existing communities. P-values were
computed using a one-sided Wilcoxon test. Bar graphs show mean of $-\log_{10} P$ over 20 network
simulations, and error bars depict the corresponding standard deviation. Boxplots represent same data as
the bar plots. Boxplot elements are defined as follows: center line, median; box limits, upper and lower
quartiles; whiskers, 1.5× interquartile range; points, outliers. Note that the color of the boxplots for
the edge subtraction method in a is not visible because the distribution is very narrow.

While the edge subtraction method works well to detect added modules under low noise conditions, it
becomes problematic if edges are deleted or if their weights decrease in the perturbed state relative to
the control, because most network clustering methods are only formulated for positive edge weights. One
might suggest transformation of edge weights, but any simple transformation of negative edge weights to
make them positive (for example, by exponentiation or a linear shift) would bias the results. Algorithms
that directly incorporate negative edge weights are complex and involve multiple steps and assumptions
[32,33]. In contrast, ALPACA's differential modularity matrix $D_{ij}$ contains both negative and
positive values, corresponding to areas of decreasing and increasing edge density relative to the
baseline network and its community structure. By optimizing over the sum of $D_{ij}$, ALPACA incorporates
positive and negative changes in edge density in a symmetric fashion.
As a simple demonstration of ALPACA's ability to detect community structure changes with negative
weights, we created “subtracted” simulations in which selected edges in a baseline network are reduced in
weight to produce a substantially different perturbed network structure (Fig. 3 and Supplementary Fig. 2;
see Materials and Methods for more details). In Fig. 3, for example, the network consists of two dense
node groups, A and B, which are more strongly connected together in the baseline condition (edge weight
0.8) than in the perturbed condition (edge weight 0.2). Therefore, the perturbation causes groups A and B
to separate and perform distinct functions; intuitively, this means groups A and B characterize the
change in modular structure between the two networks. Because the only change in edge weights is the
decrease in edges between A and B, the edge subtraction method results in a network with negative edge
weights.
If instead we reverse the process and subtract the perturbed
network from the baseline network, the resulting positive edge
weight network produces two modules, one consisting of TFs in
group A linked with genes in group B, the other consisting of TFs
in group B linked with genes in group A. This does not match the
intuitive result we are looking for. The community comparison
method detects no change because both the baseline and
perturbed networks are composed of the same two node
communities. However, ALPACA correctly identifies groups A
and B as the differential modules characterizing this transition.
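As a hedged worked example of this two-group case, assume (these details are not stated in the text) that each group contains 2 TFs and 2 genes, that within-group edges have weight 1 in both conditions, and that between-group edges drop from 0.8 to 0.2. Taking groups A and B as the baseline communities, Eqs. 2 and 3 then give:

```latex
\begin{align*}
m &= 8(1) + 8(0.8) = 14.4, \qquad m^P = 8(1) + 8(0.2) = 9.6, \qquad \frac{m^P}{m} = \frac{2}{3},\\
N_{ij} &= \tfrac{2}{3} \quad (\text{same group}), \qquad
N_{ij} = \tfrac{2}{3}(0.8) = \tfrac{8}{15} \quad (\text{different groups}),\\
D_{ij} &= 1 - \tfrac{2}{3} = +\tfrac{1}{3} \quad (\text{same group}), \qquad
D_{ij} = 0.2 - \tfrac{8}{15} = -\tfrac{1}{3} \quad (\text{different groups}),\\
D(\{A\},\{B\}) &= \frac{8 \cdot \tfrac{1}{3}}{9.6} = \frac{5}{18} \approx 0.28, \qquad
D(\{A \cup B\}) = \frac{8 \cdot \tfrac{1}{3} - 8 \cdot \tfrac{1}{3}}{9.6} = 0.
\end{align*}
```

Under these assumptions, treating A and B as separate differential modules gives a positive score, merging them gives zero, and pairing the TFs of one group with the genes of the other gives $-5/18$. The weakened between-group edges therefore push the optimum toward A and B, even though no edge weight increased.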
An example with three node groups is shown in Supplementary
Fig. 2. Again, we nd that ALPACA identies the key change in
modular structure and edge subtraction cannot. Although these
examples are simple, such areas of decreased edge density will be
locally embedded in any realistic biological network and will
strongly inuence the identication of neighboring modules.
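The Fig. 3 scenario can be reproduced numerically with a minimal sketch. The group sizes (2 TFs and 3 genes per group) and the within-group edge weight of 1.0 are assumptions made for illustration; only the between-group weights 0.8 and 0.2 come from the text. The sketch shows that subtracting the baseline from the perturbed network leaves no positive weights, while the reversed subtraction leaves positive weights only on cross-group TF-gene pairs.

```python
import numpy as np

# Hypothetical toy sizes (not from the paper): 2 TFs and 3 genes per group.
tf_group = np.array(["A", "A", "B", "B"])
gene_group = np.array(["A", "A", "A", "B", "B", "B"])

# True where a TF and a gene belong to the same group.
same_group = tf_group[:, None] == gene_group[None, :]

def toy_network(between_weight, within_weight=1.0):
    """TF-by-gene weight matrix; within_weight is an assumed value."""
    return np.where(same_group, within_weight, between_weight)

baseline = toy_network(0.8)   # groups A and B strongly connected
perturbed = toy_network(0.2)  # groups A and B weakly connected

forward = perturbed - baseline  # edge subtraction, perturbed minus baseline
reverse = baseline - perturbed  # reversed edge subtraction

print("forward: min", forward.min(), "max", forward.max())
print("reverse is positive only between groups:",
      bool(np.all((reverse > 0) == ~same_group)))
```

Because the reversed difference is nonzero only for TFs in one group paired with genes in the other, community detection on it can only recover the cross-group modules described above, not groups A and B themselves.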
Angiogenic vs. non-angiogenic ovarian cancer tumors
Ovarian cancer is the second most common cause of cancer death
among women in the developed world. Available treatment
options for ovarian cancer, such as platinum-based therapies,
often lead to chemoresistance and recurrence. Ovarian cancer
tumors can be stratified by gene expression profile, tissue of
origin, or other characteristics, in order to better understand
heterogeneity and predict patient-specific therapeutic strategies.
We previously found that a gene signature associated with
angiogenesis is able to classify ovarian cancer patients into a
poor-prognosis subtype [35].
We classified 510 ovarian cancer patients from The Cancer
Genome Atlas into 188 angiogenic and 322 non-angiogenic
tumors and used PANDA to infer separate gene regulatory
networks for the two subtypes, as previously described [36].
We then
applied a variety of methods to look for changes in community
structure associated with the angiogenic tumors, ranked the nodes
by their contribution to the total score for each method (see
Materials and Methods), and evaluated the core genes in each set
for functional enrichment. In order to evaluate the unique
contributions of ALPACA, we first applied standard community
detection techniques to identify communities in each
subtype-specific network, using both the Louvain method and CONDOR,
and we looked for GO terms that were statistically enriched in the
angiogenic network but not in the non-angiogenic network. Next,
we applied edge subtraction, community comparison, and ALPACA
to directly identify differential modules associated with angiogenic
tumors. Finally, we also computed the differentially expressed
genes between the non-angiogenic and angiogenic cancer
subtypes. The GO term enrichment with $P_{\mathrm{adj}} < 0.05$ for each
method is presented in full in Supplementary Table 1.
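The node-ranking step can be sketched as follows. The scoring rule used here, where a node's contribution is the sum of its differential values over partners assigned to the same module, is an assumption made for illustration; the paper's Materials and Methods defines the score actually used for each method. The function name, node names, and toy matrix are all hypothetical.

```python
import numpy as np

def rank_nodes_by_contribution(D, tf_module, gene_module, tf_names, gene_names):
    """Rank nodes in each module by summed within-module values of D.

    D is a TF-by-gene matrix of differential scores (assumed given).
    """
    same = tf_module[:, None] == gene_module[None, :]
    within = np.where(same, D, 0.0)   # keep only same-module pairs
    tf_score = within.sum(axis=1)     # one score per TF
    gene_score = within.sum(axis=0)   # one score per gene
    ranking = {}
    for module in np.unique(np.concatenate([tf_module, gene_module])):
        members = [(n, s) for n, s, m in zip(tf_names, tf_score, tf_module)
                   if m == module]
        members += [(n, s) for n, s, m in zip(gene_names, gene_score, gene_module)
                    if m == module]
        # Highest contribution first.
        ranking[int(module)] = sorted(members, key=lambda item: -item[1])
    return ranking

# Toy input: 2 TFs, 4 genes, 2 modules (all values invented).
D = np.array([[0.5, 0.2, -0.3, 0.0],
              [-0.1, -0.4, 0.6, 0.3]])
ranking = rank_nodes_by_contribution(
    D,
    tf_module=np.array([1, 2]),
    gene_module=np.array([1, 1, 2, 2]),
    tf_names=["TF1", "TF2"],
    gene_names=["g1", "g2", "g3", "g4"],
)
for module, members in ranking.items():
    print(module, [name for name, _ in members])
```

The top-ranked nodes in each module would then serve as the core gene set passed to a functional enrichment test.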
Fig. 3 Performance of three methods on perturbations that decrease edge density. [Figure panels: Baseline and Perturbed networks; results for Edge subtraction (rev.), Community comparison, and ALPACA. Legend: transcription factor, gene; Group A, Group B.] Left-hand side shows a network transition involving a
decrease in edge weights between nodes in groups A and B. All other edges remain the same. Right-hand side shows the results of three
methods when comparing these two networks. Each method identified up to two differential modules, which are distinguished by their light
blue and light pink colors in each case. Note that the edge subtraction method needs to be applied in the reverse manner, comparing the
baseline network against the perturbed network, in order to have positive differential edge weights
Detecting phenotype-driven transitions in regulatory
M Padi and J Quackenbush
Published in partnership with the Systems Biology Institute npj Systems Biology and Applications (2018) 16


References

Journal ArticleDOI
04 Mar 2011-Cell
TL;DR: Recognition of the widespread applicability of these concepts will increasingly affect the development of new means to treat human cancer.
Abstract: The hallmarks of cancer comprise six biological capabilities acquired during the multistep development of human tumors. The hallmarks constitute an organizing principle for rationalizing the complexities of neoplastic disease. They include sustaining proliferative signaling, evading growth suppressors, resisting cell death, enabling replicative immortality, inducing angiogenesis, and activating invasion and metastasis. Underlying these hallmarks are genome instability, which generates the genetic diversity that expedites their acquisition, and inflammation, which fosters multiple hallmark functions. Conceptual progress in the last decade has added two emerging hallmarks of potential generality to this list-reprogramming of energy metabolism and evading immune destruction. In addition to cancer cells, tumors exhibit another dimension of complexity: they contain a repertoire of recruited, ostensibly normal cells that contribute to the acquisition of hallmark traits by creating the "tumor microenvironment." Recognition of the widespread applicability of these concepts will increasingly affect the development of new means to treat human cancer.

42,275 citations


"Phenotype-Driven Transitions In Reg..." refers background in this paper

  • ...Finally, modules 16 and 17 were enriched for various terms involving interferon response, interleukins, and regulation of the NFκB pathway, consistent with the theory that chronic inflammation is associated with risk of cancer (46)....

    [...]


Journal ArticleDOI
TL;DR: It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Abstract: We propose and study a set of algorithms for discovering community structure in networks-natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.

11,600 citations


"Phenotype-Driven Transitions In Reg..." refers background or methods in this paper

  • ...An exhaustive search is not possible for large networks, but many methods have been developed to find locally optimal community structure, including ones based on edge betweenness, label propagation, and random walks (27-29)....

    [...]

  • ...RESULTS Modularity maximization and comparing community structures Many methods for determining the community structure of a network are based on maximizing the modularity (27): $Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \delta(C_i, C_j)$ Here, $A_{ij}$ indicates the adjacency matrix of the network, $m$ is the number of edges, $d_i$ is the degree of node $i$, and $C_i$ is the community assignment of node $i$....

    [...]


Journal ArticleDOI
06 Sep 2012-Nature
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

11,598 citations


Journal ArticleDOI
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations


Journal ArticleDOI
TL;DR: This work proposes a heuristic method that is shown to outperform all other known community detection methods in terms of computation time and the quality of the communities detected is very good, as measured by the so-called modularity.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks. .

10,260 citations


"Phenotype-Driven Transitions In Reg..." refers methods in this paper

  • ...We then used the $\Delta w_{ij}$ values as new edge weights to perform community detection using CONDOR or Louvain optimization (9, 30)....

    [...]

  • ...First, we determine the community structure of the baseline network using established methods (9, 30)....

    [...]

  • ...Second, we compute the differential modularity matrix $D_{ij}$ and apply the Louvain optimization algorithm to iteratively aggregate the nodes into modules (30)....

    [...]

  • ...The Louvain algorithm is a particularly efficient way to find high-quality local optima of the modularity function (30)....

    [...]


Frequently Asked Questions (1)
Q1. What are the contributions in "Detecting phenotype-driven transitions in regulatory network structure" ?

Here, the authors describe ALPACA (ALtered Partitions Across Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. As an application, the authors use ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex networks, but also that these changes may be relevant for characterizing biological phenotypes.