Reverse engineering of gene regulation models from multi-condition experiments

16 Apr 2013, pp. 112-119
TL;DR: This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization to present a novel method capable of inferring robust GRN models from multi-condition GRN experiments.
Abstract: Reverse-engineering of quantitative, dynamic gene-regulatory network (GRN) models from time-series gene expression data is becoming important as such data are increasingly generated for research and other purposes. A key problem in the reverse-engineering process is the under-determined nature of these data. Because of this, the reverse-engineered GRN models often lack robustness and perform poorly when used to simulate system responses to new conditions. In this study, we present a novel method capable of inferring robust GRN models from multi-condition GRN experiments. This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization.

Summary (2 min read)

Introduction

  • A range of mathematical methods facilitating the reverse-engineering of quantitative, dynamic gene-regulatory network (GRN) models from time-series gene expression data have been reported in the literature [4].
  • In the present study, the authors investigate the reverse engineering of robust GRN models based on repeated measures from the same GRN system under different conditions.
  • Their approach combines different data types which represent the same experimental condition.
  • The authors' current study focuses on data combination utilizing all available data rather than a voting algorithm for the creation of a combined model.

B. Model equation and data generation

  • Common rate laws to model the reaction kinetics (regulatory interactions) of GRN systems include the S-system, Hill functions, mass action kinetics, general rate law of transcription, and artificial neural network formulations [4].
  • Equation (1) defines the ANN-based rate of change dXi/dt of transcript Xi of gene i within a GRN system of u genes.
  • For each of the three GRN systems depicted in Fig. 2, a single GRN reference model was manually created on the basis of the ANN rate law defined in Equation (1).
  • Each model was simulated with different parameter configurations multiple times using different initial conditions.
  • A great deal of thought went into this process, to ensure that the data is representative of current time-series gene expression experiments.

C. Reverse engineering

  • Equipped with four training or learning data sets for each of the three GRN reference systems, the authors reverse engineered four individual GRN models for each system from the corresponding training data sets.
  • These fixed weights are not subject to the optimization procedure (Step 3).
  • The algorithm then proceeds to optimize or estimate the remaining ANN weights { wij ≠ 0 } only for genes i and j that are known to interact in the underlying GRN system, plus the parameters { di } and { ki }.
  • The authors used Copasi [27] to code and represent the GRN models and implement the reverse-engineering algorithm described above.
  • The comparison Step 5 of the reverse engineering algorithm calculates the deviation between each corresponding time-course in the learning data set L and the predicted or simulated time-course data.

D. Model validation

  • The training error is an indicator of how well the reverse-engineered model can replicate the data from which it was constructed.
  • A robust measure to assess how well the reverse-engineered model has captured the characteristics of the underlying system needs to determine the prediction error on unseen data.
  • Because experiments generating gene expression time-course data are costly, the validation on independent data is frequently not reported in the literature.
  • For each of the three GRN systems investigated in this study, the authors use two independent validation data sets (V1, V2) to estimate the generalization error, and hence the robustness, of their models, by averaging the total RMSE of the models on the two validation data sets.

E. Combined modeling algorithm

  • The steps described above explain the procedures the authors adopted to reverse-engineer and validate 3 times 4 individual models, 1 model for each of the 4 learning data sets for each of the 3 GRN systems depicted in Fig.
  • The training and validation errors the authors obtained are shown in TABLE III.
  • The purpose of this study is to demonstrate that by using multi-condition experimental data, it is possible to generate more robust GRN models.
  • The main steps of this algorithm are described below.
  • The authors illustrate this based on the 3-gene Vohradsky GRN system.

2. Formulation of combined model

  • Essentially, the first step of this combined-data GRN modeling algorithm creates a combined learning data set Λ with nL × u genes or gene expression time-series.
  • (Notice, is not yet their final combined GRN model MΛ).
  • The combined model does not allow gene influences across the boundaries of the individual data sets; genes can only influence each other within the same individual training sets.

3. Reverse-engineering of combined-data model

  • With the formulation of the combined-data model in place, the reverse-engineering algorithm described above is applied to determine/estimate the model’s parameters.
  • The advantage is that now the authors have more data to estimate the model parameters and hence are likely to produce more reliable estimates.

4. Creation of final combined model

  • The combined-data model specifies nL times as many model equations as the final combined GRN model MΛ.
  • The RMSE is shown for each reverse-engineered model against both the learning and validation data sets for 600 and 20 time points, respectively.
  • The observed results indicate the data combination approach to reverse engineering results in more robust GRN models than reverse engineering models from single-condition data sets.


Reverse engineering of gene regulation models from
multi-condition experiments
Noel Kennedy, Alexandru Mizeranschi,
Paul Thompson, Huiru Zheng, Werner Dubitzky
University of Ulster, Northern Ireland, UK
Abstract: Reverse-engineering of quantitative, dynamic gene-regulatory
network (GRN) models from time-series gene
expression data is becoming important as such data are
increasingly generated for research and other purposes. A key
problem in the reverse-engineering process is the under-
determined nature of these data. Because of this, the reverse-
engineered GRN models often lack robustness and perform
poorly when used to simulate system responses to new
conditions. In this study, we present a novel method capable of
inferring robust GRN models from multi-condition GRN
experiments. This study uses two important computational
intelligence methods: artificial neural networks and particle
swarm optimization.
Keywords—Gene regulatory networks; reverse-engineering;
machine learning; multi-model fusion; optimization
I. INTRODUCTION
Regulation of gene expression (or gene regulation) refers to
processes that cells use to create functional gene products
(RNA, proteins) from the information stored in genes (DNA).
Gene regulation is essential for life as it increases the
versatility and adaptability of an organism by allowing it to
express protein when needed. While aspects of gene
regulation are well understood, many open research questions
still remain [1]. The dynamic behavior and regulatory
interactions of genes can be revealed by time-series
experiments, that is, experiments that measure the expression
of multiple genes over time [2]. In contrast to static gene
expression data, the modeling and simulation approach allows
the determination of stable states in response to a condition or
stimulus as well as the identification of pathways and
networks that are activated in the process [3]. A range of
mathematical methods facilitating the reverse-engineering of
quantitative, dynamic gene-regulatory network (GRN) models
from time-series gene expression data have been reported in
the literature [4]. Typical methods based on differential
equations include the S-system (SS), artificial neural networks
(ANN), and general rate law of transcription (GRLOT)
method [5] [6].
One of the issues in reverse-engineering GRN models is the
under-determined nature of the problem [7]. Essentially, this
means that for the given data and the differential equations
specifying the model, there is no unique solution to these
equations. A consequence of this is that models derived from
such data lack robustness. Thus, the predictive accuracy on
unseen data sets is often poor. Various approaches have been
employed to address this issue [8] [9]. The fact that the data is
normally noisy and that the reverse-engineering process
involves a non-deterministic element (optimization) is also a
factor that influences robustness, but it is not as fundamental
as the lack of complete information.
In the present study, we investigate the reverse engineering
of robust GRN models based on repeated measures from the
same GRN system under different conditions. We refer to
experiments that generate data in this way as multi-condition
experiments. The principal idea is that when the same GRN
system is subject to different (non-destructive) conditions or
stimuli, it will display a range of responses that together are
more characteristic for the underlying system properties than
a single response to a single stimulus. This concept is similar
to data fusion, which is a process that integrates multiple
sources of information representing the same real-world entity
into a consistent and accurate model of that entity [10].
However, whereas in conventional data fusion the merging of
information is normally achieved on the data level by a
straightforward join operation based on a common attribute or
key, combining time-series data is not that simple.
In this paper, we present a novel approach to exploit data
obtained from repeated GRN time-series measures under
varying conditions to infer robust GRN models. We
demonstrate the usefulness of this approach by comparing the
resultant multi-condition GRN models with the individual
single-condition models. At present it is still relatively costly
and time consuming to perform multi-condition experiments,
hence we base our study on artificial GRN time-series data
sets. While this is a limitation in the present study, we believe
that the results we obtained are still valid when applied to data
derived from real GRN systems. It is likely that in the future
the costs for multi-condition experiments will be lower and
that multi-condition experiments are therefore expected to
become commonplace.
II. REVERSE ENGINEERING AND DATA COMBINATION
Many reverse-engineering methods have been proposed in
recent years. Furthermore, different approaches have been
adopted in an attempt to improve the performance of these
methods and create more robust models, improve predictive
performance, and identify regulatory interactions from gene
expression measurements. Swain et al. [5] evaluated three
commonly used approaches which formulate dynamic GRN
models as ordinary differential equations (ODEs). The authors
assessed the ability of the different ODE structures to
replicate the GRN systems’ regulatory structure and dynamic
gene expression behavior under varying conditions. The study
evaluated three commonly encountered ODE rate law
formulations: SS, ANN and GRLOT. The results suggest that
the ANN and GRLOT methods are superior to the SS method
in their ability to accurately predict network behavior. The
former two methods also produced more accurate network
structures from the underlying data than the SS method.
However, the study did not explore combining data from
multi-condition experiments. Although the research has
thoroughly investigated the rate law formalisms, relying on
sparse data based on only a single experiment may limit a
model’s usefulness in terms of the range of the experimental
conditions that can be successfully predicted by it.
Andrews et al. [11] use an artificial intelligence model
known as Cost Based Abduction (CBA) to generate GRN
models from multiple data sources. They successfully apply
their method to study the pheromone pathway in yeast using
protein-DNA data, protein-protein interaction data and gene
knock-out data. Their approach combines different data types
which represent the same experimental condition. Yeang et al.
[12] also explore the pheromone pathway in yeast by creating
annotated interaction graphs they call physical network
models. They combine multiple data types also. In our
approach we consider only one gene expression data type,
measurement of mRNA abundance, each data set representing
the system's response to a different experimental condition.
Ting and Low [13] compare two approaches, model
combination and data combination when multiple batches of
data are available. The former method uses the available data
sets creating a single model for each then combining the
output. The latter approach creates one model, using the
available data sets to train the model. The model combination
method involves estimating predictive accuracy for a given
instance through k-fold cross validation and creating a single
model from the instances achieving the highest predictive
performance (low error or high accuracy). They conclude that
the model combination approach is stronger when there is
only a small difference in the predictive error rate for each
model. Our current study focuses on data combination
utilizing all available data rather than a voting algorithm for
the creation of a combined model.
Peeling and Tucker [14] present a method for modeling
GRNs by forming a consensus Bayesian network model from
multiple microarray gene expression data sets. Their study
focuses on qualitative combination of Bayesian networks to
determine the dependency structure between genes. The
method of data combination presented in our study primarily
focuses on reproducing quantitative network dynamics and
predicting behavior. We have identified and discussed the
need to develop this method further to include structure
recovery in this article’s section on future work.
Steele et al. [15] developed a method for transforming
literature-based gene association scores to network prior
probabilities. The Bayesian networks developed from this
methodology therefore benefit from partial a priori knowledge
of regulatory interactions thus simplifying the reverse-
engineering process. Our current study reverse engineers
GRN models using known network topology. The literature-
based method therefore does not benefit the approach
presented in this study but can be considered in future work
concerning structure recovery.
Wang et al. [16] address the idea of combining multiple
time-series microarray data sets by developing a high-level
framework they refer to as the Gene Network Reconstruction
tool. The idea is to combine solutions from each data set into
an overall, consistent solution. The combination therefore
occurs at the model level rather than the data level and the
focus of their study is primarily on network structure.
Chen et al. [17] propose a two-step method for inferring
GRN models from multiple data sets. First, they infer
(optimize) a GRN structure (network topology) from each
data set, which is then combined into a single, final network.
They propose two methods based on computing the statistical
mean and mode for each resulting topology. The second step
consists of an additional optimization process that estimates
the parameters of the combined GRN model after discovering
its structure.
Gupta et al. [18] used multi-objective optimization to
integrate different methods for reverse-engineering. To
illustrate this, they used a combination of linear ODE and
correlation-based methods, using data from time-course and
gene inactivation (knock-out) experiments. The novel aspect
in this approach is the combination of different inference
methods into the same procedure, along with using
heterogeneous sources of input data.
Marbach et al. [19] investigated the way in which ensemble
networks resulting from reverse-engineering experiments
could be used to “vote” the topology of a combined GRN
model. For creating the ensemble network, they used an
evolutionary method called analog genetic encoding. They ran
50 iterations of this procedure, each time retaining the
network with the best fitness as part of the ensemble. Then,
they used ensemble voting to generate a new network,
showing that this network outperforms all initial members of
the ensemble.
Our novel approach is to combine the data of the multi-
condition experiments and reverse engineer a single model
from this. Initially our method is concerned with reproducing
network dynamics; structure recovery is left to future
research. As we use multiple data sets for training the model,
we hypothesize that the resulting model should be more
robust, more accurate and open to a wider scope of
perturbation and behavior prediction than models generated
from a single data set.
III. APPROACH AND STUDY DESIGN
The approach in this investigation assumes that the
biological GRN system, S, under study is provoked with a set
of n distinct stimuli or conditions $C_1, C_2, \ldots, C_n$ to elicit
the corresponding dynamic gene-regulatory responses
$R_1, R_2, \ldots, R_n$. In this study, a condition $C_i$ is defined as a set
of data values specifying the initial perturbation of the system
variables, representing gene expression quantities.
2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) 113

$$
R_i = \begin{bmatrix}
r_{i11} & \cdots & r_{ij1} & \cdots & r_{iu1} \\
\vdots  &        & \vdots  &        & \vdots  \\
r_{i1k} & \cdots & r_{ijk} & \cdots & r_{iuk} \\
\vdots  &        & \vdots  &        & \vdots  \\
r_{i1m} & \cdots & r_{ijm} & \cdots & r_{ium}
\end{bmatrix}
$$
Each response $R_i$ represents a set of u time-series of transcript
concentrations/abundances, where u denotes the number of
genes in S, and m the number of time points sampled for each
gene, and $r_{ijk}$ denotes a single measurement of gene j at time
point k.
For simplicity, we assume the same number of
measurements m for each gene in the considered time interval.
In general, this is not necessary, though. Conceptually, we
distinguish two types of scenarios: a concrete and an abstract
GRN system.
Concrete GRN system: The reverse-engineered GRN
model represents a single concrete GRN system derived from a
single individual, e.g. the cell cycle gene regulation network of
a particular mouse cell. In this case, the multiple conditions are
applied at time points $T_1 < T_2 < \ldots < T_n$, and the time interval
$\Delta T = T_{i+1} - T_i$ between consecutive conditions is chosen large
enough for the system to fully “recover” from the provocation
with condition $C_i$. This implies that the condition has no
lasting effect on the system (e.g. does not destroy the system).
Abstract GRN system: The reverse-engineered GRN
model represents a single abstract GRN system derived from a
collection of individuals that are assumed to be similar in some
important aspects. For example, the cell cycle gene regulation
networks of eight mouse cells (from one or more mice). In this
case, there is no restriction on the conditions applied and all
conditions may be applied in parallel.
Once the response data has been obtained, the process is
identical for reverse-engineering GRN models representing
concrete and abstract GRN systems. Based on the n response
data sets from n experiments (each applying a different
stimulus), we
1. Randomly determine $n_L$ training or learning data sets L
and $n_V$ validation data sets V, such that $n = n_L + n_V$
(typically: $n_L > n_V$):
$L = L_1, L_2, \ldots, L_{n_L}$
$V = V_1, V_2, \ldots, V_{n_V}$
2. Generate a combined training or learning data set Λ.
3. Reverse-engineer from each of the $n_L$ training data sets a
GRN model $M_1, M_2, \ldots, M_{n_L}$.
4. Reverse-engineer the final model $M_\Lambda$ from the combined
learning data set Λ.
5. Validate (determine average accuracy or error) the models
generated in Steps (3) and (4) against the validation data
sets V.
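The random partition in Step 1 can be sketched in Python as follows. This is an illustration only, not the procedure used in the study: the function name `split_conditions`, the `seed` argument and the string placeholders for the response data sets are assumptions introduced here.

```python
import random

def split_conditions(responses, n_learn, seed=None):
    """Randomly partition n response data sets into learning and validation sets.

    `responses` holds the n response data sets (any objects); `n_learn` is n_L,
    so n_V = n - n_L. All names are hypothetical.
    """
    if not 0 < n_learn < len(responses):
        raise ValueError("need 0 < n_L < n")
    rng = random.Random(seed)
    indices = list(range(len(responses)))
    rng.shuffle(indices)
    # Sort the chosen indices so each subset keeps the original ordering.
    learn = [responses[i] for i in sorted(indices[:n_learn])]
    valid = [responses[i] for i in sorted(indices[n_learn:])]
    return learn, valid

# Six conditions split into n_L = 4 learning and n_V = 2 validation sets.
learning, validation = split_conditions(
    ["R1", "R2", "R3", "R4", "R5", "R6"], n_learn=4, seed=0
)
```

Fixing the seed makes the split repeatable, which helps when the individual and combined models are to be compared on the same validation sets.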
Fig. 1. Illustration of the process for reverse-engineering a GRN model from
multi-condition experimental data (here with six conditions corresponding to
four training or learning data sets and two validation data sets).
Fig. 1 depicts the basic study design we adopted to explore
and evaluate our approach. In this particular case we used
$n_L = 4$ and $n_V = 2$. In the diagram, the notation $M_i(V_j)$ denotes
the data set created by simulating model $M_i$ with the initial
condition from data set $V_j$.
A. GRN Modeling
We use the term GRN systems to refer to gene-regulatory
networks that describe regulatory gene-gene interactions
without explicit representation of intermediary elements such
as metabolites, nuclear receptors and transcription factors,
which combine to direct and catalyze the reactions between
genes [20]. The simulated data generated from the reference
models represents measured mRNA abundance over time.
Under this modeling assumption, one gene can either activate
or repress another gene directly or indirectly (via other genes).
Fig. 2 shows the network topology of the three GRN systems
investigated in this study. System A describes a simple 3-gene
GRN system [21], and System B a 5-gene GRN system [22].
System C describes a 7-gene GRN system based on the bile acid
and xenobiotic system (BAXS) [23]. TABLE I maps the nodes in
the BAXS network to their corresponding genes and gene products.
TABLE I. ELEMENTS OF BAXS GRN NETWORK
Node    Gene      Gene product
X_1     NR0B2     SHP1
X_2     NR1I2     PXR
X_3     NR1H4     FXR
X_4     ABCB1     MDR1
X_5     ABCC2     MRP2
X_6     ABCB11    BSEP
X_7     CYP3A4    CYP3A4
The BAXS describes a genetic network that facilitates two
distinct but intimately overlapping physiological processes:
the enterohepatic circulation and maintenance of bile acid
concentrations, and the detoxification and removal from the
body of harmful xenobiotic compounds (e.g. drugs, pesticides)
and endobiotic compounds such as steroid hormones [24]. The
model describes a simple catabolic pathway which is induced
by the presence of a functional intermediate acting as an
inducer of the network. Such an inducer could be either an
endogenous or exogenous substance, e.g. lithocholic acid
(LCA), a secondary bile acid which activates transcription of
both NR1I2 and NR1H4 [25], leading to activation of
CYP3A4. This results in the production of enzymes which
metabolize the inducer in this network, e.g. LCA [26], thus
switching off the network after a period of time. This is
represented in the model as repression of NR1I2 and NR1H4
by CYP3A4: as the CYP3A4 enzyme metabolizes the inducer,
there is no further activation of these genes.
Fig. 2. The three gene regulatory networks or GRN systems investigated in this study.
Legend: transcriptional regulation (activation); transcriptional regulation (repression).
B. Model equation and data generation
Common rate laws to model the reaction kinetics
(regulatory interactions) of GRN systems include the
S-system, Hill functions, mass action kinetics, general rate law
of transcription, and artificial neural network formulations
[4]. Because of its flexibility and advantageous properties [5],
the models in this study are based on the ANN formalism
[21]. Equation (1) defines the ANN-based rate of change
$dX_i/dt$ of transcript $X_i$ of gene i within a GRN system of u
genes.
$$
\frac{dX_i}{dt} = \frac{v_i}{1 + \exp\left[-\left(\sum_{j=1}^{u} w_{ij} X_j + d_i\right)\right]} - k_i X_i \qquad (1)
$$
where
$u$ defines the number of genes in the GRN system to be
modeled, i = 1, …, u.
$v_i$ denotes the maximal expression rate of gene i.
$X_j$ denotes the gene product of gene j influencing the product,
$X_i$, of gene i, with: j = 1, …, u.
$w_{ij}$ denotes the strength of control or regulation of gene j on
gene i. Positive values indicate activating, negative values
repressing control.
$d_i$ defines an external influence on gene i, which modulates
the gene’s sensitivity of response to activating or
repressing influences. The higher $|d_i|$, the lower the
influence of the weights $w_{ij}$ on gene i. In GRN modeling,
$d_i$ is sometimes interpreted as reaction delay parameter, as
it shifts the sigmoidal transfer function along the
horizontal time axis, thus determining how fast the gene’s
expression level responds.
$k_i$ denotes the degradation rate constant of the i-th gene
expression product.
Equation (1) defines a rate law capable of describing the
dynamic behavior of GRN systems. The ANN rate law
represents and calculates expression rates based on the
weighted sum of multiple regulatory inputs. This additive
input processing is able to represent logical disjunctions. The
expression rate is restricted to a certain interval where the
sigmoidal transfer function maps the regulatory input to the
expression interval. The external input, $d_i$, regulates the
sensitivity to the summed regulatory input of all genes.
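To make the rate law concrete, the following Python sketch evaluates the right-hand side of Equation (1) and integrates it over time. It is a minimal illustration under stated assumptions: the function names `ann_rate` and `simulate`, the fixed-step fourth-order Runge-Kutta integrator and the toy parameter values are all introduced here and are not taken from the study.

```python
import math

def ann_rate(x, v, w, d, k):
    """Right-hand side of Equation (1): dX_i/dt for every gene i."""
    u = len(x)
    rates = []
    for i in range(u):
        # Weighted sum of regulatory inputs plus the external input d_i.
        net = sum(w[i][j] * x[j] for j in range(u)) + d[i]
        # Sigmoidal production term minus first-order degradation.
        rates.append(v[i] / (1.0 + math.exp(-net)) - k[i] * x[i])
    return rates

def simulate(x0, v, w, d, k, dt=1.0, steps=600):
    """Integrate Equation (1) with fixed-step RK4 (a simple stand-in integrator)."""
    def shifted(x, dx, h):
        return [xi + h * dxi for xi, dxi in zip(x, dx)]

    x, series = list(x0), [list(x0)]
    for _ in range(steps):
        k1 = ann_rate(x, v, w, d, k)
        k2 = ann_rate(shifted(x, k1, dt / 2), v, w, d, k)
        k3 = ann_rate(shifted(x, k2, dt / 2), v, w, d, k)
        k4 = ann_rate(shifted(x, k3, dt), v, w, d, k)
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + e)
             for xi, a, b, c, e in zip(x, k1, k2, k3, k4)]
        series.append(x)
    return series

# Toy single-gene example with hypothetical parameters: with w = 0 and d = 0 the
# sigmoid equals 1/2, so dX/dt = 0.5*v - k*X and X settles at 0.5*v/k.
series = simulate([0.0], v=[1.0], w=[[0.0]], d=[0.0], k=[0.5], dt=1.0, steps=600)
```

The toy case has a known steady state, which makes it a convenient sanity check before moving on to multi-gene parameter sets.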
For each of the three GRN systems depicted in Fig. 2, a
single GRN reference model was manually created on the
basis of the ANN rate law defined in Equation (1). Each GRN
reference model serves as a surrogate for the corresponding
biological GRN system to facilitate the generation of
artificial dynamic gene expression data. The parameters for
each reference model were determined manually through a
process of trial and error. Each model was simulated with
different parameter configurations multiple times using
different initial conditions. After visual inspection of the
generated data, the parameter values were updated and the
reference models simulated again. This experimentation
continued until plausible dynamics were identified. The
criteria we applied to determine plausible dynamics were that
a steady state was reached during the simulation, that no
measurement increased without bound, and that not all measured
expression levels stabilized at zero.
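The three plausibility criteria were applied by visual inspection. Purely as an illustration, they could be approximated by an automated heuristic such as the one below. The function name `plausible_dynamics`, the tolerance `tol` and the window length `tail` are assumptions made for this sketch, not values from the study.

```python
import math

def plausible_dynamics(series, tol=1e-3, tail=10):
    """Heuristic check; series[k][j] is the level of gene j at time point k."""
    if len(series) < tail + 1:
        raise ValueError("series too short for the chosen tail length")
    # No value may have diverged to infinity or become NaN.
    bounded = all(math.isfinite(x) for row in series for x in row)
    final = series[-1]
    # Steady state: every gene stays within tol of its final value
    # over the last `tail` samples.
    steady = all(abs(row[j] - final[j]) < tol
                 for row in series[-tail:] for j in range(len(final)))
    # Reject the trivial outcome in which all genes end at zero.
    not_all_zero = any(abs(x) > tol for x in final)
    return bounded and steady and not_all_zero
```

A check of this kind only screens out obviously implausible runs; judging whether six behaviors are sufficiently different from each other still requires inspection.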
We then used each of the three GRN reference models to create
six time-series gene expression data sets based on different
experimental conditions (initial values of transcript
abundance). Visual inspection was again applied until six
different system behaviors that were both plausible and
sufficiently different were identified. The reference models
were simulated over 600 sampling time points, each simulated
Δt representing 1 second.
Through this process, we generated 6 time-series data sets
for each of the three GRN systems representing the response
of the systems to varying experimental conditions. From each
of the three groups of six data sets, we randomly selected four
as training or learning sets, and two as validation data sets.
This configuration of data sets was viewed as a reasonable
compromise: (a) on one hand, we required a number of
training data sets under different conditions to be able to
capture the underlying intricacies of the system under
investigation, (b) on the other hand, we did not want to rely
only on a single data set as independent validation data set.
Current time-series gene expression experiments typically
sample somewhere between 10 and 30 time points; in rare
cases several dozen time points are measured. After our initial
exploration with 600 sampling points, we repeated the
generation of 3 times 6 data sets, sampling 20 time points per
gene over the explored total time interval. This sampling
frequency provides more realistic data sets which are in line
with current protocols for time-series gene expression
experiments.
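One way to obtain a sparse time course from a dense one is sketched below. The text does not state how the 20 time points were placed, so evenly spaced sampling over the full interval is an assumption of this example, as are the function name `subsample_evenly` and the placeholder data.

```python
def subsample_evenly(series, n_points):
    """Pick n_points evenly spaced samples, always including first and last."""
    if not 2 <= n_points <= len(series):
        raise ValueError("need 2 <= n_points <= len(series)")
    last = len(series) - 1
    indices = [round(i * last / (n_points - 1)) for i in range(n_points)]
    return [series[i] for i in indices]

dense = list(range(600))            # stands in for one gene's 600-point time course
sparse = subsample_evenly(dense, 20)
```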
By simulating the reference models over 600 sampling time
points and then over 20 sampling time points, we can
compare the accuracy of models reverse-engineered from
both detailed and sparse data sets. This approach allows us to
determine if the data combination method investigated in this
work is sensitive to the amount of available data. Although
these data sets have been created artificially, the generation
process was designed so that the data are
representative of current time-series gene expression
experiments. We expect that the sampling frequencies and the
number of conditions with which real GRN systems are being
probed will continue to grow in the future.
C. Reverse engineering
Equipped with four training or learning data sets for each of
the three GRN reference systems, we reverse engineered four
individual GRN models for each system from the
corresponding training data sets. Each reverse-engineering
process is essentially a parameter estimation or optimization
process applying the following algorithm.
Given a learning data set L and the GRN model parameters
$M = (\{v_i\}, \{w_{ij}\}, \{d_i\}, \{k_i\})$, where i, j = 1, …, u (the
number of genes in the model)
1. Set and fix topology (weight) parameters to zero for
gene pairs that do not interact:
$(\forall i, j \in \{1, \ldots, u\})\ (w_{ij} = 0 \mid \text{no\_interaction}(i, j))$.
2. Initialize remaining parameters. M ← initialize.
3. Modify non-fixed model parameters using particle
swarm optimization: M ← PSO.
4. Use M to simulate time course data set, Ǔ, for all genes
based on the initial values (condition) of L:
Ǔ ← simulate(L)
5. Compare simulated data Ǔ with the learning data L:
error ← compare(Ǔ, L). IF error is not sufficiently small
and maximum iterations are not reached, GO TO Step 3,
otherwise finish learning and GO TO Step 6.
6. Store M as GRN model.
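The loop formed by Steps 2 to 6 can be sketched in Python as below. This is a simplified global-best particle swarm written for illustration; it is not Copasi's implementation, and the inertia and acceleration coefficients (0.7, 1.5, 1.5), the function names and the one-gene toy problem are all assumptions introduced here. The objective function plays the role of Steps 4 and 5 (simulate, then compare).

```python
import math
import random

def pso_minimize(objective, bounds, n_particles=20, max_iter=200, tol=1e-6, seed=0):
    """Minimal global-best particle swarm optimizer (illustrative only)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 2: initialize the non-fixed parameters at random within their bounds.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_err = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    for _ in range(max_iter):
        if g_err <= tol:              # Step 5: stop once the error is small enough
            break
        for i in range(n_particles):
            for j in range(dim):      # Step 3: PSO velocity and position update
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * r1 * (best_pos[i][j] - pos[i][j])
                             + 1.5 * r2 * (g_pos[j] - pos[i][j]))
                lo, hi = bounds[j]
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            err = objective(pos[i])   # Steps 4-5: simulate and compare
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < g_err:
                    g_pos, g_err = pos[i][:], err
    return g_pos, g_err               # Step 6: keep the best parameter set

def simulate_one_gene(v, k, x0=0.0, dt=0.1, steps=50):
    """Euler integration of dX/dt = v/2 - k*X (Equation (1) with w = 0, d = 0)."""
    x, out = x0, []
    for _ in range(steps):
        x += dt * (0.5 * v - k * x)
        out.append(x)
    return out

target = simulate_one_gene(v=2.0, k=0.5)   # stands in for a learning data set L

def rmse_to_target(params):
    sim = simulate_one_gene(*params)
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(sim, target)) / len(target))

best_params, best_error = pso_minimize(rmse_to_target, bounds=[(0.0, 5.0), (0.0, 2.0)])
```

In a multi-gene setting the parameter vector would hold only the non-fixed weights together with the remaining parameters, so that the zero weights fixed in Step 1 never enter the search.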
Notice, in Step 1 of the reverse-engineering algorithm
described above, the ANN weights are fixed in such a way
that for genes that are known not to interact, the weight is set
to zero. These fixed weights are not subject to the
optimization procedure (Step 3). The algorithm then proceeds
to optimize or estimate the remaining ANN weights $\{w_{ij} \neq 0\}$
only for genes i and j that are known to interact in the
underlying GRN system, plus the parameters $\{d_i\}$ and $\{k_i\}$.
In other words, we assume that the interaction topology or
network is known. This is a simplification to limit the
computational complexity. In a future implementation of this
algorithm, we will explore problems without this constraint.
For each learning data set, which represents a different
condition on the same concrete GRN system, a GRN model
was reverse-engineered. We used Copasi [27] to code and
represent the GRN models and implement the reverse-engineering
algorithm described above. To realize the parameter
estimation step of the algorithm, we used Copasi’s
implementation of the particle swarm optimization (PSO)
method [28]. Copasi also provides numerical integration
methods needed in Step 4 of the algorithm, in which the time-
course is predicted or simulated based on the initial values
specified in the learning data set L. For deterministic
solutions, the LSODA integrator is used [29]. The comparison
Step 5 of the reverse engineering algorithm calculates the
deviation (error) between each corresponding time-course in
the learning data set L and the predicted or simulated time-
course data Ǔ. To calculate the error, we applied the
commonly used root mean squared error (RMSE) measure.
Since our models consist of u genes, we essentially calculate
the total RMSE by dividing the sum of all individual RMSEs
by u.
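The error measure described above can be written out as a short Python sketch. The function names `gene_rmse` and `total_rmse` and the numbers in the example are made up for illustration; only the definition (sum of per-gene RMSEs divided by u) comes from the text.

```python
import math

def gene_rmse(observed, simulated):
    """RMSE between two equally long time courses of one gene."""
    if len(observed) != len(simulated):
        raise ValueError("time courses must have the same length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))

def total_rmse(observed_set, simulated_set):
    """Sum of the per-gene RMSEs divided by the number of genes u."""
    u = len(observed_set)
    return sum(gene_rmse(o, s) for o, s in zip(observed_set, simulated_set)) / u

# Two genes, three time points each (made-up numbers for illustration).
learning_data = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
simulated_data = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
print(total_rmse(learning_data, simulated_data))  # 0.5
```

Here the first gene is reproduced exactly (RMSE 0) and the second is off by 1 at every time point (RMSE 1), so the total is (0 + 1) / 2 = 0.5.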
The integration (simulation) Step 4 and the comparison
Step 5 in the reverse-engineering algorithm are also relevant to
the model validation stage, which comes after
reverse-engineering has been completed.
D. Model validation
The training error is an indicator of how well the
reverse-engineered model can replicate (simulate) the data
from which it was constructed. A robust assessment of how
well the reverse-engineered model has captured the
characteristics of the underlying system requires determining
the prediction error on unseen data. Because experiments
generating gene expression time-course data are costly,
validation on independent data is frequently not reported in
the literature. For each of the three GRN systems investigated
in this study, we use two independent validation data
sets (V1, V2) to estimate the generalization error, and hence the
116 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)

Citations
Journal ArticleDOI
TL;DR: The results indicate that accommodating some parts of the multiscale computation on cloud resources can lead to low performance without a proper adjustment of CPUs power and workload, but by enforcing a load-balancing strategy one can benefit from the extra Cloud resources.

38 citations

Journal ArticleDOI
TL;DR: Two different structure-related approaches to infer GRN are rendered which are global structure approach and substructure approach.

22 citations


Cites background or methods from "Reverse engineering of gene regulat..."

  • ...Numerous noted applications of ANN and associated soft computing methods to reconstruct GRN (Weaver et al. 1999; Wahde and Hertz 2000; Vohradsky 2001; Xu and Wunsch 2005; Xu et al. 2007a; Benuskova and Kasabov 2008; Maraziotis et al. 2010; Sîrbu and Ruskin 2010; Kentzoglanakis and Poole 2012; Kennedy et al. 2013; Yang et al. 2013; Maetschke and Ragan 2014; Biswas and Acharyya 2014c; Rubiolo et al. 2015; Cussat-Blanc et al. 2015; Mandal et al. 2015) on the timescale of last 16 years well justify the necessity of a review of the topic....

    [...]

  • ...Kennedy et al. (2013) proposed a novel approach to reconstruct robust GRN from multi experimental gene expression data using ANN and PSO....

    [...]

Journal ArticleDOI
TL;DR: It is demonstrated that restricting the limits to the [−1, +1] interval is sufficient to represent the essential features of GRN systems and offers a reduction of the search space without loss of quality in the resulting models.
Abstract: Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene expression time-course data. Common mathematical formalisms for representing such models capture two aspects simultaneously within a single parameter: (1) Whether or not a gene is regulated, and if so, the type of regulator (activator or repressor), and (2) the strength of influence of the regulator (if any) on the target or effector gene. To accommodate both roles, "generous" boundaries or limits for possible values of this parameter are commonly allowed in the reverse-engineering process. This approach has several important drawbacks. First, in the absence of good guidelines, there is no consensus on what limits are reasonable. Second, because the limits may vary greatly among different reverse-engineering experiments, the concrete values obtained for the models may differ considerably, and thus it is difficult to compare models. Third, if high values are chosen as limits, the search space of the model inference process becomes very large, adding unnecessary computational load to the already complex reverse-engineering process. In this study, we demonstrate that restricting the limits to the [−1, +1] interval is sufficient to represent the essential features of GRN systems and offers a reduction of the search space without loss of quality in the resulting models. To show this, we have carried out reverse-engineering studies on data generated from artificial and experimentally determined from real GRN systems.

9 citations

Journal ArticleDOI
01 Oct 2016
TL;DR: A distributed computing framework and system for reverse-engineering of dynamic mechanistic GRN models from gene expression time-course data is developed called MultiGrain/MAPPER, based on a new architecture and tools supporting multiscale computing in a distributed computing environment.
Abstract: Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern systems biology investigations into mechanisms underlying gene regulation. A key task in this area is the automated inference or reverse-engineering of dynamic mechanistic GRN models from gene expression time-course data. Besides a lack of suitable data (in particular multi-condition data from the same system), one of the key challenges of this task is the computational complexity involved. The more genes in the GRN system and the more parameters a GRN model has, the higher the computational load. The computational challenge is likely to increase substantially in the near future when we tackle larger GRN systems. The goal of this study was to develop a distributed computing framework and system for reverse-engineering of GRN models. We present the resulting software called MultiGrain/MAPPER. This software is based on a new architecture and tools supporting multiscale computing in a distributed computing environment. A key feature of MultiGrain/MAPPER is the realization of GRN reverse-engineering based on the underlying distributed computing framework and multi-swarm particle swarm optimization. We demonstrate some of the features of MultiGrain/MAPPER and evaluate its performance using both real and artificial gene expression data.

7 citations


Cites background from "Reverse engineering of gene regulat..."

  • ...Automated reverse-engineering of dynamic mechanistic GRN models from gene-expression time-series data is becoming an area of growing interest in systems biology research [7, 8, 9, 10, 11]....

    [...]

  • ...(1) While the number of sampling points is important (typically, 10 to 50 time points are measured), far more important is to have multiple stimulus-response datasets from the same system under different stimuli [7]....

    [...]

References
Proceedings ArticleDOI
04 Oct 1995
TL;DR: The optimization of nonlinear functions using particle swarm methodology is described and implementations of two paradigms are discussed and compared, including a recently developed locally oriented paradigm.
Abstract: The optimization of nonlinear functions using particle swarm methodology is described. Implementations of two paradigms are discussed and compared, including a recently developed locally oriented paradigm. Benchmark testing of both paradigms is described, and applications, including neural network training and robot task learning, are proposed. Relationships between particle swarm optimization and both artificial life and evolutionary computation are reviewed.

14,477 citations

Journal ArticleDOI
01 Jan 1997
TL;DR: This paper provides a tutorial on data fusion, introducing data fusion applications, process models, and identification of applicable techniques.
Abstract: Multisensor data fusion is an emerging technology applied to Department of Defense (DoD) areas such as automated target recognition, battlefield surveillance, and guidance and control of autonomous vehicles, and to non-DoD applications such as monitoring of complex machinery, medical diagnosis, and smart buildings. Techniques for multisensor data fusion are drawn from a wide range of areas including artificial intelligence, pattern recognition, statistical estimation and other areas. This paper provides a tutorial on data fusion, introducing data fusion applications, process models, and identification of applicable techniques. Comments are made on the state-of-the-art in data fusion.

2,356 citations

Journal ArticleDOI
TL;DR: COPASI is presented, a platform-independent and user-friendly biochemical simulator that offers several unique features, and numerical issues with these features are discussed; in particular, the criteria to switch between stochastic and deterministic simulation methods, hybrid deterministic-stochastic methods, and the importance of random number generator numerical resolution in Stochastic simulation.
Abstract: Motivation: Simulation and modeling is becoming a standard approach to understand complex biochemical processes. Therefore, there is a big need for software tools that allow access to diverse simulation and modeling methods as well as support for the usage of these methods. Results: Here, we present COPASI, a platform-independent and user-friendly biochemical simulator that offers several unique features. We discuss numerical issues with these features; in particular, the criteria to switch between stochastic and deterministic simulation methods, hybrid deterministic--stochastic methods, and the importance of random number generator numerical resolution in stochastic simulation. Availability: The complete software is available in binary (executable) for MS Windows, OS X, Linux (Intel) and Sun Solaris (SPARC), as well as the full source code under an open source license from http://www.copasi.org. Contact: mendes@vbi.vt.edu

2,351 citations

Journal ArticleDOI
TL;DR: Gene regulatory networks have an important role in every process of life, including cell differentiation, metabolism, the cell cycle and signal transduction, and by understanding the dynamics of these networks the authors can shed light on the mechanisms of diseases that occur when these cellular processes are dysregulated.
Abstract: Gene regulatory networks have an important role in every process of life, including cell differentiation, metabolism, the cell cycle and signal transduction. By understanding the dynamics of these networks we can shed light on the mechanisms of diseases that occur when these cellular processes are dysregulated. Accurate prediction of the behaviour of regulatory networks will also speed up biotechnological projects, as such predictions are quicker and cheaper than lab experiments. Computational methods, both for supporting the development of network models and for the analysis of their functionality, have already proved to be a valuable research tool.

1,128 citations


"Reverse engineering of gene regulat..." refers methods in this paper

  • ...Because of its flexibility and advantageous properties [5], the models in this study are based on the ANN formalism [21]....

    [...]

Journal ArticleDOI
TL;DR: Test results indicate that many problems can be solved more efficiently using this scheme than with a single class of methods, and that the overhead of choosing the most efficient methods is relatively small.
Abstract: This paper describes a scheme for automatically determining whether a problem can be solved more efficiently using a class of methods suited for nonstiff problems or a class of methods designed for stiff problems. The technique uses information that is available at the end of each step in the integration for making the decision between the two types of methods. If a problem changes character in the interval of integration, the solver automatically switches to the class of methods which is likely to be most efficient for that part of the problem. Test results, using a modified version of the LSODE package, indicate that many problems can be solved more efficiently using this scheme than with a single class of methods, and that the overhead of choosing the most efficient methods is relatively small.

889 citations

Frequently Asked Questions (16)
Q1. What have the authors contributed in "Reverse engineering of gene regulation models from multi-condition experiments"?

In this study, the authors present a novel method capable of inferring robust GRN models from multi-condition GRN experiments. This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization. 

The expression rate is restricted to a certain interval where the sigmoidal transfer function maps the regulatory input to the expression interval. 

The criteria the authors applied to determine plausible dynamics were that a steady state was reached during the simulation, that no measurement increased without bound, and that the measured expression levels did not all stabilize at zero.

Because experiments generating gene expression time-course data are costly, the validation on independent data is frequently not reported in the literature. 

The purpose of this study is to demonstrate that by using multi-condition experimental data, it is possible to generate more robust GRN models. 

To realize the parameter estimation step of the algorithm, the authors used Copasi’s implementation of the particle swarm optimization (PSO) method [28]. 

Common rate laws to model the reaction kinetics (regulatory interactions) of GRN systems include the S-system, Hill functions, mass action kinetics, general rate law of transcription, and artificial neural network formulations [4].

Equipped with four training or learning data sets for each of the three GRN reference systems, the authors reverse engineered four individual GRN models for each system from the corresponding training data sets. 

The authors expect that the sampling frequencies and the number of conditions with which real GRN systems are being probed will continue to grow in the future. 

Acknowledgements: This work received funding from the EC's Seventh Framework Program (FP7/2007-2013) under grant agreement n° RI261507 and also from the Department for Employment and Learning, Northern Ireland.

This suggests that multi-condition experimental data (as described in this article) can be successfully used to produce accurate GRN models; because the authors have validated against two unseen data sets, this indicates a measure of confidence in the models' robustness.

The model describes a simple catabolic pathway which is induced by the presence of a functional intermediate acting as an inducer of the network. 

The advantage is that now the authors have more data to estimate the model parameters and hence are likely to produce more reliable estimates. 

Other general rate laws such as the Hill equation [30], GRLOT [6], SS [31] and general mass action will be assessed with the data combination method to determine which is more accurate at inferring network structure, predicting dynamic behavior or both. 

From this observation the authors are confident that the data combination method is not dependent on detailed data and can be applied successfully to sparse data sets without any loss of performance. 

The integration (simulation) Step 4 and the comparison Step 5 in the reverse-engineering algorithm are also relevant to the model validation stage, which comes after reverse-engineering has been completed.