Proceedings Article•DOI•

Reverse engineering of gene regulation models from multi-condition experiments

Noel Kennedy¹, Alexandru Mizeranschi¹, Paul Thompson¹, Huiru Zheng¹, Werner Dubitzky¹

Ulster University¹

16 Apr 2013, pp. 112-119

TL;DR: This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization to present a novel method capable of inferring robust GRN models from multi-condition GRN experiments.


Abstract: Reverse-engineering of quantitative, dynamic gene-regulatory network (GRN) models from time-series gene expression data is becoming important as such data are increasingly generated for research and other purposes. A key problem in the reverse-engineering process is the under-determined nature of these data. Because of this, the reverse-engineered GRN models often lack robustness and perform poorly when used to simulate system responses to new conditions. In this study, we present a novel method capable of inferring robust GRN models from multi-condition GRN experiments. This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization.


Summary


Introduction

A range of mathematical methods facilitating the reverse-engineering of quantitative, dynamic gene-regulatory network (GRN) models from time-series gene expression data have been reported in the literature [4].
In the present study, the authors investigate the reverse engineering of robust GRN models based on repeated measures from the same GRN system under different conditions.
Their approach combines different data types which represent the same experimental condition.
The authors' current study focuses on data combination, which utilizes all available data, rather than on a voting algorithm for the creation of a combined model.

B. Model equation and data generation

Common rate laws to model the reaction kinetics (regulatory interactions) of GRN systems include the S-system, Hill functions, mass action kinetics, general rate law of transcription, and artificial neural network formulations [4].
Equation (1) defines the ANN-based rate of change dXi/dt of transcript Xi of gene i within a GRN system of u genes.
For each of the three GRN systems depicted in Fig. 2, a single GRN reference model was manually created on the basis of the ANN rate law defined in Equation (1).
Each model was simulated with different parameter configurations multiple times using different initial conditions.
A great deal of thought went into this process, to ensure that the data is representative of current time-series gene expression experiments.

C. Reverse engineering

Equipped with four training or learning data sets for each of the three GRN reference systems, the authors reverse engineered four individual GRN models for each system from the corresponding training data sets.
These fixed weights are not subject to the optimization procedure (Step 3).
The algorithm then proceeds to optimize or estimate the remaining ANN weights { wij ≠ 0 } only for genes i and j that are known to interact in the underlying GRN system, plus the parameters { di } and { ki }.
The authors used Copasi [27] to code and represent the GRN models and implement the reverse-engineering algorithm described above.
The comparison Step 5 of the reverse-engineering algorithm calculates the deviation between each corresponding time-course in the learning data set L and the predicted or simulated time-course data.

D. Model validation

The training error is an indicator of how well the reverse-engineered model can replicate the data from which it was constructed.
A robust measure to assess how well the reverse-engineered model has captured the characteristics of the underlying system needs to determine the prediction error on unseen data.
Because experiments generating gene expression time-course data are costly, the validation on independent data is frequently not reported in the literature.
For each of the three GRN systems investigated in this study, the authors use two independent validation data sets (V1, V2) to estimate the generalization error, and hence the robustness, of their models, by averaging the total RMSE of the models on the two validation data sets.

E. Combined modeling algorithm

The steps described above explain the procedures the authors adopted to reverse-engineer and validate 3 times 4 individual models, 1 model for each of the 4 learning data sets for each of the 3 GRN systems depicted in Fig.
The training and validation errors the authors obtained are shown in TABLE III.
The purpose of this study is to demonstrate that by using multi-condition experimental data, it is possible to generate more robust GRN models.
The main steps of this algorithm are described below.
The authors illustrate this based on the 3-gene Vohradsky GRN system.

2. Formulation of combined model

Essentially, the first step of this combined-data GRN modeling algorithm creates a combined learning data set Λ with nL × u genes or gene expression time-series.
(Notice, the combined-data model is not yet their final combined GRN model MΛ.)
The combined model does not allow gene influences across the boundaries of the individual data sets; genes can only influence each other within the same individual training sets.
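
One way to picture this restriction is as a block-diagonal interaction mask: the known u × u topology is repeated once per learning data set, and every entry linking genes from different data sets stays switched off. The Python sketch below is illustrative only. The function name `combined_interaction_mask` and the boolean-matrix representation are assumptions made here and are not taken from the authors' implementation.

```python
def combined_interaction_mask(topology, n_sets):
    """Repeat a u x u interaction mask along the diagonal, once per data set.

    topology[i][j] is True when gene j regulates gene i (hypothetical
    convention). Entries that would link genes of different data sets
    stay False, so no influence crosses a data-set boundary.
    """
    u = len(topology)
    size = n_sets * u
    mask = [[False] * size for _ in range(size)]
    for s in range(n_sets):
        offset = s * u  # first row/column of the block for data set s
        for i in range(u):
            for j in range(u):
                mask[offset + i][offset + j] = topology[i][j]
    return mask
```

For a 3-gene system with four learning data sets, the mask would be 12 × 12 with four identical 3 × 3 blocks on the diagonal.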

3. Reverse-engineering of combined-data model

With the formulation of the combined-data model in place, the reverse-engineering algorithm described above is applied to determine/estimate the model’s parameters.
The advantage is that now the authors have more data to estimate the model parameters and hence are likely to produce more reliable estimates.

4. Creation of final combined model

The combined-data model specifies nL times as many model equations as the final combined GRN model MΛ.
The RMSE is shown for each reverse-engineered model against both the learning and validation data sets for 600 and 20 time points, respectively.
The observed results indicate the data combination approach to reverse engineering results in more robust GRN models than reverse engineering models from single-condition data sets.


Figures (4)

Fig. 1. Illustration of the process for reverse-engineering a GRN model from multi-condition experimental data (here with six conditions corresponding to four training or learning data sets and two validation data sets).

TABLE II. EXAMPLE OF COMBINING THE 4 DATA SETS OF THE VOHRADSKY 3 GENE MODEL. ONLY THE FIRST 6 TIMES STEPS ARE SHOWN.

Fig. 2. The three gene regulatory networks or GRN system investigated in this study


Reverse engineering of gene regulation models from

multi-condition experiments

Noel Kennedy, Alexandru Mizeranschi,

Paul Thompson, Huiru Zheng, Werner Dubitzky

University of Ulster, Northern Ireland, UK

Abstract—Reverse-engineering of quantitative, dynamic gene-

regulatory network (GRN) models from time-series gene

expression data is becoming important as such data are

increasingly generated for research and other purposes. A key

problem in the reverse-engineering process is the under-

determined nature of these data. Because of this, the reverse-

engineered GRN models often lack robustness and perform

poorly when used to simulate system responses to new

conditions. In this study, we present a novel method capable of

inferring robust GRN models from multi-condition GRN

experiments. This study uses two important computational

intelligence methods: artificial neural networks and particle

swarm optimization.

Keywords—Gene regulatory networks; reverse-engineering;

machine learning; multi-model fusion; optimization

I. INTRODUCTION

Regulation of gene expression (or gene regulation) refers to

processes that cells use to create functional gene products

(RNA, proteins) from the information stored in genes (DNA).

Gene regulation is essential for life as it increases the

versatility and adaptability of an organism by allowing it to

express protein when needed. While aspects of gene

regulation are well understood, many open research questions

still remain [1]. The dynamic behavior and regulatory

interactions of genes can be revealed by time-series

experiments, that is, experiments that measure the expression

of multiple genes over time [2]. In contrast to static gene

expression data, the modeling and simulation approach allows

the determination of stable states in response to a condition or

stimulus as well as the identification of pathways and

networks that are activated in the process [3]. A range of

mathematical methods facilitating the reverse-engineering of

quantitative, dynamic gene-regulatory network (GRN) models

from time-series gene expression data have been reported in

the literature [4]. Typical methods based on differential

equations include the S-system (SS), artificial neural networks

(ANN), and general rate law of transcription (GRLOT)

method [5] [6].

One of the issues in reverse-engineering GRN models is the

under-determined nature of the problem [7]. Essentially, this

means that for the given data and the differential equations

specifying the model, there is no unique solution to these

equations. A consequence of this is that models derived from

such data lack robustness. Thus, the predictive accuracy on

unseen data sets is often poor. Various approaches have been

employed to address this issue [8] [9]. The fact that the data is

normally noisy and that the reverse-engineering process

involves a non-deterministic element (optimization) is also a

factor that influences robustness, but it is not as fundamental

as the lack of complete information.

In the present study, we investigate the reverse engineering

of robust GRN models based on repeated measures from the

same GRN system under different conditions. We refer to

experiments that generate data in this way as multi-condition

experiments. The principal idea is that when the same GRN

system is subject to different (non-destructive) conditions or

stimuli, it will display a range of responses that together are

more characteristic for the underlying system properties than

a single response to a single stimulus. This concept is similar

to data fusion, which is a process that integrates multiple

sources of information representing the same real-world entity

into a consistent and accurate model of that entity [10].

However, whereas in conventional data fusion the merging of

information is normally achieved on the data level by a

straightforward join operation based on a common attribute or

key, combining time-series data is not that simple.

In this paper, we present a novel approach to exploit data

obtained from repeated GRN time-series measures under

varying conditions to infer robust GRN models. We

demonstrate the usefulness of this approach by comparing the

resultant multi-condition GRN models with the individual

single-condition models. At present it is still relatively costly

and time consuming to perform multi-condition experiments,

hence we base our study on artificial GRN time-series data

sets. While this is a limitation in the present study, we believe

that the results we obtained are still valid when applied to data

derived from real GRN systems. It is likely that in the future

the costs for multi-condition experiments will be lower and

that multi-condition experiments are therefore expected to

become commonplace.

II. REVERSE ENGINEERING AND DATA COMBINATION

Many reverse-engineering methods have been proposed in

recent years. Furthermore, different approaches have been

adopted in an attempt to improve the performance of these

methods and create more robust models, improve predictive

performance, and identify regulatory interactions from gene

expression measurements. Swain et al. [5] evaluated three

commonly used approaches which formulate dynamic GRN

models as ordinary differential equations (ODEs). The authors

assessed the ability of the different ODE structures to


978-1-4673-5875-0/13/$31.00

2013 IEEE

replicate the GRN systems’ regulatory structure and dynamic

gene expression behavior under varying conditions. The study

evaluated three commonly encountered ODE rate law

formulations: SS, ANN and GRLOT. The results suggest that

the ANN and GRLOT methods are superior to the SS method

in their ability to accurately predict network behavior. The

former two methods also produced more accurate network

structures from the underlying data than the SS method.

However, the study did not explore combining data from

multi-condition experiments. Although the research has

thoroughly investigated the rate law formalisms, relying on

sparse data based on only a single experiment may limit a

model’s usefulness in terms of the range of the experimental

conditions that can be successfully predicted by it.

Andrews et al. [11] use an artificial intelligence model

known as Cost Based Abduction (CBA) to generate GRN

models from multiple data sources. They successfully apply

their method to study the pheromone pathway in yeast using

protein-DNA data, protein-protein interaction data and gene

knock-out data. Their approach combines different data types

which represent the same experimental condition. Yeang et al.

[12] also explore the pheromone pathway in yeast by creating

annotated interaction graphs they call physical network

models. They combine multiple data types also. In our

approach we consider only one gene expression data type,

measurement of mRNA abundance, each data set representing

the system's response to a different experimental condition.

Ting and Low [13] compare two approaches, model

combination and data combination when multiple batches of

data are available. The former method uses the available data

sets creating a single model for each then combining the

output. The latter approach creates one model, using the

available data sets to train the model. The model combination

method involves estimating predictive accuracy for a given

instance through k-fold cross validation and creating a single

model from the instances achieving the highest predictive

performance (low error or high accuracy). They conclude that

the model combination approach is stronger when there is

only a small difference in the predictive error rate for each

model. Our current study focuses on data combination

utilizing all available data rather than a voting algorithm for

the creation of a combined model.

Peeling and Tucker [14] present a method for modeling

GRNs by forming a consensus Bayesian network model from

multiple microarray gene expression data sets. Their study

focuses on qualitative combination of Bayesian networks to

determine the dependency structure between genes. The

method of data combination presented in our study primarily

focuses on reproducing quantitative network dynamics and

predicting behavior. We have identified and discussed the

need to develop this method further to include structure

recovery in this article’s section on future work.

Steele et al. [15] developed a method for transforming

literature-based gene association scores to network prior

probabilities. The Bayesian networks developed from this

methodology therefore benefit from partial a priori knowledge

of regulatory interactions thus simplifying the reverse-

engineering process. Our current study reverse engineers

GRN models using known network topology. The literature-

based method therefore does not benefit the approach

presented in this study but can be considered in future work

concerning structure recovery.

Wang et al. [16] address the idea of combining multiple

time-series microarray data sets by developing a high-level

framework they refer to as the Gene Network Reconstruction

tool. The idea is to combine solutions from each data set into

an overall, consistent solution. The combination therefore

occurs at the model level rather than the data level and the

focus of their study is primarily on network structure.

Chen et al. [17] propose a two-step method for inferring

GRN models from multiple data sets. First, they infer

(optimize) a GRN structure (network topology) from each

data set, which is then combined into a single, final network.

They propose two methods based on computing the statistical

mean and mode for each resulting topology. The second step

consists of an additional optimization process that estimates

the parameters of the combined GRN model after discovering

its structure.

Gupta et al. [18] used multi-objective optimization to

integrate different methods for reverse-engineering. To

illustrate this, they used a combination of linear ODE and

correlation-based methods, using data from time-course and

gene inactivation (knock-out) experiments. The novel aspect

in this approach is the combination of different inference

methods into the same procedure, along with using

heterogeneous sources of input data.

Marbach et al. [19] investigated the way in which ensemble

networks resulting from reverse-engineering experiments

could be used to “vote” the topology of a combined GRN

model. For creating the ensemble network, they used an

evolutionary method called analog genetic encoding. They ran

50 iterations of this procedure, each time retaining the

network with the best fitness as part of the ensemble. Then,

they used ensemble voting to generate a new network,

showing that this network outperforms all initial members of

the ensemble.

Our novel approach is to combine the data of the multi-

condition experiments and reverse engineer a single model

from this. Initially our method is concerned with reproducing

network dynamics; structure recovery is subject to future

research. As we use multiple data sets for training the model,

we hypothesize that the resulting model should be more

robust, more accurate and open to a wider scope of

perturbation and behavior prediction than models generated

from a single data set.

III. APPROACH AND STUDY DESIGN

The approach in this investigation assumes that the biological GRN system, S, under study is provoked with a set of n distinct stimuli or conditions $C_1, C_2, \ldots, C_n$ to elicit the corresponding dynamic gene-regulatory responses $R_1, R_2, \ldots, R_n$. In this study, a condition $C_i$ is defined as a set of data values specifying the initial perturbation of the system variables, representing gene expression quantities.

2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) 113

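$$
R_i =
\begin{pmatrix}
r_{i11} & \cdots & r_{ij1} & \cdots & r_{iu1} \\
\vdots  &        & \vdots  &        & \vdots  \\
r_{i1k} & \cdots & r_{ijk} & \cdots & r_{iuk} \\
\vdots  &        & \vdots  &        & \vdots  \\
r_{i1m} & \cdots & r_{ijm} & \cdots & r_{ium}
\end{pmatrix}
$$

Each response $R_i$ represents a set of u time-series of transcript concentrations/abundances, where u denotes the number of genes in S, and m the number of time points sampled for each gene, and $r_{ijk}$ denotes a single measurement of gene j at time point k.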

For simplicity, we assume the same number of

measurements m for each gene in the considered time interval.

In general, this is not necessary, though. Conceptually, we

distinguish two types of scenarios: a concrete and an abstract

GRN system.

Concrete GRN system: The reverse-engineered GRN model represents a single concrete GRN system derived from a single individual, e.g. the cell cycle gene regulation network of a particular mouse cell. In this case, the multiple conditions are applied at time points $T_1 < T_2 < \ldots < T_n$, and the time interval $\Delta T = T_{i+1} - T_i$ between consecutive conditions is chosen large enough for the system to fully “recover” from the provocation with condition $C_i$. This implies that the condition has no lasting effect on the system (e.g. does not destroy the system).

Abstract GRN system: The reverse-engineered GRN

model represents a single abstract GRN system derived from a

collection of individuals that are assumed to be similar in some

important aspects. For example, the cell cycle gene regulation

networks of eight mouse cells (from one or more mice). In this

case, there is no restriction on the conditions applied and all

conditions may be applied in parallel.

Once the response data has been obtained, the process is

identical for reverse-engineering GRN models representing

concrete and abstract GRN systems. Based on the n response

data sets from n experiments (each applying a different

stimulus), we

1. Randomly determine $n_L$ training or learning data sets L and $n_V$ validation data sets V, such that $n = n_L + n_V$ (typically: $n_L > n_V$):
   $L = L_1, L_2, \ldots, L_{n_L}$
   $V = V_1, V_2, \ldots, V_{n_V}$

2. Generate a combined training or learning data set Λ.

3. Reverse-engineer from each of the $n_L$ training data sets a GRN model $M_1, M_2, \ldots, M_{n_L}$.

4. Reverse-engineer the final model $M_\Lambda$ from the combined learning data set Λ.

5. Validate (determine average accuracy or error) the models generated in Steps (3) and (4) against the validation data sets V.
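
Steps 1 and 2 of this design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each data set is assumed to be a list of m rows (time points) holding u expression values, the function names `split_responses` and `combine` are hypothetical, and the combined set is assumed to place the genes of all learning sets side by side so that it holds $n_L \times u$ time-series.

```python
import random


def split_responses(responses, n_val, seed=None):
    """Step 1: randomly split the n response data sets into (L, V)."""
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


def combine(learning_sets):
    """Step 2: build a combined data set with n_L * u columns.

    Assumes every data set was sampled at the same m time points.
    """
    m = len(learning_sets[0])
    if any(len(ds) != m for ds in learning_sets):
        raise ValueError("all data sets must share the same time points")
    # Row t of the result concatenates row t of every learning data set.
    return [[value for ds in learning_sets for value in ds[t]]
            for t in range(m)]
```

With six response data sets and `n_val=2`, `split_responses` returns four learning sets and two validation sets, matching the configuration used in this study.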

Fig. 1. Illustration of the process for reverse-engineering a GRN model from

multi-condition experimental data (here with six conditions corresponding to

four training or learning data sets and two validation data sets).

Fig. 1 depicts the basic study design we adopted to explore and evaluate our approach. In this particular case we used $n_L = 4$ and $n_V = 2$. In the diagram, the notation $M_i(V_j)$ denotes the data set created by simulating model $M_i$ with the initial condition from data set $V_j$.

A. GRN Modeling

We use the term GRN systems to refer to gene-regulatory

networks that describe regulatory gene-gene interactions

without explicit representation of intermediary elements such

as metabolites, nuclear receptors and transcription factors,

which combine to direct and catalyze the reactions between

genes [20]. The simulated data generated from the reference

models represents measured mRNA abundance over time.

Under this modeling assumption, one gene can either activate

or repress another gene directly or indirectly (via other genes).

Fig. 2 shows the network topology of the three GRN systems

investigated in this study. System A describes a simple 3-gene GRN system [21], and System B a 5-gene GRN system [22]. System C describes a 7-gene GRN system based on the bile acid and xenobiotic system (BAXS) [23]. TABLE I maps the nodes in the BAXS network to their corresponding genes and gene products.

TABLE I. ELEMENTS OF BAXS GRN NETWORK

Node    Gene      Gene product
        NR0B2     SHP1
        NR1I2     PXR
        NR1H4     FXR
        ABCB1     MDR1
        ABCC2     MRP2
        ABCB11    BSEP
        CYP3A4    CYP3A4

Legend: Transcriptional regulation (activation)
Transcriptional regulation (repression)

Fig. 2. The three gene regulatory networks or GRN system investigated in this study

The BAXS describes a genetic network that facilitates two distinct but intimately overlapping physiological processes: the enterohepatic circulation and maintenance of bile acid concentrations, and the detoxification and removal from the body of harmful xenobiotic (e.g. drugs, pesticides) and endobiotic compounds such as steroid hormones [24]. The model describes a simple catabolic pathway which is induced by the presence of a functional intermediate acting as an inducer of the network. Such an inducer could be either an endogenous or exogenous substance, e.g. lithocholic acid (LCA), a secondary bile acid which activates transcription of both NR1I2 and NR1H4 [25], leading to activation of CYP3A4. This results in the production of enzymes which metabolize the inducer in this network, e.g. LCA [26], thus switching off the network after a period of time. This is represented in the model as repression of NR1I2 and NR1H4 by CYP3A4: as the CYP3A4 enzyme metabolizes the inducer, there is no further activation of these genes.

B. Model equation and data generation

Common rate laws to model the reaction kinetics (regulatory interactions) of GRN systems include the S-system, Hill functions, mass action kinetics, general rate law of transcription, and artificial neural network formulations [4]. Because of its flexibility and advantageous properties [5], the models in this study are based on the ANN formalism [21]. Equation (1) defines the ANN-based rate of change $dX_i/dt$ of transcript $X_i$ of gene i within a GRN system of u genes.

$$
\frac{dX_i}{dt} = \frac{v_i}{1 + \exp\left[-\left(\sum_{j=1}^{u} w_{ij} X_j + d_i\right)\right]} - k_i X_i \qquad (1)
$$

where

u defines the number of genes in the GRN system to be modeled, i = 1, …, u.

$v_i$ denotes the maximal expression rate of gene i.

$X_j$ denotes the gene product of gene j influencing the product, $X_i$, of gene i, with: j = 1, …, u.

$w_{ij}$ denotes the strength of control or regulation of gene j on gene i. Positive values indicate activating, negative values repressing control.

$d_i$ defines an external influence on gene i, which modulates the gene’s sensitivity of response to activating or repressing influences. The higher $|d_i|$, the lower the influence of the weights $w_{ij}$ on gene i. In GRN modeling, $d_i$ is sometimes interpreted as reaction delay parameter, as it shifts the sigmoidal transfer function along the horizontal time axis, thus determining how fast the gene’s expression level responds.

$k_i$ denotes the degradation rate constant of the i-th gene expression product.

Equation (1) defines a rate law capable of describing the

dynamic behavior of GRN systems. The ANN rate law

represents and calculates expression rates based on the

weighted sum of multiple regulatory inputs. This additive

input processing is able to represent logical disjunctions. The

expression rate is restricted to a certain interval where the

sigmoidal transfer function maps the regulatory input to the

expression interval. The external input, $d_i$, regulates the sensitivity to the summed regulatory input of all genes.
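
To make the rate law concrete, the following Python sketch evaluates the right-hand side of Equation (1) and integrates it over time. It is a minimal illustration under stated assumptions. The argument names `v`, `w`, `d` and `k` (maximal expression rates, weights, external inputs, degradation constants) are naming choices of this sketch, and a fixed-step classical Runge-Kutta (RK4) integrator stands in for the LSODA integrator that the study uses through Copasi.

```python
import math


def ann_rates(x, v, w, d, k):
    """Right-hand side of the ANN rate law for every gene i."""
    u = len(x)
    rates = []
    for i in range(u):
        # Weighted sum of all regulatory inputs plus the external input d_i.
        s = sum(w[i][j] * x[j] for j in range(u)) + d[i]
        # The sigmoid bounds synthesis to (0, v_i); k_i * x_i is degradation.
        rates.append(v[i] / (1.0 + math.exp(-s)) - k[i] * x[i])
    return rates


def simulate(x0, v, w, d, k, dt=1.0, steps=600):
    """Integrate with fixed-step RK4; returns steps + 1 states."""
    def shifted(x, dx, h):
        return [xi + h * dxi for xi, dxi in zip(x, dx)]

    x = list(x0)
    traj = [list(x)]
    for _ in range(steps):
        k1 = ann_rates(x, v, w, d, k)
        k2 = ann_rates(shifted(x, k1, dt / 2), v, w, d, k)
        k3 = ann_rates(shifted(x, k2, dt / 2), v, w, d, k)
        k4 = ann_rates(shifted(x, k3, dt), v, w, d, k)
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + e)
             for xi, a, b, c, e in zip(x, k1, k2, k3, k4)]
        traj.append(x)
    return traj
```

For a single unregulated gene (all weights and the external input set to zero), the sigmoid evaluates to 1/2, so the expression level settles at v/(2k), which is a quick sanity check on any parameter set.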

For each of the three GRN systems depicted in Fig. 2, a

single GRN reference model was manually created on the

basis of the ANN rate law defined in Equation (1). Each GRN

reference model serves as a surrogate for the corresponding

biological GRN system to facilitate the generation of

artificial dynamic gene expression data. The parameters for

each reference model were determined manually through a

process of trial and error. Each model was simulated with

different parameter configurations multiple times using

different initial conditions. After visual inspection of the

generated data, the parameter values were updated and the

reference models simulated again. This experimentation

continued until plausible dynamics were identified. The

criteria we applied to determine plausible dynamics were that a steady state was reached during the simulation, that no measurement increased without bound, and that not all measured expression levels stabilized at zero.

We then used the three GRN reference models to create six

time-series gene expression data sets based on different

experimental conditions (initial values of transcript

abundance). Visual inspection was again applied until six

different system behaviors that were both plausible and

sufficiently different were identified. The reference models

were simulated over 600 sampling time points, each simulated

Δt representing 1 second.


Through this process, we generated 6 time-series data sets

for each of the three GRN systems representing the response

of the systems to varying experimental conditions. From each

of the three groups of six data sets, we randomly selected four

as training or learning sets, and two as validation data sets.

This configuration of data sets was viewed as a reasonable

compromise: (a) on one hand, we required a number of

training data sets under different conditions to be able to

capture the underlying intricacies of the system under

investigation, (b) on the other hand, we did not want to rely

only on a single data set as independent validation data set.

Current time-series gene expression experiments typically

sample somewhere between 10 and 30 time points; in rare

cases several dozen time points are measured. After our initial

exploration with 600 sampling points, we repeated the

generation of 3 times 6 data sets, sampling 20 time points per

gene over the explored total time interval. This sampling

frequency provides more realistic data sets which are in line

with current protocols for time-series gene expression

experiments.

By simulating the reference models over 600 sampling time

points and then over 20 sampling time points, we can

compare the accuracy of models reverse-engineered from

both detailed and sparse data sets. This approach allows us to

determine if the data combination method investigated in this

work is sensitive to the amount of available data. Although

these data sets have been created artificially, a great deal of

thought went into this process, to ensure that the data is

representative of current time-series gene expression

experiments. We expect that the sampling frequencies and the

number of conditions with which real GRN systems are being

probed will continue to grow in the future.

C. Reverse engineering

Equipped with four training or learning data sets for each of

the three GRN reference systems, we reverse engineered four

individual GRN models for each system from the

corresponding training data sets. Each reverse-engineering

process is essentially a parameter estimation or optimization

process applying the following algorithm.

Given a learning data set L and the parameters M of the GRN model in Equation (1) (the weights $\{w_{ij}\}$ and the per-gene parameters), where $i, j \in \{1, \ldots, u\}$ (u being the number of genes in the model):

1. Set and fix topology (weight) parameters to zero for gene pairs that do not interact: $(\forall\, i, j \in \{1, \ldots, u\})\ (w_{ij} = 0 \mid \text{no\_interaction}(i, j))$.

2. Initialize remaining parameters. M ← initialize.

3. Modify non-fixed model parameters using particle

swarm optimization: M ← PSO.

4. Use M to simulate time course data set, Ǔ, for all genes

based on the initial values (condition) of L:

Ǔ ← simulate(L)

5. Compare simulated data Ǔ with the learning data L:

error ← compare(Ǔ,L). IF error is not sufficiently small

and maximum iterations are not reached, GO TO Step 3,

otherwise finish learning and GO TO Step 6.

6. Store M as GRN model.
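
The loop formed by Steps 1 to 6 can be sketched as follows. This is a generic global-best particle swarm optimizer written for illustration, not Copasi's implementation. The function names `pso_minimize` and `expand_weights`, the inertia and acceleration constants (common textbook values), and the default swarm size and iteration limit are all assumptions of this sketch. The `error` callable is assumed to wrap Steps 4 and 5, that is, to simulate the model for a candidate parameter vector and return its deviation from the learning data.

```python
import random


def pso_minimize(error, bounds, particles=30, iterations=200, tol=1e-8, seed=0):
    """Minimal global-best PSO; `error` maps a parameter list to a float."""
    rng = random.Random(seed)
    inertia, c_self, c_swarm = 0.7, 1.5, 1.5  # assumed textbook settings
    # Step 2: initialize the free (non-fixed) parameters at random.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0] * len(bounds) for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_err = [error(p) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    for _ in range(iterations):      # stop at the maximum iteration count
        if g_err <= tol:             # or once the error is small enough
            break
        for n in range(particles):
            for a, (lo, hi) in enumerate(bounds):
                # Step 3: pull each particle toward its own and the swarm's best.
                vel[n][a] = (inertia * vel[n][a]
                             + c_self * rng.random() * (best_pos[n][a] - pos[n][a])
                             + c_swarm * rng.random() * (g_pos[a] - pos[n][a]))
                pos[n][a] = min(hi, max(lo, pos[n][a] + vel[n][a]))
            err = error(pos[n])      # Steps 4-5 happen inside `error`
            if err < best_err[n]:
                best_pos[n], best_err[n] = pos[n][:], err
                if err < g_err:
                    g_pos, g_err = pos[n][:], err
    return g_pos, g_err              # Step 6: the best parameters found


def expand_weights(free_values, interacts):
    """Step 1: weights of non-interacting gene pairs stay fixed at zero."""
    it = iter(free_values)
    return [[next(it) if flag else 0.0 for flag in row] for row in interacts]
```

Keeping the fixed zero weights out of the search vector, as `expand_weights` does, means the optimizer only ever sees the parameters that are actually free.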

Notice, in Step 1 of the reverse-engineering algorithm

described above, the ANN weights are fixed in such a way

that for genes that are known not to interact, the weight is set

to zero. These fixed weights are not subject to the

optimization procedure (Step 3). The algorithm then proceeds to optimize or estimate the remaining ANN weights $\{w_{ij} \neq 0\}$ only for genes i and j that are known to interact in the underlying GRN system, plus the parameters $\{d_i\}$ and $\{k_i\}$. In other words, we assume that the interaction topology or

network is known. This is a simplification to limit the

computational complexity. In a future implementation of this

algorithm, we will explore problems without this constraint.

For each learning data set, which represents a different

condition on the same concrete GRN system, a GRN model

was reverse-engineered. We used Copasi [27] to code and

represent the GRN models and implement the reverse-engineering algorithm described above. To realize the parameter

estimation step of the algorithm, we used Copasi’s

implementation of the particle swarm optimization (PSO)

method [28]. Copasi also provides numerical integration

methods needed in Step 4 of the algorithm, in which the time-

course is predicted or simulated based on the initial values

specified in the learning data set L. For deterministic

solutions, the LSODA integrator is used [29]. The comparison

Step 5 of the reverse engineering algorithm calculates the

deviation (error) between each corresponding time-course in

the learning data set L and the predicted or simulated time-

course data Ǔ. To calculate the error, we applied the

commonly used root mean squared error (RMSE) measure.

Since our models consist of u genes, we essentially calculate

the total RMSE by dividing the sum of all individual RMSEs

by u.
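Written out, the averaging described above might look as follows. The notation is assumed here for illustration and is not taken from the paper: $x_i(t_k)$ denotes the value of gene i at sample k in the learning data set L, $\check{x}_i(t_k)$ the corresponding simulated value in Ǔ, and T the number of time points per time course.

```latex
\mathrm{RMSE}_i = \sqrt{\frac{1}{T}\sum_{k=1}^{T}\bigl(\check{x}_i(t_k) - x_i(t_k)\bigr)^2},
\qquad
\mathrm{RMSE}_{\mathrm{total}} = \frac{1}{u}\sum_{i=1}^{u}\mathrm{RMSE}_i
```

For example, with u = 2 genes and individual errors of 0.10 and 0.30, the total RMSE would be (0.10 + 0.30) / 2 = 0.20.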

The integration (simulation) Step 4 and the comparison

Step 5 in the reverse-engineering algorithm are also relevant to

the model validation stage, which comes after reverse-

engineering has been completed.

D. Model validation

The training error is an indicator of how well the reverse-

engineered model can replicate (simulate) the data from

which it was constructed. A robust measure to assess how

well the reverse-engineered model has captured the

characteristics of the underlying system needs to determine

the prediction error on unseen data. Because experiments

generating gene expression time-course data are costly, the

validation on independent data is frequently not reported in

the literature. For each of the three GRN systems investigated

in this study, we have used two independent validation data

sets (V) to estimate the generalization error, and hence the

116 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)


Frequently Asked Questions (16)

Q1. What have the authors contributed in "Reverse engineering of gene regulation models from multi-condition experiments"?

In this study, the authors present a novel method capable of inferring robust GRN models from multi-condition GRN experiments. This study uses two important computational intelligence methods: artificial neural networks and particle swarm optimization.

Q2. What is the expression rate of a gene?

The expression rate is restricted to a certain interval where the sigmoidal transfer function maps the regulatory input to the expression interval.

Q3. What criteria were used to determine plausible dynamics?

The criteria the authors applied to determine plausible dynamics were that a steady state was reached during the simulation, that no measurement increased without bound, and that the measured expression levels did not all stabilize at zero.

Q4. Why is the validation of the GRN model not reported in the literature?

Because experiments generating gene expression time-course data are costly, the validation on independent data is frequently not reported in the literature.

Q5. What is the purpose of this study?

The purpose of this study is to demonstrate that by using multi-condition experimental data, it is possible to generate more robust GRN models.

Q6. What is the important step in the reverse engineering algorithm?

To realize the parameter estimation step of the algorithm, the authors used Copasi’s implementation of the particle swarm optimization (PSO) method [28].

Q7. What are the common rate laws for the model of gene-regulatory interactions?

Common rate laws to model the reaction kinetics (regulatory interactions) of GRN systems include the S-system, Hill functions, mass action kinetics, general rate law of transcription, and artificial neural network formulations [4].

Q8. How many GRN models are used in the reverse engineering process?

Equipped with four training or learning data sets for each of the three GRN reference systems, the authors reverse engineered four individual GRN models for each system from the corresponding training data sets.

Q9. What is the expected growth of the GRN system?

The authors expect that the sampling frequencies and the number of conditions with which real GRN systems are being probed will continue to grow in the future.

Q10. What is the funding source for this work?

This work received funding from the EC's Seventh Framework Program (FP7/2007-2013) under grant agreement n° RI261507 and also from the Department for Employment and Learning, Northern Ireland.

Q11. How many models have been validated against two unseen data sets?

This suggests that multi-condition experimental data (as described in this article) can be successfully used to produce accurate GRN models. Because the authors validated the models against two unseen data sets, the result also gives a measure of confidence in the models' robustness.

Q12. What is the model of a simple catabolic pathway?

The model describes a simple catabolic pathway which is induced by the presence of a functional intermediate acting as an inducer of the network.

Q13. What is the advantage of the combined data model?

The advantage is that now the authors have more data to estimate the model parameters and hence are likely to produce more reliable estimates.

Q14. What is the accurate method for estimating network structure?

Other general rate laws such as the Hill equation [30], GRLOT [6], SS [31] and general mass action will be assessed with the data combination method to determine which is more accurate at inferring network structure, predicting dynamic behavior or both.

Q15. How can the authors determine the reliability of the data combination method?

From this observation the authors are confident that the data combination method is not dependent on detailed data and can be applied successfully to sparse data sets without any loss of performance.

Q16. What is the purpose of the reverse engineering algorithm?

The integration (simulation) Step 4 and the comparison Step 5 in the reverse-engineering algorithm are also relevant to the model validation stage, which comes after reverse-engineering has been completed.
