Formation and dynamics of modules in a dual-tasking multilayer feed-forward neural network
Chi-Hang Lam and F. G. Shin
Department of Applied Physics, Hong Kong Polytechnic University, Hung Hom, Hong Kong
(Received 17 April 1998)
We study a feed-forward neural network for two independent function approximation tasks. Upon training,
two modules are automatically formed in the hidden layers, each handling one of the tasks predominantly. We
demonstrate that the sizes of the modules can be dynamically driven by varying the complexities of the tasks.
The network serves as a simple example of an artificial neural network with an adaptable modular structure.
This study was motivated by the related dynamical nature of modules in animal brains. [S1063-651X(98)14409-4]
PACS number(s): 87.10.+e, 02.50.-r, 05.20.-y
I. INTRODUCTION
Training of neural networks for complicated tasks is often
difficult due to the large number of local minima in the error
function landscapes [1,2]. There has been much effort in
searching for efficient learning algorithms and better network
architectures. An interesting idea put forward by Jacobs
et al. is the use of modular neural networks [3]. In this approach, a complicated task is broken down into several simpler subtasks. The whole network consists of modules called expert networks, each of which learns to solve only one of the subtasks. Their outputs are connected to the overall network
output via a gate network. The duty of the gate network is to
select which expert network is to be consulted for any given
input pattern. The expert and the gate networks are trained
simultaneously to achieve coherently the inter-related pro-
cesses of task decomposition, assignment of subtasks to the
modules, and the actual learning of the task. Improved learn-
ing algorithms for modular neural networks and examples of
applications to control problems and speech recognition are
discussed in Ref. [4]. Apart from artificial neural networks,
animal brains also have high level modular structures con-
trolling various body functions [5]. At a lower level, it has
also been suggested recently that the human brain adopts a
modular decomposition strategy to learn sets of visuomotor
maps for relating visual inputs to motor outputs under vari-
ous conditions [6].
To construct a modular artificial neural network, both the
overall modular structure and the architecture of the indi-
vidual components in general have to be carefully defined
before training commences. Spontaneous formation of mod-
ules from a homogeneous network has also been investigated
by Ishikawa [7]. The author adopted a weight-decay training
approach. A group of neurons is considered to have formed a
module when most connections to other groups of neurons
have decayed away. Once formed, the modular structure re-
mains unchanged.
In contrast, regions in an animal brain responsible for various body functions are found to retain a certain fluidity even in adults. In an experiment by Fox [8], regions in a
monkey’s brain responsible for various fingers were mapped.
A middle finger was then amputated. After a few weeks, the
regions corresponding to adjacent fingers surprisingly ex-
panded into the region previously controlling the middle fin-
ger. This result indicates that fine details in the modular
structure of an animal brain are not hard coded genetically
and some rezoning may be allowed to adapt to the changing
environment or conditions.
Ideally, a modular neural network with a dynamical archi-
tecture may offer enhanced performance. It may automati-
cally decompose a complex task into smaller ones with the
least design effort but the best adaptability in a continuously
changing environment. To some extent, an animal brain may
be an example. Brains are by far the most powerful neural
networks and often inspire advances in their artificial coun-
terparts. Motivated by the fluidity of their modular struc-
tures, we have in this work constructed and studied an arti-
ficial neural network that exhibits analogous adaptable
modular structures. In our network, modules are formed
upon training and their sizes change when the complexities
of the associated tasks vary with time. In the investigations
of both Jacobs et al. and Ishikawa, the modular structure of
the neural networks remains unchanged once they are de-
fined or generated. Our construction is to our knowledge the
first example of a neural network with an adaptable modular
architecture. However, at present, the dynamics only occur
for some specific tasks, network architectures and training
parameters. We also obtain no improvement in the efficiency
of training. Therefore, our results may be of limited imme-
diate practical interest.
In Sec. II, we specify the architecture of our neural net-
work, the tasks to be learned, and the training method. Sec-
tion III explains the formation of modules. Section IV de-
scribes their dynamical properties when the complexities of
the tasks become time dependent. We conclude in Sec. V
with some further discussion.
II. NETWORK ARCHITECTURE AND TRAINING
We focus on a multilayer feed-forward neural network for
function approximations. The network architecture is shown
in Fig. 1. The circles denote the neurons and the lines repre-
sent the synaptic connections. The network is composed of
an input layer, three hidden layers, and an output layer. Note
that the neurons in a hidden layer are only connected to
neighboring ones in the next layer. This makes the locations
of the neurons physically significant and is essential for the
generation of any spatial modular pattern. The values received by the two input neurons are denoted by $\xi_1$ and $\xi_2$,
PHYSICAL REVIEW E VOLUME 58, NUMBER 3 SEPTEMBER 1998
1063-651X/98/58(3)/3673(5)/$15.00    PRE 58    3673    © 1998 The American Physical Society

respectively. The output $V_i^{(1)}$ of the $i$th neuron in the first hidden layer is given by

$V_i^{(1)} = \tanh\left[ \sum_{j=1}^{2} w_{ij}^{(0)} \xi_j + \theta_i^{(0)} \right]$,  (1)
where $w_{ij}^{(0)}$ and $\theta_i^{(0)}$ are the weight and bias, respectively. The outputs of the neurons in the second and the third hidden layers, corresponding to $m = 2$ and 3 respectively, are given by

$V_i^{(m)} = \tanh\left[ \sum_{j=1}^{6} w_{ij}^{(m-1)} V_j^{(m-1)} + \theta_i^{(m-1)} \right]$.  (2)
The weight $w_{ij}^{(m-1)}$ is zero if the corresponding synaptic connection does not exist. The overall network outputs $O_1$ and $O_2$ are computed from

$O_i = \sum_{j=1}^{6} w_{ij}^{(3)} V_j^{(3)} + \theta_i^{(3)}$.  (3)
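As a concrete illustration, Eqs. (1)-(3) can be transcribed as a short forward-pass routine. This is a minimal sketch rather than the authors' code: the connectivity masks standing in for the neighbor-only wiring of Fig. 1 are left fully connected here, and all names are ours.

```python
import numpy as np

def forward(xi, W, theta, masks):
    """Forward pass of the network of Eqs. (1)-(3): 2 inputs,
    three hidden layers of 6 tanh neurons, 2 linear outputs.
    masks[m] zeroes the weights of absent synaptic connections."""
    V = np.tanh((W[0] * masks[0]) @ xi + theta[0])      # Eq. (1)
    for m in (1, 2):
        V = np.tanh((W[m] * masks[m]) @ V + theta[m])   # Eq. (2), m = 2, 3
    return (W[3] * masks[3]) @ V + theta[3]             # Eq. (3)

# Example with random parameters and full connectivity masks.
rng = np.random.default_rng(0)
sizes = [2, 6, 6, 6, 2]
W = [rng.normal(size=(sizes[m + 1], sizes[m])) for m in range(4)]
theta = [rng.normal(size=sizes[m + 1]) for m in range(4)]
masks = [np.ones_like(w) for w in W]
O = forward(np.array([0.3, 0.7]), W, theta, masks)      # approximates (T1, T2)
```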
This network is capable of approximating an $\mathbb{R}^2 \to \mathbb{R}^2$ function. However, we limit our consideration to the particular case of approximating two $\mathbb{R} \to \mathbb{R}$ functions. Specifically, $O_1$ and $O_2$ are supposed to approximate two target functions $T_1(\xi_1)$ and $T_2(\xi_2)$, respectively. Furthermore, we consider the case in which $\xi_1$ and $\xi_2$ are uncorrelated inputs. The target function $T_1(\xi_1)$ is thus completely independent of $\xi_2$, and similarly $T_2(\xi_2)$ is independent of $\xi_1$. Therefore, the problem has a natural decomposition into two uncorrelated subtasks of function approximation.
All target functions studied in this work belong to a family of sawtooth functions defined by

$f_r(\xi) = h_4\!\left[\tfrac{1}{2}(\xi + r - 1)\right]$,  (4)

where $h_4$ denotes the fourfold composite $h(h(h(h(x))))$ of the function $h(x) = |rx - 1|$. The complexity of $f_r$ depends strongly on the parameter $r$. Figure 2 shows some examples. At $r = 1$, it is a simple straight line. The complexity increases monotonically with $r$ until it becomes a complicated sawtooth function at $r = 2$.
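In code, Eq. (4) is a fourfold iteration of $h(x) = |rx - 1|$ applied to $(\xi + r - 1)/2$; the transcription below is a sketch with our own names.

```python
def f_r(xi, r):
    """Sawtooth target function of Eq. (4):
    f_r(xi) = h4[(xi + r - 1)/2], where h4 is the fourfold
    composite of h(x) = |r*x - 1|."""
    x = 0.5 * (xi + r - 1.0)
    for _ in range(4):           # apply h four times
        x = abs(r * x - 1.0)
    return x
```

At $r = 1$ the iteration collapses to the straight line $f_1(\xi) = \xi/2$, while at $r = 2$ repeated folding of $|2x - 1|$ produces the complicated sawtooth of Fig. 2.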
The weights and biases in the network are first initialized to random values. Training is conducted using a backpropagation algorithm in on-line mode with a momentum term of 0.9 [1]. In this approach, a set of input values $\xi_1$ and $\xi_2$ is chosen randomly and independently in the range $0 \le \xi_i \le 1$ at each time step. The input pattern is then presented to the network, and the resulting output error guides a single step of adjustment of every weight $w_{ij}^{(m)}$ and bias $\theta_i^{(m)}$. A similar adjustment is made at every time step using a new random input pattern. This approach is effectively a steepest-descent method minimizing the output error
$E = \langle (O_1 - T_1)^2 + (O_2 - T_2)^2 \rangle$.  (5)
The brackets denote averaging over all input patterns. It is customary to introduce noise into the training process to avoid local minima [1]. In our case, random excitations are applied to inactive neurons to recover their activity. Specifically, we inspect a randomly chosen neuron in the hidden layers at each time step. The neuron is identified as inactive if either its recent outputs have an rms fluctuation smaller than 0.3 or the sum of the magnitudes of all its connections is smaller than 0.15. An inactive neuron is excited by updating every immediately associated weight $w_{ij}^{(m)}$ to $0.98\,w_{ij}^{(m)} + \eta_{ij}^{(m)}$, where $\eta_{ij}^{(m)}$ is a uniform random variable in the range $\pm 0.2$.
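The inactivity test and excitation can be sketched as below. This is our reading rather than the authors' code: the length of the "recent outputs" window is unspecified in the text, and treating "all its connections" as both incoming and outgoing weights is our assumption.

```python
import numpy as np

def excite_if_inactive(recent_outputs, w_in, w_out, rng):
    """Noise rule of Sec. II applied to one hidden neuron.
    Inactive if the rms fluctuation of recent outputs is < 0.3 or the
    summed magnitude of associated connections is < 0.15; if so, every
    associated weight w becomes 0.98*w + eta, eta uniform in [-0.2, 0.2]."""
    rms = np.std(recent_outputs)
    strength = np.sum(np.abs(w_in)) + np.sum(np.abs(w_out))
    if rms < 0.3 or strength < 0.15:
        w_in[:] = 0.98 * w_in + rng.uniform(-0.2, 0.2, size=w_in.shape)
        w_out[:] = 0.98 * w_out + rng.uniform(-0.2, 0.2, size=w_out.shape)
        return True
    return False

# A neuron with constant output and vanished weights gets re-excited.
rng = np.random.default_rng(1)
w_in, w_out = np.zeros(6), np.zeros(6)
fired = excite_if_inactive(np.full(50, 0.1), w_in, w_out, rng)
```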
III. MODULE FORMATION
We now discuss the process of modularization of our neural network during training. Identical target functions $T_1 = T_2 = f_r$ with $r = 1.5$ are considered. We have already explained in Sec. II that approximating the two functions constitutes uncorrelated tasks, since the inputs are independent. The sawtooth function $f_r$ defined in Eq. (4) is moderately complicated for $r = 1.5$. We perform backpropagation training until time $t = 3 \times 10^6$, well after the output errors have converged apart from some random fluctuations. Figure 3(a) shows the resulting network in a typical run. The intensity of a line representing a connection is proportional to the absolute value of the corresponding weight. In the figure, weights of magnitude less than 0.05 are invisible, while those larger than 1 are completely darkened. Some allowed connections have practically vanished, so that the network clearly decomposes into two modules. Neurons are represented by open or filled
FIG. 1. Architecture of a feed-forward network for approximations of two independent functions.
FIG. 2. Family of sawtooth functions $f_r$ for $r = 1.0$, 1.5, and 2.0. The complexity of the function increases with $r$.

circles depending on which module they belong to. In this
particular run, the modules contain 6 and 12 hidden neurons,
respectively. The performance of a module can be evaluated
from the error
e
1
or
e
2
given by
e
i
5
^
u
O
i
2 T
i
u
&
, ~6!
where the averaging is over all input patterns. We obtain
e
1
5 0.034 and
e
2
5 0.008 in this run. Since the two tasks are
identical, a larger module usually gives a smaller error as is
observed here.
Figure 3(b) shows another realization trained under the same conditions. In this case, the modules are noncompact, in contrast to the compact ones obtained previously. Both modules have 9 hidden neurons, and the output errors are $\epsilon_1 = 0.042$ and $\epsilon_2 = 0.010$, respectively. In fact, noncompact modules are not favored energetically, since they usually give slightly larger errors due to their fewer internal connections. They are formed occasionally for entropic reasons.
IV. DYNAMICS OF MODULES
We now demonstrate that the modules in our neural network can expand or shrink if the associated tasks vary with time. The target functions are set to $T_1 = f_{r_1}$ and $T_2 = f_{r_2}$ in the same family defined in Eq. (4). The parameters $r_1$ and $r_2$ are given by

$r_1 = 1.5 + R \sin\theta$,
$r_2 = 1.5 - R \sin\theta$,  (7)

where $\theta = (2\pi t/T) \bmod 2\pi$ is a time-dependent phase angle, and $R$ and $T$ are the amplitude and the period of the variation, respectively. Training proceeds continuously while the tasks are slowly varying, since one backpropagation step is executed at each time step. The periodic variation in the tasks hence directly implies periodic changes in the weights and biases of the network.
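In code, the schedule of Eq. (7) reads (a direct transcription; names are ours):

```python
import math

def task_params(t, R, T):
    """Eq. (7): complexity parameters of the two tasks at time t."""
    theta = (2.0 * math.pi * t / T) % (2.0 * math.pi)
    r1 = 1.5 + R * math.sin(theta)
    r2 = 1.5 - R * math.sin(theta)
    return theta, r1, r2
```

At a quarter period, $t = T/4$, this gives $\theta = \pi/2$ and, for $R = 0.4$, the values $r_1 = 1.9$ and $r_2 = 1.1$ used below.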
We first consider an amplitude $R = 0.4$ and a period $T = 5 \times 10^6$. This is a rather long period, and the system is not far from the quasistatic limit. We discard any data for time $t < 2T$ to avoid initial transients. Figure 4 shows snapshots of the network at three different instants within the same period in a typical run. Two modules are formed, similar to the static case discussed in Sec. III, but they now expand and shrink continuously. In all three snapshots, there are neurons in the process of switching from one module to the other, and hence the modules are not completely decoupled. We therefore introduce a more objective criterion for associating neurons with modules, one consistent with the previous classification by simple inspection. We define a neuron to be in a module if its output has a stronger effect on the overall output of that module than on the other one. For example, the $i$th neuron in the $m$th layer belongs to module 1 if
$\left\langle \frac{\partial O_1}{\partial V_i^{(m)}} \right\rangle > \left\langle \frac{\partial O_2}{\partial V_i^{(m)}} \right\rangle$.  (8)
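In code, the criterion of Eq. (8) is a comparison of two averaged sensitivities. The sketch below assumes the derivatives $\partial O_k / \partial V_i^{(m)}$ have already been collected over a sample of input patterns (e.g., during backpropagation); the sample values are hypothetical.

```python
import numpy as np

def module_of(dO1_dV, dO2_dV):
    """Eq. (8): a hidden neuron belongs to module 1 if the average
    sensitivity of O1 to its output exceeds that of O2, else module 2."""
    return 1 if np.mean(dO1_dV) > np.mean(dO2_dV) else 2

# Hypothetical sensitivities of O1 and O2 to one neuron's output,
# sampled over many input patterns: O2 responds far more strongly.
rng = np.random.default_rng(2)
s1 = rng.normal(0.02, 0.01, size=1000)
s2 = rng.normal(0.80, 0.05, size=1000)
m = module_of(s1, s2)
```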
We observe that the modules are more compact in the dy-
namical case. This clearly results from the dynamics of
gradual expansion and shrinkage of the modules.
The snapshot in Fig. 4(a) is at phase angle $\theta = 0$. From Eq. (7), the parameters are $r_1 = r_2 = 1.5$, implying tasks of identical complexity at that instant. Let $n_1$ and $n_2$ be the numbers of hidden neurons in the respective modules. We obtain $n_1 = n_2 = 9$ and individual output errors $\epsilon_1 = 0.040$ and $\epsilon_2 = 0.014$. The situation here is quite similar to the static case. Figure 4(b) shows the state of the network a quarter of a period later, at $\theta = \pi/2$. Now $r_1 = 1.9$ and $r_2 = 1.1$. Module 1 on the left is handling a much more complicated function approximation task and has already gained some neurons from module 2, which in contrast is assigned a much simpler
FIG. 3. Two realizations of a network trained to approximate two identical functions with independent inputs.
FIG. 4. Realizations of a network trained to approximate time-dependent functions at phase angles (a) $\theta = 0$, (b) $\theta = \pi/2$, and (c) $\theta = \pi$.

task. We get $n_1 = 14$, $n_2 = 4$, $\epsilon_1 = 0.075$, and $\epsilon_2 = 0.007$. Even though module 1 is larger, its corresponding error is significantly higher because of the more complicated target function to be approximated. The large difference between the errors is precisely the driving force for the redistribution of the neurons. At $\theta = \pi$, the tasks become identical again. The corresponding state shown in Fig. 4(c) indicates modules of sizes $n_1 = 11$ and $n_2 = 7$, respectively. The errors are $\epsilon_1 = 0.011$ and $\epsilon_2 = 0.064$. Module 1 has already returned most neurons to module 2 but still retains a few extra ones. This exemplifies a hysteresis effect when tuning the module sizes via the complexities of the tasks. Note that hysteresis also exists for other values of $\theta$ but is not apparent, for example, in Fig. 4(a), due to the presence of random fluctuations.
We now examine quantities averaged over time. Let $\bar{n}_1$ be the size of module 1 at any given phase angle $\theta$, averaged over the period $2T < t < 100T$. The average size of module 2 is then $18 - \bar{n}_1$. Figure 5 plots $\bar{n}_1$ against $\theta$ for various values of the amplitude $R$. The relations between $\bar{n}_1$ and $\theta$ fit quite well to sinusoidal functions of the form $9 + A \sin(\theta - \phi)$, which are also plotted in Fig. 5. The values of the phase shift $\phi$ fall in a rather narrow range of $0.40 \pm 0.07$. This phase shift measures the lag of the variation in the module sizes behind that of the complexities of the tasks. We present the same data again in Fig. 6 in an $\bar{n}_1$ versus $r_1$ plot showing the hysteresis loops. For $R \lesssim 0.1$, the variations in the tasks become so small that the modules become static. Other values of the period $T$ are also investigated. Figure 7 shows a similar plot of $\bar{n}_1$ versus $\theta$ for $R = 0.4$ and $T$ varying from $5 \times 10^5$ to $2 \times 10^7$. At small $T$, the relations deviate significantly from sinusoidal.
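The sinusoidal fits $9 + A\sin(\theta - \phi)$ can be recovered by linear least squares, since $A\sin(\theta - \phi) = (A\cos\phi)\sin\theta - (A\sin\phi)\cos\theta$. A sketch on synthetic data (illustrative values, not the measured module sizes):

```python
import numpy as np

def fit_sinusoid(theta, n1_bar):
    """Least-squares fit of n1_bar(theta) to 9 + A*sin(theta - phi)."""
    X = np.column_stack([np.sin(theta), np.cos(theta)])
    a, b = np.linalg.lstsq(X, n1_bar - 9.0, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(-b, a)   # A, phi

# Synthetic data with A = 5, phi = 0.4, plus small noise.
rng = np.random.default_rng(3)
theta = np.linspace(0.0, 2.0 * np.pi, 200)
data = 9.0 + 5.0 * np.sin(theta - 0.4) + rng.normal(0.0, 0.1, theta.size)
A, phi = fit_sinusoid(theta, data)
```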
V. DISCUSSION
To construct the above neural network exhibiting dynami-
cal modules, we have been very careful in selecting the net-
work architecture, the tasks, and the training method. Al-
though some variations are allowed, the dynamics is unfortunately not robust; nevertheless, we have identified some essential characteristic features. In the network architecture we
used, only neighboring neurons in the hidden layers are con-
nected, as has been explained in Sec. II. Neurons on the sur-
face of a module thus have fewer connections and are effec-
tively weakly bonded. Therefore, an expanding module can
easily capture neurons on the surface of the other module one
by one. For the more widely used architecture with full con-
nections between neurons in adjacent hidden layers, the no-
tion of surface or bulk does not apply. It is much more dif-
ficult to set loose some individual neurons for reallocation
without dissolving the whole module. The choice of the target functions is also important. They have to be sufficiently complicated that the computation requires as many neurons as possible. However, they cannot be too complicated
FIG. 5. Plot of the average size $\bar{n}_1$ of module 1 against phase angle $\theta$ for various values of the amplitude $R$ at period $T = 5 \times 10^6$.
FIG. 6. Plot of the average size $\bar{n}_1$ of module 1 against the target function parameter $r_1$ for various values of the amplitude $R$.
FIG. 7. Plot of the average size $\bar{n}_1$ of module 1 against the phase angle $\theta$ for various values of the period $T$ at amplitude $R = 0.4$.

because the abundance of local minima would prevent efficient learning. We have found that the family of sawtooth functions defined in Eq. (4) suits this purpose nicely. We also tried sinusoidal target functions, but the resulting dynamics of the modules is less pronounced.
We have obtained sinusoidal relations between the module size $\bar{n}_1$ and the phase angle $\theta$ of the variation of the tasks, as shown in Fig. 5. The simplicity of these relations is particularly interesting, although it does not hold for smaller periods or for some other target functions that we checked. Using Eq. (7) and neglecting the hysteresis effect, the sinusoidal relations imply $\Delta n \propto \Delta r$, where $\Delta n = n_2 - n_1$ and $\Delta r = r_2 - r_1$. Because the complexities of the tasks we used increase monotonically with $r$, we may tentatively identify $r_i$ with some complexity measure $c_i$ for the respective modules. As a result, we can write $\Delta n = \kappa\,\Delta c$, where $\Delta c = c_2 - c_1$. The proportionality constant $\kappa$ is related to the compressibility of the modules with respect to changes in the complexities of the tasks. The application of concepts from thermodynamics motivated by these observations may be helpful for further investigations.
We have studied a network with separate input and output
neurons for each task. It would be more interesting if the
tasks could share the same input and output nodes but appro-
priate information could be channeled automatically to the
correct module, similar to the networks of Jacobs et al. [3] and Ishikawa [7]. However, we have not yet been able to
construct such a system exhibiting both spontaneous modu-
larization and module dynamics.
In conclusion, motivated by the fluidity of modules in the
brain, we have proposed a novel artificial neural network
with analogous adaptable modular structures. When training
a network to perform two independent function approxima-
tion tasks, two corresponding modules are formed. Their
sizes can vary in order to adapt to variations in the complexi-
ties of the tasks. Hysteresis in the dynamics is observed and
compactness of the modules is enhanced by the process of
expansion and shrinkage. We have also discussed features in
our model that are essential for the dynamics.
ACKNOWLEDGMENT
This work was supported by RGC Grant No. 0354-046-
A3-110.
[1] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, New York, 1991).
[2] T. L. H. Watkin and A. Rau, Rev. Mod. Phys. 65, 499 (1993).
[3] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, Neural Comput. 3, 79 (1991).
[4] M. I. Jordan and R. A. Jacobs, in The Handbook of Brain Theory and Neural Networks, edited by M. Arbib (Cambridge University Press, Cambridge, 1995).
[5] P. Peretto, An Introduction to the Modeling of Neural Networks (Cambridge University Press, Cambridge, 1992).
[6] Z. Ghahramani and D. M. Wolpert, Nature (London) 386, 392 (1997).
[7] M. Ishikawa, Neural Networks 9, 509 (1996).
[8] J. L. Fox, Science 225, 820 (1984).