Formation and dynamics of modules in a dual-tasking multilayer feed-forward neural network
Chi-Hang Lam and F. G. Shin
Department of Applied Physics, Hong Kong Polytechnic University, Hung Hom, Hong Kong
(Received 17 April 1998)
We study a feed-forward neural network for two independent function approximation tasks. Upon training,
two modules are automatically formed in the hidden layers, each handling one of the tasks predominantly. We
demonstrate that the sizes of the modules can be dynamically driven by varying the complexities of the tasks.
The network serves as a simple example of an artificial neural network with an adaptable modular structure.
This study was motivated by the related dynamical nature of modules in animal brains. [S1063-651X(98)14409-4]
PACS number(s): 87.10.+e, 02.50.-r, 05.20.-y
I. INTRODUCTION
Training of neural networks for complicated tasks is often
difficult due to the large number of local minima in the error
function landscapes [1,2]. There has been much effort in
searching for efficient learning algorithms and better network
architectures. An interesting idea put forward by Jacobs
et al. is the use of modular neural networks [3]. In this approach, a complicated task is broken down into several simpler subtasks. The whole network consists of modules called expert networks, each of which learns to solve only one of the subtasks. Their outputs are connected to the overall network
output via a gate network. The duty of the gate network is to
select which expert network is to be consulted for any given
input pattern. The expert and the gate networks are trained
simultaneously to achieve coherently the inter-related pro-
cesses of task decomposition, assignment of subtasks to the
modules, and the actual learning of the task. Improved learn-
ing algorithms for modular neural networks and examples of
applications to control problems and speech recognition are
discussed in Ref. [4]. Apart from artificial neural networks,
animal brains also have high level modular structures con-
trolling various body functions [5]. At a lower level, it has
also been suggested recently that the human brain adopts a
modular decomposition strategy to learn sets of visuomotor
maps for relating visual inputs to motor outputs under vari-
ous conditions [6].
To construct a modular artificial neural network, both the
overall modular structure and the architecture of the indi-
vidual components in general have to be carefully defined
before training commences. Spontaneous formation of mod-
ules from a homogeneous network has also been investigated
by Ishikawa [7]. The author adopted a weight-decay training
approach. A group of neurons is considered to have formed a
module when most connections to other groups of neurons
have decayed away. Once formed, the modular structure re-
mains unchanged.
In contrast, regions in an animal brain responsible for various body functions are found to retain a certain fluidity even in adults. In an experiment by Fox [8], regions in a
monkey’s brain responsible for various fingers were mapped.
A middle finger was then amputated. After a few weeks, the
regions corresponding to adjacent fingers surprisingly ex-
panded into the region previously controlling the middle fin-
ger. This result indicates that fine details in the modular
structure of an animal brain are not hard coded genetically
and some rezoning may be allowed to adapt to the changing
environment or conditions.
Ideally, a modular neural network with a dynamical archi-
tecture may offer enhanced performance. It may automati-
cally decompose a complex task into smaller ones with the
least design effort but the best adaptability in a continuously
changing environment. To some extent, an animal brain may
be an example. Brains are by far the most powerful neural
networks and often inspire advances in their artificial coun-
terparts. Motivated by the fluidity of their modular struc-
tures, we have in this work constructed and studied an arti-
ficial neural network that exhibits analogous adaptable
modular structures. In our network, modules are formed
upon training and their sizes change when the complexities
of the associated tasks vary with time. In the investigations
of both Jacobs et al. and Ishikawa, the modular structure of
the neural networks remains unchanged once they are de-
fined or generated. Our construction is to our knowledge the
first example of a neural network with an adaptable modular
architecture. However, at present, the dynamics only occur
for some specific tasks, network architectures and training
parameters. We also obtain no improvement in the efficiency
of training. Therefore, our results may be of limited imme-
diate practical interest.
In Sec. II, we specify the architecture of our neural net-
work, the tasks to be learned, and the training method. Sec-
tion III explains the formation of modules. Section IV de-
scribes their dynamical properties when the complexities of
the tasks become time dependent. We conclude in Sec. V
with some further discussion.
II. NETWORK ARCHITECTURE AND TRAINING
We focus on a multilayer feed-forward neural network for
function approximations. The network architecture is shown
in Fig. 1. The circles denote the neurons and the lines repre-
sent the synaptic connections. The network is composed of
an input layer, three hidden layers, and an output layer. Note
that the neurons in a hidden layer are only connected to
neighboring ones in the next layer. This makes the locations
of the neurons physically significant and is essential for the
generation of any spatial modular pattern. The values received by the two input neurons are denoted by $\xi_1$ and $\xi_2$,
PHYSICAL REVIEW E VOLUME 58, NUMBER 3 SEPTEMBER 1998
1063-651X/98/58(3)/3673(5)/$15.00    PRE 58    3673    © 1998 The American Physical Society

respectively. The output $V_i^{(1)}$ of the $i$th neuron in the first hidden layer is given by

$V_i^{(1)} = \tanh\left[ \sum_{j=1}^{2} w_{ij}^{(0)} \xi_j + \theta_i^{(0)} \right]$,  (1)
where $w_{ij}^{(0)}$ and $\theta_i^{(0)}$ are the weight and bias, respectively. The outputs of the neurons in the second and the third hidden layers, corresponding to $m = 2$ and 3 respectively, are given by

$V_i^{(m)} = \tanh\left[ \sum_{j=1}^{6} w_{ij}^{(m-1)} V_j^{(m-1)} + \theta_i^{(m-1)} \right]$.  (2)
The weight $w_{ij}^{(m-1)}$ is zero if the corresponding synaptic connection does not exist. The overall network outputs $O_1$ and $O_2$ are computed from

$O_i = \sum_{j=1}^{6} w_{ij}^{(3)} V_j^{(3)} + \theta_i^{(3)}$.  (3)
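As a concrete illustration, Eqs. (1)-(3) can be transcribed as a short forward-pass routine. This is a minimal sketch rather than the authors' code: the connectivity masks standing in for the neighbor-only wiring of Fig. 1 are left fully connected here, and all names are ours.

```python
import numpy as np

def forward(xi, W, theta, masks):
    """Forward pass of the network of Eqs. (1)-(3): 2 inputs,
    three hidden layers of 6 tanh neurons, 2 linear outputs.
    masks[m] zeroes the weights of absent synaptic connections."""
    V = np.tanh((W[0] * masks[0]) @ xi + theta[0])      # Eq. (1)
    for m in (1, 2):
        V = np.tanh((W[m] * masks[m]) @ V + theta[m])   # Eq. (2), m = 2, 3
    return (W[3] * masks[3]) @ V + theta[3]             # Eq. (3)

# Example with random parameters and full connectivity masks.
rng = np.random.default_rng(0)
sizes = [2, 6, 6, 6, 2]
W = [rng.normal(size=(sizes[m + 1], sizes[m])) for m in range(4)]
theta = [rng.normal(size=sizes[m + 1]) for m in range(4)]
masks = [np.ones_like(w) for w in W]
O = forward(np.array([0.3, 0.7]), W, theta, masks)      # approximates (T1, T2)
```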
This network is capable of approximating an $\mathbb{R}^2 \to \mathbb{R}^2$ function. However, we limit our consideration to the particular case of approximating two $\mathbb{R} \to \mathbb{R}$ functions. Specifically, $O_1$ and $O_2$ are supposed to approximate two target functions $T_1(\xi_1)$ and $T_2(\xi_2)$, respectively. Furthermore, we consider the case in which $\xi_1$ and $\xi_2$ are uncorrelated inputs. The target function $T_1(\xi_1)$ is thus completely independent of $\xi_2$, and similarly $T_2(\xi_2)$ is independent of $\xi_1$. Therefore, the problem has a natural decomposition into two uncorrelated subtasks of function approximation.
All target functions studied in this work belong to a family of sawtooth functions defined by

$f_r(\xi) = h_4\!\left[\tfrac{1}{2}(\xi + r - 1)\right]$,  (4)

where $h_4$ denotes the fourfold composite $h(h(h(h(x))))$ of the function $h(x) = |rx - 1|$. The complexity of $f_r$ depends strongly on the parameter $r$. Figure 2 shows some examples. At $r = 1$, it is a simple straight line. The complexity increases monotonically with $r$ until it becomes a complicated sawtooth function at $r = 2$.
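In code, Eq. (4) is a fourfold iteration of $h(x) = |rx - 1|$ applied to $(\xi + r - 1)/2$; the transcription below is a sketch with our own names.

```python
def f_r(xi, r):
    """Sawtooth target function of Eq. (4):
    f_r(xi) = h4[(xi + r - 1)/2], where h4 is the fourfold
    composite of h(x) = |r*x - 1|."""
    x = 0.5 * (xi + r - 1.0)
    for _ in range(4):           # apply h four times
        x = abs(r * x - 1.0)
    return x
```

At $r = 1$ the iteration collapses to the straight line $f_1(\xi) = \xi/2$, while at $r = 2$ repeated folding of $|2x - 1|$ produces the complicated sawtooth of Fig. 2.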
The weights and biases in the network are first initialized to random values. Training is conducted using a backpropagation algorithm in on-line mode with a momentum term of 0.9 [1]. In this approach, a set of input values $\xi_1$ and $\xi_2$ is chosen randomly and independently in the range $0 \le \xi_i \le 1$ at each time step. The input pattern is then presented to the network, and the resulting output error guides a single step of adjustment of every weight $w_{ij}^{(m)}$ and bias $\theta_i^{(m)}$. A similar adjustment is made at every time step using a new random input pattern. This approach is effectively a steepest-descent method minimizing the output error
$E = \langle (O_1 - T_1)^2 + (O_2 - T_2)^2 \rangle$.  (5)
The brackets denote averaging over all input patterns. It is customary to introduce noise into the training process to avoid local minima [1]. In our case, random excitations are applied to inactive neurons to recover their activity. Specifically, we inspect a randomly chosen neuron in the hidden layers at each time step. The neuron is identified as inactive if either its recent outputs have an rms fluctuation smaller than 0.3 or the sum of the magnitudes of all its connections is smaller than 0.15. An inactive neuron is excited by updating every immediately associated weight $w_{ij}^{(m)}$ to $0.98\,w_{ij}^{(m)} + \eta_{ij}^{(m)}$, where $\eta_{ij}^{(m)}$ is a uniform random variable in the range $\pm 0.2$.
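The inactivity test and excitation can be sketched as below. This is our reading rather than the authors' code: the length of the "recent outputs" window is unspecified in the text, and treating "all its connections" as both incoming and outgoing weights is our assumption.

```python
import numpy as np

def excite_if_inactive(recent_outputs, w_in, w_out, rng):
    """Noise rule of Sec. II applied to one hidden neuron.
    Inactive if the rms fluctuation of recent outputs is < 0.3 or the
    summed magnitude of associated connections is < 0.15; if so, every
    associated weight w becomes 0.98*w + eta, eta uniform in [-0.2, 0.2]."""
    rms = np.std(recent_outputs)
    strength = np.sum(np.abs(w_in)) + np.sum(np.abs(w_out))
    if rms < 0.3 or strength < 0.15:
        w_in[:] = 0.98 * w_in + rng.uniform(-0.2, 0.2, size=w_in.shape)
        w_out[:] = 0.98 * w_out + rng.uniform(-0.2, 0.2, size=w_out.shape)
        return True
    return False

# A neuron with constant output and vanished weights gets re-excited.
rng = np.random.default_rng(1)
w_in, w_out = np.zeros(6), np.zeros(6)
fired = excite_if_inactive(np.full(50, 0.1), w_in, w_out, rng)
```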
III. MODULE FORMATION
We now discuss the process of modularization of our neural network during training. Identical target functions $T_1 = T_2 = f_r$ with $r = 1.5$ are considered. We have already explained in Sec. II that approximating the two functions constitutes uncorrelated tasks, since the inputs are independent. The sawtooth function $f_r$ defined in Eq. (4) is moderately complicated for $r = 1.5$. We perform backpropagation training until time $t = 3 \times 10^6$, well after the output errors have converged apart from some random fluctuations. Figure 3(a) shows the resulting network in a typical run. The intensity of a line representing a connection is proportional to the absolute value of the corresponding weight. In the figure, weights of magnitude less than 0.05 are invisible, while those larger than 1 are completely darkened. Some allowed connections have practically vanished, so that the network clearly decomposes into two modules. Neurons are represented by open or filled
FIG. 1. Architecture of a feed-forward network for approximations of two independent functions.
FIG. 2. Family of sawtooth functions $f_r$ for $r = 1.0$, 1.5, and 2.0. The complexity of the function increases with $r$.

circles depending on which module they belong to. In this
particular run, the modules contain 6 and 12 hidden neurons,
respectively. The performance of a module can be evaluated
from the error
e
1
or
e
2
given by
e
i
5
^
u
O
i
2 T
i
u
&
, ~6!
where the averaging is over all input patterns. We obtain
e
1
5 0.034 and
e
2
5 0.008 in this run. Since the two tasks are
identical, a larger module usually gives a smaller error as is
observed here.
Figure 3(b) shows another realization trained under the same conditions. In this case, the modules are noncompact, in contrast to the compact ones obtained previously. Both modules have 9 hidden neurons, and the output errors are $\epsilon_1 = 0.042$ and $\epsilon_2 = 0.010$, respectively. In fact, noncompact modules are not favored energetically, since they usually give slightly larger errors due to their fewer internal connections. They are formed occasionally for entropic reasons.
IV. DYNAMICS OF MODULES
We now demonstrate that the modules in our neural network can expand or shrink if the associated tasks vary with time. The target functions are set to $T_1 = f_{r_1}$ and $T_2 = f_{r_2}$ in the same family defined in Eq. (4). The parameters $r_1$ and $r_2$ are given by

$r_1 = 1.5 + R \sin\theta$,
$r_2 = 1.5 - R \sin\theta$,  (7)

where $\theta = (2\pi t/T) \bmod 2\pi$ is a time-dependent phase angle, and $R$ and $T$ are the amplitude and the period of the variation, respectively. Training proceeds continuously while the tasks are slowly varying, since one backpropagation step is executed at each time step. The periodic variation in the tasks hence directly implies periodic changes in the weights and biases of the network.
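In code, the schedule of Eq. (7) reads (a direct transcription; names are ours):

```python
import math

def task_params(t, R, T):
    """Eq. (7): complexity parameters of the two tasks at time t."""
    theta = (2.0 * math.pi * t / T) % (2.0 * math.pi)
    r1 = 1.5 + R * math.sin(theta)
    r2 = 1.5 - R * math.sin(theta)
    return theta, r1, r2
```

At a quarter period, $t = T/4$, this gives $\theta = \pi/2$ and, for $R = 0.4$, the values $r_1 = 1.9$ and $r_2 = 1.1$ used below.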
We first consider an amplitude $R = 0.4$ and a period $T = 5 \times 10^6$. This is a rather long period, and the system is not far from the quasistatic limit. We discard any data for time $t < 2T$ to avoid initial transients. Figure 4 shows snapshots of the network at three different instants within the same period in a typical run. Two modules are formed, similar to the static case discussed in Sec. III, but they now expand and shrink continuously. In all three snapshots, there are neurons in the process of switching from one module to the other, and hence the modules are not completely decoupled. We therefore introduce a more objective criterion for associating neurons with modules, one consistent with the previous classification by simple inspection. We define a neuron to be in a module if its output has a stronger effect on the overall output of that module than on the other one. For example, the $i$th neuron in the $m$th layer belongs to module 1 if
$\left\langle \frac{\partial O_1}{\partial V_i^{(m)}} \right\rangle > \left\langle \frac{\partial O_2}{\partial V_i^{(m)}} \right\rangle$.  (8)
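In code, the criterion of Eq. (8) is a comparison of two averaged sensitivities. The sketch below assumes the derivatives $\partial O_k / \partial V_i^{(m)}$ have already been collected over a sample of input patterns (e.g., during backpropagation); the sample values are hypothetical.

```python
import numpy as np

def module_of(dO1_dV, dO2_dV):
    """Eq. (8): a hidden neuron belongs to module 1 if the average
    sensitivity of O1 to its output exceeds that of O2, else module 2."""
    return 1 if np.mean(dO1_dV) > np.mean(dO2_dV) else 2

# Hypothetical sensitivities of O1 and O2 to one neuron's output,
# sampled over many input patterns: O2 responds far more strongly.
rng = np.random.default_rng(2)
s1 = rng.normal(0.02, 0.01, size=1000)
s2 = rng.normal(0.80, 0.05, size=1000)
m = module_of(s1, s2)
```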
We observe that the modules are more compact in the dy-
namical case. This clearly results from the dynamics of
gradual expansion and shrinkage of the modules.
The snapshot in Fig. 4(a) is at phase angle $\theta = 0$. From Eq. (7), the parameters are $r_1 = r_2 = 1.5$, implying tasks of identical complexity at that instant. Let $n_1$ and $n_2$ be the numbers of hidden neurons in the respective modules. We obtain $n_1 = n_2 = 9$ and individual output errors $\epsilon_1 = 0.040$ and $\epsilon_2 = 0.014$. The situation here is quite similar to the static case. Figure 4(b) shows the state of the network a quarter of a period later, at $\theta = \pi/2$. Now $r_1 = 1.9$ and $r_2 = 1.1$. Module 1 on the left is handling a much more complicated function approximation task and has already gained some neurons from module 2, which in contrast is assigned a much simpler
FIG. 3. Two realizations of a network trained to approximate two identical functions with independent inputs.
FIG. 4. Realizations of a network trained to approximate time-dependent functions at phase angles (a) $\theta = 0$, (b) $\theta = \pi/2$, and (c) $\theta = \pi$.

task. We get $n_1 = 14$, $n_2 = 4$, $\epsilon_1 = 0.075$, and $\epsilon_2 = 0.007$. Even though module 1 is larger, its corresponding error is significantly higher because of the more complicated target function to be approximated. The large difference between the errors is precisely the driving force for the redistribution of the neurons. At $\theta = \pi$, the tasks become identical again. The corresponding state shown in Fig. 4(c) indicates modules of sizes $n_1 = 11$ and $n_2 = 7$, respectively. The errors are $\epsilon_1 = 0.011$ and $\epsilon_2 = 0.064$. Module 1 has already returned most neurons to module 2 but still retains a few extra ones. This exemplifies a hysteresis effect when tuning the module sizes via the complexities of the tasks. Note that hysteresis also exists for other values of $\theta$ but is not apparent, for example, in Fig. 4(a), due to the presence of random fluctuations.
We now examine quantities averaged over time. Let $\bar{n}_1$ be the size of module 1 at any given phase angle $\theta$, averaged over the period $2T < t < 100T$. The average size of module 2 is then $18 - \bar{n}_1$. Figure 5 plots $\bar{n}_1$ against $\theta$ for various values of the amplitude $R$. The relations between $\bar{n}_1$ and $\theta$ fit quite well to sinusoidal functions of the form $9 + A \sin(\theta - \phi)$, which are also plotted in Fig. 5. The values of the phase shift $\phi$ fall in a rather narrow range of $0.40 \pm 0.07$. This phase shift measures the lag of the variation in the module sizes behind that of the complexities of the tasks. We present the same data again in Fig. 6 in an $\bar{n}_1$ versus $r_1$ plot showing the hysteresis loops. For $R \lesssim 0.1$, the variations in the tasks become so small that the modules become static. Other values of the period $T$ are also investigated. Figure 7 shows a similar plot of $\bar{n}_1$ versus $\theta$ for $R = 0.4$ and $T$ varying from $5 \times 10^5$ to $2 \times 10^7$. At small $T$, the relations deviate significantly from sinusoidal.
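The sinusoidal fits $9 + A\sin(\theta - \phi)$ can be recovered by linear least squares, since $A\sin(\theta - \phi) = (A\cos\phi)\sin\theta - (A\sin\phi)\cos\theta$. A sketch on synthetic data (illustrative values, not the measured module sizes):

```python
import numpy as np

def fit_sinusoid(theta, n1_bar):
    """Least-squares fit of n1_bar(theta) to 9 + A*sin(theta - phi)."""
    X = np.column_stack([np.sin(theta), np.cos(theta)])
    a, b = np.linalg.lstsq(X, n1_bar - 9.0, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(-b, a)   # A, phi

# Synthetic data with A = 5, phi = 0.4, plus small noise.
rng = np.random.default_rng(3)
theta = np.linspace(0.0, 2.0 * np.pi, 200)
data = 9.0 + 5.0 * np.sin(theta - 0.4) + rng.normal(0.0, 0.1, theta.size)
A, phi = fit_sinusoid(theta, data)
```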
V. DISCUSSION
To construct the above neural network exhibiting dynami-
cal modules, we have been very careful in selecting the net-
work architecture, the tasks, and the training method. Al-
though some variations are allowed, the dynamics is unfortunately not robust; nevertheless, we have identified some essential characteristic features. In the network architecture we
used, only neighboring neurons in the hidden layers are con-
nected, as has been explained in Sec. II. Neurons on the sur-
face of a module thus have fewer connections and are effec-
tively weakly bonded. Therefore, an expanding module can
easily capture neurons on the surface of the other module one
by one. For the more widely used architecture with full con-
nections between neurons in adjacent hidden layers, the no-
tion of surface or bulk does not apply. It is much more dif-
ficult to set loose some individual neurons for reallocation
without dissolving the whole module. The choice of the target functions is also important. They have to be sufficiently complicated that the computation requires as many neurons as possible. However, they cannot be too complicated
FIG. 5. Plot of the average size $\bar{n}_1$ of module 1 against phase angle $\theta$ for various values of the amplitude $R$ at period $T = 5 \times 10^6$.
FIG. 6. Plot of the average size $\bar{n}_1$ of module 1 against the target function parameter $r_1$ for various values of the amplitude $R$.
FIG. 7. Plot of the average size $\bar{n}_1$ of module 1 against the phase angle $\theta$ for various values of the period $T$ at amplitude $R = 0.4$.

because the abundance of local minima would prevent efficient learning. We have found that the family of sawtooth functions defined in Eq. (4) suits this purpose nicely. We also tried sinusoidal target functions, but the resulting dynamics of the modules is less pronounced.
We have obtained sinusoidal relations between the module size $\bar{n}_1$ and the phase angle $\theta$ of the variation of the tasks, as shown in Fig. 5. The simplicity of these relations is particularly interesting, although it does not hold for smaller periods or for some other target functions that we checked. Using Eq. (7) and neglecting the hysteresis effect, the sinusoidal relations imply $\Delta n \propto \Delta r$, where $\Delta n = n_2 - n_1$ and $\Delta r = r_2 - r_1$. Because the complexities of the tasks we used increase monotonically with $r$, we may tentatively identify $r_i$ with some complexity measure $c_i$ for the respective modules. As a result, we can write $\Delta n = \kappa\,\Delta c$, where $\Delta c = c_2 - c_1$. The proportionality constant $\kappa$ is related to the compressibility of the modules with respect to changes in the complexities of the tasks. The application of concepts from thermodynamics motivated by these observations may be helpful for further investigations.
We have studied a network with separate input and output
neurons for each task. It would be more interesting if the
tasks could share the same input and output nodes but appro-
priate information could be channeled automatically to the
correct module, similar to the networks of Jacobs et al. [3] and Ishikawa [7]. However, we have not yet been able to
construct such a system exhibiting both spontaneous modu-
larization and module dynamics.
In conclusion, motivated by the fluidity of modules in the
brain, we have proposed a novel artificial neural network
with analogous adaptable modular structures. When training
a network to perform two independent function approxima-
tion tasks, two corresponding modules are formed. Their
sizes can vary in order to adapt to variations in the complexi-
ties of the tasks. Hysteresis in the dynamics is observed and
compactness of the modules is enhanced by the process of
expansion and shrinkage. We have also discussed features in
our model that are essential for the dynamics.
ACKNOWLEDGMENT
This work was supported by RGC Grant No. 0354-046-
A3-110.
[1] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, New York, 1991).
[2] T. L. H. Watkin and A. Rau, Rev. Mod. Phys. 65, 499 (1993).
[3] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, Neural Comput. 3, 79 (1991).
[4] M. I. Jordan and R. A. Jacobs, in The Handbook of Brain Theory and Neural Networks, edited by M. Arbib (Cambridge University Press, Cambridge, 1995).
[5] P. Peretto, An Introduction to the Modeling of Neural Networks (Cambridge University Press, Cambridge, 1992).
[6] Z. Ghahramani and D. M. Wolpert, Nature (London) 386, 392 (1997).
[7] M. Ishikawa, Neural Networks 9, 509 (1996).
[8] J. L. Fox, Science 225, 820 (1984).