A binary Self-Organizing Map and its FPGA implementation

Abstract
A binary Self Organizing Map (SOM) has been designed and implemented on a Field Programmable Gate Array (FPGA) chip. A novel learning algorithm which takes binary inputs and maintains tri-state weights is presented. The binary SOM has the capability of recognizing binary input sequences after training. A novel tri-state rule is used in updating the network weights during the training phase. The rule implementation is highly suited to the FPGA architecture, and allows extremely rapid training. This architecture may be used in real-time for fast pattern clustering and classification of binary features.


Proceedings of International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, 2009
Kofi Appiah, Andrew Hunter, Hongying Meng, Shigang Yue, Mervyn Hobden, Nigel Priestley, Peter Hobden and Cy Pettit
$$D_j = \sum_{i=0}^{N-1} (x_i - w_{ji})^2. \qquad (1)$$
The vector components of the winning node $w_k$ with minimum distance $D_k$ are then updated according to equation (2), where $\eta$ is the learning rate. The topological ordering property is imposed by also updating the weight vectors of nodes in the neighbourhood of the winning node. This can be achieved by the learning rule of equation (3), where $N_j$ is a neighbourhood function (defining the region around $w_k$) based on the topological displacement of the neighbouring neuron from the winning neuron. The size of $N_j$ decreases as training progresses.
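The conventional training step described here (find the winning neuron by Euclidean distance, then move it and its neighbours towards the input) can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the textbook Kohonen update w <- w + eta (x - w), a one-dimensional map and a rectangular neighbourhood of fixed radius. The names `squared_distance`, `train_step` and `radius` are hypothetical.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between input x and weight vector w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def train_step(weights, x, eta, radius):
    """One update of a 1-D SOM; returns the index of the winning neuron.

    weights: list of weight vectors (lists of floats), modified in place.
    eta:     learning rate.
    radius:  neighbourhood radius in map positions (assumed rectangular).
    """
    distances = [squared_distance(x, w) for w in weights]
    k = distances.index(min(distances))      # winning neuron (first on ties)
    for j, w in enumerate(weights):
        if abs(j - k) <= radius:             # neuron j lies in the neighbourhood
            for i in range(len(w)):
                w[i] += eta * (x[i] - w[i])  # assumed textbook Kohonen update
    return k
```

In a full training loop both `eta` and `radius` would shrink over time, which is the costly real-valued arithmetic that the binary variant is designed to avoid.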
In the vast majority of implementations, the SOM input data and neurons are represented by real numbers, making it difficult to implement on a hardware architecture like the Field Programmable Gate Array (FPGA). However, in many applications the data is either presented as a binary string, or may be conveniently recoded as such (a "binary signature"). For example, in image processing applications a bank of Haar filters produces a long binary signature. In this paper we present a new learning algorithm which takes binary inputs and maintains tri-state weights (neurons) in the SOM. We also present the FPGA implementation of this binary Self Organizing Map (bSOM). The bSOM is designed for efficient hardware implementation, having both greatly reduced circuit size compared to a real-valued SOM, and exceptionally fast execution and training times.
In section II, we review previous implementations of the SOM on hardware architectures. The novel bSOM algorithm is then presented in section III, followed by its FPGA implementation in section IV. Section V presents the experimental results in software and hardware, and we conclude in section VI.
During training, the "nearest" neuron prototype vector to the input vector is identified (this is called the "winning" neuron) using a distance metric, $D$. The Euclidean distance is most frequently used as the metric.
For a given network with $M$ neurons and an $N$-dimensional input vector $x$, the distance for the neuron with weight vector $w_j$ ($j < M$) is given by equation (1).
I. INTRODUCTION
The original Self Organizing Map (SOM) proposed by Kohonen [1] consists of two layers: the input and the competitive layers. It is an unsupervised neural network with competitive learning models that captures the topology and probability distribution of input data, which facilitates clustering and classification in pattern recognition [2], [3], [4].
The SOM is typically implemented on a standard von Neumann architecture computer. For large input dimensionality and training set size, execution speeds are reasonable, but training is rather slow, as the SOM training algorithm typically requires thousands of iterations, each of which involves the calculation of the Euclidean distance of each of the input vectors to each of the neuron prototype vectors.
Hardware implementation is therefore of interest. Fortunately, the structure is fairly easy to convert into hardware processing units executing in parallel [5]. However, a direct implementation of the standard SOM onto hardware results in large designs, which consume substantial hardware internal resources (slices, registers and look-up table (LUT) units), limiting the scale of network implementation.
The SOM algorithm presented in [1] is based on a competitive learning algorithm, the winner-take-all (WTA) network, where an input vector is represented by the closest neuron prototype vector, which is assigned during training to a data cluster centre. The prototype vectors are stored in the "weights" of the neural network. The architecture consists of a topologically organized array of neurons, each with an $N$-dimensional weight vector, where $N$ is also the dimensionality of the input vector. The basic principle of the SOM is to adjust the weight vectors until the neurons represent the input data, while using a topological neighbourhood update rule to ensure that similar prototypes occupy nearby positions on the topological map.
Kofi Appiah, Andrew Hunter, Hongying Meng and Shigang Yue are with
the Department of Computing and Informatics, University of Lincoln, UK
and Mervyn Hobden, Nigel Priestley, Peter Hobden and Cy Pettit are with
e2v Technologies, Lincoln, UK.
This work was supported by TSB under BRAINS Project
II. HARDWARE ARCHITECTURES FOR KOHONEN'S MAP
Software simulations are very useful for investigating the capabilities of neural network models [6], and are suitable for many applications, but are limited in the size of network implementation, particularly where very fast execution and training is required. Hardware neural networks can be implemented using analogue or digital systems [12].
978-1-4244-3553-1/09/$25.00 ©2009 IEEE
The popularity of digital implementations stems from the fact that they are more accurate, more flexible and less sensitive to noise than analogue ones [7], notwithstanding the analogue inspiration from theoretical neural models. The computational complexity of the SOM algorithm [1] prevents it from training in real-time on single processor architectures for many real-time applications. The FPGA provides a suitable platform for the implementation of a digital version of the SOM neural network, due to its reconfigurability and smaller non-recurring engineering (NRE) costs.
However, a floating-point representation of neurons in a neural network presents significant difficulties for implementation on FPGAs, despite the current advances in FPGA technology [13], since floating-point multipliers and the computation of nonlinear excitation functions are complex and consume large resources [7], [8]. A number of authors have sought to mitigate this problem by introducing simplifications to the SOM algorithm. Pena et al. [4] implemented a digital version of the SOM on FPGA by replacing the Euclidean distance computations with a Cityblock (Manhattan distance) computation to avoid the expense of hardware multiplication. In addition, they simplified the neighbourhood function and introduced a set of new learning parameters.
A similar implementation of the SOM, where the distance, neighbourhood and learning rate computations are replaced with simplified versions, has been presented by Chang et al. [9] and Porrmann et al. [10]. An efficient SOM architecture based on a new Frequency Adaptive Learning (FAL) algorithm, which efficiently replaces the neighbourhood adaptation function of the conventional SOM, has been presented in [9]. The design was implemented on a Xilinx FPGA and is capable of quantizing a 512 x 512 pixel colour image in about 1.003 s at a 35 MHz clock rate without the use of sub-sampling.
A design based on the universal rapid prototyping system RAPTOR2000 for the acceleration of the SOM is presented in [10]. Using Xilinx FPGAs, the implementation achieves a speed-up of up to 190 (with five FPGA modules on the RAPTOR2000 system) compared to a software implementation on a state-of-the-art personal computer, for typical applications of self-organizing maps. A similar system, implemented on a Xilinx Virtex II XC2V300 and aimed at reducing the training time of the SOM, has been presented in [11]. The design consists of 16 units in the input layer and N neurons in the output layer, and is divided into three sections: the processing unit array, the address generator and the controller. Compared with an all-software implementation, the design achieves approximately 89% speed-up.
Other forms of neural networks have also been designed and implemented on hardware architectures such as the FPGA. In [17], Nedjah et al. proposed the design of a feed-forward neural network on FPGA using a stochastic process to implement the computation performed by the neurons. In the implementation, the multiplication and addition of stochastic values are achieved by an ensemble of XNOR and AND gates respectively. In the proposed stochastic model, a long probabilistic bit-stream whose density of set bits is proportional to the encoded numeric value is used to represent a number.
Merchant et al. [13] designed an intrinsic embedded online evolution system using Block-based neural networks (BbNN) [15], a grid-based network structure of interconnected block-based neurons. Each neuron block can have up to 3 inputs, 3 outputs and 9 synaptic weights and biases, depending on the internal configuration determined by the network structure. The design has been implemented on a Xilinx Virtex II Pro FPGA running at 40 MHz, using a LUT-based BbNN implemented on the block RAM.
A modified version of Boolean k-nearest neighbour (BKNN), a supervised classifier using Boolean Neural Networks with binary inputs and outputs, has been implemented on FPGA by Liu et al. [14]. The modification omits the iterative classification procedure and is characterised by one-shot training and a single classification sweep to obtain the answer. The design has been verified with Xilinx ISE 6, targeting the XC2S100E Xilinx Spartan2E FPGA.
To entirely avoid numeric weights in the SOM, while maintaining the level of performance as well as the speed-up in training and allowing the SOM to be used for real-time applications, Yamakawa et al. [3] proposed a binary weighted vector SOM and simulated it in hardware. The proposed SOM uses binary data for both input and weight vectors. The Hamming distance is used to calculate the distance between the input and weight vectors, to identify the winning neuron in the network. However, the weight vector is updated with priority given to the most significant bit, thus attempting to treat the weights as a direct representation of integer values.
The use of the binary weighted SOM on FPGA proves to be very successful compared to the others. The design of the binary weighted SOM is five times faster than the real number weighted SOM in software and 140 times faster in hardware [3]. This highlights a key principle: the most successful design will take account of the nature of the hardware architecture. A novel binary SOM that follows this principle is presented in the following section.
III. PROPOSED BINARY SOM ALGORITHM
In this section we introduce the binary Self Organizing Map (bSOM). This takes a binary vector input, and maintains tri-state vector weights with {0, 1, #} as the possible values. The # represents a "don't care" state, which signifies that the corresponding input vector bit may be either set or clear. The weight vectors have the same length as the input binary vector. The bSOM has the same essential structure as a standard SOM, with an input layer and a competitive layer (see figure 1). Given a binary input vector $b_i = (b_1, b_2, \ldots, b_n)$, all the units in the competitive layer are "connected" by corresponding prototype vectors, $w_j = (w_{j1}, w_{j2}, \ldots, w_{jn})$.
- A bit in the weight vector is only updated if it is different from its corresponding input vector bit.
- An update value is generated for each iteration during training. This value decreases as training progresses.
- A random number is then generated and, if the number is greater than the update value, the bit is updated.
- A bit is updated by changing its value from 1 to #, 0 to # or # to (0 or 1), depending on the input bit value.
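The four rules above can be sketched as a small function. This is an illustrative sketch, not the authors' implementation: weights are held as Python lists over 0, 1 and "#", a "#" is treated as differing from any input bit (as the transition from # to 0 or 1 suggests), and the random numbers are assumed to be uniform on [0, 1). The names `update_bit` and `update_neuron` are hypothetical.

```python
import random

DONT_CARE = "#"


def update_bit(weight_bit, input_bit, update_value, rng):
    """Apply the probabilistic tri-state rule to a single weight bit."""
    if weight_bit == input_bit:
        return weight_bit                 # matching bits are never touched
    if rng.random() <= update_value:
        return weight_bit                 # random number not above the update value
    if weight_bit == DONT_CARE:
        return input_bit                  # '#' collapses to the input bit
    return DONT_CARE                      # 0 or 1 relaxes to '#'


def update_neuron(weights, inputs, update_value, rng):
    """Return the updated tri-state weight vector for one input pattern."""
    return [update_bit(w, x, update_value, rng) for w, x in zip(weights, inputs)]
```

For example, `update_neuron([0, 1, "#"], [0, 0, 1], 0.2, random.Random(1))` leaves the first bit alone and changes each of the other two with probability of about 0.8.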
Fig. 1. Structure of the original SOM [18], showing the input layer neurons and the output layer neurons.
where $\bar{x}_i$ and $\bar{w}_{ji}$ are the bit inverses of $x_i$ and $w_{ji}$ respectively.
The bSOM training algorithm is discussed below, and compared and contrasted with the original SOM algorithm and Yamakawa's [3] implementation.
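$$T = \begin{pmatrix} 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}$$
Fig. 2. The conditional Markov transition matrix (rows and columns ordered 0, 1, #).
$$T_e = \begin{pmatrix} 1-0.5p & 0 & 0.5p \\ 0 & 1-0.5p & 0.5p \\ 0.5p & 0.5p & 1-p \end{pmatrix}$$
Fig. 3. The effective Markov transition matrix (rows and columns ordered 0, 1, #).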
Solving for $X$ in $(T - \lambda I)X = 0$, where $\lambda = 1$ and $X$ is a vector representing the three states (0, 1, #), gives $X_1 = X_2 = X_3$. This shows that increasing the number of training iterations makes no significant difference to the final results, confirming that the bSOM requires fewer iterations to converge, as compared to the original SOM and that presented in [3]. The following section gives the architectural and implementation features of the proposed bSOM algorithm.
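The convergence of the transition matrix can be checked numerically. The sketch below builds the effective matrix for a given p (with p = 1 it reduces to the conditional matrix), raises it to a power by repeated multiplication, and shows every entry approaching 1/3. The helper names `mat_mul`, `mat_pow` and `effective_matrix` are hypothetical, and the state order 0, 1, # is assumed.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(t, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(t)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, t)
    return result


def effective_matrix(p):
    """Effective transition matrix over the states (0, 1, #) for probability p."""
    return [
        [1 - 0.5 * p, 0.0, 0.5 * p],
        [0.0, 1 - 0.5 * p, 0.5 * p],
        [0.5 * p, 0.5 * p, 1 - p],
    ]


T = effective_matrix(1.0)          # the conditional transition matrix
for n in (2, 12, 13, 14):
    print(n, [[round(v, 4) for v in row] for row in mat_pow(T, n)])
```

Smaller values of p slow the approach to the uniform distribution without changing its limit, since each row of the effective matrix still sums to one.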
The bit transition can be modelled as a Markov chain with a conditional Markov transition matrix ($T$) as shown in figure 2. If the probability of applying the conditional Markov transition matrix is given as $p = 1 - \text{update rate}$, the resulting effective Markov transition matrix ($T_e$) for a bit to change is as shown in figure 3. If $T$ is a regular transition matrix, then as $n$ approaches infinity, $T^n \to S$, where $S$ is a matrix with constant vectors, as shown in figure 4. The transition matrix settles after the 12th iteration. Solving for
(5)
D. Updating Weight Vectors
The winning neuron and its neighbourhood are updated as shown in equation 3. In the bSOM, a probabilistic update is used. The probabilistic update in the bSOM is summarised as follows:
B. Winner Take All (WTA)
Analogously to the original SOM, the unit with the smallest Hamming distance to the input is defined as the winning neuron; see equation 5. Since the weight is a tri-state vector, a # is considered as a matching bit irrespective of the input bit's value. The total number of #'s in the weight vector is stored and used when selecting the winning unit in the competitive layer. When there is a tie, that is, when two neurons have the same Hamming distance to the input vector, the neuron with the minimum number of #'s is chosen as the better match.
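The selection rule, including the tie-break on the number of #'s, can be sketched as follows. The list-based representation and the names `hamming` and `select_winner` are assumptions made for illustration; if two neurons are still tied after both criteria, this sketch simply returns the one with the lower index, which the text does not specify.

```python
DONT_CARE = "#"


def hamming(x, w):
    """Hamming distance in which a '#' weight matches either input bit."""
    return sum(1 for xi, wi in zip(x, w) if wi != DONT_CARE and wi != xi)


def select_winner(x, neurons):
    """Index of the neuron closest to x; ties go to the neuron with fewer '#'."""
    def key(j):
        return (hamming(x, neurons[j]), neurons[j].count(DONT_CARE))
    return min(range(len(neurons)), key=key)
```

Sorting on the pair (distance, number of #'s) means a neuron made up mostly of #'s cannot win merely because it matches everything.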
C. Neighbourhood Selection
As in the original SOM and in [3], a neighbourhood $N_c$ of neurons around the winning neuron $w_c$ is selected and updated with the winning neuron. The size of the neighbourhood is inversely proportional to the iteration value.
A. Distance Computation
The Euclidean distance computation, equation 1, is used in the original SOM to calculate the distance between the input vector and the neuron prototype vectors. The implementation of this equation is not only difficult to realise in hardware, but also unnecessary for binary vectors. Following [3], we use the Hamming distance $H$, as shown in equation 4, for an input vector $x$ and weight vector $w_j$.
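For binary vectors the Hamming distance is the number of positions at which the XOR of the two bits is 1, which is why it maps so directly onto logic gates. The sketch below packs each vector into integers and counts the set bits of the XOR. The extra `care` mask, which excludes # positions from the count, is an assumption drawn from the statement elsewhere in the paper that the # state is ignored; the names `pack` and `hamming` are hypothetical.

```python
def pack(bits):
    """Pack a string over '0', '1', '#' into (value, care) integer masks."""
    value = 0
    care = 0
    for ch in bits:
        value = (value << 1) | (1 if ch == "1" else 0)
        care = (care << 1) | (0 if ch == "#" else 1)   # 0 marks a don't-care bit
    return value, care


def hamming(x_bits, w_bits):
    """Number of differing bit positions, skipping '#' positions in w_bits."""
    x, _ = pack(x_bits)
    w, care = pack(w_bits)
    return bin((x ^ w) & care).count("1")
```

With this encoding `hamming("1011", "0#10")` counts the first and last positions only, giving 2.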

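$$T^{2} = \begin{pmatrix} 0.5000 & 0.2500 & 0.2500 \\ 0.2500 & 0.5000 & 0.2500 \\ 0.2500 & 0.2500 & 0.5000 \end{pmatrix}, \qquad T^{12} = \begin{pmatrix} 0.3335 & 0.3333 & 0.3333 \\ 0.3333 & 0.3335 & 0.3333 \\ 0.3333 & 0.3333 & 0.3335 \end{pmatrix}$$
$$T^{13} = \begin{pmatrix} 0.3334 & 0.3333 & 0.3334 \\ 0.3333 & 0.3334 & 0.3334 \\ 0.3334 & 0.3334 & 0.3333 \end{pmatrix}, \qquad T^{14} = \begin{pmatrix} 0.3334 & 0.3333 & 0.3333 \\ 0.3333 & 0.3334 & 0.3333 \\ 0.3333 & 0.3333 & 0.3334 \end{pmatrix}$$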
many clock cycles as there are bits in the binary input vector to complete the initialization. The hardware architecture presented here has been tested with binary character images of size 28 x 28, totalling 784 bits. The sizes of the input and weight vectors are all set to 784 bits and can easily be altered for any image size. The presented implementation takes exactly 784 clock cycles to completely initialize all the neurons.
Fig. 5. A block diagram of the design circuit.
Fig. 4. The conditional Markov transition matrix after the 2nd, 12th, 13th
and 14th iterations respectively.
TABLE I
SPECIFICATION OF FPGA CIRCUIT DESIGN.
D. Neighbourhood update block
This block is used to select the neighbourhood of the winning neuron and to update the neurons in the specified region. In the hardware implementation the maximum size of the neighbourhood is set to 4, and it decreases as training progresses. The iteration count determines the size of the neighbourhood; for example, if the total number of iterations is set to 100, then the neighbourhood is set to 4 for the first 25 iterations, to 3 for the second 25 iterations (iterations 26 to 50), and so on down to 1 for the last 25 iterations.
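The schedule in this example can be written as a small function. It is a sketch under one assumption: the training run is split into equal stages and the size drops by one per stage, so the third quarter (iterations 51 to 75) uses a neighbourhood of 2, a value the text does not state explicitly. The name `neighbourhood_size` is hypothetical.

```python
def neighbourhood_size(iteration, total_iterations, max_size=4):
    """Neighbourhood size for a 1-based iteration number.

    The run is divided into max_size equal stages; the size starts at
    max_size and falls by one per stage, never going below 1.
    """
    stage = (iteration - 1) * max_size // total_iterations
    return max(1, max_size - stage)


for it in (1, 25, 26, 50, 51, 75, 76, 100):
    print(it, neighbourhood_size(it, 100))
```

Integer division keeps the calculation free of floating-point arithmetic, in keeping with the hardware setting.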
where $k$ runs over the bits of the input vector and $j \in (1 \ldots 40)$ is the address of the neuron. It is worth noting that the neuron vector is tri-state and the # state is ignored when computing the Hamming distance. Thus, for a neuron with 784 #'s, the Hamming distance will always be 0.
2) Winning neuron unit: This unit uses the results from the Hamming distance computed in section IV-C.1 to identify the winning neuron. The design, as shown in figure 6, uses a series of comparators to select the minimum of every two input Hamming distances. For an implementation with 40 values, the design takes exactly seven clock cycles to compute the node with the minimum Hamming distance.
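The pairwise comparison scheme can be modelled in software as a tournament, where each round halves the number of candidates. This is only a behavioural sketch: the function name `tournament_min` and the choice of the lower address on ties are assumptions, and the number of comparison rounds it reports should not be read as a clock-cycle count, since the text does not say how the seven cycles map onto comparator stages.

```python
def tournament_min(distances):
    """Return (minimum distance, neuron address, rounds) by pairwise comparison."""
    entries = [(d, j) for j, d in enumerate(distances)]
    rounds = 0
    while len(entries) > 1:
        survivors = []
        for i in range(0, len(entries) - 1, 2):
            # Tuple comparison: smaller distance wins, lower address on ties.
            survivors.append(min(entries[i], entries[i + 1]))
        if len(entries) % 2:
            survivors.append(entries[-1])   # odd entry passes through unchanged
        entries = survivors
        rounds += 1
    distance, address = entries[0]
    return distance, address, rounds
```

For 40 inputs the candidate list shrinks as 40, 20, 10, 5, 3, 2, 1.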
C. Winner Take All block
This block is made up of two parts, the Hamming distance computation unit and the winning neuron unit.
1) Distance computation unit: This unit is used to compute the Hamming distance between the input binary vector and all the (40) neurons in the bSOM. The computation of the Hamming distance between the input vector $x_i$ and a neuron $w_j$, as shown in equation 6, is a bitwise operation and hence takes as many clock cycles as there are bits in the input vector. Since the Hamming distances for all 40 neurons are computed in parallel, it takes exactly 784 clock cycles to compute the Hamming distance for all the neurons in the network.
$$H_{ij} = \sum_{k=1}^{784} H_{ijk}, \quad \text{where } w_{jk} \neq \#. \qquad (6)$$
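The bit-serial, neuron-parallel accumulation behind equation 6 can be simulated in a few lines. In this sketch the outer loop stands for clock cycles (one input bit each) and the inner loop stands for hardware that would operate on all neurons at once. Vectors are strings over '0', '1' and '#', and the name `bit_serial_hamming` is hypothetical.

```python
def bit_serial_hamming(x, neurons):
    """Accumulate every neuron's Hamming distance one input bit per cycle.

    Returns (distances, cycles); '#' positions in a neuron are skipped.
    """
    distances = [0] * len(neurons)
    cycles = 0
    for k, xk in enumerate(x):            # one bit position per clock cycle
        for j, w in enumerate(neurons):   # done in parallel in hardware
            if w[k] != "#" and w[k] != xk:
                distances[j] += 1
        cycles += 1
    return distances, cycles
```

The cycle count depends only on the vector length, not on the number of neurons, which matches the 784 cycles quoted for 40 neurons.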
B. Pattern Input block
This block is used to acquire the binary input vector (or binary image) from an external camera. The size of the input vector (784 bits) is pre-programmed, and the input is complete when a total of 784 bits has been read from the camera. This binary data is stored in the input vector and then passed on to the WTA block for further processing.
Network size             40 neurons
Input vectors            784 bits
Neuron vectors           784 bits
Initial weights          Random
Maximum neighbourhood    4 neurons
IV. FPGA ARCHITECTURE AND IMPLEMENTATION
The most critical aspect of any hardware design is the selection and design of the architecture that provides the most efficient and effective implementation [9]. The specifications of the circuit implemented on the FPGA are given in table I, with its corresponding block diagram in figure 5. The circuitry is made up of five basic blocks, namely the weight initialization, pattern input, Winner Take All, neighbourhood update and display blocks.
Three of the five blocks run in parallel. These are the pattern input, Winner Take All and display (output) blocks. The weight initialization block is triggered only at start-up. Similarly, the neighbourhood update block is triggered when a winning node $w_c$ is identified for an input binary vector. Details of the five basic blocks are presented in the following sections.
A. Weight Initialization block
This block is used to randomly initialize all the weight
(neuron) vectors in the network. All the neurons in the
network are initialized in parallel bit-by-bit; hence it takes as

Fig. 6. Structure of the WTA unit (7 clock cycles).
Resource Name      Total      Used      Per. (%)
Flip Flops         135,168    4,095     3
4 input LUTs       135,168    18,387    13
Bonded IOBs        768        147       19
Occupied Slices    67,584     11,468    16
RAM16s             288        43        14
A. Software Simulation
The software-based simulation of the bSOM has been carried out on a PC with a general purpose processor clocked at 2.8 GHz and 2 GB of SDRAM. Initial experiments were conducted to empirically select control parameters (number of neurons, neighbourhood size and learning rate) for all three implementations of the SOM (the conventional SOM, strict binary SOM and tri-state SOM (bSOM)).
To determine the number of neurons required to represent all 60,000 patterns in the dataset (see figure 7), we experimented with different numbers of neurons from 10 to 100 in steps of 10. This experiment was primarily based on the bSOM and also applies to the conventional SOM algorithms. The results improve with increasing numbers of neurons until performance begins to plateau at 80 neurons for the bSOM (with minimal improvement thereafter).
initialization. The clock frequency of 40MHz also includes
the design for controlling the external logic for the VGA
and the camera. This is the actual hardware test and the
most stable clock frequency. The frequency could be much
higher without the requirement to interface these devices.
Table II gives the details of the resource utilization of the
FPGA implementation.
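The reported figures of a 40 MHz clock and up to 25,000 patterns of 784 bits per second imply a fixed cycle budget per training pattern, which the short calculation below makes explicit. Only the Hamming distance stage (784 cycles) and the winner selection stage (7 cycles) are taken from the text; how the remaining cycles are spent is not stated, so the variable `unattributed_cycles` is left unassigned to any block. All names are hypothetical.

```python
CLOCK_HZ = 40_000_000          # reported clock rate of the design
PATTERNS_PER_SECOND = 25_000   # reported maximum training rate
HAMMING_CYCLES = 784           # one cycle per input bit, all neurons in parallel
WINNER_CYCLES = 7              # comparator stage of the WTA block

cycles_per_pattern = CLOCK_HZ // PATTERNS_PER_SECOND
unattributed_cycles = cycles_per_pattern - HAMMING_CYCLES - WINNER_CYCLES

print("cycles per pattern:", cycles_per_pattern)
print("cycles beyond distance and winner stages:", unattributed_cycles)
```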
V. EXPERIMENTAL RESULTS
This section describes some of the experiments conducted on the algorithm to verify its correctness, and compares it with other implementations. The MNIST database of handwritten digits [19], a sample of which is shown in figure 7, is used to test the implementation in both PC simulation and on the FPGA hardware architecture. A comparison on the PC between the original SOM as presented by Kohonen in [1] (herein referred to as the conventional SOM), a strictly binary SOM and the proposed tri-state SOM (bSOM) algorithms is also given in this section.
Even though the bSOM is meant for hardware implementation, for simulation and a fair comparison with the conventional SOM we have also implemented the bSOM on a PC using MATLAB. To justify the use of tri-state (0, 1, #) rather than just binary (0, 1) weights, we have also implemented a version of the binary SOM with only 0's and 1's, excluding the third state #. The solely binary implementation uses the same rules as the tri-state (or bSOM) implementation and is herein referred to as the strict binary SOM.
TABLE II
IMPLEMENTATION RESULTS FOR THE BSOM, USING VIRTEX-4 XC4VLX160, PACKAGE FF1148 AND SPEED GRADE -10.
[Fig. 6 diagram labels: multiplexers with inputs S1 and S2; output: minimum Hamming distance and address of the corresponding neuron.]
The update requires a random number generator, which is not only complex to implement in hardware but also computationally expensive. To avoid these costs, a LUT with 2000 randomly generated numbers has been implemented on the FPGA. For a mismatched bit between the input vector and the neuron to be updated, one of the 2000 values is selected using the iteration count. If the number of iterations exceeds 2000, the last 10 bits of the iteration count are used to address the random number in the LUT.
Mismatching bits in the neuron vector are updated as discussed in section III-D; thus a 1 changes to #, a 0 changes to # and a # changes into 0 or 1 depending on the binary input value. Note that a # is implemented as '10', or decimal 2.
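A software model of the look-up table and the two-bit weight encoding might look like the sketch below. Several details are assumptions: the codes '00' and '01' for the states 0 and 1 (only '10' for # is given), the use of floating-point values in the table, and switching to the 10-bit address from iteration 2000 onwards so that the address always stays inside the table. The names `ENCODING`, `build_lut`, `lut_address` and `should_update` are hypothetical.

```python
import random

LUT_SIZE = 2000
# '#' stored as binary 10 (decimal 2); the codes for 0 and 1 are assumed.
ENCODING = {0: 0b00, 1: 0b01, "#": 0b10}


def build_lut(seed=0):
    """Pre-computed table of pseudo-random numbers in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(LUT_SIZE)]


def lut_address(iteration):
    """Table address derived from the iteration count."""
    if iteration < LUT_SIZE:
        return iteration
    return iteration & 0x3FF   # keep only the last 10 bits


def should_update(lut, iteration, update_value):
    """True when the stored random number exceeds the update value."""
    return lut[lut_address(iteration)] > update_value
```

Once the 10-bit address is in use, only the first 1024 entries of the table can be reached, which follows directly from the addressing rule described above.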
E. Output display block
This block displays the neurons (weights) as a binary image on an external Video Graphics Array (VGA) display for visual verification. It runs in parallel with the input and WTA blocks, at the refresh rate of the VGA display used, typically 60 Hz.
The bSOM architecture discussed here has been implemented on a Xilinx Virtex-4 FPGA chip (XC4VLX160) with approximately 152,064 logic cells and embedded RAM totalling 5,184 Kbits. The design and verification were accomplished using the Handel-C high level descriptive language. Compilation and simulation were achieved using the Agility DK design suite. Synthesis (the translation of abstract high-level code into a gate-level net-list) was accomplished using Xilinx ISE tools.
The entire design can be clocked up to 40 MHz, making it possible to train the binary Self Organizing Map with up to 25,000 patterns of size 784 bits in a second after

References
Self-Organizing Maps (book)
The Amsterdam Library of Object Images (journal article)
Neural Network Implementation Using FPGA: Issues and Application (journal article)
Block-based neural networks (journal article)
Self-organizing learning array (journal article)