Neural Network Structures
Lateral Inhibition
Adapted By: Yash Gad
Written By: Thomas Anastasio
Grade Level: 9 - 12
Subjects: Applied Mathematics – Matrix Algebra
Biology – Neuroscience
Description: A look at one of the most fundamental and
commonly seen neural computations involving groups of neurons in a network.
Objectives: Students will be able to analyze the
outputs of sets of laterally inhibited neurons in different situations, and
represent these using simple functions.
Materials: (Optional) Computers with
mathematics software, such as Mathematica or MATLAB
Background
Lateral inhibition is the most widely
known form of computation by neural networks. Laterally inhibitory
architectures are characterized by straight-ahead excitation flanked by
inhibition off to the sides.

Lateral inhibition finds many uses
throughout the brain. In this unit, we will study computational models of
lateral inhibitory networks that process signals spatially and temporally.
Spatial processing is a result of the
interactions between neighboring input neurons, which will have overlapping fields
of activity.
Temporal processing is a result of these
interactions occurring over time, with the activity from the output layer being
fed back to the input layer through recurrent connections.
Scenario I – Basic Lateral Inhibition
The goal will be to construct a laterally
inhibitory network, and assign vectors for the input and output units, and a
matrix to declare the connection weights between them.
From this point onward, the networks
will be too large to diagram. We will instead rely purely on the
matrix representations of the units and their connections.
To begin, create 11 input neurons and 11
output neurons. This will be a row of 11 elements for each. For the matrix
describing lateral inhibition, make each input unit send a strong excitatory
projection to its corresponding output unit (right “below”), and inhibitory
projections to the output units on either side. Since we don’t want there to be
an imbalance on the edges, we will say that output neurons 1 and 11 are
neighbors (we will see how to express this shortly).
When there was only one projection from an
input neuron to its output neuron, we could describe the weights with a simple
row of terms. For this and future assignments, we will instead dedicate
one row of weights to each input neuron. This row will have a number of
elements equal to the number of output neurons (which allows us to use more or
fewer outputs than inputs).
For this scenario, the weights for Input
neuron 1 are given by:
Weight1 = [2 –1 0 0 0 0 0 0 0 0 –1]
Likewise, the weights for Input neuron 2 are
given by:
Weight2 = [–1 2 –1 0 0 0 0 0 0 0 0]
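The rest of the connection matrix can be
written accordingly, with each row equal to the row above it shifted one
position to the right (wrapping around at the end), so that we have:
Weights = [ 2 –1  0  0  0  0  0  0  0  0 –1
           –1  2 –1  0  0  0  0  0  0  0  0
            0 –1  2 –1  0  0  0  0  0  0  0
            0  0 –1  2 –1  0  0  0  0  0  0
            0  0  0 –1  2 –1  0  0  0  0  0
            0  0  0  0 –1  2 –1  0  0  0  0
            0  0  0  0  0 –1  2 –1  0  0  0
            0  0  0  0  0  0 –1  2 –1  0  0
            0  0  0  0  0  0  0 –1  2 –1  0
            0  0  0  0  0  0  0  0 –1  2 –1
           –1  0  0  0  0  0  0  0  0 –1  2]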
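For readers working in software, the same matrix can be generated instead of typed by hand. The following is a minimal sketch in plain Python, not the lesson's original code; the function name `lateral_inhibition_matrix` and its parameters are illustrative assumptions. The modulo operation implements the rule that output neurons 1 and 11 are neighbors.

```python
def lateral_inhibition_matrix(n=11, center=2, side=-1):
    """Build an n-by-n weight matrix; row i holds the weights from input i."""
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        weights[i][i] = center            # excitation straight ahead
        weights[i][(i - 1) % n] = side    # inhibition to the left neighbor
        weights[i][(i + 1) % n] = side    # inhibition to the right neighbor
    return weights

weights = lateral_inhibition_matrix()
for row in weights:
    print(" ".join(f"{w:2d}" for w in row))
```

The first two printed rows should match Weight1 and Weight2 above.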

Now that we have the basic architecture in
place, it is time to have it do something. We will feed this network a
step function (which is exactly what it sounds like: a function that has some
fixed value over a specific range and is zero elsewhere). For this example, we
will present a value of 3 to units 4 through 8, and a value of zero to all others.
When we do the computation, this is the result we see:
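The computation itself can be sketched in plain Python as follows. This is an illustration under the rules stated in this scenario (weights of 2 straight ahead, –1 to each neighbor, wrap-around at the edges); the variable names are assumptions and not part of the original lesson.

```python
n = 11

# Row i holds the weights from input neuron i to every output neuron.
weights = [[0] * n for _ in range(n)]
for i in range(n):
    weights[i][i] = 2
    weights[i][(i - 1) % n] = -1
    weights[i][(i + 1) % n] = -1

# Step input: value 3 on units 4 through 8 (counting from 1), zero elsewhere.
step = [3 if 4 <= unit <= 8 else 0 for unit in range(1, n + 1)]

# Output j is the sum of every input times its weight onto output j.
output = [sum(step[i] * weights[i][j] for i in range(n)) for j in range(n)]

print(step)    # [0, 0, 0, 3, 3, 3, 3, 3, 0, 0, 0]
print(output)  # [0, 0, -3, 3, 0, 0, 0, 3, -3, 0, 0]
```

The output is zero wherever the input is flat and nonzero only at the two edges of the step.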

The blue curve (the top curve if you are
viewing this in black and white) is the original step that we presented. The
slopes on the sides are simply artifacts of the way MATLAB plots graphs; this
is still a step function. The green curve (the bottom curve) is a plot
of the activities of the output neurons. For this particular example, we have
not set the threshold we described in the first unit. When we look at the
output without any kind of filter, we can clearly see the kind of computation
being performed spatially.
This particular network configuration has
taken the step function and computed its (discrete) second derivative. The output graph is
this second derivative, only flipped about the x-axis. In a more general
sense, the output neurons have become sensitive to changes in the input signal.
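To see where the second derivative comes from, the output of unit j can be written out using the weights from this scenario. The symbols u_j (input to unit j) and x_j (output of unit j) are notation assumed here for illustration:

$$x_j = 2u_j - u_{j-1} - u_{j+1} = -\left(u_{j+1} - 2u_j + u_{j-1}\right)$$

The term in parentheses is the standard discrete approximation to the second derivative of u at position j, so each output is the negative of that approximation. For the step input used here, it is nonzero only at units 3, 4, 8, and 9, where the input changes.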
A real-world example to relate this to would
be a visual scene. If the step represented the presence of an object in the scene,
the output would be sensitive to the “edges” of that object.
Scenario II – Difference of Gaussian Weight Profiles
The Difference of Gaussian (DOG) profile is
a common connectivity profile in neural network models. It is constructed
as the difference of two Gaussian curves with different
variances. A short tutorial on Gaussian curves can be found at:
http://mathworld.wolfram.com/NormalDistribution.html
The first Gaussian we will
use, g, has a variance (var) of 0.75 and can be constructed, up to a constant
scale factor, using the following formula:
$$g(x) = e^{-x^{2} / (2\,\mathrm{var})}$$
Another Gaussian, d,
with a variance of 1.5, can be made the same way. Both of these will be
discrete Gaussians if we evaluate them only at integer positions (a nice range would be something like –5 to 5). A DOG p can be
constructed using:
p = g – (0.5 * d)
Note that to make the DOG
profile, the broader Gaussian (the one with the larger variance) is
scaled and subtracted from the narrower Gaussian (the one with the smaller variance).
This profile can be incorporated into the weight matrix, but we first need to
swap the first and second halves of the profile so that its central peak lands
on the first element. Each row of the matrix is then this rearranged profile
shifted by one more position, as in Scenario I, which keeps each input aligned
with its corresponding output. The result looks something like this:
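The construction just described can be sketched in plain Python. Two things here are assumptions rather than details confirmed by the lesson: the Gaussian is taken in its unnormalized form exp(-x²/(2·var)), and "flipping the halves" is read as moving the center of the profile to the first position. The function names are illustrative.

```python
import math

def gaussian(x, var):
    # Assumed form: unnormalized Gaussian with a peak of 1 at x = 0.
    return math.exp(-x * x / (2.0 * var))

def dog_profile(half_width=5, var_narrow=0.75, var_broad=1.5, scale=0.5):
    """p = g - scale * d, evaluated at the integers -half_width..half_width."""
    xs = range(-half_width, half_width + 1)
    return [gaussian(x, var_narrow) - scale * gaussian(x, var_broad) for x in xs]

def swap_halves(profile):
    """Move the center of the profile to position 0."""
    mid = len(profile) // 2
    return profile[mid:] + profile[:mid]

def dog_weight_matrix(profile):
    """Row i is the swapped profile shifted right by i, wrapping around."""
    first_row = swap_halves(profile)
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

p = dog_profile()
weights = dog_weight_matrix(p)
print([round(w, 3) for w in weights[0]])
```

Under these assumptions the first row starts with a positive peak, falls to small negative values a few units away, and wraps around symmetrically, in the same spirit as Weight1 in Scenario I.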

Using the step input you
generated before, you can now look at the output of this network with a DOG
weight profile.
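A minimal sketch of this computation is given below so that the numbers can be compared with the plot. It assumes the unnormalized Gaussian form exp(-x²/(2·var)) and the same swap-and-shift construction of the weight matrix; the exact values depend on those assumptions, and the helper name `dog_first_row` is illustrative.

```python
import math

def dog_first_row(n=11, var_narrow=0.75, var_broad=1.5, scale=0.5):
    """DOG profile on the integers around 0, halves swapped so the peak is first."""
    half = n // 2
    profile = [math.exp(-x * x / (2 * var_narrow))
               - scale * math.exp(-x * x / (2 * var_broad))
               for x in range(-half, half + 1)]
    return profile[half:] + profile[:half]

n = 11
first_row = dog_first_row(n)
# Row i is the first row shifted right by i positions, wrapping at the ends.
weights = [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

# Same step input as in Scenario I: 3 on units 4 through 8, zero elsewhere.
step = [3 if 4 <= unit <= 8 else 0 for unit in range(1, n + 1)]
output = [sum(step[i] * weights[i][j] for i in range(n)) for j in range(n)]

for unit, (s, o) in enumerate(zip(step, output), start=1):
    print(f"unit {unit:2d}  input {s}  output {o: .3f}")
```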

The output appears as a smoothed version of
the flipped second derivative that was observed in Scenario I. This makes
sense, since the connectivity profile used in this case is a smoothed version
of the original profile. We will see an even more pronounced effect when we use
Gaussian pairs with larger variances (such as 1.5 and 3.0, or 3.0 and 6.0).

In an example that you may want to try on
your own, or use as an advanced problem, we built a larger version of the above
scenario (using 51 inputs and 51 outputs). When we presented our 3 DOG weight
distributions with a spiky input (the top graph, in blue), the result was the
output seen in the 3 bottom graphs. Effectively, broader
DOG distributions are less able to follow rapid changes in the input
pattern.
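One way to set up this larger experiment is sketched below. The spiky input is a made-up example (the lesson does not list the values it used), the Gaussian is again assumed to have the unnormalized form exp(-x²/(2·var)), and the scale of 0.5 on the broader Gaussian is carried over from the earlier DOG. The function names are illustrative.

```python
import math

def dog_weights(n, var_narrow, var_broad, scale=0.5):
    """Wrap-around DOG weight matrix for n units (n odd)."""
    half = n // 2
    profile = [math.exp(-x * x / (2 * var_narrow))
               - scale * math.exp(-x * x / (2 * var_broad))
               for x in range(-half, half + 1)]
    first_row = profile[half:] + profile[:half]   # peak moved to position 0
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def respond(inputs, weights):
    """Feed-forward output: each output sums the weighted inputs."""
    n = len(inputs)
    return [sum(inputs[i] * weights[i][j] for i in range(n)) for j in range(n)]

n = 51
# Hypothetical spiky input: a few isolated spikes of different heights.
spiky = [0.0] * n
for position, height in [(5, 3.0), (12, 1.0), (14, 2.5),
                         (25, 3.0), (27, 3.0), (40, 1.5)]:
    spiky[position] = height

# The three variance pairs mentioned in this scenario.
pairs = [(0.75, 1.5), (1.5, 3.0), (3.0, 6.0)]
outputs = {pair: respond(spiky, dog_weights(n, *pair)) for pair in pairs}

for pair, out in outputs.items():
    print(pair, [round(v, 2) for v in out])
```

Plotting the three outputs against the input shows how closely each profile follows the closely spaced spikes.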
Scenario III
The lateral inhibition networks we worked
with before are quite powerful spatial processors, but to get temporal
processing we need recurrent (i.e. feedback) connections. In general, recurrent
connections can occur from a layer to a previous layer, or between units in the
same layer, or both.
Recurrent lateral inhibitory networks
typically employ recurrent connections between the units in the output layer,
as well as the “feed-forward” type connections from the input to the output
layer that we previously worked with.
As with feed-forward connection weights, recurrent
connection weights are represented in a matrix. The states of the output units
in such a recurrent network will then be a function of both the feed-forward and
the recurrent connections. It is necessary to represent (discrete) time steps in
recurrent networks, because the states of the output units at time step (n+1)
are functions of the states of the input and output units at time step n.
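This can be expressed by a state equation of the form:
$$x(n+1) = y\big(V\,u(n) + b\,W\,x(n)\big) \qquad (1)$$
where u and x
are the input and output unit state vectors, V and W
are the feed-forward and recurrent connection weight matrices, and b is a
parameter that scales the recurrent feedback. y(x) is a nonlinear function meant to represent
the real limits on neural firing rate. It ensures that the elements of x
stay between zero and some saturation level a:
$$y(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le a \\ a, & x > a \end{cases} \qquad (2)$$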
When the recurrent connection profile
(making up matrix W) is a DOG having both positive and negative
values, the recurrent connections form both positive and negative feedback
loops. The result of this for network dynamics is that some units can be driven
to the saturation limit a while other units are driven to zero. The
overall strength of the feedback can be controlled by the value of parameter b. The network is said to relax into an
activity pattern that is specific for a given, constant input. Because time in
the network is discrete, the state equation describing relaxation in the
network (1) can be solved iteratively.
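Iterative solution simply means applying the state equation over and over, feeding each result back in. The sketch below assumes the update x(n+1) = y(V u + b W x(n)) with a constant input u, a limiter y that clips values to the range from 0 to a, and a first time step driven by the feed-forward input alone. These forms are assumptions consistent with the description above, and the function names are illustrative.

```python
def squash(value, a):
    """Assumed limiter y: clip a unit's state to the range 0..a."""
    return max(0.0, min(a, value))

def relax(u, V, W, a, b, steps):
    """Iterate x(n+1) = y(V u + b W x(n)); return the state at every step."""
    n_out = len(V)
    # Feed-forward drive; V[j][i] is the weight from input i to output j.
    drive = [sum(V[j][i] * u[i] for i in range(len(u))) for j in range(n_out)]
    x = [squash(d, a) for d in drive]   # first time step: no feedback yet
    history = [x]
    for _ in range(steps - 1):
        feedback = [sum(W[j][k] * x[k] for k in range(n_out))
                    for j in range(n_out)]
        x = [squash(drive[j] + b * feedback[j], a) for j in range(n_out)]
        history.append(x)
    return history

# Tiny example: two units that inhibit each other.
u = [1.0, 0.5]
V = [[1.0, 0.0], [0.0, 1.0]]      # identity feed-forward weights
W = [[0.0, -1.0], [-1.0, 0.0]]    # mutual inhibition
history = relax(u, V, W, a=10.0, b=1.0, steps=4)
for state in history:
    print(state)
```

In this two-unit example the unit with the larger input settles at 1.0 and silences the other, a small-scale version of positive and negative feedback pushing units toward different limits.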
Now for an example. Construct a recurrent
lateral inhibitory network. To make things simple, set the feed-forward weight
matrix to be the identity matrix, and make the recurrent weight profile a DOG.
Use variances of 3.0 and 15.0 for the Gaussians, and scale the broader curve by
0.3. Remember to swap the two halves of the result, as in Scenario II. Make an input vector that
is the positive half-cycle of a sinusoid. Set the saturation level a to
10, and compute the output for this input as the recurrent network relaxes for
20 iterations.
Set rate parameter b to 0.1 for your first experiment, and to 1.0 for the
second.
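The two experiments can be set up along the following lines. This is a sketch under several assumptions that the lesson does not spell out: the update is x(n+1) = y(u + b W x(n)) with y clipping to the range 0 to a, the Gaussians have the unnormalized form exp(-x²/(2·var)), the sinusoid has amplitude 1, and there are 51 units. Which units end up saturated depends on these choices, so the pattern may differ in detail from the lesson's plots.

```python
import math

def dog_weights(n, var_narrow, var_broad, scale):
    """Wrap-around DOG weight matrix for n units (n odd)."""
    half = n // 2
    profile = [math.exp(-x * x / (2 * var_narrow))
               - scale * math.exp(-x * x / (2 * var_broad))
               for x in range(-half, half + 1)]
    first_row = profile[half:] + profile[:half]   # swap the two halves
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def relax(u, W, a, b, steps):
    """Iterate x(n+1) = y(u + b W x(n)); V is the identity, so V u = u."""
    n = len(u)
    clip = lambda v: max(0.0, min(a, v))
    x = [clip(v) for v in u]          # first time step: input only
    history = [x]
    for _ in range(steps - 1):
        x = [clip(u[j] + b * sum(W[j][k] * x[k] for k in range(n)))
             for j in range(n)]
        history.append(x)
    return history

n, a, steps = 51, 10.0, 20
W = dog_weights(n, 3.0, 15.0, 0.3)
# Positive half-cycle of a sinusoid (amplitude 1 is an assumption).
u = [math.sin(math.pi * i / (n - 1)) for i in range(n)]

weak = relax(u, W, a, b=0.1, steps=steps)     # first experiment
strong = relax(u, W, a, b=1.0, steps=steps)   # second experiment

for name, history in [("b = 0.1", weak), ("b = 1.0", strong)]:
    final = history[-1]
    print(name, "min", round(min(final), 3), "max", round(max(final), 3))
```

With b = 0.1 the feedback is weak, so the state settles quickly and stays well below the saturation level. With b = 1.0 the feedback is strong enough that some units are driven up to the saturation level a.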
The result of an example we ran with 51
input and 51 output units is:

The plot on the left shows the
first experiment, and the plot on the right shows the second. The blue curve is the initial
output (at the first time step), and the other curves represent subsequent time
steps. You will probably notice several things:
1) The curves seem to converge to a steady level of activity across the network over time
2) The first experiment didn’t really have much of an effect
3) In the second experiment, something really strange happened
In the second experiment,
those output units that received inputs near the peak of the sinusoid were driven to
saturation over time. The rest were driven to zero. This is what is called a
winner-take-all network. This type of pattern is frequently seen in the brain
when only a special subset of the output neurons should respond to an
input.