I recently created a simple snake-like game
using floating point arithmetic.
That means both the snake and the food can
have arbitrary coordinates in 2-D space and
are not bound to a grid.
If you now want an artificial intelligence
to control the steering of the snake, making
it find the food efficiently and avoid biting
itself, simple pathfinding algorithms quickly
become really complicated.
To approach this challenge, I therefore used
a neural network trained by a genetic algorithm.
Visual cells of the snake serve as inputs
to the network.
The visual field of the snake is plus or minus
120 degrees wide and split into 16 sectors
in total.
The snake can “see” three different types
of objects: the wall, itself, and food.
With 16 sectors and three object types, there
are 16 × 3 = 48 input neurons.
The closer an object is to the head of the
snake, the higher the stimulation of the
corresponding neuron.
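Roughly sketched in Python, an input encoding along these lines could look like this (the function and parameter names and the exact distance scaling are illustrative, not the actual project code):

```python
import math

NUM_SECTORS = 16                     # visual field split into 16 sectors
OBJECT_TYPES = ("wall", "self", "food")
FIELD_OF_VIEW = math.radians(240)    # plus or minus 120 degrees

def sense(head_x, head_y, heading, objects, max_distance):
    """Return the 16 * 3 = 48 input values, each in [0, 1]."""
    inputs = [0.0] * (NUM_SECTORS * len(OBJECT_TYPES))
    for ox, oy, kind in objects:     # objects: (x, y, type) tuples
        dx, dy = ox - head_x, oy - head_y
        # angle of the object relative to the snake's heading, wrapped to [-pi, pi)
        angle = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
        if abs(angle) > FIELD_OF_VIEW / 2:
            continue                 # outside the visual field
        sector = min(int((angle + FIELD_OF_VIEW / 2) / FIELD_OF_VIEW * NUM_SECTORS),
                     NUM_SECTORS - 1)
        # the closer the object, the stronger the stimulation
        stimulation = max(0.0, 1.0 - math.hypot(dx, dy) / max_distance)
        idx = sector * len(OBJECT_TYPES) + OBJECT_TYPES.index(kind)
        inputs[idx] = max(inputs[idx], stimulation)
    return inputs
```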
This data is then fed through a three-stage
neural network.
Each neuron calculates a weighted sum of the
outputs of the previous layer's neurons, using
the constant coefficients of the network.
It then applies a nonlinear activation function
to this sum to determine the neuron's output
intensity.
At the end, the difference between the outputs
of the two neurons in the last layer is used
directly as the steering direction of the snake.
As the activation function in each neuron, a
classical logistic sigmoid is used.
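Sketched roughly in Python, such a forward pass could look like this (the exact data layout, including whether each neuron carries a bias, is illustrative):

```python
import math

def sigmoid(x):
    # classical logistic sigmoid, squashing the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # each neuron: weighted sum of the previous layer's outputs plus a bias,
    # passed through the activation function
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
            for neuron_weights, bias in zip(weights, biases)]

def steering(vision_inputs, network):
    # network: a list of (weights, biases) pairs, e.g. for a 48-16-16-2 structure
    activations = vision_inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    # difference of the two output neurons gives the steering direction
    return activations[0] - activations[1]
```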
This type of activation function is very common
in neural networks because of its useful properties
for backpropagation.
However, backpropagation is not used in this
case to train the network.
Instead, in order to find optimal values for
the coefficients of the network, a genetic
algorithm is applied.
Genetic algorithms try to find an optimal
solution for a problem iteratively by recombining
and mutating existing candidates.
Here the candidates are snakes.
At any point in time, there is a fixed number
of snakes alive.
Each snake is uniquely defined by its DNA, which
stores the coefficients of the neural network and
therefore determines its behavior.
Additionally, one more byte is stored for the
color of the snake, so inheritance is easy to spot.
In total, for snakes with the 48-16-16-2 network
structure, the DNA strand is a 1091-byte array.
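One plausible breakdown of that number, assuming one byte per weight, one byte per bias, and the single color byte:

```python
layers = [48, 16, 16, 2]
weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 48*16 + 16*16 + 16*2 = 1056
biases = sum(layers[1:])                                  # 16 + 16 + 2 = 34
color = 1
print(weights + biases + color)                           # 1091
```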
As snakes die, whether from biting their own
tail, crashing into the wall, or starving,
the population of snakes drops.
To counteract this, each time a snake dies, the
algorithm randomly chooses two snakes as parents
for a new snake.
In the parent selection process, a snake with
a higher fitness value has a larger probability
of being chosen than a snake with a lower fitness.
The fitness of a snake is calculated as a
function of its current size and its hunger.
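Such fitness-proportional selection is commonly implemented as roulette-wheel selection; sketched in Python, with an illustrative placeholder for the fitness formula:

```python
import random

def fitness(snake):
    # illustrative placeholder for "a function of size and hunger":
    # reward size, penalize hunger
    return max(snake.size - snake.hunger, 0.001)

def pick_parent(population):
    # roulette-wheel selection: probability proportional to fitness
    total = sum(fitness(s) for s in population)
    threshold = random.uniform(0.0, total)
    running = 0.0
    for snake in population:
        running += fitness(snake)
        if running >= threshold:
            return snake
    return population[-1]

# each time a snake dies, two parents are drawn this way:
# parent_a, parent_b = pick_parent(population), pick_parent(population)
```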
To generate the new DNA, the two DNA arrays
of the parent snakes are crossed over.
The DNA strand is cut at random locations
and then put back together, alternating between
parent A and parent B. Notably, the crossover
process is done bitwise, so cuts may even happen
in the middle of a byte, changing its value
drastically.
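Sketched in Python, such a bitwise crossover could look like this; the per-bit cut probability is an illustrative value:

```python
import random

def crossover(dna_a, dna_b, cut_probability=0.002):
    """Combine two equally long DNA byte arrays bit by bit."""
    child = bytearray(len(dna_a))
    use_a = True
    for i in range(len(dna_a)):
        byte = 0
        for bit in range(8):
            if random.random() < cut_probability:
                use_a = not use_a              # a cut: switch parents, possibly mid-byte
            source = dna_a if use_a else dna_b
            byte |= source[i] & (1 << bit)     # copy this bit from the current parent
        child[i] = byte
    return bytes(child)
```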
However, the amount of diversity is limited
by the DNA of the snakes in the very first
generation, which is seeded completely at
random.
The probability that all parts of the optimal
DNA are already present in the first generation
is quite low.
That’s why mutation helps in this kind of
evolution.
Here, the mutation is implemented as randomly
flipping bits in the DNA array during the
copying procedure.
The rate at which these deliberate errors occur
can be constant or, as in this case, inversely
proportional to the fitness of the best snake
currently alive.
That means when the snakes perform poorly at
survival, mutations happen more frequently and
thus add more variety to the behavior of the
next generation.
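Sketched in Python, the bit-flip mutation with an adaptive rate could look like this; the exact scaling between best fitness and flip probability is illustrative:

```python
import random

def mutate(dna, best_fitness, base_rate=0.01):
    """Flip random bits while copying the DNA array."""
    rate = base_rate / max(best_fitness, 1.0)   # fitter population -> fewer flips
    mutated = bytearray(dna)
    for i in range(len(mutated)):
        for bit in range(8):
            if random.random() < rate:
                mutated[i] ^= (1 << bit)        # flip this single bit
    return bytes(mutated)
```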
Let’s have a look at an example in the simulation.
The graph in the bottom left of the screen
shows the maximum fitness of the snakes currently
alive.
Also notable is that most snakes have the
quirk of circling around in one direction.
This is due to the largely random coefficients
in the first few generations.
It is also important to note that the snakes
cannot interact or collide with each other;
they only share the same food.
Now we’ve reached the point where somewhat
intelligent snakes have been born.
Let’s watch the currently best snake performing
alone and look at its neural network in real time.
In this debug-view, the neural network is
visualized.
Neuron stimulation is shown as the brightness
of the neuron.
The synapses are displayed as colored lines.
Red means a negative or inhibiting coefficient;
green means a positive or stimulating coefficient.
The thickness of the line represents the absolute
value of the coefficient.
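The mapping from a coefficient to its line style could be as simple as this illustrative sketch (the color values and width scale are my own choices, not the project's):

```python
def synapse_style(coefficient, max_abs=2.0):
    # red for inhibiting (negative), green for stimulating (positive) coefficients
    color = (255, 0, 0) if coefficient < 0 else (0, 255, 0)
    # line thickness grows with the absolute value of the coefficient
    thickness = 1 + 4 * min(abs(coefficient) / max_abs, 1.0)
    return color, thickness
```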
Since snakes become slower the longer they
get, at some point there will be no more noticeable
improvement and progress stops due to starvation.
Before I end this presentation, I want to
express my thanks to the YouTube channel Computerphile,
especially for the videos about neural networks
with Michael Pound from the University of Nottingham,
and to Daniel Shiffman from “Coding Rainbow”
for the video series about genetic algorithms.
You can find links to both in the video description.
