During an internship at a museum, I proposed using computer vision and machine learning techniques to prototype a tool that history students could use to help translate hieroglyph-based writing on murals and artifacts.
I wanted to build a predictive model that could map each hieroglyph to its phoneme.
For the predictive model, I chose a neural network adapted to image input.
At this stage, the prototype must demonstrate the network's capacity to learn and adapt to our symbol-interpretation problem.
Depending on the museum's satisfaction, development of a full version could then be decided.
Some Egyptian hieroglyphs have been extracted, so we have data to work with.
Each hieroglyph will then be labeled with its phoneme and fed to our predictive model so it can learn to identify and classify the symbols.
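That labeling step can be sketched as follows. The filenames and phonemes here are placeholders, and the bipolar (+1/-1) target encoding is an assumption based on the output values the network produces later:

```python
import numpy as np

# Hypothetical label mapping: image name -> phoneme (names are illustrative).
labels = {"21": "un", "07": "ez"}

# Give each phoneme its own output neuron.
phonemes = sorted(set(labels.values()))
phoneme_index = {p: i for i, p in enumerate(phonemes)}

def target_vector(phoneme, n_classes):
    """Bipolar target: +1 for the true class, -1 everywhere else."""
    t = -np.ones(n_classes)
    t[phoneme_index[phoneme]] = 1.0
    return t

t = target_vector("un", len(phonemes))
```

With only two phonemes in this toy mapping, the target for "un" is a two-element vector with +1 on its neuron and -1 on the other.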
Now let's look at the structure of our neural
network.
This network has a 32-by-32-neuron input layer, a 16-by-16-neuron hidden layer, and an output layer of 25 or more neurons.
The hieroglyphs are converted into binary images before being presented to the network's input layer.
Since Egyptian hieroglyphs are very distinct, the binary conversion still lets us recognize them perfectly, while simplifying the learning process for the network.
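A minimal sketch of that binary conversion, assuming a grayscale source image and a bipolar encoding (+1 for ink, -1 for background); the threshold value is an assumption:

```python
import numpy as np

def binarize(gray, threshold=128):
    """Map a grayscale image (0-255) to bipolar -1/+1 values,
    sized to match the 32x32 input layer."""
    g = np.asarray(gray, dtype=np.float64)
    return np.where(g < threshold, 1.0, -1.0)  # ink -> +1, background -> -1

img = np.full((32, 32), 255)   # white background
img[8:24, 12:20] = 0           # a dark stroke standing in for a glyph
binary = binarize(img)
```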
The neurons in the hidden layer are connected to local patches of neurons in the input layer.
This is closer to biological design, to how neurons interact in our visual system.
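One way to sketch those local connections, assuming each of the 16x16 hidden neurons sees a 2x2 input patch with stride 2 (the actual patch size and activation function are not specified, so these are assumptions):

```python
import numpy as np

def hidden_activations(x, weights, patch=2):
    """Each 16x16 hidden neuron sees only a local 2x2 patch of the
    32x32 input, instead of the full image."""
    h = np.zeros((16, 16))
    for i in range(16):
        for j in range(16):
            p = x[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            h[i, j] = np.tanh(np.sum(weights[i, j] * p))
    return h

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=(32, 32))        # a bipolar input image
w = rng.normal(scale=0.1, size=(16, 16, 2, 2))    # one 2x2 filter per hidden neuron
h = hidden_activations(x, w)
```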
The network will be fed a training set of
the hieroglyphs.
After training, the neural network should
be ready to start recognizing the symbols
it has been trained on.
One of the main factors in a neural network's performance is the size and variety of its training set.
Let's have a look at the training process.
This is the beginning of the training process.
Notice that the outputs start at random values and the network makes mistakes, shown in red, on its first tries.
Let's continue the training process.
Each training pass improves the network's accuracy.
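What one such training pass might look like, as a toy backpropagation sketch on a small two-layer tanh network; the real network's shapes, learning rate, and update rule are not specified, so everything here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: 4 bipolar "symbols" of 16 pixels each, 4 classes.
X = rng.choice([-1.0, 1.0], size=(4, 16))
T = -np.ones((4, 4))
np.fill_diagonal(T, 1.0)          # +1 on the true class, -1 elsewhere

W1 = rng.normal(scale=0.1, size=(16, 8))
W2 = rng.normal(scale=0.1, size=(8, 4))
lr = 0.05

def forward(x):
    h = np.tanh(x @ W1)
    y = np.tanh(h @ W2)
    return h, y

errors = []
for epoch in range(200):
    total = 0.0
    for x, t in zip(X, T):
        h, y = forward(x)
        e = t - y
        total += float(np.sum(e ** 2))
        # Backpropagate through tanh: its derivative is (1 - activation^2).
        dy = e * (1 - y ** 2)
        dh = (dy @ W2.T) * (1 - h ** 2)
        W2 += lr * np.outer(h, dy)
        W1 += lr * np.outer(x, dh)
    errors.append(total)
```

Printing `errors` shows the squared error shrinking pass after pass, which is the behavior the red and green comments visualize.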
Each time you see a red comment on the left, the network has made a mistake; when the comment is green, it has guessed correctly.
Notice how, as the training progresses, there
are more and more successful guesses.
At the end of the training process, the training cases are correctly recognized.
Notice that the outputs are close to 1 or -1; this indicates the network's certainty.
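Reading a prediction and its certainty off those outputs can be sketched like this; the phoneme names are illustrative:

```python
import numpy as np

def interpret(outputs, phonemes):
    """Pick the phoneme whose output neuron is highest; outputs near
    +1 or -1 indicate a confident network."""
    outputs = np.asarray(outputs, dtype=float)
    k = int(np.argmax(outputs))
    confidence = (outputs[k] + 1) / 2   # map the [-1, 1] range to [0, 1]
    return phonemes[k], confidence

phoneme, conf = interpret([-0.9, 0.95, -0.8], ["ez", "un", "ankh"])
```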
This means the model has successfully been
trained.
Now let's check its efficiency with a collection
of test cases.
The test cases, extracted from pictures of murals that you can see on the left, are fed to the model all at once.
Let's see what our predictive model thinks they are.
This picture, 21.bmp, has successfully been
identified as the phoneme "un", symbolized
by a chick.
This one is "ez", the scepter.
Overall, on this test set, the identification success rate is approximately 75%.
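The success rate is simply the fraction of correct identifications; a minimal sketch with made-up predictions and labels:

```python
def success_rate(predictions, labels):
    """Fraction of test cases whose predicted phoneme matches the label."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Illustrative values only: 3 of these 4 predictions match.
rate = success_rate(["un", "ez", "un", "ankh"], ["un", "ez", "mes", "ankh"])
```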
The neural network, at this early experimental stage, has been trained with just 25 hieroglyphs, yet it is giving satisfying results, as well as good scalability prospects.
Now that we have tested it with a set of real hieroglyphs, let's try something else with our network.
Let's draw some doodle hieroglyphs, *by hand*,
and see how well it is able to recognize them.
Here you can see me drawing a few hieroglyphs with the mouse in GIMP.
I'm not the best artist, but I have seen these symbols quite a few times, so let's say they look like that, more or less.
The chick hieroglyph has successfully been
identified despite the approximate drawing
that can be seen in the bottom left.
The same goes for the lute hieroglyph and the ankh.
Some of them aren't correctly recognized; this is as much a matter of limited training data as of my poor drawing skills.
Later on, this technology could be used by history professionals to support their translation work, and I hope a finished product could even let tourists, directly on their phones, enjoy a richer and more interactive experience when visiting ancient sites.
A complete product would represent a very difficult challenge, and probably months or even years of development.
Such a product would include proper extraction of the hieroglyphs from the image, accurate prediction of each symbol, and finally running the translated phonemes through a dictionary to obtain a proper translation.
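Those three stages could be chained roughly like this; every function and name here is hypothetical, standing in for components that would each need a real implementation:

```python
def translate(image, segment, classify, dictionary):
    """Segment hieroglyphs out of the image, classify each into a phoneme,
    then look the phoneme sequence up in a dictionary."""
    phonemes = [classify(glyph) for glyph in segment(image)]
    return dictionary.get(tuple(phonemes), "?")

# Toy stand-ins for the three stages:
segment = lambda img: img            # pretend the image is already segmented
classify = lambda glyph: glyph[0]    # pretend each glyph carries its phoneme
lexicon = {("un", "ez"): "example word"}

word = translate([("un",), ("ez",)], segment, classify, lexicon)
```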
