Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
This paper from DeepMind is about taking a
bunch of learning algorithms and torturing
them with millions of classic math questions
to find out if they can solve them.
Sounds great, right?
I wonder what kind of math questions an AI
would find easy to solve.
What percentage of these can a good learning
algorithm answer today?
Worry not, we’ll discuss some of the results
at the end of this video.
These kinds of problems are typically solved
by recurrent neural networks that are able
to read and produce sequences of data.
To even begin to understand the question
here, an AI has to understand the concept
of functions, variables, arithmetic operators,
and, of course, the words that form the
question itself.
It has to learn planning and precedence, that
is, in what order to evaluate such an expression,
and it has to have some sort of memory in
which it can store the intermediate results.
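As a quick illustration of what precedence means here, a tiny Python sketch (my own example, not from the paper) of how the same tokens give different results depending on evaluation order:

```python
# Evaluating "3 + 4 * 2" naively from left to right gives a
# different result than respecting operator precedence, which
# evaluates the multiplication first.
left_to_right = (3 + 4) * 2   # naive left-to-right order
with_precedence = 3 + 4 * 2   # multiplication binds tighter

print(left_to_right)    # 14
print(with_precedence)  # 11
```

An AI answering these questions has to infer this ordering from examples alone, with no built-in parser.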
The main goal of this paper is to describe
a dataset that is designed in a very specific
way to be able to benchmark the mathematical
reasoning abilities of an AI.
So how do we do that?
First, it is made in such a way that it is very
difficult to solve for someone without generalized
knowledge.
Imagine the kind of student at school who
memorized everything from the textbooks but
has no understanding of the underlying concepts,
so if the teacher changes just one number
in a question, the student is unable to solve
the problem.
We all met that kind of student, right?
Well, this test is designed in a way that
students like these should fail at it.
Of course, in our case, the student is the
AI.
Second, the questions should be modular.
This is a huge advantage, because a large number
of these questions can be generated procedurally
by combining different subtasks, such as
addition, function evaluation, and more.
An additional advantage of this is that we
can easily control the difficulty of these
questions: the more modules we use, typically,
the more difficult the question gets.
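The modular-generation idea can be sketched as follows. This is a hypothetical illustration only, not DeepMind's actual generator; the module names and number ranges here are made up:

```python
import random

# Hypothetical sketch of modular question generation: each
# "module" wraps the running expression in one more subtask,
# so chaining more modules yields a harder question.

def number_module(rng):
    n = rng.randint(-10, 10)
    return str(n), n

def add_module(rng, expr, value):
    n = rng.randint(-10, 10)
    return f"({expr}) + {n}", value + n

def mul_module(rng, expr, value):
    n = rng.randint(2, 5)
    return f"({expr}) * {n}", value * n

def generate_question(num_modules, seed=0):
    rng = random.Random(seed)
    expr, value = number_module(rng)
    for _ in range(num_modules):
        step = rng.choice([add_module, mul_module])
        expr, value = step(rng, expr, value)
    return f"What is {expr}?", str(value)

question, answer = generate_question(num_modules=3, seed=42)
```

Because every question is assembled from composable pieces, millions of distinct question-answer pairs can be produced, and the difficulty knob is simply `num_modules`.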
Third, the questions and answers should be
able to come in any form.
This is an advantage, because the AI has to
not only understand the mathematical expressions,
but also focus on what exactly we wish to
know about them.
This also means that the question itself can
be about factorization, where the answer is
expected to be either true or false.
And the algorithm is not told that we are
looking for a true or false answer, it has
to be able to infer this from the question
itself.
And to make it possible to tackle all this
properly, the authors released, along with this
paper, 2 million of these questions for training
an AI, free of charge, to foster more future
research in this direction.
I wonder what percentage of these a good
learning algorithm can answer today.
Let’s have a look at some results!
A neural network model that goes by the name
Transformer network produced the best results
by being able to answer 50% of the questions.
You can find this in the extrapolation column
here.
When you look at the interpolation column,
you see that it successfully answered 76%
of the questions.
So which one is it, 50% or 76%?
Actually, both.
The difference is that interpolation means
that the numbers in these questions were within
the bounds that were seen in the training data,
whereas extrapolation means that some of these
numbers are potentially much larger or smaller
than anything the AI has seen in the training
examples.
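The interpolation/extrapolation split can be illustrated with a toy example. The number ranges below are invented for demonstration; the actual dataset defines its own bounds:

```python
import random

def make_addition_question(rng, low, high):
    # Draw both operands from [low, high] and return the
    # question string together with the correct answer.
    a, b = rng.randint(low, high), rng.randint(low, high)
    return f"What is {a} + {b}?", a + b

rng = random.Random(0)

# Training and interpolation-test questions draw their numbers
# from the same range...
train_q, _ = make_addition_question(rng, 0, 100)
interp_q, _ = make_addition_question(rng, 0, 100)

# ...while extrapolation-test questions use operands well
# outside anything seen during training.
extrap_q, _ = make_addition_question(rng, 1000, 10000)
```

A model that merely memorized training questions can still look good on the interpolation split; only the extrapolation split reveals whether its arithmetic generalizes.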
I would say that given the difficulty of just
even understanding what these questions are,
these are really great results.
Generally, in the future, we will be looking
for algorithms that do well on the extrapolation
tasks, because these are the AIs that have
knowledge that generalizes well.
So, which tasks were easy and which were difficult?
Interestingly, the AI had difficulties similar
to ours, fellow humans: rounding decimals and
integers, comparisons, and basic algebra were
quite easy for it, whereas detecting primality
and factorization were not very accurate.
I will keep an eye out for improvements in
this area; if you are interested in hearing more
about it, make sure to subscribe to this series.
And if you just pushed the red button, you
may think you are subscribed, but you are
not.
You are just kind of subscribed.
Make sure to also click the bell icon to not
miss these future episodes.
Also, please make sure to read the paper,
it is quite readable and contains a lot more
really cool insights about this dataset and
the experiments.
As always, the link is available in the video
description.
Thanks for watching and for your generous
support, and I'll see you next time!
