My name is Katja Hofmann, I'm a principal
researcher at Microsoft Research in Cambridge,
where I lead a group that focuses on game
intelligence.
I got into this field, oh gosh, quite a long
time ago, when we didn't call it AI so much;
we were calling it machine learning.
And I started my research career looking at
how search engines could learn directly from
their users.
When I wanted to do more fundamental research
in settings where we could explore, for example,
more complex behaviours, that search brought
me in the direction of video games. Minecraft
especially, being so open-ended, turned out
to be a fantastic platform for driving this
research, because we could set up a huge variety
of tasks that would allow us to really test
out the boundaries.
Rather than working with applications that
are already there, we could imagine what these
applications could look like in the future.
I'm William Guss, and I'm a research scientist
at OpenAI and a PhD student at Carnegie Mellon,
studying reinforcement learning.
So, I grew up in Salt Lake City, Utah, and
I started getting into gaming because of my brother.
He showed me Minecraft early on, and I was
really excited about it.
In that process I was kind of always yearning
for the world to be more reactive, for an NPC
that would actually play with me like another
human would, or would actually interact with
us in that way.
I wanted to do more with the game, and so I
started to learn how to program, and the story
from there is that I realised programming is
this wonderful thing in the same way that
Minecraft was: I can build whatever I want.
It's the same kind of joy.
And so I actually did some research, and there
was this algorithm for coding up artificial
neural networks and having that interface
with whatever sort of programming you're
already doing.
And it was really crazy to me because all
the programs I'd written up to that point
were just, like, "I'm making a calculator"
or "I'm making some sort of application" and
you know exactly what's gonna happen.
But when I coded up this neural network and
gave it data and it started learning, it was
like, it was alive.
It's like, imagine your Lego set coming to
life.
And that's the big question: how far can you
push these techniques, and how close can you
get to, like, societies in Minecraft, all of
these things that you would expect humans to
do, but having a neural network do it itself?
I think once you're behaving like humans in
all these settings, you're already able to
ask questions about intelligence that you
weren't able to ask before.
Then I can say something about, well, ok: I have
an agent that's very close to a human, and
when humans, or these agents, are placed in
large-scale environments like this, they behave
like this. I can make judgements about human
behaviour that I couldn't before.
You know, I think that's like the wild west,
that's a new type of research.
So having that sort of manifest is quite interesting,
I think, in terms of learning about ourselves.
[Katja] So we started Project Malmo about four or
five years ago, and our goal was really to
develop a general-purpose experimentation
platform, especially for AI researchers and
students and other enthusiasts, so they could
create their own experiments easily.
And we envisioned this as a platform that
would grow with our understanding of what
AI technology can do. For example, we started
from relatively simple examples of teaching
agents to navigate small rooms, avoiding lava,
and then over time these tasks have become
more and more complex.
Reinforcement learning is one of a wide range
of artificial intelligence techniques. It's
a type of machine learning where we think of
systems, or agents, that are autonomous in
the sense that they take actions in their
environment and then learn from the
consequences of those actions.
For a very long time, there was this notion
that a reinforcement learning agent would purely
learn a single given task. So, for example,
it wouldn't necessarily understand that lava
is bad; it would just remember that it stepped
on this square in this situation, and it was
bad, and so it shouldn't do that again.
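That single-task, per-square kind of learning can be illustrated with a tiny tabular Q-learning sketch. Everything here is a hypothetical stand-in (a five-square 1-D gridworld, not Malmo's actual API): the agent never learns a concept of "lava", it only learns that one particular square yields a bad outcome.

```python
import random

# Hypothetical 1-D gridworld (not Malmo's API): five squares in a row.
# Square 0 is lava, square 4 is the goal, and the agent starts in the middle.
LAVA, GOAL, N_STATES, START = 0, 4, 5, 2

def step(state, action):
    """Actions: 0 = move left, 1 = move right. Returns (next_state, reward, done)."""
    nxt = state + (1 if action == 1 else -1)
    if nxt == LAVA:
        return nxt, -10.0, True    # bad outcome on this particular square
    if nxt == GOAL:
        return nxt, 10.0, True     # good outcome
    return nxt, -1.0, False        # small per-step cost

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # one value per (square, action)
    for _ in range(episodes):
        s, done = START, False
        while not done:
            # epsilon-greedy: usually take the best-known action, sometimes explore
            a = rng.randrange(2) if rng.random() < eps else int(q[s][1] >= q[s][0])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])   # learn from the consequence
            s = s2
    return q

q = train()
```

The point of the sketch is exactly the limitation described above: the table stores one value per square, so on a new map where the lava sits somewhere else, everything the agent learned is invalid.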
Now this generality is really the question of
what structure, or maybe what training
approaches, we need to put in place so that
agents learn to understand the world in a
more general, more meaningful way; so that
they can learn that lava is bad, that there
may be types of material they cannot step on,
and then, when they're put in a new environment
and encounter, for example, water, they might
use some of that information about what
happens with lava.
So generalisability is a really important
challenge that needs to be addressed before
many new real-world applications can be enabled.
If we just take this example of some household
robotics, let's say you want a robot that
is able to clean up in your home.
Well, it probably wasn't trained on your home,
it needs to be able to understand what a carpet
is, what a floor is, what the typical items
in a kitchen are and what you do with them.
And that is the kind of general knowledge
that we want these agents to learn to build
up in the long term.
[Will] So, Malmo is a research platform, and ours
is an actual sort of direct research project,
which is like, how do we build AGI using human
data in Minecraft?
So we built this challenge called the MineRL
challenge.
And so we partnered with Microsoft, and the
whole MineRL competition in itself is basically:
"how do we show that using reinforcement
learning alone isn't the best way to do this?!"
[Katja] So we met up for coffee and he was basically
suggesting a version of that competition,
he was sketching out what it would mean to
collect a large amount of data so that people
could try and see how to best use human demonstrations as a starting point for training AI agents
to do these much more complex tasks than what
we had tried so far.
And in particular we settled on the idea of
having agents learn how to mine for a diamond.
And effectively, what we did is we created
a dataset of a task that's simple for humans,
which is obtaining a diamond, and then we
said, ok, AI researchers: instead of trying
to solve this task with pure reinforcement
learning, take this data and train an agent
to solve it, but you only have four days to
do it, and one computer to do it on.
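As a toy sketch of that idea, learning from human demonstrations rather than pure trial and error, here is a minimal "behavioural cloning" policy. The states and actions are made-up stand-ins, not the real MineRL observation or action space: the policy simply imitates the action humans chose most often in each situation.

```python
from collections import Counter, defaultdict

# Made-up (state, action) pairs standing in for recorded human gameplay;
# the real MineRL dataset pairs Minecraft observations with player actions.
demos = [
    ("tree_nearby", "chop"), ("tree_nearby", "chop"), ("tree_nearby", "walk"),
    ("stone_nearby", "mine"), ("stone_nearby", "mine"),
    ("open_field", "walk"),
]

def behavioural_clone(demonstrations):
    """Learn a policy by copying the most frequent human action per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

policy = behavioural_clone(demos)
print(policy["tree_nearby"])   # -> chop
```

In the real competition the observations would be image frames, the actions keyboard and mouse inputs, and the policy a neural network, but the training signal is the same: match what the humans did.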
It was sort of this really cool moment in
RL competitions because for the first time
we were seeing the top techniques being so
drastically different and so diverse, not
only in the algorithms that people were coming
up with, but also the behaviours of the agents
that people were developing.
So it was achieved with a complete lack of
hard-coding, which was a really cool result, I think.
[Katja] And it was really really humbling to see,
end of last year, the exciting progress that
all the participating teams had made towards
actually developing new RL algorithms that
could address this challenge.
[Will] And actually getting very very close to attaining
a diamond, which is just like something we
didn't expect at all.
And I think where we see breakthroughs in
science is when we have people coming from
left field, out of nowhere, and actually implementing
ideas and trying stuff that you really wouldn't expect.
Microsoft generously sponsored a lot of the
compute, and so we saw people from universities
that we had never seen compete in AI competitions
before, competing.
Because people had access.
[Katja] So with AI in general, I don't believe that
technology is either good or bad.
I think that people use technology for different
purposes, some better than others.
And as many more people come to understand
the underlying technology, they can for themselves
have a better understanding of what the opportunities
are, what the limitations are, and maybe what
the risks are.
So I think there is a really important role
for education to get as wide a set of people
as possible contributing to that conversation
of what kinds of AI techniques we want and
what kinds of applications we want in our
lives.
One of the latest developments around Project
Malmo is a collaboration that we have started
with Azure Machine Learning.
As I mentioned, I believe it's really valuable
for many many people to understand the potential,
the uses and also the limitations of new AI
technologies, so that everyone can figure
out what kinds of applications they might
want, and maybe how AI and other related technologies
might help in their daily lives.
Now, in this collaboration with Azure ML,
we have built samples that showcase how Azure
Machine Learning can be used to train reinforcement
learning agents.
And one of the examples we have recently released
is using Project Malmo.
So you can go to the website that's shown
in the description of this video and you can
try this out for yourself today, and train
your first reinforcement learning agent.
