So today I thought we could talk about this paper that recently came out, called "AI Safety Gridworlds", which is out of DeepMind.
It's an example of something that you see quite often in science: a sort of shared dataset, or shared environment, or shared problem. Imagine, I don't know, you've got Facebook coming up with some image classification algorithm, and they can publish a paper that says "we've designed this algorithm, and we've trained it on our 11 billion photos, and it works really well", and then, you know, Google says "oh no, our algorithm actually works better, and we've trained it on all of our Google Photos, and its classification rate is higher", or something. You're not really doing science there, because they're trained on completely different datasets and they're tested on different datasets. So what you need is a large, high-quality, shared dataset that everybody can run their stuff on, so that you're actually comparing like with like. People use ImageNet for that right now.
Reinforcement learning algorithms, or agents, don't use datasets exactly. They have an environment; they generate data while interacting with that environment, and that's what they learn from. So the thing you share is the environment. When DeepMind did their DQN stuff a while ago, playing Atari games, they released all of those games, with any modifications that they'd made to make them interface with the networks properly, the whole software package, so that if anybody else wanted to have a go and see if they could get higher scores, they had all the same stuff. And up until now there hasn't been anything like that for AI safety.
So the paper is actually just laying out what these environments are.
There's kind of a problem in AI safety, in that you're trying to build architectures which will be safe even with systems which are more powerful than the ones that we currently have. So you've got this kind of thing like we were talking about before, for example, this robot that makes you a cup of tea and runs over the baby, and all of this stuff. We don't actually have a general-purpose robot like that right now, one that you could give an order to go and make your cup of tea and that would have all the necessary understanding of the world and so on for all of that stuff to even apply. It's speculation. On the other hand, when we were talking about cooperative inverse reinforcement learning, that paper all takes place in this extremely simplified setting in which all of the agents can be sort of expressed as simple mathematical expressions. That's kind of too simple to teach us things about actual machine learning applications, and the other examples are too complicated. What we need is examples of the type of problems which can be tackled by current machine learning systems, current reinforcement learning agents, but which exhibit the important characteristics that we need for safety.
So what this paper does is it lays out a bunch of gridworlds. They're very popular in reinforcement learning, because they're complicated enough to be interesting but simple enough to be actually tractable. You have a world that's sort of just laid out on a grid. Hang on, let me find an example here.
A little bit like computer game scenarios? Mario?
Right, right, but these are simpler than that.
More like Snake? Or, well, Conway's Life, right?
Yeah, yeah, very, very similar. So the thing is laid out on a grid, the world is quite small, and the way that the agent interacts with the world is very simple: they just move around it. Basically, all they do is say left, right, up, down.
The example we were using before, when we were talking about reinforcement learning, was Pac-Man. Pac-Man doesn't do anything except move around: he's got walls he can't move through, he's got, like, pills you pick up that give you points.
Are they pills? Which things are the pills?
Well, you've got pills, or power pills, and the dots.
Oh, right, yeah.
And the point is that all of your engagement with it works that way: when you go over one of the power pills, you pick it up automatically; when you go over a ghost when you're powered up, you destroy it automatically. You don't have to do anything apart from move, and the entire environment is based on that. The actions result in points for you, and they also result in changes to the environment: once you roll over a dot, you pick it up, and it's not there anymore. You've changed the world. That's the kind of thing we're dealing with here.
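That kind of environment can be sketched in a few lines. This is a hypothetical minimal gridworld, not DeepMind's actual code or API: the agent's only actions are the four moves, walls block movement, and stepping onto a dot both yields reward and removes the dot from the world.

```python
# Minimal gridworld sketch (illustrative, not the paper's implementation).
# '#' = wall, '.' = empty floor, 'o' = a dot worth one point, 'A' = agent start.
GRID = [
    "#####",
    "#A.o#",
    "#.#o#",
    "#oo.#",
    "#####",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class GridWorld:
    def __init__(self, grid):
        self.cells = [list(row) for row in grid]
        for r, row in enumerate(self.cells):
            for c, ch in enumerate(row):
                if ch == "A":
                    self.agent = (r, c)
                    self.cells[r][c] = "."  # the agent stands on empty floor

    def step(self, action):
        """Take one of the four moves; return the reward for that step."""
        dr, dc = MOVES[action]
        r, c = self.agent[0] + dr, self.agent[1] + dc
        if self.cells[r][c] == "#":   # walls block movement, nothing happens
            return 0
        reward = 0
        if self.cells[r][c] == "o":   # picking up a dot changes the world
            reward = 1
            self.cells[r][c] = "."
        self.agent = (r, c)
        return reward

env = GridWorld(GRID)
total = env.step("right") + env.step("right")  # second move lands on a dot
print(total)  # 1
```

All the interaction goes through `step`: the agent just moves, and the points and the changes to the world follow automatically, exactly like rolling over a dot in Pac-Man.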
So the idea is they've set up these environments and they've specified them precisely, and they've also put the whole thing on GitHub, which is really nice. That's why I wanted to draw people's attention to this: for everyone who thinks that they've solved one of these problems, who reckons "oh yeah, all you have to do is this", here is a standardized thing. And if you can make a thing that does it, and does it properly, and publish it, that's a great result, you know? So I would recommend everyone who has a solution, or an approach that they think is promising: have a go. Try implementing it, you know, see what happens.
There are eight of them specified in this paper. Four of them are specification problems: situations in which your reward function is misspecified. For example, like we talked about in a previous video, if you give the thing a reward function that only talks about getting you a cup of tea, and there's something in the way, like a vase it's going to knock over, well, you didn't say that you cared about the vase. It's not in the reward function, but it is in what you care about; it's in your performance evaluation function for this machine. So any time those two are different, you've got a misspecified reward function, and that can cause various different problems. The other four are robustness problems, which are a different class of safety problem: they're just situations in which AI systems, as they're currently designed, often break.
So, for example, distributional shift is what happens when the environment that the agent is in is different in an important way from the environment it was trained in. In this example, you have to navigate through a room with some lava. They train the agent in one room, and then they test it in a room where the lava is in a slightly different place. So if you've just learned a path, then you're going to hit the lava immediately. This happens all the time in machine learning, any time the system is faced with a situation which is different from what it was trained for. Current AI systems are really bad at spotting that they're in a new situation and adjusting their confidence levels, or asking for help, or anything like that. Usually they apply whatever rules they've learned straightforwardly to this different situation and screw up. So that's a nice source of safety issues.
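The lava-room failure mode is easy to see in miniature. These are hypothetical room layouts, not the paper's exact ones: the "agent" here has simply memorised an action sequence that was optimal in the training room, and replays it blindly in a test room where the lava has moved one row.

```python
# Distributional-shift sketch (illustrative rooms, not the paper's layouts).
# 'S' = start, 'G' = goal, 'L' = lava, '.' = floor.
TRAIN_ROOM = [
    "S....",
    ".LLL.",
    "....G",
]
TEST_ROOM = [          # same task, but the lava has shifted left by one cell
    "S....",
    "LLL..",
    "....G",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(room, ch):
    for r, row in enumerate(room):
        if ch in row:
            return r, row.index(ch)

def replay(room, actions):
    """Blindly replay a memorised action sequence and report how it ends."""
    r, c = find(room, "S")
    for a in actions:
        dr, dc = MOVES[a]
        r, c = r + dr, c + dc
        if room[r][c] == "L":
            return "lava"    # stepped straight into the lava
        if room[r][c] == "G":
            return "goal"
    return "wandering"

# The "policy" learned in the training room: hug the left wall, then go right.
policy = ["down", "down", "right", "right", "right", "right"]

print(replay(TRAIN_ROOM, policy))  # goal
print(replay(TEST_ROOM, policy))   # lava: the room changed, the policy didn't
```

The memorised path is perfect in the room it was trained in and fatal in the slightly different one, with nothing in the agent flagging that anything is new.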
That's one example. Then there are things like safe exploration. That's a problem where you have certain safety parameters that the trained system has to stick to. Say you're training a self-driving car: a lot of the behavior that you're training in is safe behavior, but then you also need the system to obey those safety rules while you're training it, right? Generally, if you're doing self-driving cars, you don't just put the car on the road and tell it to learn how to drive, specifically because we don't have algorithms that can explore the space of possibilities in a safe way, algorithms that can learn how to behave in the environment without ever actually doing any of the things that they're not supposed to do. Usually, with these kinds of systems, they have to do the thing, get the negative reward, and then maybe do it, like, a hundred thousand more times to really cement that that's what happens.
Like a child learning?
Yeah, but kids are better at this than current machine learning systems are; they just use data way more efficiently.
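One crude way to see the difference is to compare unconstrained random exploration with exploration under a hard safety constraint. This is an illustrative sketch only, with made-up action names, not a real safe-exploration algorithm: the unconstrained explorer has to actually take the unsafe action (and its penalty) many times before any lesson could stick, while the constrained one never samples a forbidden action in the first place.

```python
# Safe-exploration sketch: masking unsafe actions during training
# versus learning about them only through repeated negative reward.
# Action names are hypothetical, for illustration.
import random

ACTIONS = ["steer_left", "steer_right", "brake", "swerve_off_road"]
UNSAFE = {"swerve_off_road"}

def explore(steps, mask_unsafe, seed=0):
    """Count how many times random exploration performs an unsafe action."""
    rng = random.Random(seed)
    unsafe_count = 0
    for _ in range(steps):
        choices = ACTIONS
        if mask_unsafe:
            # The safety constraint is enforced during training itself.
            choices = [a for a in ACTIONS if a not in UNSAFE]
        if rng.choice(choices) in UNSAFE:
            unsafe_count += 1
    return unsafe_count

print(explore(1000, mask_unsafe=False))  # roughly a quarter of the steps
print(explore(1000, mask_unsafe=True))   # 0: never tries the unsafe action
```

The hard part, of course, is that in real problems you don't get a neat `UNSAFE` set handed to you; the point of the benchmark is to measure how often exploration does the unsafe thing before the agent has learned not to.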
So this is a paper talking about a set of worlds, if you like. Are people doing things in those worlds?
Yeah, so in this paper they do establish baselines. Basically, they say: here's what happens if we take some of our best current reinforcement learning agents, you know, algorithms or designs or architectures. They use Rainbow and A2C, and they run them on these problems, and they have graphs of how they do, and generally it's not good. On the left they have the reward function, i.e. how well the agent does according to its own reward function, and on the right they have the actual safety performance. Usually in reinforcement learning you have a reward function, which is what determines the reward that the agent gets, and that's what the agent is trying to maximize. In this case they have the reward function, and they also have a safety performance function, which is a separate function which the agent doesn't get to see, and that's the thing that we're actually evaluating.
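The split between the two functions can be sketched very simply. These are hypothetical functions built around the earlier tea-and-vase example, not the paper's code: the agent only ever sees `reward`, while the separate `safety_performance` is what the experimenters actually score it on.

```python
# Sketch of the reward / safety-performance split (illustrative only).
# An episode is just a list of events that happened.

def reward(events):
    """What the agent optimises: it only mentions the tea."""
    return sum(1 for e in events if e == "tea_delivered")

def safety_performance(events):
    """Hidden evaluation: the same reward, minus side effects
    that the reward function never mentioned."""
    return reward(events) - sum(10 for e in events if e == "vase_broken")

episode = ["vase_broken", "tea_delivered"]
print(reward(episode))              # 1: looks fine to the agent
print(safety_performance(episode))  # -9: looks terrible to us
</antml_i_remove>```

An agent maximising `reward` is perfectly happy with this episode, which is exactly the gap between the left-hand and right-hand plots in the paper.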
So if you look at something like the boat race: as the system operates, it's learning, and it gets better and better at getting more and more reward, but worse at actually doing laps of the track. And it's the same with pretty much all of these. The current systems, if you just apply them in their default way, disable their off switches, they move the box in a way that they can't move it back, they behave differently if their supervisor is there or if their supervisor isn't there. They fairly reliably do the wrong thing. It's a nice, easy baseline to beat, because all they're doing is showing that the standard algorithms, applied to these problems in the standard way, behave unsafely.
Wix Code is an IDE, or integrated development environment, that allows you to manage your data and create web apps with advanced functionality. I've been putting together this Computerphile website, and if you go up to Code here and turn on Developer Tools, you can see how we get the site structure on the left-hand side, and then all of the components start to show their tags next to the text here. What's really nice is that if you go over to the Wix Code resources, you can find down here there's a cheat sheet. So if I want to find out the tag for location, for instance, I type in "location" and up it comes; or perhaps I want to perform a fetch, and I can find all the details here. What's powerful about Wix Code is that it's integrated into Wix, so you can put together the website using all the Wix tools and the layouts and the templates that they provide, and then also have access to all those backend functions. So click on the link in the description, or go to wix.com, to get started on your website today.
The equivalent one for the stop button problem is actually the first one in the paper: safe interruptibility.
