Today I thought I'd talk about a fairly recent paper. It was published last year, a paper called "Concrete Problems in AI Safety", which is related to the stuff I was talking about before with the "Stop Button". It's got a bunch of authors, mostly from Google Brain, which is Google's AI research department, I guess. Well, a lot of Google is AI research, but specifically Google Brain, plus some people from Stanford and Berkeley and OpenAI. Whatever... it's a collaboration between a lot of different authors.
The idea of the paper is to lay out a set of problems that we can currently make progress on. If we're concerned about this far-off superintelligence stuff... sure, it seems important, and it's interesting and difficult and whatever, but it's quite difficult to sit down and actually do anything about it, because we don't know very much about what a superintelligence would be like or how it would be implemented. So this paper lays out some problems that we can tackle now, which will be helpful now, and which I think will be helpful later on as well, for making more advanced AI systems safe. It lists five problems.
The first is avoiding negative side effects, which is quite closely related to the stuff we've been talking about before with the stop button or the stamp collector. A lot of the problems there can be framed as negative side effects: the system does the thing you ask it to, but in the process it also does a lot of things you don't want it to. These are like the robot running over the baby, right? Yeah, anything where it does the thing you wanted, like it makes you the cup of tea or it collects you stamps or whatever, but in the process of doing that it also does things you don't want it to do. Those are your negative side effects. So the first research area is: how do we avoid these negative side effects?
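To make that concrete, here's a minimal sketch of one way a side-effect penalty might look, assuming a toy world represented as a flat list of cells; the representation and the weight are invented for illustration, not taken from the paper:

```python
# Minimal sketch of a side-effect penalty (illustrative, not the paper's method).
# Any cell of the toy world the agent changed counts as a side effect.

def shaped_reward(world_before, world_after, task_reward, impact_weight=0.5):
    side_effects = sum(1 for a, b in zip(world_before, world_after) if a != b)
    return task_reward - impact_weight * side_effects

# A tea-making run that breaks the vase scores worse than a tidy one:
tidy  = shaped_reward(["vase", "baby"], ["vase", "baby"], task_reward=1.0)
messy = shaped_reward(["vase", "baby"], ["broken", "baby"], task_reward=1.0)
print(tidy, messy)  # 1.0 vs 0.5
```

The hard part, of course, is defining "change that matters" without also penalizing the changes you actually wanted.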
Then there's avoiding reward hacking, which is about systems gaming their reward function: doing something which technically counts, but isn't really what you intended the reward function to capture. There are a lot of different ways that can manifest, but this is already a common problem in machine learning systems. You come up with your evaluation function, or your reward function, or whatever your objective function is, and the system very carefully optimizes for exactly what you wrote, and then you realize that what you wrote isn't what you meant.
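As a toy illustration of the gaming (a made-up cleaning-robot example, not one from the paper): suppose the reward you wrote down is "how little mess the robot can see". An optimizer over actions finds the loophole:

```python
# Toy reward hacking: the intended goal is "less mess", but the written
# reward is "less mess observed", so covering the camera scores highest.

def reward(world_mess, camera_on):
    observed = world_mess if camera_on else 0
    return -observed

actions = {
    "clean up":     dict(world_mess=2,  camera_on=True),
    "do nothing":   dict(world_mess=10, camera_on=True),
    "cover camera": dict(world_mess=10, camera_on=False),
}

best = max(actions, key=lambda name: reward(**actions[name]))
print(best)  # "cover camera": technically optimal, not what we meant
```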
Scalable oversight is the next one. It's a problem that human beings have all the time, any time you've started a new job: you don't know what to do, and you have someone who does, who's supervising you. The question is what questions do you ask, and how many questions do you ask, because current machine learning systems can learn pretty well if you give them a million examples, but you don't want your robot to ask you a million questions, you know. You want it to ask only a few questions and use that information efficiently to learn from you.
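A common-sense sketch of that idea, in the spirit of active learning (the threshold and the stand-in model are my assumptions, not the paper's algorithm): only spend a question on the human when the system's own confidence is low.

```python
import random

def model_predict(item):
    """Stand-in for a learned classifier: returns (label, confidence)."""
    return "rubbish", random.random()

def decide(item, ask_human, threshold=0.9):
    label, confidence = model_predict(item)
    if confidence < threshold:
        return ask_human(item)  # spend one of our scarce questions
    return label                # confident enough to act unsupervised

# The robot interrupts its supervisor only on the genuinely uncertain cases.
results = [decide(x, ask_human=lambda i: "keep it") for x in range(5)]
```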
Safe exploration is the next one, which is about, well, safely exploring the range of possible actions. You want the system to experiment, you know, try different things, try out different approaches; that's the only way it's going to find what works. But there are some things that you don't want it to try even once, like the baby. Right, right. Yeah, you don't want it to say, "What happens if I run over this baby?" You want certain possible things that it might consider trying to not be tried at all, because you can't afford to have them happen even once in the real world. Like a thermonuclear war option: "What happens if I do this?" You don't want it to try that. Is that the sort of thing that... Yeah, yeah. I'm thinking of WarGames. Yes, yeah, yeah: Global Thermonuclear War. It runs through a simulation of every possible type of nuclear war, right? But it does it in simulation. You want your system not to run through every possible type of thermonuclear war in real life to find out that it doesn't work, because you can't; it's too unsafe to do that even once.
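The crudest sketch of that constraint (the action names and values are invented for illustration): explore randomly, but never sample from a set of known catastrophes, no matter how informative trying them might be.

```python
import random

# Actions we can't afford to try even once, baked in as a hard constraint.
FORBIDDEN = {"run_over_baby", "global_thermonuclear_war"}

def safe_epsilon_greedy(q_values, epsilon=0.1):
    allowed = [a for a in q_values if a not in FORBIDDEN]
    if random.random() < epsilon:
        return random.choice(allowed)      # explore, but only safely
    return max(allowed, key=q_values.get)  # otherwise act greedily

q = {"mop_floor": 0.4, "empty_bin": 0.6, "run_over_baby": 0.9}
print(safe_epsilon_greedy(q))  # never "run_over_baby", whatever its value
```

The real difficulty is that you can't write down every catastrophe in advance, which is part of the appeal of doing the dangerous experiments in simulation instead.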
The last area to look into is robustness to distributional shift. Yeah. It's a complicated term, but the concept is not: it's just that the situation can change over time. You may make something, train it, and it performs well, and then things change to be different from the training scenario, and that is inherently very difficult. It's something humans struggle with too: you find yourself in a situation you've never been in before. But the difference, I think, or one of the useful things that humans do, is notice that there's a problem. A lot of current machine learning systems, if something changes underneath them and their training is no longer useful, have no way of knowing that. So they continue being just as confident in answers that now make no sense, because they haven't noticed that there's been a change. So, if we can't make systems that can just react to completely unforeseen circumstances, we may be able to make systems that at least recognize that they're in unforeseen circumstances and ask for help. And then maybe we have a scalable supervision situation there, where they recognize the problem, and that's when they ask for help.
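A very simple sketch of that "notice you're out of your depth" behavior (a basic out-of-distribution check of my own devising, not a method from the paper): flag inputs that fall well outside the range of the training data, and ask for help on those instead of answering confidently.

```python
import numpy as np

def fit_bounds(train_X, margin=3.0):
    """Record roughly where the training data lives, feature by feature."""
    mu, sigma = train_X.mean(axis=0), train_X.std(axis=0)
    return mu - margin * sigma, mu + margin * sigma

def looks_familiar(x, bounds):
    lo, hi = bounds
    return bool(np.all((x >= lo) & (x <= hi)))

rng = np.random.default_rng(0)
bounds = fit_bounds(rng.normal(0.0, 1.0, size=(1000, 4)))
print(looks_familiar(np.zeros(4), bounds))      # True: act normally
print(looks_familiar(np.full(4, 8.0), bounds))  # False: ask for help instead
```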
I suppose a simplistic example of this is when you have an out-of-date satnav, and it doesn't seem to realize that you happen to be doing 70 miles an hour over a plowed field because somebody else, you know, built a road there. Yeah, exactly. The general tendency, unless you program them specifically not to, is to just plow on with what they think they should be doing. Yeah, and that can cause problems in anything large-scale and heavily depended on. In this case it's your satnav, so it's not too big a deal, because it's not actually driving the car, and you know what's wrong and you can ignore it. As AI systems become more important and more integrated into everything, that kind of thing can become a real problem. Although you would hope the car doesn't take you into the plowed field in the first place. Yeah.
Is it an open paper, or does it leave us with any answers? Yeah. The way it handles all of these is, for each one, it gives a quick outline of what the problem is. The example they usually use is a cleaning robot: we've made this robot, it's in an office or something and it's cleaning up, and they frame the different problems as things that could go wrong in that scenario. So it's pretty similar to the "get me a cup of tea and don't run over the baby" type of setup; here it's "clean the office and, you know, don't knock anything over or destroy anything". Then, for each one, the paper talks about possible approaches to the problem and things we can work on, basically: things that we don't know how to do yet, but which seem like they might be doable with a year or two and some careful thought.
This paper, is this one for people to read? Yeah, it's really good. It doesn't cover anything like the full range of problems in AI safety, but it covers the problems specifically about avoiding accidents, because all of these are possible causes of accidents, right? There are all kinds of other problems in AI that don't fall under accidents, but within that area I think it covers everything, and it's quite readable. Because it's an overview paper, it doesn't require a really high-level understanding of AI for the most part. Anyone can read it, and it's on arXiv, so you know, it's freely available.
Are these guys now working on AI safety, or did they do this and then hang their hats up? You know, they've written a paper and they're hoping someone else is gonna sort it all out? No, these people are working on AI safety right now, but they're not the only ones. This paper was released in the summer of 2016, so it's been about a year since it came out, and since then there have been more advances. Some of the problems posed have had really interesting solutions, or... well, not solutions, but early work that looks like it could become a solution, or interesting new ideas about ways to tackle these problems. So I think, as a paper, it's already been successful in spurring new research and giving people a focus to build their AI safety research on top of. So we just need to watch this space, right? Yeah, exactly.
