Today our work on risks from artificial intelligence
makes up a noteworthy but still fairly small
portion of the EA portfolio.
Only a small portion of donations made by
individuals in the community are targeted
at risks from AI.
Only about 5% of the grants given out by the
Open Philanthropy Project, the leading grant-making
organization in the space, target risks from
AI.
And in surveys of community members, most
do not list AI as the area that they think
should be most prioritized.
At the same time though, work on AI is prominent
in other ways.
Leading career advising and community building
organizations like 80,000 Hours and CEA often
highlight careers in AI governance and safety
as especially promising ways to make an impact.
Interest in AI is also a clear element of
community culture.
For example, you'll probably find yourself
in many more conversations about AI over the
course of this conference than the previous
statistics might suggest.
And lastly, I think there's also a sense of
momentum around people's interest in AI.
Especially over the last couple of years,
quite a few people have begun to consider
career changes into the area, or have already
made quite large changes in their careers.
I think this is true more for work around
AI than for most other cause areas.
So I think all of this together suggests that
now is a pretty good time to take stock.
It's a good time to look backwards and ask
how the community first came to be interested
in risks from AI.
It's a good time to look forward and ask how
large we expect the community's bet on AI
to be, how large a portion of the portfolio
we expect AI to take up, let's say five or
ten years down the road.
It's a good time to ask, are the reasons that
we first got interested in AI still valid?
And if they're not still valid, are there
perhaps other reasons, which may be either
more or less compelling?
To give a brief talk roadmap, first I'm going
to run through what I see as a sort of intuitively
appealing argument for focusing on AI.
Then I'm going to say why this argument is
a bit less forceful than you might anticipate
at first glance.
Then I'll discuss a few more concrete arguments
for focusing on AI that still have some missing
pieces, which I'll highlight.
And then I'll close by giving concrete implications
for cause prioritization.
So first, here's what I see as an intuitive
argument for working on AI, which I'll call
the "AI is a big deal" argument.
The basic idea here is that the future is
what matters most, in the sense that if you
could have an impact that carries forward
and affects future generations, then this
is likely to be more ethically pressing than
having an impact that only affects, let's
say, the world today.
It also involves the assumption that technological
progress is likely to make the world very
different in the future, that just as the
world is very different than it was a thousand
years ago because of technology, it's likely
to be very different again.
The third idea is that if we look at technologies
that are likely to make especially large changes,
AI stands out as especially promising among
them.
And fourth, we have the conclusion that all
these pieces come together and suggest that
working on AI is a really good way to have
leverage over the future, and that this is
an important thing to pursue.
I think that a lot of this argument works.
I think there are compelling reasons to try
and focus on your impact in the future.
I think that it's very likely that the world
will be very different far in the future compared
to the way the world is today.
I also think it's very likely that AI will
be one of the most transformative technologies.
It seems at least physically possible to have
machines that eventually can do all the things
that humans can do, and perhaps do all these
things much more capably.
If this eventually happens, then whatever
that world looks like, we can be pretty confident
the world will look pretty different than
it does today.
What I find less compelling, though, is the
idea that all of this supports the conclusion
that we ought to work on AI. Just because
a technology produces very large changes doesn't
necessarily mean that working on that technology
is a good way to actually have leverage over
the future.
If you look back at the past and consider
the most transformative technologies that
have ever been developed, things like electricity,
or the steam engine, or the wheel, or steel,
it's often very difficult to imagine what
individuals early in the development of these
technologies could have done to have a lasting
and foreseeably positive impact that lingers
far into the future.
An analogy is sometimes made to the industrial
revolution and the agricultural revolution.
The idea is that in the future, the impacts
of AI may be substantial enough that there
will be changes comparable to these two
revolutionary periods in history.
The issue here, though, is that it's not really
clear that either of these periods actually
was a period of especially high leverage.
If you were, say, an Englishman in 1780, trying
to figure out how to make this Industrial
Revolution thing go well in a way that has
a lasting and foreseeable impact on the world
today, it's really not clear you could have
done all that much.
The basic point here is that from a long-termist
perspective, what matters is leverage.
This means finding something that could go
one way or the other, and that's likely to
stick in a foreseeably good or bad way far
into the future.
Long-term importance is perhaps a necessary
condition for leverage, but it's certainly
not a sufficient one, and it's a flawed indicator
in its own right.
So now I'm going to move to three somewhat
more concrete cases for focusing on AI, three
concerns that might lead you to work in this
area. The first concern is instability.
You might think that there are certain dynamics
around the development or use of AI systems
that will increase the risk of permanently
damaging conflict or collapse, for instance
war between great powers.
Second, you might be concerned about lock-in.
The thought here is that certain decisions
regarding the governance or design of AI systems
may permanently lock in, in a way that propagates
forward into the future in a lastingly positive
or negative way.
Third, you may be concerned about accidents,
the thought that it might be quite difficult
to use future systems safely, and that accidents
with more advanced systems could cause lasting
harm that again carries forward into the future.
I'm going to walk through these one by one.
First, the case from instability.
The thought here is that it's very likely
that countries will compete to reap the economic
and military benefits of AI applications.
This is already happening to some extent.
And you might think that as the applications
become more significant, the competition will
become greater.
And in this context, you might think that
all of this increases the risk of war between
great powers.
So one concern here is the potential for power
transitions, changes in which countries are
powerful compared to which others.
A lot of people in the field of international
security think that these are conditions under
which conflict becomes especially likely.
You might also be concerned about changes
in military technology that, for example,
increase the odds of accidental escalation,
or make offense more favorable compared to
defense.
You may also be concerned that periods of
rapid technological change increase the odds
of misperception or miscalculation, as countries
struggle to figure out how to use the technology
appropriately or to interpret the actions
of their adversaries.
You may also be concerned that certain applications
of AI will in some sense damage domestic institutions
in a way that also increases instability,
that rising unemployment or inequality might
just be quite damaging. And you might lastly
be concerned about risks from terrorism, that
certain applications might make it quite easy
for small actors to cause large amounts of
harm.
In general, I think that many of these concerns
are plausible and very clearly important.
Most of them have not received very much research
attention at all.
I believe that they warrant much, much more
attention.
At the same time though, if you're looking
at things from a long-termist perspective,
there are at least two reservations you could
continue to have.
The first is that we just don't really know
how worried to be. These risks really haven't
been researched, and we shouldn't take it
for granted that AI will be destabilizing.
It might be, or it might not; we just haven't
done enough research to feel very confident
one way or the other.
You may also be concerned, if you're really
focused on the long term, that lots of instability
may not be sufficient to actually have a lasting
impact that carries forward through generations.
This is a somewhat callous perspective, but
if you really are focused on the long term,
it's not clear, for example, that a mid-sized
war by historical standards would be sufficient
to have this long-term impact. So the bar
for a level of instability that a long-termist
would really focus on may actually be quite
high.
The case from lock-in I'll talk about just
a bit more briefly. Some of the intuition
here is that certain decisions made in the
past, for instance about the design of political
institutions or software standards, or certain
outcomes of military or economic competitions,
seem to have produced outcomes that in some
cases carry forward into the future for centuries.
Examples would be the design of the US Constitution,
or the outcome of the Second World War.
You might have the intuition that certain
decisions about the governance or design of
AI systems, or certain outcomes of strategic
competitions might carry forward into the
future, perhaps for even longer periods of
time.
For this reason, you might try and focus on
making sure that whatever locks in is something
that we actually want.
I think though that this is a somewhat difficult
argument to make, or at least it's a fairly
non-obvious one.
I think the sort of standard skeptical reply
is that with very few exceptions, we don't
really see many instances of long term lock-in,
especially long term lock-in where people
really could have predicted what would be
good and what would be bad.
Probably the most prominent examples of lock-in
are choices around major religions that have
carried forward for thousands of years. But
beyond that, it's quite hard to find examples
that last even for hundreds of years. Those
seem quite few.
It's also generally hard to judge what you
would want to lock in. If you imagine fixing
some aspect of the world while the rest of
the world changes dramatically, it's really
hard to guess what would actually be good
under quite different future circumstances.
My general feeling on this line of argument
is that it's probably not that likely that
any truly irreversible decisions around AI
will be made anytime soon, even if progress
is quite rapid, although other people certainly
might disagree.
Last, we have the case from accidents.
The idea here is that we know there are certain
safety engineering challenges around AI systems.
It's actually quite difficult to design systems
that you can feel confident will behave the
way you want them to in all circumstances.
This has been laid out most clearly in the
paper 'Concrete Problems in AI Safety,' published
a couple of years ago by Dario Amodei and
others, and I'd recommend that anyone interested
in safety issues take a look at it.
Then, given the existence of these safety
challenges, and given the expectation that
AI systems will become much more powerful
or be given much more responsibility in the
future, we might expect these safety concerns
to become more serious as time goes on.
At the limit, you might worry that these safety
failures could become so extreme that they
could perhaps derail civilization as a whole.
In fact, there is a bit of writing arguing
that we should be worried about these sort
of existential safety failures.
The main work arguing for this is still the
book 'Superintelligence' by Nick Bostrom,
published in 2014.
Before this, essays by Eliezer Yudkowsky were
the main source of arguments along these lines.
And then a number of other writers, such as
Stuart Russell or, much earlier, I.J. Good,
as well as David Chalmers, have also expressed
similar concerns, albeit more briefly.
The writing on existential safety accidents
definitely isn't homogeneous, but often there's
a sort of similar narrative that appears in
these essays expressing these concerns.
There's this basic standard disaster scenario
that has a few common elements.
First, the author imagines that a single AI
system experiences a massive jump in capabilities.
Over some short period of time, a single system
becomes much more general or much more capable
than any other system in existence, and in
fact any human in existence.
Then, given this system, researchers specify
a goal for it: they give it some input which
is meant to communicate what behavior it should
engage in.
The goal ends up being something quite simple,
and the system goes off and single-handedly
pursues this very simple goal in a way that
violates the full nuance of what its designers
intended.
There's a classic toy example that's often
used to illustrate this concern. We imagine
that some poor paperclip factory owner receives
a general superintelligent AI on his doorstep.
There's a sort of slot where you stick in
a goal. He writes down the goal "maximize
paperclip production," puts it into the AI
system, and then lets it go off and do that.
The system figures out that the best way to
maximize paperclip production is to take over
all the world's resources and plow them all
into paperclips.
And the system is so capable that designers
can do nothing to stop it, even though it's
doing something that they actually really
do not intend.
So I have some general concerns, I suppose,
about the existing writing on existential
accidents.
So first, there's just still very little of
it. It really is mostly 'Superintelligence'
and essays by Eliezer Yudkowsky, and then
a handful of shorter essays and talks that
express very similar concerns.
There's also been very little substantive
written criticism of it.
Many people have expressed doubts or been
dismissive of it, but there's very little
in the way of skeptical experts who are sitting
down and fully engaging with it and sort of
writing down point by point where they disagree
or where they think the mistakes are.
Most of this work on existential accidents
was also written before large changes in the
field of AI, especially before the recent
rise of deep learning, and also before work
like 'Concrete Problems in AI Safety,' which
laid out safety concerns in a way which is
more recognizable to AI researchers today.
The arguments for existential accidents often
rely on fuzzy, abstract concepts like optimization
power, general intelligence, or goals, and
on toy thought experiments like the paperclip
example. Certainly thought experiments and
abstract concepts do have some force, but
it's not clear exactly how strong a source
of evidence we should take them to be.
Then lastly, although many AI researchers
have expressed concern about existential accidents,
for example Stuart Russell, it does seem to
be the case that many, and perhaps most, AI
researchers who encounter at least abridged
or summarized versions of these concerns tend
to bounce off them or just find them not very
plausible.
I think we should take that seriously.
I also have some more concrete concerns about
writing on existential accidents.
You should certainly take these concerns with
a bit of a grain of salt because I am not
a technical researcher, although I have talked
to technical researchers who have essentially
similar or even the same concerns.
The general concern I have is that these toy
scenarios are quite difficult to map onto
something that looks more recognizably plausible.
The scenarios often involve, again, massive
jumps in the capabilities of a single system,
but it's really not clear that we should expect
such jumps or find them plausible. This is
a somewhat woolly issue; I would recommend
that people check out writing by Katja Grace
or Paul Christiano online, which lays out
some concerns about the plausibility of massive
jumps.
Another element of these narratives is that
they often imagine some system which becomes
quite generally capable and is then given
a goal. In some sense, this is the reverse
of the way machine learning research tends
to look today. At least very loosely speaking,
you tend to specify a goal, or some means
of providing feedback or directing the behavior
of a system, and then allow it to become more
capable over time, not the other way around.
It's also the case that these sort of toy
examples stress the nuances of human preferences,
with the idea being that because human preferences
are so nuanced and so hard to state precisely,
it should be quite difficult to get a machine
that can sort of understand how to obey them.
But it's also the case in machine learning
that we can train lots of systems to engage
in behaviors that are actually quite nuanced
and that we can't specify precisely.
Recognizing faces from images is an example
of this; so is, for example, flying a helicopter.
It's really not clear exactly why human preferences
would be so fatally difficult to learn.
In general, it's quite difficult to figure
out how to map the toy examples onto something
which looks more realistic.
So some general caveats on the concerns expressed.
None of my concerns are meant to be decisive.
I've found, for example, that many people
working in the field of AI safety in fact
list somewhat different concerns as explanations
for why they believe the area is very important.
There are many more arguments that I believe
are shared individually, or exist inside people's
heads, but just haven't been published. I
really can't speak to exactly how compelling
these are.
The main point I want to stress here is that
when it comes to the writing which has actually
been published, and which is out there for
analysis, I don't think it's necessarily that
forceful, and at the very least it's not decisive.
So now I have just some sort of brief practical
implications, or thoughts on prioritization.
You may think, from all the stuff I've just
said, that I'm actually quite skeptical about
AI safety or governance as areas to work in.
In fact, I'm actually fairly optimistic.
My reasoning here is that I really don't think
that there are any slam-dunks for improving
the future.
I'm not aware of any single cause area that
seems very, very promising from the perspective
of offering high assurance of long-term impact.
I think that the fact that there are at least
plausible pathways for impact by working on
AI safety and AI governance puts it sort of
head and shoulders above most areas you might
choose to work in.
And AI safety and AI governance also stand
out for being pretty extraordinarily neglected.
Depending on how you count, there are probably
fewer than a hundred people in the world working
on technical safety issues or governance challenges
with an eye towards very long-term impacts.
And that's just truly, very surprisingly small.
The overall point, though, is that the exact
size of the bet that EA should make on artificial
intelligence, the share of the portfolio that
AI should take up, will depend on the strength
of the arguments for focusing on AI.
And most of those arguments still just aren't
very fleshed out yet.
I also have some broader sort of epistemological
concerns which connect to the concerns I've
expressed.
I think it's also possible that there are
certain social factors relating to the EA
community that might bias us toward taking
an especially large interest in AI.
So, one thing is just that AI is especially
interesting or fun to talk about, especially
compared to other cause areas.
It's sort of an interesting kind of contrarian
answer to the question of what is most important
to work on.
It's sort of surprising in certain ways.
And it's also now the case that interest in
AI is to some extent an element of community
culture.
People sort of have an interest in it that
goes beyond just sort of the belief that it's
an important area to work in.
It definitely has a certain role in the conversations
that people have casually, and in what people
just like to talk about.
I think these wouldn't necessarily be that
concerning, except that I also think we can't
really count on external feedback to push
us back if we drift a bit.
So first, it just seems to be empirically
the case that skeptical AI researchers generally
will not take the time to sit down, engage
with all of the writing, and then explain
carefully why they disagree with the concerns.
So we can't really expect that much external
feedback of that form.
People who are skeptical or confused but aren't
AI researchers, or generally aren't experts,
may be concerned about sounding ignorant or
dumb if they push back, and they also won't
be inclined to become experts.
We should also expect generally very weak
feedback loops.
If you're trying to influence the very long-run
future, it's hard to tell how well you're
doing, just because the long-run future hasn't
happened yet and won't happen for a while.
Just generally, I think one thing to watch
out for is justification drift.
If we sort of start to notice that the community's
interest in AI stays constant, but the reasons
given for focusing on it change over time,
then this would be sort of a potential check
engine light, or at least a sort of trigger
to be especially self-conscious or self-critical,
because that may be some indication of motivated
reasoning going on.
So I suppose I have just a handful of short
takeaways.
So first, I think that not enough work has
gone into analyzing the case for prioritizing
AI.
Existing published arguments are not decisive.
There may be many other possible arguments
out there, which could be much more convincing
or much more decisive, but those just sort
of aren't out there yet, and there hasn't
been much written criticizing the stuff that's
out there.
For this reason, thinking about the case for
prioritizing AI may be an especially high
impact thing to do, because it may shape the
EA portfolio for years into the future.
And just generally, we need to be quite conscious
of possible community biases.
It's possible that certain social factors
will lead us to drift in what we prioritize,
that we really should not be allowing to influence
us.
And just in general, if we're going to be
putting substantial resources into anything
as a community, we need to be especially certain
that we understand why we're doing this, and
that we stay conscious that our reasons for
getting interested in the first place continue
to be good reasons.
Thank you.
Do you want to take a few questions? There is time.
Great, have a seat over there.
Okay, so we do have a few minutes for some
Q&A.
As always, the Bizzabo app is the place to
submit your questions, or on the website you
can all say it with me, london.eaglobal.org/polls,
london.eaglobal.org/polls.
So, the first question that's already in is:
what advice would you give to someone who
wants to do the kind of research that you're
doing here, on the case for AI, as opposed
to working on AI itself?
Yeah, I think, something that I believe would
be extremely valuable is just basically talking
to lots of people who are concerned about
AI and sort of asking them precisely what
reasons they find compelling.
I've started to do this a little bit recently,
and it's actually been quite interesting:
people seem to have pretty diverse reasons,
and many of them are things that people want
to write blog posts about but just haven't.
So I think this is a sort of low-hanging fruit
that would be quite valuable: just talking
to people who are concerned about AI, trying
to understand exactly why they're concerned,
and either writing up their ideas or helping
them to do that. I think that would be very
valuable, and probably not that time intensive
either.
Have you seen any of the justification drift
that you alluded to? Can you pinpoint that
happening in the community?
Yeah.
So I think that's certainly happening to some
extent.
I think even for myself, I believe that's
happened to me. When I initially became interested
in AI, I was especially concerned about these
existential accidents.
I think I now place relatively greater weight
on the case from instability, as I described
it. That's certainly one possible example
of justification drift. It may be that this
was actually a sensible way to shift emphasis,
but it would be something of a warning sign.
And I've also spoken to technical researchers
who used to be especially concerned about
this idea of an intelligence explosion or
recursive self-improvement, these very large
jumps. I've now spoken to a number of people
who are still quite concerned about existential
accidents, but who make arguments that don't
hinge on there being one single massive jump
into a single system.
Okay.
So questions are starting to roll in.
I'm just kind of trying to peruse them all
here.
I guess one question that's kind of synthesizing
a couple is: you made the analogy to the Industrial
Revolution, and the 1780 Englishman who doesn't
really have much ability to shape how the
steam engine is going to be used. It seems
intuitively quite right. The obvious counterpoint
would be, well, this is a problem-solving
machine; there's something kind of different
about it. Does that not feel compelling to
you, the sort of inherent differentness of
this?
So I think probably the strongest intuition
is that there will eventually be a point where
we start turning more and more responsibility
over to automated systems or machines, and
that there might eventually come a point where
humans have almost no control over what's
happening whatsoever, where machines are in
some sense in control and you can't back out.
And you might have some sort of irreversible
juncture there.
I definitely share that intuition to some
extent; if you're looking over a very long
time span, that is probably fairly plausible.
The intuition I don't necessarily have is
that this really irreversible juncture is
coming anytime soon, unless things go quite
wrong or happen in somewhat surprising ways.
If, let's say, it takes a thousand years for
control to be handed off, then I am not that
optimistic about people having that much control
over what that handoff looks like by working
on things today. But I certainly am not very
confident.
Okay, let's see.
Going directly to the questions.
Are there any policies that you think a government
should implement at this stage of the game,
in light of the concerns around AI safety?
And how would you allocate resources between
existing issues and possible future risks?
Yeah, I am still quite hesitant, I think,
to recommend very substantive policies that
I think governments should be implementing
today.
I'm currently quite agnostic about what would
be useful, and I think that most of the current
issues that governments are making decisions
on aren't necessarily that critical.
I think there's lots of stuff that can be
done that would just be very valuable, like
having stronger expertise or stronger lines
of dialogue between the public and private
sector, and things like this.
But I would be hesitant at this point to recommend
a very concrete policy that at least I'm confident
would be good to implement right now.
You mentioned the concept of a concrete, decisive
argument. I think one of the questioners is
intending to push on that concept a little
by asking: do you see concrete, decisive arguments
for other cause areas that are somehow more
concrete and decisive than the ones for AI,
and what is the difference?
Yeah.
So I guess I tried to allude to this a little
bit, but I don't think that really any cause
area has an especially decisive argument for
being a great way to influence the future.
There are some where I think you can put a
somewhat clear lower bound on how likely the
work is to be useful.
So, for example, risk from nuclear war. It's
fairly clear that it's at least plausible
this could happen over the next century. Nuclear
war has almost happened in the past, and the
climate effects are speculative but at least
somewhat well understood.
And then there's this question of if there
were nuclear war, how damaging is this?
Do people eventually come back from this?
And that's quite uncertain, but I think it'd
be difficult to put above 99% chance that,
say, people would come back from a nuclear
war.
So in that case you might have some sort of
clean lower bound on, let's say, working on
nuclear risk.
Or, quite similarly, working on pandemics.
And I think for AI it's difficult to have
that sort of confident lower bound.
I actually tend to think, as I alluded to,
that AI is probably or possibly still the
most promising area, based on my current credences
and just its extreme neglectedness.
But yeah, I don't think any cause area stands
out as especially decisive as a great place
to work.
A question from someone who I think shares
some of your intuitions, and who describes
themselves as an AI and machine learning PhD
student who is skeptical about the risk of
AGI: how would you suggest that someone like
that, in addition to the obvious in-person
conversations and internet blogging, contribute
to the process of providing this feedback
that you're identifying as a need?
Yeah, I think just a combination of in-person
conversations and even simple blog posts can
be quite helpful. There's still been surprisingly
little in the way of, let's say, something
written online that I could point someone
to who wants the skeptical case. This is actually
a big part of the reason I gave this talk,
even though I consider myself not extremely
well placed to give it, given that I am not
a technical person. There's just so little
out there along these lines. Yeah, it's low-hanging
fruit, essentially.
Okay.
Probably the last question we have time for.
Prominent deep learning experts such as Yann
LeCun and Andrew Ng... I'm not sure if I'm
saying that right... do not seem to be worried
about risks from superintelligence. Do you
think that they have essentially the same
view that you have, or are they coming at
it from a different angle?
I'm not sure of their specific concerns. I
know the classic thing Andrew Ng always says
is to compare it to worrying about overpopulation
on Mars, where the suggestion is that these
risks, if they materialize, are just so far
away that it's really premature to worry about
them. So it seems to be an argument from timeline
considerations. I'm actually not quite sure
what his view would be if we were, let's say,
50 years in the future; would he think this
is a really great area to work on? I'm really
not sure.
I actually tend to think the line of thinking
that says, "Oh, this is so far away, so we
shouldn't work on it" just really isn't that
compelling. It seems like we have a load of
uncertainty about AI timelines, and no one
can be very confident about them. It'd be
hard to put the probability that interesting
things happen in the next 30 years or so below,
let's say, one percent. So I'm not quite sure
about the extent of his concerns, but if they're
based on timelines, I actually don't find
them that compelling.
Cool.
Well thank you for presenting the case for
uncertainty, but also still the importance
of the domain of AI.
How about a round of applause for Ben Garfinkel?
