It’s been a long time since I’ve filmed
outside… or in my car.
Though, according to some people I won’t
have to be sitting here for long.
Which reminds me, I recently picked up a
large British audience because of a few videos
I made, so let me uh…
This should make them feel a little
more at home.
And is oddly appropriate, since this is what
many people think the driver’s seat of a
car might look like soon.
No steering wheel, maybe just a HUD… it’s
pronounced HUD, Heads Up Display, not “hood.”
And I won’t have to do anything more than
just sit here and sleep or… sing.
Really, we don’t have an embarrassing shot
of me singing?
But there are many problems and challenges
that self-driving cars must overcome first.
When artificial intelligence first started,
the designers thought to themselves: “Hmm,
what’s the most difficult thing we can teach
a computer to do?”
What answer did they come up with?
Chess… and checkers and go, board games.
But not just any board games, these are games
for super nerds, so if they can beat super
nerds, they must be really smart.
Turns out, while these games may be hard for
you, they’re really easy for a computer.
Like laughably easy.
Deep Blue beat the best human chess player,
Garry Kasparov, in 1997.
That was well before most of you had internet,
and if you were lucky enough to have internet,
it was still dialup and you hoarded those AOL CDs
with their thousands of free hours.
The game checkers was declared officially
dead in 2007, because a computer had solved
it, proving that perfect play from both sides
always ends in a draw, out of the game’s
roughly 5 times 10 to the 20th possible positions.
Chess, with vastly more positions than that,
still hasn’t been solved, but by 2015 computer
engines had left even the best humans hopelessly behind.
And in 2016, a computer beat Lee Sedol, one of the very best Go players in the world…
Board games are easy.
You input the rules, maybe let it watch a
few thousand games, and that’s it.
It doesn’t require any motor functions or
sensory input.
So what people didn’t see coming was how
difficult it is to get a computer to see.
Seeing requires much more than just photon
receptors.
That’s why you have an entire lobe, the occipital
lobe, roughly 20% of your brain, dedicated to it.
Colors are relatively simple and not all that
difficult for a computer to figure out…
so they won’t be the focus of this video
– but I’m… building to it.
Shapes are also relatively easy but edge detection
is not.
So when a computer sees an image like this,
it sees a bunch of colors and shapes.
But it has no idea what any of it means and
can’t tell the objects from the background.
Is this an object?
Is this an object?
Is this all one object with multiple colors?
But you can see it, easily.
And without having to think about it.
Because, one of the primary purposes of Area
V1, also known as BA17, is edge detection.
It’s one of the first things that the brain
does with visual information.
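To give a feel for what edge detection involves, here’s a minimal sketch of the classic Sobel filter in Python. This is just an illustration of the general technique — it marks pixels where brightness changes sharply — not a claim about how V1 or any particular car actually implements it, and the tiny “image” is made up.

```python
import numpy as np

def sobel_edges(img):
    """Mark where brightness changes sharply, i.e. likely edges."""
    # Sobel kernels: respond to horizontal and vertical brightness changes
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            gx = (patch * kx).sum()   # horizontal gradient
            gy = (patch * ky).sum()   # vertical gradient
            out[y, x] = np.hypot(gx, gy)  # gradient magnitude
    return out

# A tiny "image": a dark region meeting a bright region
img = np.array([[0, 0, 0, 10, 10, 10]] * 5, dtype=float)
edges = sobel_edges(img)
# Strong responses appear only along the boundary between the two regions
```

Notice the filter says nothing about *what* the edge belongs to — it only finds the boundary, which is exactly why edge detection alone doesn’t add up to seeing.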
This problem has been mostly figured out,
but I’ll get to how in a bit.
A problem that is far from solved, though,
is how to teach a computer to see depth.
From a flat image alone, it has no idea what is closer.
This… or this.
Again, you can, with little to no effort.
When you see optical illusions like this,
you know something isn’t right, but a computer
has no idea.
First, let’s go through the monocular cues,
that is one eye, for depth perception.
When you look at this picture, you see the
horizon.
Your brain uses this as a major clue.
The closer something is to the horizon, the
further away it usually is.
And the higher or lower it sits in your view,
the closer to you it is.
We also look for parallel lines which converge
the further away they are.
Also the further away you look, the more atmospheric
haze there is, which makes things have less
contrast and appear more blue than they should.
But a more obvious monocular cue is relative
size, which is how big or small something
is compared to something else that should
be similar… like in this illusion.
They’re all the same size by the way, but
because of their positions they appear hilariously
misproportioned.
Or, my hands, one of them is bigger than the
other – not because I’m a freak, but because
one is closer to the camera.
Another obvious cue is occlusion: since my
right hand is obscuring the view of my left
hand, it must be closer.
Which was also the case with the cars.
But that’s just how you do it with one eye.
You don’t actually see this.
Because you have two eyes, giving you binocular
vision… so you see this.
The convergence of your two eyes gives you
depth perception, thanks to a fun little bit
of math that your eyes and brain calculate
without your conscious awareness.
The angle of your eyes is calculated when
you’re looking here… and when you’re
looking here… and your brain essentially
does triangulation to figure out that this
point is further away than this point.
Granted, you don’t actually perceive this
blurry mess unless you take off your glasses
during a 3D movie, but in real life your brain
cleans up the image a lot.
While you may think that you see in 3D, you
actually only see in 2D and your brain creates
a 3D space using that 2D information.
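That triangulation can be written down directly. In the standard stereo-camera model — a stand-in here for your two eyes — depth falls out of the “disparity,” how far an object’s image shifts between the two views. The focal length and spacing numbers below are invented for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classic stereo triangulation: Z = f * B / d.
    focal_px: focal length in pixels; baseline_m: distance between
    the two cameras (or eyes); disparity_px: how far the object's
    image shifts between the left and right views."""
    if disparity_px <= 0:
        raise ValueError("zero disparity means the object is at infinity")
    return focal_px * baseline_m / disparity_px

# Eyes roughly 6.5 cm apart; a nearby object shifts a lot between
# the two views, a distant one barely shifts at all:
near = depth_from_disparity(focal_px=800, baseline_m=0.065, disparity_px=52.0)
far = depth_from_disparity(focal_px=800, baseline_m=0.065, disparity_px=5.2)
# near -> ~1.0 m away, far -> ~10.0 m away
```

The catch for a car is the same as for your eyes: the math only works if you can first match up which point in the left image corresponds to which point in the right image — which brings us right back to the seeing problem.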
And we’re not even going to get into the
motion cues because, we’ve already complicated
things enough… and…
I mean c’mon.
Maybe that’ll be another video.
All of these are things your brain does without
even thinking about it.
Many of the monocular cues can be programmed
into a computer, but not all of them.
And just like how your brain can be fooled
by various illusions, so can a computer, which
is why a computer cannot rely entirely on
vision to determine depth or distance.
It needs some sort of outside help.
Many of you might have jumped to the idea
of GPS or satellite imagery.
Unfortunately, GPS doesn’t provide you with
any images, and the satellites don’t tell
you where you are.
Each satellite tells you where IT is and when
it sent its signal, and your receiver calculates
its own position from those signal delays,
which is usually only accurate to within a
few meters.
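That position fix is essentially trilateration: each satellite reports its own location, the receiver turns signal travel times into distances, and then solves for the one point consistent with all of them. Here’s a flat, two-dimensional toy version with made-up coordinates:

```python
import math

def trilaterate(sats, ranges):
    """2-D sketch of a GPS fix: given three known satellite positions
    and the measured distance to each, solve for the receiver's position.
    Subtracting the circle equations pairwise gives two linear
    equations in (x, y), solved here with Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = sats
    r1, r2, r3 = ranges
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

sats = [(0, 0), (10, 0), (0, 10)]
receiver = (3, 4)  # the position we hope to recover
ranges = [math.dist(s, receiver) for s in sats]
print(trilaterate(sats, ranges))  # ≈ (3.0, 4.0)
```

Real GPS does this in three dimensions with at least four satellites — the fourth resolves the receiver’s clock error — and noise in those timings is exactly why the fix is only good to a few meters.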
Which doesn’t work when you’re talking
about piloting a two-ton hunk of metal at
60 miles an hour.
Satellite imagery also doesn’t work because
it isn’t live, which is pretty much what
you need when navigating a busy street.
And even if you somehow manage to fix that
problem you’ll never overcome weather, buildings,
and trees and stuff.
So a driverless car needs to use localized
vision, along with something else to perceive
distance local to the car.
Which we’ve actually had much longer than
we’ve had satellites.
Echo-ranging, such as sonar or radar.
Basically, it sends out a signal, and based
on how long the signal takes to reflect off
an object and return, it can figure out how
far away that object is.
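The arithmetic behind that is one line: the pulse covers the distance twice, out and back, so the range is speed times time, halved. A sketch:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, for radar; sound in air is ~343 m/s

def echo_range(round_trip_seconds, wave_speed=SPEED_OF_LIGHT):
    """Distance to whatever reflected the pulse: the signal travels
    out AND back, so halve the round trip."""
    return wave_speed * round_trip_seconds / 2

# A radar echo that returns 200 nanoseconds after the pulse went out:
distance = echo_range(200e-9)  # ~30 meters
```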
The problem is that this information is limited
and not quite how you see it in the movies.
For an image like this, radar will return
an image like this – just like with visual
information, it’s only two dimensional,
and will only tell you how far something is
from the transceiver, at the level of the transceiver.
So if it’s mounted on the roof of your car,
that’s not very useful for judging how far
away another car’s bumper is.
So have one on the bumper, obviously.
That’s on top of the fact that it only tells
you how far away something is in that moment,
not what direction it’s travelling or how
fast.
Which is made even more complicated when you
add in the fact that you are likewise moving.
The technology to do that does exist and has been in use for sea and air travel for decades.
But those are long range with far fewer vehicles
and little environmental obstruction.
So okay, all of these are challenges that
have been or can be solved.
So let’s look at some issues with vision
that have not yet been solved.
What is this?
Right, it’s a bicycle.
What is this?
It’s still a bicycle, c’mon.
Okay, what is this?
It’s still a bicycle, this isn’t rocket
science, it’s pretty easy.
Yeah, for you.
For a computer, it’s infinitely difficult.
Object recognition is by far the hardest thing
to get a computer to do.
Especially when you consider that 3D objects
look completely different in a 2D image from
different angles, even in perfect conditions.
You may remember this video from a few years
ago of a robot navigating around the world
and interacting with objects.
You probably mostly remember it because of
the jerk with the hockey stick.
It’s all pretty impressive… as long as
you aren’t aware of the difficulties of
getting a computer to see.
You know what a cognitive psychologist sees
when they watch this video?
A robot interacting with a bunch of QR codes.
Those QR codes tell the robot what the object
is, its distance, and its orientation.
Whether it’s a box or a door, if they want
the robot to interact with it, there are QR
codes stamped all over it.
Sorry for ruining the magic.
What are you talking about?
Computers are way good at recognizing things,
like how facebook recognizes me in all those
pictures or all those snapchat filters.
Yeah, I mean they’re good at faces, I’ll
give you that.
But faces are pretty easy, there’s a pretty
standard pattern for those… your brain is
wired to automatically think this is a face
– it’s not… it’s a chair
And it’s not really like recognizing faces
is all that useful for driverless cars.
Computers are getting better at recognizing
objects though, and you’re helping.
Whether you’re playing Google Quickdraw,
or doing one of those annoying new Captcha
images where you select all of the squares with a stop sign, you're teaching computers how to see.
You are helping our future machine overlords
recognize objects.
However, even with all of this learning, it
still has to match what it sees with a previously
learned template.
It’s not, like you, able to figure out what
new objects are and what they mean on the
fly.
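That limitation is easy to see in miniature. The nearest-template classifier sketched below — with toy three-number “feature vectors” I’ve invented purely for illustration — can only ever answer with one of the labels it was given, so a genuinely novel object gets shoehorned into the closest known category:

```python
import numpy as np

def classify(image, templates):
    """Match an input against previously learned templates and return
    the closest label. Nothing here can handle an object it has never
    seen: the best it can do is the least-bad known match."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = np.sum((image - template) ** 2)  # squared feature distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

templates = {
    "stop sign": np.array([1.0, 0.0, 0.0]),   # toy learned features
    "yield sign": np.array([1.0, 1.0, 0.0]),
}
# Something it was never trained on — say, a half-obscured sign —
# still gets forced into one of the known categories:
obscured = np.array([0.5, 0.1, 0.1])
print(classify(obscured, templates))  # stop sign
```

Modern systems use learned neural-network features rather than hand-picked numbers, but the failure mode is the same: the answer is always drawn from the categories it was trained on.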
For example, say you come across this sign.
It might only take you a second or two to
skim it over and realize that it doesn’t
apply to you and continue driving.
But a computer?
The first thing it will recognize is the shape
– it sure looks like a stop sign.
It’s red and white, just like a stop sign…
it has a little more to it than a stop sign
but, just to be sure, I better stop... in
the middle of this street.
That could cause an issue.
Or perhaps this situation.
Clearly, that’s a stop sign, but it’s
covered with a trash bag.
Whatever map data or traffic feed it’s using
says there should be a stop sign here and
there it is.
But it… it’s partially covered with a
trash bag.
Again, you can look at this situation and
quickly figure it out.
You’re supposed to follow the temporary
green light and ignore the stop sign.
But a computer, especially one that’s never
encountered this before, won’t be able to
react as easily.
Say the temporary light wasn’t there.
Is a trash bag and some duct tape all it takes
to fool a self-driving car into ignoring the
sign and just flying through?
If so, you’re going to see a lot more teenage
pranks and youtube videos like this show up.
And what about that detour sign?
Your GPS says you should go straight.
You might know that you can safely go straight,
but a computer sees the sign saying it must
go right.
Maybe there’s an obstruction ahead?
These are all things that you can quickly
figure out.
But a computer has to obey a set of rules
and when presented with something outside
of its ruleset, it may not know how to react.
Which is why I’m not very impressed when
people bring up Google’s self driving car
that drove up and down the highway on its
own a few years ago.
Driving on a highway is easy, all you have
to do is stay between the lines.
There are very few dynamic situations, very
few unique situations, and relatively few
challenges.
It’s so easy a monkey could do it… actually,
it’s so easy a dog could do it.
This isn’t a prank, this isn’t a joke,
this is an actual dog driving a car.
There’s an entire channel dedicated to showing
various dogs learning how to drive cars, it’s
not that hard.
Obviously they’re on a track by themselves,
heavily supervised, but they are staying in
the lines.
It’s not that hard, and it’s not that
impressive.
Highway driving is so easy that you regularly
zone out completely and stop paying attention,
and most of the time everything turns out okay.
Not like in the city where you’re on constant
guard and see dynamic, unique situations all
the time.
Which brings us to the last major hurdle that
driverless cars must face.
Once you’re able to get it to see properly
and understand what it sees, you then have
to tell it what to do with that information.
Let’s say you’re driving along and you
come across this situation.
Again, you, as a human, can quickly figure
out the context of this situation, and probably
wouldn’t stop.
A computer, on the other hand, wouldn’t
know that this person was just about to turn
right and get into the driver’s side door.
According to this person’s current trajectory,
if the car doesn’t stop now, it’s gonna
hit them.
Does it assume that this person is fully aware
and acting safely or does it stop, possibly
causing an accident with the car behind them?
That’s a simple situation, a very simple
situation.
Let’s say that the car is driving along
on a two lane road and realizes that its
brakes are out.
I don’t know, maybe a line got severed or
a wire shorted out, it doesn’t matter – it’s
rare, but it’s not unheard of.
Coming towards the car are two motorcyclists
who are dangerously riding side by side in
both lanes.
Your car must now choose who to hit.
You, as a human, can freeze up, yell “Jesus
take the wheel!” and let physics decide
who lives and who dies.
A computer on the other hand, can’t.
Not making a decision is a decision to do
nothing, which means that the car will hit
one or both of them… which means that the
car decided to hit one or both of them.
There is no scenario where the computer can
claim to have been so flustered that it couldn’t
make a decision.
It could decide to follow the law and strike
the person who was travelling in the incorrect
lane, that’s one way to do it.
Or, we can make the situation even more interesting
by pointing out that one rider is wearing
a helmet and all their protective clothing,
while the other is simply wearing a t-shirt
and shorts.
Your car may decide that this person is more
likely to survive a collision – although
only slightly more likely – and therefore
steer the car into that person.
Which would paradoxically make it less safe
to be wearing a helmet.
Or, we could point out this tree to the side.
It could avoid hitting both riders and instead
elect to crash itself into the tree… just
injuring you.
You’ll likely survive while the riders probably
wouldn’t.
But who is the car supposed to protect?
You?
The owner and operator?
Or some bonehead Harley rider who wasn’t
obeying the law?
Some might say that as the owner, the car’s
main directive should be to protect you and
the passengers.
While others might say it should protect as
many lives as possible.
But given the choice, if there were cars on
the market that safeguarded all life and cars
that just protected you and your passengers…
you might be more inclined to buy the one
that places you and your family above others.
Let’s pose another situation.
Say you’re at an intersection, and your
car wants to make a right turn… but there’s
a line of school children currently crossing
the street, all holding hands, single file.
So you’re patiently waiting.
But another car coming down the road has
hit a patch of ice, or its brakes and steering
have gone out, whatever, it doesn’t matter;
the point is that the car isn’t stopping and
no longer has control.
It’s also a self driving car, and using
magic, is alerting all other cars in the area
about its situation.
If your car is designed to only protect you,
it’ll probably sit tight… and force you
to watch something so horrifying you’ll
never see the end of the therapy bills.
If your car is designed to protect as many
lives as possible, it might pull forward into
the intersection… stopping the car from
plowing through all those kids… but you’ll
be t-boned and your possibility of walking
away from this accident is pretty low.
These are the situations that driverless cars
will be forced to make decisions about,
and they are incredibly tough decisions.
Not to mention the fact that I’ve only given
you a small handful of the literally infinite
number of possible situations.
I certainly don’t want to be the one writing
the ethical and moral codes for self driving
cars… but someone has to… especially if
we ever want intersections to look like this.
Where there are no traffic lights, all the
cars are driverless and are simply communicating
with each other with hyper efficiency.
And it’s absolutely impossible.
First of all, it requires that every single
car on the road be self-driving.
If there’s even one manually driven car,
game over.
Which then also means that your car must be
self-driving all the time.
If you have the ability to switch it on and
off, an intersection like that will never
work.
Which means that old man river out on his
dirt road would have to be using a self driving
car.
We can get around this situation by saying
that auto pilot must be enabled only on
certain roads, fine.
But let’s say you’re on one of these auto
pilot only roads, and you’re late for work…
when this happens.
Your HUD tells you that an emergency is occurring
on the road so all travel is currently halted.
Nevermind how furious you’ll be over the
fact that the government can just seize control
of your car… you’re late, so you flip
the manual override and decide to proceed
anyway… and congratulations, you just caused
a collision.
Don’t act like this situation is impossible,
how many people do you know who have driven
through a closed highway because “the weather
was bad.”
If people can break the law in order to save
themselves time, they will.
But let’s go back to this intersection and
assume that all cars will always
be self driving, with no possible manual override.
This intersection is still a disaster waiting
to happen.
Let’s completely set aside the idea that
anyone would ever go to this intersection
with malicious intent, even though those people
have always and will always exist.
And we’ll assume that all of these cars
are completely unhackable, again… we’re
assuming perfect conditions.
Imagine a tree branch falls in this intersection.
Or a tire blows out.
Or a truck’s unsecure cargo falls out.
You’re looking at a several car pile up…
even with AI that can respond instantly.
Also, having traffic flow like this renders
the intersection completely useless to pedestrians
and bicyclists.
There’s an easy solution of course, a four-way
foot bridge.
Which likewise dramatically increases the
likelihood of something or someone falling
into the intersection, accidentally or not.
But again, in order to achieve that perfect
flow of traffic, everyone needs a driverless
car.
Cars aren’t like phones, where people get
a new one every year or two.
Cars last a long time – like 15 to 20 years.
Most people don’t get a new one until their
current one is broken beyond repair.
So even if by some miracle, all of the technological
and ethical hurdles are overcome in ten years
– which is extremely generous, they totally
won’t be – and they stop selling manually
driven cars that same day… without government
intervention, it would still take another
15 to 20 years to phase out all of the manually
driven cars, excluding the antiques of course,
because you’re never going to take those
away from people.
On the topic of government intervention, we
also have many legal issues to work out with
driverless cars.
Just to throw a few out there.
Who is at fault when a car decides to hit
someone?
When you’re the only person riding in a
self-driving car, are you allowed to be on
your phone?
Sleeping?
Drunk?
If you’re required to be awake and attentive
the entire time, doesn’t that kind of ruin
the point of it being self-driving?
Self driving cars will happen, don’t get
me wrong.
They are coming.
But if you think that they’ll take over
the roads in the next ten, twenty, or even
thirty years, hopefully now, you know better.
