Hi, how's everyone doing today? Cold? A
little bit. Just, just a quick question:
Who here is like doing computer science or taking some kind of software
development classes? Okay! Basically everybody! All right, awesome. And who here
has seen the Karpathy talk or read the original Medium article where he describes Software 2.0?
Has anyone seen that or read that? No?
Oh, awesome. So just to give him credit for the name: Software 2.0 as a term
was coined by Andrej Karpathy. He's the director of AI at Tesla, and he
identified this trend that many of us in the industry have started to see, and this
trend is coming about because of the
advancements in the capabilities of
machine learning, and kind of the recent
successes that we've seen with deep
neural networks and some of the AI
breakthroughs. So my goal here is not to
talk too much about specifics of machine
learning but it is to kind of convince
you that there's this revolution that's
happening in the software development
process itself because of it. However, before we talk about that, let me just
mention that I'll be teaching as an adjunct here next quarter, in the spring,
doing big data. So if that sounds interesting, and you want to learn what big data is,
why it matters, and how engineers in the industry approach the problems
that come about with data at scale, I highly encourage you to enroll. That'd
be super fun! Just a bit about myself: this is a
picture of my wife and I in Hawaii. We
like to travel, particularly to places that have more sunshine than maybe Seattle does.
I'm a software engineer at Microsoft; I've been there for a couple of years now.
I work on a big data analytics team that's responsible for Windows and Azure reliability.
So we basically make sure that the operating system and all the applications
we write at Microsoft are working well, and that our users aren't being
interrupted by any issues. In
addition to that, I'm also working on a
Software 2.0 project, and it's kind of an interesting story how that got started.
Basically, one day I was in Dallas, Texas, where I'm originally from, and my dad and I
were having this talk about this new kind of neural network called a
generative adversarial network. They're really interesting, and I'll mention them
a little bit later. But I was saying: wouldn't it be cool... for any of you
who have been to Moab, Utah, there's this really awesome national park called
Arches National Park, and it's just beautiful.
There's canyons, there's like these
arches, there's all kinds of crazy
interesting geographical features, and I
thought it'd be really cool if we could
somehow emulate that in software. So
basically build a procedural generator
that could create these arches and these canyons and stuff, because I wasn't,
and still am not, familiar with any procedural algorithms that can generate
that kind of interesting terrain. So anyway: last summer Microsoft put on its
big hackathon, called "One Week". Actually, every year we break the record for
the world's largest hackathon by participants; last year we had about 30,000 people
working on hackathon projects. Anyway, I decided that I would build this Moab, Utah
terrain-generating thing, and while the outcome wasn't necessarily as successful
as that might have led you to believe, it did lead to me joining this project,
which is now owned by Microsoft Research in conjunction with the Minecraft game studios.
Our plan is to try to use deep learning to create terrain maps for video games.
So for me, I'm tremendously excited about this new potential. So what exactly does Software
2.0 look like? Well,
sometimes you might hear in the media that it sounds like this, but just to
quote Andrew Ng, who's kind of an expert in the field,
"Worrying about evil killer AI robots is just a little bit like worrying about
overpopulation on the planet Mars". That's not to say that it isn't a problem or
won't become a problem, but it's so far off in the future that maybe we need to
tone down some of the investment and research that's going into trying to combat it,
because it's not really that much of a problem just yet. So, I think what Software 2.0
is going to look like is something more like this.
This is Amazon Go, for those of you who may not have heard of it. It's
essentially a new project by Amazon where you walk into a grocery store, you
just grab all the items right off the shelf, and then walk out.
There are no cashiers, there are no checkout lines, and the idea is that it's
going to save you time and money by eliminating the logistics of the checkout line.
And this is not science fiction; any of you can go down to the Amazon Doppler
building in the South Lake Union district of Seattle. I think it's even open
right now, maybe not because of the weather, but you just need an Amazon account
and you can go there today. So I want to talk just
briefly about some of the recent
achievements in machine learning and AI.
In the top left here, this is Google DeepMind's AlphaGo, and this is the
version of the AlphaGo program which defeated the 18-time world champion Korean
Go player Lee Sedol. I think it's 18 times. Essentially, this is the first time
that a computer program ever beat a human ranked at the highest level in Go,
which is nine dan. So it was a pretty big breakthrough achievement. In a similar
fashion, this is OpenAI Five playing against five professional Dota 2 players,
and I believe they won like four out of five matches. That was a huge achievement
because Dota is a very complex, messy, freeform kind of game, whereas in Go you
really just have discrete states, so the problem there isn't easier, but it
certainly has a smaller action space. And
then on the bottom left here you see all
of these faces are fake. None of these
people exist, none of these people will
ever exist.
This is using a neural network called a generative adversarial network, and it's
able to synthesize highly photorealistic imagery of really any kind of
distribution of images you can think of. This is the type of technology that my
team is using to try to synthesize terrain maps for video games. And then lastly
we have, I'm sure most of you are familiar with it, the self-driving Autopilot
program at Tesla. They're using mostly just cameras, so it's mostly just imagery;
they're trying to minimize the number of special sensors they use, and they're
just trying to use computer vision to map out their entire environment and have
a car that can drive itself. And these examples really only represent the tip of
the iceberg for what we're going to see probably in the next decade. So as I
mentioned before, Karpathy coined the term Software 2.0. The name itself is not
that important, and who knows if this is what we're going to be calling this
trend moving forward, but the really important thing to understand here is that
it's changing the way that we build software. But to really understand that, I
think we need to first look at how we build software today. The way that we
build software today is we take a really big, complex problem and we decompose
it into subproblems. Some of these subproblems can then be further subdivided,
until the hope is that we have these easier-to-manage base problems that we can
solve, and then we somehow sum all of these base solutions back up to the top,
to where we sort of have a solution to our original problem.
So this practice of explicitly telling the computer how to solve a problem is
the class of what we'll call Software 1.0, and Software 1.0 is the technology
stack that we've all become used to: our programming languages like Java, our
IDEs, our compilers. It's all the tools that have been created to make us very,
very productive at building software. The key is that we tell the computer
exactly what to do; we tell it exactly how to solve the problem. And there are
some problems that Software 1.0 is great at solving, things that are very
well-defined, like protocols, my example up here being the TCP/IP stack, GPU
shading, compilers. All of these problem domains are very well understood, or at
least they have very clear requirements, so these are things that Software 1.0
really excels at. Software 2.0 doesn't suddenly come in here and say all this
stuff is obsolete; it just fills in some of the missing pieces, the problems
we've wanted to solve but have had to set aside.
So let's look at a task that is really hard for Software 1.0 to solve. Consider
this problem of image recognition of a cat. How do we go about solving this
problem of identifying whether an image contains a cat or not? Well, the first
thing we can try is to simplify the problem. In this image there are a lot of
complex colors; it's a white cat with kind of yellow eyes. In general, if we want
to identify a cat, maybe the color isn't so important, so we could first apply a
pre-processing filter to it, like Sobel edge detection, which could make the
problem a little bit easier to solve.
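Just to make that concrete, a Sobel pass is only a couple of lines; this is a
minimal sketch assuming numpy and scipy, where img is a 2-D grayscale array of
the photo (it's purely illustrative, not anything from the slides):

    import numpy as np
    from scipy import ndimage

    def edge_map(img):
        # img: a 2-D grayscale numpy array (the cat photo), values 0-255
        gx = ndimage.sobel(img.astype(float), axis=0)  # gradients along one axis
        gy = ndimage.sobel(img.astype(float), axis=1)  # gradients along the other
        return np.hypot(gx, gy)  # edge magnitude; color and texture detail drop away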
And as I mentioned before, the way that we approach problem solving in Software
1.0 is that we decompose a bigger problem into subproblems. So you can think of
the problem of deciding whether an image is a cat or not as first identifying if
you have an ear, and if you have a face, and the face can be subdivided into the
problems: does it have eyeballs, does it have a nose, does it have whiskers? And
then we can just put logic around it that says, "If all these things exist, then
we have a cat". But we need to remember: what does a computer see when it has
this image? All the computer really knows is a big matrix of pixel values. So
how do we really write the logic that identifies if there's an ear, or a nose,
or a whisker? I mean, do we look at neighboring pixel values and say that if
they're within this range of red values and within this range of green values,
then we suddenly have an ear? It certainly seems that hard-coding all these
rules is going to be very hard to scale. But imagine we did it, and imagine
we're able to identify that this particular image is a cat. Does anyone think
that it'll work on this image? Probably not! We didn't mention anything about a
tail, we didn't say anything about paws, the face is now at a completely
different angle. It's highly likely that this is not going to work. So,
does that mean that the problem of image recognition is unsolvable? Well, the
answer is no. This problem is actually easily solved with what is known as a
convolutional neural network, and you can intuitively think of a convolutional
neural network as this kind of layering thing, where the early layers are able
to identify localized patterns like curvature, so the layers here would just
say, oh, there's kind of a curve here, or maybe there's a sharp edge here. And
the next layer is going to take into account all of these different localized
features and build up a more complex pattern. You continue this process until
you're able to identify fairly complex objects at the end. And this has worked
great for these kinds of image recognition problems.
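To give a concrete picture of that layering, here's a generic sketch of a tiny
convolutional network, assuming TensorFlow/Keras is installed; it isn't any
particular network from the slides, just an illustration:

    import tensorflow as tf

    model = tf.keras.Sequential([
        # early layers pick up localized patterns like edges and curves
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
        tf.keras.layers.MaxPooling2D(),
        # later layers combine those local features into more complex patterns
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # cat / not a cat
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])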
In computer vision in particular, we see this growing trend where every time the
available datasets get bigger and we have more compute power, we find a
fundamentally different way of approaching computer vision. In the early days,
when we might have had a single image, or maybe a very small number of images,
doing what we described with the cat is perfectly fine. You just hard-code all
the rules; I mean, it's only a single image, it's going to be pretty easy. But as
the number of images and the datasets have gotten bigger, you know, with the
advent of the digital revolution, we've pushed more and more of the feature
extraction to the computer.
In the 90s and early 2000s, we might have hard-coded some of the features, like
the ear and the whiskers, but then we added some statistical inference that
could say how much those things matter. And then recently, as I mentioned, we
have convolutional neural networks. You can think of this line as the transition
from 1.0 to 2.0. Here the ConvNet completely learns all the features; there's no
need to mention anything about noses, ears, or whiskers. But the programmer is
still responsible for saying how many layers we have, you know, how deep the
network is, and the architecture of the network. Whereas in the future, we hope
that all of that will be learned as well. So the hope is that this new era of
computer vision will propel us into the future and make things like
self-driving cars possible. So you've heard me mention
this word neural network probably
several times now. If you're not familiar with it, it's just a data structure
that's actually composed of some very simple components. In the lower left, this
is called a perceptron, and it's just an operator which takes a weighted sum of
its inputs and passes it through what we call a nonlinear activation function,
which sounds super scary until I tell you that the most common activation
function is called ReLU, which is the function that takes the max of zero and
the input. So this stuff is extremely simple once you can get past all of the
language and the wording of things. On their own, perceptrons are not all that useful.
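In fact, the whole thing fits in a few lines; this is just a minimal numpy
sketch with made-up weights, not any particular library's perceptron:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)  # "the max of zero and the input"

    def perceptron(inputs, weights, bias):
        # weighted sum of the inputs, passed through the nonlinear activation
        return relu(np.dot(weights, inputs) + bias)

    # example with three inputs and arbitrary made-up weights
    print(perceptron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, 0.3]), bias=-0.2))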
It's not until we string them together into networks that they start to learn
interesting behaviors. You might have heard the term deep neural network, and
all that really means is that we have a deep set of layers; it's not really well
defined. But the point is that back in the day, when these neural networks were
first invented, which I think was actually the 1940s, they certainly didn't have
the computational power to make use of these things and string them together.
With how fast GPUs are and how much processing capability we have today, we've
been able to scale these networks up into gigantic webs that are able to learn
really complex behaviors. So let's take a
look at this learning process. I want you
to just imagine this red line as your
problem or the program you're trying to
create. And these blue dots represent
your data points. In this example it doesn't really matter what these mean; you
can think of the x-axis as maybe the square feet of a home, and you're trying to
predict what you should price your house at given the square feet of your home.
So, that follows this kind of upward trend, and what we would like to do is to
parameterize a linear function, which is y equals mx plus b from math class. We
want to find an m and a b that fit this blue data set as well as possible. And
over here, this red dot is actually just a different way to think about the same
red line, except this is how much error the output of our red line has with
respect to our data points. Our goal is to get this red point down here, because
if this curve represents our error function, then we want to minimize the error
of this red line with respect to its data points. So, how do we do that? Well,
the way that it works is that we compute what's called a gradient, which is a
direction on this error curve that points towards some kind of minimum. And if
we can calculate this thing, then we can update our m and our b such that this
red point descends down this error curve, and we can see that simultaneously, as
this is happening, our actual function is getting closer, and closer, and closer
to modeling our real data distribution. And while this is a toy example that has
been known to statistics for, you know, decades.
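To make that descent concrete, here's a minimal sketch of fitting y = mx + b
with gradient descent; it assumes numpy, and the square-footage and price
numbers are made up purely for illustration:

    import numpy as np

    x = np.array([0.8, 1.2, 1.5, 2.0, 2.4])      # square feet of the home, in thousands
    y = np.array([150., 200., 260., 330., 400.])  # price in $1000s (made-up data points)

    m, b = 0.0, 0.0        # start somewhere on the error curve
    lr = 0.05              # learning rate: how big a step to take down the curve

    for _ in range(5000):
        err = (m * x + b) - y          # how far the red line is from the blue dots
        grad_m = 2 * np.mean(err * x)  # gradient of the mean squared error w.r.t. m
        grad_b = 2 * np.mean(err)      # gradient w.r.t. b
        m -= lr * grad_m               # step m and b "downhill" on the error curve
        b -= lr * grad_b

    print(m, b)            # the fitted slope and intercept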
The reason why neural networks take this and, you know, put it on steroids, if
you will, is that instead of a single variable x and two parameters m and b, our
neural network can have thousands of variables and potentially millions of
parameters. And this allows us to learn very, very complex behaviors and work on
very complex data sets, such as imagery. So
what can we conclude from this process?
Well, the conclusion, or at least the argument that I want to make to you, is
this: first think of the space of all possible programs. Anything that you write
in Software 1.0 is constrained to something that just makes intuitive sense to
you as a human being, which means you're restricted to basically just a single
point in this program space; maybe you put your point here, and maybe I put mine
here. But in Software 2.0 you're using optimization to search around program
space. That is, you're updating all of these different parameters of your neural
network, each of which is essentially a different program, until you find one
that is pretty good at solving the problem that you care about. This really
highlights the fundamental shift in the way that we go about solving problems.
And this process, I'm going to try to prove it to you, becomes very apparent
when you look at agents trained to play video games, especially old Atari games.
So this is a game called Qbert. Has
anyone watched Wreck-It Ralph? You know, when they're breaking out of the video
game, there's that little orange thing with a snout.
That's Qbert! Anyway, your job as a player is to control Qbert to jump on top of
these colored cubes and change their colors. Every time you jump on top of one,
it changes its color, and you want to change every single cube in this pyramid
to be the same color. But to do that, you have to avoid obstacles and not die.
And I think you can fall off as well. So, how
would we write a bot or a computer program to play Qbert? Well, we might use
something like the A* pathfinding algorithm, and we might say: for whatever
state Qbert is currently in, we'll find a cube somewhere whose color hasn't been
changed yet, and we'll compute a path that takes the enemies into account, maybe
by giving them some really high weight in the path cost, and it'll find possibly
even the shortest path to the cube of choice. Then it'll pick the next cube, and
so on, until it has visited everything. So this sounds like a pretty optimal way
to play Qbert; a rough sketch of that hand-coded idea is below.
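Purely as an illustration of what that hand-coded bot might look like; the
neighbors, enemy_cost, and heuristic helpers here are hypothetical stand-ins for
real game state, not actual Qbert code:

    import heapq

    def a_star(start, goal, neighbors, enemy_cost, heuristic):
        # frontier entries: (estimated total cost, cost so far, position, path)
        frontier = [(heuristic(start, goal), 0, start, [start])]
        visited = set()
        while frontier:
            _, g, pos, path = heapq.heappop(frontier)
            if pos == goal:
                return path
            if pos in visited:
                continue
            visited.add(pos)
            for nxt in neighbors(pos):
                g2 = g + 1 + enemy_cost(nxt)  # cubes near enemies get a really high weight
                heapq.heappush(frontier, (g2 + heuristic(nxt, goal), g2, nxt, path + [nxt]))
        return None  # no safe path to that cube right now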
But let's see how a reinforcement learning agent called Canonical ES decided the
best way to play Qbert was.
So far, everything looks pretty standard, nothing out of the ordinary. Just
plain Qbert. See, he jumps! Oh! So what you're witnessing is that this
reinforcement learning agent found a bug in the game code of Qbert. It found
this sequence of dance moves that, when done in the exact order, just continues
to make the scoreboard go up, and up, and up, and up.
So, apparently, this is the best way to play Qbert!
And the point I want to make here is that this makes no intuitive sense to a
human. Right? It's unlikely that any of us would have found this bug; I mean,
Qbert's been around for the greater part of 35 years, and to my knowledge nobody
has ever found this bug. And while this might sound like a corner case, this is
true of many programs that we write in the Software 2.0 world. They seem to just
find a good approximation or a good strategy for achieving our goal, and how
they do that may not make any sense to you. To quote the professional Chinese Go
player Ke Jie on his experience playing against DeepMind's AlphaGo Master AI:
"After humanity spent thousands of years improving our tactics, computers tell
us that humans are completely wrong. I would go as far as to say that not a
single human has touched the edge of the truth of Go".
To me that's pretty powerful.
Just to recap: in Software 1.0, we explicitly engineer systems. In Software 2.0,
we're finding programs by using optimization, which you can think of as directed
search using training data as the guide. So, does this mean that Software 2.0 is
the ultimate silver bullet? Well, not exactly!
For one thing, building these things is very time-consuming and laborious. You
know, Google DeepMind was able to achieve what it did by using hundreds of
thousands of what they call tensor processing units in the cloud. They trained
the system to play against itself for, what was it, seven weeks on this
massively distributed cloud system, which is the equivalent of, I don't know,
something like a thousand years of gameplay. That's certainly not practical for
everybody to do during development. The other problem is that building these
systems is a little like a scientific experiment: we set up our experiment, we
wait a while to see what happens, we check the results, did it solve our
problem, and if the answer is no, we repeat that whole process again. There's a
lot of unproductive waiting around in between these training sessions. And there
are some other limitations that I'll get to in a bit. So,
you can think of this, again, as building programs from data. So, if we're
building programs from data, then what is the software developer's job in the
Software 2.0 world? For one thing, you need to find the data that's going to
somehow describe your problem, and you're going to need to find the data that
somehow describes the solution. And this is not an easy task. For my project
with the Minecraft team and Microsoft Research, finding a data set that
represented exactly the type of terrain we wanted to generate was very, very
hard. We came up with all kinds of weird and, we thought, clever techniques,
even things like machine learning on machine learning, to try to gather our data
sets. So, it's definitely not an easy problem. The other thing to mention is
that since you're learning from your data, garbage in, garbage out. If you're
not careful to find data that generalizes, then you're just going to learn some
specific or noisy data, and the network isn't going to learn anything useful.
Also, you might hear
something about neural networks being biased, but neural networks themselves
have no ability to be biased. The neural network is just an objective
computation over these perceptrons. What is biased, or could be biased, is the
data that you've presented to the neural network, the data that you trained on.
Essentially, it learned on whatever your data set is, so you need to make sure
that the data set at least represents what you actually wanted it to represent,
because the network is going to behave exactly the way the data you gave it says
to behave. So that's part of your job as a programmer in this space, and you'll
spend probably around 80% of your time just gathering data. That's what this
world looks like. So, when it comes to machine learning, or when it comes to
Software 1.0 I should say, we've become pretty productive. But what do the tools
for Software 2.0 look like in that stack? Well, unfortunately, they don't really
exist.
In Software 1.0 we have these things called IDEs, which package our text editor,
our compiler, our debugger, our deployment tools, everything, in one nice little
box with a bow on it that we can use to be very, very quick about building
programs. In Software 2.0, such a thing doesn't really exist. What we have are a
bunch of disjoint frameworks like Google's TensorFlow, Facebook's PyTorch,
Microsoft's CNTK, and we even have some higher-level APIs like Keras, but we
have nothing that's really a complete story. And we don't have very good
debugging tools either. So, we need something better. I
hear a lot of criticism about neural networks because they seem to be this black
box: we've trained this function that maps some inputs to some outputs, but we
have no idea how it works, or why it works. But I want to make an analogy that
says that doesn't have to be the case. First, can we all agree that trying to
understand someone else's code is a lot harder than trying to read our own code?
Yeah, right, so that's hard. And Software 2.0 is kind of that on steroids,
right? Not only did you not write the code, but no human wrote the code, the
optimization wrote the code, and to put the cherry on top, you're looking at
this code at the assembly level. You're looking at the lowest-level instruction
set of code you didn't write. Debugging this is going to be extraordinarily
difficult. So, I propose something better. When we debug programs, it's not very
often that we sit there and sift through the machine code anymore. Typically we
map those instructions back to our high-level source code using symbols. So, why
can't we do the same thing for Software 2.0? Why can't we have programming
languages which are higher level and have a higher abstraction meaning to
humans, and that compile down into the lower-level neural network? And then,
when we want to debug them, we map the weights, biases, and activations back to
our higher-level languages, and we might be able to understand what they're
doing a whole lot better. So, this is a tool
called GAN Lab, which is an attempt to understand what the heck is going on
during the learning process. And while it is itself more of a toy than a real
tool, I think it gives us an interesting glimpse of the kind of programming tool
set of the future. As I said before, training these networks is kind of like a
science experiment; there's all this wasted time in between things. What if we
could watch the training process happen live, and potentially even influence it
before it starts going down a junky path? That way, we don't have to spend so
much time setting our experiment back up and letting the thing run again. And we
can also maybe try to build at least some kind of intuition about what it
learned at the end of the day. That will be very important when we try to
prevent catastrophic failures. And what are these catastrophic failures, you
might ask? So, there is, I think, the most-cited convolutional neural network;
it's called AlexNet. I think it has something like 30,000 citations in the
academic literature, so it's a very well-known image classification network.
And it's able to recognize this image on the left as a bus, which is good. Well,
what do you think happens when we take all of the pixel values of this bus and
sum them with what seemingly looks like a noisy image? What do we get?
We get another bus. So you would think, or you would infer, that AlexNet would
also recognize this as a bus. Is that a safe assumption to make?
Well, you're right: AlexNet identifies this as an ostrich, and to top it off,
it's ninety-nine point seven percent sure of it.
So, what happened here? Well, this noisy image is not noise. This is a carefully
crafted image that was designed to trick AlexNet. The researchers were able to
take the already trained image classifier and re-optimize it, but this time not
to minimize an error function, but to maximize the error on a particular desired
training example. They wanted to find how they could change the input such that
it would maximize the ostrich label, and then they could subtract the bus from
that image, and that's how they found this noise pattern. So, to define an
adversarial example: it's an image that has been augmented with imperceptible,
non-random perturbations that arbitrarily change the network's prediction.
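Sketched very loosely in PyTorch, the idea looks something like this; the model,
the bus tensor, and the class index are hypothetical placeholders, not the
researchers' actual setup:

    import torch
    import torch.nn.functional as F

    # model: a pretrained image classifier (e.g. an AlexNet-style network), loaded elsewhere
    # bus:   the original image as a tensor of shape (1, 3, H, W)
    target = torch.tensor([9])            # pretend 9 is the "ostrich" class index
    x = bus.clone().requires_grad_(True)

    for _ in range(100):
        loss = F.cross_entropy(model(x), target)  # how far we are from "ostrich"
        loss.backward()
        with torch.no_grad():
            x -= 0.001 * x.grad.sign()    # nudge every pixel slightly toward the target label
            x.grad.zero_()

    noise = (x - bus).detach()            # the "imperceptible" perturbation from the slide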
So, the question that you might ask next is: what is the minimum number of
pixels that need to be changed in an image to flip its label from one class to a
desired class?
Anybody want to guess? What do you think the minimum number of pixels is? It's
one. At least, it's been shown in some examples to be just one. These are all
examples from researchers at a Japanese university, Kyushu University. They
devised an algorithm that can find a particular pixel, and a particular color to
change it to, that flips the class: in this example, from ship to car with
almost one hundred percent certainty, and this image of a horse to the label of
a frog with basically absolute certainty. So, this is a real
problem. And you can imagine if we have
these self-driving cars, like say Waymo, the Google self-driving car project,
and one is driving along the street, and some team of criminals has discovered
that there's this particular pattern of colors, and they go in with their
stencils and their X-Acto knife and they take these little cellophane strips and
stick them on a stop sign. As a human in the self-driving car, you don't really
pay attention; you may not even be able to perceive those cellophane pixels, but
suddenly your self-driving car is sure that what it's seeing is not a stop sign
but a green light, or an empty intersection. This is bad. So, this is an open
area of concern, and it's kind of a strange property of neural networks. Again,
they don't learn things that are intuitive to humans; they just learn some kind
of high-dimensional mapping from some input to some output. So, we need to make
sure that these things are safe for human use before we deploy them in the wild.
So, let's look at some of the economic
opportunities. So, there was this team of
grad students at Stanford who decided
one day that they wanted to go out into
some fields and take pictures of plants.
They took their cell phones and they
walked around and they took the images
of the tops of cabbages and lettuce,
and also weeds. Their goal here was to build an image recognizer that could
identify, given a picture of a crop, where the weeds were and which were the
real plants. And then they turned this into a real product. They built this kind
of tractor or combine add-on which has these actuators that spray herbicide.
That's rather than doing what farmers do today, which is blanket everything in
herbicide and then make sure that their plants have been genetically modified to
be immune to it. This is a big problem, because we're using a lot of herbicide,
which means that the weeds that do survive tend to be the ones that are more
likely to be immune to the herbicide. So they've essentially been artificially
evolved to be very resistant to it, and that means we have to use even more,
because now they're slightly immune to it. But with this system, since they're
able to precisely identify what a weed is, they can deploy the herbicide just to
the weed, and they can avoid spraying all their crops with it. This has been
shown to give a 90% reduction in the amount of herbicide the farmers needed. And
then they sold this company to John Deere for 300 million dollars. So,
finally I just wanted to end with some
closing statements. For one, the "2.0" in Software 2.0 isn't meant to indicate
that it's the new way to write software and that 1.0 is the obsolete version of
software development; that's not true. We don't suddenly throw this decades-long
investment in software development out the door. It's just that we want to
distinguish it, because it's a fundamentally different approach to finding
solutions to problems. The other thing to note is that Software 1.0 and 2.0 are
completely compatible, and they both have their trade-offs; it's really just up
to engineers in the field to decide when to deploy one strategy versus another.
You just have another tool in your toolbox.
And lastly,
you've all heard the old saying, "Practice makes perfect", but this implies that
we can't do any better.
What if our definition of perfect is just one of many?
So, instead I like to say, "Practice makes for a pretty good local minimum".  Thank you.
[ Applause ]
So, I guess I'll answer any questions you have. Do you have a mic, or are we
just going to raise hands, or...?
[ From the Auditorium: ] Let him yell! 
[ David: ] Let them yell? Okay, great!
[ From the Auditorium: ] For the canyons that you are trying to make for your work, like Minecraft, what type of data sets are you looking for?
Do you use like 3D models or pixels of canyon cells [ inaudible ]?
[ David: ] Yeah, so, the very first thing we tried... if you're familiar with
the procedural generation algorithm called Perlin noise, it's only capable of
producing 2D height maps, and that's a problem, because it means you could never
describe something like an arch; you're only ever going to have one height value
for any XY location on a grid. So you can't have these features that have
multiple surfaces over the same point. So we started actually with what's known
as lidar, in a format called LAS, and lidar is just point cloud data. A point
cloud is just a 3D volume of points in 3D space, and we trained our network on
that data. The problem is that those data sets are much harder to find in high
resolution. Google, in Google Earth, has a data set that's very, very detailed
for Arches National Park, but that data set is proprietary. So we were stuck
scouring the USGS geological website, and unfortunately we couldn't even find 3D
data on Moab, Utah. So we actually switched to White Canyon, which is also in
Utah somewhere. And we had some success, but doing things in 3D was very, very,
very expensive computationally. So for the project that we're working on now
with Microsoft Research and Minecraft, we scaled it down to just height maps
again, with the intention that eventually we scale this thing up and go back to
3D. So, again, acquiring data sets is a very challenging problem.
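Just to illustrate the difference between the two representations (made-up
numbers, nothing from the actual data sets):

    import numpy as np

    # A height map stores exactly one elevation per (x, y) cell,
    # so it can never describe an arch: there's no way to have rock
    # both above and below the same spot.
    height = np.zeros((256, 256))
    height[100, 100] = 42.0

    # A point cloud (what lidar/LAS gives you) is just (x, y, z) samples,
    # so the same (x, y) can show up at several heights.
    points = np.array([
        [100.0, 100.0, 12.0],   # canyon floor
        [100.0, 100.0, 42.0],   # underside of the arch
        [100.0, 100.0, 47.0],   # top of the arch
    ])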
[ From the Auditorium: ] If you could go back to that slide shown earlier about the school bus and [inaudible] like an ostrich...
so, that is the kind of algorithm that's going to be in, for example, a self-driving car?
[ David: ] Yes correct.
[ From Auditorium: ] So if you look at the image analysis of the stop sign and there is graffiti on the stop sign, or there is dirt,
or anything at all like that, or any alterations of that sort, how can you have any level of certainty that the algorithm is correct?
[ David: ] Right, so remember that the mud is kind of random, right? It's not
carefully crafted. For these adversarial examples, you have to look into the
network itself to see how the weights and biases get manipulated by particular
inputs. So things like mud really aren't a problem for image classifiers; it's
only if you deliberately try to fool them that it's even possible. The second
property of these things, which is kind of a saving grace when you move from
imagery to video, is that the likelihood that this exact pattern of noise fools
the network again from a slightly different perspective goes down exponentially.
So a car that's moving toward a stop sign is much less likely to be fooled by an
attack like this. However, the attacks could get more advanced, and we're not
sure to what extent these properties can be exploited, but certainly when you
move from stationary imagery to video you decrease the likelihood of an attack
of this nature. So that's kind of a thank-goodness. And then they're also mixing
in non-imagery data as well; they're using, I think, sonar. There's also a sonar
detector on the Teslas. So the hope is that you can have redundancy between
different sensor technologies, so even if you were able to fool one, the
likelihood that you could fool all of them is much lower. So, yeah, this is a
known problem and there is active research. I don't remember the details, but I
can link you a paper at the end if you want to check it out.
[ From Auditorium: ] When I first saw the slide, when I saw that middle image, I thought what we were looking at was differences in the pixel response on a CCD.
Because they are different from one to the next, right? They can be different if you capture an image on a CCD?
The pixels from one to the next respond slightly differently from one another, although they shouldn't.
I thought that is what we were looking at first.
[ David: ] Oh, no, we're just looking at, just a...
[ From the Auditorium: ] A deliberate, a deliberate pattern that someone came up with for the algorithms, is that right?
[ David: ] Yes, that's correct, and they had to have access to the network.
That's the other key here. Without being able to take the already trained
network that you know is deployed and re-optimize it to maximize the error, it's
not really possible. I mean, there are some attacks that do it through inference
alone, but it's very slow; you have to try an input and look at the output.
Without access to the network, it's a much harder attack.
[ From the Auditorium: ] So, a couple of things. For one, this particular [ inaudible ], would that only work for identifying buses, or would that [ inaudible ] images in general?
[ David: ] This particular noise pattern is only meant to fool a particular network, called AlexNet, on this particular example. Yeah, it does not generalize. Sir?
[ From Auditorium: ] For a particular image. So if it's a video and you are moving then...
[ David: ] Poof, yeah.
[ From Auditorium: ] Probably won't fool anyone the next time.
[ David: ] Right, for this kind of simple attack. However, if you're thinking
about other technologies, like facial recognition technology, that may just take
a single screenshot of an image and then make some decision on that, this could
be a problem there. Also, we're not sure what else you can do with this, right?
Like, could you build some kind of more general noise pattern, rather than one
for a single particular example to a very, very strong degree? Can you find a
noise pattern that tends to fool things in general, maybe not as strongly, but
just enough to push the label from being a bus to an ostrich? So rather than
having this 99% certainty, you could maybe, again using this technique that lets
us search this big, complex problem space to find these weird inputs, it's
possible that we could find something that is more general as an attack.
[ From Auditorium: ] The other thing I am worried about in this specific example, is that something that a [ inaudible ] would be able to use to counter the system?
[ David: ] Yes, without access to the network you can still do it, but you have
to basically try an input and look at the output. So it's very slow, because you
have to literally run the whole feed-forward propagation and wait for the
output. But there is actually a paper where researchers have done that. It was
on very low-resolution images, so we're not sure how practical it is in the real
world.
[ From Audience: ] How would you go about testing all of this AI and [ inaudible ] as a tester I am looking at it [ inaudible ], if it's one pixel, that's a very complex step?
[ David: ] That's a great question! Testing this stuff is tricky, and when you
find an edge case, and this is something that Karpathy mentions in his doc,
where they may never have encountered this scenario before, really the only way
to combat it is to go find data on that example and then retrain the network.
But certainly you can imagine that if you'd written unit tests, if it were even
possible to write unit tests for a neural network, every time you retrain it
those tests could potentially fail. So testing these things is hard. You can
test the general case pretty easily, right? You run millions of images of buses
and make sure that they're all correct, and you're like, 99.9% accuracy, pass
our test. But sometimes when these things fail, they fail hard. I can give you
an example,
there was this other image network, I think it was deployed, at least as a kind
of a test, for the courts to analyze images of people, to identify whether they
meet a particular description of a suspect. And researchers took this network,
or the architecture, and trained it on pictures of huskies and wolves, to
distinguish between the two, and they found an example where the network
identified a husky dog as a wolf. They dug deeper and looked at the activations
that were being lit up when it was looking at a picture of a husky, and they
discovered that the reason why it said it was a wolf was not because of anything
about the dog at all. It was just the snow in the image. So, essentially, the
network had just learned: in the presence of snow, it's a wolf. This is not
desirable. So, even though we're achieving these insanely good, high accuracies,
we have to be careful about bias, and we have to be careful about what these
things are actually learning. If you have a data set where something is true 98%
of the time, you can build a network that has 98% accuracy by just returning
true. It's not a good model; it doesn't predict anything. So we need to look at
not just the total accuracy of these things; we need to look at the recall, and
we need to look at the precision, and we need to see that when we come up with
false negatives, we don't come up with really bad false negatives. So, that is
an absolute area of concern, and I'm not sure how to test it.
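To see why accuracy alone is misleading, here's a tiny made-up example in numpy,
echoing that 98% figure (the numbers are purely illustrative):

    import numpy as np

    # 1000 examples where the label is True 98% of the time (made-up data)
    y_true = np.array([True] * 980 + [False] * 20)
    y_pred = np.ones(1000, dtype=bool)   # a "model" that just always returns True

    accuracy = np.mean(y_pred == y_true)                          # 0.98, looks great
    recall_on_rare = np.sum(~y_pred & ~y_true) / np.sum(~y_true)  # 0.0, it never finds the rare case
    print(accuracy, recall_on_rare)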
[ From Auditorium: ] So, when you describe [ inaudible ] are generally simple [ inaudible ],
is it plausible that for maybe some applications of Software 2.0, having more
complex algorithms being run at each one of those would be more productive than just
doing simple... [ David: ] ...than simple additions and multiplications, yeah. I actually kind of
thought about this exact thing earlier
today. I was trying to think: could you build a web browser renderer just by
taking HTML as input and looking at the output pixel buffer from Chrome? The
cool thing about this is that you have tons of training data; you could
literally just randomly scour the Internet, look at HTML pages, look at what
they rendered, and could you train a neural network page renderer? I thought
it's going to be pretty hard when you're trying to render an image link: you
have a URL, what the heck is that image going to be? I mean, what it's going to
learn is maybe some pattern in the URL name, and, you know, maybe if it has some
name in there, like a celebrity, it would sort of make up that celebrity. But
what if you had an operation in the neural network that could do a WebKit call?
Like, it could actually take the link and go download it. I think with that kind
of an operator you could probably build a fairly decent renderer. So that was
just an idea I had: instead of just summing weights and biases, or doing
additions and multiplications, could you have an operator in the network that
says, at this point I'm going to go download something? So, I think there is
lots to be explored. We really don't know how far we can take this stuff.
There's a kind of silly paper on arXiv called something like "Gradient Descent
by Gradient Descent". It's basically learning how to structure a network by
using a neural network to learn to structure it. So it's kind of this recursive,
self-learning thing. And while it hasn't been totally successful, in principle
we could use it to find more complex network types, or new kinds of basic nodes
beyond these perceptrons. That could maybe lead to more complex behavior, but do
know that neural networks are universal function approximators: theoretically,
they're capable of mapping any input to any output. So they could technically
learn to do a web page download somehow by manipulating your machine; in theory,
it's not likely, but it could happen. But I think you're right, those are
certainly areas to be explored. And we do have other operations than just the
perceptron. Convolutional neural networks use node types that do kernel
convolutions, if you're not familiar with that: it's just a little matrix that
you pass over an image, and you apply some operation to the neighboring pixels.
So in Photoshop you have things like a Gaussian blur, a mean blur, all of these
filters, sepia and so on; essentially, the blur is just passing this little
thing called a convolution kernel over the image and averaging the surrounding
pixel values. That same concept is a node type in a convolutional neural
network, except rather than you telling it what weights are in that kernel, like
how important certain pixels are to the sum for this blur, it learns those
values. So there's certainly more to it than just the perceptron, but in general
that's kind of the basic operator.
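For reference, a mean blur really is just a hand-written kernel; a rough sketch
assuming numpy and scipy, where img stands in for a 2-D grayscale image array:

    import numpy as np
    from scipy import ndimage

    kernel = np.ones((3, 3)) / 9.0           # a hand-written mean-blur convolution kernel
    blurred = ndimage.convolve(img, kernel)  # slide it over the image, averaging each neighborhood

    # In a convolutional layer, those nine numbers aren't hand-written;
    # they're weights the network learns during training.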
Another question?
[ From the Auditorium: ] I think it's a wrap
[ David: ] All right, thank you.
[ Applause ]
