PRESENTER: So let me introduce
Max Tegmark, our Max,
to the weekly CBMM seminar.
It's great to have you, Max.
And it's a special day
because you are here,
and because it's your birthday.
Happy birthday to you.
Everybody should [INAUDIBLE].
There is, let's see,
reactions like this.
MAX TEGMARK: Ooh.
PRESENTER: [LAUGHING]
Happy birthday.
MAX TEGMARK: Thank you.
PRESENTER: And for people
who don't know you,
I imagine not many, but Max
is a cosmologist, a physicist.
He's the director of the
Future of Life Institute.
He's a very good writer.
This book that he wrote,
Our Mathematical Universe,
is great, highly recommended.
And he also likes AI.
So being a physicist,
you will speak today
about physics for AI and AI for physics, or vice versa.
Max.
MAX TEGMARK: Thank you so much.
It's a great birthday
present for me
to get to talk with
you about stuff
that I'm super excited about
and to see you all here.
[INAUDIBLE] physics
and physics for AI.
AI for physics, I mean
how physicists can
use AI to do physics better.
And by physics for
AI, I mean kind
of the opposite, how physics can
hopefully give something back
to help machine learning and AI.
We all know, of course,
how amazing the progress
has been in AI in recent years.
Just think about it.
Not long ago, robots
couldn't walk.
Now they can do backflips.
Not long ago, we didn't
have self-driving cars.
Now we have self-flying
rockets that
can land themselves with AI.
Not long ago, AI couldn't
do face recognition well.
And now it can not only do that,
but it can simulate
[INAUDIBLE] face
saying things he never said.
Not long ago, AI
couldn't save lives.
Now we have actually
quite useful AI,
machine learning diagnostics
for prostate cancer,
lung cancer, eye diseases.
We have AI that can win
the annual protein folding
competition and do such
a good job of matching the measured three-dimensional X-ray crystallography shapes
that it's probably going to help
accelerate drug discovery soon.
Not long ago, AI
couldn't beat us at Go.
And now it's crushed
not only human gamers
at Go and at chess,
but more interestingly,
it's crushed the AI developers
who spent decades hand-crafting software to
play these games, which
is all obsolete now by having
Google's AlphaZero just play
against itself for
under a day, right?
Thanks to [INAUDIBLE]
postdoc [INAUDIBLE]
and his gang at DeepMind.
So if all this
progress is happening,
how can it be used
to help physics?
In lots of ways, obviously,
for example, just right here
at MIT in our
physics department,
I put together this
long list of how
a significant fraction
of my colleagues
there are using machine learning
to do their physics better.
I'll just give you a few
examples here to start off.
We have machine
learning used right now
to detect gravitational waves
better, as massive black holes a billion light years away or thereabouts crash into each other, distorting spacetime by such a small amount that you have to measure to 22 decimal places to see it.
Machine learning is great
for picking up these signals.
Other colleagues
are using machine
learning to detect
extrasolar planets in other solar systems.
That's part of the reason we
have over 4,000 discovered now.
Machine learning is being
used to analyze data
at the Large Hadron Collider at CERN coming in
at these crazy data
rates, where you
can't have grad students do it.
And actually, most of the hardware cost right now for CERN isn't building the magnets and such.
It's the hardware for
running the machine learning.
How can physics
pay back its debt
and help machine
learning a little bit?
Well, both through hardware
and through software.
On the hardware
side, for example,
my colleague [INAUDIBLE]
in the physics department,
whom a number of you know, has
developed this optical chip
for faster machine learning.
So it looks like your regular
chip, a little black thing.
But instead of the
computation being
done by electrons moving
around in two dimensions,
the computation is done
by photons moving around
at the speed of light.
And it turns out that this
is incredibly well suited
for matrix multiplication,
which is, of course, one
of the key things we do
in our neural networks,
which it can perhaps do at scale about a million times more energy efficiently than today's chips.
So I think we'll be seeing
a lot of improved hardware
that will help a lot.
What about on the software
side, algorithm side?
What can physics
do there for AI?
Well, of course, AI still has
plenty of challenges, right?
Not only are there tasks we don't know how to do, but there are also problems.
For example, you've all heard
about how this machine learning
system was deployed across
courtrooms in America.
But people just hadn't
understood well enough
how it worked and hadn't
realized that it was actually
racially biased.
And there are so
many other examples
of where we just didn't
understand our machine
systems well enough.
And that caused problems.
Boeing certainly wishes
that they had better
understood the very
simple automated
system that controlled the 737
MAX before they deployed it.
And the traders
at Knight Capital
certainly wish they
had better understood
their automated trading system
before they deployed it.
It lost $10 million per minute
and kept going for 44 minutes
until someone finally
caught on and turned it off.
Now, can you raise your hand if
you ever had a Yahoo account?
See if we see any hands.
Yeah.
So if you did, you were hacked
because all three billion Yahoo
accounts were hacked, right?
Raise your hand again if
you have a credit card?
It was probably also
hacked because Equifax,
all of their credit card
information was breached.
With these sorts of problems, the first reaction people tend to have is to say, oh, that has nothing to do with computer science. It's because of the evil hackers that did it. But we know better.
The truth, of course,
is that here, too,
the fundamental problem was that
these systems, these security
systems, were not
well enough understood
by those who deployed them.
They hadn't understood
that there were actually
loopholes in them that the
hackers could exploit, again,
showing how there's
a lot of value
if you can just understand things better.
I'm particularly excited that
you, Tommy, are here today
because I love how
you, Tommy, draw
the distinction between the
engineering of intelligence
on one hand and the science of
intelligence on the other hand.
And you like to point out that
the engineering of intelligence
is very much about just trying
to make things work, at least
work well enough that you
can make money off of them.
Whereas the science
of intelligence
aims not just to make it work,
but to ask why does it work,
how does it work,
at a deeper level?
And I think this is really
the key to addressing
the kind of challenges I
mentioned to get a deeper
understanding of your systems.
And my research group--
I'm so grateful that I get
to be a part of the CBMM.
Here's what some of
my students looked
like before the lockdown.
And below you can
see what they look
like after the lockdown,
where they all have rectangles
around their heads.
But we focus on what I like to
call intelligible intelligence,
which is exactly this idea that
the more you can understand how
your machine learning
system actually works,
the more reason you
might have to trust it.
I say intelligible intelligence, not explainability, because this is a more ambitious goal.
I'm not talking about a
system that can say some blah,
blah, blah, to
you in human terms
and explain why it
diagnosed you with cancer.
I'm talking about being able to
understand things at a deeper
level so you actually
have reason to trust it
because you understand it.
It's an ambitious goal.
I'm going to tell you
today about four projects
at the interface
between physics and AI
that all have bearing on this.
So let's start off
with this one, which
is a paper together with a
former grad student, Tailin
Wu, who is at Stanford now.
And if you're here
Tailin, hello.
So to motivate this, let's start
by taking a machine learning
task which is very easy these
days, where you just retrain
the DQN network from
Google DeepMind
to play this Atari
game, which many of you
have played as kids.
In the beginning, it sucks.
It keeps missing the
ball all the time.
But pretty quickly,
it gets quite good,
catches the ball every time,
plays, and discovers this trick
that you should always put--
you should make a hole and then
always keep putting the ball up
into that hole in the corner
and just rack up the points.
This feels intelligent, right?
But how does that actually work?
Is this intelligible?
Can we, for example,
trust that it's always
going to work this well?
Well, in this case,
I can show you
exactly how it works because
we just trained this network
on one of the computers at MIT.
And this is how it works.
It takes the pixel
values that give
the colors of all the pixels
on the screen, multiplies them
by a big matrix, applies some
nonlinear transformations,
more matrices, et
cetera, et cetera.
We know all of these
867,488 parameters.
Is it crystal clear
now how it works?
No, this is completely useless
as an explanation, right?
If this were instead some very
mission critical software that
was controlling the vision
system of my self-driving car
or whatever, I would have
absolutely no guarantees
because I just have no clue
how this is actually working.
So can you do better?
I think yes.
And I want to start
by just dispelling
a myth that some people
seem to have internalized
that the fact--
that the power of
machine learning somehow
comes from its mysterious
inscrutability.
I think this is
complete nonsense.
But some people seem sort
of resigned to the idea
that we can only
get this great power
because the secret
sauce is somehow
related to that inscrutability.
I think rather that the
power of deep learning
comes from its
differentiability, by which I
mean every single choice of
parameters in a neural network
still does something.
You can still take the gradient.
And you can, therefore,
get information
about how you should
change the parameters.
So you can quickly
get to the right place
in an exponentially
large search space
instead of just searching at random.
And that opens up
the possibility
that you could maybe have
something that does just
as well but is much simpler.
So I'm going to do a little
test of your own neural networks
by just showing you
something more complicated
than that ball in
the Breakout game.
And let's make this a
little bit interactive.
Try to predict where the
ball's going to go next.
And whenever you see any sort
of pattern with your brain,
shout it out so we
can all hear it.
Don't forget to unmute yourself.
Start by the most
obvious things, just
don't overthink this.
What-- is there any regularity
at all that you see here?
AUDIENCE: Well, sometimes
it bounces off a wall.
But then there are
other times when it just
turns around right
before it hits the wall.
So this is the confusing part.
MAX TEGMARK: Very good.
It seems to bounce
sometimes against walls.
So there seems to be walls.
But where are the walls?
Are they just in
a triangle shape,
or is it a circular wall?
Or where are the
walls, would you say?
AUDIENCE: I think you're tracing
the inside of a [INAUDIBLE]
digit.
MAX TEGMARK: [LAUGHING]
That's an interesting idea.
Can anyone say anything
about any kind of wall, where it might be, or what its shape is?
AUDIENCE: It seems to be
a square, a rectangle.
MAX TEGMARK: It seems to be
a rectangular wall, yeah.
So when it hits it,
it seems to bounce.
And then when it's not
bouncing, how is it moving?
Is it always moving
in a straight line?
Or is it like there's some sort
of force acting on it or what?
AUDIENCE: [INAUDIBLE]
MAX TEGMARK: Does it seem like, whatever the force is, it's the same force everywhere?
Or could there be
different kinds
of forces on different
parts of the screen?
AUDIENCE: It could be
held by an elastic band.
MAX TEGMARK: Yeah, maybe
it's held by an elastic band
so that it's doing
like a harmonic
oscillator, sinusoidally
oscillating, at least
on some part of the screen
but maybe not all parts.
AUDIENCE: It looks like a
gravitational [INAUDIBLE].
MAX TEGMARK: Oh,
this is really cool.
So you were saying that
another part-- one of you
said that in one
part of the screen
it looks like it's
doing harmonic motion.
And then another
part of the screen
it looks like it's
gravity doing [INAUDIBLE].
AUDIENCE: Yeah, like a slingshot
maneuver, it spins out.
MAX TEGMARK: Yeah.
So your neural networks
are really great, right?
You're not just telling me
a vast number of parameters.
You're giving me
some real insights
as to what's going on here.
So what happens
if we just throw--
write a simple feed-forward
neural network at this
and train it to just predict the
next position from the past two
positions, for example,
minimizing the loss?
It can do a pretty good
job of predicting it.
But if you try to predict
far into the future,
it starts sucking more and more.
What intuition
does that give us?
What understanding
does it give us?
Nothing, basically.
Here are the
parameters we got when
we trained a neural
network to predict this.
So how can we do better?
So what Tailin and I did was
we borrowed four very old ideas
that have been
successful in physics
and deployed a machine
learning version of them.
So let me talk about these
four ideas one at a time.
Maya, there's a squirrel
attacking the bird feeder
here if you want to
try and chase it away.
Its neural network
is very smart,
and it's been looking at
this bird feeder for days
and actually [INAUDIBLE].
So the first one
is Occam's razor.
In physics, if we have a
simpler explanation that's
just as accurate as a
more complicated one,
we tend to prefer that one.
And Ray Solomonoff
put this principle
on a firm mathematical footing
with complexity theory,
together with [INAUDIBLE] and
[INAUDIBLE] and other greats.
The only problem is that their definition of complexity is NP-hard to evaluate, generally.
So what Tailin and
I did was we figured
physics got a pretty long
way, made a lot of progress
using Occam's
razor, even though it was a little bit vague and fuzzy in how it was defined.
So maybe we could, too.
So we defined a much
simpler complexity criterion
that's very fast to evaluate.
We said that if you
have an integer,
the number of bits
of information
you need to store it is
just how many digits long it
is in binary.
So basically you take
the log of the integer.
If it's a rational
number, it's just
the two integers, the
complexity of the numerator
plus the complexity
of the denominator.
If it's a real number, well,
you can convert it to an integer
by dividing by the
precision floor of your CPU
and then taking the logarithm of that.
So if you now want to know--
make a plot of how complicated,
how complex is a number,
if you looked at
the diagram here,
you get this very,
very interesting thing
on the y-axis.
If you take generic
real numbers,
the complexity just grows
as a logarithm of it.
That's the thick red
line there, which
would scale up and down as you change the precision floor.
But of course, if it
happens to be exactly 5/3,
then it's certainly
much simpler.
And generally, fractions
with small numerators
and denominators are simpler.
And you'll see later on
that the code we have
will automatically try to
minimize the complexity
and discover simple
fractions when they're there.
If we have a lot
of data, if we have
a model with many
parameters, then we
define the whole complexity as
just the sum of the complexity
for all the parameters in it.
But we also look at the
complexity of the data.
So if you just have a really
lousy model that always
predicts zero or
something, then you just
have to store the
whole data set.
If you have a
model that predicts
the data set pretty well, then you only
have to store the errors that
you make in your predictions
after you apply the model.
So we sum up the total
complexity of everything.
And if you look at
how this plays out--
I just want to give a little shout-out
to this very simple measure
of complexity of numbers
because it actually
automatically gives you
a much more robust method of
fitting data than chi squared
or minimizing the mean squared
error, as you can see here.
Because if you have one bad
data point, as in the left side,
the mean squared error
will give a lot of weight
to those points that are
quite far from the model.
And it will always
compromise and pull away
from the good data points
towards the bad data
points a little bit.
Whereas this other information
theory-based method,
it has the opposite incentive.
It has an incentive to just
keep doing even more accurately
on the things it can already do
accurately and ignore the rest.
So it'll just fit the good data perfectly.
A second idea from physics
that we throw into the mix
here is one that goes
back to Julius Caesar,
to divide and conquer.
There is this story
that when Galileo
was sitting in church
400 years ago, maybe
being a little bit
bored by the sermon,
he noticed that the
chandelier was swinging
and tried to model this.
He didn't try to make
a model to predict
everything about our
universe at the same time.
He ignored what the
priest was saying.
He ignored the color of the
chandelier, everything else,
and just focused on the
angle of the chandelier
as a function of time
and tried to predict
that using his pulse.
And when he did this
he revolutionized
our understanding of
mechanics in physics.
So in the same
spirit, what we do
here is instead of trying to
create one big model that's
going to predict everything,
we fit an ensemble of models, each of which is incentivized to specialize and do well on some aspect of the data, for example, in the case of the ball you saw, maybe on some part of the screen.
And we basically take
the harmonic mean
of how well the
different models do,
and we can prove that this
encourages specialization.
You're going to
see how it works.
We also use lifelong learning.
A human physicist doesn't
have to invent everything
from scratch every time
they see a new problem.
Similarly, we put this AI physicist in a series of environments, and it could use what it had learned previously, unify it, and apply it to the new ones.
So let me just
show what happened.
So we take this kind
of data, and we feed it
into the computer like this.
We just give x- and
y-coordinates of the points,
and off it goes.
And after a bunch
of training, it
discovered entirely
by itself that there
seemed to be four
different domains where
the rules were different.
We didn't tell it that they were
supposed to be four domains.
We didn't tell it where the
boundaries were, either.
It just learned that.
And within each domain, it
was able to do quite well.
And we can already see
now, as human physicists,
that you guys were all right.
Whoever it was who said it looks
like a harmonic oscillator,
you were right.
That's what it was
doing in the upper left.
I didn't catch who said
gravity, but you were right
in the lower-left corner here.
There was an electromagnetic
field on the lower right.
And there was no force
in the upper right.
So if you look more
specifically now,
for example, in the lower
left, what it has learned, when we use Occam's razor to simplify down the neural network, is that to predict the x- and y-coordinates of the next step, you take the previous x- and y-coordinates, multiply by this matrix in the upper-left corner, and then add this constant vector.
Now, when you're a human,
though, and you look at this,
if you see it say 1.999990,
what do you think?
What is this?
What's your gut reaction?
What is it trying to tell you?
AUDIENCE: Floating point error.
MAX TEGMARK: It's
probably trying
to tell you that's really
supposed to be 2, right?
But fortunately, we have this
quantitative information theory
framework to test
whether it actually
fits better if it is 2.
And it discovered, sure enough,
that that should be a 2,
as you can see on the next line.
But it also similarly discovers
that the small numbers
on the right, they're not
supposed to be replaced by 0.
And if you just get rid of the matrices and write out what this is doing, it's discovered a difference equation, which we know how to transform into a differential equation, which we recognize as Newton's laws, which it then auto-discovered here.
We ran this on 100
different worlds
like this, with different
domains and different laws.
And in the summary
here, [INAUDIBLE]
performed about a
billion times better
than just the simple neural
network in terms of accuracy.
And it was also able to
learn with a lot less data
and a lot faster.
So intelligibility
isn't nice just
for helping gain trust
in what you've learned,
but it can also really
aid performance.
So let me give you another
example now of some
of these physics tools applied.
Here, the physics formulas
that were discovered
were kind of simple, right?
So together with
Silviu-Marian Udrescu--
salut, Silviu.
He's also on the call here.
SILVIU-MARIAN UDRESCU: Hi.
MAX TEGMARK: We
decided to see if we
could do tougher, harder ones.
This is a paper that we just
published in Science Advances
a couple of weeks ago.
And symbolic regression,
what is that?
Well, as Josh Tenenbaum and
others here who work on it
can tell you, it's
simply the challenge
of taking a bunch of
data and discovering
a formula that fits it well.
For example, Johannes
Kepler spent four years
looking at data like this
from measurements of Mars
until he discovered
that it was an ellipse.
Wouldn't it be nice if one
could do that automatically?
If the function is
linear, then, of course,
linear regression is so easy.
We do it all the
time, even if it's
a function of many variables.
But if it's an
arbitrary function,
this is known to be NP-hard
for the simple reason
that there are exponentially
many possible formulas that you could try.
If you just made a list of all formulas from the simplest to gradually more complicated, then by the time you got to even relatively simple ones, you might have waited a million years, or longer than the age of the universe, to get to the Planck blackbody formula.
So that's obviously
not good enough.
So in response to that, there's
been a lot of nice work.
Hod Lipson's group had the best symbolic regression software to date when we started our project. It used a genetic algorithm.
But we decided to see
if physics could help.
So we had this vision that a
lot of the problems we actually
look at, even if a random formula is NP-hard to solve, [INAUDIBLE] special properties.
So [INAUDIBLE] for
example, has emphasized
that most formulas we care about
actually are compositional.
Even if it's a formula
of nine variables,
you can usually rewrite that
as a bunch of combinations
of functions of fewer variables,
often two variables or less.
We often have symmetries
as well, or separability,
where maybe the function
of eight variables
is just one function
of three times
one function of the other five.
They also often tend to be smooth functions that neural networks can do well on.
So how can we combine
these ingredients
to solve the problem better?
To build a test set, we took the 100 most famous or complicated equations out of the Feynman Lectures on Physics,
stuff like you see here.
And for each one of
them, we made a big table
of numbers, which is the
starting point for the software
to deal with.
So you put one column
for each input variable,
give them random values.
And the last column is what
the formula evaluates to.
Your task is to look at the table and find the formula.
Linear regression
isn't good enough
because these are
not linear functions.
So you see Julius Caesar here.
That's because we tried the divide-and-conquer strategy again, using some of these physics ideas to see if we could break the problem into simpler ones.
For example, we would
train a neural network
to fit the function
really well while still
having no clue as to what
the function actually was.
And then we do experiments
on the neural network
to test if it had any of
these simplifying properties.
For example, if
the neural network
discovered that actually the only way it depends on columns two and three is through their ratio,
then we would replace
column two and column three
by one column, which
was the ratio of them,
and then restart the
software on a data
file with one column less.
Similarly, if it discovered
that they were separable,
then you could replace this by two problems, both of which have fewer variables.
And this turned out
to be very helpful
because the basic reason
symbolic regression is so hard
is because of the curse
of dimensionality,
where problems get
exponentially worse
with the number of variables.
I'll skip over the details.
You can ask me later.
But if you look, for
example, at the function
here at the bottom, it's
the optics formula, right?
You can see this somewhat
messy expression is separable.
It's a function of theta--
sorry, a function of phi times
a function of delta and n.
It can train a neural network,
and then it can discover that.
And now it breaks it
apart into two problems.
And we have this
recursive loop, which
keeps going until the
individual parts get
so easy that a brute force
search or a polynomial fitting
or something like
that can zap it.
So as an example, look at this
problem here of nine variables.
It's Newton's law of gravity.
If it's faced with a table
with all these columns,
it does the dimensional analysis first,
so it can reduce the number
of variables a little bit.
And then it discovers, oh,
translational symmetry.
It only depends on C and B
in the upper-right corner
here by their difference.
So it can eliminate one column.
Now it has one variable less.
Then it discovers that it
only depends on E and F
by the distance, one
more simplification done.
And then it discovers
this is separable.
So it can factor this into
two mysteries, et cetera,
until it can solve
the whole thing.
And the results of this, we were
actually quite happy about this
because this really nice Eureqa software that I mentioned from Hod Lipson, in addition to costing money, could only do 71 out of the 100 mysteries that we threw at it.
Our code that Silviu
did a heroic job on,
which you can find on GitHub for
free, solves all 100 of them.
Then we decided to see
if we could break it
by going back to our physics
books, like graduate textbooks,
and pulling out even more
complicated equations,
like these ones.
And it still solved--
so this time, the Eureqa software failed on 17 out of 20.
It could only do three.
Whereas our code still
solved 18 out of 20.
And we have a new version of it
now which can do even better.
And the way I think about it, this is actually an example of data compression, of lossy data compression more broadly.
I actually think of
all of physics as,
in a sense, being
lossy data compression
because we walk around in the
world, beautiful, sunny day.
And we almost immediately throw
away almost all the information
that comes into
our senses and keep
only the part that's
really useful for us
for predicting the future.
If you have a table of numbers,
like we gave AI Feynman, and you run [INAUDIBLE] minus 9 on it, for example, it'll just take up less space on your hard drive, right?
If you were to have discovered
that the ninth column is
some function of
the other eight,
then you can compress
it even better, right?
And the more you can compress
things, then, in a sense,
the more useful your formula is.
And if you take this
information theory
point of view of
what we've done,
here is what came out of
AI Feynman's inner workings
in the process of tackling the
particular mystery of figuring
out the kinetic energy formula in special relativity.
You can see it in the lower
right in its full glory.
And when you do data compression, there's a trade-off between how much information you retain about what's useful, the opposite of being lossy, and how complex your description is: inaccuracy on the y-axis, on one hand, and complexity on the x-axis.
So you can get
very low inaccuracy
by having the full formula.
At the opposite extreme, you could predict something super simple,
like always predict that
the kinetic energy is 0.
Now you get a huge
loss, a huge inaccuracy.
But the complexity's very low.
That's the upper-left
corner here, right?
Now, what do we
humans tend to value?
We tend to value things
which do pretty well on both
of these criteria, right?
Occam's razor says
simple is good.
But we also want accurate.
And you notice there's
one other point
on this frontier, the Pareto frontier, which
is in a corner, where it does
pretty well on both complexity
and accuracy.
Which one is that?
A big shout out here if
you have a suggestion.
AUDIENCE: mv squared over 2?
MAX TEGMARK: Yes!
And we were very
excited about this
because we had never taught this
AI anything about high school
physics and that particular
approximation to kinetic energy
that we humans found useful.
It, all on its own, decided
that mv squared over 2
is a really useful
approximation for kinetic energy
just from looking at
the data, because it
can get a lot of accuracy
with much less complexity
than the full formula.
So I'm quite interested
in using this
as a tool not just for
discovering the exact laws,
but also for discovering really
useful approximate formulas
for things across science.
And thinking about
science's data compression
again more broadly, this
segues into a third project
I want to tell you about
just very, very briefly.
Suppose-- this is a paper
that I also wrote
with Tailin Wu, very related
to what [INAUDIBLE] spoke about
in his recent CBMM talk.
Suppose I have a bunch
of cats and dogs here.
And your task is to
classify these pictures,
whether they're cats or dogs.
And suppose I tell
you that you have
to do some sort of
lossy data compression.
Instead of sending the whole
picture into the classifier,
you have to just do a
clustering and divide
all the pictures into groups,
say, three groups, for example.
And you can only tell
me whether the image
is in group 1, 2, or 3.
And based on that
integer, you now
have to predict whether
it's a cat or dog.
And now what's the best
grouping to do, right?
Should you just--
it's not so obvious.
And there's a lot of
interesting literature
about what's the optimal
number of groups to have
and what they
should be and so on.
So we were actually
very excited that we
were able to solve exactly this
problem for a special case,
where it's binary classification, like cats versus dogs.
We also ran examples for MNIST with two different digits, sevens and ones, which are easy to confuse, and for Fashion-MNIST.
And without getting into
detail, what we found
was that the trade off between
how simple things are-- it's
always simpler if you have
few groups or slower entropy
in your compressed data set.
So farther to the right,
in this case, is simple--
the trade off between that
and how much information
you retain about whether
it's a cat or dog.
In this case, you can see
[INAUDIBLE] curve as the one
we're talking about.
If you look at the
original images,
there's 0.7 bits of
mutual information, not 1,
because it's kind of hard.
But if you only do
two clusters, one
where the ones that
you guessed it's
a cat and the other
cluster where you guess
it's a dog, the mutual
information between your best
guess there and whether it is
a cat or dog is only 0.6 bits.
You can do a lot better and get closer to 0.7.
And a fun physics link
here is that if you
look at these corners, we saw
in the plot I just showed you
before with the kinetic
energy, that the things that we
humans find most interesting
are in these corners, where
you do unusually well on
both simplicity and accuracy.
The corners here,
which also pop out,
they actually correspond
to phase transitions
in the machine
learning, where you
have these bifurcations going
from two classes to three
classes to four
classes and so on.
And a significant fraction
of all papers in physics
are about phase transitions.
And I would love to
chat with many of you
afterwards, if you have ideas,
because I just have a sense
that there's a lot
more fruitful work
to be done by linking
up phase transitions
and machine learning with stuff
that's been studied in physics.
The very last thing I
want to leave you with
is coming back to this
discovery of physics equations
from looking at moving things
and start by confessing
that I felt that even
though Tailin and I were
kind of excited about
the AI physics part,
we felt we kind of cheated
because the truth is
we had these pictures of
the moving dots, right?
But we didn't send in the image.
We sent them the x-
and y-coordinates.
And there was a lot
of human intelligence
that went into
figuring out that you
should measure the x- and
y-coordinates of the moving
dot.
Wouldn't it be much nicer
if you could skip that step
and just start
with the raw video?
Suppose you just had a video
of something moving like this
on some weird background.
Any compliments about the
artistry of this code should
go to Silviu-Marian Udrescu,
who made it.
Wouldn't it be cool if you could
just send in the raw video,
and it would discover
that it should map this
and data compress this, map it
into some latent space, which
actually involves measuring
the x- and y-coordinate,
all by itself?
And then you could
study that latent space
and try to figure out what
the actual equations are.
This is actually a much harder
problem than you might think.
Because if you can solve it,
then, with any kind of image,
you will also be able
to solve it automatically.
The way you phrase the
problem, it doesn't even
matter that it's an image.
We just send in a vector of
10,000 numbers or whatever,
and it has to figure
out a way of mapping out
10,000-dimensional space
into 2-dimensional space.
It should also be
able to solve it
if you're looking, for
example, at the image
through a strange
distorting lens like this.
In which case, what
you want is not at all
to measure the x- and
y-coordinate of where it is,
right?
You would like the mapping that
the machine learning discovers
to undistort the
image so we actually
get back the actual useful
coordinates of where things
are.
This is what we humans
do every day when
you walk through the world
because the stereoscopic
projection means that
what we see around
us is actually
kind of distorted.
We don't just get
x, y, z, right?
So Silviu-Marian Udrescu
worked very hard on this.
We had a very
simple architecture.
We send in these video frames.
We have an auto encoder that
maps it into a latent space
and makes sure it
can map it back.
And then we have a time
evolution neural network
that tries to go
from the latent space
to the next step, [INAUDIBLE]
the last two to the next.
And the big challenge
is here we want
to bring in Occam again
with his razor and ask,
what kind of latent
space will give us
the simplest laws of physics for
how the latent space evolves?
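The architecture just described-- an autoencoder into a latent space, plus a time-evolution network on that space-- can be sketched schematically. This is a hypothetical forward pass with random weights, assuming flattened 64x64 frames and a 2-D latent space; the real training loop, architecture details, and simplicity penalty are not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP returning a forward function (tanh hidden layers)."""
    Ws = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.tanh(x @ W)
        return x @ Ws[-1]
    return forward

D_PIXELS, D_LATENT = 64 * 64, 2         # flattened frame -> 2-D latent space
encode = mlp([D_PIXELS, 128, D_LATENT])
decode = mlp([D_LATENT, 128, D_PIXELS])
evolve = mlp([D_LATENT, 32, D_LATENT])  # latent dynamics: z_t -> z_{t+1}

frames = rng.random((10, D_PIXELS))     # stand-in for 10 video frames
z = encode(frames)                      # (10, 2) latent trajectory

# Training (not shown) would minimize reconstruction plus prediction error:
recon_loss = np.mean((decode(z) - frames) ** 2)
pred_loss = np.mean((evolve(z[:-1]) - z[1:]) ** 2)
print(recon_loss, pred_loss)
```

The point of the design is that nothing forces the latent space to be x and y; that has to come from an extra simplicity pressure on the dynamics, which is where Occam's razor enters.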
So we thought a lot
about how can we
define simple in a
differentiable way
that we can train?
And we started thinking
about this guy, Einstein.
So if you have a mapping
that a neural network does
from its input to
its output, you
might use Einstein's
work on curved spaces
to say, maybe a map--
your neural network--
is simpler if it doesn't
curve [INAUDIBLE] so much--
if this mapping
has very small Riemann tensor
components, for instance.
And you can code this up.
But it's numerically
very painful
because you have to
take many derivatives--
many gradients, multiple
third derivatives
and higher and so on.
And then we realized
that there's
a much simpler thing
you can actually do,
which is if you
just put a penalty
on the actual derivative
of the gradient,
so you're encouraging the
gradient of the neural network
to be constant, if the
gradient of the neural network
is constant, then, as you can
see from the second Einstein
equation here, the
curvature is always 0,
and it's nice and simple.
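That penalty-- encouraging the network's gradient to be constant, so the mapping is affine and the curvature vanishes-- can be illustrated in one dimension with finite differences. A toy sketch I made up, not the actual implementation:

```python
import numpy as np

def gradient_constancy_penalty(f, xs, eps=1e-3):
    """Finite-difference estimate of how much f's gradient varies.
    This penalizes the second derivative, so affine maps score ~0."""
    second = (f(xs + eps) - 2 * f(xs) + f(xs - eps)) / eps**2
    return np.mean(second ** 2)

xs = np.linspace(-1.0, 1.0, 201)
affine = lambda x: 3.0 * x + 1.0      # constant gradient -> zero curvature
curved = lambda x: np.sin(3.0 * x)    # varying gradient -> penalized

print(gradient_constancy_penalty(affine, xs))  # ~0, up to rounding
print(gradient_constancy_penalty(curved, xs))  # large
```

Minimizing such a term during training pushes the learned mapping toward the flat, uncurved latent spaces the talk is after.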
So we decided to try that first.
And remarkably, it
actually worked great.
In the beginning, we
had a lot of problems,
where it would discover
weird latent spaces that
looked like a cat, even
though the lines were supposed
to be parallel and so on.
I could tell you more about
amusing reasons this happened.
But eventually,
Silviu persevered
and was able to discover
nice simple latent
spaces for everything, even
though we were distorting it
in some cases with this
weird lens, so that if it
had just taken the x- and
y-coordinates of the image,
it would have gotten some
really weird latent space,
where the laws of motion
were super complicated.
Instead, it discovered
the undistorted motion.
And I can either stop
right now because I've
been going for 41 minutes,
or I can take 3 more minutes
and tell you a little bit about
how I feel this is exciting
also for physics, again.
Which do you prefer?
PRESENTER: Oh, for
another three minutes.
MAX TEGMARK: OK, I will.
I will do that.
So if I put on my
physicist hat again,
this stuff of latent
spaces, we used
to think in physics that there
really was Euclidean space out
there that we just
discovered, right?
But now those of you who
do awesome neuroscience
have discovered that
the organism will often
invent internal representations
and latent spaces and so on.
So you might wonder,
could it be that even
[INAUDIBLE] the physical space
is actually also a latent
space?
And I actually think that's
true because we learned
from Einstein that this whole
thing with inertial frames,
for example, that
you're supposed
to have the x-coordinate and
y-coordinate and the z-axis
all perpendicular to
each other, and it's not
supposed to be
accelerating and so on--
that you don't have to do that.
In general relativity, Einstein
said, aw, forget about that.
You don't have to be
in an inertial frame.
You can have any coordinates
if you use general relativity.
Why do we still use this
simpler latent space,
where our axes are perpendicular
and an object at rest
remains at rest?
I think it's because our brains
are choosing to make a latent
space where the laws of motion
are as simple as possible,
again.
We interpret it that way.
And I want to just show you.
Silviu and I were able
to automate that process,
in this case.
So we gave it five
different examples
with magnetism and
harmonic oscillator,
and a nonlinear quartic
oscillator, et cetera,
et cetera.
In all of these five
cases, even though we just
sent in video sequences,
it mapped things
into a two-dimensional latent
space, five different ones,
because these were five
different neural networks.
And if you look more closely,
you can see that some of them
are very squished
relative to the others
because the laws of
physics, in some cases,
would still work
out just fine if you
scaled the axes by
different amounts
or, in fact, did any affine
transformation where you just
take this two-dimensional space.
You take each vector and
multiply it by a 2-by-2 matrix
and add another vector.
So there are these six
degrees of freedom left.
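Those six degrees of freedom are just the four entries of the 2-by-2 matrix plus the two entries of the shift vector. A small illustration, with arbitrary example matrices decomposed into the operations mentioned-- rotation, relative stretching, shearing, and shifting:

```python
import numpy as np

def affine(points, A, b):
    """Apply z -> A z + b to each row of points; A is 2x2, b is length 2.
    A's 4 entries plus b's 2 entries give the six degrees of freedom."""
    return points @ A.T + b

theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
scaling = np.diag([2.0, 0.5])                # stretch axes by different amounts
shear = np.array([[1.0, 0.3], [0.0, 1.0]])
shift = np.array([1.0, -2.0])                # move the origin

latent = np.random.default_rng(0).random((5, 2))  # 5 latent points
out = affine(latent, rotation @ scaling @ shear, shift)
print(out.shape)  # (5, 2)
```

Because the laws of motion keep their form under any such transformation, the learner is free to spend these six parameters making the equations look as simple as possible.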
And we were
wondering, could it be
that if you just insist on
using these degrees of freedom
you have, make the equations
look as simple as possible,
if it would actually discover
what we humans consider simple?
So here's what Silviu did first.
He took all the rocket
images from all the videos
and mapped them into all
of the five latent spaces,
figured out just
what affine mapping
connected the
different spaces so he
could put them all together into
a single unified latent space.
And now we still have these six
parameters we can play with.
We can shift things sideways,
for example, put the origin
wherever you want.
In the lower-right
corner, you see the
equations of motion
that Silviu had put
in to start with.
And you can see
that some of them
don't care if you change the
origin, like the first one.
But the harmonic
oscillator, it cares.
Those equations
will be the simplest
if you put the origin
in such a place
where the center of the
harmonic oscillator was.
Then there's going to be
fewer terms in the equation.
And sure enough, if Silviu
tries all the origins and all
the different shifts and
plots how complicated all
the equations are
together, he finds
that there's a certain
shift that's the best.
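The origin sweep just described can be mimicked on a toy harmonic oscillator: a = -k(x - c) needs two terms in generic coordinates, but only one when the origin sits at the center c. A hypothetical sketch (my own toy data and term count, not Silviu's code), using a linear fit and counting the coefficients the equation actually needs:

```python
import numpy as np

# Harmonic oscillator centered at c: acceleration a = -k (x - c).
k, c = 4.0, 2.0
rng = np.random.default_rng(0)
x = rng.uniform(-3, 7, 200)
a = -k * (x - c)

def n_terms(shift, tol=1e-8):
    """Fit a ~ p1 * x' + p0 in shifted coordinates x' = x - shift,
    and count how many coefficients the equation actually needs."""
    xs = x - shift
    p1, p0 = np.polyfit(xs, a, 1)
    return int(abs(p1) > tol) + int(abs(p0) > tol)

shifts = np.linspace(-1, 5, 61)
best = shifts[np.argmin([n_terms(s) for s in shifts])]
print(best)  # the sweep picks the shift at the oscillator's center, c = 2
```

Rotations and shears can be scored the same way: try each transformation, measure how complicated the fitted equations are, and keep the one that makes them simplest.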
And then you can start
rotating the space
with the best rotation.
In this case, there are
some of these formulas
which get much simpler when
you rotate by a certain angle.
Others don't, so we pick that angle.
And then similarly, you can
do a little bit of shearing.
And at the end, it
discovers exactly what
we would consider the
simplest space, where,
in fact, the x-
and y-coordinates,
there's no relative
stretching, and equations
are beautifully simple.
And I suspect that this
is very much what's
going on in physics, actually.
We keep coming up with a
representation of the world
such that the description
becomes as simple as possible
for us.
Because if we have that internal
representation in our brains,
then that minimizes the
amount of computation
we have to do when we
try to predict the future
and figure out what are
the best actions to take.
So in summary, I've told you
that it's not just the case
that machine learning and AI
are helping physics enormously
in so many areas.
But I also feel that physics
and the science of intelligence
more broadly [INAUDIBLE]
really help machine
learning in various ways.
I've given you a
series of examples.
For example, by
combining Occam's razor,
divide and conquer,
and other ideas
for how you can go from
raw distorted video
to find the latent space
where the equations of motion
are so simple that you can use
AI Feynman to actually discover
the equations for them.
And then I've also shown
you how you can
find which are the most
useful approximate equations
for fields where maybe an
approximate equation is
the best that you can hope for.
And I would like
to end by saying
that we would love to
collaborate with those
of you on the call here.
If you have any data
sets lying around on
your own hard drives,
where you think
there might be some patterns
yet to be discovered,
it would be really, really fun
to see if any of these tools
could help discover them.
Thank you.
PRESENTER: Thank you, Max.
Great.
Let's give Max a great applause.
That was great.
[APPLAUSE]
