The following content is
provided under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue to offer
high quality educational
resources for free.
To make a donation or to view
additional materials from
hundreds of MIT courses,
visit MIT OpenCourseWare at
ocw.mit.edu.
So far we have learned about
partial derivatives and how to
use them to find minima and
maxima of functions of two
variables or several variables.
And now we are going to try to
study, in more detail,
how functions of several
variables behave,
how to compute their
variations.
How to estimate the variation
in arbitrary directions.
And so for that we are going to
need some more tools, actually, to
study these things.
More tools to study functions.
Today's topic is going to be
differentials.
And, just to motivate that,
let me remind you about one
trick that you probably know
from single variable calculus,
namely implicit
differentiation.
Let's say that you have a
function y equals f of x then
you would sometimes write dy
equals f prime of x times dx.
And then maybe you would -- We
use implicit differentiation to
actually relate infinitesimal
changes in y with infinitesimal
changes in x.
And one thing we can do with
that, for example,
is actually figure out the rate
of change dy by dx,
but also the reciprocal dx by
dy.
And so, for example,
let's say that we have y equals
inverse sin(x).
Then we can write x equals
sin(y).
And, from there,
we can actually find out what
is the derivative of this
function if we didn't know the
answer already by writing dx
equals cosine y dy.
That tells us that dy over dx
is going to be one over cosine
y.
And now cosine y, from its
relation to sine, is the square
root of one minus sine squared y,
so this is one over the
square root of one minus x^2.
And that is how you find the
formula for the derivative of
the inverse sine function.
A formula that you probably
already knew,
but that is one way to derive
it.
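As a quick sanity check of the formula we just derived, one can compare it against a central finite difference. This is a sketch; the test point x = 0.3 and step h are arbitrary choices.

```python
import math

# Numerically check that the derivative of arcsin(x) is 1 / sqrt(1 - x^2),
# using a central finite difference at an arbitrary test point.
x, h = 0.3, 1e-6
numeric = (math.asin(x + h) - math.asin(x - h)) / (2 * h)
formula = 1 / math.sqrt(1 - x**2)
print(abs(numeric - formula) < 1e-8)  # prints True
```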
Now we are going to use also
these kinds of notations,
dx, dy and so on,
but use them for functions of
several variables.
And, of course,
we will have to learn what the
rules of manipulation are and
what we can do with them.
The actual name of that is the
total differential,
as opposed to the partial
derivatives.
The total differential includes
all of the various causes that
can change -- Sorry.
All the contributions that can
cause the value of your function
f to change.
Namely, let's say that you have
a function maybe of three
variables, x,
y, z,
then you would write df equals
f sub x dx plus f sub y dy plus
f sub z dz.
Maybe, just to remind you of
the other notation,
partial f over partial x dx
plus partial f over partial y dy
plus partial f over partial z
dz.
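Collected on one line, the total differential just written on the board is:

```latex
df = f_x\,dx + f_y\,dy + f_z\,dz
   = \frac{\partial f}{\partial x}\,dx
   + \frac{\partial f}{\partial y}\,dy
   + \frac{\partial f}{\partial z}\,dz
```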
Now, what is this object?
What are the things on either
side of this equality?
Well, they are called
differentials.
And they are not numbers,
they are not vectors,
they are not matrices,
they are a different kind of
object.
These things have their own
rules of manipulations,
and we have to learn what we
can do with them.
So how do we think about them?
First of all,
how do we not think about them?
Here is an important thing to
know.
Important.
df is not the same thing as
delta f.
Delta f is meant to be a number.
It is going to be a number once
you have a small variation of x,
a small variation of y,
a small variation of z.
These are numbers.
Delta x, delta y and delta z
are actual numbers,
and this becomes a number.
This guy actually is not a
number.
You cannot give it a particular
value.
All you can do with a
differential is express it in
terms of other differentials.
In fact, this dx,
dy and dz, well,
they are mostly symbols out
there.
But if you want to think about
them, they are the differentials
of x, y and z.
In fact, you can think of these
differentials as placeholders
where you will put other things.
Of course, they represent,
you know, there is this idea of
changes in x,
y, z and f.
One way that one could explain
it, and I don't really like it,
is to say they represent
infinitesimal changes.
Another way to say it,
and I think that is probably
closer to the truth,
is that these things are
somehow placeholders to put
values and get a tangent
approximation.
For example,
if I do replace these symbols
by delta x, delta y and delta z
numbers then I will actually get
a numerical quantity.
And that will be an
approximation formula for delta.
It will be the linear
approximation,
a tangent plane approximation.
What we can do -- Well,
let me start first with maybe
something even before that.
The first thing that it does is
it can encode how changes in x,
y, z affect the value of f.
I would say that is the most
general answer to what is this
formula, what are these
differentials.
It is a relation between x,
y, z and f.
And this is a placeholder for
small variations,
delta x, delta y and delta z to
get an approximation formula.
Which is: delta f is
approximately equal to f sub x delta
x plus f sub y delta y plus f sub z delta z.
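In symbols, this approximation formula reads:

```latex
\Delta f \approx f_x\,\Delta x + f_y\,\Delta y + f_z\,\Delta z
```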
It is getting cramped,
but I am sure you know what is
going on here.
And observe how this one is
actually equal while that one is
approximately equal.
So they are really not the same.
Another thing that the notation
suggests we can do,
and they claim we can do,
is divide everything by some
variable that everybody depends
on.
Say, for example,
that x, y and z actually depend
on some parameter t then they
will vary, at a certain rate,
dx over dt, dy over dt,
dz over dt.
And what the differential will
tell us then is the rate of
change of f as a function of t,
when you plug in these values
of x, y, z,
you will get df over dt by
dividing everything by dt in
here.
The first thing we can do is
divide by something like dt to
get infinitesimal rate of
change.
Well, let me just say rate of
change.
df over dt equals f sub x dx
over dt plus f sub y dy over dt
plus f sub z dz over dt.
And that corresponds to the
situation where x is a function
of t, y is a function of t and z
is a function of t.
That means you can plug in
these values into f to get,
well, the value of f will
depend on t,
and then you can find the rate
of change with t of a value of
f.
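Written compactly, this instance of the chain rule is:

```latex
\frac{df}{dt} = f_x\,\frac{dx}{dt} + f_y\,\frac{dy}{dt} + f_z\,\frac{dz}{dt}
```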
These are the basic rules.
And this is known as the chain
rule.
It is one instance of a chain
rule,
which tells you when you have a
function that depends on
something,
and that something in turn
depends on something else,
how to find the rate of change
of a function on the new
variable in terms of the
derivatives of a function and
also the dependence between the
various variables.
Any questions so far?
No.
OK.
A word of warning,
in particular,
about what I said up here.
It is kind of unfortunate,
but the textbook actually has a
serious mistake on that.
I mean they do have a couple of
formulas where they mix a d with
a delta, and I warn you not to
do that, please.
I mean there are d's and there
are delta's, and basically they
don't live in the same world.
They don't see each other.
The textbook is lying to you.
Let's see.
The first and the second
claims,
I don't really need to justify,
because the first one is just
stating some general principle,
so I am not making a precise
mathematical claim.
The second one,
well, we know the approximation
formula already,
so I don't need to justify it
for you.
But, on the other hand,
this formula here,
I mean, you probably have a
right to expect some reason for
why this works.
Why is this valid?
After all, I first told you we
have these new mysterious
objects.
And then I am telling you we
can do that, but I kind of
pulled it out of my hat.
I mean I don't have a hat.
Why is this valid?
How can I get to this?
Here is a first attempt of
justifying how to get there.
Let's see.
Well, we said df is f sub x dx
plus f sub y dy plus f sub z dz.
But we know if x is a function
of t then dx is x prime of t dt,
dy is y prime of t dt,
dz is z prime of t dt.
If we plug these into that
formula, we will get that df is
f sub x times x prime t dt plus
f sub y y prime of t dt plus f
sub z z prime of t dt.
And now I have a relation
between df and dt.
See, I got df equals something
times dt.
That means the rate of change
of f with respect to t should be
that coefficient.
If I divide by dt then I get
the chain rule.
That kind of works,
but that shouldn't be
completely satisfactory.
Let's say that you are a true
skeptic and you don't believe in
differentials yet then it is
maybe not very good that I
actually used more of these
differential notations in
deriving the answer.
That is actually not how it is
proved.
The way in which you prove the
chain rule is not this way
because we shouldn't have too
much trust in differentials just
yet.
I mean at the end of today's
lecture, yes,
probably we should believe in
them,
but so far we should be a
little bit reluctant to believe
these kind of strange objects
telling us weird things.
Here is a better way to think
about it.
One thing that we have trust in
so far are approximation
formulas.
We should have trust in them.
We should believe that if we
change x a little bit,
if we change y a little bit
then we are actually going to
get a change in f that is
approximately given by these
guys.
And this is true for any
changes in x,
y, z,
but in particular let's look at
the changes that we get if we
just take these formulas as
function of time and change time
a little bit by delta t.
We will actually use the
changes in x,
y, z in a small time delta t.
Let's divide everybody by delta
t.
Here I am just dividing numbers
so I am not actually playing any
tricks on you.
I mean we don't really know
what it means to divide
differentials,
but dividing numbers is
something we know.
And now, if I take delta t very
small, this guy tends to the
derivative, df over dt.
Remember, the definition of df
over dt is the limit of this
ratio when the time interval
delta t tends to zero.
That means if I choose smaller
and smaller values of delta t
then these ratios of numbers
will actually tend to some
value,
and that value is the
derivative.
Similarly, here delta x over
delta t, when delta t is really
small, will tend to the
derivative dx/dt.
And similarly for the others.
That means, in particular,
we take the limit as delta t
tends to zero and we get df over
dt on one side and on the other
side we get f sub x dx over dt
plus f sub y dy over dt plus f
sub z dz over dt.
And the approximation becomes
better and better.
Remember when we write
approximately equal that means
it is not quite the same,
but if we take smaller
variations then actually we will
end up with values that are
closer and closer.
When we take the limit,
as delta t tends to zero,
eventually we get an equality.
I mean mathematicians have more
complicated words to justify
this statement.
I will spare them for now,
and you will see them when you
take analysis if you go in that
direction.
Any questions so far?
No.
OK.
Let's check this with an
example.
Let's say that we really don't
have any faith in these things
so let's try to do it.
Let's say I give you a function
that is x squared y plus z.
And let's say that maybe x will
be t, y will be e^t and z will
be sin(t).
What does the chain rule say?
Well, the chain rule tells us
that dw/dt is,
we start with partial w over
partial x, well,
what is that?
That is 2xy,
and maybe I should point out
that this is w sub x,
times dx over dt plus -- Well,
w sub y is x squared times dy
over dt plus w sub z,
which is going to be just one,
dz over dt.
And so now let's plug in the
actual values of these things.
x is t and y is e to the t,
so that will be 2t e to the t,
times dx over dt, which is one,
plus x squared, which is t squared,
times dy over dt, which is e to the t,
plus dz over dt, which is cosine t.
At the end of calculation we
get 2t e to the t plus t squared
e to the t plus cosine t.
That is what the chain rule
tells us.
How else could we find that?
Well, we could just plug in
the values of x, y and z,
so that w becomes a function of t,
and take its derivative.
Let's do that just for
verification.
It should be exactly the same
answer.
And, in fact,
in this case,
the two calculations are
roughly equal in complication.
But say that your function of
x, y, z was much more
complicated than that,
or maybe you actually didn't
know a formula for it,
you only knew its partial
derivatives,
then you would need to use the
chain rule.
So, sometimes plugging in
values is easier but not always.
Let's just check quickly.
The other method would be to
substitute.
W as a function of t.
Remember w was x squared y plus z.
x was t, so you get t squared,
times y, which is e to the t,
plus z, which was sine t.
dw over dt, we know how to take
the derivative using single
variable calculus.
Well, we should know.
If we don't know then we should
take a look at 18.01 again.
By the product rule, that will be
the derivative of t squared,
which is 2t, times e to the t,
plus t squared times the derivative
of e to the t, which is e to the
t, plus cosine t.
And that is the same answer as
over there.
I ended up writing,
you know, maybe I wrote
slightly more here,
but actually the amount of
calculations really was pretty
much the same.
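The agreement can also be checked numerically. Here is a small sketch using a central finite difference; the test point t = 1.3 is an arbitrary choice.

```python
import math

# Check the lecture's example numerically: w = x^2*y + z with
# x = t, y = e^t, z = sin(t).  The chain-rule answer was
# dw/dt = 2t e^t + t^2 e^t + cos(t).
def w(t):
    x, y, z = t, math.exp(t), math.sin(t)
    return x**2 * y + z

t, h = 1.3, 1e-6
numeric = (w(t + h) - w(t - h)) / (2 * h)   # finite-difference dw/dt
chain_rule = 2*t*math.exp(t) + t**2*math.exp(t) + math.cos(t)
print(abs(numeric - chain_rule) < 1e-6)     # prints True
```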
Any questions about that?
Yes?
What kind of object is w?
Well, you can think of w as
just another variable that is
given as a function of x,
y and z, for example.
You would have a function of x,
y, z defined by this formula,
and I call it w.
I call its value w so that I
can substitute t instead of x,
y, z.
Well, let's think of w as a
function of three variables.
And then, when I plug in the
dependents of these three
variables on t,
then it becomes just a function
of t.
I mean, really,
my w here is pretty much what I
called f before.
There is no major difference
between the two.
Any other questions?
No.
OK.
Let's see.
Here is an application of what
we have seen.
Let's say that you want to
understand actually all these
rules about taking derivatives
in single variable calculus.
What I showed you at the
beginning, and then erased,
basically justifies how to take
the derivative of an inverse
function.
And for that you didn't need
multivariable calculus.
But let's try to justify the
product rule,
for example,
for the derivative.
An application of this actually
is to justify the product and
quotient rules.
Let's think,
for example,
of a function of two variables,
u and v, that is just the
product uv.
And let's say that u and v are
actually functions of one
variable t.
Then, well, d of uv over dt is
given by the chain rule applied
to f.
This is df over dt.
So df over dt should be f sub u
du over dt plus f sub v dv
over dt.
But now what is the partial of
f with respect to u?
It is v.
That is v du over dt.
And partial of f with respect
to v is going to be just u,
dv over dt.
So you get back the usual
product rule.
That is a slightly complicated
way of deriving it,
but that is a valid way of
understanding how to take the
derivative of a product by
thinking of the product first as
a function of variables,
which are u and v.
And then say,
oh, but u and v were actually
functions of a variable t.
And then you do the
differentiation in two stages
using the chain rule.
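Written out, the two-stage differentiation above is:

```latex
\frac{d(uv)}{dt}
= \frac{\partial(uv)}{\partial u}\,\frac{du}{dt}
+ \frac{\partial(uv)}{\partial v}\,\frac{dv}{dt}
= v\,\frac{du}{dt} + u\,\frac{dv}{dt}
```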
Similarly, you can do the
quotient rule just for practice.
If I give you the function g
equals u over v.
Right now I am thinking of it
as a function of two variables,
u and v.
U and v themselves are actually
going to be functions of t.
Then, well, dg over dt is going
to be partial g,
partial u.
How much is that?
How much is partial g,
partial u?
One over v times du over dt
plus -- Well,
next we need to have partial g
over partial v.
Well, what is the derivative of
this with respect to v?
Here we need to know how to
differentiate the reciprocal of v.
It is minus u over v squared
times dv over dt.
And that is actually the usual
quotient rule just written in a
slightly different way.
I mean, just in case you really
want to see it,
if you clear denominators for v
squared then you will see
basically u prime times v minus
v prime times u.
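Clearing denominators as suggested, the computation is:

```latex
\frac{d}{dt}\left(\frac{u}{v}\right)
= \frac{1}{v}\,\frac{du}{dt} - \frac{u}{v^2}\,\frac{dv}{dt}
= \frac{u'v - u\,v'}{v^2}
```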
Now let's go to something even
more crazy.
I claim we can do chain rules
with more variables.
Let's say that I have a
quantity.
Let's call it w for now.
Let's say I have quantity w as
a function of say variables x
and y.
And so in the previous setup x
and y depended on some
parameter t.
But, actually,
let's now look at the case
where x and y themselves are
functions of several variables.
Let's say of two more variables.
Let's call them u and v.
I am going to stay with these
abstract letters,
but if it bothers you,
if it sounds completely
unmotivated, think about it maybe
in terms of something you might
know.
Say, polar coordinates.
Let's say that I have a
function that is defined in terms
of the polar coordinate
variables r and theta.
And then I know I want to
switch to usual coordinates x
and y.
Or, the other way around,
I have a function of x and y
and I want to express it in
terms of the polar coordinates r
and theta.
Then I would want to know maybe
how the derivatives,
with respect to the various
sets of variables,
related to each other.
One way I could do it is,
of course,
to say now if I plug the
formula for x and the formula
for y into the formula for f
then w becomes a function of u
and v,
and I can try to take partial
derivatives.
If I have explicit formulas,
well, that could work.
But maybe the formulas are
complicated.
Typically, if I switch between
rectangular and polar
coordinates,
there might be inverse trig,
there might be maybe arctangent
to express the polar angle in
terms of x and y.
And maybe I don't really want to
actually substitute arctangents
everywhere; maybe I would rather
deal with the derivatives.
How do I do that?
The question is what are
partial w over partial u and
partial w over partial v in
terms of, let's see,
what do we need to know to
understand that?
Well, probably we should know
how w depends on x and y.
If we don't know that then we
are probably toast.
Partial w over partial x,
partial w over partial y should
be required.
What else should we know?
Well, it would probably help to
know how x and y depend on u and
v.
If we don't know that then we
don't really know how to do it.
We need also x sub u,
x sub v, y sub u,
y sub v.
We have a lot of partials in
there.
Well, let's see how we can do
that.
Let's start by writing dw.
We know that dw is partial f,
well, I don't know why I have
two names, w and f.
I mean w and f are really the
same thing here,
but let's say f sub x dx plus f
sub y dy.
So far that is our new friend,
the differential.
Now what do we want to do with
it?
Well, we would like to get rid
of dx and dy because we like to
express things in terms of,
you know, the question we are
asking ourselves is let's say
that I change u a little bit,
how does w change?
Of course, what happens,
if I change u a little bit,
is that x and y will change.
How do they change?
Well, that is given to me by
the differential.
dx is going to be,
well, I can use the
differential again.
Well, x is a function of u and
v.
That will be x sub u times du
plus x sub v times dv.
That is, again,
taking the differential of a
function of two variables.
Does that make sense?
And then we have the other guy,
f sub y times,
what is dy?
Well, similarly dy is y sub u
du plus y sub v dv.
And now we have a relation
between dw and du and dv.
We are expressing how w reacts
to changes in u and v,
which was our goal.
Now, let's actually collect
terms so that we see it a bit
better.
It is going to be f sub x times
x sub u plus f sub y times y
sub u, times du, plus f sub x
x sub v plus f sub y y sub v,
times dv.
Now we have dw equals something
du plus something dv.
Well, the coefficient here has
to be partial f over partial u.
What else could it be?
That's the rate of change of w
with respect to u if I forget
what happens when I change v.
That is the definition of a
partial.
Similarly, this one has to be
partial f over partial v.
That is because it is the rate
of change with respect to v,
if I keep u constant,
so that these guys are
completely ignored.
Now you see how the total
differential accounts for,
somehow, all the partial
derivatives that come as
coefficients of the individual
variables in these expressions.
Let me maybe rewrite these
formulas in a more visible way
and then re-explain them to you.
Here is the chain rule for this
situation, with two intermediate
variables and two variables that
you express these in terms of.
In our setting,
we get partial f over partial u
equals partial f over partial x
time partial x over partial u
plus partial f over partial y
times partial y over partial u.
And the other one,
the same thing with v instead
of u:
partial f over partial x times
partial x over partial v plus
partial f over partial y times partial
y over partial v.
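Side by side, the two chain-rule formulas for this situation are:

```latex
\frac{\partial f}{\partial u}
= \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u}
+ \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u},
\qquad
\frac{\partial f}{\partial v}
= \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial v}
+ \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial v}
```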
I have to explain various
things about these formulas
because they look complicated.
And, actually,
they are not that complicated.
A couple of things to know.
The first thing,
how do we remember a formula
like that?
Well, that is easy.
We want to know how f depends
on u.
Well, what does f depend on?
It depends on x and y.
So we will put partial f over
partial x and partial f over
partial y.
Now, x and y, why are they here?
Well, they are here because
they actually depend on u as
well.
How does x depend on u?
Well, the answer is partial x
over partial u.
How does y depend on u?
The answer is partial y over
partial u.
See, the structure of this
formula is simple.
To find the partial of f with
respect to some new variable,
you use the partials with respect to
the variables that f was
initially defined in terms of,
namely x and y.
And you multiply them by the
partials of x and y in terms of
the new variable that you want
to look at, v here,
and you sum these things
together.
That is the structure of the
formula.
Why does it work?
Well, let me explain it to you
in a slightly different
language.
This asks us how does f change
if I change u a little bit?
Well, why would f change if u
changes a little bit?
Well, it would change because f
actually depends on x and y and
x and y depend on u.
If I change u,
how quickly does x change?
Well, the answer is partial x
over partial u.
And now, if I change x at this
rate, how does f change as
a result?
Well, the answer is partial f
over partial x times this guy.
Well, at the same time,
y is also changing.
How fast is y changing if I
change u?
Well, at the rate of partial y
over partial u.
But now if I change this how
does f change?
Well, the rate of change is
partial f over partial y.
The product of those two rates
is the effect on f of
changing u, and therefore
changing f, through y.
Now, what happens in real life,
if I change u a little bit?
Well, both x and y change at
the same time.
So how does f change?
Well, it is the sum of the two
effects.
Does that make sense?
Good.
Of course, if f depends on more
variables then you just have
more terms in here.
OK.
Here is another thing that may
be a little bit confusing.
What is tempting?
Well, what is tempting here
would be to simplify these
formulas by removing these
partial x's.
Let's simplify by partial x.
Let's simplify by partial y.
We get partial f over partial u
equals partial f over partial u
plus partial f over partial u.
Something is not working
properly.
Why doesn't it work?
The answer is precisely because
these are partial derivatives.
These are not total derivatives.
And so you cannot simplify them
in that way.
And that is actually the reason
why we use this curly d rather
than a straight d.
It is to remind us,
beware, there are these
simplifications that we can do
with straight d's that are not
legal here.
Somehow, when you have a
partial derivative,
you must resist the urge of
simplifying things.
No simplifications in here.
That is the simplest formula
you can get.
Any questions at this point?
No.
Yes?
When would you use this and
what does it describe?
Well, it is basically when you
have a function given in terms
of a certain set of variables
because maybe there is a simple
expression in terms of those
variables.
But ultimately what you care
about is not those variables,
x and y, but another set of
variables, here u and v.
So x and y are giving you a
nice formula for f,
but actually the relevant
variables for your problem are u
and v.
And you know x and y are
related to u and v.
So, of course,
what you could do is plug the
formulas the way that we did
substituting.
But maybe that will give you
very complicated expressions.
And maybe it is actually easier
to just work with the derivatives.
The important claim here is
basically we don't need to know
the actual formulas.
All we need to know are the
rate of changes.
If we know all these rates of
change then we know how to take
these derivatives without
actually having to plug in
values.
Yes?
Yes, you could certainly do the
same thing in terms of t.
If x and y were functions of t
instead of being functions of u
and v then it would be the same
thing.
And you would have the same
formulas that I had,
well, over there I still have
it.
Why does that one have straight
d's?
Well, the answer is I could put
curly d's if I wanted,
but I end up with a function of
a single variable.
If you have a single variable
then the partial,
with respect to that variable,
is the same thing as the usual
derivative.
We don't actually need to worry
about curly in that case.
But that one is indeed a special
case of this one where instead
of x and y depending on two
variables, u and v,
they depend on a single
variable t.
Now, of course,
you can call variables any name
you want.
It doesn't matter.
This is just a slight
generalization of that.
Well, not quite because here I
also had a z.
See, I am trying to just
confuse you by giving you
functions that depend on various
numbers of variables.
If you have a function of 30
variables, things work the same
way, just longer,
and you are going to run out of
letters in the alphabet before
the end.
Any other questions?
No.
What?
Yes?
If u and v themselves depended
on another variable then you
would continue with your chain
rules.
Maybe you would express
partial x over partial u in
terms of that variable using that chain rule.
Sorry.
If u and v are dependent on yet
another variable then you could
get the derivative with respect
to that using first the chain
rule to pass from u v to that
new variable,
and then you would plug in
these formulas for partials of f
with respect to u and v.
In fact, if you have several
substitutions to do,
you can always arrange to use
one chain rule at a time.
You just have to do them in
sequence.
That's why we don't actually
learn that, but you can just do
it by repeating the process.
I mean, probably at that stage,
the easiest to not get confused
actually is to manipulate
differentials because that is
probably easier.
Yes?
Curly f does not exist.
That's easy.
Curly f makes no sense by
itself.
It doesn't exist alone.
What exists is only curly df
over curly d some variable.
And then that accounts only for
the rate of change with respect
to that variable leaving the
others fixed,
while straight df is somehow a
total variation of f.
It accounts for all of the
partial derivatives and their
combined effects.
OK. Any more questions? No.
Let me just finish up very
quickly by telling you again one
example where you might
want to do this.
You have a function that you
want to switch between
rectangular and polar
coordinates.
To make things a little bit
concrete.
If you have polar coordinates
that means in the plane,
instead of using x and y,
you will use coordinates r,
the distance to the origin,
and theta, the angle from the
x-axis.
The change of variables for
that is x equals r cosine theta
and y equals r sine theta.
And so that means if you have a
function f that depends on x and
y, in fact, you can plug these
in as a function of r and theta.
Then you can ask yourself,
well, what is partial f over
partial r?
And that is going to be,
well, you want to take partial
f over partial x times partial x
partial r plus partial f over
partial y times partial y over
partial r.
That will end up being actually
f sub x times cosine theta plus
f sub y times sine theta.
And you can do the same thing
to find partial f,
partial theta.
And so you can express
derivatives either in terms of
x, y or in terms of r and theta
with simple relations between
them.
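As a sketch, the polar chain rule can be checked numerically for a sample function, say f(x, y) = x^2 y. The function and the test point are illustrative choices, not from the lecture.

```python
import math

# Check that partial f / partial r equals f_x cos(theta) + f_y sin(theta)
# for the sample function f(x, y) = x^2 * y.
def f(x, y):
    return x**2 * y

r, theta, h = 2.0, 0.7, 1e-6
x, y = r*math.cos(theta), r*math.sin(theta)
# direct finite difference in r, holding theta fixed
numeric = (f((r+h)*math.cos(theta), (r+h)*math.sin(theta))
           - f((r-h)*math.cos(theta), (r-h)*math.sin(theta))) / (2*h)
f_x, f_y = 2*x*y, x**2                         # partials of f at (x, y)
chain_rule = f_x*math.cos(theta) + f_y*math.sin(theta)
print(abs(numeric - chain_rule) < 1e-6)        # prints True
```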
And the one last thing I should
say.
On Thursday we will learn about
more tricks we can play with
variations of functions.
And one that is important,
because you need to know it
actually to do the p-set,
is the gradient vector.
The gradient vector is simply a
vector.
You use this downward pointing
triangle as the notation for the
gradient.
It is simply a vector whose
components are the partial
derivatives of a function.
I mean, in a way,
you can think of a differential
as a way to package partial
derivatives together into some
weird object.
Well, the gradient is also a
way to package partials
together.
We will see on Thursday what it
is good for, but some of the
problems on the p-set use it.
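As a preview, here is a minimal sketch of computing a gradient numerically, reusing the earlier example function w = x squared y plus z. The helper name `grad` is my own, not notation from the lecture.

```python
def f(x, y, z):
    return x**2 * y + z          # the lecture's example function

def grad(f, p, h=1e-6):
    """Numerical gradient: the vector of partial derivatives of f at point p."""
    g = []
    for i in range(len(p)):
        q_plus = list(p); q_plus[i] += h
        q_minus = list(p); q_minus[i] -= h
        g.append((f(*q_plus) - f(*q_minus)) / (2*h))
    return g

gx, gy, gz = grad(f, (1.0, 2.0, 3.0))
# the analytic gradient is (2xy, x^2, 1) = (4, 1, 1) at this point
print(abs(gx - 4) < 1e-6 and abs(gy - 1) < 1e-6 and abs(gz - 1) < 1e-6)  # prints True
```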
