Hello Internet! This is Oscar Veliz again, this time with a video on one of the more confusing topics in numerical analysis: convergence order. I'll define it and then go through some examples with you before showing you how this relates to speed. We'll also compare this to Big O.
If you've seen other videos on this channel, you know that I bring up order a lot: "Now, about the order: a fixed point iteration does have a linear order." "Bisection always has a fixed constant multiplier of 1/2." "Newton's method we know has quadratic convergence, but so does Steffensen's method." "In that order of convergence, Wegstein writes that this is actually quadratic convergence, but he doesn't actually prove it." "Let's talk about the order of Muller's method." "The order of Durand-Kerner is quadratic." "It does require that you have d derivatives in order to be able to use it, but it does give you a convergence order of d+1."
...and many other videos. Let's start with error, or in this case the absolute error, meaning how far our current guess is from our actual solution x*. Here's our equation for error: our next error is some constant M times our previous error raised to a power alpha, and that alpha is our order. Generally, the higher the alpha, the fewer iterations you'll need. Let's take that error equation and step it back one iteration. Using these two equations, we can divide them and eliminate our M term, then take the log of both sides. From here we can use properties of logs to extract that exponent and then solve for alpha, giving us this equation.
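For reference, here is that derivation written out (my reconstruction of the on-screen math, with e_n denoting the absolute error at step n):

```latex
e_{n+1} = M e_n^{\alpha}, \qquad e_n = M e_{n-1}^{\alpha}
\;\Longrightarrow\;
\frac{e_{n+1}}{e_n} = \left(\frac{e_n}{e_{n-1}}\right)^{\!\alpha}
\;\Longrightarrow\;
\alpha \approx \frac{\log\left(e_{n+1}/e_n\right)}{\log\left(e_n/e_{n-1}\right)}
```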
Let's try to compute alpha for bisection using the function x^2 - x - 1. We'll start from the interval [1, 2] and run it for a few iterations. Afterwards, let's compute the error after every step. Then, using our alpha equation, use the first three errors to come up with our first alpha of about one, then the second set of three errors to come up with the next alpha, which is also about one. Then we'll fill in the table with the rest of our alphas. Once we have alpha, we can find M. Starting from our error equation, we simply solve for M; then, using a pair of errors and its corresponding alpha, plug everything in to give us an M of about 0.5. Look at the next pair of errors along with its corresponding alpha; this also gives us an M of about 0.5. Indeed, bisection has an alpha of 1 and a value for M of 0.5. This makes sense because we're reducing the error by half every time.
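Here's a minimal sketch in Python of that whole procedure. One assumption on my part: I track the bracket half-width as the error, since it halves exactly each step; the table in the video tracks the distance to the root itself, which bounces around a bit more.

```python
import math

def bisection_bounds(f, a, b, iterations):
    """Run bisection; return the error bound (half the bracket width) each step."""
    bounds = []
    for _ in range(iterations):
        m = (a + b) / 2
        bounds.append((b - a) / 2)          # |m - x*| is at most half the bracket
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return bounds

f = lambda x: x**2 - x - 1
errors = bisection_bounds(f, 1, 2, 8)

# alpha from three consecutive errors, then M from a pair plus its alpha
for e0, e1, e2 in zip(errors, errors[1:], errors[2:]):
    alpha = math.log(e2 / e1) / math.log(e1 / e0)
    M = e2 / e1**alpha
    print(f"alpha ≈ {alpha:.2f}   M ≈ {M:.2f}")   # alpha ≈ 1.00, M ≈ 0.50
```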
Let's try to compute the alpha for another root finding method, in this case false position. We'll use the same example from earlier, x^2 - x - 1 = 0, but now I need a larger interval because this one converges much more quickly. From our interval [0, 2] we perform a few iterations of the false position method, then we can compute our errors and afterwards compute our alpha. Notice again that alpha in this case is about one, meaning this is converging linearly. Indeed, false position is a linearly convergent method, and in this example our M is about 0.15. Note, though, that M is going to change depending on the function and root finding method.
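Only the update rule changes from the bisection sketch; here's a self-contained version (my own illustration, comparing against the known root, the golden ratio):

```python
import math

def false_position(f, a, b, iterations):
    """Like bisection, but cut the bracket at the secant line's x-intercept."""
    guesses = []
    for _ in range(iterations):
        c = b - f(b) * (b - a) / (f(b) - f(a))
        guesses.append(c)
        if f(a) * f(c) <= 0:
            b = c
        else:
            a = c
    return guesses

f = lambda x: x**2 - x - 1
phi = (1 + math.sqrt(5)) / 2                   # the true root x*
errors = [abs(c - phi) for c in false_position(f, 0, 2, 8)]
for e0, e1 in zip(errors, errors[1:]):
    print(f"ratio ≈ {e1 / e0:.3f}")            # settles near 0.15: alpha ≈ 1, M ≈ 0.15
```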
Let's do one last example of computing alpha, in this case Newton's method, using our same example function, which should look familiar by now. Let's start from the value of two. This converges relatively quickly, as you can tell. Afterwards we can compute our errors and then compute our value for alpha, which ranges anywhere from about 1.9 to 2. Indeed, Newton's method is normally quadratically convergent, meaning alpha is 2, and in this example M is about 0.4. But note that that's going to change depending on a few factors.
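Same pattern for Newton. Since we believe alpha is 2 here, we can also fix alpha and watch M settle (a sketch; the derivative is supplied by hand):

```python
import math

def newton(f, fprime, x, iterations):
    """Newton's method, recording each new guess."""
    guesses = []
    for _ in range(iterations):
        x = x - f(x) / fprime(x)
        guesses.append(x)
    return guesses

f = lambda x: x**2 - x - 1
phi = (1 + math.sqrt(5)) / 2
errors = [abs(x - phi) for x in newton(f, lambda x: 2 * x - 1, 2.0, 4)]
for e0, e1 in zip(errors, errors[1:]):
    print(f"M ≈ {e1 / e0**2:.2f}")             # fixing alpha = 2: M ≈ 0.43 ... 0.45
```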
Another factor that can impact your speed, other than alpha, is the number of function calls your method makes. For example, secant method has an alpha of about 1.618, compared to Steffensen's method, which has an alpha of about 2. Both of these methods need two function calls. The difference is that secant method only needs the two function calls at the beginning; afterwards, you can save the value from the last iteration's function call. With Steffensen's method we need two new calls each iteration, which can indeed take longer. If we apply both of these methods to the function x^3 - x^2 - x - 1, they both take six iterations to converge; the difference is that Steffensen's method takes longer.
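Here's roughly what that bookkeeping looks like in code. This is a sketch, not the exact implementations from my repo; the comments mark which evaluations are fresh each iteration:

```python
def secant(f, x0, x1, iterations):
    """One NEW function call per iteration: the previous f(x1) is reused."""
    f0, f1 = f(x0), f(x1)                      # two calls up front, then recycle
    for _ in range(iterations):
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)                     # the only fresh evaluation
    return x1

def steffensen(f, x, iterations):
    """Two NEW function calls every single iteration."""
    for _ in range(iterations):
        fx = f(x)                              # fresh call #1
        x = x - fx * fx / (f(x + fx) - fx)     # fresh call #2
    return x

f = lambda x: x**3 - x**2 - x - 1
print(secant(f, 1.0, 2.0, 6), steffensen(f, 2.0, 6))
```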
Let's evaluate the function x^8 + 15x^4 - 16 using these root finding methods with increasing alphas. Here's the number of iterations it takes to find the root: starting at about 45 with bisection, and then slowly increasing in speed, lowering the number of iterations, until about 2, where it plateaus. Let's then look at the runtimes for each of these methods, with bisection being by far our slowest, and then speeding up as we increase alpha until that plateau mark. Once we hit the plateau, doing two iterations that are each more complicated didn't save us on runtime. We also are not including the amount of time needed to find the derivatives, which, if they are not given to you, needs to be included as well.
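If you want to reproduce that kind of comparison yourself, a bare-bones timing harness is enough, reusing the secant and Steffensen sketches from a moment ago (the exact numbers will depend on your machine and implementations):

```python
import time

def time_method(method, *args, repeats=100_000):
    """Average wall-clock seconds per run of a root finder."""
    start = time.perf_counter()
    for _ in range(repeats):
        method(*args)
    return (time.perf_counter() - start) / repeats

f = lambda x: x**3 - x**2 - x - 1
print("secant:    ", time_method(secant, f, 1.0, 2.0, 6))
print("steffensen:", time_method(steffensen, f, 2.0, 6))
```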
A quick note on Newton, Halley, and Householder: each of them requires multiple function calls, to f, f', and so on, and in the last example I could have simplified symbolically to remove those function calls from the methods. For example, Newton's method would look like this, Halley's like this, and so on. So the last example didn't include function calls.
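As an illustration of what I mean by simplifying symbolically (my own worked example, using the cubic from earlier rather than the formulas on screen): for f(x) = x^3 - x^2 - x - 1, the Newton update x - f(x)/f'(x) collapses into a single rational expression with no calls to f or f' at all:

```latex
x_{n+1} \;=\; x_n - \frac{x_n^3 - x_n^2 - x_n - 1}{3x_n^2 - 2x_n - 1}
        \;=\; \frac{2x_n^3 - x_n^2 + 1}{3x_n^2 - 2x_n - 1}
```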
If we look at Newton, Halley, and Householder more generally, these are the number of additions and subtractions each iteration needs, as well as the number of multiplications and divisions, raisings to powers, and function calls. This doesn't quite tell us the whole story, though. Let's multiply these against a typical number of iterations from our previous example. Here you'll notice that they have essentially the same number of function calls. This, though, is still kind of misleading, so let's actually break it up. Now our table looks something like this.
If our function were something like x^3 - x^2 - x - 1, then those higher-order function calls get smaller and smaller, so you might prefer using a higher-order method like Householder-3. If our function were something like sin(cos(e^x)), then needing those higher derivatives starts to look much more complicated, and in this case you might prefer something low order, even linear. But how fast is linear? If we take our error formula and use, for example, bisection, we know that alpha is 1 and M is 1/2. Plug those into our error formula and we now have this: our next error is our previous error halved.
So how long would it take us to find a |b - a| that's less than some tolerance epsilon? We'll just assume that b - a is 2 and our ending tolerance is 10^-12. With b - a equal to 2, the most our error could be is one; our next error would be 1/2, and then 1/2 divided by 2, and so on. What we're really looking for in this case is 1 over 2^n reaching our tolerance epsilon. Doing a bit of math, we can come up with this equation; then, if we take the log of both sides, we can isolate n. Therefore log base 2 of 1 over epsilon gives us a value for n. In this case our method would take about 40 iterations.
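The arithmetic is quick to check (log base 2 of 10^12 is just under 40):

```python
import math

epsilon = 1e-12
print(math.ceil(math.log2(1 / epsilon)))   # 40 bisection iterations
```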
Using the same approach, let's figure out how fast quadratic actually is using our error formula. We'll use Newton's method as an example, with an alpha of two. We don't know what M is, but that's not going to matter a whole lot, and you'll see why. This gives us our new error formula, and then the question we need to answer is: how long would it take for our error to get smaller than some epsilon? We do need to assume that this is converging, otherwise this could take literally forever, and that our first error is greater than one. The ending epsilon we'll use is something like 10^-12. For the sake of argument, let's just say that our first error is 2. This means our next error would be M times 2 squared, or M times 4. We made the assumption that our method was converging, though; this would mean that our next error should be smaller than our last one, which would make M less than 1/2 in order for this to be true. Let's use an M of a third, giving us our next error of one point three repeating. Let's then use that same 1/3 times 4/3 squared to give us our next error of 16/27, and so on and so forth. After some time, though, we'll need to change gears, because our error squared is going to matter a lot more than our value for M. Here's what I mean.
At some point we'll get to an error of about 0.1, and for illustrative purposes I'll just make M equal to one. Plug those into our error formula: 1 times 0.1 squared, giving us 0.01. Our next error would be 1 times 0.01 squared, giving us 0.0001. Plug that in and we get 10^-8; our next error would be 10^-16, giving us about eight iterations total. When you're close, the error term is really what counts. Let's just say 0.1 raised to the power 2^n gives us our tolerance of 0.1 to the 12th. Essentially what we're saying is 2^n equal to 12; using logs, this says we need about 4 iterations once we're close.
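You can watch both regimes in a few lines. This is a toy simulation of the error recurrence itself, not of Newton's method:

```python
error, M, steps = 2.0, 1/3, 0
while error > 1e-12:
    error = M * error**2                   # the quadratic error recurrence
    steps += 1
print(steps)                               # 7 -- in line with the "about eight" above
```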
It could also be helpful to view order as the amount of accuracy, in terms of digits, you're gaining each iteration. For example, with bisection, all you're doing is reducing the error by half, so you're gaining about one or fewer digits of accuracy each time you run. With Newton you're doubling the number of digits you're gaining; with Halley you're tripling; with Householder-3 you're quadrupling; and so forth. With secant method, at an order of about 1.618, you're roughly tripling about every other iteration. You might also expect that when you're close and using a root finding method with an order greater than one, it should take log base alpha of D iterations to converge, where D is the number of digits of tolerance, which you can write as log of 1 over epsilon.
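Plugging D = 12 digits into that log-base-alpha estimate for a few orders:

```python
import math

for alpha in (1.618, 2, 3):                # secant, Newton, Halley
    print(alpha, math.ceil(math.log(12) / math.log(alpha)))
# roughly 6, 4, and 3 iterations for 12 digits, once you're close
```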
Let's go back to this example from earlier, which indeed did take four iterations, but now let's start further away. Starting from about 10, it takes Newton's method a lot longer to converge on the root. If we compute the errors after every iteration, we can then compute the orders alpha. Notice that Newton's method starts out relatively linear and then speeds up to close to quadratic as we get closer to the root. We can also compute the M terms and note that these are not constant. What I really want you to take away from this example is the importance of picking a good starting point for root finding methods.
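Here's that experiment as a sketch, assuming the same f(x) = x^2 - x - 1 from before; the observed alphas drift from roughly 1.3 toward 2 as the iterates approach the root:

```python
import math

def newton(f, fprime, x, iterations):
    """Newton's method, recording each new guess."""
    guesses = []
    for _ in range(iterations):
        x = x - f(x) / fprime(x)
        guesses.append(x)
    return guesses

f = lambda x: x**2 - x - 1
phi = (1 + math.sqrt(5)) / 2
errors = [abs(x - phi) for x in newton(f, lambda x: 2 * x - 1, 10.0, 7)]
for e0, e1, e2 in zip(errors, errors[1:], errors[2:]):
    print(math.log(e2 / e1) / math.log(e1 / e0))   # climbs from ~1.3 toward 2
```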
There is another method of comparing algorithms, commonly used in computer science, known as Big O. When we talk about order, it's a measure of how much we're reducing the error by every iteration; so, for example, linear order is slower than order 1.618, which is slower than quadratic, and so forth. Big O, on the other hand, is a measure of algorithmic time complexity for some input size n; so, for example, Big O of a constant is faster than Big O of log n, which is faster than linear, which is faster than quadratic, and so forth. It is a measure of asymptotic growth rate; so, for example, Big O of 3n^2 + 2n + log(n) is Big O of n^2, since we ignore multiplication by constants and n^2 is the largest term.
With bisection, we can closely compare it to binary search, which takes log base 2 of n iterations, where n is the number of terms in your sorted array or fully balanced binary search tree, giving us a Big O of log n. Bisection does something similar: it takes log base 2 of (b-a)/epsilon iterations, where b-a is the interval size, or (b-a)/epsilon is the number of terms. This means that bisection has a Big O of log((b-a)/epsilon).
What about Newton's method? Newton's method takes log base 2 of log of 1 over epsilon iterations when you're close, which you can rewrite as log base 2 of d, where d is the number of digits of accuracy. In Big O you would say this is log of d, simply because log in Big O is always assumed to be base 2. Halley's method does something similar: it takes log base 3 of d iterations when you're close, giving us log of d. Now you might be wondering what happened to the base 3. That's because a change of base in logs is the same as multiplying by a constant, so that gets ignored. This means that Newton's method and Halley's method have the same Big O. For something like Dekker's method, which combines bisection and secant method, it has the same Big O as bisection, even though it runs as fast as secant method. This makes Big O less useful for comparing these kinds of algorithms.
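To spell out the change-of-base step that made the base 3 disappear:

```latex
\log_3 d \;=\; \frac{\log_2 d}{\log_2 3}
         \;=\; \underbrace{\tfrac{1}{\log_2 3}}_{\text{constant}}\,\log_2 d
         \;=\; O(\log d)
```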
We've covered a lot in this video, and I'd like to leave you with some key points. You can use three errors to compute alpha and then afterwards compute M. Keep in mind that your first few iterations are about getting close, and that function calls impact your speed. Also, higher alphas don't always save you time, as shown in the earlier example. Keep in mind you won't get your full alpha if you're too far away from your root. Other things that could keep you from getting that full speed are things like the condition of the polynomial and whether there's multiplicity involved; those are topics for another video. Also, Big O is less useful for comparing these kinds of algorithms, and now you know why. I have lots of example code for all these root finding methods on GitHub.
As always, thank you for watching. I'll leave an end card to my subscriber milestone video showing how you can help support me on GitHub. If you have other questions about order of convergence, please let me know in the comments below. Thank you.
