INSTRUCTOR: In this
video, we're going
to see a number of
optimization problems, which
are intrinsically nonlinear,
but can be reformulated
as linear programming problems.
And this will always
be due to the presence
of piecewise linear
convex functions.
The first thing we want to do is define what a piecewise linear convex function is.
So this is a function of this form-- f of x is the maximum, for i ranging from 1 to some number m, of affine linear functions of the form ci transpose x plus a constant di.
To visualize a piecewise linear convex function, let's look at the example below.
In this example, x
has only dimension 1
while, in general,
of course, x is just
an n-dimensional vector.
So what do we have
in this example?
We have exactly three
affine linear functions.
The first one is c1 x plus d1.
This, again, is c2 x plus d2.
And the third is c3 x plus d3.
So these are the three
functions inside the max.
So we're taking now the max.
And so, for each point x, the
value of the function f of x
is given by the highest among
the three corresponding lines.
So, for example, for this point here along the x-axis, one function gives us this value, another gives us this value, and the third gives us this value. And, since we take the max, we choose the largest.
If we do this for every
x, we are selecting always
the affine linear function
with the largest value.
The most well known example
of a piecewise linear convex
function is the absolute value.
In fact, the absolute
value of x can also
be written as the
maximum of x and minus x.
In this case, as you can see,
this is the function minus x.
This one is the function x.
And the maximum gives us
exactly the absolute value.
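As a quick sanity check, here is a tiny Python snippet (purely illustrative, not part of the slides) that verifies this identity numerically:

for x in [-2.5, 0.0, 3.0]:
    # |x| coincides with max(x, -x) at every point
    assert abs(x) == max(x, -x)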
Great, so now let's think
about piecewise linear convex
functions inside linear
programming problems.
So let's consider a
constraint of this form--
a piecewise linear
convex function less than
or equal to a number h.
Of course, this is
not a linear function.
Therefore, we can't directly
use it in our linear programming
problem.
So the setting you should be thinking of is that you already have a linear programming problem with a bunch of linear inequalities. And then you have these additional constraints. Let's say you have a solver for linear programming problems: you can't just use the solver with this inequality, because it's going to return an error telling you that this is not linear.
However, there is an easy way to write this constraint using only linear inequalities.
In fact, it is easy to check that the maximum of a number of functions is less than or equal to h if and only if each function is less than or equal to h.
Therefore, instead of writing this piecewise linear convex constraint, we can write the m linear inequalities ci transpose x plus di less than or equal to h.
Let's see an example.
Here we're minimizing a linear function subject to non-negativity constraints on the variables.
And then we have this piecewise
linear convex constraint.
This is not a linear
programming problem,
but we can transform it into
a linear programming problem.
So all we have to do is replace this inequality with the two inequalities,
x1 plus 2 x2 less than
or equal to 2 and 2
x1 plus x2 less than
or equal to 2, which is
exactly what we've done here.
Formally, we should prove
that these two optimization
problems are equivalent.
We know how to do that.
But, in this case,
it's really trivial
because we know
that the feasible
region is exactly the same
among the two problems.
And this immediately
implies that they
are equivalent because they have
the same objective function.
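To make this concrete, here is a minimal sketch of how you could hand the reformulated problem to an LP solver, using scipy.optimize.linprog in Python. The objective coefficients below are an illustrative assumption, since the example's objective isn't spelled out in the audio; the two inequalities are exactly the ones we just derived.

import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0])      # illustrative objective: maximize x1 + x2
A_ub = np.array([[1.0, 2.0],    # x1 + 2*x2 <= 2
                 [2.0, 1.0]])   # 2*x1 + x2 <= 2
b_ub = np.array([2.0, 2.0])
# bounds default to (0, None), i.e., the non-negativity constraints
res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x)                    # optimal solution, here (2/3, 2/3)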
So we just understood how to
write, in a linear programming
problem, inequalities of the
form piecewise linear convex
function less than
or equal to h.
Now what if, instead, you have a similar inequality, but with greater than or equal to h?
This is a good exercise for you.
Think about it.
Are you able to write
down this inequality
in a linear programming problem?
So we've discussed how to deal
with piecewise linear convex
functions among the constraints.
What if, instead, we have
piecewise linear convex
functions in the
objective function,
for example, a
problem of this form?
You're minimizing a
piecewise linear convex
function subject to a number
of linear constraints.
Also, in this case, we
will be able to write down
this problem as a linear
programming problem,
but it will be just a
little bit more tricky.
So the function that
we're minimizing
is this piecewise linear convex
function-- max ci x plus di.
Now the whole idea needed to make this transformation into a linear programming problem is to realize that this maximum is equal to the smallest number z that is greater than or equal to the max.
This is really obvious
because what we're saying
is that z must be greater
than or equal to the max,
but we're picking
the smallest such z.
And this will be
the z that satisfies
this inequality at equality.
So, of course, we will have
z equal to the max, which
is exactly what we want.
But we have seen an inequality of this form before, when we were discussing constraints, and we know how to rewrite a constraint of the form piecewise linear convex function less than or equal to z using only linear inequalities.
How can we do that?
We can replace it with z
greater than or equal to ci
transpose x plus di for every
i that ranges from 1 to m.
So we've just understood
that our max is exactly
the smallest z that satisfies
all these constraints.
So we replace the
max in the objective
with a z, which is
to be minimized,
and we add among the constraints
all the m linear constraints
that we have just introduced.
Clearly, what we have obtained
is a linear programming
problem.
What you can check is that
this linear programming
problem is, in fact, equivalent
to the original problem.
And I'm going to leave this
as an exercise for you.
While working out this exercise, you will also realize that it is really fundamental for this argument to go through that, in the original problem, we are minimizing this piecewise linear convex function: if you replaced that minimization with a maximization, things would be quite different.
Let's see an example.
Here we are minimizing our
piecewise linear convex
function, which
is the max of two
linear functions subject to a
number of linear constraints.
So the idea, once again, is
to replace the max with z,
and z should be greater than
or equal to both the functions
in the max--
so here they are, z greater
than or equal to 2 x1 plus 4 x2
and z greater than or
equal to 2 x1 plus x2--
while the other constraints remain unchanged.
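Here is a minimal sketch of this epigraph reformulation with scipy.optimize.linprog. The constraint x1 + x2 >= 1 and the non-negativity of x1 and x2 stand in for the example's linear constraints, which aren't reproduced in the audio, so treat them as assumptions.

import numpy as np
from scipy.optimize import linprog

# variables: [x1, x2, z]; minimize z
c = np.array([0.0, 0.0, 1.0])
A_ub = np.array([
    [ 2.0,  4.0, -1.0],   # 2*x1 + 4*x2 <= z
    [ 2.0,  1.0, -1.0],   # 2*x1 +   x2 <= z
    [-1.0, -1.0,  0.0],   # assumed constraint: x1 + x2 >= 1
])
b_ub = np.array([0.0, 0.0, -1.0])
bounds = [(0, None), (0, None), (None, None)]   # x1, x2 >= 0; z free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)   # here x = (1, 0) with z = 2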
Next, we're going to see optimization problems involving absolute values.
Let's have a look at
an optimization problem
of this form.
So we're minimizing the sum
of ci absolute values of xi
subject to some
linear constraints,
Ax greater than or equal to b.
And we assume that all the ci's
are greater than or equal to 0.
Now it can be shown
that this is just
a special case of what we
have just seen, namely,
that the objective function, the
sum of ci absolute value of xi,
is, in fact, a piecewise
linear convex function.
In order to show this, you would need to write it down as the max of a number of affine linear functions.
And, once you've
done that, then you
can write this
optimization problem
as a linear programming problem,
as discussed previously.
However, this
requires some work,
and there's a much quicker way
to deal with this optimization
problem.
Essentially, the trick is to apply our previous idea not to the whole objective function, but individually to each absolute value.
So let's pick one
absolute value of xi.
And, as before, we observe that
this is the smallest number
zi that is, at the same time,
greater than or equal to xi
and minus xi.
So then the absolute
value of xi can
be replaced with a zi in the
objective function provided
that we add among the
constraints xi less than
or equal to zi and minus xi
less than or equal to zi.
If you do so, you obtain the
following linear programming
problem.
Minimize the sum of ci zi
subject to Ax greater than
or equal to b, which were our
original linear constraints,
and xi less than or equal
to zi and minus xi less than
or equal to zi for every i.
This is a linear programming
problem with variables
xi's and zi's.
Once again, you should show
that the linear programming
problem that we have
obtained is indeed
equivalent to the
one we started from.
In proving this, you will realize that the assumption that all the ci's are greater than or equal to 0 is fundamental for this equivalence to hold.
Let's see an example.
Here we're minimizing 2 times the absolute value of x1, plus x2.
In this problem, as
you can see, there
is the absolute value
only on the variable x1.
So we only need to introduce
one additional variable, z1,
because, for x2, there's no need
to introduce any new variable.
So we replace absolute
value of x1 with z1
and add among the
constraints that x1
is less than or equal to
z1 and minus x1 less than
or equal to z1.
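As a sketch, this example could look as follows in scipy.optimize.linprog. The problem's linear constraints aren't spelled out in the audio, so the constraint x1 + x2 >= 1 below is an assumption made for illustration; the two inequalities tying z1 to x1 are the ones from the transformation.

import numpy as np
from scipy.optimize import linprog

# variables: [x1, x2, z1]; minimize 2*z1 + x2 (that is, 2|x1| + x2)
c = np.array([0.0, 1.0, 2.0])
A_ub = np.array([
    [ 1.0,  0.0, -1.0],   #  x1 <= z1
    [-1.0,  0.0, -1.0],   # -x1 <= z1
    [-1.0, -1.0,  0.0],   # assumed constraint: x1 + x2 >= 1
])
b_ub = np.array([0.0, 0.0, -1.0])
bounds = [(None, None), (0, None), (None, None)]   # x1 free, x2 >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)   # here x1 = 0, x2 = 1, z1 = 0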
Next, we are going to
discuss data fitting, which
is one of the most fundamental
techniques in data science.
There are several different
ways of doing data fitting,
and we will see that at
least a couple of them
can be written as linear
programming problems.
So let's state our
data fitting problem.
We are given m data points.
Each one is of the form (ai, bi), where ai is a vector in Rn and bi is just a scalar.
We would like to
extrapolate from this data
a predictive model that
is able, in the future,
to predict bi given ai.
One way to think about this
problem is the following.
Let's say that every index i corresponds to a test patient, and every entry in the vector ai is the result of a test that this patient took. For example, the first entry could represent blood pressure, the second could represent the age of the person, the third could represent the weight, and so on and so forth. And then bi tells us whether this person has heart disease or not.
Then, in the future, whenever a new patient comes in, the idea is that we can measure the whole vector a for this new patient, and we want to be able to predict b based on the previous observations made on our m test patients.
In this situation, we often use a linear model of this form-- b equal to a transpose x, meaning that, if a new patient comes in in the future, we measure the vector a and multiply it by a vector x that we have obtained from all our previous data. In this way, we obtain the predicted b that tells us whether this new patient has the disease or not.
Here x is the parameter
vector to be determined
using our m test patients.
A fundamental quantity is then
the residual or prediction
error of the parameter vector x.
And this measures,
for every data point,
how well our vector x
predicts bi given ai.
So this can be written as the
distance from ai transpose x
to bi, which can be written as
the absolute value of bi minus
ai transpose x.
Clearly, we are searching
for an x that minimizes
this prediction error.
However, remember that we
have m different patients.
So we have one prediction
error for each patient.
We, essentially, want to
minimize all these values
at the same time.
And there are several ways to do that, which are not equivalent.
So one way is to minimize
the largest residual.
In this way, our problem becomes
minimize the maximum residual.
And this is exactly
a problem of the form
that we have just seen.
We are minimizing a piecewise
linear convex objective
function.
In fact, this is the maximum of the 2m functions bi minus ai transpose x and ai transpose x minus bi, because each absolute value is the maximum of these two functions.
So we can immediately write a
linear programming formulation
for this problem.
We replace the piecewise linear convex function with a z and then add the 2m inequalities, namely, that every affine linear function inside the max is less than or equal to z.
And here they are.
In this way, you can solve
your data fitting problem
with a linear
programming problem.
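Here is a minimal sketch of this minimax fit with scipy.optimize.linprog; the data points ai, bi below are made up purely for illustration.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # rows are the vectors ai (made-up data)
b = np.array([1.0, 2.0, 2.0])  # the scalars bi (made-up data)
m, n = A.shape

# variables: [x (n entries), z]; minimize z, the largest residual
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[ A, -np.ones((m, 1))],    # ai^T x - bi <= z
                 [-A, -np.ones((m, 1))]])   # bi - ai^T x <= z
b_ub = np.r_[b, -b]
bounds = [(None, None)] * n + [(0, None)]   # x free, z >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:n], res.x[n])   # fitted parameters and the minimized max residual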
But, remember, the one
we have written here
is not the only way to interpret
our data fitting model.
Another possibility
is to minimize
not the maximum residual,
but the sum of all of them.
Of course, you obtain
a different problem,
but you might be interested
in this one instead.
Also, in this case, though,
you can solve this problem
as a linear programming problem.
Once again, we use the same idea. We replace every absolute value with a new variable, zi, and add to the constraints that bi minus ai transpose x and its opposite are less than or equal to zi.
So we obtain the following
equivalent formulation.
We minimize the
sum of all the zi's
from 1 to m subject to bi
minus ai transpose x less than
or equal to zi and minus bi
plus ai transpose x less than
or equal to zi for every i.
You can check that you
have obtained, indeed,
an equivalent problem.
And, clearly, the one
that we have obtained
is indeed a linear
programming problem.
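And here is the analogous sketch for minimizing the sum of the residuals, reusing the made-up data from the previous snippet.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # rows are the vectors ai (made-up data)
b = np.array([1.0, 2.0, 2.0])  # the scalars bi (made-up data)
m, n = A.shape

# variables: [x (n entries), z (m entries)]; minimize the sum of the zi
c = np.r_[np.zeros(n), np.ones(m)]
A_ub = np.block([[ A, -np.eye(m)],    # ai^T x - bi <= zi
                 [-A, -np.eye(m)]])   # bi - ai^T x <= zi
b_ub = np.r_[b, -b]
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:n], res.x[n:])   # parameters x, then the residuals zi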
And this concludes this video
on piecewise linear convex
functions.
