[George Pappas]: The last talk of today will be given by Andreas Krause who is professor at
ETH and he's going to talk to us about
safe exploration and reinforcement learning.
[Andreas Krause]: So, I'd like to start by thanking the organizers for putting
together this fantastic conference at
this point in time. I think it's
super important to get the fields
together and I'm very thankful that
they invited me.  I'm super excited to be here today and to tell you
about some work that we've been doing in
the context of safety and risk and
reinforcement learning and particularly
in the context of exploration and system
identification. And this is a joint work
with a fantastic set of PhD students and
collaborators that I'll acknowledge as
we go along. This will also connect to
some of the concepts that Angela Schoellig already talked about and
touched upon in her talk earlier this
morning. Okay, so we live in a world where
an increasing number of decisions are
made by systems that learn, and these
systems learn online. Maybe the most
ubiquitous example of this are
recommender systems, right? These systems that try to figure out what we like by
observing how we interact with the
content that they show us. Whether
we read that article or buy that product
or click that ad, etc. It means that
the very decisions that these systems
take become the very data that they're being trained on. That means that we very quickly leave this nice, cozy setting of i.i.d. supervised learning that we are all so familiar with, and really have to think about and take seriously feedback loops in learning, right, and I
think this is really a call to action to
bring these communities together to take
these questions seriously. Of course, in the machine learning community, the subfield that is maybe most focused on this is reinforcement learning. Here's a model we've seen many times: an agent is situated in an environment which it can affect through actions, hopefully trying to generate some reward and some good outcomes for itself. And of course these are precisely the settings studied in control, adaptive control, and so on, maybe with slightly different notation, so I hope you'll forgive me for sticking with the RL notation for now; we will come back to some more control-theoretic setups towards the end of this talk.
Okay, let's see. Good. And so of course one of the central challenges in reinforcement learning is this dilemma of trading off exploration and exploitation, right? The agent doesn't know how the world works; it needs to deliberate: 'how do I carry out actions to figure out how the world works vs. how do I use what I've learned in order to make better decisions?' And RL, of course, has
been around for many years, as have controls, but there have been quite some recent breakthroughs in reinforcement learning, fueled in particular by advances in deep learning - say, in the context of Go and so on, right, that you all have seen mentioned. But I would argue that a lot of these breakthroughs really have
been in situations when we have a
perfect computational model of the world.
In Go in particular, you know precisely how the board is going to look once you and your opponent have made your moves, and this means that exploration is really a computational concern. And if you have the computational power of Google and the like, then you can really simulate a lot of agents playing against each other and learn about the consequences of their actions. But if you
think about applying reinforcement
learning in real-world applications like
autonomous driving, or medical
decisions or various other forms
of automation, we typically don't have a
perfect model of the world and that
means that trying out an action has
potentially injurious consequences.
And what we've really been thinking about is how one should actually think about notions such as safety and risk in the context of exploration. And I would argue that once you actually start deploying these systems and really have them learn online, any action that they take is exploratory, so you really have to understand that question. Okay, so let's
go back to this perception-action loop. Of course, we first have to define what safety even is, and that's a difficult question. One might be able to specify certain states and state-action combinations that we really would like to avoid, and maybe express those with a set of constraints. The problem is really that, ahead of time, you don't know the dynamics of the world, right? That means you don't know ahead of time which actions are actually safe, or which actions will get you into states from which you can't recover. And that's really the fundamental issue: you basically have to deal with constraints that you don't know ahead of time, right, and that's fundamentally the problem you need to understand.
I'd like to tell you about some thoughts we've been exploring
in this context, and for that we need to look more formally at the different settings. In reinforcement learning, there are sort of these two broad kinds of approaches: model-based RL, where we first try to elicit, identify, learn a model of the world and then act accordingly, and what's called the model-free setting, where we would, say, consider a family of controllers or policies π and directly optimize the long-term value of those policies as a global optimization problem. And so I'd
like to start exploring and discussing
some of these ideas in the context of
model-free reinforcement learning, but
I'd actually like to put the 'free' in
quotes because we, in the end, will
actually be using models for this 
long-term value. Okay, and so I'd like to
ground these discussions in the context
of a real application. This is a collaboration that we have with the Paul Scherrer Institute, a large federal research facility in Switzerland, which is in the process of finalizing SwissFEL, a free-electron laser. That's a 700-meter-long linear accelerator about 30 kilometers from Zurich, and this instrument generates x-ray pulses of extremely short duration - a millionth of a billionth of a second - and using such an instrument, one can image extremely fast processes, like molecules turning into each other in the course of chemical reactions, and that has lots of potential applications ranging from materials to drug design and many other areas. It's also quite a
complicated machine with lots of
literally moving parts. So the way it roughly works is that there's an electron source, and these electrons pass through a sequence of magnets arranged in a snake-like fashion, called undulators, and these basically shape the intense x-ray radiation that this machine provides. And of course the question is how to configure these magnets - move them around - in order to shape the beam into the properties that you actually want, right? You'd like to maximize intensity, and you'd like to control the pulse frequency and pulse energy according to application requirements. But the system is also extremely sensitive to other conditions, like humidity, temperature, and so on. These conditions constantly fluctuate, and one needs to adapt. It also needs to adapt to different application environments. Okay, so this is
quite a challenging problem, and, in fact,
there are safety constraints. You don't want to
break that machine. How could that happen? Well, there could be beam losses in
various different places, and this might de-magnetize the magnets - and that you really don't want to happen, because they're extremely expensive to replace. But, fortunately, there's a range of beam loss monitors that monitor whether you get close to doing something unsafe. So you might think about this in the following kind of cartoon: we have this black box, right - our laser - and you'd like to input some control parameters, like the configuration of these magnets, in order to try to optimize some notion of reward f, say the beam energy that we can produce. But at the same time, you don't want to incur these beam losses, so we also have constraints g that quantify these beam losses. The trouble is, we know neither f nor g ahead of time.
Right, that means that the only way to learn
about them is to actually carry out an
experiment and, according to the
measurements, decide how to
proceed. The goal, of course, is to maximize, say, the intensity, maybe cumulatively over time - we can talk about different aggregations - while satisfying these constraints, and of course you'd like to do that at every step along the way. So you don't just eventually want to find a feasible solution, but one at every step. So, basically, the problem you have to solve is to optimize an unknown objective under unknown constraints while guaranteeing feasibility at every step along the way. Okay, so I should have convinced you that this is a completely hopeless problem, at least in general, so we'd like to understand a little bit about when this actually becomes tractable.
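To make this concrete, one way to write down the problem just described is the following (a hedged formalization; the talk itself fixes only the names f and g):

$$\max_{x_1, x_2, \ldots} \; \sum_{t} f(x_t) \quad \text{subject to} \quad g(x_t) \ge \tau \;\; \text{for all } t,$$

where f is the unknown reward (say, beam intensity), g is the unknown safety signal (say, a function of the beam losses), τ is the safety threshold, and both f and g are observable only through noisy evaluations at the chosen inputs x_t.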
Let's start with a little bit of a cartoon here, in a simple setting where you have just a one-dimensional f that also serves as the constraint, so you would like to optimize f while making sure that wherever we evaluate, we stay above this safety threshold. Okay, now if you don't know anything, even the first action could be infeasible, right? So the first thing we'll assume is that we have a safe starting point - maybe an initial controller derived from first-principles physics models - from which we can start to explore. Okay, but that's not enough, right? If there is no structure whatsoever, we have no hope to generalize, and the second action could already break us. Okay, so if we want to have any hope, we need to make some sort of assumptions that we can exploit - some sort of regularity assumptions that we can extrapolate and generalize from. Okay, and so one natural assumption might be some kind of smoothness, right? But even with a smoothness assumption, we shouldn't really expect to get to something like the global optimum; what we can really hope to get to is the best solution in the connected component of the feasible set that contains our starting point. We call that the reachable optimum; it's going to be our benchmark. Okay, and so now, of course, we somehow need to express
these assumptions about what these functions might look like, and one natural way to express prior assumptions is to take a Bayesian point of view. So we are going to endow the objective and the constraints with Bayesian priors, and that's exactly what's done in the field of Bayesian optimization - where you basically put Bayesian priors on the objective and the constraint, and now these surrogate models allow us to make predictions about the function at points where we don't have experiments, but we can also predict the uncertainty, right, quantify the uncertainty in the prediction, and that predictive uncertainty can then be used in index-based policies in order to navigate this exploration/exploitation dilemma. There's been a huge amount of work over the years exploring this setting, both in the unconstrained setting but also with constraints. But existing work in the context of constraints really only heuristically tried to encourage feasible solutions eventually; in general, it will happily produce infeasible solutions along the way, and there are also no guarantees associated with it. Okay, so at this point I would like to say: this is sort of the place where we can really try to plug in assumptions. And I think, in settings that involve physical systems and so on, we really want to make use of what we know, right, and Bayesian perspectives give a very natural way of encoding prior assumptions. There's a lot of nice work, for example by Sebastian Trimpe and his group, on what sort of Gaussian process priors you might want to use, say, for linear dynamical systems and so on. So there's lots of prior knowledge that can be infused into these models.
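As a minimal illustration of this surrogate-model idea - not the speaker's code; the kernel, its length scale, and the scaling β are placeholder choices - here is how one gets predictions with uncertainty from a GP using scikit-learn:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Noisy evaluations of the unknown objective at a few inputs.
X = np.array([[0.1], [0.4], [0.9]])
y = np.array([0.2, 0.7, 0.3])

# GP surrogate: the kernel encodes our smoothness assumption.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)
gp.fit(X, y)

# Predictions *with uncertainty* at points we have not yet evaluated.
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)

beta = 2.0                    # confidence scaling (placeholder choice)
lower = mean - beta * std     # pessimistic estimate
upper = mean + beta * std     # optimistic estimate: drives UCB-style exploration
```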
Okay, so now how can you use those? Here's the cartoon, right? Suppose we can get ourselves into a position where, after carrying out a bunch of experiments, we are quite confident that the true objective lives somewhere within these grey confidence bounds - so we believe that, with very high probability, the true function lives somewhere within these grey areas. Why is this useful? Well, we can try to get a pessimistic estimate of the optimum - the maximum of the lower bound, which is this green dashed line - and then we should expect the maximum to be somewhere within these green areas here, right? Where the upper bound exceeds this best lower bound. And that's the typical rationale that justifies the 'optimism in the face of uncertainty' principle: picking the point that maximizes, say, the upper confidence bound, which is a widely used strategy in RL, bandits, Bayesian optimization, and the like. Okay, but we also want to
think about safety, right? Maybe being too
optimistic is also not the right thing
to do. Okay, and so of course one can apply the same picture and the same rationale to constraints. If this is now the model of our constraint, and we want to make sure that we don't fall below the safety threshold, well, we can try to construct a conservative under-approximation of the feasible region, and restrict our exploration to this red set. Okay, now all this rationale is nice, but it only works if you can actually construct these confidence bounds, and of course that's the crux. It turns out that this depends on the model, right, and so we need to
specialize. So far, I didn't say anything
about what kind of model we use. Now, in the context of Bayesian optimization, the most commonly used models are Gaussian process models. Those are non-parametric models that are very expressive, but they're still tractable, so we understand them very well theoretically. Okay, now for Gaussians, well, you get credible intervals for one point by just taking your mean and adding some multiple of the standard deviation, but you really want a bound that holds uniformly over the domain, right? If you want to apply the rationale from the previous slides, you want to be able to reason about uniformity over the domain and uniformity over the steps of the algorithm. And moreover, maybe you shouldn't be too confident that you really have the prior distribution from which the true dynamics are being drawn. Okay, so it
turns out it's actually possible to construct these confidence bounds to get frequentist coverage - to basically establish these confidence bounds even for adversarially chosen objectives f and g - as long as they sort of agree with your prior. By that, I mean that you basically need to scale these confidence bounds according to the reproducing kernel Hilbert space (RKHS) norm associated with the covariance function that the Gaussian process prior uses - that's the first term in this bound here - which quantifies the complexity, the agreement, of the true function with your model. Then, of course, it should also depend on how high-capacity, how rich, this model class is. If you really want to accommodate a lot of different kinds of functions, you should expect to admit a lot of uncertainty, and that is this information capacity gamma, which somehow quantifies some kind of rate distortion: how many bits you can extract about a nonparametric prior from a limited number of samples. And it turns out this is a concave function that grows monotonically, and one can actually understand it analytically using ideas from submodular analysis. I'd be super happy to talk with you more about this offline, but this is actually a quantity that's very well understood. You can bound it analytically for different kinds of kernels, and approximate it empirically using efficient algorithms and so on.
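For concreteness, the bounds being referenced have roughly the following shape (in the style of, e.g., Chowdhury and Gopalan; the constants vary across papers, so treat this as illustrative rather than the exact statement on the slide):

$$\beta_t \;=\; B \;+\; \sigma \sqrt{2\left(\gamma_{t-1} + 1 + \ln(1/\delta)\right)},$$

where $B$ bounds the RKHS norm of the true function, $\sigma$ is the noise scale, $\gamma_t$ is the information capacity, and $1-\delta$ is the confidence level; the resulting confidence bounds are then $\mu_t(x) \pm \beta_t \, \sigma_t(x)$, uniformly over the domain and over time steps.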
So the main takeaway message is that we can actually understand fairly well these credible intervals and confidence intervals that one can derive for these Gaussian process priors, and it means one can actually realize these pictures. And it suggests one natural way of using them in exploration - namely, use standard algorithms like, say, UCB, but always constrain the exploration to the feasible domain. It turns out that actually fails, and I'll show you why in a moment.
So instead, we do something different. What we're going to do is build these models for both our objective and our constraints, and maintain a classification of the domain. This red set here is our conservative inner approximation of the safe set - we will only ever sample among these red points - but among them, we're going to pick relevant ones. What do I mean by relevant? One way for a point to be relevant is that it's plausibly optimal - these are the points that I've shown in green before. But then look at this picture here, right. In this situation, the green set of points, these plausible maximizers, are contained in the strict interior of this red set, and at this point, if you just were to play UCB or standard exploration strategies, the algorithm has no incentive to explore the boundary. That's why you also have to think about expansion, right - trying to further identify and enlarge this feasible region. Okay, and that's what these purple points do. These are called 'plausible expanders'; they are the points that plausibly allow you to infer that additional actions are safe. Okay, and if you now sample, among them, the most informative one - the most uncertain one - then the algorithm eventually finds the better solutions on this other side of the domain. That's the cartoon of the algorithm - I'll spare you the full pseudocode; the details are in the paper - but one can basically prove, under the same sort of regularity conditions that you need for these confidence bounds to hold, that this algorithm is both safe and complete.
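To make the set-based selection rule concrete, here is a minimal sketch of one such iteration over a discretized domain (a simplified rendering of the idea, not the authors' implementation; the GP posterior and β are assumed given, and the expander test is a crude proxy):

```python
import numpy as np

def safeopt_step(mu, sigma, tau, beta=2.0):
    """One SafeOpt-style selection over a discretized 1-D domain.

    mu, sigma: GP posterior mean/std at each candidate (here, as in the
    1-D cartoon above, the objective also serves as the constraint).
    tau: safety threshold; beta: confidence scaling.
    Returns the index of the next point to evaluate.
    """
    lower, upper = mu - beta * sigma, mu + beta * sigma

    safe = lower >= tau                      # certified-safe set (red set)
    if not safe.any():
        raise RuntimeError("need a safe seed point")

    # Plausible maximizers: safe points whose upper bound beats the
    # best pessimistic value (green set).
    maximizers = safe & (upper >= lower[safe].max())

    # Plausible expanders (proxy): safe points that are still uncertain.
    # The full algorithm checks whether an optimistic measurement there
    # could certify new points as safe.
    expanders = safe & (upper - lower > 1e-3)

    # Among the relevant points, pick the most uncertain one.
    relevant = maximizers | expanders
    width = np.where(relevant, upper - lower, -np.inf)
    return int(np.argmax(width))
```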
So one can basically establish, for any given confidence level 1 − δ - this is over the probability of the noise realizations - that the algorithm will never make an infeasible decision, and one can bound the number of steps it takes in order to find an epsilon-optimal reachable point. Okay, and in general, we haven't made any convexity assumptions - these are in general non-convex functions - so the sample complexity in general would be exponential in the dimension, right? There's no way to avoid the curse of dimensionality, but I'm going to talk a little bit about that in just a moment. So here are some
experiments that Felix Berkenkamp carried out, actually in collaboration with Angela Schoellig, who is co-advising him, with experiments at Angela's lab, just to illustrate this algorithm. It's certainly not as exciting as some of the videos that Angela showed earlier, but I think it illustrates the algorithm nicely. It's basically this robot trying to solve the simple task of flying from A to B. It starts with some first-principles controller that gets the job done, but it's not very efficient - it sort of wobbles around - and so what Felix did is he exposed different tuning parameters of the underlying controller and then optimized over these tuning parameters in order to get more aggressive behavior. There are lots of parameter settings that would immediately destabilize the robot. So if you just did UCB, the robot would immediately crash against the wall, right? But by keeping track of these safety constraints, the robot actually fully autonomously experiments with its own control parameters without crashing.
Okay, so here's just a quick comparison. Good. Okay, so let me come back to the SwissFEL application that I mentioned before. There are actually a lot of additional challenges that arise there. I mentioned the safety constraints, but these are really high-dimensional problems: you want to tune tens to hundreds of parameters. There are issues like heteroscedastic noise - the noise depends on the input - and the need to contextualize to different application requirements. One can simulate, in principle, but simulations are actually extremely slow, extremely computationally intensive, so we really want to carry out experiments on the actual system. Also, these are magnets that move around, so you can't just teleport from one parameter setting to the next; you actually have to plan a path. There are lots of interesting algorithmic challenges that arise that I'm very happy to talk more about offline, but I just want to talk a little bit about this issue of how to scale to
high-dimensional domains. The basic algorithm I described before basically requires a discretization; obviously, that is not going to scale beyond a handful of dimensions. So one way to fix this is to look at a sequence of problems, each of which is simpler. One extreme case is to just consider a sequence of low-dimensional - in the extreme case, one-dimensional - subspaces, and apply the algorithm only on these one-dimensional subspaces, right? So in a cartoon picture, you start maybe at this initial point here. You would pick, say, a direction - propose a direction - and then basically apply SafeOpt to do a line search, right, to find the next point. Okay, then propose the next direction, keep going, and so on. Okay, and so it turns out
that using this simple strategy, one can actually get both local and global rates. For the global rates, we can basically converge at Lipschitz rates that are exponential - but exponential only in the ambient or, in favorable cases, the intrinsic dimension of the problem; I'll say something about this on the next slide. In general, you can't really hope to get global convergence, but if you are in the basin of attraction of a strongly convex local optimum, you converge there with polynomial dependence on the dimension. Right, so it basically combines the benefits of local optimization and global optimization. And besides just random directions, there are also ways of proposing more informed directions based on this Bayesian model, which we discuss in the paper.
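To make the loop structure concrete, here is a sketch of just the outer loop (the `safe_line_search` callable is an assumed abstraction standing in for the one-dimensional SafeOpt subroutine):

```python
import numpy as np

def line_bo(x0, safe_line_search, n_rounds=20):
    """Outer loop of a LineBO-style method (structural sketch only).

    `safe_line_search(x, d)` is an assumed callable that runs a safe
    one-dimensional optimizer (e.g., a SafeOpt-style step as sketched
    above) along the line through x in direction d, evaluating only
    certified-safe points, and returns the new safe incumbent.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_rounds):
        d = np.random.randn(x.size)   # isotropic random direction,
        d /= np.linalg.norm(d)        # as in the analyzed variant
        x = safe_line_search(x, d)    # solve the 1-D subproblem safely
    return x
```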
Here's the picture I have in mind, right: if it turns out that this function f(x) really varies only on some low-dimensional subspace - so it can be written as g(Ax) for some flat matrix A - then you only have exponential dependence on the dimension of that low-dimensional subspace. You don't need to know that dimensionality, and you don't need to know that subspace. And so we applied this on the actual device. Here are some experiments that my students carried out, and the algorithm does quite a bit better than the standard local search techniques that are being applied there at the moment, right. So you would basically take the existing parameter settings, detune them adversarially, and then let the algorithm recover, and, of course, by applying this sort of more global search, you can actually find better solutions. And all of these runs actually produce feasible solutions along the way, so it doesn't violate any of the safety constraints. Okay, so this now can scale to higher dimensions and so on. Okay, so there are other applications.
I want to give a shout-out to some exciting work at Caltech in Joel Burdick's group, in collaboration with Yanan Sui - he was one of the co-authors of the SafeOpt paper - and Yisong Yue, where they used a variant of the SafeOpt algorithm in a medical application for spinal cord therapy, and they actually used it in the context of clinical trials, so I very much encourage you to take a look at that work. We have also been using this in various other applications - one recent one is using it to contextually tune parameter settings for building energy management, and some other applications.
So, in the last two hours of this talk, I'd like to *laughs* just talk a little bit
about how some of these ideas can be
used in actually modeling dynamics.
So far, this has been very much this global optimization, model-'free' - quote, unquote -
kind of setting, so I'd like to highlight
how some of these ideas can also be
applied when explicitly reasoning about
dynamics. Okay, and so one very natural way to do that is to now, instead of modeling the value function, directly model the dynamics. So we can think about a nonlinear dynamical system that might consist of an a priori model along with an unknown disturbance, which is modeled with a GP. This has been done by a number of people here in this room, right - really pioneering work by Claire Tomlin, Melanie Zeilinger, Angela Schoellig, and others. So this is a very, very natural model, basically combining deterministic information - first-principles models about a system - with nonparametric capacity to describe deviations from it.
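In symbols, the model class being described looks something like this (notation mine; the talk doesn't write it out):

$$x_{t+1} = h(x_t, u_t) + g(x_t, u_t) + w_t,$$

where $h$ is the known a priori (first-principles) model, $g$ is the unknown disturbance endowed with a GP prior, and $w_t$ is process noise.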
Okay, and now, why is this useful? Well, once you have this kind of model, you can try to forecast the uncertainty - using basically the uncertainty in the model - and then try to come up with control strategies that avoid bad states under all plausible outcomes. And of course the hope is that, while you keep doing that and keep collecting more information, one can become less conservative, right, and basically explore larger parts of the space. That's the high-level cartoon, and of course the challenge is how to actually do this. So now, one thing you can do is certainly use these confidence bounds that I've talked about for one-step-ahead predictions. Given these confidence bounds for f, you can predict, one step ahead, what the plausible realizations of the dynamics are. But of course, if you project multiple steps ahead, then these sets get very complicated - you have nonlinear dynamics. So what we showed in this paper at CDC last year is how to basically tightly over-approximate these Gaussian process forward dynamics using ellipsoids, for which one can do very efficient polytope containment tests.
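As a much-simplified illustration of multi-step uncertainty propagation - intervals instead of the ellipsoids used in the paper, and a crude corner-based evaluation rather than a sound bound over the whole set - assuming a one-step GP predictor `gp_mean_std(x, u)`:

```python
import numpy as np

def propagate_intervals(x_lo, x_hi, controls, gp_mean_std, beta=2.0):
    """Propagate an axis-aligned over-approximation of the reachable set.

    Simplified sketch: the paper uses ellipsoidal over-approximations;
    here each state coordinate is bounded by an interval. `gp_mean_std(x, u)`
    is assumed to return the GP posterior mean and std of the next state.
    """
    sets = [(np.asarray(x_lo, float), np.asarray(x_hi, float))]
    for u in controls:
        lo, hi = sets[-1]
        # Evaluate the GP at the corners of the current box (crude; a
        # sound method would bound the GP over the entire box).
        corners = np.stack(np.meshgrid(*zip(lo, hi))).reshape(len(lo), -1).T
        nxt = [gp_mean_std(c, u) for c in corners]
        new_lo = np.min([m - beta * s for m, s in nxt], axis=0)
        new_hi = np.max([m + beta * s for m, s in nxt], axis=0)
        sets.append((new_lo, new_hi))
    return sets
```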
Okay, and so given essentially these frequentist confidence guarantees that I talked about before, these really give rigorous containment guarantees for all plausible dynamics going forward. Okay, now the problem, of course, is that you don't just care about one-step-ahead or five-step-ahead predictions; safety is really a long-term property. You want the robot to never crash, right? And if you just reason 'maybe I can go one step further, and one step further, and one step further' - well, now I actually can't go anywhere anymore, right? That's something you don't want to happen. Okay, so now the question is how you can avoid that happening. Well, you somehow have to think about explicitly maintaining a subset of the state space that can actually be kept safe. And there are various ways of doing that using ideas from control theory, Lyapunov stability, and so on, that one can combine with these nonparametric confidence bounds that I've described before. So one way to
do that, we published in a paper at NeurIPS two years back, where now the safe starting point is replaced by an initial stabilizing policy - maybe derived from a first-principles model, linearizing the system - together with an initial estimate of a safe region in which this policy π is guaranteed to keep the system. Now, what one can do is to basically keep collecting information in that set, and conditioning on it allows us to certify a larger subset of the domain as safe, and one can continue exploration in that set - the policy π will keep you in there, right - so it will guarantee to keep you safe, and it will keep expanding the subset of the region which one can guarantee to remain in. Okay, and so under some conditions, one can even show that this allows you to find, in a sense, the maximal such inner approximation of the safe set.
Okay, and now, this one can combine with ideas from model predictive control, where basically one can plan ahead both the performance strategy, using standard techniques in model-based RL, but also a fallback plan - plan B - that will take you back to the safe set. By tying them together at the first action, you know that, whatever happens in the first step, you'll be able to come back. And that one can do by basically using these ideas from reachability analysis with these nonlinear dynamics, right? And this really works for non-parametric systems, right, modeled with GPs.
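Schematically, the plan-A/plan-B logic looks like this (all three callables are assumed abstractions, not the paper's interface):

```python
def safe_mpc_step(x, propose_action, plan_fallback, safe_policy):
    """One step of the plan-A/plan-B scheme (schematic sketch).

    propose_action(x):   first action of a performance-oriented MPC plan.
    plan_fallback(x, u): tries to certify a plan B that returns the system
                         to the safe set from every state reachable after
                         applying u, over all plausible GP dynamics;
                         returns True/False.
    safe_policy(x):      the known safe controller (e.g., initial policy).
    """
    u = propose_action(x)
    if plan_fallback(x, u):   # plan B exists => taking u cannot trap us
        return u
    return safe_policy(x)     # otherwise stay with the certified safe policy
```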
Okay, so that's basically what I wanted to talk about. The key takeaway message is that one can actually get these nonparametric confidence bounds for rich models like GPs, and combine them with ideas from robust optimization, stability verification, and so on - really key insights from control - in order to reason about all plausible dynamics, right. And then of course that raises the very natural question: how do I now go about collecting information to explore, to become less conservative as I go along? There are lots of
challenges going forward, right? How do we go beyond GP models? We've heuristically applied these ideas to Bayesian deep learning models and so on, but of course GPs are what one can really understand analytically very well. How can we learn good priors, for example in the context of simulations? How do we scale to more complex domains? And of course, going beyond static scenarios to really think about non-stationary environments, multi-agent interactions, and so on. I also want to point out that Felix and I recently gave a tutorial at the European Workshop on RL with a lot more detail, and the slides are all online. Okay, with that, I'd like to also thank all the collaborators on this work, and thank you all very much for listening.
*Applause*
[George Pappas]: Questions for Andreas?
There's a question.
[Audience Question 1]: Thank you for the great talk. So, I think GPs are really good for Bayesian reasoning, because you have tractable Bayesian analysis for them, but one of the challenges with GPs is that the conditions under which a function can be represented by a GP are very abstract conditions. Like these Hilbert spaces -
[Andreas Krause]: Sure.
[Audience Question 1]: - and all these conditions, so for safety-critical applications, do you
have any intuition on when a function
can be represented by a GP?
[Andreas Krause]: Sure. So, I would argue that these conditions actually are not very abstract. These RKHSs are pretty well understood; we know exactly what these functions look like, right? Maybe bounding them in general might be difficult, but there's actually a chance of really coming from first-principles limitations - coming from physics, for example - to try to get a sense of what the RKHS norm of a function could be. Of course, in general it's difficult, right, but at least there is sort of a handle, a hope, to do this. Because one understands these spaces pretty well, it might actually be possible. That's actually an advantage of these kinds of models.
[Audience Question 2]: Hi. Here. So for a high-dimensional embedding that's also data-driven, right - you have some initial data points and you embed from high dimension into low dimension - how do you track that uncertainty?
[Andreas Krause]: What do you mean? So in this line Bayesian optimization, for example -
[Audience Question 2]: Yeah, yeah.
[Andreas Krause]: Well, so in exactly the same way, right? I mean, if you look at the slice of the GP on a subset, it's just a different index set. It's still a GP - you can easily calculate the covariance function on that restriction and so on, right? So that's, one can...
[Audience Question 2]: As you increase, as you add more data points, the embedding also becomes more...
[Andreas Krause]: So the embeddings remain one-dimensional, but, because of this joint model, you can share observations across different slices.
[Audience Question 2]: Okay, thank you.
[Audience Question 3]: What sensors does your drone controller have?
[Andreas Krause]: So, this was all externally observed, but as for the details, Angela can probably say more, and certainly I can point you to Felix, who can tell you all the details of this experiment.
[Audience Question 3]: It's just the camera?
[Andreas Krause]: There's a camera system that tracks the drone externally, so it's not on board.
[Host]: I can ask a question. So, in the line Bayesian optimization, where you went into the random coordinate descent, how does the order, or the choice of directions, affect the outcome, either theoretically or empirically?
[Andreas Krause]: Sure. I mean, what we analyzed theoretically is really, literally, random directions, drawn isotropically from a Gaussian. So, in some sense, you benefit from the guarantees you get even for random search, right, but since you actually always compare with the current solution, you also get these local convergence guarantees, so that's... But what we also discuss is how to use the model in order to propose more informed directions. So, for example, since the derivatives of a GP are still a GP, you can Thompson-sample a direction in which to search. That's one technique that we use in the paper.
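A sketch of that trick for an RBF kernel - these are the standard closed forms for the GP gradient posterior; the specific kernel, length scale, and noise level are placeholder choices, not from the talk:

```python
import numpy as np

def thompson_direction(X, y, x_star, ell=0.5, noise=1e-2):
    """Thompson-sample a search direction from the GP's gradient posterior.

    The gradient of a GP is again a GP; for the RBF kernel
    k(x, x') = exp(-||x - x'||^2 / (2 ell^2)), the posterior of
    grad f(x*) has the closed form used below.
    """
    k = lambda A, B: np.exp(-np.sum((A[:, None] - B[None]) ** 2, -1)
                            / (2 * ell ** 2))
    K = k(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)

    kx = k(x_star[None], X).ravel()                    # k(x*, x_i), shape (n,)
    J = -(x_star[None] - X) / ell ** 2 * kx[:, None]   # d/dx* k(x*, X), (n, d)

    grad_mean = J.T @ alpha
    grad_cov = (np.eye(len(x_star)) / ell ** 2
                - J.T @ np.linalg.solve(K, J)
                + 1e-9 * np.eye(len(x_star)))          # jitter for stability
    g = np.random.multivariate_normal(grad_mean, grad_cov)  # Thompson sample
    return g / np.linalg.norm(g)                           # unit direction
```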
[Host]: Okay, there's one more question?
[Audience Question 3]: Here, sorry. So your GP bounds are always in probability, right?
[Andreas Krause]: Yeah.
[Audience Question 3]: So, it seems like a common case is that you're going to be active against a constraint. I mean, in lots of optimization problems, the optimum that you find might be active against the constraint.
[Andreas Krause]: Yeah.
[Audience Question 3]: So, doesn't that mean that, with some probability, you're eventually going to expect, you know, to break your super expensive European instrument?
[Andreas Krause]: Well, okay, so the guarantees hold with respect to the noise realization, right? I mean, they rely on concentration bounds for the noise. And I think the point is more that you might not want to sit exactly at the boundary of feasibility - so you might actually want to guarantee strict feasibility, right? I think it's more a matter of where you actually choose to put the constraint. I think that's more about the design of the feedback signal that you would give for the constraint function.
[George Pappas]: It's 5:30, let's all thank Andreas for a wonderful talk.
*applause*
