[ticking clock]
[chess pieces moving]
[ticking clock continues]
Alright. Welcome back.
Some of you may remember me from a couple hours ago.
So what I'm gonna do in this last session
is try to talk a little bit about
some of the richer models and some of the complications
that come up as we
really try to push these models
into real world
settings.
There are a number of different things that come up.
We started to talk a little bit about scalability,
we've talked about [inaudible].
We had some examples of that.
There's the issue of human behavior,
robustness to uncertainty,
learning, and finally, evaluation.
We left off this morning where I had
talked us through some very basic game models:
what is a security game? How do we think about
game-theoretic reasoning, reasoning about an adversary,
in this very simple two-by-two kind of scenario?
And we all know the real world is not this simple.
There are way more than two actions.
We may not know the numbers in the payoff matrix exactly.
There may be unexpected actions that occur outside of
the actions that we've modeled.
And so these are all the problems that get us excited
as researchers because these are
the novel challenges
and they come up as we start tackling
new domains, so
let me give you one additional story about how this
really came about,
leading us into the scalability challenge
in a major way for the first time.
So I arrived at USC in
late 2008
and they had just finished the ARMOR application I talked about,
the airport
checkpoint and canine allocation system.
And we were just starting a project with
the federal air marshal service
to develop this system that eventually became called
the IRIS system for Intelligent Randomization In Scheduling.
So this was designed for the federal air marshal service,
most of you probably know
who the air marshals are and what they do.
They're basically in-flight law enforcement,
assigned to commercial airline flights.
And like many of the problems that we've talked about,
the core problem is that there are
not enough air marshals to assign
an air marshal or an air marshal team to every flight.
The scale of the
problem at the time we were looking at it at least was
approximately 30,000 flights per day,
27,000 or so
in the domestic sector and 2,000 or so
in the international sector.
Even by itself, before we start thinking about
the adversarial nature of the problem at all
this would be a massive scheduling problem.
A very difficult problem just to solve,
just meeting all the constraints that you have on the schedules.
We can't teleport an air marshal from one place to another,
we can't put them on two flights at the same time,
and there are even some further constraints in terms of
duty hours and so forth,
all of which have to be captured just to get a feasible
schedule that you can actually fly.
Then we add on top of that
the main concern that we were trying to address here,
which is that we don't want to be flying predictable schedules,
where someone can figure out that you're always on this flight
on Monday morning and never on
the afternoon flight from
Dallas to Chicago, or whatever it is.
So we started doing some preliminary analysis,
we looked at even a very simple case where
we had ten officers and a hundred flights,
and we started coming up with,
as computer scientists, really terrifying numbers:
there were ten to the thirteenth possible schedules,
a one with 13 zeros after it.
And that's for this really simple problem;
the original problem is
orders of magnitude larger than that.
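Just to make that number concrete: choosing which 10 of 100 flights get an air marshal, ignoring every timing and feasibility constraint, is a plain combination, and a quick back-of-the-envelope check (a sketch, not anything from the deployed system) confirms the scale:

```python
from math import comb

# Ways to choose which 10 of 100 flights get an air marshal,
# ignoring all timing and feasibility constraints.
print(comb(100, 10))  # 17310309456440, roughly 1.7 x 10^13
```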
So this is
the time when the scalability challenge really came into focus.
We limited the problem
by focusing just on the international sector, but even that was
extremely challenging.
And the core problem here was the
very large number of defender strategies.
To compare:
the original ARMOR problem had
hundreds of actions for each of the players,
while for the air marshals problem
we can't even write the number of strategies
in a font that you can see.
When we tried to run the
solvers that worked nicely for the
original ARMOR problem,
you get up to
tens of flights
and the system just crashes, runs out of memory,
can't solve it.
And we were trying to go orders of magnitude larger than that.
So an interesting problem for us
computer scientists.
I'm not gonna go through all the clever techniques
that we have come up with over the years
to improve the scalability and to be able to solve real,
world-sized problems,
but there are a number of them. And
in each of these categories there are
probably tens, if not more,
of papers that have been written by
Ph.D. students, doctoral theses that have been written
about solving a lot of these problems, so
I could certainly go into a lot more detail.
Just to give one very high-level
intuition for one of the ideas that was used in IRIS:
the core problem is all these combinations of schedules,
so as you increase the number of air marshals
you have lots of different
ways to assign them in combination with one another.
That causes the blowup.
This is
one of the few mathematical slides I have.
This is actually a representation of an optimization problem
that becomes
intractable as you have all these combinations,
but when you look at the problem itself you start to
notice a lot of repetition,
and that leads us to
compact representations that
eliminate a lot of that repetition.
So we came up with ways of representing this problem
in a much more efficient way,
and that allowed us,
at least for this system,
to scale quite dramatically
to larger and more complex cases.
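To give a flavor of the idea, here's a minimal sketch, assuming the defender's mixed strategy can be summarized by per-flight coverage probabilities rather than a distribution over all joint assignments; the numbers are illustrative, not from the deployed system:

```python
# Marginal (compact) view of a defender strategy: one coverage
# probability per flight, instead of a probability for each of the
# ~1.7e13 joint ways to assign 10 marshals to 100 flights.
n_flights, n_marshals = 100, 10

coverage = [n_marshals / n_flights] * n_flights  # uniform example

# Basic feasibility: the expected number of covered flights cannot
# exceed the number of marshals available.
assert abs(sum(coverage) - n_marshals) < 1e-9
print(len(coverage))  # 100 numbers summarize the whole strategy
```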
That wasn't the only trick we had to use; there are also some
more advanced optimization techniques
that had to be brought to bear.
But eventually we were able to get positive results. So,
as computer scientists,
we often show these kinds of run-time plots
showing how well our algorithms are able to do,
and this is the kind of thing that we
like to see:
for the original ARMOR system,
as you increase the number of targets, the runtime is just
shooting up dramatically.
Our
more intelligent algorithmic designs
let us scale up.
We've got
up to 20 targets here and we're almost instantaneous,
and when we started looking at the
more realistic scale problems
we were still able to solve those within seconds,
as opposed to
hours of computation time, or not being
able to solve the problem at all.
So the scalability issue
is maybe not of that much
interest to many of you.
It's of great interest to computer scientists, but this is
one of those real-world challenges that comes up in
just about every problem we
run into:
you've got to do something to address scalability
to be able to actually
analyze the complex problems that arise
in the real world.
So the next thing that I wanna
talk about a little bit
is the human behavior, the human element.
In a lot of
basic game models,
the starting point for analysis has always been
perfect rationality.
Perfect rationality is
easy to capture mathematically.
There are only certain ways that somebody can be
perfectly rational, maximizing their goals
effectively.
But humans don't always behave
perfectly rationally.
There's a number of reasons for this.
There's a number of reasons for this.
When we talk about modeling games
we have to make a lot of assumptions to be able to
write down an explicit model of the game.
Some of these assumptions involve, for example:
what are the utilities
for an attacker for different outcomes?
So how much does an attacker prefer
one kind of outcome to another?
So we have to try to say something about that.
What does the attacker know about the defender's strategy?
There's different assumptions we can make there as well,
we can assume complete
observability,
we can assume that they don't know anything
about the defender's strategy,
or something in the middle.
We can assume some sort of partial
observability kind of model.
But again, we're making
assumptions to try to construct a model
that we can analyze.
We also
have to make assumptions about what procedure
an attacker is going to use to make a decision.
These rationality models, where they're maximizing expected utility,
are one way that they could make a decision,
but there may be other ways.
Particularly if they are human adversaries,
it may be difficult to describe exactly how they might
reason about the game and make a decision.
And the reality is that
even the best models that we can construct
are going to be
estimates of the real situation, imperfect in interesting ways.
So a lot of the research directions
that have been pursued
have tried to address
various aspects of how we are approximating
the real world.
So one big picture research direction
that we published a number of papers in is trying to
improve and validate
more accurate ways of modeling
human behavior that go beyond
just
rational agents.
We'll talk about just a couple of examples of that.
Another big picture direction that we've been
pushing in is trying to come up with ways of analyzing
these models that are
more robust,
and taking into account the fact that
the models are approximations
and that there may be uncertainty
about the model itself in different ways.
So we want to build in some robustness
into our analysis.
And then the final direction which is
fairly new, Arunesh talked a little bit about
learning in the final part of his
presentation.
We've also been doing some other work on
making our models more adaptive
and being able to take into account
additionally information, additional observations that we may have
and learning to improve our models over time.
That also has the affect of making us more robust
and adaptable.
So first, on the human behavior
element.
There's a whole branch of research
called behavioral game theory, a sub-field of game theory,
which jumps off from the observation of what happens
when we take
human beings and have them play games.
I had you play a very simple game in the early morning session;
we do this to humans all the time
in laboratory settings to try to
see how humans actually make decisions in these kinds of
simple games.
And the first analysis is almost always:
well, humans are not behaving according to a Nash equilibrium
kind of prediction, right? They're deviating in all kinds of ways.
Moreover, they're not even necessarily playing that predictably.
It's not that everyone plays a game the same way;
there's a lot of variation in how humans play games.
There are some
predictable patterns.
But it's
certainly not easy to predict exactly how
humans are going to play.
So the agenda of behavioral game theory is to try
to come up with better descriptive models
for how humans actually play,
then test them
in these laboratory experiments and ultimately,
perhaps even in fielded studies
of real world interactions.
So we draw on
observations and techniques from psychology and experimental
economics to try to motivate us to find
more predictive models, and there are
many, many different phenomena,
different kinds of models that have been
proposed and tested to varying degrees
about how humans actually make decisions
in the real world.
I've listed a few here:
prospect theory;
anchoring biases,
the notion that you tend to anchor
on certain kinds of solutions,
like a uniform distribution as a natural anchoring point;
subjective utilities;
and many, many more.
So what we've been pursuing in the security games
literature is
behavioral experiments,
trying to integrate some of these
models from psychology
and then testing them out in
laboratory experiments designed to mimic
these kinds of security decisions.
So we're going to actually recruit human participants.
We are going to
get them to play games.
We recruit them from different sources. We often recruit
from Mechanical Turk, because it's
fast, cheap, and relatively easy.
Although you can also recruit from
populations of students,
or if you
have the resources you can try to get pools of
security experts to try to play these games as well.
The image that we have on the right here is
an interface to one of the early experiments.
This is an experiment that's actually designed to mimic
the decision-making that's taking place in
the ARMOR application from the attacker's perspective.
So there are
eight options
representing the eight different
terminals at the airport.
We present to our participants in the experiment
some information about what their potential gains are if they
choose one of these options,
and also what their potential losses are
if they choose an option which is being protected.
We also present to them
information about the defender's strategy. Recall that we
are modeling
attackers who have surveillance capabilities,
so we actually present to them
information about the strategy that's being played against them:
the probability that there's not going to be a guard
in each of these options,
or that there will be a guard.
And then there's also information about what the
guard's payoffs are.
So these games are
presented to the participants with instructions.
There's some practice games that are played
to make sure that they understand the game.
And then we let them basically play the role of the attacker
and see what they actually do.
They are rewarded for success, so we pay them
real money when they are successful
in making good choices.
We can then see the patterns of behavior,
and we can test
different game-theoretic
solution algorithms, including these different behavioral
models,
against actual human decision-makers.
So there have been
quite a number of these experiments conducted,
not just in this game but in
games that reflect green security games,
these kinds of poaching scenarios, and others.
We've also looked at a number of different
kinds of behavioral models.
One example of a behavioral model that we've looked at
is called the quantal response equilibrium.
This actually comes from early work in choice theory, which
basically shows that people do not always pick
the very best alternative. They are not
perfect maximizers.
But they do tend to make choices that follow a distribution
where
higher-value choices are picked
more often.
So that's
called a quantal response.
A quantal response
can be used in game theory by replacing
a normal best response, where you choose
the optimal response to whatever the opponent does.
A quantal response
basically adds a noise term, and
there's a parameter within the quantal response that
interpolates between
purely random responses on one hand
and a perfect best response on the other.
So you can smoothly vary the amount of rationality
that you are assuming.
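To make that interpolation concrete, here's a minimal sketch of the standard logit form of quantal response; u holds an attacker's expected utilities for each option, and lam is the rationality parameter (the exact parameterization in the deployed models may differ):

```python
import numpy as np

def quantal_response(u, lam):
    # Logit quantal response: an option's probability grows
    # exponentially with its expected utility.
    z = lam * np.asarray(u, dtype=float)
    z -= z.max()              # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

u = [2.0, 1.0, -1.0]
print(quantal_response(u, 0.0))   # lam = 0: uniform random play
print(quantal_response(u, 10.0))  # large lam: close to a best response
```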
And
when we integrate this into game models, we can now
compute an optimal response against
an attacker who's behaving
according to
some level of noise or irrationality
in their response function.
So that's one example.
To give you an example of the experiments:
this is an experiment run using the kind of game
interface that I showed you before,
with seven different payoff structures. We
can generate different kinds of payoff structures,
different reward levels, and so forth.
In this case, we used five
different techniques, five different strategies,
including this quantal response and also a couple of other
behavioral models.
So the participants played all the games,
and
in this case they did not receive any feedback until
all the games had finished.
And we can measure
the average defender expected utility:
basically, how well do these strategies do
against the actual
human participants in these games?
So this is
four of the payoff structures.
You don't need to understand for now
what all these different bars mean, they're all
different variations of behavioral
game theory algorithms that we've tested to see which ones
do best.
So the important thing is that
these yellow lines represent the perfect rationality solution:
we assume
the attackers are perfectly rational
and we optimize against that.
And several of the different behavioral
solution concepts
are able to improve against the human participants over
the perfect rationality models.
And in this case,
the best response to the quantal response equilibrium
that I mentioned on the previous slide
does consistently better than
all of the perfect rationality models,
and it is the best overall among the
behavioral models that were included in this experiment.
So this is ongoing work; this is a long-term research agenda
to try to explore the advantages
and how good these behavioral models
can get.
There's a lot of validation, a lot of experimentation.
These experiments take
quite a while to run, because you actually have to go out
and do
an experiment with
people and analyze the results.
But it has been shown that we can
do better than just assuming perfect rationality
by targeting our responses against
the kind of decision-making processes that humans use.
I'll just mention that the current leading model, which
the last I heard is doing
the best among all the models,
is a version of quantal response that also includes
something called subjective utility. So
there are some additional parameters where you can
modify how
you are assuming the participants
perceive their utilities.
So that is one direction that we've been pursuing
with respect to human behavior.
I now want to talk a little bit about some things that we're doing
more generally
about robustness to uncertainty,
about other parameters of the model.
So the idea here is that we want to be able to account for
other kinds of uncertainty
in our model itself.
So one big example is
payoff uncertainty.
So when I showed you that matrix
I wrote down a number that said, if target 1 is attacked
and
it's not defended,
the payoff for the attacker is negative two.
That's a very specific number.
So it's likely that we have some information about the payoffs
in the game, especially
the defender's payoffs. Maybe
less
reliable information about the attacker's payoffs.
But for both of those cases there's likely to be
some amount of uncertainty about those values.
There may also be uncertainty about
the observations that we have.
In these games where
players are making observations about each other's strategies,
there may be uncertainty about that.
There's also decision-making uncertainty,
which relates to the previous section, where we were talking about
human behavior.
Humans make decisions in lots of different ways.
So we've looked at different ways of representing
this kind of uncertainty
and I'm gonna talk about a few of them
that relate to payoff uncertainty,
although these methods can also be used to
capture different kinds of uncertainty
about the game models.
So one that we already encountered
in the morning talk is
what we call a finite Bayesian game.
Here
we're modeling
a small set of different types of attackers.
The idea is that we are not sure exactly
what type of attacker we are facing,
perhaps an attacker with a different kind of capability
or a different motivation,
but there's a small set of
specific attackers that we might be facing.
We can capture that in a finite
Bayesian game.
So we have a different utility function represented
for each possible attacker we can be facing
and we can translate that into a larger game and
solve it using an optimization model.
So that doesn't change things too dramatically, except for
making the game larger and harder to solve.
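As a minimal sketch of what that computation looks like, suppose we have a coverage vector over two targets and two hypothetical attacker types, each with its own payoffs; the defender's expected utility weights each type's best response by that type's probability (all numbers here are illustrative, not from any deployed system):

```python
import numpy as np

def defender_value(c, types):
    # c[t]: probability that target t is covered.
    total = 0.0
    for prob, att_cov, att_unc, def_cov, def_unc in types:
        att_eu = att_cov * c + att_unc * (1 - c)  # this type's view
        t = int(np.argmax(att_eu))                # its best response
        total += prob * (def_cov[t] * c[t] + def_unc[t] * (1 - c[t]))
    return total

c = np.array([0.6, 0.4])
types = [
    # (type probability, attacker payoffs covered / uncovered,
    #  defender payoffs covered / uncovered), one entry per target
    (0.7, np.array([-2.0, -1.0]), np.array([3.0, 2.0]),
          np.array([1.0, 1.0]), np.array([-3.0, -2.0])),
    (0.3, np.array([-1.0, -3.0]), np.array([2.0, 4.0]),
          np.array([1.0, 2.0]), np.array([-2.0, -4.0])),
]
print(defender_value(c, types))
```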
Another way that we've looked at to represent uncertainty is
what we call a distributional
payoff representation.
This takes the approach that
we may not know exactly what the payoffs are,
so we want to be able to model
a range of possible payoffs, and in fact
the likelihood of different payoffs within that range.
So before, for each target
we had just two numbers for each player:
one represented
the payoff if it's a successful attack, and one
represented the payoff if it's not a successful attack.
In these kinds of models we instead have
a probability distribution, which can be
an arbitrary probability distribution, over
possible payoffs for each of these two cases.
On the graph here,
a higher number represents a payoff which is more likely.
So this allows you to do things like saying
the payoff
has, maybe, a Gaussian distribution
centered around a particular value,
with more or less uncertainty depending on
the standard deviation.
Although we
don't restrict ourselves to Gaussians; we can allow
arbitrary distributions over payoffs in these kinds of models.
So now instead of having a point value, we have
a more expressive distribution over possible values.
These games are,
in fact, very difficult to solve.
We've come up with a lot of algorithms that we can use
to solve these games, to find good coverage strategies,
taking into account this uncertainty
about the actual
payoffs in the game.
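One natural way to work with these models is by sampling; here's a minimal Monte Carlo sketch, assuming Gaussian payoff distributions purely for illustration (the models allow arbitrary distributions, and the real algorithms are considerably more sophisticated):

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_defender_value(c, n_samples=10_000):
    # Defender payoffs treated as known here, for simplicity.
    def_cov = np.array([1.0, 1.0])
    def_unc = np.array([-3.0, -2.0])
    vals = []
    for _ in range(n_samples):
        # Draw one possible attacker payoff table from the distributions.
        att_unc = rng.normal([3.0, 2.0], 0.5)    # reward if uncovered
        att_cov = rng.normal([-2.0, -1.0], 0.5)  # payoff if covered
        t = int(np.argmax(att_cov * c + att_unc * (1 - c)))
        vals.append(def_cov[t] * c[t] + def_unc[t] * (1 - c[t]))
    return float(np.mean(vals))

print(sampled_defender_value(np.array([0.6, 0.4])))
```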
And I'll show you
one slide of results here.
This is the defender's expected payoff on the Y-axis,
and we're varying the amount of uncertainty
about the payoffs, that is, the variation between
our payoffs and the ground truth payoffs.
And we have a number of different solution techniques here;
you don't need to know what all of them are.
What I will show you is that these two bottom lines are
solutions that assume perfect information.
They basically take the ostrich approach:
despite the fact that you have uncertainty about the payoffs,
you
just take the mean of the distribution, for example,
and assume that that is
your payoff in the game.
So you ignore
the uncertainty that you have about the payoffs.
And what we find is that those approaches
are quite brittle.
It's okay if you don't have any uncertainty at all,
but as soon as you start adding
some noise into the model,
the payoffs drop off quite quickly.
These upper lines here: the green one
is an approximation
that does sampling in only one dimension,
while the top lines sample and approximate
both over the types
and in the optimization itself.
And we find that these are able to be much more robust
to this kind of uncertainty.
As you add more uncertainty, your payoffs are naturally
going to go down, right?
You can't optimize as well when you have less information.
But
they fall off
much less dramatically, so
if you have a little bit of noise you can still do
very, very well,
and even as you add much more
extreme levels of noise,
you're still doing much better than if you
simply ignored the fact that you had uncertainty.
So the main point here is that
uncertainty
exists.
We have it in our models, and
our models do not perfectly capture the world,
but we do have computational techniques for
accounting for this kind of uncertainty in our models.
We can solve those models, and
we can have much more robust solutions than if we just
ignored the uncertainty.
So both of these are what we call Bayesian games, in that they
have specific distributions
that express the likelihood of different
types of events.
These are great in certain ways, because they are
a very general framework for modeling uncertainty,
and we can get exact behavior predictions based on these models,
but they do have some limitations.
They require this distributional information,
so they require you to specify
a whole distribution over the possible values,
and that can be difficult
to specify.
So there are even more parameters that we need to
get from our data, or from the
experts who are providing the values.
And we can be wrong about what the distribution is.
And
as I mentioned, these models are extremely difficult to solve, so
it's very hard to scale them up. We have to do a lot of
approximation if we want to solve really large models.
So another variation of
a way to capture uncertainty that we've looked at
is using intervals instead of
distributions.
What this allows you to do
is, instead of specifying
a single point value,
specify a range of values,
saying that the value could be
somewhere within this range.
So for example, we could say that the attacker reward
for attacking target one
is somewhere between one and three.
That gives you a larger range of possibilities.
There's less information here:
we don't know anything about
the likelihood of different values within that range in this model;
all we're saying is that
the values fall
between these two extreme points.
So that typically leads us
to do something like maximizing
the worst case that we could have
for any attacker value within
this range.
But it is distribution-free,
and it allows us
to analyze these games. It turns out that they are
much easier to solve. We can actually solve
these kinds of interval security games,
in computer science terminology, in polynomial time.
In practice, we can scale up to
millions of targets, whereas
for the distributional games it may be hundreds.
So this may be the only option for representing uncertainty
if you really have a massive game and you do want to
capture uncertainty about payoffs in some way.
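Here's a minimal sketch of that worst-case reasoning with intervals, with illustrative numbers; to keep it short it ignores the attacker's payoff when a target is covered, so it's a simplification of the real interval models:

```python
import numpy as np

def robust_defender_value(c, att_lo, att_hi, def_cov, def_unc):
    # Attacker's utility for each target if it is uncovered.
    best_case = att_hi * (1 - c)           # attacker's optimistic view
    guaranteed = np.max(att_lo * (1 - c))  # value surely available somewhere
    worst = np.inf
    for t in range(len(c)):
        # Target t could be a best response if its best case beats
        # the value the attacker is guaranteed elsewhere.
        if best_case[t] >= guaranteed:
            worst = min(worst, def_cov[t] * c[t] + def_unc[t] * (1 - c[t]))
    return worst

c = np.array([0.7, 0.3])
print(robust_defender_value(
    c,
    att_lo=np.array([1.0, 0.5]), att_hi=np.array([3.0, 2.0]),
    def_cov=np.array([1.0, 1.0]), def_unc=np.array([-3.0, -2.0])))
```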
And again,
this is a single slide that shows
the payoffs.
The bottom line here is again the
ostrich approach of ignoring the uncertainty that we have:
just using a point value in the middle of the interval
and
saying, okay, we're gonna solve this
using our normal game algorithms.
You can see that
that approach gets very low payoffs.
The other approaches do account for uncertainty;
these are several different algorithms that we have,
with different computational costs.
But all of them do
much, much better, and are actually quite robust
to adding uncertainty into our game model.
So the next thing that I wanted to give you a flavor of
is some of the work that we've been doing
very recently
using learning.
The starting point here
is that in many games
we actually do have
repeated interactions with an adversary.
In our original game modeling we were thinking mostly about
terrorism, and airport security,
and those kinds of incidents.
And unfortunately, there
we're in a fairly data-poor environment, because we
don't have that many incidents; we don't have
long repeated interactions that we can observe.
But there are many other
types of security domains where we do have
lots of data and we are repeatedly interacting
with an adversary over time.
And that leads us to a different
way of thinking about these game models,
where we have a repeated game,
multiple interactions, and
it allows us to learn
about an opponent. It also allows
them to learn about us.
So this has led us to think about
different kinds of game models
where we're using machine learning kinds of techniques
to update
game models over time based on
observed behaviors.
And one of the issues that comes into play,
that I'm gonna talk about, is the need to
explore,
you can also think about this as maintaining situational awareness
in certain domains
to be able to learn about
an attacker's responses and perhaps learn about how
the situation or attacker behavior may be changing
over time.
So I'm gonna motivate this from the context of
a simple
border security kind of scenario.
We've also been thinking about this recently
in the context of network security, where again
there's lots and lots of
probing and attacks that are going on over time that we can
attempt to learn from.
So to investigate
this issue we came up with a very
simplified abstract model
that we think captures something interesting about how
these defender and attacker interactions can evolve
over time.
So this is a resource allocation scenario based on
the idea of there being a number of zones
that you want to patrol, perhaps in
a border region.
In our green security games it could be
regions in a park or something.
So we're gonna model this just as having a fixed number of zones,
and the key difference here is that we're gonna model
multiple rounds of interaction over time.
So we're gonna play this game
repeatedly, and the players are going to
observe what happened in the previous rounds
and be able to update their strategies
for future rounds.
There are going to be
multiple attackers in this case:
many attackers who are making decisions about which
zone they are going to attempt to cross through.
And the defender is going to patrol one zone.
We could generalize that and have multiple resources, but
for simplicity we're just gonna assume that the defender picks
one zone.
And in each round
the defender goes to this zone
and
apprehends the
attackers who tried to cross in that zone,
and does not observe anything
that happens outside of that.
There's an interesting trade-off here between
going to places where you know that there's a lot of activity,
like in the criminology example,
and learning more about what's happening in other places
where you haven't observed activity.
And this is especially relevant in dynamic situations where
the patterns of activity may change over time,
where
the attackers may be learning about
where we are placing resources and
adapting to that policy
over time.
So interestingly, this kind of
scenario maps very
neatly,
in certain ways, onto
a problem which has been studied in machine learning
called the multi-armed bandit problem.
In a multi-armed bandit problem,
the basic idea is that you have
a bunch of slot machines.
You can pull the arm on one of these slot machines
and it is going to give you an uncertain payout.
So this is
motivated by a kind of casino example.
I love the octopus pulling all of
the slot machine arms;
I like presenting this slide just to show that picture.
But the basic problem is
you want to decide which arm you're going to pull
in each round
to maximize your overall payoff
over the course of many rounds.
At the beginning,
you know nothing about the
payoff of each of these slot machines.
The only information that you get is:
you pull the arm on a slot machine
in one round, and you see what result it gave you.
But the results are not gonna be consistent, right? There's gonna be
a distribution of payoffs that you get from each of these machines.
So you have to pull the arm many times
to get a good estimate of
how good this machine is
versus another machine.
The fundamental problem that this exposes is
balancing what we call exploration
and exploitation.
Exploitation is: I want to pull the arm
that, based on the information that I have so far,
is going to give me the best payoff.
Exploration is:
I may want to pull arms that
haven't given me the best payoff in the past
to learn more about what their payoff is
to see if maybe they actually have a higher payoff than
what I've estimated based on my history so far.
So it's a fundamental balance between
learning more about the environment
and exploiting the knowledge that you have.
So
if the attacker in
our simple border scenario chooses zones
with fixed probabilities,
this is exactly a multi-armed bandit problem.
In the more realistic scenario where the attacker can actually
adapt and change behavior themselves, this turns into
a different version of the problem, what we call an adversarial
multi-armed bandit problem,
where the arms
in this bandit problem may actually have their values
adversarially changed
over time.
And there are learning algorithms that have been proposed.
I won't go into the details, but the interesting thing is that we have
algorithms like UCB
and EXP3 that tell us
how to balance
our exploration and exploitation actions such that
over time as we
have large numbers of interactions
we are going to converge to always playing
the best strategy.
And that is true even in the case where we have
an adversary who is actively trying to
minimize our payoffs, in the case of this
EXP3 algorithm.
Just to give you a little bit of intuition about how it works:
basically, you always have
some exploration going on
for the arms that seem to have low values.
And if you observe a change in one of those arms,
you very quickly reevaluate
how valuable that arm is,
and you devote a lot more resources,
more arm pulls,
to quickly trying to
learn what's going on, what has changed.
And so you adapt
very quickly to respond to changes
in the underlying distributions.
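To give a flavor of what EXP3 looks like, here's a minimal sketch of the textbook version, assuming rewards in [0, 1]; the arms would be the zones, and pull_arm the observed outcome of patrolling a zone (the environment below is made up for illustration):

```python
import math
import random

def exp3(pull_arm, n_arms, n_rounds, gamma=0.1):
    weights = [1.0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        # Mix weight-proportional play with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms
                 for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = pull_arm(arm)           # only this arm's outcome is seen
        estimate = reward / probs[arm]   # importance-weighted reward
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights

# Illustrative environment: zone 2 has the highest capture rate.
rates = [0.2, 0.4, 0.7]
w = exp3(lambda a: 1.0 if random.random() < rates[a] else 0.0,
         n_arms=3, n_rounds=5000)
print(max(range(3), key=lambda a: w[a]))  # usually prints 2
```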
So we've done a number of experiments
in simulation exploring these kinds of learning algorithms
in this basic border patrolling scenario.
These lines represent
both the
pure learning algorithms,
so EXP3 in particular is this blue line here,
and a number of other
learning algorithms that are combined
with some prior information from an equilibrium analysis.
So basically, we're taking an estimated game model,
using that as a starting point, and then
learning and updating that game model over time.
And the interesting thing that we find from that
is that
we get the advantages very quickly;
we don't have to have the sort of slow start
of a pure learning algorithm.
We can take advantage of the information that we know
immediately,
but then we also converge to a high value.
And we've also done experiments where
we've looked at things like
injecting changes.
So the adversary's
payoffs or behavior change
at different points.
And these learning algorithms are quite robust
and are able to
detect that
and quickly rebound to higher values.
So the last topic on my agenda for this session
is talking a little bit about evaluation.
You've seen a few slides throughout about evaluation,
but I want to give you a little bit more of a big picture
about some of the kinds of
evaluations that have been done, especially for the deployed systems.
And evaluating these systems
is not easy.
It is quite difficult to
get access to the data, and it can be
difficult and expensive to perform experiments.
But we've done a number of different kinds of evaluations
to try to assess
how well these systems
are doing at optimizing limited security resources.
So the core question is:
all these methods that we've presented,
using this methodology of security games and modeling
these scenarios with the tools of game theory,
are they superior?
Do they actually show advantages in comparison with
the relevant baselines?
Two relevant baselines here would be
human schedulers, the human experts
who have often been doing these scheduling tasks
prior to implementing these systems,
or other
benchmarks,
such as simple baseline randomization strategies,
randomization strategies that are not
fully optimized.
And we've done a number of different types of evaluations.
The first step would be
lab evaluation:
we can do
evaluations against simulated adversaries,
and you've already seen
some examples of that.
We can do lab evaluations against
human subjects, as in the behavioral experiments.
The next step would be
doing field evaluations, experimenting with
the systems in the field and looking at the
patrol quality.
We can look at the unpredictability,
the coverage kinds of properties that these patrols have
in comparison to
what was done before, so we can do comparisons with
real schedules.
We can run scheduling competitions,
where we try different solutions against each other
and look at the results.
And then we can also have experts look at these
schedules and try to assess which ones
are better and for what reasons.
And finally,
the gold standard in some sense would be
seeing how the system works in the field.
We can try to
see how well these systems do against
mock attackers or red teams.
We can also
look at what actually happens
when these systems are deployed:
what data we are able to collect
and analyze about capture rates of
actual adversaries.
Now, the amount of data that we have
is gonna be very domain dependent, so
I'll show you a couple of examples of that.
So
why does game theory
perform better?
Human schedulers
have predictable patterns;
I'll show you an example of that on a slide in a moment.
But they also require a lot of effort,
a lot of cognitive burden,
a lot of time that goes into scheduling.
That is a downside compared to just being able to
hit a button and let the computer do the scheduling.
If we compare to simple random dice-rolling techniques,
uniform random kinds of strategies,
those repeatedly fail in deployment.
They're susceptible to things like sending officers
to very sparsely populated terminals,
and
nobody wants to be
protecting an empty room; you wanna be
where the action is
and where the actual risks are.
We can also think about
weighted random kinds of strategies, but
what is the right kind of weighting?
You can look at different types of weighting, but they are all
reliably inferior to
the optimized weightings of the game theoretic models.
Let me
present a few slides.
There have been many more experiments
than these,
but these are some examples of the different types of
evaluations that have been done.
So this is
a simulation experiment
with the IRIS system for [inaudible], where we
compared
two different types of weighted random
strategies, as well as a uniform random weighting,
against the IRIS
system,
across
different sizes of schedules,
looking at the defender expected utility.
And we found, using the actual
payoff values that were
entered into the system by the Air Marshals, that
the defender expected utilities for the IRIS system
were dramatically higher than
for the other strategies.
But that's just an experiment
in simulation.
This is a field evaluation that was done
of the schedule quality for the PROTECT system.
We haven't talked in detail about the PROTECT system;
it's similar to IRIS
and ARMOR.
This was done for the Coast Guard,
so this is helping to schedule
and randomize patrols for
the Coast Guard first in Boston,
and that's where this data comes from.
And they actually
thought really carefully about how to test the system
before it was deployed,
so they did pre- and post-analysis of what their patrolling
schedules looked like
before PROTECT was deployed, based on
what was being done by the human schedulers,
and then after
PROTECT was deployed.
And this is across the seven days of the week.
These are lines
representing different targets that had different values,
and
the Y-axis here is the count of the number of times
that these
targets were patrolled on different days.
There are
two things to notice
in this graph.
One is
this dip on day two.
For some reason,
there's a lot less
patrolling activity going on on day two.
If I'm an attacker, I'm definitely thinking about
planning an attack for day two, rather than any other day
of the week.
And there's also a lot of
crossing in these lines; there's a lot of inconsistency,
where
some targets are being patrolled more on certain days
and less on other days,
and so forth.
This is what the graph looks like,
same process, same data collection,
after PROTECT
was
deployed.
You can see
we no longer have
this dramatic
weak point in the patrolling.
It's evened out:
there's a much more consistent patrolling pattern across days,
and you can also see that the lines
are much more even across the days.
And what that reflects is
that the targets that have the higher
risk values are being consistently
patrolled more frequently,
and there are no variations in that across the different days
that could potentially be
exploited.
And when they calculated the expected utility for the defender
this resulted in a 350%
increase in the expected utility,
driven by the fact that
before PROTECT
a rational, intelligent attacker is going to
exploit the weak points in these patrolling strategies.
Another evaluation,
of the IRIS scheduling
system, was done by the Federal Air Marshal Service.
They actually did
a version of this test internally, so we don't have the
actual data to present, because it
is sensitive.
But they did exactly this test: they
had their
previous human scheduler schedule
the flight patterns for the [inaudible],
and they compared that to the IRIS
flight patterns. And over
a six-month time period of looking side by side at these schedules,
they determined that the IRIS system was
scheduling more effectively.
And there's actually a [inaudible] report that talks about that
analysis that was done.
This is another
system, TRUST, that we haven't talked about
in great detail. This is a system
that was developed in coordination with the Los Angeles Sheriff's Department for
optimizing patrolling on the metro system
in Los Angeles.
And one of the interesting things about that system
is that we actually get a lot more data, because
they're doing things like
checking for fare evasion,
which happens much more frequently than terrorist attacks.
So they're able to do some
systematic tests.
This is one of the results from
a test: an expert evaluation
comparing the
game theory schedules to human-generated schedules
on these trains. They had experts fill out
questionnaires evaluating these schedules,
looking at whether
they made sense in different ways.
And the game theory solutions were consistently evaluated as
better than the human schedules.
There have also been some field tests done against
red teams, or mock adversaries.
Again, one of the examples of where this was done was the PROTECT system.
They had
these sorts of mock attacker
teams who actually went out and
observed what was going on in the Port of Boston
pre- and post-PROTECT.
And their evaluation was that deterrence
improved.
There were also some interesting real-world indicators.
One of the chiefs there
in Boston,
after PROTECT had been deployed,
started getting questions
about whether or not the Coast Guard had
recently acquired more boats.
Of course they hadn't; they were using the same number of boats. But
because they were appearing at different times and
different places than people were used to,
the perception was that they were
all over the place, in
more places
than they had actually been
patrolling before.
Another example of a field test against
real adversaries, this time again with the TRUST system:
they did a controlled experiment
evaluating
the game theory solution versus random patrolling
on the metro rail system, with 21 days
of patrolling
under identical conditions.
So, a true controlled experiment: we have system A
and system B,
and we're trying to
match them up under conditions that are
as close as possible.
And they also did a version of
what's called random plus human.
That is a randomized schedule,
but one that the humans are allowed to modify at will.
So
the humans can say, 'This action doesn't make sense.
We'd like
to go here instead.'
It's injecting some randomness, but
it's really
human scheduling as well.
And here are some of the data from that experiment.
They were looking at things like
fare evasion attempts on the network,
as well as other kinds of violations.
This is
the number of captures, the number of warnings,
and the number of violations in
an average thirty-minute period,
comparing the game theory
optimized solution
to the solution where
there was a randomized schedule
that the humans were allowed
to modify in whatever way they wanted.
And what we observe is that
in all of these different categories, the game theory solution
is resulting in better
detection rates
than the other schedules.
There are also results and data that we can collect,
not from controlled experiments, but just by observing
what happens.
That was the first kind of data that was collected
at LAX:
they looked at
drug apprehensions and
different kinds of behaviors that they were detecting
at the checkpoints before and after
they deployed the ARMOR system.
And what they noted in the data,
again, this is not a controlled experiment,
but there was a pretty clear jump in the
detections that they were getting
after they deployed the ARMOR system.
And finally,
again in the category of expert evaluation,
several of these systems have been deployed
consistently over years at a number of these
different agencies.
They're not forced to
keep deploying these systems, they use them because
they're viewed as effective
and
useful additions to the security policies.
And these are a number of
different awards and commendations that the team has
received over time for
successfully deploying these applications.
And there's a lot of appreciation, I think, on both sides for
the successful partnerships that
have been necessary
to get these applications deployed.
So, that's all I have for now,
but I'd be happy to take any questions.
[Audience Applause]
