>> Okay. So, we're going
to start the next session.
We're starting this session
with a feature talk.
This is an invited talk from
a local researcher
practicing in the field.
Today, we're going to have
Mike Dodds from Galois,
talking about Continuously
Verified Cryptography.
>> Hey. So, I'm a Principal
Scientist at Galois.
So, I wanted to just to
start out by talking
a bit about what
Galois is because I
feel like we don't have
as much visibility
in this community
as I would like.
Galois is a research
and development lab.
We started out in Portland.
We've got sort of
three locations.
There's about 70 people there.
There's a really
ridiculous number of
PL people in Galois
at the moment.
I think it must be
sort of 40 or 50.
It's really a very large
number of people,
and we are like a PL meets
applications shop really.
So, lots of programming
languages, lots of analysis,
lots of verification, lots
of security and crypto,
and we tend to use
these kinds of tools:
symbolic execution
and model checking.
Our history is really in Haskell.
So, our founder was a guy
called John Launchbury.
He was like very influential in
the early design of Haskell.
But we also branched out
into all sorts of other
programming languages as well.
So, and just to give you an idea
of the people who work there.
My background is, I was
a professor in the UK,
mostly interested in separation logic,
concurrency, and relaxed memory.
So, I suppose I'm going to be
completely honest with you:
we're recruiting at the moment.
So, you guys should apply.
But anybody who would
write a POPL paper,
or a PLDI paper, or a CAV paper,
is totally welcome at Galois.
We do a lot of that research.
Sales pitch aside,
I'm going to talk a bit about
continuous verification, which
is a very cool project,
which actually, I was
mostly not involved with.
I'm going to talk about
other people's research today.
So, actually the people
who did this
were this awesome group of
people. I think there are
a couple missing off here
because I couldn't find pictures.
Particularly, Byron Cook over
on the far side there
and his group at
Amazon, and a whole
bunch of people
at Galois, including
Aaron Tomb, who is here
today, and Eric Mullen, who did
some awesome stuff and who is
around here somewhere.
Let me just kinda give
a bit of context.
What I find very exciting is
that static analysis tools
are starting to get out there
into the real world and
people are starting
to use them in
software development workflows,
and that's very, very cool.
Google started to use
a tool called ErrorProne,
they run that on
pretty much every commit
that goes into their repository.
Then Facebook also
have this magnificent
tool called Infer,
which I'm very happy to say
is a separation logic tool.
It makes me very,
very sort of warm and
fuzzy inside to say that.
Amazon have this
as well, which
is what I'll talk about today:
they're using a tool called SAW.
This is really part of their
code quality process.
>> Microsoft has been doing
that since 2000 [inaudible].
>> I'm sorry.
Yeah, and Microsoft is here.
I suppose the thing I wanted to
distinguish is really
that they're doing
it as part of
their DevOps process
and I don't know how much
Microsoft does that.
But let's talk about that.
Okay, I feel like.
>> Twenty years ago.
>> Okay.
>> Sorry. That's
understandable. I apologize.
But the motivation
here is really,
they want to sort of
improve code quality.
They want to, and so
there are lots of
things that companies do
to improve code quality.
They have revision control
and testing and peer review,
and big companies are just
starting to use static analysis
as part of that pipeline.
That's very cool like
programming languages ideas,
analysis ideas are
getting out there.
The thing that
Galois got into was
the Amazon verification story.
So, this is something where we're
doing proofs of
correctness on
core infrastructure and
we're doing it using
a symbolic execution
tool called SAW.
Now, I'm going to talk about TLS.
I don't know that I need to explain
to you why TLS is important.
But we're doing something on
an implementation of TLS.
This is a bit different
in flavor from Everest.
Because instead of
having a
from-first-principles
implementation of TLS,
we're going to go to Amazon's
existing TLS implementation
and prove things
correct about it.
Amazon's TLS implementation
is a new
implementation, but it's
not designed particularly
for verification.
It was really inspired by
all the horrible vulnerabilities
that people have been finding.
So, it dropped some
of the insecure and
less secure features.
It's much smaller.
For example, OpenSSL is
70k lines of code and s2n is
only 6k. The component
that I'll talk about
verifying today is HMAC,
which is the keyed-Hash
Message Authentication
Code implementation.
So, HMAC is really
the thing that provides
the signature for
the message when you're
sending it over TLS.
It gives you authenticity
and integrity and actually
the really nice thing
about HMAC that makes
it very amenable to
verification is that
the specification
itself is really tiny.
This is pretty much what
you see in the RFC.
The RFC is like a bunch of
boilerplate discussion
and there's like
one line of math which
tells you what HMAC does.
So, that makes it very
good for us; that means
we can write it down.
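To give a sense of how small that one line of math is, here is a rough
Python sketch of the standard RFC 2104 HMAC construction, instantiated
with SHA-256 (this is my illustration, not the Cryptol spec from the talk):

    import hashlib

    def hmac_sha256(key: bytes, msg: bytes, block_size: int = 64) -> bytes:
        # H((K xor opad) || H((K xor ipad) || m)), the one line from the RFC.
        if len(key) > block_size:
            key = hashlib.sha256(key).digest()   # long keys are hashed first
        key = key.ljust(block_size, b"\x00")     # then zero-padded to the block size
        ipad = bytes(b ^ 0x36 for b in key)
        opad = bytes(b ^ 0x5C for b in key)
        inner = hashlib.sha256(ipad + msg).digest()
        return hashlib.sha256(opad + inner).digest()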
So, what we'd really
like is to sort of
take this HMAC
specification which is
concise and easily auditable
and we'd like to connect it to
the C HMAC implementation,
which is fast and interoperable.
It's also got some
structural differences
from the way the HMAC
is defined here.
So, here's the summary
of the approach.
We're going to write
a formal specification.
We're going to write some
scaffolding to bridge that gap.
Then we're going
to use some tools.
Then, increasingly, we're
going to integrate that into
the CI development environment.
The first step is to write
a formal specification.
It turns out that,
because we've already
got a specification
language for crypto,
we can just write it down
pretty much straight off.
Here's roughly
the formal
specification, and here's
the specification in Cryptol, which
is Galois's specification language,
and you can see that,
modulo syntax,
they're pretty much identical.
That means that we can use this.
We can also use this to do
things like generate tests.
We could use it to
synthesize code.
Those are two things
that we've done
with Cryptol in the past.
But in this case, we're using
it to verify existing code.
Now, we want to connect
the HMAC specification
to the C HMAC.
That's more of a
challenge because
we really want to
know that these two things
really do the same thing.
We're going to have to build
some scaffolding to bridge
that gap because there's
too big a gap to just
do this in one step.
You can think of
this as kind of part
of the proof process.
We're going to do this by
layers of abstraction.
We're going to have
high-level crypto code.
Then we're going to have
some low-level crypto code
that sort of mirrors
the structure of
the C code and then
we're going to have
some production s2n code.
When we get to this point,
we're going to
incorporate s2n data
structures and the s2n
API, and we're going to omit
some gnarly features like
pointers and memory allocation,
and low-level optimizations.
>> So, once you've
got this pipeline,
once we've built
our high-level spec
and low-level spec and
connected them to the s2n code,
we're then going to want
to actually show
that those two things are
related to each other.
So, the tool we're going to use,
is something that Galois had been
using for a very long time,
which is a tool called
The Software Analysis Workbench.
It's really a
symbolic execution tool.
So roughly, you can
think of it as saying,
you take one artifact
and another artifact,
and you turn them
both into SMT terms,
and then you compare
those two things together.
You use symbolic
execution to build
those SMT-comparable things.
And then, we can just
check it in an SMT solver.
And then, I should say,
by the way, one way
we get away with making
our lives easier with
this crypto code
is that in this case all of
the crypto code doesn't
have loops in it.
So, we're just talking
about loop-free code.
So, we check it with SAW
and it tells us whether
it's correct or not.
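As a toy illustration of that idea, not SAW itself, here is the same shape
of check written with Z3's Python bindings (my example): turn a "spec" and
an "implementation" into bitvector terms and ask the solver whether any
input can tell them apart.

    from z3 import BitVec, LShR, Solver, unsat

    x = BitVec("x", 32)
    # "Spec": rotate-left-by-7 written the obvious way.
    spec = (x << 7) | LShR(x, 25)
    # "Implementation": the shift replaced by a multiply, as an optimizer might.
    impl = (x * 128) | LShR(x, 25)

    s = Solver()
    s.add(spec != impl)            # ask for any input where the two differ
    assert s.check() == unsat      # unsat: no such input, so they're equivalent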
Then,
once you've done that proof,
we want to integrate it into
the development environment.
So, that means that
we want to have
continuous integration.
Now, why do you want
continuous integration?
Well, you want it so that
nobody makes any mistakes,
so that no new mistakes
get introduced
into the system.
So, we've set it up so that
the proofs run automatically
on the code changes.
So that means that when you
commit a piece of code,
then you can't commit it to
the repository if
the proofs don't pass.
And the proofs are
independent of the C code,
they just depend
on the interfaces,
and that's because you're
compiling both the code
and the specification down
to this SMT representation,
which you then basically,
you're using the SMT to compare.
The proofs are reasonably
easily adaptable.
I mean, this is obviously
somewhat variable,
but we're not talking
about having to do
a Coq proof in order to
get this to go through.
If the function body changes,
it's very likely
there are going to be
no changes to the proof,
and that's because
you're just changing
the internal
behavior of the code.
If the interface changes,
you're going to have
to change the proof,
but it's probably not
going to be too big.
And if the code
structure changes,
there are going to be some small
changes around the edges.
And then, and
this is very, very nice,
we integrated into
Amazon Web Services' CI system.
So, what that means
is that every time
you commit something to s2n,
it runs the build, and
the build tells you
whether it passed the tests.
So that means that,
what are we on?
like 900 commits?
No, not that many.
How many commits are we
on now? I don't remember.
>> I'm not sure.
It's definitely
around 1,000
probably or something.
>> Yeah, it's around 1000
commits or something. So now.
>> Let's go ahead and put you
through Travis jobs fast.
>> That may not be, that may
just be for the whole
of s2n though.
Yeah. All this stuff runs.
It means that if you tried
to commit something
that would violate
one of these proofs,
then you would
get an error and you wouldn't
be able to commit it.
>> Quick question on
the previous slide.
>> Yes, sure.
>> So when you say
proof independence,
you don't have
loop invariants or anything.
So, when the structure
of the code changes,
if you have a loop invariant,
you might need to change it?
>> Yeah, that's
right. Yeah. So, I
mean, if you wanted to.
The idea with this whole thing
is to like focus on
really core critical code
and then just
like simplify as many of
the issues as possible.
But there are
some situations where,
so SAW doesn't really do loops.
>> So it's just doing.
>> It's just doing pathways.
Yeah, exactly.
Yeah. So, the reason that we say,
actually I'll talk about
this in just one second,
because actually
the whole pipeline
is a bit more complicated
than what I mentioned here.
We have a CAV paper,
which will be at CAV'18.
So down this end,
we've got proofs with
SAW, mostly automatic.
At this level here, we've
got the incremental API.
So, what that means is it's
got a bit more structure,
and it's actually allowing
you to do things like
keep on adding messages
and then finishing up.
So to do this connection
between the monolithic
and incremental API,
we need a little bit of
Coq to do the induction,
and then up here,
we've got proofs of
indistinguishability from random.
And to do that, we
have a connection
between our high-level
specification
and some work that
Lennart Beringer and other people
did on proving
that the Cryptol specification
of HMAC is actually correct,
in terms of its high-level
security properties.
So we go ahead and just do this.
We did the DRBG, we
did the TLS handshake protocol,
so a state machine proof,
and we're working on some of
the rest of these things.
There are proofs in progress
for various other pieces.
So basically, the work
is ongoing.
I mean, I think that Amazon,
the way that we've approached
this is that we want to
pick the high-value cases first.
And we are about
to get kicked off.
You're making
an ominous face, I see.
Anyway, yeah,
this thing is feasible,
which is very exciting,
and it's also something
which I think is becoming
more accepted.
It's something which is being
applied in contexts
which aren't Microsoft,
who are of
course ahead of
the curve on this.
It's now being
applied in industry.
So it's there's lots of
exciting stuff going on
in this area, I think.
>> Okay. Well, let's
thank the speaker.
It's great to see these
back to back talks,
really looking at the theory
and the reality of how
to do this at scale.
It's really wonderful
work. Let's have
a couple of questions. Nick.
>> How do you boil the TLS
handshaking into loop-free code?
>> Aaron, do you want
to talk about this?
>> So, it's just proving
that there's one function
that handles the single
state machine transition,
and we're just proving
that that always takes
a state to a legal
successor state.
So, that function itself
doesn't have any loops in it.
It just gets potentially
called from a higher
level thing it does.
>> And what do you prove
by the whole handshake?
>> So, we're just
proving that that
function matches
the handshake machine.
So, we don't prove
anything outside of that,
so far. We might in the future.
>> One other question, yes.
>> What is the business case for
Amazon to implement something
like this in their pipeline?
Is it quantified in
terms of number of
bugs that are
causing downtime or.
>> I can't speak for Amazon.
I think that what I
hear from talking to
people in industry
is that they are
primarily interested in avoiding
bugs rather than
increasing assurance.
I think that there's a lot
of people in the industry
who are interested in like
moving faster as well.
To coin a phrase,
they want to reduce
the amount of
assurance work that they
have to do for these systems.
And you can imagine that
for something like TLS,
you would like to be
able to develop it
rapidly or rapidly patch it.
If you're going through
the normal assurance process,
you'd have a very long process
to be sure that
the system was correct.
But I think that these tools
are really about
allowing people to build
more confidence in
the way they commit.
Which is an interesting switch of
mindset between I suppose
the maybe safety critical
applications of formal methods,
where really
the whole point is to
achieve a very high level
of assurance.
And somewhere like, I don't know,
somebody like Facebook
or Amazon or Google,
where really what
they want to do is do
as little assurance as possible
but still achieve
good quality code.
So, I think that there's
maybe a difference in mindsets,
but I think that's
my broad understanding
of why these tools are
getting more uptake.
>> Okay, and with
that, we'll have
to defer more questions,
but let's thank
the speaker again.
All right our next speaker is
Sarah Chasins, from UC Berkeley.
She's a little bit far,
not exactly local,
but she collaborates with
[inaudible] and she's going
to be talking about Helena,
a web automation language.
>> Hi everybody, I'm Sarah.
I'm going to go ahead and
actually start by talking about
how many people want data that
they can collect off the web.
And I think it's a ton right now,
but I think it's going to
be even more going forward.
And in addition to
this difference in scale,
I think the composition
of the folks who are
interested in this is
going to be changing.
So right now, the people who have
the tools to access web data,
it's mostly going to be
people who can code.
But going forward,
I think a lot of
non coders are getting
interested in this stuff,
and I'm going to go
through a couple examples.
So, these are all examples from
our current social
science collaborators.
Basically, these are
questions that they're asking
based on information that
they can find on the web.
There are a lot of
different reasons that people,
I'm focusing on mostly
are social scientists,
and there are a lot of
different reasons why
they might want to
collect these large
data sets from the web,
but there's not a lot of
support for them to do this.
So, we're breaking
it down here into
what the coders can use versus
what the non-coders can use.
There's a ton of stuff
in that coders box,
all the web scraping
libraries that you might
already be familiar with.
For non-coders, the options
are more limited.
Over here, you really
need to be able to
actually reverse
engineer that webpage.
You have to understand
the DOM internals.
You have to understand
how the web page is
actually interacting
with the web server.
That's the unifying
thing about all of
these libraries and we can't
expect non-coders to do that.
So, for non-coders,
you have some options.
You can hire someone
who's actually going to
sit in front of that browser
and copy and paste stuff,
sounds crazy but it
actually happens.
You can hire a coder
to use one of
the tools from the other box,
another thing that
definitely happens,
but it will require
some more resources.
Or you can use some of
the recent programming
by demonstration tools that
have come out, and in particular,
I'm going to be
focusing on the tool
that we've built called Helena.
Basically, this is the structure,
and demonstration goes
in, program comes out.
Maybe the user interacts
with the program,
does some edits, and then they
can use that to
collect their data.
We'll do a really
quick video demo here.
Basically, the idea with Helena
is that you are going
to demonstrate how you
would collect the first row of a
relational or a
multi-relational dataset.
So, here we've opened up
our Chrome extension.
We're going to gather
information about
the first author from
a list of authors.
Then for that first author
we're going to grab
information about
the first paper in
that author's list of papers.
That gives us the whole
first row that we want.
At this point, Helena is looking
at the web pages and
trying to figure
out if there are
any relations for which
we might want to
repeat some things.
It's noticed that we interacted
with the first author and it's
decided to add the loop
to go over all authors.
It's going to do
the exact same thing for papers,
at which point we
have the full script,
we can go ahead and run it,
and we're going to
watch it actually
happening over here
in the web page.
On the left we'll start to see
the data as it gets collected.
So, at this point, now
it's collecting it.
That's how we write
a web scraper by
demonstration and
we can use this.
If we let this keep going,
this is going to run for
three million rows.
These are really big
data sets that we're
collecting based on
really a pretty quick
programming example.
Hopefully, I can make
the clicker work to get
us to the next slide.
We gave this to
some programmers because we
wanted to see if this had
some advantages over more
traditional web programming and
we couldn't exactly bring
in our social scientists
and ask them to use Selenium.
We compared the more
traditional web programming
language Selenium
with Helena and it
turns out, yes,
with Helena we're getting
time from first use
to successful task in
about six minutes,
and it turns out if
you are trying to use
a more traditional web
programming language
you're probably
going to time out at
60 minutes unless you
have prior experience.
We're getting a lot
of advantages.
Basically though,
we then want to go
from just that first output draft
of the script to how can we
make these scripts which
are going to run for a long time?
I mentioned three million rows.
That's going to run
for hours and hours.
We have to make them robust.
We have to make
them actually fast.
We're going to ask
these end-users
to do some things
that we wouldn't
normally ask from the users of
an end-user programming tool.
In particular, I'm going to
be talking about skip blocks.
I'm going to start with
a motivating example.
We're working with
a team of sociologists,
and what they want to do
is every single night,
scrape all of the apartment
listings on Craigslist in
order to give this data to
the City of Seattle. They
have a contract with them.
We were wondering, what's
going to be the issue?
What's going to be the problem?
We thought it would be
some subtle algorithmic thing.
No, it wasn't anything like that.
It was that they
had an undergrad running this
every night
on his home laptop
connected to his home WiFi.
In the middle of the night
the WiFi would go out, and in
the morning he'd have
to restart it and it would
go back and do
all the same work again.
So, that's pretty annoying.
Then it turns out that
even when he wasn't
getting these WiFi failures,
we were still repeating
a lot of work just
because of how much churn
there is in this data.
So, when you load up Craigslist,
it's starting at the beginning of
all the results and then it's
indexing into the current list.
So, by the time you get to
page two of your
Craigslist data,
the things that were at
the end of page one,
if there have been
new listings, those have
already been pushed onto
the head of page two.
You might end up going
back and rescraping those,
and that's pretty silly.
We don't really want
to do that. We were
wasting hours on that.
So, what we're going to be
talking about here are
problems like these,
problems that we as
the scraping script
can't actually prevent.
The network's going
to go down sometimes.
The computer that
you're running it on is
going to crash sometimes.
You're going to have
to be able to handle
failures gracefully
one way or another.
Then there's also data changes.
We can't control what
the web server is going to do.
If it wants to change the data,
it's going to change
the data and we just
have to be able to handle that.
Then, there are
a couple other final
interesting problems
along the way.
What if we want to do some
longitudinal data collection?
We might want that
to be incremental.
If we've scraped
three days worth of
data this week and it took
three days to run that script,
we don't want it to
take another three days
to collect the new data
a week from now.
Let's skip over the ones
that we already
did and just get what's new.
Also, sometimes
these are just slow.
We've had scripts that
will run for a week.
That's pretty long, we
might want to parallelize.
Then it turns out
also that sometimes
rate-limiting that websites
do is based on IP.
At that point, you might
want to distribute it
across multiple machines and
this is an interesting
and fun side problem.
These are the things that we
want to be able to handle.
In fact, we want our end users
to be able to handle it.
We have a lot of constraints
on what they can do,
but we also want to make
these really robust scripts.
Basically, even though these
look like really
different problems,
all we really want
to be able to say is
just don't repeat the same
stuff that you've already done.
It's complicated to say
what is the same stuff
because as I just said the data
is constantly
changing on the web.
You can't expect it
to remain the same,
you might get a different
set of objects,
you might get objects that
have their attributes changed.
How do we actually
say what's the same?
What we're actually going
to do is ask the user to
introduce a construct
called the skip block
and that will be used to
actually tell us how to tell
whether two objects
that we are seeing in
different places are in
fact the same object,
and also, to associate
the code that
will actually operate
over that object.
So, basically, the idea is
that we will keep a commit log.
And if we've already
seen an object that
has this key attributes that
tell us it's the same object,
that means that we
can go ahead and
skip it the next time we see it.
If we have never seen it
before, then we'll go
ahead and execute
that associated code.
The good thing about this
is that the user does not
have to reverse engineer the DOM,
the server interactions,
any of that stuff.
They're just reasoning about
the output data which
they already know
about and think about.
Basically, here's the text-ify-ed
representation of that
same authors and papers
program that we
talked about before.
Let's go ahead and add
a skip block into this.
Here, what we're saying is if
you have already seen
an author that has
the same name and
the same institutions and
author we've seen in the past,
that means you can go
ahead and skip it.
That's just assume
that's the same one,
don't bother to do
that code again.
So, basically, here are
the key attributes,
the author name, the
author institution.
That's what we're using
to tell us whether it's
the same object and then we have
the block of code that
actually operates on it.
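As a rough sketch of the mechanism (my own Python paraphrase, not Helena's
actual implementation), the commit log is just a durable set of keys
derived from the user-chosen attributes:

    import hashlib, json

    def object_key(obj, key_attrs):
        # The user-chosen key attributes decide what counts as "the same object".
        return hashlib.sha256(
            json.dumps([obj[a] for a in key_attrs]).encode()).hexdigest()

    def skip_block(obj, key_attrs, commit_log, body):
        k = object_key(obj, key_attrs)
        if k in commit_log:        # already seen, in this run or an earlier one
            return                 # skip: don't redo the scraping work
        body(obj)                  # otherwise run the code associated with the block
        commit_log.add(k)          # record it; the log is persisted across runs

    # Hypothetical use for the authors-and-papers script:
    #   skip_block(author, ["name", "institution"], log, scrape_papers_for)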
This is a durable log.
This is not just a
per run a situation.
So, when we come back in
a week and try to
re-scrape the same author,
we will know to skip over it.
This "in any run" that
I've bolded here,
that's the default behavior;
obviously that's something that
we might want to manipulate.
So, that's a fun and
interesting side question.
Basically, how we tackle all of
those disparate problems
with this one thing.
So, extrinsic
failures, basically,
if the network goes down
and we have to restart,
let's just skip over
all those ones we already did.
Pretty straight forward there.
For data changes, same deal,
if we're getting
repeated objects because of
how much Craigslist is
updating its fresh data,
we can just skip over
that when we see it.
For longitudinal scraping,
this will automatically
incrementalize.
So, everything that
we scraped this week,
we won't have to scrape it
again when we come back
to it a week later because
it's in the commit log.
If it's just slow, each
of these skip blocks,
each of the individual
objects identified,
actually ends up being
an independent sub-task,
so we can distribute that across
parallel workers, and if
we're handling CAPTCHAs,
then we can just do that
across different machines.
There are a lot of
fun design questions
about how we should
do the skip blocks.
I'm not going to be able to
tackle most of these today,
but definitely come talk to me.
I think one of the really fun
ones is how should we split
those independent
sub-tasks across
a bunch of different parallel
or distributed workers.
It's a really cool
design problem.
I'm going to go really quickly
over a couple of results.
This ones mostly just
interesting because it
highlights just how much data
changes on the web.
It turns out that
just in a single run,
if you add these skip blocks,
you can get up to
about 2x speedup.
Just because there's
that much new data coming in
all the time. I thought
that was interesting.
This is what we see
if we parallelize.
You can see we're getting
pretty close to the ideal line
with the exception of
this Twitter benchmark
which ends up being CPU bound.
Turns out if you load a
thousand tweets
into a page, it gets
to be a really big page
and it's a lot to process.
But if you actually
distribute that across
multiple machines then you get
right back to
that ideal line so it
gets to be exactly what
you expect and want.
We also looked at whether
our end users, people who identify
as non-programmers, actually can
use this, and the answer is yes.
So, we kept the reasoning
for this construct at
such a high level that end-users
can learn how to use it.
People who tell us that they're
non-programmers can
learn how to use it in
seven minutes and they can
add an additional skip block,
each new skip block
that they might add to
their program in
about a minute, 61 seconds.
So, we're very satisfied
with those results,
and with that I would love
to take any questions.
>> We have time for questions.
>> Seems like skip blocks are
a generally useful concept.
How would you put this into
a more general purpose language?
>> Yes. We spent a while
thinking about how we would put
this in a Python
script,
if they're using Selenium
or some other library
to do their own scraping
in a more traditional way.
We looked at how
we could do that,
it turns out it's not that hard.
I mean, it's a pretty
straightforward concept
as long as you've
hooked up everything
you need for keeping
that durable commit log.
You're pretty much okay.
>> Who are you partnering with?
You said that you
were partnering with
people that are non-programmers.
>> Yeah, we have about,
at this point it's between 8 and
10 teams of social scientists
who are collecting
a bunch of different data
with us. It's really fun.
>> Has that shown up
in the publications?
Do you have an experience
report related to it?
>> Those benchmarks
that we showed were
basically tasks that we took
from their target tasks,
but we haven't done
the experience report.
I think there might
be a question.
>> Is the skip blocks a form
of computational thinking?
I mean when you talked
to non-programmers,
are they able to reason
about skip blocks the same
way a computer scientist
or a programmer would?
What has been
your experience with it?
>> I don't know if it's
exactly the same way.
So, the way we explained it in
basically the
tutorial that we gave
them, that took them
seven minutes to go through,
was to think about, if you were
seeing the output row
that represented each
of these objects,
what would you use to
decide if two of
them were the same?
So, that was how they
were thinking about it.
I don't know if
that's the same way
that a programmer
would think about it.
That is a really
interesting question and I
would love to study that.
>> Question here.
>> If you're dealing with
CAPTCHAs on the back end,
this is probably semi-
adversarial relative to
the websites and so forth.
So, you're hitting
scaling problems.
Do you have a way to
basically essentially fire
that through them [inaudible]?
>> Yeah. It turns
out it's really easy to put
this on Amazon Web Services.
>> Okay.
>> Yeah, we do.
>> That works out well?
>> It works out beautifully.
>> Just to figure out
how to operate in that.
>> So, there is more
of this setup work.
I think it's definitely
going to be easier to
tackle if you are
a programmer. Yeah.
Getting the AWS UI to
work, that looks tough.
>> Cool.
>> Do you have a question?
>> I was just going
to say, do you
have any protections to
prevent people from,
who are not programmers using
the AWS to DDoS [inaudible]
>> I mean, as with
any programming tool,
this is going to give people
the option to write
programs that are bad.
>> Also, DDoS their
Amazon Web Services credit card.
>> Yeah, that's true too.
>> All right. With that I guess
I'll say thank you again.
Okay. That's it. All right,
so I have the pleasure to
introduce our next
speaker, Bill Zorn.
He's going to be speaking about
a concept called Sinking Point.
I think he's getting
his laptop to work.
There you go. Okay, he's
going to be talking about
Sinking Point and I guess
we'll find out what that is.
He's from UW, and he's
also my son, so yeah.
>> Thanks dad. So today,
I'll be talking
about Sinking Point.
This is a project that I'm
working on with Dan Grossman,
who's my adviser, and
also Zach Tatlock.
I think he's actually
not credited in
the program because advisers
have been changing,
but anyway let's
get right into it.
So first, let's talk
a little bit about
good old IEEE 754 floating-point.
It's fast, it's portable.
It's great because it means that
just about any
programming language
these days is at least as
good as your TI-84 calculator.
IEEE 754 is completely specified.
Each operation is
correctly rounded,
which is to say we
tried to do it with
as much precision as possible and
usually this leads
to behavior that is
close to the behavior
that we would
expect from real numbers,
but sometimes not, right?
It's a very well-known sort
of challenge with floating-point
that reasoning about
when the behavior is
like real numbers is hard.
So, to give some
concrete examples,
if we fire up good old Python,
we import math and we run
an expression like this.
So here, I've got Pi.
Obviously, I'm adding and
subtracting
the same number to Pi.
So, I should get Pi back,
right? Which is 4.
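The exact magnitude being added and subtracted isn't given in the talk,
but anything around 1e16 reproduces the effect in a Python session:

    >>> import math
    >>> math.pi + 1e16 - 1e16
    4.0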
If I want to do something
slightly more complex,
like say an algebra,
and I need to solve
the quadratic equation here,
so we can put in some
arguments for a,
b, and c and get x.
We'll see that as we
make a very small,
keeping b and c the same,
our result seems to converge
to negative c over b.
But as a becomes even smaller,
then we'll see it starting
to diverge again, and
then eventually we find
that there's a 0 at 0.
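A small Python experiment along those lines (the particular b and c are my
stand-ins; the talk doesn't give them): as a shrinks, the computed root
drifts away from -c/b = 1.5 and finally collapses to 0.

    import math

    def root(a, b, c):
        # Textbook quadratic formula, the numerically naive way.
        return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

    # True limit as a -> 0 is -c/b = 1.5 for these values.
    for a in [1e-3, 1e-8, 1e-12, 1e-16, 1e-18]:
        print(a, root(a, 2.0, -3.0))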
So to put this on a graph, right,
this blue line is
the nice beautiful mathematical
function that we would
expect for x as a function of a,
keeping b and c the same.
Those black points
are the behavior from
IEEE 754 floating-point that we
saw from the points on
our previous slide,
if we think about this in terms
of the actual parabola, right?
So here's with a
relatively small a,
remember we're solving for the 0,
so the 0 crossings.
What our floating-point
results seem to
indicate is that as
we make a smaller,
our parabola is going to
become more and more of a line.
Eventually, 754 says that if
we were just to make
this line a little bit
straighter,
eventually it would go
through the origin,
which isn't right.
So, here's another
really fun function.
This is sin of 2 to the x.
As we get further
from the origin,
the two to the x part will
grow faster and faster,
so the wave will oscillate
faster and faster.
IEEE 754 is actually pretty
good at evaluating
this with doubles.
So, for 15 we get
the right number,
for I don't know, 25.3,
we get about eight digits
that are right.
If we give it a
ridiculously large number,
like say 60, we'll
actually get all the bits right,
which is kind of astonishing,
because if we deviate from
an integer even slightly,
we'll get a result that is
completely uncorrelated
with what we would expect
the real function to do, right?
So surprisingly,
none of these examples
are surprising, right?
floating-point is
fully specified.
It's just doing what
the specification says.
All floating-point
results, I mean,
all floating-point numbers have
the same amount of precision.
All results have the same amount
of precision, and
it's left to the programmer
to determine if
those bits are actually
meaningful and
correspond to the behavior
of real numbers.
So, I mean, if you're
a numerical methods guy
writing something in WebM,
this is probably
fine because you can
afford to spend a lot of time
reasoning about your code.
But in the general case,
where we just want to
do real arithmetic,
this seems kind of troublesome.
The idea of Sinking Point is
that rather than giving
back all those meaningless
bits all the time,
we will actually dynamically
reduce precision, right?
We're not going to be any
more accurate than IEEE 754,
but the precision you do get will
correspond to actual behavior
of real numbers rather than,
whatever the
floating-point format said
happens in these
weird cases, right?
The cool thing about this is that
we can do this with
very little overhead.
So, we only need to keep
around a few extra bits.
We only need to do
a few extra bit wise operations.
If you, for example,
build some hardware,
I think that it could
go just about as fast
as IEEE 754 floating-point.
So, let's go back over
the same examples.
Here's what Python said.
What Python really meant
here by four was like
4.00000, right, exactly 4.
There's a 53-bit significand
and it's all zeros.
If we do the same thing
with sinking point,
we still get 4,
the answer is 4, but this is
saying 4-twiddle,
approximately 4.
I'm not going to give you
any more bits here because
I don't know what they are.
If you use this few
bits to represent Pi,
4 is actually the
best you can do.
So, this number
makes sense, right?
If we go back to
our quadratic formula here,
we can see that a lot of
our digits are totally bogus.
With sinking point, we won't
even give you those digits,
and we also have
an interesting notation here.
So, this is 0 at some exponent.
Zero can't have precision.
It doesn't have a significand,
but it can have an exponent,
and if you have an inexact
0 with some exponent,
what that really means is like
this is a number, it's small,
I don't know what it is, but
its absolute value is
less than two to that exponent.
So, this is a huge
exponent, right?
We're seeing a small number
that's less than 2 to the fourth.
That's not a very encouraging
0 for our parabola, right?
Then, meanwhile, I'm
cheating a little
bit here because if
you see a real 0,
it will actually give you 0.
I'm cheating because
since b is 2,
we can actually do the square
root of this stuff exactly.
If we were to put in like b is
2.1 and get an inexact result,
we would see 0 with an exponent,
but rather than being like 4, it
would be like negative 48.
So, it seems like that result
is much more encouraging,
if you really have a
0 there rather than,
I don't know somewhere
between negative 10 and 10.
Again, with sine, we can see that
in all of these cases we chop
off not all of the bits,
but most of the bits
that are bogus.
What's cool here is that
sinking point is actually
smart enough to understand
that 60 is a special case.
Because we can compute
that exponent exactly,
it won't chop off any
bits when they're good.
But in the cases where
your function gives
complete garbage,
it will just punt and say, "Oh,
I don't know what sine
is in that range."
There's no precision, right?
So, how does this work?
You've probably learned IEEE
754 floating-point at
least once in your life.
I'm going to go over it
again, hopefully quickly,
and a little bit differently
than you've seen before.
So, if I want to represent
a real number with bits,
I can do that with
some integer significand
times 2 to some exponent, right?
So 5.25 is 21 times
2 to the negative 2.
Let's define two quantities.
We have p which is the number
of bits in the significand.
This is pretty straightforward,
it's just the precision.
Then we also have this quantity
that I'm going to call n,
which is the exponent minus 1.
So you can think of n like
if this number is inexact,
you can think of n
as like the index or
the place of the most
significant bit,
whose value I don't know.
So, all the bits below n, I
don't know what those are,
unless my number is exact,
in which case they're 0.
What's cool about these concepts
is that we can actually use
them to explain how IEEE
754 rounding works.
So for doubles, we choose
some maximum p and we
choose some minimum n,
53 and negative 1075,
and we say in order to
round into a double,
if I have bits past
my max p, or
bits with significance below
my minimum n, round those bits off.
So, that minimum n is
not a random number.
It's actually
the minimum exponent
the double can have
minus the precision.
What's cool about this is that
subnormal numbers just
fall out naturally.
I don't have to do
anything special.
Subnormals still have the
same n. We're just letting
the high bits of the
significand go to 0, right?
So, that's IEEE 754.
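A quick Python sketch of that way of describing rounding (my paraphrase,
using truncation rather than round-to-nearest to keep it short): a value is
an integer significand m times 2**exp, and we just drop bits beyond the p
budget or below the n floor.

    def round_754_style(m, exp, max_p=53, min_n=-1075):
        # Drop bits whose place value falls at or below min_n; this is also
        # what makes subnormals fall out without any special case.
        while m != 0 and exp <= min_n:
            m >>= 1
            exp += 1
        # Drop bits in excess of the max_p precision budget.
        while m.bit_length() > max_p:
            m >>= 1
            exp += 1
        return m, exp

    # 5.25 = 21 * 2**-2 from the talk; it fits, so nothing is dropped.
    print(round_754_style(21, -2))   # -> (21, -2)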
For sinking point, what we're
going to do is actually
determine what these values
should be dynamically
per operation,
so that we get rid of
the precision that we
don't know how to compute.
So, we need to come up with
some rules for addition
and subtraction.
We're actually going to be
bounded by n. If I have
two numbers and there are
some low bits in one
that I don't know,
then if you think about it, if I add
more bits on top of those bits,
I'm not magically
going to learn what
those bits are, right?
The granularity of
the number that I know least
precisely, in absolute place,
is the granularity of the output.
So, we'll take our
inexact inputs,
choose the largest value of n
among them, and use that as the
rounding for our output, right?
Then we're actually not really
constrained by p here, right?
So, we can say
if we want to just emulate
IEEE 754 floating-point,
just use whatever
the maximum is for the format.
So, for multiplication
and division,
it's the other way around.
We're not really
constrained by absolute n,
but we are constrained by p.
So, if I have two inputs,
then the precision of
the result can't be
more than whichever
inexact input
had less precision, right?
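In code, the rules so far amount to something like this (my Python
paraphrase of the slide, with each value carrying its p, its n, and an
exactness flag):

    def add_bounds(inputs, format_max_p=53):
        # Addition/subtraction: the result's n is the largest n among the
        # inexact inputs; precision is only limited by the format.
        ns = [n for (p, n, exact) in inputs if not exact]
        result_n = max(ns) if ns else None      # None: result can stay exact
        return format_max_p, result_n

    def mul_bounds(inputs, format_min_n=-1075):
        # Multiplication/division: the result's p is the smallest p among the
        # inexact inputs; n is only limited by the format.
        ps = [p for (p, n, exact) in inputs if not exact]
        result_p = min(ps) if ps else None
        return result_p, format_min_n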
Then hopefully, powers and
roots are like multiplication.
This is still being
developed, so maybe not,
we'll find out in testing.
For more interesting functions,
Floor is kind of cool, because
the result is actually
exact, right?
As long as I know I'm in
between two integers.
If I have functions that
are periodic like F mod,
then my rules become
more interesting.
F mod is really
a subtraction, right?
So, we have X mod Y.
It's the remainder.
So, my dividend is
going to supply n,
because I'm using it
in a subtraction,
and then the other thing in
that subtraction is going to
be my divisor times some integer.
I can get the integer exactly,
so I can actually get
the effect of using the n from
that multiplied divisor by taking
p. You can talk to
me about it later.
And then sin is also really fun.
With sin, we're not bound by n,
but our precision is
actually bound by
the granularity with
which you know our input,
relative to
the granularity of Pi.
So yeah, these are the rules.
Still work-in-progress.
It's cool because these are
very simple rules and
they can give us
some confidence that we
have meaningful precision without
having to do
a whole bunch of real
analysis or anything like that.
Note that there's still
an approximation,
it's not a sound guarantee.
The idea is that it should have
relatively low, like,
false positives, right?
If I say you have
bogus precision,
there's probably
something wrong in
your computation, and hopefully
I won't, well, not
in most cases, cut off
precision unnecessarily.
So, this is part of
a larger project
I'm working on called Titanic,
which is a tool for building
and reasoning about systems.
Oh, by the way this is
Neural Style Transfer.
This is a neural
network art, yeah.
Also thanks to Dan
for the titles.
His help has been
very evident here.
So, Titanic is a tool
for reasoning about
and building things like sinking
point, [inaudible] floating point.
I'm also interested
in other stuff,
like reasoning about performance.
How much accuracy can we
throw away to get speed,
and also exotic types like
whatever Google is doing
in their new TPUs or I
know Microsoft has FP8,
which looks kind of interesting,
and then I'm also
working on FPBench.
Titanic uses the syntax
from FPBench.
So, it's another cool project
started by [inaudible] ,
Zach and Herbie guys.
Props to them but,
yeah, talk to me about
any of these things.
>> Okay, let's thank the speaker.
Okay, we have some time
for questions.
Let's start with Ross.
>> Is it meaningful to compare
interval arithmetic
with sinking point?
>> Absolutely. So, when I
say specifically that like
we're still an approximation.
We're not trying to
write a sound guarantee.
The sound version of it would
be interval arithmetic.
You could imagine wrapping
sinking point in something
like interval arithmetic in order
to return rather than one of
these fast rules
which is quick to
compute but not
necessarily sound,
like an actual
interval arithmetic
supplied guarantee
that already exists.
It's John Gustafson's
unums essentially,
and the problem is that
with interval arithmetic.
You have a lot of, if you
will false positives.
For any large computation,
it can be really hard to
figure out what
the interval is, right?
With sinking point, we're hoping
to do the opposite thing.
We're fast rather
than sound and we
have low false positives
rather than uncertainty.
But yeah, that's
a very good question.
>> If you don't know how much.
>> Empirically, no.
But that's the main
evaluation thrust here.
What I'd like to do
is have a large set
of benchmarks and
go through and say,
"Look, here's the number,
here's the real result.
Here's the number
of bits we give you
does it correspond to
the place where they differ?"
And then, I mean,
interval arithmetic
depending on the size of
the benchmark would either
give you no information
or also to be
somewhere in there so.
>> Okay. One more question.
Let see, how about Becker. Yeah.
>> So, you mentioned this
has pretty modest overhead.
Do you think you could actually go
faster than regular
floating point by, I guess,
dynamically swapping
to a lower precision?
>> Yeah. There's a really
interesting question here.
I see no reason why this
can't be implemented
in a hardware.
I mean, I'm not a hardware guy,
I don't, like, design adders,
but it is kind of cool because
as we chop precision off,
we can do less and
less work, right?
There's also potential for,
if you had a type that had
a variable sized exponent
and significant,
you could get
much more dynamic range
from the same number
of bits, right?
Because you could have
very small significands
with a huge exponent. So, yes.
There's certainly
considerations here,
and Titanic is
a general framework.
So, it would be good for
reasoning about ways
that we can throw
precision away in order to
get operations to go faster.
>> All right, let's
thank the speaker again.
So, just a very brief
anecdote about how
small our world is.
So, when I started at Berkeley
in 1982 as a PhD student.
My adviser was Bill Kahan,
who is the inventor of
IEEE floating point,
and when I went into his office,
he had a calculator,
and he would tell you,
"I can show you calculations,
much like we just saw,
that don't work correctly
in the calculator."
And he set out and solved
that problem for all of us.
So, that's a great benefit
but it shows you that
these issues are timeless and so,
just keep working on them. Okay.
>> I was in the planet all along.
>> Yeah. Yeah. That's right.
It's in state.
Okay, with that, we are going
to introduce the next talk.
We have the speakers,
Stuart Pernsteiner,
and he's going to
be speaking about
Verified Extraction with
Native Types and he's from UW.
>> Thanks. All right,
so I'm going to be talking about
extraction of code
from proof assistants,
and so typically the way
this is done,
is you write some code
in a proof assistant
such as Coq.
You verify it, and then,
if you want to actually
run this code,
the way you do that is you use
a process called extraction which
in Coq's case turns
your Coq code into OCaml code,
and then, you compile that
with the OCaml compiler
to get a binary that
you can actually run.
Now, the downside
to this approach is
that it's completely unverified.
The extraction
procedures unverified,
the OCaml compilers unverified.
You link a thing with
an unverified runtime,
and so all of
those nice properties
that you proved about
your Coq program,
you don't actually have
any guarantee related
if they still
hold on to executable that
you're actually going to run.
Now, more recently,
there's been actually
a couple different
projects working on
verified extraction for Coq,
so there's uf, which is what
I'm going to be talking
about today and
there's also CertiCoq.
These made some different
trade-offs in a few places,
but they have
the same general idea.
You take your Coq code,
compile it down to C or C
like language and then,
run that through CompCert
which is of course verified,
and Œuf and CertiCoq
are verified as well.
So, you can be highly confident
that whatever Coq program
you feed into this process,
the binary that you get out is
going to exhibit
the same behavior.
This is really nice.
There's a big improvement
over previous ways
of getting sort of
a verified binaries
out of Coq programs.
So for example, in 2015,
Andrew Appel had to
do this extensive manual proof
of equivalence between
a Coq implementation
of SHA-256 and
a C implementation which
could then be compiled
with CompCert.
Whereas with something like Œuf,
you can actually do
this translation
automatically, right?
It will automatically
compile the Coq version of
SHA-256 down to a C version
and it's guaranteed to
exhibit the same behavior.
But sort of not everything
is perfect here,
and so let's talk
about performance.
So, if you run a C version of
SHA-256 on this
little tiny input,
it runs in basically
no time at all,
uses two megabytes of memory.
If you run the Œuf-extracted,
verified SHA-256 binary on
the same exact tiny input,
it runs for four seconds and
uses three gigs of memory.
So, that's obviously terrible.
So, what's going on here? It just
comes down to a question
of data representation.
So, in the C version of SHA-256,
when it wants to manipulate
a word of the input
or of the sort
of hashed state,
it uses a real 32-bit integer,
four bytes in memory if it's
in the register, right?
Doesn't get much
simpler than that.
The Œuf-extracted version uses
the integer type
from the standard
library which is what was used in
the original Coq implementation
and that implementation
looks like this.
Coq represents integers as
a linked list where
every node of the list contains
one bit of the number.
So, this is clearly not
optimized for performance.
It has some nice properties
for reasoning but
no one was ever intending
for this code to
actually be executed,
and of course it's very slow
and uses tons of memory.
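To make the cost concrete, here is a rough Python analogue of that
representation (my paraphrase of the slide, not the actual Coq
definitions): one heap-allocated node per bit, and every operation walks
the whole structure.

    class XH:                      # the leading 1 bit
        pass

    class X0:                      # append a 0 bit
        def __init__(self, rest): self.rest = rest

    class X1:                      # append a 1 bit
        def __init__(self, rest): self.rest = rest

    def to_int(p):
        if isinstance(p, XH): return 1
        if isinstance(p, X0): return 2 * to_int(p.rest)
        return 2 * to_int(p.rest) + 1

    six = X0(X1(XH()))             # 6 = 0b110: three allocations for one small number
    assert to_int(six) == 6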
So, there's no reason that it
needs to be this bad, right?
We're not working with
sort of unbounded integers or
anything wild like that inside
of the SHA-256 implementation.
SHA-256, the algorithm itself is
defined in terms of 32-bit words.
The Coq implementation uses
this sort of refinement type.
It's an unbounded integer
but restricted to
the range zero to two to the 32.
There's only two to the 32
possible values of this type,
and so we should be able
to represent this type at
runtime using
a real 32-bit integer.
So, that's the idea of
the native types feature,
specifically, we want a
way to map Coq types
down to custom data
representations at the C level.
So for example, we'd like
to map that word type from
the SHA-256 implementation
down to C's int type.
Just mapping the
types is not enough.
If you've ever worked
with custom extraction
directives in Coq,
you know that just mapping the
types doesn't improve
the asymptotic performance
of your algorithms.
You'd also like to map
certain functions down to
custom C level implementations
such as mapping
addition mod two to the 32 down
to a real addition expression
at the C level,
and then a couple other
design constraints
when we were building
this feature.
We'd like to maintain all of
our existing
correctness guarantees
so that all has to
still be verified.
We'd like to make
it extensible to
new types and new functions,
and in particular,
the reason for this is that
updating all of these compiler
correctness proofs
is a lot of work,
and we'd rather not have to redo
that work if we want to add new
types or new functions later.
So, our design for this
is we did not want to
just add Int as a new type
supported by the compiler.
Since then if we came back
and wanted to add
double later on,
we would have to change
this definition further,
and this is used in basically
every part of the compiler,
you have to redo tons
of proofs again.
So, what we did instead is we added
a generic native type variant
here, parameterized by
this native type definition.
This native type
definition is basically
a big record that contains
all the information
that the compiler
needs to know to
compile values or
variables of this type.
So, in particular for
this sha256 int case,
we have to give it a high-level
representation of the type,
which is this word type.
This is just a Coq
type as used in
the sha256 implementation. And
we have to give it a low-level
representation,
which is just C's int type.
And then, we have to prove
a bunch of Lemmas about
the relationship
between these two types
and just general properties
about the different
representations.
And the benefit of doing
it this way is that we can
actually define more of
these Native Type definitions
as many as we want.
And no matter how many
of these we define,
we never have to
go back and touch
that type enumeration ever again.
Our approach to doing
the mapping of functions
is very similar.
Instead of just adding
explicitly a new variant
for integer addition,
we add a generic one that
supports any native operation.
It's, again,
parameterized by one of
these operation
definition records
and takes some number
of arguments.
And I'm not actually
going to show you
the native operation
definition record because it's
actually much more complicated
than the one for types.
And in particular, it's more
complicated because it has to
support the semantics of
every IR used in the compiler.
So, for every IR used
within the compiler,
in the semantics of that IR,
there needs to be
some definition of what
happens when you step
over a native operation,
which means we need enough
information in the record to
support all of those variants
of this step relation.
There are a couple
of different ways
that we could have done this.
What we chose to do
is instead of having
one function in the record
for every single IR
in the compiler,
we just have one function for
every single value
representation in the compiler,
and there's many
fewer of those than
there are different IRs.
Now, an advantage of doing
it this way is that it
just simplifies the definitions
of these Native Types.
If you want to add
a new one, you just have
less stuff to write
in that record.
It also simplifies some of
the compiler correctness proofs,
because a lot of passes,
it turns out, just use the same
value representation
on both sides.
And so, when you get to
the native operation case
of the simulation proof,
it's basically trivial because
the left side and right side are
doing basically
identical behaviors.
The downside though is
this doesn't give us
a way to let native operations
call functions.
So, for example, you
couldn't implement map or
other higher-order functions
as native operations.
And the reason for that is that
different intermediate
representations,
even ones that use
the same value representation,
actually have slightly
different ways of representing
program state or function calls.
And so, there's no way to provide
all the necessary
information to do
those function calls for
every IR within the compiler.
The big downside of
this is that we can't
implement recursive eliminators
as native operations.
So, you can't, for example, write
a Coq function that
takes an int and runs
some code for every value
from zero up to
n, since that would
involve calling
a function from inside.
You could write that
function in Coq, of course,
but you can't translate it
as a native operation
to, for example,
a while loop in C. But,
we got all this working and
it is pretty extensible.
We were able to add
not just ints and addition,
but we also added all of
the normal arithmetic
operators and
comparisons you would
expect to see on ints.
Also, some conversions
such as int_to_nat,
which converts to Coq's
nat type, which is
very commonly used.
We also added
the double-type with
the same arithmetic operations,
conversions, comparisons.
And finally, our native types
and operations do support memory.
So, we can define double arrays,
which are heap allocated and
some operations on those.
You can allocate
them, you can get,
and you can do functional updates
on the contents of the array.
The proof burden
is not too terrible.
It's 65 lines roughly for
each type definition and
a bit over 200 lines
for each operation.
And that includes
some groups of operations.
So, for example,
all binary arithmetic operators
on ints are basically the same.
And so, those only get
counted once in this average.
Finally, in terms of performance,
we got some pretty big
benefits from this.
So, back on that sha256 example
where previously,
we were running for four seconds
and using three gigs of RAM.
The version that uses int,
this much more efficient
data representation
is much faster.
It's 50 times faster and
uses 87 times less memory.
That's it. So, I'll be
happy to take questions.
>> All right, questions?
>> So, 87 times less memory
is really good,
but I note there's still
20 times more memory
than the C implementation.
So, what would it take
to go the next 20X?
>> Yeah. So, we haven't profiled
it in very much detail,
but I think the main source of
memory allocations is allocating
closures to make function calls.
A lot of those closures don't
really need to be on the heap.
Because for example, you
allocate the closure,
make the call, and then
never use the closure again.
So, you could allocate
that on the stack,
but we've not implemented
that optimization yet.
>> Any questions?
>> How tied is this
to targeting C
versus some other language?
>> Not very. It's
just C already has
a verified compiler that we
could use as the backend.
And with that as our backend,
we of course chose
C types and such as the target
for the Native Types.
>> Does it free memory? Like,
if I have a long
running application,
will I just leak
memory forever?
>> Yeah.
So, we don't actually
have a garbage
collector in, you could say,
the Œuf runtime, but Œuf
doesn't really have a runtime.
There's no garbage collector,
which is I guess
another contributor to
the high memory usage.
We do have some
unverified slab allocator
that can do some region-based
memory allocation.
So, you can do some operation,
and when you're done,
copy out the result
and free all of the intermediate
stuff you allocated,
but that's as good as
it gets right now.
>> It's kind of a little bit
of a vague question,
but I think one of
the arguments you're
making is a reduction
in TCB argument by saying you can
remove your OCaml
compiler from the TCB,
but your OCaml compiler is
already in your TCB if
you're working with Coq.
>> Yeah.
>> But, I see that you're
gaining something,
but I'm not exactly
sure how to pinpoint
exactly what
the reduction in TCB is.
>> Yeah. So, our argument
there is that when you link
the OCaml compiler
into your extracted
or the OCaml runtime into
your extracted code,
you have to rely on
the correctness of
that runtime under every input
that you feed to that code,
whereas when you use
the OCaml runtime
as part of your
compilation process
or your verification process,
you only have to trust it to
run correctly on your code,
the code that you're feeding
through the compiler.
You don't have to
worry about what
if I'm exposing this
as a web service and
someone is sending me
adversarial inputs
trying to break
the OCaml runtime.
You're only feeding it code that
you have written, essentially.
>> All right, let's
thank the speaker.
