Hi there. I'm really happy to have the opportunity to send this video to your PurPL seminar. My name is Nik Swamy, and I'm a researcher in the RiSE group at Microsoft Research in Redmond. I'm going to tell you a bit about some work we've been doing in the past few years on programming with proofs for high-assurance software.
So what do I mean by high-assurance software? There's software in a number of domains for which the correctness and security requirements are set at a fairly high bar, where system functionality depends critically on getting the code right. For instance, in cloud infrastructure there's the virtualization technology used to secure the platform; at Microsoft, that's Hyper-V, which secures the Azure cloud. In secure communications there are crypto protocols like TLS, or WireGuard, a VPN in the Linux kernel. And in other domains, like financial technology or even e-voting, software plays a critical role in ensuring system correctness and security.
Getting this code right is really important, but of course, as we know, there are bugs everywhere in software written today, and this manifests itself in a variety of ways, from stock-market flash crashes in the financial domain to confidentiality leaks at cloud providers.
So what we've been looking at is a long-standing agenda: can formal proofs about software come to the rescue here and allow us to build this kind of software with really high assurance?
Now, you might be thinking: computer scientists have been talking about formal proofs of programs for many decades, so can this really work at scale? I think we've made a lot of progress in the last decade or decade and a half, during which we as a community have begun to apply our program-proof techniques to genuinely large systems. At MSR, in the last few years, we've had a project called Project Everest, where we've been building verified libraries and a stack of communication components with full formal proofs. These days our CI system routinely builds, multiple times a day, a system of about 600,000 lines of code, and growing, with full formal proofs of correctness and security.
At that scale, things only work if they're modular, and we do modular proofs at a large scale and keep chugging along. We're beginning to approach a scale where I think it's not infeasible to imagine that in a year or two this will be a couple of million lines of code.
The tool that my collaborators and I have been building for several years now is a programming language and program verification tool called F*, and F* is the verification engine behind our verified code base.
Our code has been deployed in a number of places, so in a sense, proofs and programs developed in F* are being used by billions of people today, even if they don't know it.
For instance, in Hyper-V we use code produced by F* to secure communication between host and guest. In the secure communication space, we have proofs about the transport layer of QUIC and TLS, and our verified crypto runs in a number of places, including Firefox, mbedTLS, Signal, and the Linux kernel with WireGuard, as well as a number of other scenarios.
We have verified Merkle trees for enterprise blockchains in the financial technology space, and several third parties use our tools to prove their code correct. In e-voting, Microsoft has an open offering called ElectionGuard, an SDK for building auditable elections, and we provide the crypto for that SDK.
Even in other domains, there are things like verifiable multi-party computation and proven-correct, high-performance, verifiable key-value stores developed using our tool chain. So this stuff is real: program verification and program proofs are at a point where they can be applied to real systems and genuinely move the needle on the security and assurance of the entire system.
This is based on more than a decade of work on our tool chain, building on research that has come out of the PL community over several decades. For instance, F* itself is based on probably 10 or so POPL papers; we've repeatedly improved the system to address the needs of our applications. At the same time, as we improve the foundations and the tooling, we've been developing applications that drive the tooling, so in a way it's been quite synergistic.
Broadly, F* is a programming language that looks a bit like OCaml or F#, but it comes with a type system that looks more like Coq or Agda, in that it has full dependent types. It's backed by SMT automation, so in many cases you do proofs by SMT rather than by tactics, although you can use tactics too. You write your program in F*, and by default you can compile it to OCaml or F#.
But one thing we've been using F* for a lot is as a framework for embedding DSLs, domain-specific languages. We have embeddings of several DSLs in F*; one of them is called Low*, a DSL for C-like programming embedded in F*.
If you write programs in Low*, you can do proofs against a low-level memory model, dealing with things like manual memory management, low-level representations of memory, and stack allocation, and then extract your code to C or to WebAssembly. We also have a DSL called Vale that is intended for assembly-level programming in F*. Both Low* and Vale have been crucial for getting high-performance crypto out of our tool chain.
We have a DSL dedicated to parsing and serialization called EverParse, which we've used extensively, and that too produces C code.
And recently, the frontier of our tool chain has been exploring concurrent separation logic embedded in F* for concurrent and distributed programming with proofs. We're also considering additional back-ends; for instance, extracting our code to Rust is something we've been thinking about recently. This work is based on a large team of people and collaborations across many institutions. It's too much for me to mention everybody here, but let me highlight my colleagues at MSR Redmond, Chris, Jonathan, and Tahina, and Aseem at MSR India, who has been doing a lot of work on the tool chain as well. There are also collaborations with MSR Cambridge, Inria, CMU, Edinburgh, and Rosario in Argentina, as well as several visitors, interns, and visiting researchers.
That's a picture of us at Cambridge, maybe two years ago. We welcome collaborations and contributions, so if this kind of thing interests you, please reach out.
So I thought I'd spend a minute or two giving you a whirlwind taste of some of the techniques we use. Broadly, we've been building many kinds of secure communication components, and the structure of such a component is this: there is an application trying to communicate some structured message, say a key-value pair, across the network to a peer.
What it does is use a message formatter to turn this structured message into a binary-formatted message, which is then signed and encrypted. Then a second formatting step happens, dictated by the protocol's wire format, and the signed, encrypted message is sent across an untrusted network. The other side reconstitutes the high-level message by parsing, decrypting, verifying the signature, and so on.
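A minimal Python sketch of the sender and receiver sides of this pipeline follows, under stated assumptions: the binary format (two big-endian 16-bit length prefixes, each followed by UTF-8 bytes), the function names `format_message`, `parse_message`, `seal`, and `open_sealed`, and the use of an HMAC-SHA256 tag with a pre-shared key in place of a public-key signature are all hypothetical choices for illustration, and encryption and the protocol wire format are left out. It shows only where formatting, authentication, verification, and parsing sit relative to one another, not how the verified components implement them.

```python
import hashlib
import hmac
import struct

# Hypothetical illustration: an HMAC tag stands in for a signature,
# and there is no encryption or protocol wire format.
TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def format_message(key: str, value: str) -> bytes:
    """Hypothetical format: u16 big-endian length + UTF-8 bytes, for key then value."""
    k, v = key.encode("utf-8"), value.encode("utf-8")
    if len(k) > 0xFFFF or len(v) > 0xFFFF:
        raise ValueError("field too long for a 16-bit length prefix")
    return struct.pack(">H", len(k)) + k + struct.pack(">H", len(v)) + v


def parse_message(data: bytes) -> tuple[str, str]:
    """Inverse of format_message; raises ValueError on malformed input."""
    fields = []
    pos = 0
    for _ in range(2):
        if pos + 2 > len(data):
            raise ValueError("truncated length field")
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        if pos + n > len(data):
            raise ValueError("field overruns buffer")
        # UnicodeDecodeError is a subclass of ValueError.
        fields.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    if pos != len(data):
        raise ValueError("trailing bytes")
    return fields[0], fields[1]


def seal(shared_key: bytes, key: str, value: str) -> bytes:
    """Sender: format the message, then append an authentication tag."""
    body = format_message(key, value)
    return body + hmac.new(shared_key, body, hashlib.sha256).digest()


def open_sealed(shared_key: bytes, packet: bytes) -> tuple[str, str]:
    """Receiver: check the tag first, and only then parse the authenticated bytes."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short")
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(shared_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return parse_message(body)


if __name__ == "__main__":
    k = b"demo key"
    packet = seal(k, "user", "alice")
    print(open_sealed(k, packet))
```

Checking the tag before parsing means the message parser only sees bytes the sender authenticated; the parser is still written defensively, since in a real protocol some parsing (of the outer wire format, for example) has to happen on unauthenticated input.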
Having built a number of applications like this, we have various components. Notably, in order to orchestrate all of this, you need a way to write state machines and deal with concurrency, and you need a way to do message formatting. So we have EverParse, a parser generator, to deal with formatting and parsing; we have a library of verified cryptographic primitives, our main crypto provider, called EverCrypt; and we've been developing a new DSL called Steel to deal with concurrency, distribution, and state machines.
With EverParse, broadly, our goal is to bring parser generators to low-level programming, just as we use parser generators in, say, compiler development. Currently, people writing low-level protocols, whether because of performance constraints or other deployment constraints, like having to run in the kernel, tend to write their parsing code by hand. This can be a recipe for disaster: if you're parsing adversarial input manually, you can easily get it wrong, and at that low level any flaw can result in a system takeover. So our goal with EverParse is to abolish writing low-level binary parsers by hand and instead generate high-performance verified code from a high-level specification of message formats, in a way that integrates seamlessly with existing code bases. Our tool chain produces code that is memory safe, arithmetically safe, functionally correct, and also free from double fetches, meaning each byte of the input is read only once, so an adversary who modifies a shared buffer mid-parse cannot make the parser act on inconsistent values.
Broadly, what you do is write a high-level specification, and from that specification a tool produces F* code and proofs. The proofs establish, for instance, that the parser and the serializer are mutually inverse. You can then extract the code to memory-safe, low-level C code that has the same functional correctness property. The result is parsers that are low-level, correct, and secure.
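The property that a parser and serializer are mutually inverse can be illustrated, though not proved, with a small Python sketch. The format here (a one-byte tag, a two-byte big-endian length, then that many payload bytes) and the names `Record`, `serialize`, `parse`, `roundtrip_ok`, and `inverse_ok` are hypothetical; EverParse generates C from a format specification and proves such properties in F*, whereas this code only checks them on sample inputs. The parser fetches each header field once into a local, checks bounds before slicing, and rejects trailing bytes, mirroring the safety discipline described above.

```python
import struct
from dataclasses import dataclass

# Hypothetical format: tag (u8) | payload length (u16, big-endian) | payload bytes.
HEADER = struct.Struct(">BH")  # 3 bytes; ">" means no padding


@dataclass(frozen=True)
class Record:
    tag: int        # must fit in one byte
    payload: bytes  # at most 0xFFFF bytes


def serialize(r: Record) -> bytes:
    if (not 0 <= r.tag <= 0xFF) or len(r.payload) > 0xFFFF:
        raise ValueError("record is not representable in this format")
    return HEADER.pack(r.tag, len(r.payload)) + r.payload


def parse(data: bytes) -> Record:
    """Return the Record encoded by exactly all of `data`, or raise ValueError."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    # Fetch each header field once; all later checks use only these local copies.
    tag, length = HEADER.unpack_from(data, 0)
    end = HEADER.size + length
    # In C this check is what prevents an out-of-bounds read; here it also
    # rejects trailing bytes, so the parser accepts exactly serialize's outputs.
    if end != len(data):
        raise ValueError("length field does not match input size")
    return Record(tag, bytes(data[HEADER.size:end]))


def roundtrip_ok(r: Record) -> bool:
    """parse(serialize(r)) == r for a representable record."""
    return parse(serialize(r)) == r


def inverse_ok(data: bytes) -> bool:
    """serialize(parse(data)) == data whenever the parser accepts data."""
    try:
        r = parse(data)
    except ValueError:
        return True  # a rejected input imposes no obligation
    return serialize(r) == data
```

Checking these two properties on samples is only testing; the point of a verified parser generator is that the corresponding properties hold for all inputs by proof.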
And their performance can be remarkably fast, even compared to handwritten parsers. For instance, compared to handwritten C++ parsers for Bitcoin transactions, EverParse-generated parsers can be 13 times faster.
As for EverCrypt, it is a broad-ranging cryptographic provider that offers a large suite of cryptographic algorithms, customized to various platforms and implemented in C and in assembly, with full proofs of correctness and security.
EverCrypt is used in a number of places, ranging from Linux to Firefox to various Azure components. The library itself comes to about 43,000 lines of C code and 15,000 lines of assembly code, so it's quite a substantial effort with full formal proofs. One criterion for adding an algorithm to EverCrypt is that it must be both correct and as fast as, if not faster than, the best unverified implementations out there.
For instance, EverCrypt recently gained an implementation of AES-GCM, the authenticated encryption construction that secures 90% of TLS connections, and this verified code matches, and slightly exceeds, the performance of unverified OpenSSL code for the same algorithm.
Finally, just a word about Steel. This is the cutting edge of where the language design of F* is evolving. We've been working on embedding a concurrent separation logic in F*, and we have a couple of recent papers describing this, which you can learn more about from that link. We see this as a way to scale proofs beyond what can be done with SMT alone: using Steel, we do proofs in concurrent separation logic using a mixture of tactics for the separation-logic parts of the proof and SMT for automating arithmetic and the other parts.
So, some takeaways: we verify and deploy reusable critical software components at scale, and our goal is to fully verify or harden critical subsystems in existing software while achieving high performance and usability.
In the future, we aim to apply many of the techniques we've developed over the years to lower the bar for building program proofs further, and we see that evolving in three directions. One is to do more proof and code generation from domain-specific languages. EverParse has been a great example of that, producing verified parsers with push-button proofs, and we expect to do more of that.
Another big hammer we use these days is metaprogramming: we write code once in a very abstract, generic way and then use metaprograms to specialize and partially evaluate it, producing a large amount of verified code from a single, very generic proof object.
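The write-once, specialize-many idea can be illustrated with the textbook partial-evaluation example, sketched here in Python rather than with F*'s own metaprogramming: a generic `power(x, n)` is specialized for a fixed `n` into straight-line code by a small code generator. The names `power` and `specialize_power` are hypothetical, and unlike the F* setting described in the talk, nothing here carries a proof; the generic function simply serves as a reference against which the specialized versions can be checked.

```python
from typing import Callable


def power(x: int, n: int) -> int:
    """Generic reference implementation: x**n by square-and-multiply (n >= 0)."""
    result, base = 1, x
    while n > 0:
        if n & 1:
            result *= base
        base *= base
        n >>= 1
    return result


def specialize_power(n: int) -> tuple[Callable[[int], int], str]:
    """Hypothetical partial evaluator for power(x, n) with n known in advance.

    The loop over the bits of n runs here, at generation time, so the emitted
    function is straight-line code: only multiplications, no loop over n.
    Returns the compiled function and its source text.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    name = f"power_{n}"
    lines = [f"def {name}(x):", "    result = 1", "    base = x"]
    while n > 0:
        if n & 1:
            lines.append("    result = result * base")
        n >>= 1
        if n > 0:  # skip the final, unused squaring
            lines.append("    base = base * base")
    lines.append("    return result")
    source = "\n".join(lines)
    namespace: dict = {}
    exec(source, namespace)  # compile the residual (specialized) program
    return namespace[name], source
```

For example, `specialize_power(3)` yields a `power_3` whose body is multiply, square, multiply. Generating source text with `exec` is just the simplest way to show the idea in Python; what matters is that all work depending only on the known argument happens once, at specialization time.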
A third direction, as exemplified by Steel, is raising the level of abstraction so that you can program against a stack of verified abstractions, things like channels and state machines, and produce low-level code through the tool chain rather than being forced to work at the lowest level of abstraction.
So do reach out if you're curious to hear more about this, and thanks for the opportunity to share some of our work at your seminar. Thank you.
