- Good afternoon, and
thank you for coming.
My name is Kostya Serebryany.
I work at Google in the Sunnyvale office,
and my team has been
dealing with memory safety
in C and C++ for the last decade.
Today's talk is called Memory Tagging
and How it Helps C++ Memory Safety.
I hope I will convince you that we have
a path forward with C++ and memory safety.
I will remind you what memory safety is
in case you don't remember.
I will explain the general concept
of what we call memory
tagging or memory coloring.
We'll discuss three
existing implementations
of memory tagging, including one
that was announced
by ARM just eight days ago.
We'll briefly discuss how this affects
security of the C++ code.
First of all, a statement.
Memory safety in C++ is a huge mess.
Raise your hand if you agree.
Okay, so I don't have to
convince this audience.
We have all sorts of use-after-frees
and buffer-overflows
on heap, and stack, and globals,
and we still basically don't have
any general protection against it.
If anyone thinks that
the newer C++ standards
give you any better
protection than before,
I have to disappoint you.
C++17 actually makes certain forms
of memory safety bugs less
obvious and more frequent.
This is the slide I
showed a year ago at CppCon.
I don't think anything
has improved since then.
There is a class in
C++17 that is misnamed.
It should have been called
std::use-after-free,
and for some reason, it is
called std::string_view.
Raise your hand if you can spot the bug.
For those who cannot spot
the bug, I will explain.
The second line of the code snippet
creates a temporary string object
that is destroyed right
after the semicolon,
but the string_view captures the pointer
to the guts of the string,
and then you use it later.
This is a use-after-free on heap,
or use-after-free on stack depending
on how exactly the string is implemented
in the standard library.
We keep seeing these memory safety bugs
for many, many years now,
and the overall picture
doesn't change from year to year.
For example, if we just
take the Chrome browser,
which is a huge C++ application,
every time we release a
new version of Chrome,
we fix some number of security bugs,
and most of the
high-priority security bugs
are memory safety bugs.
This is text from the
Chrome release in July.
Five out of five high-priority
security vulnerabilities
in Chrome were memory safety bugs.
This is not just theoretical.
We are facing this problem for real.
How many of you have heard
about a tool called AddressSanitizer?
Okay, I can skip a few slides.
For those who haven't heard,
this is a tool based on
compiler instrumentation.
It inserts a few instructions
before every load and store.
Those instructions check metadata
attached to memory, and so we can catch
buffer-overflows, and use-after-free.
This slide probably, hopefully,
describes the general idea.
If you, for example, malloc 24 bytes,
then the user will get a pointer
to memory that is marked addressable,
so this is three chunks of eight bytes.
There will be chunks of
memory adjacent to the buffer
to the left and to the right
that are marked unaddressable.
We call those red zones.
If the code touches p[24],
which is already an unaddressable byte,
we catch the bug.
When we free the memory
or delete the memory,
we mark the entire chunk as unaddressable,
and then we try to keep
that memory in quarantine
as long as we can,
meaning that the following
malloc invocations will
not return the same address.
We improve the probability
of catching a use-after-free.
ASAN, as far as I can tell,
is pretty widely used as a testing tool,
but it has a very significant limitation
when it comes to using ASAN in production.
Generally speaking, you
cannot use ASAN in production.
Some people do, but this is
a pretty complicated effort.
The main reason why it is so complicated
is because ASAN has roughly
2x overhead
in memory, CPU, and code size.
It is very hard to deploy anything
that expensive in production.
The overhead is caused by
the compiler instrumentation
that we insert around
every load and store,
and also by the shadow, by the red zones,
and by the quarantine.
It is also not a strong
security mitigation.
It is easy to bypass for the attacker
if the attacker wants to bypass ASAN.
Now, let's discuss the other approach.
The approach is typically
called memory tagging
or memory coloring, and I will try
to explain the entire
thing, the entire topic
of the talk, on just this one slide,
so pay attention, please.
First of all, this approach, unlike ASAN,
only works on 64-bit architectures.
If you care about 32-bit
architectures, no luck.
We have to have some magic way
to associate metadata with
every n bytes of memory.
For example, we want to have
four bits of metadata, or a tag,
attached to every 16 bytes.
We can play with these numbers.
We can have more bits in the tag.
We can have larger granules,
larger chunks of memory
which have a tag, but
the idea doesn't change
if we change just these numbers.
The next important thing is that,
when we have a 64-bit pointer,
not all of the bits in the
pointer are actually used.
On today's typical x86 hardware,
you use 48 bits of the pointer,
and the remaining 16 are unused.
Similarly on ARM and other
64-bit architectures.
When we do memory tagging,
we also use some of
the bits in the pointer
to store a tag in the pointer.
We have two tags, one
associated with the memory,
and one associated with the pointer.
Then, when you allocate memory in malloc
or when you enter a function
for a local variable,
you need to tag that memory,
and also tag the pointer
with the same tag.
Then, whenever you load or store,
you have to check that these tags match.
If they don't match, you trap.
This allows you to find most
of the memory safety bugs
like use-after-free and buffer-overflows.
Here's the workflow for malloc.
Suppose you have the memory
tagging system implemented
in your hardware. Then
malloc will need to do
very, very little extra work
on top of its usual work
to support memory tagging.
First of all, all the memory chunks,
all the heap objects returned by malloc
have to be aligned by the
granularity of memory tagging,
so 16 or 64 depending
on the implementation.
Then malloc will have to choose a tag.
If the tag is four bits,
you basically choose
one of 16 values.
You can choose it randomly.
The next step is to tag the entire chunk
of memory that you're returning,
and also tag the pointer.
This is basically it.
No magic.
When you free the memory,
you can also optionally
re-tag the memory to some other tag.
Let me try to demonstrate it
with a little bit more colors.
By colors, I mean tags, of
course, but they're colors.
When you're requesting 20 bytes from malloc,
the first step that the
memory allocator does,
it aligns the requested size
by the tagging granularity.
In this example, I'm using
the 16 bytes granularity,
which means that, 20 bytes,
we have to align up to 32.
Malloc allocates 32 bytes of heap memory.
It then chooses a tag.
In this case, the tag is
hexadecimal A, or green.
Malloc will put this
green color, or green tag,
into both the memory,
these two 16-byte chunks,
and the pointer, so the
pointer will look like this.
It has the tag in the upper four bits,
and then the usual pointer stuff.
The adjacent memory
chunks are already colored
by previous heap allocations,
and they have some other colors.
Since there are only 16 colors,
if we use four-bit colors,
it could be the same color, but
there is a large probability
that the colors are different,
so when the code is trying to
reference p[32], it means
that we are accessing blue
memory with a green pointer.
If blue is not equal to green,
we get a failure.
A very similar thing happens on delete.
Whenever we delete a heap chunk,
we just recolor all the
memory for that heap chunk.
If there is a dangling
pointer that is still green
and we try to access the memory
that used to be green, but
not anymore, it is purple now,
we will have a trap.
Any questions so far?
- [Man] How do you tag (mumbles)
- How do I tag the memory?
Hold on a bit.
I will explain how I tag the memory.
I will certainly get
back to it, don't worry.
As I already mentioned, the probability
of detecting a bug is not 100%
because we only have so
many different tag values,
and we have many more
distinct memory chunks,
so the probability of detecting a bug
depends on the tag size.
If the tag size is four bits,
then you only have 16 different colors,
and in 15 out of 16 cases,
you'll detect the bug, so it's about 93%.
If you have more tag bits,
the probability goes higher.
A few words about the precision
of buffer-overflow detection
with this approach. If
you allocate 20 bytes,
malloc actually allocates
32 bytes, so if you are trying
to access incorrect memory
within the same granule,
within the 16 bytes of
allowed memory, then no luck.
This approach doesn't have
this level of accuracy.
If you are trying to access memory
that is in an adjacent granule,
like p[32] in this example,
or p[-1], you get
the memory error detected,
and depending on the
software allocation strategy,
the detection probability
is 100% or ninety-something percent.
If you are trying to
reference this pointer
with some wildly off index,
then the probability is
roughly 90-plus percent.
The software could choose
different strategies
for allocating tags when
allocating heap or stack memory.
We think that the best strategy
is actually to just allocate a random tag,
but there could be
different other strategies.
You can have a dedicated tag
that says match-none,
meaning that this memory
is never allowed to be accessed by anyone,
or you can alternate odd and even tags
between adjacent chunks
to make the probability of
buffer-overflow detection
100% for linear buffer-overflows.
All the same story applies
to stack-buffer-overflow detection
and stack-use-after-return detection
except that, now, the
implementation should not
just update the malloc function,
but it also needs a little bit
of compiler instrumentation.
I will not go deeper into this.
The same story for
global-buffer-overflows.
You have to tag your globals.
The first thing that I want you
to know about this scheme
is that it doesn't have
any false positives.
If you find anything with
this scheme, it's a real bug.
Go fix it.
Now, how do I actually
implement this thing?
One of the implementation problems
is that we're using, or rather abusing,
bits in the address.
Thanks to a marvelous feature
that exists in today's hardware on ARM64,
we can just do it.
The feature is called top-byte-ignore.
You can put any garbage into
the top byte of the address
on ARM64, and the hardware
will just not look at it.
On AArch64, this part of the
implementation is pretty easy.
On x86_64, it's hard enough
that I don't advise you to go there today.
It's pretty painful.
Before I move further into
the actual implementations,
I want to give a quick comparison
of memory tagging as an approach
with ASAN as an approach.
What are the differences?
First of all, memory tagging has
very small memory
overhead compared to ASAN.
ASAN has roughly 2x memory overhead,
and memory tagging,
depending on the tag size
and the granularity, is either
several percent of memory
or less than 1% of memory.
It is really tiny.
Two other benefits of
memory tagging are that
memory tagging is more sensitive
to detecting buffer-overflows that happen
way beyond the bounds of the object.
Memory tagging is more
sensitive to use-after-free
when the use happens
much later than the free.
Memory tagging is more sensitive.
Optionally, as a side effect,
memory tagging forces all
the memory to be initialized,
so you don't have
un-initialized memory anymore.
ASAN still has its own benefits.
First benefit is that
ASAN is one-byte precise
for buffer-overflows.
Second, it is more portable.
It works on today's
hardware, both 64-bit and 32-bit.
Now, let's move to the
existing implementations,
and I hope I will explain
how we actually tag the memory,
at least the general idea.
First of all, there is SPARC ADI.
It is an actual hardware implementation
that exists in the shipping CPUs
for at least two years.
They use four-bit tags
per 64 bytes of memory.
They have special instructions
that allow you to tag the
memory and read the tags.
The documentation doesn't tell you
where these tags are stored,
so I cannot tell you
how the tagging physically works,
although everyone's suspicion is
that they're using ECC
bits to store the tags.
You don't need to know how it works
in order to use it.
To use it, you just use an instruction.
We have tested this
implementation pretty extensively,
and in short, it works great,
but it has two problems.
First problem is that the
granularity of tagging
is 64 bytes, which means that the malloc
has to over-align too much,
so we spend too much memory,
up to 20% more memory
because of over-alignment.
Who can guess the other problem
with this implementation?
Anyone?
Okay, the problem is that
this is an implementation in SPARC.
(audience laughs)
We don't have any SPARC in production,
and we don't ship any SPARC
CPUs in current devices.
This is the major problem for us.
The implementation is amazing.
We cannot use it.
We have actually implemented something
we can use today, on today's hardware.
We call it HWASAN, or
hardware-assisted ASAN.
It is currently in the
Clang/LLVM toolchain.
The tool is only really
usable for ARM64 because,
as I mentioned, ARM64 has this
magic feature called top-byte-ignore.
We use eight-bit tags
per 16 bytes of memory,
which means we have 6% extra memory usage,
but we use compiler instrumentation
to actually check for bugs at run time,
which means that we
have 2x CPU overhead.
This is a testing tool very similar to ASAN.
The biggest advantage is that
it has very low memory consumption.
We also have this for the Linux kernel.
Patches are under review.
Any questions so far?
Okay.
I'm very happy to tell everyone
here from this stage that,
eight days ago, ARM has announced
what they called Memory Tagging Extension.
It is an actual hardware implementation
in the upcoming ARMv8.5 architecture.
It doesn't exist in hardware today,
so I'm eagerly waiting for it to appear
in the real hardware.
They use four-bit tags per 16 bytes,
which is a very good balance.
You have enough tag bits, and you
have small enough granularity.
It is suitable for efficient
stack instrumentation,
so you get both stack
and heap-related bugs.
You don't have to over-align malloc,
so very little extra RAM overhead.
It has just a few instructions
that you may want to use to
set the tag in the address
and to set the tags in the memory.
Again, the instruction
set does not specify
where the tags are actually stored.
This is left to the implementation.
How do we tag the memory?
We use the instruction to tag the memory.
How does it actually work?
No idea.
It probably will vary
between the implementations.
As a side effect, and
as I already mentioned,
when we tag the memory, we can optionally
also initialize the memory itself.
With the hardware implementations,
it doesn't have any extra cost.
If you are using a memory tagging system,
you initialize all your
memory at no extra overhead,
and you just don't have
uses of un-initialized memory anymore.
What is it good for?
We think that memory tagging
is a very strong alternative
to AddressSanitizer, ASAN,
because it consumes much less memory,
and it is also more
sensitive to certain bugs.
We believe that it is very useful
for crowdsourced bug
detection in production.
This is certainly true for
hardware implementations
like the existing SPARC ADI
and the proposed ARM
Memory Tagging Extension.
It is, to some extent, true
about our software implementation HWASAN.
HWASAN has large CPU overhead,
so not always, not everywhere.
We also believe that memory tagging
is a very strong security mitigation
that you can deploy in
production against attackers,
probably stronger than
anything we have today combined.
Why do I think so?
First of all, it is precise,
100% reliable mitigation
against linear buffer-overflows.
Things like Heartbleed, and the GHOST bug,
and lots of other
publicly very popular bugs
are linear buffer-overflows.
We can mitigate them completely.
We can also mitigate
use-after-frees probabilistically,
which means that the attacker
will have to launch the attack many times
before the attacker succeeds.
Most attackers will be
discouraged, we hope.
There are some ways to
reliably bypass memory tagging,
but the bottom line is that,
in order to bypass memory tagging
to attack a memory safety bug in C++,
you have to have a
reliable exploitable bug
of some other class.
We have to deal with them anyway,
so there is a little bit of catch-22.
If you want to bypass memory tagging,
you have to have an exploitable bug,
but most of the exploitable bugs
are protected by memory tagging.
As a mitigation, it also works
for un-initialized memory.
We think that memory tagging will find
more uses outside of just memory safety
and security mitigation.
We're certain that memory
tagging can be used
as an infinite hardware
watchpoint at very low cost.
We also think that the Java people
or garbage collection people
will find ways to abuse it
for garbage collection efficiency.
I'm actually very eager
to hear from you offline
after this talk, what else
can we use memory tagging for?
I would normally have a
summary slide, but I don't.
I have a homework slide instead.
I have two pieces of homework for you,
and I will ask all of
you next year at CppCon.
First of all, please ask
your favorite CPU vendor
to implement some form of memory tagging.
If your favorite CPU vendor
uses ARM instruction set,
then you can just point them
to the public instruction set extension.
My second piece of homework,
take a look at your favorite
C++ memory safety bug,
or exploit, or vulnerability.
Do the thought experiment,
is it detectable by memory tagging,
and is memory tagging a strong
mitigation for this case?
This is all.
Thank you.
Questions are welcome.
(audience claps)
- [Man] (coughs) Hi, sorry.
Question is the following.
Due to the limited number of platforms
where this is actually implemented,
and let's say you can build your framework
or any application that you are using
in the sandbox environment.
Let's say, for example, with ASAN,
you might not use ASAN in production,
but you have a debug environment,
and you have maybe a
docker that sets it up,
and then you run it, and
then you get the output.
For those ARM instructions,
could you use a QEMU emulated environment
where you would apply your memory tagging,
and then you can catch, right?
Have all the benefits without
actual hardware support
on the production?
- It's probably not
worth using QEMU because,
if you're ready to take the
30x slowdown from QEMU,
you can just use ASAN, and you're fine.
It will be slightly less sensitive,
but at this performance hit,
you're just better off using ASAN.
The major benefit from memory tagging is,
if it's in hardware,
if it's extremely fast,
and you can use it all the time.
- [Man] As I understood, it
provides better guarantees,
and has this probabilistic
detection depending on how many
bits are in the tag--
- That's correct, memory
tagging is more sensitive
to certain types of bugs,
but it only matters--
Really?
It only matters if it's
already in production.
You cannot use QEMU in production.
I'm told that the session is over.
Is it?
No?
Two more minutes?
Okay, two more minutes.
- [Man] Thank you.
- [Man] Thank you for the
introduction into memory tagging.
In order to understand
where to aim for the future,
what would be your expectation,
for example, in five
years? Should we expect
that it becomes a production
and widely-used feature
in every case, or will it
stay a development-only thing,
for example, because
there is overhead on CPU
in order to use memory tagging validation,
and therefore, we use
it only for development?
In other words, would
CPU manufacturers aim
to re-architect their devices
in order to make it
cheaper to use everywhere?
- We will have memory tagging
everywhere in five years.
It depends on you guys.
If you apply pressure
to your hardware vendors
and your software vendors,
it will be widely used in five years.
- Thank you.
- If you don't,
it may or may not happen.
- [Man] I'm still digesting a little bit
how the tagging itself plays out.
I'm gonna try to ask one
or two questions here.
Maybe you can catch if I'm smelling
something appropriate or not.
My thoughts go to, I've
seen warnings all the time,
pointers, alias pointers, for that matter,
and pointers into raw memory
that are just random offsets
from other areas of memory.
Are there any opportunities
here for tags to be,
when we copy pointers,
do we copy the tags?
If I'm misinterpreting that the tag
is part of the data rather
than as part of the pointer--
- The tag is both part of the data
and of the pointer.
- [Man] Would aliasing issues arise
if I try copying to another pointer
that's pointing to the
same memory location,
but isn't exactly the same pointer,
would I be able to run into issues there?
You said no false positives earlier.
- No false positives, it just works.
If you copy a pointer, you just have
the same tag and the same pointer.
No false positives.
We run the entire Android platform
with our software implementation
of memory tagging.
It just works.
- [Man] Thank you.
- [Man] In the hardware-assisted
tagging scheme,
you said there was about
2x CPU overhead.
Does that include the time required
to generate the random tag?
- Yes, generating a random
tag is not very expensive,
and in our software implementation,
we don't really generate it truly random.
We just take some of
the random bits from SP.
In the software implementation,
all the cost is due to
compiler instrumentation
of loads and stores.
- [Man] Okay, so the random
source is from SP, okay.
- Yeah.
Yes, I am told the session is over,
but I'm here in case you
have questions offline.
Thanks again.
(audience claps)
