>>Tony Voellm: We have two 30-minute talks
back to back.
These are the security talks.
The best part of these talks is, these tools
that are going to be talked about are going
to be available to you.
So first is going to be Kostya from Soviet
Russia.
So he's come a long way to give this talk.
So Kostya Serebryany.
And one thing he said he likes to do is hike.
I don't know about him, but I have been hiking
up and down the stairs here at Google, so
that's hiking enough.
So with that, he is going to talk about AddressSanitizer,
ThreadSanitizer, MemorySanitizer, dynamic
testing tools for C++, and what we all like
to call ASan, TSan, and MSan.
So with that, here you go, Kostya.
>>Kostya Serebryany: Thank you, Tony, for
the introduction and for inviting me here.
And thanks, everyone, for coming.
So dynamic testing tools for C++.
C++.
I heard this word only once in the previous
talk, and not at all in any other talks here.
So raise your hands, who is interested, who
cares about C++ code here?
Okay.
Thank you very much.
Those who didn't raise their hands, wrong
answer.
You did actually care about C++.
[ Laughter ]
>>Kostya Serebryany: And here is why.
So all the major browsers are written in C++.
All the major operating systems are written
in C++.
All the major databases are written either
in C or C++.
Language interpreters, like Perl, Python,
whatever, are written in C or C++.
And even if you are using Java, you are actually
using one of the largest C++ codebases in
the world.
So this is why you actually all have to raise
your hands.
Let's continue.
I will be talking about three different tools.
ASan, TSan, and MSan.
They're similar in spirit, but they find different
bugs.
Let's start with the first, which is now more than two years old.
AddressSanitizer, or ASan, finds addressability bugs in C++.
So these are buffer overflows, including stack,
heap, and global buffer overflows.
It also finds use-after-free bugs and (indiscernible)
pointers and some other interesting bugs as
well.
It is a part of two different compilers.
This is now in release versions of both LLVM
and GCC compilers.
The compiler module instruments all loads and stores and does some tricky stuff with the stack and globals.
The second part of the tool is the runtime
library, which is basically a complete Malloc
replacement.
It does memory allocation for the user, and it does some bookkeeping to produce good error messages.
So let's start with a few examples.
For those of you who have never coded in C or C++: C and C++ are not very safe languages.
For example, if you have a buffer of 100 elements
and you are accessing an element, say 101,
no one will stop you.
And this is like shooting yourself in the foot.
But AddressSanitizer will actually stop you.
On this slide, you have global buffer overflow,
the smallest example I could write, four lines.
All you need is to compile this test with
a compiler with a special switch, and you
will see a report which tells you everything
about this bug.
You will see a type of bug.
It says global buffer overflow.
You will see where it happens.
And you will see the description of the buffer.
A similar example is a stack buffer overflow.
The difference is that now it shows the stack variable which is incorrectly accessed.
The same goes for heap buffer overflows.
Now it shows you where the buffer was allocated.
And the most interesting and the most frequent,
actually, in C++ is use-after-free bug.
Since C++ is not garbage collected, users
have to destroy the objects by themselves.
And sometimes they actually access destroyed
objects.
This happens pretty often.
A few boring details about how it works and
why it is actually very fast.
We play tricks with the address space of the application process.
We divide it into three parts.
The upper part belongs to the application.
Application does what it wants to do there.
The lowest part is forbidden; it's protected.
And the middle part belongs to the tool.
It occupies one-eighth of the address space.
And we store our own metadata in that piece of address space.
So what do we store there?
Note that any aligned eight bytes of the application may be in only nine different states with regard to addressability: the first N bytes are good, where N is between 0 and 8, and the remaining 8 minus N bytes are bad.
Since 9 is less than 256, we can encode this number in one byte.
And this is what we store in the shadow area.
This slide shows what the compiler does with the memory accesses in the user program.
Given a memory access, whether it's a read or a write, we divide the address by eight (that is, shift it right by three).
We load the shadow byte.
And if it's not zero, we report an error.
Very simple.
This slide actually shows the instrumentation for eight-byte accesses.
Smaller accesses are just slightly more expensive.
They add, like, three arithmetic instructions.
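The divide-by-eight mapping and the shadow-byte check described above can be sketched in plain C++ (the `0x7fff8000` offset is the constant LLVM uses on 64-bit Linux; treat the exact values as illustrative):

```cpp
#include <cstdint>

// Each aligned 8 bytes of application memory map to 1 shadow byte.
constexpr std::uintptr_t kShadowOffset = 0x7fff8000;  // platform-specific

std::uintptr_t mem_to_shadow(std::uintptr_t addr) {
  return (addr >> 3) + kShadowOffset;  // divide by 8, add the offset
}

// Shadow-byte semantics from the talk: value k means the first k bytes
// of the 8-byte granule are addressable; 0 means all 8 are. A one-byte
// access is bad if the shadow is nonzero and the byte's offset within
// the granule reaches at or past the addressable prefix.
bool is_bad_byte_access(std::uintptr_t addr, std::int8_t shadow) {
  if (shadow == 0) return false;  // whole granule addressable
  return static_cast<std::int8_t>(addr & 7) >= shadow;
}
```

For an eight-byte access, the check degenerates to "shadow byte nonzero means error", matching the slide; smaller accesses need the extra comparison, which is the few additional arithmetic instructions the talk mentions.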
So this is a marketing slide, my favorite.
First of all, for those of you who have heard
about Valgrind -- by the way, who have heard
about Valgrind?
Good.
So if you know Valgrind, ASan is ten times
faster, sometimes 100 times faster, and it
finds some of the bugs that Valgrind doesn't.
The memory overhead of the tool is also tolerable,
usually within two times.
And the best part about the tool is that,
actually, we have quite a few users.
And those users are happy and they're finding
bugs every day, and as we speak.
One of the major customers, or users, of the tool is Chrome.
They have found many, many hundreds of bugs in the past year and a half.
The tool is used not just for tests.
Chrome has continuous integration testing, as many previous speakers described, and ASan is used there.
But the Chrome security team is also doing fuzzing on 2000 machines, 24/7.
Fuzzing means more or less generating random
test inputs, smartly random, I would say.
And they throw these tests into the instrumented
Chrome.
This way, they have found really a lot of security bugs.
Even more (indiscernible), from my point of view, is that external security researchers also find Chrome bugs this way.
And Google pays money for every such bug.
So if anyone here is a security researcher,
you are welcome to join the bounty hunt.
Of course, Chrome is not our only user.
We know that the tool found many bugs in Firefox,
FreeType, in both compilers, in Perl, in MySQL,
you name it.
I don't know of any project that has been extensively tested with it and in which no detectable bugs were found.
Okay.
Let's go to the second tool.
Data races are also well known to cause problems.
One of the problems is flaky tests, which
were discussed today a lot.
Unfortunately, data races are also known for
causing deaths in real life.
So it is quite desirable to find them.
Let's see what we have about the second tool.
It is also using compile-time instrumentation.
And it is also a part of both compilers, both
major open source compilers, LLVM and GCC.
The major part of the tool is, again, the runtime library, which is, again, a malloc replacement.
It just has a different algorithm inside.
Let's see how it works and what it does.
This is the simplest data race example I could
write in C++.
We have two threads that are concurrently accessing a single global variable.
No synchronization, no locking, nothing.
And, again, if you compile this test with
a special switch, you will get a warning message
which will say that you have a data race.
It will say which variable the race is happening on, and it will show you two stack traces: one for the first memory access and one for the second.
What does the tool do with your program?
It simply inserts function calls before every memory access, and also at function entry and function exit.
Then we play the same exact tricks with the
address space of the application.
Well, very similar, at least.
We divide the address space into four parts.
The first and the third part are protected.
No one can touch them.
The upper part belongs to the application.
It's actually quite enough.
It's, like, 32 terabytes of address space.
You don't need more.
And the second part belongs to the tool.
We store our metadata there.
How does this metadata look?
We divide the metadata, or shadow, into what we call cells.
Every cell is an eight-byte word which represents one access to a given memory location.
It has a thread ID.
It has a so-called epoch, which is a scalar clock.
It has a few bits for position and size, and one bit to indicate whether it was a read or a write access.
And this eight-byte cell fully represents
one memory access.
There is no additional information anywhere.
So for a given eight-byte word of the application, we have four shadow cells, which means a five-times increase in memory usage.
Let's see what this tool does with this shadow.
Suppose we have this eight-byte word somewhere
in the application, and the user code is accessing
the first two bytes, writing something to
the first two bytes.
So we fill in the first cell, recording that this is thread 1, at epoch 1, the range is from 0 to 2, and this is a write.
Suppose later, another thread is reading the last four bytes.
Again, we are filling another cell.
Since there is no intersection between the
memory ranges within the eight-byte word,
there is no chance that this is a data race.
Then a third access comes.
It is a read from the first four bytes.
And now we have intersection between the memory
ranges, between the first access and the third
access.
At least one of the accesses was a write.
And now we need to figure out whether a data
race is possible.
So the answer to this question is the same
as the answer to the question whether epoch
1 happens before epoch 3.
I will not go into details here.
But this is a very fast, constant-time operation,
which basically involves a couple of memory
accesses to the global state.
It is well described in literature.
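The cell layout and the overlap check from the walkthrough above can be sketched like this (field widths are illustrative; the real runtime packs them differently):

```cpp
#include <cstdint>

// One 64-bit shadow cell fully describes one memory access.
struct ShadowCell {
  std::uint64_t tid : 16;      // thread that made the access
  std::uint64_t epoch : 42;    // that thread's scalar clock
  std::uint64_t pos : 3;       // first byte touched within the 8-byte word
  std::uint64_t size_log : 2;  // log2 of the access size (1, 2, 4, 8)
  std::uint64_t is_write : 1;  // read or write
};
static_assert(sizeof(ShadowCell) == 8, "one cell is one 8-byte word");

// Two recorded accesses can race only if they come from different
// threads, their byte ranges intersect, and at least one is a write.
// (Whether they actually race then depends on the happens-before check
// on the epochs, which is omitted here.)
bool may_race(const ShadowCell &a, const ShadowCell &b) {
  unsigned a_end = a.pos + (1u << a.size_log);
  unsigned b_end = b.pos + (1u << b.size_log);
  bool overlap = a.pos < b_end && b.pos < a_end;
  return a.tid != b.tid && overlap && (a.is_write || b.is_write);
}
```

In the walkthrough: the write to bytes [0, 2) and the read of bytes [4, 8) never overlap, so there is no possible race; the later read of [0, 4) does overlap the write, so the tool proceeds to the happens-before question.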
One of the tricky parts of the tool is how to show the stack trace of the previous memory access.
Once you have a data race, you can easily show the current stack trace, but not so easily the previous one.
For this, we have a per-thread cyclic buffer of events: function entry, function exit, and memory access.
Each event takes 64 bits.
And if the access happened too long ago, we
may lose the data.
But the buffer size is configurable, and the
maximum size is enough for any data race we
have seen.
So when a data race actually happens, we replay this event buffer.
This is -- this operation is slow, but it
happens just once when you report a bug.
So the overhead of this tool is significantly more than the previous tool, but still quite tolerable, and, to the best of our knowledge, it is the best in class among similar tools.
So the slowdown is somewhere between four
and ten times.
And the memory usage increases five to eight
times.
This tool already has quite a few trophies
in Google.
We started deploying this tool on the server side and have found many, many hundreds of bugs in our C++ code.
What I didn't mention yet is that this tool also works for the Go language.
Raise your hands if you know about the Go
language.
Good.
Now you all know about the Go language, so
raise your hands.
Go is a very interesting language which doesn't have buffer overflows, use-after-free, or anything like that from C++, but it still has data races.
And I think the tool has more or less found all of them in all of the existing Go code.
There is quite a bit of it, but not as much as C++.
Since our code uses some open source technologies, we also found races there.
For example, one bug in OpenSSL was worth fixing.
And there were a few so-called benign races; I don't agree that they're benign, but okay.
So we're just starting to deploy the tool
with Chrome.
We already found a few races.
And I expect a steady flow of bugs in the
future.
Key advantages of this tool compared to all others:
We believe that it's at least an order of
magnitude faster than any other tool like
this.
And it natively supports atomic operations,
which means that if your code does very low-level
synchronization using atomics, we will still
understand it.
Most of the other tools don't.
There are some limitations.
This tool today only works on 64-bit Linux.
So it will be a bit hard to port it to some
mobile operating system.
And it doesn't see into uninstrumented libraries or inline assembly.
That's all.
Okay.
Let's move to the third tool.
The first two tools are already working, and they're finding bugs 24/7.
This tool is a work in progress.
We have not deployed it anywhere yet.
The tool is called MemorySanitizer, and it
finds uninitialized memory reads.
Let me show you an example.
And this is, again, a funny thing about the
C++ language, which I, by the way, love.
It's my favorite language.
You can create a buffer.
It will contain some garbage.
You can read this garbage.
You can use this garbage, and no one will
ever complain.
Well, the test may be flaky.
The -- the nuclear missile may go in the wrong
direction.
But, otherwise, no one will notice.
Except for MSan.
Of course, Valgrind also finds these bugs.
But the speed of Valgrind doesn't allow you
to actually use it as widely as we expect
this tool to be used.
Again, you compile the code with a special switch, and it will show you where the uninitialized access happened.
It will also show you where the memory was allocated that was never initialized afterwards.
Okay.
So how does it work?
Surprisingly, it also uses shadow memory.
This time, it is a one-bit-to-one-bit mapping.
And if a bit is set, it means that the corresponding application bit is uninitialized, or poisoned, or whatever you want to call it.
Uninitialized memory is returned by malloc: once you malloc a buffer, the whole buffer is poisoned.
Uninitialized memory also comes from local stack objects.
When you enter a function, if you have a local stack object, it starts out poisoned.
We have to propagate this poisoned bit through
arithmetic operations.
I will show you why later.
And once a constant value is written to a
memory location, the corresponding shadow
is unpoisoned, meaning that we have initialized
it.
So this is, again, a direct mapping, which makes the tool extremely fast.
If you want to compute the shadow corresponding
to an application address, you just subtract
some large constant from it.
Or, actually, you unset one of the bits in the address, which here is the same operation.
One of the problems of many tools that find uninitialized reads is that they report, very early, what they think are bugs and what I think are false positives.
Here is a simple example.
Suppose we have a struct in the C language which has a one-byte object followed by a four-byte object.
Who knows what the size of this struct is?
No one?
>>> Eight bytes.
>>Kostya Serebryany: Eight bytes.
Correct.
So this struct is eight bytes, because there
is a hole due to alignment.
You have one-byte object, then three bytes
of nothing, and then a four-byte object.
So when you allocate this object, it is all uninitialized.
Then you initialize x and y.
They get unpoisoned.
But the hole itself is still poisoned.
And if you're doing something like memcpy with this object, you are copying uninitialized bytes.
Some of the tools report a bug at this point, which makes them unusable.
Valgrind is one of the few tools that don't
do this.
And our tool doesn't do this, either.
So we have to propagate the shadow bits through some of the arithmetic operations, like addition, bitwise AND, and so forth.
Enough about this.
We also need to know who actually allocated the poisoned memory.
Say you have two calls to malloc, you add the results into some third value, and then you figure out that this value is uninitialized.
You need to know which malloc is guilty.
And MemorySanitizer does it by using yet another part of the address space.
It stores there IDs of the allocation sites, like a particular malloc call or stack frame.
MemorySanitizer has very moderate overhead.
It's about three times slower than a native run, and it uses two times more memory.
If you want origins, it's twice as costly, but still quite tolerable.
The tricky part about this tool, unlike the first two, is that it actually has to see all the memory accesses in the program; otherwise, it will report false positives.
So we need to compile everything, libc, libc++,
all the libraries that the program depends
on.
And this is actually quite doable in a wide
range of cases.
Like we can handle LLVM itself.
For larger programs like Chrome, for example,
it's not that easy.
Chrome depends on 77 third-party libraries, so we have to do some dirty tricks to instrument the libraries at runtime.
As I said, we haven't deployed MSan anywhere significant yet, but it already has some good trophies.
We tried it on more than a million lines of C++ code that had not previously been tested with Valgrind.
We found 20-plus unique bugs in just two hours.
We then tested the same code with Valgrind; it finds the same bugs, but takes more than 24 hours.
And MSan gives better reports for stack memory.
And we have found a few bugs elsewhere.
So, to summarize what I just told you.
We have three tools for C++, and one of them
also for Go language.
ASan is, we believe, a must use for everyone
who is developing or testing C++ or C programs.
Today it works on Linux, OS X, Chrome OS, and Android, on x86, on ARM, and a little bit on PowerPC.
Work in progress is iOS, Windows, BSD clones,
et cetera.
And this is a part of two major open source
compilers, Clang and GCC.
TSan is a must use if you have threads because
then you most likely have data races.
Today it works on x86 Linux and is also part
of two open source compilers.
MSan is a work-in-progress tool which finds uses of uninitialized data.
Stay tuned if you are interested.
At this point I will be happy to take questions.
Thank you.
[ Applause ]
>>Tony Voellm: So I learned two really important
things here.
One, Kostya loves C++ because it keeps him
employed.
And two, if you don't use his tools, you are shooting yourself in the foot.
So use the tools.
I am going to give you one pre-announcement,
then we're going to take a question.
We are going to have drinks after GTAC today.
We are going to do the same thing we did yesterday, shuttling everybody up, and then stay tuned for Claudio's talk after the questions here.
It's going to be about XSS which affects everybody
here.
So with that, please, first question.
>>> Alec Monroe.
To bring things back to the languages most of us, I think, work with: I know, say, Python has Shed Skin and Cython, which can compile down to C++.
And while generally when programming Python we don't worry about wrong addresses or memory sanitization, we do worry about thread issues.
We do have data races there.
I am just wondering if there has been any
work to look into whether you could use something
like thread sanitizer to actually test Python
code for data races.
>>Kostya Serebryany: My knowledge about Python is quite limited, but I know there is something called the Global Python Lock, or something like this, which means that Python is actually single-threaded.
If you try to remove that Global Python Lock, then you are in big trouble, and TSan is your friend.
But other than that, you probably don't need it.
>>> Thank you.
>>Tony Voellm: Thanks for the question.
We have one up here on the moderator.
It says: does AddressSanitizer catch accesses outside the current buffer but within another valid buffer?
>>Kostya Serebryany: That's a good question.
So we might have two buffers: one which was expected to be accessed, and one which is actually accessed because the index was so far off that it jumped into another buffer.
There is a red zone between them.
So if you are accessing, like, one kilobyte outside of the buffer, we will not detect it, yes.
But the most frequent accesses are within
the first 16 or 32 bytes, and this is what
we catch reliably.
>>Tony Voellm: Great.
All right.
Kostya, thank you.
>>Kostya Serebryany: Thank you.
[ Applause ]
