- Okay, my name is Chuck Wilcox.
I work at MathWorks.
I'm the Boston C++ Meetup organizer,
and today, I'm gonna present about
zero-overhead compiler pessimization.
I can think of at least three cases
where you would wanna do this.
The first one is atomic operations
where you're telling
the compiler not to do
certain optimizations that would interfere
with inter-process communication.
And for anyone who's used the
standard atomic library,
this is effectively
what you're telling the
compiler to partially do.
The second case is the new attributes,
[[likely]] and [[unlikely]], on branches.
You're actually telling the compiler
to do something that goes against
its own understanding of what's best
for system performance.
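As a rough sketch of what those branch attributes look like, assuming a C++20 compiler: the function name and the return-zero error convention below are made up for illustration, not something from the slides.

```cpp
// Hypothetical example: divide, treating a zero denominator as the rare case.
int checked_divide(int numerator, int denominator) {
    // [[unlikely]] is only a hint: it tells the compiler to lay out the code
    // assuming this branch is rarely taken, whatever its own heuristics say.
    if (denominator == 0) [[unlikely]] {
        return 0;  // assumed error convention, for illustration only
    }
    return numerator / denominator;
}
```

The attribute changes how the compiler arranges the code, not what the function returns.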
But, I'm not here to talk about
either of those two today.
I'm here to talk about benchmarking.
Usually, when you're
doing microbenchmarks,
the compiler can kinda play tricks on you
to make you think you're
measuring something you're not.
Specifically, when we
write microbenchmarks,
we write a little snippet of code
that really doesn't affect the rest
of the program's state.
And therefore, the optimizer,
when it goes through,
it says hey, it's not changing anything.
I'm just gonna get rid of the whole thing.
Additionally, if it's
not doing side effects
in the abstract-machine
sense of the word,
meaning it's not interacting with
the file system or other things,
it can say there's no effect.
So, most of the microbenchmark
frameworks out there
have different ways to inject
code that makes it look like you're
doing a read or a write,
and that the compiler is
not going to be able to get
rid of on you.
So these are a couple from Google,
Facebook, and Celero.
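For reference, the real entry points are benchmark::DoNotOptimize and benchmark::ClobberMemory in Google Benchmark, folly::doNotOptimizeAway in Facebook's Folly, and celero::DoNotOptimizeAway in Celero. The sketch below is not any of those implementations. It is a minimal version of the same idea, assuming GCC or Clang extended inline assembly, and the names fake_read and fake_write are hypothetical.

```cpp
// Assumes GCC or Clang: both helpers use an empty extended-asm statement,
// which emits no instructions but constrains what the optimizer may assume.

// Hypothetical helper: pretend some unseen code reads the object at p.
// The pointer is an input and "memory" is clobbered, so the compiler has to
// make the object's current value actually exist in memory at this point.
inline void fake_read(const void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Hypothetical helper: pretend some unseen code wrote to memory.
// After this, the compiler cannot reuse cached values of objects whose
// address may be known elsewhere; it has to reload them.
inline void fake_write() {
    asm volatile("" : : : "memory");
}
```

Neither helper changes program state; they only limit what the optimizer is allowed to assume.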
But, let's see if we can take a look
at what's really going on.
Here we go.
Alright, down to 90 seconds, awesome.
Alright, so
we go to Compiler Explorer,
and we try to write the world's simplest
benchmarking program.
We have an int, we get the clock time,
we increment the int, we get the clock time again,
we subtract the two times and return.
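A reconstruction of roughly what that program might look like. The code on screen is not in the transcript, so the function name and the use of std::chrono::steady_clock are assumptions.

```cpp
#include <chrono>

// Hypothetical reconstruction of the "world's simplest" benchmark.
// Returns the elapsed time in nanoseconds.
long long naive_benchmark() {
    using clock = std::chrono::steady_clock;  // assumed clock source

    int value = 0;                 // the int being "benchmarked"
    auto start = clock::now();     // first clock read
    ++value;                       // the work we intend to time
    auto stop = clock::now();      // second clock read

    // Nothing ever observes `value`, so an optimizing compiler is free to
    // drop the increment entirely and keep only the two clock reads.
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```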
And if you look at the assembly,
we have the zero initial value.
Here's the add of one
and the rest of the math
is just stack operations.
But, for those who might
have been paying attention,
the compiler wasn't actually optimizing,
and when you turn the optimizer on,
all the code goes away
except for your two
calls to get the clock time.
That is not what we wanted.
I don't have a lot of
time to explain these
but,
alright so,
if we put in a fake call to read
after we're done timing,
you'll notice that the value one magically appears
in a register, but we don't
actually have the increment.
So we have to add yet another call.
And, now we actually get the add
to show up in between
the two calls to time.
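A sketch of how the two fake calls might be arranged, assuming GCC or Clang inline assembly. The helper name pretend_touched and the exact placement of the calls are assumptions, since the on-screen code is not in the transcript.

```cpp
#include <chrono>

// Hypothetical helper (GCC/Clang extended asm, emits no instructions):
// tells the compiler that the object at p may be read and written by code
// it cannot see.
inline void pretend_touched(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Returns the elapsed time in nanoseconds.
long long guarded_benchmark() {
    using clock = std::chrono::steady_clock;  // assumed clock source

    int value = 0;
    auto start = clock::now();
    pretend_touched(&value);   // fake write: value is no longer a known constant
    ++value;                   // now has to be computed at run time
    auto stop = clock::now();
    pretend_touched(&value);   // fake read: the incremented value is observable

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

With only the trailing call, the compiler can still fold the increment into a constant 1; the extra call before the increment is what stops that. Whether the add really lands between the two clock reads is worth checking in the generated assembly for your own compiler and flags.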
First one, okay.
Anyway, the moral is this is
actually hard to do correctly.
If you're ever doing benchmarking,
these are some of the issues
you're gonna run into.
Thank you very much.
(crowd clapping)
