- Good afternoon, everyone.
I'm Titus Winters, I do not
love doing my own introductions,
but there's a couple things
that I do want to say.
I've been leading most of the
C++ Common Library efforts
at Google for eight years now, dear lord.
I'm a maintainer for the
Google C++ Style Guide.
I founded Abseil.
I'm now the chair for Library Evolution,
that is the group at the
standards committee level
that does API design for
the Standard Library.
I'm also involved in the new study group on tooling,
I've done tons of guidance,
I write a lot of the
tip of the week series,
I work professionally on providing
good guidance on API
design, and because of all
of the code base wide
refactoring work in Abseil
and all of the other teams we've been on,
we are subjected to the pain of fixing it
when we get these things wrong,
which is to say hopefully you can trust
that I am not making all of this up.
So, when we talk about design,
why do we talk about design?
I think we do this
because we want to ensure
that the things that
we produce are usable,
that people understand what you mean
when you write out an interface,
when you write out a function,
when you write out a class,
that they know how to use that.
By looking at the things that work
and the things that don't we
find ideas and design patterns
that are easy to follow and that make
the resulting APIs easier to work with.
In the end, though, design serves us.
This is largely not about math
or fundamental principles of the universe.
These are not rules written
out on stone tablets
and brought down from on high.
I am not bringing you commandments here.
I am bringing you stories, best practices,
things that I have found seem to be
the way that we do things,
things that I have found
seem to be the right
way to express things,
but this is all going to be
evolving over time, right?
And I do want you to think
about all of these things carefully.
Don't just take my word for it.
There are a few places where there is
underlying math or symbolic logic,
and those things will
help inform good design,
and that'll be good.
We'll sort of call that out.
There is also this question,
are we prescriptivist or descriptivist?
We can approach design just like grammar
from either of these views.
Do we see the rules as
they were written down
on those tablets and value
the rules over all else?
Or do we see that, oh
hey, we've made a mess,
and some things work
and some things don't,
and try to produce rules
that describe what worked
and what didn't, to
encourage the good ones
and nudge us away from the bad.
And you can sort of take
either viewpoint here,
but I do sort of prefer
when we approach things
in a descriptivist fashion
of this seems to work,
this seems to not work,
and here's why, right?
It is really important
to me that you all think
about these things and understand why.
This talk will be in roughly three parts,
starting small with the
basic units of design
and working our way up
to big questions like
is this an acceptable
design pattern for types?
There's a spectrum here.
As we go forward in the talk we'll go
from talking about syntax
to semantics to theory.
This is scheduled in a
two-hour block here at CppCon.
And I'm gonna cover the
first of these in this talk
and the higher level pieces,
types and type design,
in the next slot.
So we'll wade in starting with the smaller
and hopefully better understood
part of the design spectrum.
First, a question.
What is the atom of C++ API design?
That is, what is the fundamental
small chunk of API design?
It might not be the smallest chunk,
but it should be the small
thing that we reach for
or that we think about most often.
And if you asked me this a
year ago I would have said,
well, it's the function, right?
After all, that's the
piece that we use the most.
Free functions, member functions,
special member functions, et cetera.
But recently I've started to think
for maybe the last year or so,
maybe functions are actually our protons.
The better unit of design
is slightly larger.
The better unit of design
is an overload set.
When you have a well-designed,
when you have a reasonable,
when you have a good overload set,
and it turns out there's
actually very solid agreement
amongst all of the experts on
what is good and reasonable here,
overload sets are a much
better unit of design,
especially as we move
to a richer type system,
richer set of vocabulary types, concepts,
and even deeper understanding
of move semantics
and move semantic designs.
Pop quiz, what does this mean?
What does this simple function
signature like this mean?
By the end of the talk
I want you to understand
that this question is bogus.
This question is ill formed.
You really need to know a
bit more about Foo, the type,
and quite a bit more about f
and everything else
that is named the same,
everything else that is named f.
I will say, if f is
appearing all by itself
and isn't part of an overload set,
what we've got here is
the function signature
for maybe move.
This nugget didn't actually
fit anywhere else in the talk,
and I really find it very important
so I'm just gonna say that right now.
You can repeat it three times
to yourself under your breath.
It is maybe move when you see
a function signature like this.
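A small sketch of why a lone `f(Foo&&)` only says "maybe move": the rvalue-reference parameter permits the callee to steal the argument, but nothing forces it to. `Foo` and the extra `destination` parameter are hypothetical, added only so the two outcomes can be shown side by side.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the slide's Foo.
struct Foo {
  std::string payload;
};

// Hypothetical variant of `void f(Foo&&)`. The signature permits a move,
// but the body decides whether one actually happens.
// Returns true if it took foo's contents.
inline bool f(Foo&& foo, Foo* destination) {
  if (destination == nullptr) {
    return false;  // Declined: foo is left exactly as it was.
  }
  *destination = std::move(foo);  // Accepted: foo is now moved-from.
  return true;
}
```

A caller writing `f(std::move(a), ...)` cannot tell from the signature which branch runs, so only the function's documentation can say whether `a` is still usable afterwards.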
Okay, overload sets.
Somewhat formally, an overload set
is a collection of
functions in the same scope,
that's namespace, class,
et cetera, et cetera,
of the same name such that
if any one of them is found
by name resolution they all will be.
That captures the syntax,
that captures what the
compiler cares about,
but not the semantics
of a good overload set,
which is what a user will care about.
The C++ Core Guidelines say
very good things about this.
The Core Guidelines have two rules.
One is, overload operations
that are roughly equivalent,
that is if you have two things
that are doing roughly the same thing,
name them the same.
And also the flip side,
overload only for operations
that are roughly equivalent, all right?
That is, if you have two things
that are doing something
very, very different,
please name them differently, right?
This should not be shocking.
The Google C++ style guide says
use overloaded functions,
including constructors,
only if a reader looking at a call site
can get a good idea of what is happening
without having to first figure out
exactly which overload is called, right?
You shouldn't need to do
overload resolution in your head
and know all of the
symbols that might show up
through transitive inclusion.
Like, what's everything in your program
that might have the same name, right?
You should actually only
name things the same
if it doesn't matter to
the reader which of those
is actually gonna get picked, right?
If it's gonna do the same thing.
We're definitely lacking
a solid theoretical way
to describe that same thing,
'cause it's sort of squishy, right?
Like you can't really say, like,
give me the semantic definition of
I have a function of two
arguments and a function
of three arguments, and
they do the same thing.
Like, that's gonna be just
weird to try to come up with
any sort of formal
definition of that, right?
But we sort of can see what we
mean with some examples here.
So for instance, we can overload on arity.
How many parameters the function takes.
And a great example of
this is StrCat from Abseil.
We've had a variation on StrCat
in our code base at Google for many years.
Pre-C++11, StrCat was an overload set
of something like 25 or
26 separate functions
to go from arity 1 all
the way up to arity 25.
And it didn't matter, right?
You don't need to know which
of those you're calling
because what StrCat does
is take all of its things,
convert them to string, and return you
the concatenation there.
And even after we switched to C++11
and moved this to being
a variadic template,
still doesn't matter, right?
It's one thing, because
even that statement
of it being a variadic template
is slightly a lie because the
first I think five arities
are hand rolled free functions
for optimization purposes
to make it easier on the compiler,
and none of that matters, right?
Because you see a call to StrCat,
you don't have to count them,
you don't have to know
which one is called.
It just does one thing, right?
So you can clearly overload on arity
in some cases like this.
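A minimal sketch of the idea, assuming C++17: a single variadic template covers every arity, so a reader never needs to count arguments to know what a call does. `StrCatSketch` is a hypothetical name, and streaming through `std::ostringstream` is a simplification; it is not how Abseil implements `absl::StrCat`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical StrCatSketch: converts each argument to text and returns
// the concatenation. One template stands in for the whole arity-based
// overload set.
template <typename... Args>
std::string StrCatSketch(const Args&... args) {
  std::ostringstream out;
  (out << ... << args);  // C++17 fold: stream each argument in order.
  return out.str();
}
```

Whether a call passes one argument or twenty, the single comment "concatenate the stringified arguments" describes it.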
You can also overload on types,
usually for types that are similar,
and the most common example
that you're gonna find
is for legacy string-ish overloads.
There was some old
function in your code base
that accepted const char*
and someone got tired of that
and they added an overload
for const string ref.
And this is a great example
of a well-designed overload set, right?
You've got some sort of stringish data.
The user that is calling this function
or the reader of some code
that is calling this function,
this overload set, sorry, excuse me,
doesn't need to know which
type is being passed exactly
or which function is being called exactly
because we can see at a glance
that the semantics are the same, right?
In this case, one is implemented inline
in terms of the other.
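A sketch of that legacy pattern, using a hypothetical `CountChar` as the "do x on the given string" function: the newer `const std::string&` overload forwards to the original, so the shared meaning is visible at a glance.

```cpp
#include <cassert>
#include <string>

// Hypothetical legacy API: counts occurrences of `c` in a NUL-terminated string.
inline int CountChar(const char* s, char c) {
  int n = 0;
  for (; *s != '\0'; ++s) {
    if (*s == c) ++n;
  }
  return n;
}

// Later convenience overload, defined inline in terms of the original.
// Caveat: it stops at the first embedded '\0', just like the const char*
// version, which is exactly what keeps the two semantically identical.
inline int CountChar(const std::string& s, char c) {
  return CountChar(s.c_str(), c);
}
```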
We see overloads throughout
the standard library
for optimization as a
result of move semantics.
For instance, there's vector push_back.
This is an overload set.
This fits and slightly
expands our definition
of these things have the same semantics
and I don't need to know
which of these is called.
At the call site, the
user doesn't have to care
whether it's the Lvalue or
Rvalue version of push_back.
At most, they need to watch
out for use after move,
but that's true irrespective
of what API you're calling.
You always need to watch
out for use after move.
This also helps flesh out
what we mean by the same semantics.
It is the same postcondition
on the vector,
not necessarily for the
T that was passed in.
However, we don't actually really care
about the postcondition on the T,
because the T is either
a const ref that isn't being changed,
a temporary, in which case we don't care,
or was std::moved, in which case it's
definitely not our problem,
see previous point, right?
Does that all jibe?
And it's also worth noting
that the calling code,
so long as it obeys this restriction,
would have the same behavior,
not the same optimization,
if we removed the Rvalue
push_back overload.
The semantics for all of our callers
are totally the same, right?
Nothing's actually gonna change.
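A sketch of the push_back-style overload set on a hypothetical `Stack<T>`: both members establish the same postcondition on the container, so one comment covers the pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container whose Push mirrors vector::push_back's overload set.
template <typename T>
class Stack {
 public:
  // Adds `value` to the top of the stack. (One comment covers both members.)
  void Push(const T& value) { items_.push_back(value); }        // copies
  void Push(T&& value) { items_.push_back(std::move(value)); }  // moves

  const T& Top() const { return items_.back(); }
  std::size_t Size() const { return items_.size(); }

 private:
  std::vector<T> items_;
};
```

If the `T&&` member were removed, every call above would still compile and leave the stack in the same state; rvalues would simply bind to `const T&` and be copied.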
So with those sorts of examples,
we can overload on arity,
we can overload for optimization,
we can overload for the same types-ish,
the same platonic notion of
types like string-ish data.
Let's look at the overload set guidance.
We can say properties
of a good overload set,
you can judge the correctness
without having to do overload resolution,
and I really like the second option here.
A single good comment can
describe the full set.
For StrCat that comment
would be something like
take all the provided arguments,
convert them to string
in the default fashion,
and return a string formed
by the concatenation
of the stringified arguments.
For the string thing for Foo it'd be
do x on the given
string, whatever that is.
For vector push_back it
would be something like
add this T to the back of the vector.
Right, we don't need to have a comment
on every individual
element of the set, right?
It's probably the case that one comment
describing the overload
set as a whole is actually
more explanatory of what that
overload set does, right?
And probably much clearer for a user.
So this pushes some of this squishiness
of what is a good overload
set back a little bit
onto what is a good comment,
which is still squishy, I can't define it,
but I'll know it when
I see it sort of thing.
But practically speaking,
nine times out of 10 you
can spot the bad comments
when the comment is encouraging you
to do overload resolution, right?
Is that like, you've all
seen comments like that?
That is definitely a code smell, right?
Does this all make some sense?
Good overload set.
Any questions?
There are mics.
Please feel free,
I will not be able to
like see you probably,
but please feel free to chime in.
I would love to hear from you.
It's kind of awkward.
When we start consciously
treating overload sets
as the basic unit of design,
then we start seeing them
in other places, right?
The most important overload set of all
is one that we've discussed a
lot over the last few years,
but usually not specifically in terms
of it being an overload set.
Any guesses?
Copy versus move.
I really, really like the
formulation of copy and move
as an overload set.
This actually has huge
conceptual ramifications
when we reconceptualize along those lines.
The type trait for movable
isn't stupid anymore.
It's always really bothered
me that is_move_constructible
didn't really tell you whether
it actually moved.
It only told you if syntactically
you could construct it
from an Rvalue, right?
That was like, ehhhhh.
Whereas now when you recognize
that move and copy are an overload set,
all that actually matters
is that you can construct
from a temporary, you can construct
from an Rvalue, right?
'Cause they're an overload set.
It doesn't matter which one you pick.
It is up to the type author in that model
to ensure that move is efficient
whenever it's plausible, right?
It is up to the user to
ensure that move is used
wherever it's relevant or important.
And the user doesn't
need to know if a type
has a move constructor because you don't
need to know which member of
the overload set is chosen.
This also requires that the
semantics of copy versus move
must be the same, at least with
respect to the destination.
The type, the object being constructed.
This matches the way
that the standard library
is behaving more and more.
This matches the way that
concepts for the standard library
are being defined.
This matches the behavior for papers
that I've been writing about
what the standard library promises.
More on that later in the week.
Move is an optimization of copy
is what I've been saying for a few years,
but I think the better way
to phrase it is move and copy
must be a well-designed overload set.
Does that make sense?
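The point about the trait can be checked directly. In this sketch (type names hypothetical), `CopyOnly` declares a copy constructor, which suppresses the implicit move constructor, yet `std::is_move_constructible` still reports true because an rvalue binds to the copy constructor. `NoMove` deletes its move constructor instead; a deleted member still participates in overload resolution and wins for rvalues, so construction from an rvalue becomes ill-formed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Declares a copy constructor, which suppresses the implicit move
// constructor. Rvalues still bind to the copy constructor.
struct CopyOnly {
  CopyOnly() = default;
  CopyOnly(const CopyOnly&) = default;
  CopyOnly& operator=(const CopyOnly&) = default;
};

// The trait only asks "is construction from an rvalue well-formed?"
static_assert(std::is_move_constructible<CopyOnly>::value,
              "an rvalue CopyOnly selects the copy constructor");

// Deleting the move constructor does not remove it from the overload set:
// rvalues now select the deleted member, so construction is ill-formed.
struct NoMove {
  NoMove() = default;
  NoMove(const NoMove&) = default;
  NoMove(NoMove&&) = delete;
};

static_assert(!std::is_move_constructible<NoMove>::value,
              "the deleted move constructor wins overload resolution");
```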
Interestingly, explicitly
conceptualizing everything,
even constructors like move and copy,
as an overload set gives us
some guidance on things like explicit.
When you view your constructors
as an overload set,
then you start having a better idea
of when explicit applies.
Does a user need to know
which constructor was picked?
If so, make that constructor explicit.
Viewing it another way, copy and move
are the canonical constructors,
at least among those that take parameters.
We know their semantics.
They take a T and they make a
new T that's like the given T.
That's the canonical constructor behavior.
But if your constructor doesn't take T
but takes some other type or
maybe some other types, right?
If you'd usually be
comfortable passing T and U,
Foo and Bar, const string
ref and const char*
as an overload set, then
you're probably fine
having constructors for
both of those, right?
You could have a constructor
that accepts Bar in your Foo.
If it would be an acceptable overload set
for both Foo and Bar, right?
And that's most commonly
the case when T and U
represent the same idea, right?
These are two different types
of the same sort of canonical data.
If it is, on the other hand,
merely the case that we can construct a T
from some bag of parameters,
but those aren't basically a T,
right, like vector's constructor
that takes a T and a size, right?
Okay, that is not the same as a vector.
That is a recipe for creating a vector.
Then your constructor
should be explicit, right?
Does that make sense?
Any questions?
I'll wait just a second.
I find that we wildly, wildly
underuse explicit on constructors.
And I think the standard
library is as guilty of that
as anybody.
Like almost all constructors should
probably have been tagged explicit,
and we kind of screwed that up,
but we're good, okay.
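A sketch of the rule of thumb with two hypothetical types. `UserName` from a `std::string_view` is the same platonic data in another representation, so an implicit conversion is harmless. A `(count, fill)` pair is only a recipe for a `Buffer`, so that constructor is `explicit`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical UserName: a string_view already is a name, just in a
// different representation, so the reader need not know which constructor ran.
class UserName {
 public:
  UserName(std::string_view name) : name_(name) {}  // implicit on purpose
  const std::string& str() const { return name_; }

 private:
  std::string name_;
};

// Hypothetical Buffer: (count, fill) is a recipe for a Buffer, not a
// Buffer, so the constructor is explicit.
class Buffer {
 public:
  explicit Buffer(std::size_t count, char fill) : data_(count, fill) {}
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<char> data_;
};

static_assert(std::is_convertible<std::string_view, UserName>::value,
              "UserName converts implicitly from string_view");
// Buffer b = {3, 'x'};  // Would not compile: copy-list-init can't use explicit.
```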
All of that said, with overload sets
there's another really
common pattern that I see,
which is people attempting
to use overload sets
to enforce certain types of behavior.
And my high-level guidance
is don't use equals delete
on a member of an overload set.
Is that a question?
Is that a question?
Nope, all right.
So I somewhat regularly see
people try to delete a member
of an overload set to enforce
lifetime requirements.
Show of hands, anyone seen
someone do this in their code?
Yep, a few, yep.
Looks like maybe 5% of you.
The problem here is that no temporaries
versus things that have
the correct lifetime,
things that have the lifetime
that I actually require,
are not actually synonyms, right?
That Venn diagram like
overlaps a little bit,
but it's more disjoint than not.
Like generally speaking,
if the lifetime requirement
for a parameter to your function is not
as long as this function
call, like the default,
simple, obvious thing,
then it's gonna be a little
hard to pin down, right?
It could be this variable must live
as long as this function,
or this variable must live until
the next time you call this function.
This variable must live
as long as this object.
This variable must live
until this thread completes.
This variable must live until
this callback fires, right?
All of those things are complicated.
Certainly much more complicated
than just no temporaries, right?
Those are different levels of complexity.
Starting asynchronous work
is particularly challenging.
And when you do this, the
most common workaround
for most people when
they say, like oh, geez,
you've equals deleted this,
I can't pass a temporary,
I guess I'll make this not a temporary.
Nine times out of 10 they do this
by pulling out an automatic variable.
And technically now that's gonna build,
but the odds that that,
the lifetime requirement
of that automatic variable
actually matched the requirement
of your kooky API are pretty slim, right?
In practice, your API that
is kicking off async work
or storing a reference, right,
is going to require you to
have a pretty detailed comment
in its API saying exactly
what the lifetime requirement
on that reference is.
The freeform nature, right,
the arbitrary boundless
possible complexity of that requirement,
is a whole lot more complicated
than even C++'s type system.
Right?
Equals-deleting a thing here
doesn't solve that problem.
All of which to say, the solution
to documenting lifetime
requirements on borrowed references
is either a, don't make
it a borrowed reference,
or b, document the actual requirement.
The type system like this
cannot do it for you.
If you want to equals delete
it on top of that documentation
I guess that's fine.
But it's a half measure at best.
It's a quarter measure at best,
and it gets really messy
and it's misleading,
and it's a false sense of security.
And I would not accept it in code review,
but I guess your mileage may vary.
Good?
(coughing)
Excuse me.
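A sketch contrasting the two approaches, with hypothetical class names. `BorrowingWatcher` deletes the rvalue overload, which rejects temporaries but says nothing about how long an lvalue must live; the real requirement still has to be written down. `OwningWatcher` takes option (a) and simply stops borrowing.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Anti-pattern (hypothetical): delete the rvalue overload to "enforce" a
// lifetime. It rejects temporaries, and nothing else.
class BorrowingWatcher {
 public:
  // The requirement that actually matters, and that only a comment can
  // state: `name` must outlive every later call to Name().
  void Watch(const std::string& name) { name_ = &name; }
  void Watch(std::string&&) = delete;

  const std::string& Name() const { return *name_; }

 private:
  const std::string* name_ = nullptr;
};
// w.Watch(std::string("tmp"));                   // Rejected by the deleted overload.
// { std::string local = "x"; w.Watch(local); }   // Compiles, then dangles.

// Option (a): don't borrow at all. Sink by value and own the data.
class OwningWatcher {
 public:
  void Watch(std::string name) { name_ = std::move(name); }
  const std::string& Name() const { return name_; }

 private:
  std::string name_;
};
```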
There's also using equals delete
or just (mumbles) a function
from an overload set in some cases,
in order to force the user
to use the move version
of a function instead of the
copy version of a function.
And in simple cases,
maybe even in most cases,
that looks fine.
That could be fine.
But in general you don't
know all of the ways
that your API is going to be used.
That is fundamental to the whole business
of providing an API in the first place.
While it might be the case that you know
that many invocations of your function
should be done via move, not copy,
you can't know that for everything, right?
If I wanna do two separate scans
on a slightly modified chunk of DNA,
it's less efficient to
call this on temporaries
'cause I have to do
the modification twice.
And if you make me contort my code
so that I only do the modification once
and still call your move-only interface,
I can do that, of course,
but it is a little bit more awkward.
My point being, for functions
you can't really know
that nobody ever is going
to need the copy API.
And when you provide it as an option,
the calling code is certainly simpler.
Sort of at a very high
level, don't be judgy, right?
You don't know all of the ways
that your code is going to be used,
be accepting.
If you do somehow know that
copies must never ever happen,
that is almost certainly
a property of the type,
not of the function that you're
passing that data to, right?
Make a DNA class in this example.
So if you're that worried about
accidental copies adding up,
make it a separate class,
don't use string, right?
And then probably still make it copyable,
just with some special name, all right?
Maybe make it more explicit
and hard to trigger accidentally.
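One way to read "copyable, just with some special name", sketched on a hypothetical `DnaStrand` type: moves stay cheap and implicit, the copy operations are private, and an explicit `Clone()` is the only way to copy. The `DnaScan` signature here is also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical DnaStrand: "no accidental copies" is a property of the type.
class DnaStrand {
 public:
  explicit DnaStrand(std::string bases) : bases_(std::move(bases)) {}

  // Moves are cheap and implicit.
  DnaStrand(DnaStrand&&) = default;
  DnaStrand& operator=(DnaStrand&&) = default;

  // Copies stay available, but only under a name that is easy to spot in review.
  DnaStrand Clone() const { return DnaStrand(*this); }

  const std::string& bases() const { return bases_; }

 private:
  // Private copy operations: reachable only through Clone().
  DnaStrand(const DnaStrand&) = default;
  DnaStrand& operator=(const DnaStrand&) = default;

  std::string bases_;
};

// Hypothetical scan that sinks by value: callers either move or Clone().
inline std::size_t DnaScan(DnaStrand dna) { return dna.bases().size(); }
```

Two scans of the same modified strand become `DnaScan(s.Clone()); DnaScan(std::move(s));`, which is visible at the call site without forbidding the copy outright.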
I've sort of snuck in here
a pass-by-value sink design.
Here DNAScan is accepting
a string, the DNA,
which is presumably a very large string,
by value, ooohh.
Other things like vector
push_back from earlier
do this as an overload set.
Which one is right?
That is to say, is vector's push_back
a well-designed overload set?
Should everyone always be doing that
when you're accepting a value to sink?
Or is just accepting a value fine?
And there's been a lot
of discussion on this.
In fact, I think one of Herb's keynotes
the first year or two of this
conference had a long section
that was touching on a
lot of the same things.
There's been a lot of discussion
and partial guidance on this,
and a lot of that guidance
does not agree, right?
And to some extent that is because
there are a lot of possible questions,
a lot of different scenarios
that you might be optimizing for.
So really we should be
asking some questions
before we try to come up with perfect,
all-encompassing guidance.
And the questions that you might think
that you might need to be asking,
is this generic, or am I
sinking a particular type, right?
In the DNAScan example,
I'm sinking exactly string.
That gives me some knowledge.
Or I might be sinking
exactly DNA strand, right?
And that gives me knowledge about probably
the relative costs of copying and moving
versus what the function
I'm about to do is.
Is it a question?
Although it might be good to,
I can try to repeat, but either way.
- [Man] Can you clarify the
sink, what you mean by sink?
- Can I clarify what I mean by sink?
So there are a lot of functions
where you pass a value in and
it's read and returned to you
and nothing else, right?
That's sort of normal.
Then there's things like vector push_back,
which is a sink.
It's passed in and
copied and stored, right?
So for any function that
you are accepting the input
to then copy either for storage
in the very common case,
usually like vector push_back,
or even in some cases I'm accepting it
in order to make a copy
that I'm gonna mutate.
So, you could imagine a silly function
that is print everything
capitalized, right?
Which might accept a string
and need to make a copy of it
so that it can capitalize it
before it prints it, right?
So sometimes you might have either storage
or I just need a copy.
Does that capture it?
Yeah.
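The "print everything capitalized" example as a by-value sink, in a hypothetical `Capitalized` helper: the function needs its own mutable copy anyway, so lvalue callers pay one copy and rvalue callers can move in. It returns the result instead of printing it so it can be checked, and it uses `std::toupper`, so it only handles single-byte characters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical "capitalize everything" helper that sinks its argument by
// value because it must mutate a private copy.
inline std::string Capitalized(std::string text) {
  for (char& ch : text) {
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  }
  return text;  // Returned rather than printed so the result can be checked.
}
```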
Yeah, and there's also the question of
relating to whether it's generic or not.
How expensive is the function compared
to a copy or a move of that
type or those types, right?
And if it's generic, like
vector push_back, right,
all you're actually doing is
making a copy or doing a move, right?
There's basically no
overhead above and beyond
the cost of copy or move.
For something like DNAScan, right,
if I need to sink it I'm
probably also going to do
a whole bunch of work on it.
And that whole bunch
of work, it's probably
much more expensive
than a move on a string
or a move on a DNA snippet, right?
So you need to maybe weigh
those things a little bit.
But there are more questions.
There's the question of are
there multiple parameters
that are being sunk, right?
If I need to sink two or
three or four parameters
then the cross product
of const ref and ref ref
for all of those parameters means I have
a combinatorial explosion of
elements in my overload set.
And that just might
not turn out to be fun.
Certainly not fun to maintain.
There's a question of over time
as I maintain this library,
as I maintain this code base,
do I know that this is always going to be
a sink of exactly T or do I
just want T-ish things, right?
In the case of accepting strings,
you might if you have a
lot of not actually strings
in your code base but things
that convert to string_view,
you might make your sink in
terms of string_view instead
so that there's one
clear conversion point.
And then there's the question
which I think Herb raised
in his keynote a couple years ago of
can allocation reuse dominate?
And this is a case where if I have a type
who has a member variable
like a log or something,
then it could be the case that
as I append data to that log,
maybe it's a string, it may
have to resize and reallocate
as I append more data to it.
And if I sink a new log,
sink a new string into place,
the allocation of the old one is lost
when I move the new one in, right?
And if I continue growing again
then I'm gonna have to do all of that
reallocation over and over again.
That seems like a fairly rare case,
but is not by any means unheard of.
So it is actually a thing
that you might actually have to consider,
like when you're deciding how
to accept your sink parameters.
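A sketch of the allocation-reuse trade-off on a hypothetical `Logger`. Sinking by value typically hands `log_` the incoming string's buffer, discarding capacity it had grown; assigning from `const std::string&` lets the implementation copy into the existing buffer when it is large enough. That reuse is common behavior, not a guarantee, so the sketch only checks contents.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical Logger illustrating the allocation-reuse question.
class Logger {
 public:
  // Sink by value: log_ typically adopts the incoming string's buffer, so
  // any capacity log_ had already grown is released.
  void ResetByValue(std::string text) { log_ = std::move(text); }

  // const& + assign: the implementation may copy into log_'s existing
  // buffer when it is already large enough (common, not guaranteed).
  void ResetByConstRef(const std::string& text) { log_.assign(text); }

  void Append(const std::string& more) { log_ += more; }
  const std::string& log() const { return log_; }

 private:
  std::string log_;
};
```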
There may even be other questions
above and beyond these five,
but I think that's a
reasonably complete set
and already very complicated.
But I will throw out the following
sort of very general guidance.
I would probably personally
provide this as the guidelines.
You probably want the overload
set of const ref and ref ref
if the implementation of
your function is small
compared to move constructing a T.
It is a little bit more complex.
It is worse error messages,
it is worse compilation performance,
and it is probably too much of a pain
if you have multiple parameters.
Right, there's that
combinatorial explosion.
You could sink by value
if the implementation
has a much larger cost
than move constructing a T.
Like in the DNAScan example,
I'm about to walk through everything
in the DNA snippet, right?
That is bonkers more expensive
than moving a string.
But it also does constrain
you a little bit.
You want that to continue to be a T
and exactly a T for all time.
You don't want conversions in there.
And then there's const T ref is actually
never a terrible choice if
you don't know the answers
to these questions because it's simple
and it gives you flexibility.
It's well understood, right?
It's hard to get wrong.
That's how I would simplify that.
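The three options side by side, as hypothetical signatures. The by-value version also shows the escape from the combinatorial explosion: two sunk parameters need one function rather than four `const&`/`&&` overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// 1. Cheap body, little more than the copy or move itself:
//    the const& / && overload set, like vector::push_back.
class Names {
 public:
  void Add(const std::string& name) { names_.push_back(name); }
  void Add(std::string&& name) { names_.push_back(std::move(name)); }
  std::size_t size() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};

// 2. Body far more expensive than a move: sink by value. With two sunk
//    parameters this is one function instead of four overloads.
inline std::string ScanPair(std::string first, std::string second) {
  std::string joined = std::move(first);
  joined += second;  // Stand-in for the real, expensive scan.
  return joined;
}

// 3. Unsure of the answers: const T& is simple and hard to get wrong.
inline std::size_t CountAdenine(const std::string& dna) {
  std::size_t count = 0;
  for (char base : dna) {
    if (base == 'A') ++count;
  }
  return count;
}
```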
It's also worth noting that this gets
a little more complicated
if you're dealing
with strong exception-safety guarantees
and sinks that may throw.
If DNAScan may throw and DNA needs
to be strongly exception-safe,
then you have an additional
set of constraints.
Practically speaking sinks don't usually
throw except for allocation.
If exception safety is
your primary concern
you may have to reevaluate
this a little bit.
Mostly don't pass by value
for types that must be strongly exception-safe.
When I'm talking about non-sink overloads
historically I find that we're
talking about const char*
and const string&, I
mentioned that a little bit,
these tend to have a similar look.
In modern code we tend to
replace that overload set
with string_view.
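A sketch of that replacement, assuming C++17, with a hypothetical `CountChar`: one `std::string_view` parameter accepts string literals, `std::string`, substrings, and anything else convertible to `string_view`, so the conversion happens in exactly one place.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical: one std::string_view parameter replaces the
// const char* / const std::string& overload set.
inline int CountChar(std::string_view s, char c) {
  int n = 0;
  for (char ch : s) {
    if (ch == c) ++n;
  }
  return n;
}
```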
And once we start
talking about string_view
as the string like parameter type,
then we start looking at other common
non-owning parameter types, like span,
these have unusual designs,
there are sharp edges.
There was a whole talk on
that already this morning.
Span even leads us to
a bigger can of worms,
because unlike string_view,
which does one thing, character data,
span tries to be a general
any-contiguous-range-of-T type,
but there are lots of
contiguous ranges of almost T
that you're reasonably
likely to work with.
For instance, there's pointers
versus smart pointers.
We can easily publish guidance to say,
don't pass smart pointers
by const ref in general.
If you wanna pass a pointer,
pass the pointer, not the ownership-wrapping information.
So we get it ingrained, don't do this,
like identify this in code review.
Suggest const T* or even const T ref.
But types don't actually decompose, right?
A vector of unique pointer
T is not convertible
to a vector of T*.
And if you've got a
vector of owned pointers
and need to invoke a
function with a vector of T*,
there just isn't a good way to do that.
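A sketch of the problem with a hypothetical `Widget`. Passing the object rather than the `unique_ptr` is easy for a single argument, but a function that wants `std::vector<const Widget*>` forces owners of `std::vector<std::unique_ptr<Widget>>` to build a parallel vector of raw pointers on every call.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Widget {
  int weight = 0;
};

// Preferred: take the object itself, not the ownership wrapper.
inline int Weight(const Widget& w) { return w.weight; }

// A function that wants a collection of non-owning pointers.
inline int TotalWeight(const std::vector<const Widget*>& widgets) {
  int total = 0;
  for (const Widget* w : widgets) total += w->weight;
  return total;
}

// The workaround: std::vector<std::unique_ptr<Widget>> does not convert,
// so build a parallel vector of raw pointers (an extra allocation per call).
inline std::vector<const Widget*> Borrow(
    const std::vector<std::unique_ptr<Widget>>& owned) {
  std::vector<const Widget*> out;
  out.reserve(owned.size());
  for (const auto& p : owned) out.push_back(p.get());
  return out;
}
```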
So modeling based on span, it is not hard
to imagine producing a more
generic span of T-ish things.
I've seen this in my code base as AnySpan,
and I don't love the name,
but I do increasingly like the type.
It effectively type erases
a contiguous container
of things that can be
converted to T* or T ref
in a fairly clear fashion.
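AnySpan is only described in prose here, so the following is a guess at the shape of the idea, not that type: a hypothetical `AnySpanSketch<T>` that type-erases a contiguous container whose elements are `T`, `T*`, or `std::unique_ptr<T>`, exposing each element as `const T&`. The helpers `ToPtr` and `AnySpanAt` are invented for this sketch, which assumes C++14. Like `string_view` and `span`, it does not own anything, so the viewed container must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helpers mapping each "T-ish" element to a const T*.
template <typename T>
const T* ToPtr(const T& value) { return &value; }
template <typename T>
const T* ToPtr(const T* ptr) { return ptr; }
template <typename T>
const T* ToPtr(const std::unique_ptr<T>& ptr) { return ptr.get(); }

// Element accessor instantiated per container type, then stored as a plain
// function pointer: this is the type-erasure step.
template <typename T, typename Container>
const T* AnySpanAt(const void* data, std::size_t i) {
  using Elem = typename Container::value_type;
  return ToPtr<T>(static_cast<const Elem*>(data)[i]);
}

// Hypothetical AnySpanSketch<T>: a non-owning view over any contiguous
// container of T, T*, or std::unique_ptr<T>.
template <typename T>
class AnySpanSketch {
 public:
  // Implicit on purpose: it stands in for an overload set.
  template <typename Container>
  AnySpanSketch(const Container& c)
      : data_(c.data()), size_(c.size()), at_(&AnySpanAt<T, Container>) {}

  std::size_t size() const { return size_; }
  const T& operator[](std::size_t i) const { return *at_(data_, i); }

 private:
  const void* data_;
  std::size_t size_;
  const T* (*at_)(const void*, std::size_t);
};

// One signature serves every caller's storage choice.
inline int Sum(AnySpanSketch<int> values) {
  int total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) total += values[i];
  return total;
}
```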
And we can go further and
further down that rabbit hole.
Maybe it doesn't need to be contiguous.
Maybe it's just some form of range.
Maybe we can do this for
associative containers
and we have a map view or a set view.
Stepping back a little
bit, C++ is a language
that is all about types,
more so than basically anything else.
Overloads for non-owning
reference parameters
like string_view and span and AnySpan,
are about getting closer to duck typing,
in terms of what types are accepted,
which, give me anything
that looks like a duck
and quacks like a duck, and
I will use it as a duck.
Bjarne was talking about this
in the keynote this morning with concepts.
It's a language approach
to a very similar problem.
And those are the two main conventions
that are emerging in this space, right?
We can, in the library,
build non-owning reference
parameter types like these
or we move to more generic
code and use concepts.
And when it comes to that question
of which of these will emerge,
I don't think the community
has enough experience
to provide particularly deep guidance yet.
My suspicion is that this will come down
to whether the library of
types like string_view and span
are found to be sufficiently expressive.
If the library providers of the world
build a rich set of such types,
we'll probably go that way.
This approach has a headstart, after all.
If we invest a similar effort in concepts
and, importantly, we find
only a comparable set
of sharp edges for concept
usage versus view usage,
then it may be that
concepts comes to dominate.
That is a pretty significant shift
and with a lot of unknowns.
And it's unclear yet
whether everyday programmers
can write in a generic
and duck typed fashion
efficiently and safely.
We will see how that turns out,
but interestingly we already have a trial
of that happening right
now without concepts
in the form of callables, std function.
Even without concepts we
can write fairly reasonably,
something that takes in a callable
in either a library or a language fashion.
Both of these have their uses,
but I think when we're
writing everyday code,
most of us are going to
reach for the library form.
And that seems telling.
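The two forms for accepting a callable, sketched with hypothetical function names. The library form takes one concrete type-erased parameter, so its definition can live in a source file and the callable can be stored. The language form deduces the exact callable type with no erasure, at the cost of exposing the template definition to every caller.

```cpp
#include <cassert>
#include <functional>

// Library form: one concrete, type-erased parameter type.
inline int ApplyTwice(const std::function<int(int)>& f, int x) {
  return f(f(x));
}

// Language form: a template deduces the exact callable type, so there is no
// erasure, but the definition has to be visible to every caller.
template <typename F>
int ApplyTwiceGeneric(F&& f, int x) {
  return f(f(x));
}
```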
Until we have erasure
and storage for concepts,
I think we're probably going to reach
for the library solution.
If I had to guess about the future,
I'm gonna guess that we'll
devote a fair amount of energy
to both approaches and we'll
wind up with a powerful,
very useful set of concepts
and then those will be
type erased and provided as library,
like with library types that wrap them.
And most user code will
deal in those library types.
It's just a guess.
Even still, std function is a
little unusual in this class
of type erased parameter type.
'Cause when we compare to
string_view or other view types,
std function is simpler in a
couple very important ways.
First, it's only erasing one thing, right?
If I accept the std function
I'm accepting one callable thing.
Nearly every other commonly
discussed type erased type
is erasing a collection of things.
string_view and span erase
contiguous sequences.
AnySpan erases a contiguous
sequence of not-quite-T.
Map view or set view
erase the ordering details
of some associative container,
et cetera, et cetera.
When you're type erasing a single thing
it is much easier as std function does
to make that an owning type.
A type where you can copy it
and not have any requirement
that the original outlive the copy.
When we do type erasure over a container,
on the other hand, over a collection,
then we generally don't
want to actually copy
all of the things in that container.
And we rapidly wind up with types
that are very, very easy
to make dangle.
And then we get two
big schools of thought.
We can have non-owning
reference parameter types
only as parameters, right?
Have string view only as a parameter type,
never use it anywhere else.
And this school of thought will say,
non-owning reference
parameter types are okay
as long as they're only
function parameters.
And then there's the
use-with-caution school:
use non-owning reference
parameter types, just carefully.
Like, yep, there's sharp edges there.
Just stay away from the pointy bits.
Always question storage of any such type.
There is also a third school of thought
that these types are
all completely garbage
and too hard to use and we
should throw them away entirely.
I don't see that happening,
but I have been surprised before.
Even with just these two options we have,
as a community, a
difficult choice to make,
especially in a language
with such lofty goals
as do not pay for what you do not use.
Because there are absolutely use cases
for non-owning types like string_view
above and beyond just as a parameter.
Consider your file name processing.
You could imagine a
path processing function
that takes a string view for the path
and returns a view into it for the suffix
or the file name or the directory.
But note, while we're looking at this,
using string_view on input here
means that we don't have to
overload on char* and const string ref
and whatever user provided types
might be contiguous and useful.
String_view does all of
that overload work for us.
That's the point of vocabulary types.
That non-owning parameter
type as a replacement
for an overload set is very powerful
and it is why we are talking
about this right now.
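A sketch of the kind of path helper described above, assuming POSIX-style `/` separators. `Suffix` and `Directory` are hypothetical names, and the parsing is deliberately simplistic. One `std::string_view` parameter replaces a `const char*` / `const std::string&` overload set, and each return value is a view into the caller's buffer.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns the text after the last '.', or an empty view if there is none
// (or if the last '.' belongs to a directory name rather than the file).
inline std::string_view Suffix(std::string_view path) {
  const auto dot = path.rfind('.');
  const auto slash = path.rfind('/');
  if (dot == std::string_view::npos ||
      (slash != std::string_view::npos && dot < slash)) {
    return {};
  }
  return path.substr(dot + 1);
}

// Returns everything before the last '/', or an empty view if there is none.
inline std::string_view Directory(std::string_view path) {
  const auto slash = path.rfind('/');
  if (slash == std::string_view::npos) return {};
  return path.substr(0, slash);
}

inline void SuffixUsage() {
  const char* c_path = "/tmp/report.txt";
  std::string str_path = "/home/me/notes.md";
  assert(Suffix(c_path) == "txt");    // const char* converts implicitly
  assert(Suffix(str_path) == "md");   // so does std::string
  assert(Directory(str_path) == "/home/me");
  assert(Suffix("/a.b/file").empty());  // the dot is in a directory name
}
```

Every result aliases the argument's characters, so it is only valid while the argument's storage is alive. That is what makes this design cheap, and it is also what makes it easy to misuse.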
We could make this design
a little bit more palatable to some people
by changing that return value
to string instead of string_view,
but forcing a copy there and
changing that return type
feels a little awkward,
especially if this wasn't
suffix but was directory, right?
If you deal in very long file names,
those might start to
actually be large copies.
That might start to add up.
Don't pay for what you don't
use, that's C++, all right?
If you can use this
style of design safely,
that sounds like a very C++ thing.
But it is awfully easy to misuse.
Take a glance at this slide.
Half of these are bugs,
and they are awfully close neighbors
to code that works just fine.
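The slide itself is not reproduced here, but the following sketch, built on hypothetical helpers `Extension` and `MakePath`, illustrates how close a correct use and a dangling use of a view-returning function can be. The buggy line is left commented out because executing it would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns a view into the characters of its argument.
inline std::string_view Extension(std::string_view path) {
  const auto dot = path.rfind('.');
  return dot == std::string_view::npos ? std::string_view{}
                                       : path.substr(dot + 1);
}

// Hypothetical source of paths; every call returns a new temporary string.
inline std::string MakePath() { return "/tmp/report.txt"; }

inline std::string SafeNeighbors() {
  // Fine: the temporary from MakePath() lives until the end of the full
  // expression, so the characters are copied into `owned` before it dies.
  std::string owned(Extension(MakePath()));

  // Fine: the named string outlives the view into it.
  std::string path = MakePath();
  std::string_view view = Extension(path);

  // BUG, a close neighbor of the first line: the temporary is destroyed at
  // the semicolon, so `dangling` would point into freed memory.
  //   std::string_view dangling = Extension(MakePath());

  return owned + "," + std::string(view);
}
```

The only difference between the working line and the buggy one is whether the owning string is still alive when the view is read: a temporary survives until the end of the full-expression, and no further.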
All of which is to say if
we continue to build views
and other non-owning reference parameters,
there's going to be a tension here.
I think that the basic language,
like design and evolution principles,
are gonna say yes, it's
fine to use these carefully.
If a user hurts themselves
on that sharp edge,
that's on them.
But we're definitely going to see
a lot of very caring
people offering guidance
like never use these except as a parameter
or even never use these at all.
And that is a hard tension to live with,
because these are going to be
the most efficient ways
to express that overload set, for instance.
Personally I've been using string_view
for quite a while and I
find it pretty easy to spot
questionable use in code review.
Anytime that it is used as
anything other than a parameter
I ask why do we know
that the underlying data
will live longer than this view.
That does not work so great if you
are an almost always auto person, sorry.
But this is all sort of a long tangent
on doing type erasure for parameter types
and duck typing in a library form.
There's open questions here.
We'll see how this all plays out.
But we need to pop back
the stack a little bit.
We're done looking at
overloading on parameters
or producing parameter types
that hide that overload set for you,
and instead we're going to look
at the other important
dimension for overload sets,
which is method qualifiers.
This is a really important
variation on overloads.
You can overload member functions
based on method qualifiers,
either ref qualified or const qualified.
Overloads that vary in const qualification
tend to be of the form
access this underlying data
in a const appropriate fashion, right?
You see this in vector's
operator square brackets, right?
If you have a const vector
you get a const T ref.
If you have non-const vector
you get a non-const T ref.
All right?
Simple, easy, obvious overload set.
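A minimal sketch of a const-qualified overload pair in the style of `std::vector::operator[]`. `Row` is a hypothetical wrapper. The two overloads share one description, which is the test for a good overload set.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical wrapper with the same const/non-const operator[] pair as
// std::vector. One comment describes both: "returns the element at i,
// const if and only if *this is const".
template <typename T>
class Row {
 public:
  explicit Row(std::vector<T> values) : values_(std::move(values)) {}

  T& operator[](std::size_t i) { return values_[i]; }
  const T& operator[](std::size_t i) const { return values_[i]; }

 private:
  std::vector<T> values_;
};

inline void RowUsage() {
  Row<int> row(std::vector<int>{1, 2, 3});
  row[0] = 10;  // non-const object: T& overload
  const Row<int>& view = row;
  static_assert(std::is_same<decltype(view[0]), const int&>::value,
                "a const object selects the const T& overload");
  assert(view[0] == 10);
}
```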
Overloads that vary on ref qualification
tend to be about optimization.
You can do one thing safely,
the Lvalue qualified version,
and if we know that we are
operating on a temporary,
or operating on an Rvalue,
we can more aggressively optimize
by leaving the object as a whole
in that dreaded, valid,
but unspecified state.
So for example, in C++20
the stringbuf type
will gain an overload for str(),
a ref qualified overload for str().
Here a ref qualified overload means steal.
So you can change your code
that is returning buf.str(),
which has to copy out of the buffer,
to return std::move(buf).str(), to say,
I'm done with this buffer
and because I'm done
with this buffer you don't
have to copy that string out,
you can move that string out.
When we use a pattern like this,
we don't need to worry about scary naming
for destructive member operations.
Just consistency
with the higher level rule,
don't operate on moved-from objects,
does all of the warning
that we need to do, right?
That's very handy: you rely
on existing user experience
and understanding. And forming
these performance-when-available
overload sets
is also a nice way to
be future compatible.
We can all start writing this
return statement right now.
It doesn't hurt anything
and it expresses a reasonable intent,
I'm done with this buffer.
When the underlying
standard library catches up,
it'll just optimize a
little better, right?
So a future compatible design,
that's always nice to see.
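A sketch of the "steal when done" return. It uses `std::ostringstream` rather than the `std::stringbuf` from the talk; both gain an rvalue-qualified `str()` in C++20. `BuildGreeting` is a hypothetical function. Before C++20 the same line compiles and simply copies, which is the future-compatibility point.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical function that builds a string in a stream and returns it.
inline std::string BuildGreeting(const std::string& name) {
  std::ostringstream out;
  out << "Hello, " << name << "!";
  // out.str() would copy the buffer's contents. Calling str() on an rvalue
  // says "I'm done with this stream": in C++20 that selects the
  // rvalue-qualified str() overload, which can move the string out; in
  // earlier standards it calls the const str() and copies, so the line is
  // correct either way.
  return std::move(out).str();
}

inline void GreetingUsage() {
  assert(BuildGreeting("CppCon") == "Hello, CppCon!");
}
```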
When we combine const and
reference qualifier overloads,
we can keep const correctness
and provide good optimization
like in the case of optional's value().
These types of overload sets still meet
our general definition
of a good overload set.
A user does not need to
know which one is called.
A single comment can describe
the behavior of the whole overload set,
probably more clearly
than having comments for
each member individually.
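A sketch of a combined const and ref-qualified overload set with the same four signatures as `std::optional<T>::value()`. `Box` is a hypothetical type; unlike `optional` it has no empty state, so there is no `bad_optional_access` path to show.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper. One comment covers all four value() overloads:
// "returns the contained value, as const and as movable as *this is".
template <typename T>
class Box {
 public:
  explicit Box(T v) : v_(std::move(v)) {}

  T& value() & { return v_; }
  const T& value() const& { return v_; }
  T&& value() && { return std::move(v_); }             // "steal"
  const T&& value() const&& { return std::move(v_); }  // mainly for generic code

 private:
  T v_;
};

inline void BoxUsage() {
  Box<std::string> box("payload");
  box.value() += "!";                           // & overload: mutable access
  const Box<std::string>& view = box;
  assert(view.value() == "payload!");           // const& overload
  std::string stolen = std::move(box).value();  // && overload: moves out
  assert(stolen == "payload!");
  // box is now moved-from: don't rely on box.value()'s contents.
}
```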
While we're here we
should talk a little bit
about method qualifiers on
their own without the aid
of an overload set.
So what do ref qualified
methods mean when not part
of an overload set and
what do const methods mean?
If you've got nothing but an
Rvalue-ref qualified function,
that means do this once.
This is a great design
for destructive operations
and things like call this
function at most once.
It should only be used, however,
when the Lvalue equivalent
semantic would be buggy or
break the design of your type,
not just because of inefficiency, right?
This goes back to the
don't =delete things
just because you're being judgy rule.
It's perfectly reasonable
for me to provide
only the Rvalue version here
because the whole point of the type
is that it's a one-time callable.
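A minimal sketch of an Rvalue-only member function. `OneShot` and the `CanRunOnLvalue` detection trait are hypothetical, written only to show that calling `Run()` on an lvalue is rejected while `std::move(task).Run()` is accepted.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical one-shot callable: Run() exists only as an rvalue-qualified
// member, so callers must write std::move(task).Run(), which reads as
// "use this up". There is deliberately no lvalue overload: running twice
// would break the type's contract, not merely be slow.
class OneShot {
 public:
  explicit OneShot(std::function<int()> f) : f_(std::move(f)) {}

  int Run() && {
    std::function<int()> f = std::move(f_);  // take the callable out
    f_ = nullptr;                            // leave *this visibly empty
    return f();
  }

 private:
  std::function<int()> f_;
};

// Detects whether Run() can be called on an lvalue of T.
template <typename T, typename = void>
struct CanRunOnLvalue : std::false_type {};
template <typename T>
struct CanRunOnLvalue<T, decltype(void(std::declval<T&>().Run()))>
    : std::true_type {};

static_assert(!CanRunOnLvalue<OneShot>::value,
              "Run() is rvalue-only: task.Run() does not compile");

inline int OneShotUsage() {
  OneShot task([] { return 42; });
  return std::move(task).Run();
}
```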
On the flip side, Lvalue
qualifying a function
says don't do this on temporaries.
This comes up almost never,
outside of overload sets,
but it does have one case
that I have been seeing,
which is we should maybe
be Lvalue qualifying
our assignment operators in general.
Like you can currently
assign to a temporary
of most user-defined types.
You currently cannot assign
to a temporary of an int.
Like we are not doing as ints do.
But if you ref qualify it like this,
then the compiler will
catch that that assignment
is probably nonsense
and not what you meant.
In practice, I don't think I've ever
actually encountered
that bug in real code,
I don't actually care that much,
but from a design consistency perspective
that's maybe an actual use
case and it sort of expresses
what the intent is.
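A sketch of Lvalue-qualified assignment operators on a hypothetical `Meters` type. The `static_assert`s use `std::is_assignable` to check that assigning to a temporary is now rejected, just as it is for `int`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical value type whose assignment operators are
// Lvalue-ref-qualified ("&"), so assigning to a temporary is rejected.
class Meters {
 public:
  explicit Meters(double v = 0.0) : v_(v) {}
  Meters(const Meters&) = default;
  Meters(Meters&&) = default;

  Meters& operator=(const Meters&) & = default;
  Meters& operator=(Meters&&) & = default;

  double value() const { return v_; }

 private:
  double v_;
};

// As the ints do: a temporary int cannot be assigned to...
static_assert(!std::is_assignable<int, int>::value, "int rvalue");
// ...and with the & qualifier, neither can a temporary Meters.
static_assert(!std::is_assignable<Meters, const Meters&>::value,
              "Meters rvalue is not assignable");
static_assert(std::is_assignable<Meters&, const Meters&>::value,
              "Meters lvalue is still assignable");

inline void AssignUsage() {
  Meters a(1.0);
  const Meters b(2.5);
  a = b;               // fine: a is an lvalue
  // Meters(3.0) = b;  // does not compile: assignment to a temporary
  assert(a.value() == 2.5);
}
```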
Moving away from references,
what do we really mean when
we const qualify a thing?
Hypothetically if we marked
every method as const
and every member as mutable,
this class builds just fine.
But this is going to be an absolutely
rotten type to work with.
Const should mean const.
But there are types that
have mutable members,
and those aren't actually a problem.
But there's some question,
there's some connection there.
How do we use const and
mutable well in design?
And I suspect that there are
a couple ways to view this,
but the one that has
given me the most mileage
is the tie between const
methods, mutable members,
and thread safety.
The standard has some
things to say about this.
It says it in a very obtuse fashion.
I'm 95% sure that's the right citation.
If you squint it talks about read access,
write access, modification,
and const arguments.
According to the person
that claims responsibility
for that wording, it's horrible wording,
but the intent is roughly this.
Const accesses to standard
types do not cause data races.
Standard types are thread compatible
unless otherwise specified.
Here we have to define thread compatible
with a very hand-wavy definition:
concurrent invocations
of const methods on this
type do not cause data races.
Any mutation of an instance of this type
means that all accesses
require external synchronization.
That's as opposed to thread safe,
where concurrent invocations
of const or non-const methods
do not cause data races.
That's mostly things like mutex.
There is, of course, also a
thread unsafe classification,
but you should just not do that.
More on that in the next talk.
It's interesting to note
that if you build your types
out of thread compatible
or thread safe types,
and you don't use the mutable keyword
for your member variables,
then you're probably thread
compatible right outta the box.
There are some scenarios where
pointers are shared around
and that isn't quite true.
But more on that in the next talk.
In this model of things
const is less about
I am changing the internal values
and more about it is safe to
call this method concurrently.
And with that model of
things we can see at a glance
that this class is thread
unsafe unless response
is inherently thread safe.
And usually what such a
design requires is a Mutex.
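A sketch of that fix under the "const means safe to call concurrently" model. `LazyGreeting` is a hypothetical class. Its `const` `Get()` writes a `mutable` cache, so a `mutable std::mutex` guards that cache; without the mutex, concurrent `Get()` calls would race and the class would be thread unsafe.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical lazily computed value. Get() is const, so callers may invoke
// it concurrently; because it writes mutable members, every access to them
// happens under mu_, which keeps the class thread compatible.
class LazyGreeting {
 public:
  explicit LazyGreeting(std::string name) : name_(std::move(name)) {}

  std::string Get() const {
    std::lock_guard<std::mutex> lock(mu_);
    if (!cached_) {
      cache_ = "Hello, " + name_;
      cached_ = true;
    }
    return cache_;  // copied out while still holding the lock
  }

 private:
  const std::string name_;       // never modified after construction
  mutable std::mutex mu_;        // locked from const methods, hence mutable
  mutable bool cached_ = false;  // guarded by mu_
  mutable std::string cache_;    // guarded by mu_
};

inline void LazyUsage() {
  const LazyGreeting greeting("world");
  assert(greeting.Get() == "Hello, world");  // first call fills the cache
  assert(greeting.Get() == "Hello, world");  // later calls reuse it
}
```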
But what just happened?
We started talking about
properties of types,
which means that we're
finally ready to move on
from low-level API design and
talk about higher level stuff.
But it is also important to
note that this is a bridge.
There is a bridge between these domains.
Const is both a promise about your values
and a promise about the
ways that it is safe
for your type to interact
with the rest of the program.
And that makes that a
topic for the next talk.
And we have lots of time for questions.
I will leave this up to jog your memory.
There are microphones in both places.
- [Man] So you were talking
about the qualifiers on methods.
I'm not sure I understand
the meaning of a const Rvalue
reference type method.
- Yes, optional's value()
has a const ref ref (const&&) overload
in its overload set.
And I am 95% sure that that is only there
so that it works nicely
in generic contexts,
but like semantically it
doesn't mean anything.
- [Man] Okay, so I'm not crazy
that it sounds meaningless.
- Yeah, you're not crazy.
Yeah, it is, the first time that I took
a good hard look at it
I'm like, wait, wha?
Huh?
Yeah, you know, well spotted.
Yeah.
- [Man] Kind of in the same
vein with ref qualified members,
you talked about the str()
member of stringbuf.
And you talked about how that interacts
with the guidance that we
not use moved-from objects.
Now imagine that we had a
type that was like string buf,
but it had separate buffers
for input and output,
and had ref qualified members
that allowed us to
retrieve either of those.
If we follow the guidance
not to use moved-from objects
and ref qualified them
both, we could extract one
or the other, but not both,
in a destructive manner.
What are your thoughts
on that kind of API design,
where a type is safe to be
used after it's moved from
so that you can extract other
members from it destructively?
- I think it would be
really hard to express the,
I think it would be really difficult
for that to actually play out in practice
because the move constructor,
no that's not quite right.
I would be deeply skeptical to start with
because the very, very
high level principle is
don't touch it after you've
called std move on it.
Right?
Except in very, very unusual circumstances
that you do not want to get into.
And so I think you would
probably be better off
with some other naming
for those types of things,
and I haven't actually seen
a whole lot of value types
where there's multiple logical parts to it
that you would want to be consuming.
I think, perhaps, a more
accurate thing would be
that you wanted an accessor for the
input and the output individually
that you could steal from.
And then it would be a
std move on that member,
but you'd have to sort of
make that member public,
and I don't know, it's gonna
be kind of a weird type.
- [Man] Thank you.
- Yeah.
- [Audience Member]
Actually, in the same vein
of moved-from types, I
guess you're saying that
the advice is to never touch
a moved-from object.
There are cases definitely with
the standard library objects
where you potentially could reuse them
with certain constraints.
Like if you build up a
vector that's a member,
and you, once it's built
up to a certain point,
you can move out those values
but then start building up
your vector fresh again,
as opposed to having a
unique pointer to it.
I mean, do you see the
standard keeping that
kind of generic advice or do you see
certain standard types providing slightly
stronger guarantees about what you can do
with moved-from objects?
- I mean, you will always be able to,
in the next talk we will talk
a lot about the precondition,
like preconditions expressed
on the APIs of a type.
And you will always be
able to call any function
that has no precondition
after it has been moved from.
Whether you should is an
entirely different story, right?
It is definitely well
understood that when you move
from a unique pointer now
it is definitely null,
and so you could call reset on it
and when you move from a vector
you don't know what's in
it anymore and we all-
- [Audience Member] And we
reassign to it or something.
- Right, you could assign to
it, you could call clear on it.
You could ask it its size, right?
But you should not make any assumptions
about data being there or not.
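A sketch contrasting the two moved-from guarantees just described: a moved-from `std::unique_ptr` is guaranteed to be null, while a moved-from `std::vector` is only valid but unspecified, so the code calls `clear()`, which has no preconditions, before reusing it. `MovedFromUsage` is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

inline void MovedFromUsage() {
  // unique_ptr specifies its moved-from state: it is guaranteed to be null,
  // so calling reset() on it afterwards is perfectly well defined.
  auto source = std::make_unique<int>(7);
  auto sink = std::move(source);
  assert(source == nullptr && *sink == 7);
  source.reset(new int(8));
  assert(*source == 8);

  // vector only promises "valid but unspecified" contents after a move, so
  // call something with no preconditions (here clear()) before relying on
  // what is in it.
  std::vector<int> batch = {1, 2, 3};
  std::vector<int> shipped = std::move(batch);
  batch.clear();  // now known to be empty
  batch.push_back(4);
  assert(shipped.size() == 3);
  assert(batch.size() == 1 && batch[0] == 4);
}
```

As the answer goes on to say, being well defined is not the same as being the clearest code; a fresh variable is often easier to read than a reused moved-from one.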
But practically speaking, the
likelihood of encountering
a scenario where the clearest
way to write your code
actually has you reusing that
zombie husk seems rare, right?
And you're probably better
off not causing the "wait, wha?"
of reusing it after a move.
Like, just don't poke that bear.
Like, yeah--
- [Man] I have a
personal example which
I'll talk about later.
- Yeah, I mean, yes, technically
speaking, it will work.
But there is a higher level
requirement on every one of us:
don't produce code that
makes your reader go "wha?"
'Cause confusion costs
more than CPU cycles.
All right?
Over here.
- [Man] Would you mind
showing the slide again
with the function
returning string_view?
Taking the string_view
and returning part of it.
Yeah, the one with like red and,
yeah, it's beautiful.
I'm afraid it's not very safe.
I mean you marked option four as good,
and I believe it's undefined behavior.
- No.
- [Man] Your argument is
destroyed--
- Not, no.
- [Man] Your string actually--
- Temporaries are destroyed
at semicolons, at the end of the full-expression.
By the time the string's
copy constructor runs,
actually by the time the string's, yes,
copy constructor, move constructor?
Copy constructor, by the time
the copy constructor runs
the temporary is still there
because the temporary doesn't
go away until the semicolon.
Like, I guarantee this is fine.
John.
- [John] Hey Titus.
- What's up?
- [John] So you were
talking in the beginning
about how overload sets should
define a group of functions
that are all semantically
basically the same.
- Yep.
- [John] And you were also
talking about five minutes ago about
how it's really important
for const to be meaningful
and especially in thread
safety situations,
and there are pretty
commonly used overload sets
like operator brackets is like this,
which often will overload on const-ness
even though giving a read only reference
and giving a mutable reference
are really, really different,
especially in a thread safety context,
but I don't think anyone in the room
would argue that that's a,
that like operator brackets
is somehow completely
a broken design like on
a vector or something.
So how would that fit into the
advice that you're giving us?
- Well so the advice is
like, at a very high level
the advice is it is
probably a good overload set
if you can have a single
comment for it, right?
And a comment for that
const non-const overload set
on vector is give me
the specified T, right?
Like give me that object,
maintaining as much const-ness
as you can if you wanna be
really wordy about it, right?
But like, that is a reasonable definition.
- [John] Awesome.
- Yeah.
- [John] Thanks.
- Yeah.
(audience member yelling)
And yes, and I will
pitch Geoff Romer's talk
on thread compatibility and
thread safety on Thursday.
That's actually in my script
in the next part of the talk as well,
so everyone that sees both of these
will get that pitch twice, but yes,
go to Romer's talk, it'll be great.
- [Man] I'm not gonna try to
start a debate, I suppose,
on edge cases, but I do
have some curiosities
regarding perhaps some
guidance you might offer
on access modifiers when
used with different types
of constructors, and more
importantly non-const references
when passed to functions.
How would you recommend
this mechanism as a tool to
prevent implicit conversion
from types? Particularly, in my example,
I suppose, const char* to standard strings,
but there have been plenty of times
when that has come up in other situations.
- I think that actually is
the third bullet point here,
make explicit any of your constructors
that aren't an obvious easy overload.
Like, I think knowing what we know today,
we probably would have made
the const char* constructor
for string explicit so that you can spot
the fact that oh that
is an expensive copy.
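A sketch of an explicit converting constructor on a hypothetical `Label` type, next to `std::string`'s implicit `const char*` constructor. The `static_assert`s use `std::is_convertible` to show the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical type whose const char* constructor copies the characters.
// Marking it explicit makes that copy visible at every call site.
class Label {
 public:
  explicit Label(const char* text) : text_(text) {}
  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

inline std::size_t LabelLength(const Label& label) {
  return label.text().size();
}

// explicit blocks the implicit const char* -> Label conversion...
static_assert(!std::is_convertible<const char*, Label>::value,
              "Label requires an explicit conversion");
// ...whereas std::string's const char* constructor is implicit today.
static_assert(std::is_convertible<const char*, std::string>::value,
              "std::string converts implicitly");

inline void LabelUsage() {
  // LabelLength("hello");  // does not compile: no implicit conversion
  assert(LabelLength(Label("hello")) == 5);  // the copy is spelled out
}
```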
- [Man] Sure, you would
encounter the same scenario
with assignment operations as well
when you're not dealing with a constructor
at that particular point,
but you'd still end up having
to encounter implicit conversion
for the type provided.
- I think I lost you, sorry.
- [Man] I may just be
blowing hot air I suppose.
- No, like, it is, code is very hard,
and it is much easier with examples
instead of verbally, so
come find me afterwards
if you wanna talk.
Yeah, I just can't quite do that one live.
Yeah.
- [Man] You had a slide about taking
data as sinks, and
about taking it by value
versus ref versus Rvalue ref,
and the guidance was based on sort of
relatively complicated
evaluation of whether
one operation's gonna be
more expensive than another.
Is there a fundamental
reason why that's a decision
that I have to be making
as an API designer
and the compiler can't decide for me?
- The,
in this language the compiler
can't decide for you.
We have too much legacy stuff,
like we can't change these behaviors.
I think in theory it is the sort of thing
that might be amenable to optimization,
or to automation, but that
would be a mad science project
first off in order to figure
that out a little bit,
because among other things, like,
it's going to change wildly
if you take a mutex.
It's going to change wildly
if you call an RPC, right?
Like when you're sinking a thing,
like you need to know
the cost of those things,
and not every line of code
is equivalently costly,
and trying to teach the compiler, like,
which of these things is expensive?
That would be a neat trick.
So like in the presence of magic,
yes in theory that would be cool.
And until then it's gonna
be a little complicated,
and I don't know.
Use your best judgment.
It's hard.
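For reference, a sketch of the two usual ways to write a sink parameter. `Record`, `SetName`, and `SetTitle` are hypothetical. The comments give nominal copy and move counts, which are the input to the judgment call described above; they say nothing about whether a copy or a move is actually expensive for a given type.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Record {
 public:
  // (a) Sink by value, then move into place. One function.
  //     Lvalue argument: one copy + one move. Rvalue argument: at most
  //     two moves (a prvalue can initialize the parameter directly).
  void SetName(std::string name) { name_ = std::move(name); }

  // (b) Overload on const& and &&. Two functions to maintain.
  //     Lvalue argument: one copy. Rvalue argument: one move.
  void SetTitle(const std::string& title) { title_ = title; }
  void SetTitle(std::string&& title) { title_ = std::move(title); }

  const std::string& name() const { return name_; }
  const std::string& title() const { return title_; }

 private:
  std::string name_;
  std::string title_;
};

inline void RecordUsage() {
  Record record;
  std::string ada = "Ada";
  record.SetName(ada);                   // copies from ada
  record.SetName(std::string("Grace"));  // moves the temporary's contents in
  record.SetTitle(ada);                  // copies from ada
  record.SetTitle(std::string("Dr."));   // moves from the temporary
  assert(record.name() == "Grace");
  assert(record.title() == "Dr.");
  assert(ada == "Ada");                  // lvalue arguments are untouched
}
```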
- [Man] Hello, in a very early slide
you had really the general idea
of whether an overload set,
you know, if you're taking in a std string
and then it's a lightweight wrapper
around something that
goes to a const char,
pointer to a const char,
that was a good thing.
And then I think at a later point,
if I understood you correctly,
you started saying that
when you identified people
using const references
to std unique pointers, in
code reviews you see that
as like an issue.
- Oh yeah.
- [Man] Is it kind of
an issue if it's just
a nice wrapper around passing a pointer
down to something?
- It's not a wrapper
around a pointer, right?
- [Man] No, no, no, I
mean like when you make a,
add something to your overload set
just to make it easier
for people already using
unique pointers to pass
down the raw pointer?
- I,
no because the operation to
actually extract a raw pointer
from a unique pointer is a one,
whereas if you only had
a const char* overload,
well, no.
Yeah, now I see your point.
There is a logical inconsistency there.
I think it is that it is very common
for us in legacy code bases
to have char*'s floating around
and strings floating around,
and like it's nice if
you don't have to know
which one it is and
which one to care about.
Whereas, passing a unique
pointer by reference,
especially by const reference,
is just fundamentally a little silly
because you're saying
I can only invoke this
if I already have ownership of the thing,
but I'm not transferring ownership, right?
It'd be like, that would be
an okay function by itself
if you had to prove
ownership of an object,
which is a weird semantic.
I guess strictly speaking,
if it is an overload
of T* and const unique pointer ref,
I guess strictly speaking
that might be okay, but I don't know.
That's a weird, like,
I feel like that's the wrong result,
but I think you might be right.
(audience laughing)
So, yeah, I don't know,
I'll have to think about it.
But interesting, yeah.
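A sketch of the contrast being discussed, using a hypothetical `Widget`. A `const std::unique_ptr<Widget>&` parameter demands that the caller own the object through a `unique_ptr` without transferring anything, while a plain `const Widget*` parameter accepts any caller.

```cpp
#include <cassert>
#include <memory>

struct Widget {  // hypothetical
  int id = 0;
};

// Awkward: the caller must already own the Widget through a unique_ptr,
// yet no ownership is transferred.
inline int IdFromOwner(const std::unique_ptr<Widget>& w) {
  return w ? w->id : -1;
}

// Usually clearer: a non-owning, possibly null pointer. Works for objects
// owned by unique_ptr (via get()), by shared_ptr, or living on the stack.
inline int Id(const Widget* w) { return w ? w->id : -1; }

inline void IdUsage() {
  auto owned = std::make_unique<Widget>();
  owned->id = 7;
  Widget on_stack;
  on_stack.id = 9;
  assert(IdFromOwner(owned) == 7);
  assert(Id(owned.get()) == 7);
  assert(Id(&on_stack) == 9);  // IdFromOwner cannot accept this object
  assert(Id(nullptr) == -1);
}
```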
We are strictly out of
time, but I will take Eric.
- [Eric] Hi.
- What's up?
- [Eric] Well Titus, I
think I heard you suggest
that you recommend
explicit on constructors
of more than one argument.
- If those constructors
aren't logically the thing.
Like, for any constructor
that is accepting
a bag of parameters from
which you can construct,
as opposed to the parameters it has
are platonically like the same notion
as what you are constructing.
So maybe in a ranges world, two
iterators is a range, right?
But in a vector, a T and a
size is not actually a vector.
Did I head you off?
- [Eric] No I mean it's
a question I've had,
because I mean, C++ has
this language feature
and I've never known what to do with it.
- Yeah, I think by default
we should be tagging
all the constructors explicit
until you think about it,
and we have the default
wrong, as is often the case.
But yeah, like I really think
that explicit should be way more common.
There's a tip of the week on that.
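A sketch of the distinction drawn in this answer, with hypothetical types. `IntRange` keeps an implicit two-argument constructor because a pair of pointers is logically a range. `Buffer`'s count-and-fill constructor is `explicit` because those arguments are a recipe for a buffer, not a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: a pair of pointers *is* a range, so the two-argument
// constructor stays implicit and {first, last} reads naturally.
struct IntRange {
  IntRange(const int* f, const int* l) : first(f), last(l) {}
  std::size_t size() const { return static_cast<std::size_t>(last - first); }
  const int* first;
  const int* last;
};

// Hypothetical: a count and a fill value describe how to build a buffer,
// but are not themselves a buffer, so the constructor is explicit.
class Buffer {
 public:
  explicit Buffer(std::size_t count, int fill) : data_(count, fill) {}
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
};

inline std::size_t RangeSize(IntRange r) { return r.size(); }

inline void ExplicitUsage() {
  const int data[] = {1, 2, 3};
  assert(RangeSize({data, data + 3}) == 3);  // implicit {first, last} is fine
  Buffer buffer(4, 0);                       // the type must be named
  // Buffer copy = {4, 0};                   // does not compile: explicit
  assert(buffer.size() == 4);
}
```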
- [Eric] Okay, thanks.
- So yep and we're outta time.
Thank you all very much and--
(audience clapping)
