- Yeah, I was really inspired
today by Herb's talk,
so I thought I'd write something
about Dangling in French and English.
And of course, I am Canadian,
and as you all know Canada is bilingual.
So I thought that I would
present some code in French.
So here is a little bit of code in French,
and you can see that there's a...
This is...
By the way, this is all about
a really, really subtle bug
and some of you may get
there way before I do,
but please bear with me.
So there's a vector of std::strings there,
and of course the first thing that we do,
is push back some French.
So "Est ce une chaine?"
Which is, if you don't speak French,
this translates, well
at least Google tells me
it translates to 'is this a string?'
(laughter)
And then we grab a reference
to that first string,
and then we push back "hello", "world",
and then we...
std::cout the first string.
And of course when you run this,
and you can run this, it outputs...
I can't speak French, sorry.
I was supposed to learn it in high school,
but it says that this is a string.
Someone out there must speak French.
So, because we are bilingual,
I thought I'd do this in English as well.
And so I said, instead of
pushing back the French,
I would push back the "Is it a string?"
And then I would do the exact same thing,
and take the first,
and then push back "hello", "world".
And of course, everyone here knows,
that when you actually run this code,
the English version of it is
segmentation fault, core dump.
So, what's exactly going on here?
Well, uh...
(quacking) Quoi?
So, if we take a look at
the lengths of the strings,
the length of the French string is 18,
because French is much more
complicated than English.
The length of the English
is only 15 characters.
And so, if we look at std::string,
there's a small string optimization,
and so if the length is less than 16,
the data gets stored in the small string.
This is just an
approximation of std::string.
Otherwise, it's stored in data.
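A rough sketch of that approximation, in C++. The names `ToyString`, `kSmallCapacity`, and `is_small` are made up for illustration, and the layout is an assumption; real std::string implementations differ in both layout and threshold. It only shows the rule just described: lengths under 16 live in the buffer inside the object, anything longer lives on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy approximation of a string with a small string optimization.
// This is NOT the real std::string layout; the 16-byte threshold follows the talk.
class ToyString {
public:
    explicit ToyString(const char* s) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        if (size_ < kSmallCapacity) {
            // Short: the characters (plus '\0') live inside the object itself.
            std::memcpy(small_, s, size_ + 1);
        } else {
            // Long: the characters live on the heap, the object holds a pointer.
            data_ = new char[size_ + 1];
            std::memcpy(data_, s, size_ + 1);
        }
    }
    ~ToyString() { delete[] data_; }

    // Copying is disabled to keep the sketch short.
    ToyString(const ToyString&) = delete;
    ToyString& operator=(const ToyString&) = delete;

    bool is_small() const { return data_ == nullptr; }
    const char* c_str() const { return is_small() ? small_ : data_; }
    std::size_t size() const { return size_; }

private:
    static constexpr std::size_t kSmallCapacity = 16;
    std::size_t size_ = 0;
    char* data_ = nullptr;            // used only when the string is long
    char small_[kSmallCapacity] = {}; // used only when the string is short
};
```

With this toy type, the 18-character French string takes the heap branch and the 15-character English string takes the in-object branch.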
So, if we look at the French,
we store the string on the heap,
and we store a pointer to
that data in the string.
If it's English, it gets
stored in the small string.
So when we look here,
we've got a std::vector
and we store the English in a string,
and then we push back on the vector.
And so when you push back on the vector,
of course that will grow the
vector and move the string.
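The growth step can be observed on its own. The helper below is a hypothetical sketch (the name `push_back_relocates` is not from the talk): it records the address of the vector's storage as an integer before and after a push_back and reports whether the storage moved. It assumes `s` is not itself an element of `v`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if pushing `s` made the vector move its elements to new storage.
// Addresses are kept as integers, so no stale pointer is ever used.
// Precondition (assumed): `s` does not refer to an element of `v`.
bool push_back_relocates(std::vector<std::string>& v, const std::string& s) {
    const std::size_t old_size = v.size();
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    v.push_back(s);
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    assert(v.size() == old_size + 1);
    return before != after;
}
```

When the vector is already full, the push_back has to allocate new storage and the strings are moved across; when capacity was reserved up front, nothing moves.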
And so, in French,
that blue box
is the std::string and it points to the
string on the heap, and then that
red box down there is char *first,
which is pointing to the same string,
which you get out from c_str().
And when you grow the vector,
that first string disappears
and we get a new string
and it calls std::move
and it moves it over.
Or if you've got something that's
trivially relocatable,
I still can't say it.
It'll just memcpy it over.
And the pointer *first
still points to the same place.
But of course, if you are
pointing to an English string,
it stores it in the string
itself as the small pointer.
Small data.
And then when you realloc,
that *first is still pointing
to that stale memory,
and you end up with a crash.
So it's a really subtle bug.
It took us a long time to
figure out what was going on.
Because long strings worked
and short strings didn't.
And that's the reason why.
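The difference between the two cases can be checked without ever reading through a dangling pointer. The sketch below is a reconstruction under assumptions: the slide code is not in the transcript, and `chars_address` and `first_stays_put` are made-up names. It follows the steps described in the talk, but stores the address from c_str() as an integer (standing in for `char *first`) and only compares it afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Address of the characters of v[0], kept as an integer so it can be compared
// later without dereferencing a possibly stale pointer.
std::uintptr_t chars_address(const std::vector<std::string>& v) {
    assert(!v.empty());
    return reinterpret_cast<std::uintptr_t>(v[0].c_str());
}

// Mirrors the steps in the talk. Returns true if the characters of the first
// string are still at the same address after the two extra push_backs.
bool first_stays_put(const std::string& question) {
    std::vector<std::string> v;
    v.push_back(question);
    const std::uintptr_t first = chars_address(v);  // stands in for `char *first`
    v.push_back("hello");  // may grow the vector and move v[0]
    v.push_back("world");
    return first == chars_address(v);
}
```

The result depends on the standard library. On an implementation with the under-16 threshold described in the talk, the French string would be expected to stay put (its heap buffer is handed over by the move) and the English string would not (its characters live inside the string object that was moved). A library with a larger small buffer could move both. Either way, reading `v[0].c_str()` again after the last push_back, or keeping the index 0 instead of a pointer, avoids holding a stale address.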
So, merci.
(applause)
