- Good morning, and welcome to my talk on Avoiding Disasters with Strongly Typed C++. Let's start by putting this up. In other words, the later we find bugs, the more expensive and harder they are to fix.
Who am I? My name is Arno Lepisk. I live in Stockholm, Sweden, where I work as a software engineering consultant for a company named HiQ. I've been working with C++ for a little more than 12 years, in fields such as defense, datacom, telecom, video games, and industrial applications.
When do we want bugs to be found? One could argue never, because we never write bugs, right? Has anyone never written a bug? Great. Sure, the comment was that you don't write bugs; they just appear.
What if we find bugs in production? That's expensive, right? If you build embedded systems, you may have to do recalls, or at least send out patches. It affects your customer relations and so on. In extreme cases... have you heard of the Mars Climate Orbiter? That was a disaster in production: the probe was lost because one piece of software produced values in imperial units while another expected metric units.
Okay, we can find bugs in quality assurance. Not good. In system testing: if you have automated system tests, even better. In unit tests: does everyone use unit tests? But what if we could find bugs already at compile time, so that they never even make it into a running program?
Today I'm going to talk about a special kind of bug: the kind that comes from mixing up types and using types in the wrong way. Type safety. What's type safety? One definition is that a language is type safe if it prevents type errors, meaning undesirable behavior due to type incompatibilities.
So, is C++ type safe? Let's look at some examples. Here's a first one. We have a function that sets the name of a node identified by an int. If we call it and mix up the arguments, passing a string where the function expects an int or vice versa, we get a compile error. Great.
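A minimal sketch of this first case. The name and signature `set_node_name(int, const std::string&)` are assumptions standing in for the slide; the `static_assert`s use `std::is_invocable_v` to check which argument orders compile.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the slide's function: name the node identified by node_id.
void set_node_name(int node_id, const std::string& name) {
    (void)node_id;  // a real implementation would store the name somewhere
    (void)name;
}

// The intended argument order compiles...
static_assert(std::is_invocable_v<decltype(&set_node_name), int, std::string>);
// ...and swapping the arguments is rejected: std::string does not convert to int.
static_assert(!std::is_invocable_v<decltype(&set_node_name), std::string, int>);
```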
Here we have another function that takes two ints. Disaster. Even though our parameters have descriptive names, such as size, the compiler doesn't care; it just sees an int. Who has made mistakes like this?
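To make the two-int problem concrete, here is a sketch with a hypothetical `set_node_size(int node_id, int size)`. It returns what it was asked to do, so the asserts can show that the swapped call is accepted without complaint.

```cpp
#include <type_traits>

// Hypothetical function: give the node identified by node_id a new size.
// Both parameters are plain ints, so their names are documentation only.
struct NodeResize {
    int node_id;
    int size;
};

constexpr NodeResize set_node_size(int node_id, int size) {
    return NodeResize{node_id, size};
}

// We meant "node 7 gets size 42", but swapped the arguments.
constexpr NodeResize oops = set_node_size(42, 7);

// It compiles without a warning; node 42 is the one that gets resized.
static_assert(oops.node_id == 42 && oops.size == 7);
static_assert(std::is_invocable_v<decltype(&set_node_size), int, int>);
```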
Okay, what can we do? The first thought: let's use a typedef or a using alias. Does that work? No. Why? Because a typedef only creates an alias for the type. The compiler still sees int, even though we call it by a different name.
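A quick check of why aliases don't help, again with hypothetical names: `NodeId` and `NodeSize` are both just `int` to the compiler, so a swapped call still compiles.

```cpp
#include <type_traits>

// A typedef or using-alias only introduces a new name for an existing type.
typedef int NodeId;
using NodeSize = int;

static_assert(std::is_same_v<NodeId, int>);
static_assert(std::is_same_v<NodeId, NodeSize>);  // both are just int

// Hypothetical function spelled with the aliases.
constexpr int set_node_size(NodeId node_id, NodeSize size) { return node_id - size; }

constexpr NodeId some_id = 42;
constexpr NodeSize some_size = 7;

// Passing a NodeSize where a NodeId is expected still compiles.
static_assert(set_node_size(some_size, some_id) == -35);
```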
Let's put our values into structs. Now they are distinct types, so the compiler gives us an error if we mix them up. Great. We're there.
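A sketch of the struct approach with the same hypothetical names. These structs are aggregates, so they are built with braces, and neither swapped arguments nor bare ints are accepted.

```cpp
#include <type_traits>

// Wrapping each value in its own struct gives the compiler two distinct types.
struct NodeId   { int value; };
struct NodeSize { int value; };

// Hypothetical function taking the wrapped types.
constexpr int set_node_size(NodeId, NodeSize size) { return size.value; }

static_assert(!std::is_same_v<NodeId, NodeSize>);
// The correct order compiles; aggregates need brace initialization.
static_assert(set_node_size(NodeId{7}, NodeSize{42}) == 42);
// Swapped arguments, or bare ints, no longer compile.
static_assert(!std::is_invocable_v<decltype(&set_node_size), NodeSize, NodeId>);
static_assert(!std::is_invocable_v<decltype(&set_node_size), int, int>);
```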
As you see here, these are structs, so we have to use curly-brace initialization. We can fix that by adding a constructor. This is very much slide code; it actually compiles, but if you do this for real you should add more to make it behave sanely.
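The same wrappers with constructors added. Making the constructor `explicit` is our own choice here, not something stated in the talk: it keeps `NodeId(7)` working while still rejecting a silent conversion from `int`. Like the slide version, this is deliberately minimal; a production wrapper would also want comparisons, hashing and so on.

```cpp
#include <type_traits>

// Constructors allow NodeId(7) as well as NodeId{7}.
// `explicit` keeps a bare int from converting silently.
struct NodeId {
    constexpr explicit NodeId(int v) : value(v) {}
    int value;
};
struct NodeSize {
    constexpr explicit NodeSize(int v) : value(v) {}
    int value;
};

constexpr int set_node_size(NodeId, NodeSize size) { return size.value; }

static_assert(set_node_size(NodeId(7), NodeSize(42)) == 42);
static_assert(std::is_constructible_v<NodeId, int>);
static_assert(!std::is_convertible_v<int, NodeId>);
```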
Simon Brown gave an excellent talk yesterday about type wrappers; you should check it out.
Okay. But now we have constructors, maybe some other member functions, and it quickly becomes quite a lot to write if we have to do this for every single type we use in a program.
Enter templates. If you look at this template code, you see that we have a typename Tag that isn't actually used anywhere in the type; that's why we call it a tag. We can make distinct types simply by using distinct tags. This creates two types that are distinct, but identical in every other respect.
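A sketch of the tag technique. `strong_type`, `NodeIdTag` and `NodeSizeTag` are illustrative names; each tag is declared inline in the template argument list, a common shorthand.

```cpp
#include <type_traits>

// Tag is never used inside the struct; it only makes each instantiation distinct.
template <typename T, typename Tag>
struct strong_type {
    constexpr explicit strong_type(T v) : value(v) {}
    T value;
};

// Declaring the tag struct directly in the template argument list.
using NodeId   = strong_type<int, struct NodeIdTag>;
using NodeSize = strong_type<int, struct NodeSizeTag>;

static_assert(!std::is_same_v<NodeId, NodeSize>);
static_assert(!std::is_convertible_v<NodeId, NodeSize>);
static_assert(NodeId(7).value == 7);
```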
Now, what about performance? We put it into Compiler Explorer. We have two functions, one that takes an ordinary int and one that takes a wrapped int, and the generated assembly is the same, provided that you turn on some optimization and use a reasonably recent compiler.
And here's another example, where we take a strong int and pass it on to a legacy function that also takes an int. Unwrapping it costs nothing, because a struct that only contains an int has the same representation as an int. Great.
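A sketch of the two functions and the legacy forwarding described above, with hypothetical names. The generated code itself has to be compared in Compiler Explorer; what can be checked in code is that the wrapper adds no size and stays trivially copyable, which is what typically lets mainstream ABIs pass it in a register just like an int.

```cpp
#include <type_traits>

template <typename T, typename Tag>
struct strong_type {
    constexpr explicit strong_type(T v) : value(v) {}
    T value;
};
using Count = strong_type<int, struct CountTag>;

// Two functions to compare in Compiler Explorer (e.g. with -O2).
int twice_plain(int x) { return x * 2; }
int twice_strong(Count x) { return x.value * 2; }

// A hypothetical legacy API that only knows about int.
int legacy_process(int x) { return x + 1; }

// Forwarding a strong int to it is just an unwrap.
int process(Count c) { return legacy_process(c.value); }

// No extra storage, and trivially copyable.
static_assert(sizeof(Count) == sizeof(int));
static_assert(std::is_trivially_copyable_v<Count>);
```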
But say that we actually want something that behaves like an int, that we can add and subtract. So let's create a new template called arithmetic strong int. Why don't we just extend the first one? Because sometimes you don't want int-like behavior. How many have done raw socket programming on a Unix or Linux system? What represents a socket? It's an int. What's reasonable to do with that int? You pass it to the socket functions, yes. But adding to it? For that you don't want arithmetic. So only the arithmetic version gets the operators.
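A sketch of the split between a plain strong type and an arithmetic one. The names `strong_type` and `arithmetic_strong_type`, and the small `supports_plus` trait used for the checks, are our own illustrations.

```cpp
#include <type_traits>
#include <utility>

// Plain strong type: construction and access only. Good for handles.
template <typename T, typename Tag>
struct strong_type {
    constexpr explicit strong_type(T v) : value(v) {}
    T value;
};

// Arithmetic variant: the same idea, plus operators that make sense for numbers.
template <typename T, typename Tag>
struct arithmetic_strong_type {
    constexpr explicit arithmetic_strong_type(T v) : value(v) {}
    T value;

    friend constexpr arithmetic_strong_type operator+(arithmetic_strong_type a, arithmetic_strong_type b) {
        return arithmetic_strong_type(a.value + b.value);
    }
    friend constexpr arithmetic_strong_type operator-(arithmetic_strong_type a, arithmetic_strong_type b) {
        return arithmetic_strong_type(a.value - b.value);
    }
    friend constexpr bool operator==(arithmetic_strong_type a, arithmetic_strong_type b) {
        return a.value == b.value;
    }
};

// A POSIX socket descriptor is an int, but arithmetic on it is meaningless.
using SocketHandle = strong_type<int, struct SocketTag>;
using Amount       = arithmetic_strong_type<int, struct AmountTag>;

// Helper trait: true if `a + b` compiles for two Ts.
template <typename T, typename = void>
struct supports_plus : std::false_type {};
template <typename T>
struct supports_plus<T, std::void_t<decltype(std::declval<T>() + std::declval<T>())>>
    : std::true_type {};

static_assert(supports_plus<Amount>::value);
static_assert(!supports_plus<SocketHandle>::value);
static_assert(Amount(2) + Amount(3) == Amount(5));
```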
But this quickly becomes rather a lot of writing. Someone must have thought about this before. A quick Google search for strong typedefs turns up quite a lot of results. I picked two, Jonathan Boccara's NamedType and Jonathan Müller's type_safe library, so let's see how we can use those two libraries to do what we want.
They work in slightly different ways. The NamedType library works like the one I showed earlier: you use a tag. So here we create an int and give it a skill called Comparable, which adds the comparison operators to the type. And the same for type_safe, which uses somewhat different constructs. There you create a new struct and inherit from a base type for all strong types, called strong_typedef, and you use CRTP, the curiously recurring template pattern, to pass your own type's name down into the library. Then you need a using-declaration to make the constructors available. Crystal clear?
So, let's make an int again. For the NamedType library, again, we say we want an int, we use the tag MyIntTag, and we specify which operations we want: Printable, which makes it possible to write the value to an ostream, plus Addable, Subtractable, Multiplicable and Comparable. There was no division, for some reason, in the version I checked out; we'll come back to that. For type_safe we inherit first from the strong_typedef base, and then from output_operator, integer_arithmetic, equality_comparison and relational_comparison. Either way we get something that more or less behaves like an int for normal arithmetic. No (mumbles) operations and such.
Okay, and how do we use it? Defined this way, the two versions can be used interchangeably. We can create two ints with values and add them, and then use static_assert to check that we get the same type back. And we can output them. But if we try to add an ordinary int, as in the commented-out code, we get a compile error. That's what we wanted, right?
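Trying the two libraries requires installing them, so here instead is a small self-contained sketch of the NamedType-style design: a wrapper that inherits its operators from a list of CRTP "skill" templates. All names here (`named_type`, `addable`, `comparable`, `printable`, `can_add`) are our own simplified stand-ins, not either library's actual API.

```cpp
#include <ostream>
#include <type_traits>
#include <utility>

// Each skill is a CRTP base that contributes operators to the final type Self.
template <typename Self>
struct addable {
    friend constexpr Self operator+(Self const& a, Self const& b) { return Self(a.get() + b.get()); }
};
template <typename Self>
struct subtractable {
    friend constexpr Self operator-(Self const& a, Self const& b) { return Self(a.get() - b.get()); }
};
template <typename Self>
struct comparable {
    friend constexpr bool operator==(Self const& a, Self const& b) { return a.get() == b.get(); }
    friend constexpr bool operator<(Self const& a, Self const& b) { return a.get() < b.get(); }
};
template <typename Self>
struct printable {
    friend std::ostream& operator<<(std::ostream& os, Self const& x) { return os << x.get(); }
};

// The wrapper inherits from every requested skill, instantiated with itself.
template <typename T, typename Tag, template <typename> class... Skills>
class named_type : public Skills<named_type<T, Tag, Skills...>>... {
public:
    constexpr explicit named_type(T v) : value_(v) {}
    constexpr T const& get() const { return value_; }
private:
    T value_;
};

using MyInt = named_type<int, struct MyIntTag, printable, addable, subtractable, comparable>;

// Detects whether `a + b` compiles, to check the cases that should fail.
template <typename A, typename B, typename = void>
struct can_add : std::false_type {};
template <typename A, typename B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

static_assert(std::is_same_v<decltype(MyInt(1) + MyInt(2)), MyInt>);
static_assert(MyInt(1) + MyInt(2) == MyInt(3));
static_assert(!can_add<MyInt, int>::value);  // adding a plain int does not compile
```

With the `printable` skill, `std::cout << (MyInt(40) + MyInt(2))` prints 42 (given `<iostream>`).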
Sometimes we need to get the underlying value out, and here the two libraries differ. NamedType has a member function called get, while type_safe has a free function get that you use to get the value out. This becomes a little bit important later, when we make adaptations.
Okay, and sometimes we need to combine types. We just showed that we can add two of these, but sometimes we want different types to interact. Say we want to calculate a price from a base price and an amount: then we need to multiply the two together and get a new price. How do we do this? Well, first, a table of which operations we would like. We can add two prices and get a price, or subtract two prices and get a new price; the same goes for amounts. Multiplying a price by a price doesn't make sense, right? Skip that. The same with multiplying an amount by an amount. But a price times an amount, or an amount times a price, should give a price.
So, how do we accomplish that? In type_safe it's already there: we add another skill, mixed multiplication, where we say that a price can be mixed with an amount and the result is a new price. For NamedType there actually isn't anything like this built in, but it's easy (for some value of easy) to add. So we add a MixedMultiplicable skill. All the skills in this library are implemented like this.
Here we have added an outer template. Reading it from the inside out: we add an operator* that takes self and the other value, whose type is the outer template parameter, gets the underlying values, multiplies them together and returns a T, where T is the type we're defining. We have to do this twice, because we want the operator to work both ways, like price times amount and amount times price. There's also an added helper function called getValue: if you pass it a NamedType it uses .get() to get the value, and if you pass it a plain value it just returns that value. We'll come back to why that's important soon. And now we use the skill like this: MixedMultiplicable<Amount>::type, because the inner struct is called type.
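A sketch of the same idea on top of a simplified NamedType-style wrapper (our own stand-in, not the library's code). `mixed_multiplicable<Amount>::type` is the skill, and `get_value` plays the role of the getValue helper: it unwraps a wrapped value and passes a plain one through.

```cpp
#include <type_traits>
#include <utility>

// Simplified NamedType-style wrapper (our own sketch, not the library).
template <typename T, typename Tag, template <typename> class... Skills>
class named_type : public Skills<named_type<T, Tag, Skills...>>... {
public:
    constexpr explicit named_type(T v) : value_(v) {}
    constexpr T const& get() const { return value_; }
private:
    T value_;
};

// get_value: unwrap a named_type, pass any other value through unchanged.
template <typename T, typename Tag, template <typename> class... Skills>
constexpr T const& get_value(named_type<T, Tag, Skills...> const& x) { return x.get(); }
template <typename T>
constexpr T const& get_value(T const& x) { return x; }

template <typename Self>
struct comparable {
    friend constexpr bool operator==(Self const& a, Self const& b) { return a.get() == b.get(); }
};

// The outer template carries the "other" type; the inner `type` is the actual skill.
template <typename Other>
struct mixed_multiplicable {
    template <typename Self>
    struct type {
        // Self * Other -> Self
        friend constexpr Self operator*(Self const& a, Other const& b) {
            return Self(get_value(a) * get_value(b));
        }
        // Other * Self -> Self
        friend constexpr Self operator*(Other const& b, Self const& a) {
            return Self(get_value(b) * get_value(a));
        }
    };
};

using Amount = named_type<int, struct AmountTag, comparable>;
using Price  = named_type<int, struct PriceTag, comparable, mixed_multiplicable<Amount>::type>;

// Detects whether `a * b` compiles.
template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};
template <typename A, typename B>
struct can_multiply<A, B, std::void_t<decltype(std::declval<A>() * std::declval<B>())>>
    : std::true_type {};

static_assert(Price(3) * Amount(4) == Price(12));
static_assert(Amount(4) * Price(3) == Price(12));
static_assert(!can_multiply<Price, Price>::value);
static_assert(!can_multiply<Amount, Amount>::value);
```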
Next we have an example of positions and offsets. You have a position, and something else a bit further away, and between them there's an offset. One place this shows up in standard C++ is pointers and pointer differences, for example. The tricky part in this case is that we have to take two positions, subtract them from each other, and produce another type, the offset.
How do we do that? For NamedType, which we looked at a moment ago, it's about the same method. We have a struct SubtractToType. I didn't come up with a better name; there are two hard problems in computer engineering: cache invalidation, naming, and off-by-one errors. SubtractToType. So here we subtract the two underlying values, put the result into the enclosing type, and we're done. Then we use it in a manner very similar to the way we did before. Here I've also added MixedAddable, in a similar way.
All this code will be available for you to look at. Here you can also see that I used MixedMultiplicable with a basic type, so you can multiply offsets by plain ints.
In type_safe it's about the same method; only the implementation details differ. I've cut away the noexcept specifications because they're just the same thing repeated once again. And here we can use it again: SubtractToType, on the next-to-last line. It should have been highlighted.
Okay, and then we can use it like this. We create a position, and then we can create an offset by subtracting another position from that position. And if we try to add two positions, we get a compile error.
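A sketch of positions and offsets with the same simplified wrapper (our own stand-in, not either library). `subtract_to_type<Offset>` turns `Position - Position` into an `Offset`, `mixed_addable<Offset>` allows `Position + Offset`, and `mixed_multiplicable<int>` lets offsets be scaled by plain ints; all three names are illustrative.

```cpp
#include <type_traits>

// Simplified NamedType-style wrapper (our own sketch, not the library).
template <typename T, typename Tag, template <typename> class... Skills>
class named_type : public Skills<named_type<T, Tag, Skills...>>... {
public:
    constexpr explicit named_type(T v) : value_(v) {}
    constexpr T const& get() const { return value_; }
private:
    T value_;
};

template <typename T, typename Tag, template <typename> class... Skills>
constexpr T const& get_value(named_type<T, Tag, Skills...> const& x) { return x.get(); }
template <typename T>
constexpr T const& get_value(T const& x) { return x; }

template <typename Self>
struct comparable {
    friend constexpr bool operator==(Self const& a, Self const& b) { return a.get() == b.get(); }
};

// Self - Self -> Result (position - position -> offset)
template <typename Result>
struct subtract_to_type {
    template <typename Self>
    struct type {
        friend constexpr Result operator-(Self const& a, Self const& b) { return Result(a.get() - b.get()); }
    };
};

// Self + Other -> Self, in both orders (position + offset -> position)
template <typename Other>
struct mixed_addable {
    template <typename Self>
    struct type {
        friend constexpr Self operator+(Self const& a, Other const& b) { return Self(get_value(a) + get_value(b)); }
        friend constexpr Self operator+(Other const& b, Self const& a) { return Self(get_value(b) + get_value(a)); }
    };
};

// Self * Other -> Self, in both orders (Other may be a plain int)
template <typename Other>
struct mixed_multiplicable {
    template <typename Self>
    struct type {
        friend constexpr Self operator*(Self const& a, Other const& b) { return Self(get_value(a) * get_value(b)); }
        friend constexpr Self operator*(Other const& b, Self const& a) { return Self(get_value(b) * get_value(a)); }
    };
};

using Offset   = named_type<int, struct OffsetTag, comparable, mixed_multiplicable<int>::type>;
using Position = named_type<int, struct PositionTag, comparable,
                            subtract_to_type<Offset>::type, mixed_addable<Offset>::type>;

static_assert(Position(10) - Position(4) == Offset(6));
static_assert(Position(4) + Offset(6) == Position(10));
static_assert(Offset(3) * 2 == Offset(6));
// Position(1) + Position(2);  // does not compile: no operator+ for two positions
```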
When I sat down and developed this, after a while it got very tedious to keep track of which expressions were supposed to compile and which weren't. So I wrote a helper that checks whether there is an operator+ between, for example, a position and a position, so that we can use static_assert on it. It's here; I'm not going to go through it. Look at the video later or check out my GitHub. The links will come.
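The talk's own helper isn't reproduced in the transcript. One common way to write such a check is the `std::void_t` detection idiom; the sketch below shows it with hypothetical `Position` and `Offset` types.

```cpp
#include <type_traits>
#include <utility>

// has_plus<A, B> is true when `a + b` compiles for an A a and a B b.
template <typename A, typename B, typename = void>
struct has_plus : std::false_type {};

template <typename A, typename B>
struct has_plus<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

template <typename A, typename B>
inline constexpr bool has_plus_v = has_plus<A, B>::value;

// Hypothetical types to exercise the trait.
struct Offset { int value; };
struct Position {
    int value;
    friend constexpr Position operator+(Position p, Offset o) { return {p.value + o.value}; }
};

static_assert(has_plus_v<int, int>);
static_assert(has_plus_v<Position, Offset>);
static_assert(!has_plus_v<Position, Position>);  // the "should not compile" case
static_assert(!has_plus_v<Offset, Position>);
```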
Now, length. This is what actually started my interest in this, because my educational background, at least, isn't really computer science: it's engineering physics. There, when we calculate things by hand, the standard method for checking results is dimensional analysis, and from that we get (mumbles).
Okay, what's wrong with this method? What do we mean by length? Meters? What?
- [Audience Member] Units.
- Units was the comment, yes. Precisely: what unit is the length in? Meters, millimeters, feet, yards? We could create a type like we did before, say meter_t. But then maybe someone else is more accustomed to counting in feet, and you have to start writing conversion operations between them. Then someone wants millimeters, and you have to add that too, and so on, and so on. The number of conversion operations grows quadratically with the number of units.
But let's think again: what do we already have? In the C++ standard we have the Chrono library, which does roughly this, but for time. You can create seconds and milliseconds, add them together, and get correct results. In this example I create one variable in seconds and then convert it to milliseconds; the ratio between them is 1,000.
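The chrono behavior described above, as a short compile-time check:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Converting seconds to milliseconds is lossless, so it is implicit.
constexpr std::chrono::seconds secs{5};
constexpr std::chrono::milliseconds millis = secs;
static_assert(millis.count() == 5000);

// Mixed units add correctly; the result is expressed in the finer unit.
constexpr auto total = 1s + 500ms;
static_assert(total.count() == 1500);

// Going the other way can lose precision, so it needs an explicit cast (truncates).
static_assert(std::chrono::duration_cast<std::chrono::seconds>(total).count() == 1);
```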
So, let's just hijack the Chrono library: define one meter as one second, and so on. No. No! That would take away all the type safety, because then we could add seconds to a length, or a time to a length, and that just doesn't make any sense; that's how bad things happen. But we can look at how the duration class is implemented and maybe reuse the ideas.
This is the gist of how std::chrono::duration is defined. You have seconds, which here uses int64. The standard doesn't actually say it has to be int64; it only sets a lower bound on how many bits the representation must have (at least 35 bits for seconds). The two implementations I looked at both use int64. And then we have milliseconds, which uses something called milli. What's that? std::ratio. How many of you have used std::ratio? Good. For everyone else: it's a standardized way to do rational arithmetic at compile time. milli is the ratio 1/1000, one thousandth. std::ratio also reduces fractions automatically, and the library defines arithmetic operations on such fractions.
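A few compile-time checks illustrating the std::ratio facts above:

```cpp
#include <ratio>

// std::milli is just std::ratio<1, 1000>.
static_assert(std::milli::num == 1 && std::milli::den == 1000);

// Fractions are reduced automatically: 2/4 is stored as 1/2.
static_assert(std::ratio<2, 4>::num == 1 && std::ratio<2, 4>::den == 2);

// Compile-time arithmetic: 1/1000 * 1000/1 == 1/1.
using one = std::ratio_multiply<std::milli, std::kilo>;
static_assert(one::num == 1 && one::den == 1);

// kilo / milli == mega (10^6).
static_assert(std::ratio_equal_v<std::ratio_divide<std::kilo, std::milli>, std::mega>);
```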
So, how can we use this? Let's start our own length class. We let the template decide which underlying type we want (ints, doubles, vectors or whatever) and then we add a scale, which defaults to one. And then there are lots and lots of member functions in there. Then we can make some definitions. We say that a meter is the base type and a millimeter is one thousandth of that, using the standard-provided prefixes milli, kilo and so on, and we can also define inch, foot and mile. If you work it out, one inch is 25.4 millimeters, which as a ratio of whole numbers is 254/10,000 m (127/5,000 once reduced), so it's an exact representation of an inch in meters.
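A minimal sketch of such a length class, keeping only a constructor and `count()` from the "lots and lots" of members. The names `length`, `meters`, `feet` and so on are our own; `Scale::type` is used so that the stored scale is always the reduced ratio.

```cpp
#include <ratio>

// Rep is the count's type; Scale is the size of one unit in meters.
template <typename Rep, typename Scale = std::ratio<1>>
class length {
public:
    using rep = Rep;
    using scale = typename Scale::type;  // always the reduced std::ratio
    constexpr explicit length(Rep count) : count_(count) {}
    constexpr Rep count() const { return count_; }
private:
    Rep count_;
};

using meters      = length<int>;
using millimeters = length<int, std::milli>;
using kilometers  = length<int, std::kilo>;
using inches      = length<int, std::ratio<254, 10000>>;     // 25.4 mm, exactly
using feet        = length<int, std::ratio<3048, 10000>>;    // 12 in = 0.3048 m
using miles       = length<int, std::ratio<1609344, 1000>>;  // 1609.344 m

static_assert(inches::scale::num == 127 && inches::scale::den == 5000);
static_assert(feet::scale::num == 381 && feet::scale::den == 1250);
```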
And then we can add operations. We can add lengths together, and when the ratio is the same, as in this example for plus, it's very easy, because one meter plus one meter is still meters. What happens if you have different underlying types? What type should the result be, if you take one meter in ints and add one meter in doubles?
(audience member mumbling)
One (mumbles)... no, but what happens if you take a built-in int and a built-in double and add them together? Precisely, the int is promoted to double. And how can we get the same behavior? Use decltype. So this will be a meter of double.
(audience member mumbling)
Yeah, the comment was that this can be type-unsafe. In a way, yes; on the other hand, it behaves like we expect it to. Or at least as I expect: if I add an int to a double, I expect a double.
(audience member mumbling)
It will be promoted to whatever an int plus a (mumbles) becomes, thanks to decltype.
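A sketch of same-scale addition with the result type picked by `decltype`, as described above. It is a standalone version of the `length` sketch and covers only the case where both scales are identical.

```cpp
#include <ratio>
#include <type_traits>

template <typename Rep, typename Scale = std::ratio<1>>
class length {
public:
    using rep = Rep;
    using scale = typename Scale::type;
    constexpr explicit length(Rep count) : count_(count) {}
    constexpr Rep count() const { return count_; }
private:
    Rep count_;
};

// Same scale on both sides: just add the counts. The result's representation is
// whatever Rep1 + Rep2 gives for the underlying types, so int + double -> double.
template <typename Rep1, typename Rep2, typename Scale>
constexpr auto operator+(length<Rep1, Scale> a, length<Rep2, Scale> b) {
    using result_rep = decltype(a.count() + b.count());
    return length<result_rep, Scale>(a.count() + b.count());
}

constexpr auto sum = length<int>(1) + length<double>(0.5);
static_assert(std::is_same_v<decltype(sum.count()), double>);
static_assert(sum.count() == 1.5);
```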
Okay, if you have different ratios, it gets a bit more complicated, because we have to bring the values to some kind of common scale. If you remember from school how you add fractions: you have to bring them to a common denominator in order to add them. Here the common scale is the greatest common divisor of the numerators over the least common multiple of the denominators, and that's this formula. Then we can add two meters to three feet, and we get a very strange value, because a foot is expressed as 381/1,250 meters, and the common ratio between the two is 1/1,250. So if we output this, we get this strange value. But we can convert it back to something we actually know how to comprehend. The reason I use these slightly strange values is that the result is exact: five inches plus eight centimeters is exactly 207 millimeters, and that's very convenient if you want to do the checks with static_assert.
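A sketch of addition across different scales and of converting back, using the same common-scale rule that std::chrono uses for durations (gcd of the numerators over lcm of the denominators). `length_cast` is a hypothetical name modeled on `std::chrono::duration_cast`; like that function, it truncates.

```cpp
#include <numeric>
#include <ratio>

template <typename Rep, typename Scale = std::ratio<1>>
class length {
public:
    using rep = Rep;
    using scale = typename Scale::type;
    constexpr explicit length(Rep count) : count_(count) {}
    constexpr Rep count() const { return count_; }
private:
    Rep count_;
};

// Common scale of N1/D1 and N2/D2: gcd(N1, N2) / lcm(D1, D2).
// Both original scales are whole multiples of it.
template <typename S1, typename S2>
using common_scale = std::ratio<std::gcd(S1::num, S2::num), std::lcm(S1::den, S2::den)>;

template <typename Rep1, typename S1, typename Rep2, typename S2>
constexpr auto operator+(length<Rep1, S1> a, length<Rep2, S2> b) {
    using S = common_scale<S1, S2>;
    using R = decltype(a.count() + b.count());
    using fa = std::ratio_divide<S1, S>;  // units of S per unit of S1
    using fb = std::ratio_divide<S2, S>;  // units of S per unit of S2
    static_assert(fa::den == 1 && fb::den == 1);
    return length<R, S>(static_cast<R>(a.count() * fa::num + b.count() * fb::num));
}

// Convert to another scale, truncating like std::chrono::duration_cast.
template <typename To, typename Rep, typename Scale>
constexpr To length_cast(length<Rep, Scale> from) {
    using factor = std::ratio_divide<Scale, typename To::scale>;
    return To(static_cast<typename To::rep>(from.count() * factor::num / factor::den));
}

using meters      = length<int>;
using millimeters = length<int, std::milli>;
using centimeters = length<int, std::centi>;
using inches      = length<int, std::ratio<254, 10000>>;
using feet        = length<int, std::ratio<3048, 10000>>;

// 2 m + 3 ft = 3643 units of 1/1250 m (2.9144 m): exact, but hard to read...
constexpr auto strange = meters(2) + feet(3);
static_assert(strange.count() == 3643);
// ...while 5 in + 8 cm converts back to exactly 207 mm.
static_assert(length_cast<millimeters>(inches(5) + centimeters(8)).count() == 207);
```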
We can add some syntactic sugar to this with user-defined literals, and then we can write things like this.
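The transcript doesn't spell out the literals, so the suffixes `_m` and `_mm` below are assumptions; the code shows the general shape of user-defined literals for the `length` sketch.

```cpp
#include <ratio>
#include <type_traits>

template <typename Rep, typename Scale = std::ratio<1>>
class length {
public:
    using rep = Rep;
    using scale = typename Scale::type;
    constexpr explicit length(Rep count) : count_(count) {}
    constexpr Rep count() const { return count_; }
private:
    Rep count_;
};

using meters      = length<long long>;
using millimeters = length<long long, std::milli>;

// Integer literal operators take unsigned long long.
// Suffixes without a leading underscore are reserved for the standard library.
constexpr meters operator""_m(unsigned long long v) {
    return meters(static_cast<long long>(v));
}
constexpr millimeters operator""_mm(unsigned long long v) {
    return millimeters(static_cast<long long>(v));
}

constexpr auto width = 3_m;
constexpr auto gap   = 250_mm;
static_assert(std::is_same_v<decltype(3_m), meters>);
static_assert(width.count() == 3 && gap.count() == 250);
```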
Next thing: dimensions. I showed addition; subtraction is the same. But we might want to multiply two lengths together to get an area, multiply by a length again to get a volume, or even divide a length by a length to get a scalar, some kind of scaling between them. How do we do that? Well, we add a dimension int to our length. Addition between these new lengths is still the same, but the dimension has to match, because it doesn't make any sense to add a length to a volume. If we multiply two lengths, though, we get a length to the second power, which is what the second part of this slide does. At the same time, the scales are multiplied together using the ratio multiplication I discussed before. So then we can do something like this: add some fancy output operators, and multiply. Three feet times four feet becomes 12 square feet, or about 1.11 square meters. All transparent.
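A sketch of adding a dimension parameter to the `length` sketch. Multiplication adds the dimensions and multiplies the scales with `std::ratio_multiply`; division subtracts them. To keep the example short, addition is only defined here for equal dimension and equal scale.

```cpp
#include <ratio>

// Dim is the power of length: 1 = length, 2 = area, 3 = volume, 0 = plain scalar.
template <typename Rep, typename Scale = std::ratio<1>, int Dim = 1>
class length {
public:
    using rep = Rep;
    using scale = typename Scale::type;
    static constexpr int dim = Dim;
    constexpr explicit length(Rep count) : count_(count) {}
    constexpr Rep count() const { return count_; }
private:
    Rep count_;
};

// Addition only exists for matching dimension (and, in this sketch, matching scale).
template <typename Rep, typename Scale, int Dim>
constexpr length<Rep, Scale, Dim> operator+(length<Rep, Scale, Dim> a, length<Rep, Scale, Dim> b) {
    return length<Rep, Scale, Dim>(a.count() + b.count());
}

// Multiplication adds the dimensions and multiplies the scales.
template <typename R1, typename S1, int D1, typename R2, typename S2, int D2>
constexpr auto operator*(length<R1, S1, D1> a, length<R2, S2, D2> b) {
    using rep = decltype(a.count() * b.count());
    return length<rep, std::ratio_multiply<S1, S2>, D1 + D2>(a.count() * b.count());
}

// Division subtracts the dimensions and divides the scales.
template <typename R1, typename S1, int D1, typename R2, typename S2, int D2>
constexpr auto operator/(length<R1, S1, D1> a, length<R2, S2, D2> b) {
    using rep = decltype(a.count() / b.count());
    return length<rep, std::ratio_divide<S1, S2>, D1 - D2>(a.count() / b.count());
}

using feet = length<int, std::ratio<3048, 10000>>;

// 3 ft * 4 ft = 12 square feet; one square foot is (381/1250)^2 m^2.
constexpr auto area = feet(3) * feet(4);
static_assert(area.count() == 12);
static_assert(decltype(feet(3) * feet(4))::dim == 2);
static_assert(decltype(feet(3) * feet(4))::scale::num == 145161);
static_assert(decltype(feet(3) * feet(4))::scale::den == 1562500);
```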
Next step: physical quantities. In the same way that we can multiply length by length to get an area, we can divide length by time to get a velocity, or velocity by time to get an acceleration, and so on. How do we do that? We expand our length, which I now call unit instead, with three numbers for the dimensions, and we can do the same things as with length. Now we can define our types. Length is a unit of int with length to the first power, and mass and time to the zeroth power. For area we take the second power, and for velocity, time gets a negative power. We can also add constants with this little trick. Oh, there's a line missing: there's a templated struct declared somewhere before this, and here we specialize it for ints and for doubles, and then we can pick out g, the acceleration due to gravity. And then we can use it like this to calculate the velocity.
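A sketch of the three-exponent unit type, left without scales so it stays short. The names `unit`, `constants` and `time_span`, and the rounded integer value of g, are our own assumptions, not the talk's code.

```cpp
#include <type_traits>

// Exponents of the SI base dimensions length (L), mass (M) and time (T).
// All values are in SI base units; scales are left out of this sketch.
template <typename Rep, int L, int M, int T>
struct unit {
    constexpr explicit unit(Rep v) : value(v) {}
    Rep value;
};

// Only identical dimensions can be added.
template <typename Rep, int L, int M, int T>
constexpr unit<Rep, L, M, T> operator+(unit<Rep, L, M, T> a, unit<Rep, L, M, T> b) {
    return unit<Rep, L, M, T>(a.value + b.value);
}
// Multiplying adds exponents; dividing subtracts them.
template <typename R1, int L1, int M1, int T1, typename R2, int L2, int M2, int T2>
constexpr auto operator*(unit<R1, L1, M1, T1> a, unit<R2, L2, M2, T2> b) {
    return unit<decltype(a.value * b.value), L1 + L2, M1 + M2, T1 + T2>(a.value * b.value);
}
template <typename R1, int L1, int M1, int T1, typename R2, int L2, int M2, int T2>
constexpr auto operator/(unit<R1, L1, M1, T1> a, unit<R2, L2, M2, T2> b) {
    return unit<decltype(a.value / b.value), L1 - L2, M1 - M2, T1 - T2>(a.value / b.value);
}

template <typename Rep> using length       = unit<Rep, 1, 0, 0>;
template <typename Rep> using mass         = unit<Rep, 0, 1, 0>;
template <typename Rep> using time_span    = unit<Rep, 0, 0, 1>;
template <typename Rep> using area         = unit<Rep, 2, 0, 0>;
template <typename Rep> using velocity     = unit<Rep, 1, 0, -1>;
template <typename Rep> using acceleration = unit<Rep, 1, 0, -2>;

// Constants: a primary template, specialized per representation.
template <typename Rep> struct constants;
template <> struct constants<double> {
    static constexpr acceleration<double> g{9.81};  // m/s^2
};
template <> struct constants<int> {
    static constexpr acceleration<int> g{10};  // rounded: an int cannot hold 9.81
};

// Velocity after falling for 3 s.
constexpr auto v = constants<int>::g * time_span<int>(3);
static_assert(std::is_same_v<std::remove_const_t<decltype(v)>, velocity<int>>);
static_assert(v.value == 30);
```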
And then there are other units. If you think about physical units, we have current for electricity, and temperature. Temperature is tricky, because zero is at a different point for different temperature scales, so you have an offset. And now things are getting complicated again. What can be done?
Boost.Units. How many have used Boost.Units? Here we calculate power like this: take a voltage times a current. Very nice. And you can do the same thing we did before with length. It's a little bit more writing, but with Boost.Units you can also make your own units. One example: if you remember my price example, you often have prices expressed as price per volume, price per length, or price per weight, and there you could make your own types like that. The difference with Boost.Units is that it doesn't do all this automatic conversion. You don't need a cast, but if you want to convert feet to a length, you have to explicitly construct a quantity of length from the number of feet.
Here are some links to the code I've shown. And since I, like many programmers, like to reinvent the wheel, I did my own safe-type implementation as well. It's there, in stype.
Some conclusions. I think that by using these strong types you can make your programs safer without any performance hit. If you do find a performance hit, you could always use the strong types in debug builds, define them away, and use the basic types in release builds. But that shouldn't be necessary.
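If you do want that fallback, a sketch of the switch might look like this; all names are hypothetical. Every translation unit, and every library it links with, must make the same choice, otherwise the program violates the One Definition Rule.

```cpp
#include <type_traits>

template <typename T, typename Tag>
struct strong_type {
    constexpr explicit strong_type(T v) : value(v) {}
    T value;
};

// Strong type in debug builds, plain int when NDEBUG is defined.
// This choice must be identical in every translation unit of the program.
#ifdef NDEBUG
using NodeId = int;
#else
using NodeId = strong_type<int, struct NodeIdTag>;
#endif

// Code written against explicit construction and one unwrap helper compiles either way.
constexpr int raw(int v) { return v; }
template <typename T, typename Tag>
constexpr T raw(strong_type<T, Tag> v) { return v.value; }

constexpr NodeId make_node_id(int v) { return NodeId(v); }
```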
So please try them; you might avoid some disasters.
Thank you.
(audience applauding)
Any questions? Please come to the microphone.
- [Audience Member] Yeah, I was just curious whether there are any issues, like ABI issues, between debug and release builds on Windows. I know Linux doesn't usually have those problems, but--
- I'm not familiar with Windows, but if you define the types differently for debug builds and release builds, then yes, there will be ABI issues.
- [Audience Member] Sure.
- Because you get ODR violations if you mix release and debug builds.
- [Audience Member] Can you address overflow of an int, and that kind of thing?
- Yeah, this doesn't handle that, per se. If you want that, you would need some kind of safe integer implementation; I think it's called safe (mumbles), or something like that. I know there's another talk about that later this week. So this only handles conversion errors; it doesn't handle overflow. That's the same as you would have had with the plain types.
Anyone else? Okay, thank you for coming.
(audience applauding)
