- [Presenter] Hi everybody,
I'm Nathan Sidwell.
I work at Facebook and
I'm here to talk to you about modules.
When I titled this talk I called it
Progress with C++ Modules
and I was thinking I
would talk to you about
how I've been implementing it in GCC
and where I've got to.
And also how the
committee has modified the
proposal as time has gone on.
But then somebody else read it and said,
"oh, you're gonna tell
me how I can progress my
own code with C++ Modules".
So, hopefully at the end
of this talk you'll see
how modules could help with your code
and the sort of timeline
we're aiming for to get this
stuff into the standard.
A friend of mine was recently
in Christchurch (mumbles)
I was sad to discover
that the container mall
is now no longer a thing that if,
they've moved on from
their earthquake recovery,
which is good so,
there we go.
Anyway, so I thought
I'd show you some code,
because that's how I tend
to understand things,
by looking at actual code
and what it might mean.
And here's a bit of a module,
or modules,
would like to import some
modules that already exist,
import Foo and import Bar,
and this is regular,
non-modular code itself,
it's not part of a module
because I haven't got
a module declaration there.
Compare that with a hash include:
if I had a hash include of foo,
I'd then see some foo
definitions and stuff.
Here I just have import the two modules
and I can use some function from Foo,
and I can use some type from Bar,
just as if they were
declarations in the text
that I've already seen.
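A rough sketch of what that importing code might look like (the module names Foo and Bar are from the example; the function and type names here are invented for illustration):

```cpp
// Regular, non-modular code: no module declaration of its own.
import Foo;  // a module that already exists
import Bar;  // another existing module

int main()
{
    // Use a hypothetical type exported from Bar and a hypothetical
    // function exported from Foo, just as if we'd already seen
    // their declarations in the text.
    BarWidget w;
    return foo_function(w);
}
```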
But as we'll get into it,
there's reasons why we
prefer to do it this way.
Well this way should have some advantages.
One thing that it shows is that
modules are orthogonal to namespaces.
You can have multiple
modules all adding symbols
to the same set of namespaces.
And you kind of need
that if you think about
how you might modularize the STL
you wouldn't have one
massive module for the STL,
you might have pieces of
the STL as separate modules.
Unlike with header files,
if I then try and call some function
that would have been exposed
in a header file of Foo,
if this was a header file, it may
have had to make that visible,
because that's how C++ header files work.
With modules, this would
be an internal function.
It wouldn't be visible to
importers of the module.
They would get a compile time error.
So A, you're not polluted,
you can't accidentally
use implementation details
the module author didn't
want to make visible.
And similarly, if I declare a function
or a type or whatever,
that happens to have the same name
as an internal feature of a
module, I won't get a collision.
These are gonna be different things.
This declaration here,
is not the same thing as
an identically named thing
inside module Foo.
So again we got a nice interface
with a demarked boundary.
So what's this look like
for the actual declaration of the module?
So the concept here is you
got, a single module interface
that describes the things
that importers of that module can see
and zero or more implementation units
that provide the
implementation for the module.
And here's an interface unit.
You start by saying, export module Foo.
So this is part of module Foo
and it's the interface,
it's got the export keyword.
The interface file is the only place where
the export keyword will appear.
And I'm exporting
a function called frob,
text in, returns an
int, not very exciting.
Does something.
This function frob will
be visible to importers
of the interface.
They import Foo and they will see frob.
They will not see this declaration here
of accumulator and
similarly they won't see,
you wouldn't expect 'em to see
something with internal
linkage, so they won't see that.
This thing here, this accumulator,
has got a new kinda
linkage, module linkage.
It's visible inside the module itself,
so all pieces of Foo can
see that accumulator,
but nothing else can.
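Putting those three kinds of declarations together, here's a sketch of the interface unit being described (frob and accumulator are from the talk; the exact signatures are illustrative):

```cpp
// foo-interface.cc: the single interface unit of module Foo.
export module Foo;  // 'export' marks this as the interface unit

// Exported: visible to anything that does 'import Foo;'.
export int frob(const char *text);

// Module linkage: visible to every unit of module Foo,
// but not to importers of the module.
int accumulator;

// Internal linkage: visible only in this translation unit,
// not even to other units of module Foo.
static int unused;
```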
Now an implementation unit for this,
we will have a definition of frob.
So that says it's part of module Foo,
but it doesn't have the export keyword.
It's an implementation unit.
And that piece of code,
that declaration there,
means it's then seeing all the
pieces of the interface unit
that have module linkage
or export linkage.
So this bit will not see this
unused static declaration,
but it will see the declaration of frob.
I don't have the export
keyword in front of it;
I just provide the
definition of this thing.
I don't have to have a
declaration of the accumulator.
That's automatically seen
because I've pulled it in
by the module declaration statement.
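An implementation unit to match might look like this sketch (the function body is invented; the point is that frob and accumulator are visible here without re-declaring them):

```cpp
// foo-impl.cc: an implementation unit of module Foo.
module Foo;  // no 'export' keyword: implementation, not interface

// frob's declaration and the module-linkage accumulator are
// already visible, pulled in by the module declaration above.
int frob(const char *text)
{
    while (*text)
        accumulator += *text++;  // illustrative body
    return accumulator;
}
```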
That's all very nice
and I can have as many
implementation units as I feel like
for my module Foo.
I can only have one interface unit.
But I can do better than that.
In this case, this is
a very simple module,
but I've got two source files here.
Don't need to do it that way.
I can do it all in one file.
And I actually did a bit of it in one file
in the original example, but
you may not have noticed:
this accumulator
was already a definition of
a module linkage variable.
But now I've put in
the body of frob itself
in the interface file.
I can't do that with header files.
If I did that with a header file,
I'd end up with multiple definitions
of accumulator and frob
in all the places that hash included it,
and then I've got an ODR violation.
So there's no reason why you'd
have to have split interface
and implementation units,
you can just have one file
with all the bits in it,
and explicitly mark
where things are exported,
at a single seam.
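So the single-file version, sketched under the same assumptions as before:

```cpp
// foo.cc: interface and implementation in one file.
export module Foo;

int accumulator;  // module-linkage definition, right in the interface

// Unlike a header file, defining frob here doesn't risk multiple
// definitions in importers: there is exactly one module Foo.
export int frob(const char *text)
{
    while (*text)
        accumulator += *text++;  // illustrative body
    return accumulator;
}
```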
That's all nice.
Now, I've sort of covered
some of the reasons
for why you'd want modules.
But actually, when I was talking to
other co-workers in Facebook
and I was saying, well why modules?
What they were most interested
in was the build speed.
What really bites in building
large software systems
is the time taken to
pre-process and parse all the header files
that don't often change.
If you've ever looked at,
if you've ever had the fun
of debugging a compiler,
generally I take a pre-processed output
and figure out what's going on
and I get several hundred
thousand lines of header file
followed by 50 lines
of code that exploded.
So you can see that there's quite a
large amount of code that's
continually being re-parsed,
and the idea of the modules is that
that time will go away.
Clearly better encapsulation
stuff is all good stuff.
But the thing that got people interested
was how to improve build speeds.
So if modules are to improve build speeds,
we want to not have to re-parse
the module interface units
every time they're imported.
So the main idea here is that
compilation systems,
when compiling modules,
take your interface source file
and produce an object file in the usual way
that an object file would be generated.
Remember the interface
may have code in it,
unlike a header file.
But they also produce this
other artifact, the binary module interface,
which encodes the pieces
of the module interface
that importers need to know about.
So it's some kind of serialization
of the export declarations
and all the types and whatnot.
And then when you import a module,
what happens is the user source
gets parsed by the compiler.
It'll then have to go find
this binary module interface
in ways that I may get onto,
read that state in, and
then carry on compiling.
So the idea is that
the time taken to read this
binary module interface
into the compiler,
has gotta be less than
the time it would take
to reparse the interface
unit in the first place.
Otherwise you're kind of
slowing compilation down.
Now, I've shown this as
a compiler squirting out
two things together,
during a single compilation.
Different compilers may
take different choices
and actually have the
binary module interface
as some kind of stage of compilation
in the same way you get from source code
to pre-processed code to
assembly code to object code
in that kind of compilation system.
That doesn't really change the details.
So that's all nice.
If you get modules working, hooray.
But there's a problem.
We don't have modules
now, we have header files.
And header files have
no kind of ownership of
the things that they declare,
whereas modules have
some kind of ownership idea
behind them, and they make
pieces visible and not visible.
So there's a problem
with how do we get to,
how do we add modules incrementally
to the code bases that we currently have
and still use header files for the things
that haven't been converted yet?
So here's a module interface unit.
We have module Foo and
wanna export something
that uses std::string.
So somewhere I've gotta put
hash include of std::string.
And I can't put it after
this module declaration
'cause if I did that, it's
textually included in here,
and everything inside
it would suddenly
get ownership of module Foo
and be a different set of
stuff to somebody else's
hash include of std::string.
So the TS had the concept
of a global module fragment,
where before you had the module
declaration statement itself
you had this bit of code,
that said module, this
is global module stuff
and you put all your old
legacy hash includes there.
And this piece of code,
this piece of the source,
has no module ownership
associated with it.
And different module interfaces
could include different
or the same stuff here
as they felt like.
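As a sketch, the TS-style global module fragment looks like this (assuming module Foo wants std::string in its interface; frob's signature is illustrative):

```cpp
// foo-interface.cc with a global module fragment.
module;             // begins the global module fragment

#include <string>   // legacy textual include: these declarations
                    // get no module ownership

export module Foo;  // the module purview starts here

// std::string is usable, and means the same thing as
// everybody else's hash include of <string>.
export std::string frob(std::string const &s);
```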
And when you include, when
you imported those modules
themselves into some other bit of code,
you would have to do some kind of merging
between declarations of the same type,
using the typing rules
and all that stuff that we know about,
to do the right things.
And each one, essentially
each one of these pieces
of global module fragment is
kind of like a different view
that may or may not overlap
with another interfaces view
of this unnamed global module thing.
You can think about all
code we currently write,
is currently in the global
module, if that helps.
Right.
Okay, so if we go back
to how we build modules
or build code with modules,
say I've got two modules here
that each import Bar and Baz,
and Bar and Baz themselves import Foo.
Unlike header files, with header files
compilation is embarrassingly
parallelizable.
We can just have as much
compute resource we like
and we can just compile everything,
without regard toward
ordering of the pieces
of the source base.
But here, because compiling an interface
emits this BMI, and we need the
BMI to import that interface,
we have to build the Foo interface
before we can build the Baz interface,
and we have to build that
before we build this.
So we're no longer, our
parallelism is constrained.
And you can see here, we've
got like a three-stage build
of our code.
And if you think about what's going on
in all of these things,
in just one of these interfaces,
we're gonna spend, that
interface is gonna have
a bit of global module fragment
and a set of declarations that
it's exporting and whatnot.
And so the time parsing this
is gonna spend time reading
the string header file,
bunch of other pieces of stl,
'cause string is probably not isolated.
It pulls in stuff,
some random old library and what not.
Before you know it, you've
parsed several hundred thousand
lines of code before
you've even got to here.
And if you actually look
at actual source bases,
more than 90% of the lines that go
or declarations that go into the compiler,
into the parser proper,
are in this piece of code,
which you'll have to compile
in every interface unit.
So you're repeating that compilation.
And as I said, when you're
importing these things
and they imported things that
had a global module fragment
that overlapped with something else,
you have a de-duplication problem
in the loading mechanism of the compiler,
which also may or may not be expensive.
One of the things that popped
out of last week's meetings
about modules is that we decided
one way of ameliorating that is that
only the declarations that are referenced
should be written into the BMI.
But that has yet to actually
get to a paper stage
and be at San Diego.
Anyway, thinking about this problem:
well, header files
are kind of modular-ish themselves.
If they're well behaved header files,
they're idempotent,
they don't declare things
that are declared in other header files
and all that kind of stuff,
you tend to compile your header files,
use your header files consistently
across your source base.
So wouldn't it be nice
if you could treat those header files
that you already have
as if they were modules?
Then your build graph
gets more complicated.
You know, here's the Foo,
Bar, and Baz, and x and y
modules that I had.
But I've kind of got
this sub-graph at the top
of the header files themselves.
Now okay, I've made this
like five steps deep
rather than three, isn't that worse?
Well the idea is that,
if the sizes of header files
are what I claim them to be,
the time taken
on each of these steps
is much, much less than
it was in the earlier case
with the global module fragment.
And thinking on this
is what led to the proposal
that turned up earlier this year in about
May for the Jacksonville meeting
if I'm remembering correctly,
though it may be March,
Another Take On Modules.
And that was from work done
at Google with clang modules
which have this kind of assumption
that header files are
sufficiently well behaved
to be able to treat them as
modules to a certain degree.
And Another Take On Modules
was an accidental acronym,
which is why it's called
the ATOM proposal.
It's a useful name to call
it, so that's what it is.
It had the concept of legacy header units,
which are sufficiently
well behaved header files.
And you have these things,
then I can say, not only
can I say import Foo,
I can say import and then
a thing that looks like
what you give to a hash include statement,
except of course it's not
a pre-processor statement.
It's got a semicolon on the end.
It is a module import
statement, with a funny name.
And both are allowed, with
quotes or angle brackets,
and there are interesting parsing rules,
because angle-bracket
header names
don't need to follow the
same rules as quoted strings.
They're implementation defined.
Anyway, the idea is that
that earlier example,
instead of hash including these two things
in a global module fragment,
I'd import them directly here
and then I can just go away and use types
or whatever that they export.
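So that earlier interface, rewritten as a sketch with ATOM-style legacy header imports (old-lib.h is the talk's illustrative legacy header; frob's signature is invented):

```cpp
// foo-interface.cc using legacy header units instead of includes.
export module Foo;

import <string>;     // <string> pre-built as a legacy header unit
import "old-lib.h";  // quoted form is also allowed; its macros
                     // become visible here (but aren't re-exported)

export std::string frob(std::string const &s);
```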
To generate the BMI from a
header file, like string,
you have a special compilation mode
and it's implementation defined
about exactly how you invoke
the compiler in this way.
Essentially everything in that header file
becomes exported from the
unnamed global module.
So there's no sort of module
ownership, again with this.
And it's blind to earlier hash defines.
So in this particular case here,
if old-lib, for instance,
had two different modes of compilation
selected by a hash define,
you setting that hash define
before you imported it
would make no difference
to the BMI that you got.
You control that kinda
stuff when you build the
old-lib.h as a legacy header.
And because they're legacy headers,
they're from header files,
and header files export macros
as part of their interface,
so the hash defines from
these legacy headers
also become visible in this code.
Okay, they're not exported from this code.
You can't re-export them
and there's certain rules
about making sure the macros
are consistent across compilations.
So that was one thing that
came out of the ATOM proposal.
Another thing that came out
was this idea of module partitions.
There's a difficulty with the TS,
with the idea of modules as
encapsulations is that...
Where we go, all right, yeah.
If I've got two mutually
dependent types or whatever,
the only way of dealing
with that in the TS,
was either to put them in the same module
or use a thing called proclaiming
ownership declarations.
And that kinda would
work, but it restricted
how you could organize your code.
But more importantly, if
you made them sub-modules,
different modules, and then you were
using proclaimed-ownership declarations,
and then you refactored your
code and moved things around,
you would start breaking
binary compatibility
with older versions of your library,
because the ownership of the module...
So these are not partitions,
'cause there's a dot there,
okay these are regular modules,
that just happen to have similar names.
If I move something from
part one to part two,
I've suddenly changed its module ownership
and that was reflected in name
mangling and all sorts of stuff.
And all of sudden, oh I can't
link with an older version of
myself or whatever.
So the idea behind
partitions is that they're,
they allow you to break a module up
into different translation units,
but all of those translation units
are part of the same module.
And here's an example of,
I'm exporting module Baz,
and consists of two partitions,
part one and part two
and I'm re-exporting both of those
from the module declaration.
And I can move things
between part one and part two
without breaking this ownership concept.
And here's how I would
declare a piece of part one,
I use a colon in the name.
And now you'll notice here the syntax,
I don't specify the module
whose partition I'm importing.
And that's important, because
module partitions are only
visible within the module.
An importer of the module,
one that is not part of the module,
will only see the stuff that the main,
the primary interface
unit makes available,
be that directly within it,
or by importing and
re-exporting partitions.
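A sketch of that shape (Baz, part1, and part2 are from the slide; the exported function is invented):

```cpp
// baz-interface.cc: the primary interface unit of module Baz.
export module Baz;
export import :part1;  // import partition part1 and re-export it
export import :part2;  // likewise for part2

// baz-part1.cc: declaring a piece of partition part1.
export module Baz:part1;        // the colon marks a partition name
export int something_useful();  // illustrative exported declaration
```

Note the partition imports name only `:part1`, not `Baz:part1` — you can't import another module's partitions, so the module name is implied.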
Okay, now part two of partitions
addresses another issue,
which is that I've got some type
that I want to make visible to users,
but I want to keep it opaque to them,
so I would declare an incomplete
type and export it, hooray.
But then within the module itself,
I might, probably want to actually
provide a definition for this type
so I can manipulate it and give
it state and what have you.
And again, the way of doing
that was with proclaimed...
Hang on, yeah, I was going
to get that confused.
Anyway, partitions allow you to do that,
because I can import, but
not re-export a partition
that provides an
implementation of my type.
So here's a partition
providing an implementation of my type.
I import it into the main module unit
so the other pieces of the module
can actually see it when necessary,
or rather, so they don't
each have to individually
import this implementation partition.
But importers of module
Baz, will just see it
as an incomplete type.
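Sketched, the opaque-type pattern looks roughly like this (Widget and its member are invented, and the precise rules here were still being settled at the time, so treat the details loosely):

```cpp
// baz-impl-part.cc: a partition providing the type's definition.
module Baz:impl;
class Widget
{
    int state;  // implementation detail of module Baz
};

// baz-interface.cc: the primary interface unit.
export module Baz;
import :impl;         // imported but NOT re-exported, so the
                      // definition stays inside module Baz
export class Widget;  // importers of Baz see an incomplete type
```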
And it essentially
allows some much simpler
export semantics rules
than what the TS specified,
and allows proclaimed-ownership
declarations,
which look like this, to go away.
Those were unpopular,
but they were the best
solution at the time.
Partitions allows a better solution.
All right, yes, so the third
piece of the ATOM proposal
was this concept of module preamble
and that's being modified a bit.
But in all the examples I've shown you,
I've shown the imports
right at the beginning,
but the TS doesn't specify
that's a requirement.
The imports can be scattered
throughout the source,
so long as they're all
in the global scope,
whereas ATOM required them all up front,
which had two important features.
One was there were no existing decls
for the declarations of
entities or what have you
for them to get tangled up with,
which makes implementation a bit easier.
But the other thing is once you've reached
the end of this preamble,
you know there are no further imports,
so you can sort of
finalize internal states
and some other stuff like that.
And also, if you're
trying to find the dependencies
of a piece of code, i.e. what
modules it's gonna import,
you only need to pre-process
it until you get a thing
that isn't an import.
Now turns out that's a
little bit too restrictive
for how to get to a module
world from where we are
and it is useful to have
imports at other points.
But nevertheless, the preamble
is still a useful concept.
Right, there we go.
So, oh and right, yes, the
legacy import hash defines
were another thing that
again got changed last week:
whether importing old-lib
would make its hash defines
immediately available
after this semicolon, or at
the end of the preamble,
and we've moved back to
having them available immediately
rather than at the end of the preamble.
The trouble with having it at
the end of the preamble is
there are some horribly
complicated corner cases
that, to get right,
require backtracking in the pre-processor.
And I implemented that by essentially
re-execing the compiler with
a flag saying stop here,
because that was the way
of getting it to work,
which is horribly inefficient.
So we have the TS,
which was published earlier this year,
which is essentially a snapshot in time.
I think of the TS as a moving document
that's updated as issues are resolved.
And we have the ATOM proposal
which was another paper
presented at the Jacksonville meeting.
So the situation we are at the moment,
is that some of the ATOM proposal
went into the working document of modules.
At the moment, that's
actually a diff of a diff,
because of reasons,
but we kind of need to put
it in one consistent document
that's then a diff against the standard.
So what's gone in, though,
is the concept of legacy header units,
partitions, and some of
the preamble for module
declarations themselves,
so if you've got a module declaration,
then you have a preamble.
But if you have regular
code just importing modules,
you don't necessarily need to
have it all in the preamble.
Some of the TS pieces got removed.
One was the proclaimed-ownership
declarations,
which I just mentioned,
and it had some reachability
semantic properties,
which were a way of
describing that if you have
an exported declaration, and then
you add more detail to it,
which properties are seen by importers
and which are seen
internally to the module.
And although that was well defined,
it had some downsides
in making code
refactoring quite brittle.
The ATOM rule is essentially
that the properties of a type
seen from a module interface unit
are the cumulative set of
properties at the end of
compiling that interface,
which is fairly similar to
the rules of type completion
that we have in C++ as it is now,
so that shouldn't be a surprise to people.
There are some pieces of the
document that aren't complete.
When I wrote these slides,
they weren't complete.
They are more complete now.
There will be papers in the future,
to actually nail them down.
One is to do with
argument dependent lookup
when instantiating templates
exported from a module
and there's a kind of parts
of instantiation concept
that you need particularly
for the legacy header cases
where you've not seen the type,
where the type that
you're instantiating on
is not at the definition of the template
and it's not at the initial
instantiation point,
but somewhere intermediate
in instantiation point.
And if ADL look up fails,
you will get a surprise
and that would be bad,
so that's why we're addressing that one.
One piece that went in
is Global Module Pruning.
If module interface units
hash include std::string
in their global module fragment,
do we really need to
write into their BMIs,
everything in the string header file,
which is an awful lot of stuff.
And the experiments done by Microsoft
show that that's like
several megabytes of stuff
and you may have used
like three bits of it.
If you do some pruning on that
and only write out stuff that got touched,
that several megabytes
reduces down to a few K.
So it's a good optimization there.
Simplification of the end of the preamble.
I mentioned that ATOM
rule required backtracking
in the pre-processor.
Let's not do that.
And inline partitions.
The example I showed of a type
that we wanted to make incomplete
to importers of the module
but complete within the module,
I needed two translation units,
which is a little
restrictive in source layout,
so it would be nice if we
could have a way of saying:
here's the module interface unit,
and then at some point in
the file saying, right, now
I'm writing implementation stuff.
Everything seen below here is not visible
to importers of this module.
Expect all this to be resolved
in the next couple of months.
Right, so how do you drive the compiler
that I'm working on?
Well, you have to go build
it yourself at the moment.
There's a wikipage of where it is.
It's on an upstream branch,
the source code's all there.
I work directly on the upstream branch.
If you're feeling confident
about building GCC, go for it.
Invoking the compiler,
you use the -fmodules-ts option.
That's the same option that clang has,
so it shouldn't be too much of a surprise.
So if this is a module interface unit
that does something
hello-y, you'll get a .o file
as you would in the normal way,
which you have to link in.
And then you get a binary module interface,
and that's the new module system file,
okay, so for anybody who was wondering,
that is the binary module interface.
And then compile the thing that imports that.
Again, I have to give it the modules flag.
If you don't give the modules-ts flag
you will get parse errors,
and I think you'll get a
warning from that compiler,
saying oh, by the way, you're using module
and it's gonna be a
keyword, so something's up.
That will read in the BMI,
and it emits main.o, which
you can link in the usual way
and off you go.
For ATOM, right, because
ATOM was an interesting experiment,
I also implemented pieces
of the ATOM proposal
under a separate flag.
Instead of saying modules-ts,
you say modules-atom and
you get the ATOM semantics.
You can't mix them, because
I was feeling paranoid
and didn't want to deal with
the headache of mixing them.
I expect over the next few months
the modules-atom flag will go away
as I merge more of the pieces
of ATOM into the main flag,
as that's becoming more well defined.
Oh, and legacy header mode.
So in legacy header mode,
you're taking your header file
and you compile it with -c.
This is a bit like building a PCH file,
where you'd say PCH;
here it's -fmodule-legacy,
I'm building a legacy header file,
and it's got some
various options on there.
Right, okay.
All right, I've got plenty
of time for questions.
I've got some spare slides as well.
Oh, right, yeah I may as
well go onto that one.
So, timeline: this is
kind of where we've got to.
Like I said, that workshop was last week.
It was very productive.
The San Diego meeting is essentially,
I think, where
a lot of this stuff will
hopefully be finalized,
so that we can go to publication next year,
in ways that Herb Sutter and others
understand much better than I do.
Okay, right, yes, questions.
- [Audience Member] So
we hate header files
because they are huge
and they're huge because
of templates largely.
- Yes.
- [Audience Member] So to my knowledge,
second-hand knowledge, it was attempted to
create a compiler that allows
you to export templates
without actually needing
the headers,
and from what I heard it was--
- Are you talking about the export,
exported template stuff of the C++ 98?
Right yes, so.
- No, now the question is--
- Yes.
- [Audience Member] Why is
this not a problem in modules,
or is it a problem, and
how do you import templates?
- It's the same problem.
The difficulty, I think,
with exporting templates,
is I know EDG, who implemented
it in their compiler,
ran into quite a few
interesting problem cases,
and certainly David and
John have been helpful in
advising on this work
and have been participating
in figuring out some of the semantics.
Like that path of instantiation thing
was a thing that came out of their work.
What may make it
less of a problem now,
is, like, the world's 64 bits,
you know, we have much
more memory and the like
than we had at, like,
the end of the '90s.
So the complexity is probably there
but the machines are
that much more advanced.
- [Audience Member] So
really the problem didn't change--
- Yeah, I mean,
I don't think it's a harder problem
than reparsing the source code.
'Cause reparsing the header
files like we do at the moment,
the compiler has gotta do
all the semantic checking,
because it has no idea
whether it's correct or not,
whereas in the module stuff,
is it can essentially slurp in that BMI.
Now one of the things
that all three compilers,
Clang, Microsoft, and GCC,
all do is lazy loading of the modules.
The import itself causes a
small bit of bookkeeping work,
but then it actually does the heavyweight
deserialization when you start looking
at things in the module.
And I think that's another way
that this is gonna improve build speeds
is the fact that you can now
do lazy loading of things.
(audience member mumbles)
- Okay.
- [Questioner] So there's
but one interface file
for a module, correct?
- Yes.
- [Questioner] And so for any
reasonably large scale project
presumably you'd have
a number of partitions
and then your one interface
would just call on those.
- Yes, either you would break it up into,
well I call them sub-modules,
because the dotted sequence of names,
looks like it's sort of
some kind of hierarchy,
but it is in fact,
that dotted sequence is
not giving you any...
It's not a true hierarchy.
You can give 'em any names
and then accumulate them
all into one module that,
one interface that imports and re-exports
all the sub-modules.
And those could either be partitions,
in which case you get
this freedom of movement
between things and it's
more tightly coupled
or it could be this set of
modules in their own right,
which you just choose to
expose as a bundle of stuff.
- [Questioner] And also, I assume that
inlining is preserved all through this.
- Yeah, certainly, you know,
if in the interface file
you exported an inline function,
it would be very stupid implementation
that didn't write the
body of that function
into the BMI so that it could be inlined.
But here's one of the interesting,
excitingly interesting
things you can do:
in a header file, functions
that return lambdas
are generally inline,
because that's the only
way you can write them
in a header file.
But you don't need to do that anymore.
You could write a non-inline function
that returned a lambda,
and then the lambda somehow
gets propagated into importers
in interesting, exciting ways.
Whether this is a thing
that people want to do,
I have no idea, but the
specification allows that to happen.
- [Questioner] Well I mean
I don't use the word inline anymore,
I just let the compiler figure it out.
- Yeah, certainly, you know,
clang uses the earlier picture,
where I said it was a stage of compilation.
So in their BMI they
essentially have the entirety
of the imported source.
I've done it as a separate,
on-the-side thing,
so I only write out the
stuff that is needed,
and rely on something like
link-time optimization
to do the heavyweight
optimizations later.
You know, implementation choice.
Okay.
- [Male] So you said at the beginning
that the compilation graph changes
because before it was all parallel
'cause you don't have any
dependencies on the headers
that you have to compile first.
- Yes.
- [Male] So now that we have modules,
or when we have modules, hopefully, yes,
how does the compiler know,
well, I guess the build system knows,
in which order it has
to compile the modules?
- Yeah, excellent question.
Everybody asks that
question, including me.
It's an open question.
Tooling is thinking about this.
The tooling subgroup has
started addressing this
and thinking of ways of doing it.
And Boris, who works on build2,
has been thinking about this.
One of the things that I
have done in the last year
is actually an interface
between the compiler
and something else that knows this stuff,
and thereby made the compiler agnostic
about what this stuff is.
And it will be great if that ends up
taking hold and allowing
people to experiment
with how you do this stuff.
Again, some of the rules
that we had in ATOM
made it very hard to do pre-processing,
because as you pre-process,
you want to pull in,
and if you import a legacy header
and you carry on pre-processing
you should really have the macros
from that legacy import,
so you have to kind of
load a BMI during the
pre-processing parse.
That sounds complicated.
And it becomes very hard to
find the graph in general,
the dependency graph in general up front.
You sort of find it during compilation
and then it's too late
unless you were doing something different.
So I expect build systems
to experiment in here,
and we'll see what happens, but yeah.
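Why the dependency graph is so hard to discover up front can be sketched like this; the header, macro, and module names here are all made up for illustration:

```cpp
// consumer.cpp -- hypothetical names throughout
import <config.h>;           // a legacy header turned into an import:
                             // its BMI must be loaded *during*
                             // preprocessing, because...
#if CONFIG_HAS_THREADS       // ...this macro came from that import,
import threading.support;    // ...and it decides what else is imported.
#endif
```

A purely textual dependency scan cannot see past the `#if` without actually loading the BMI for `<config.h>`, which is the chicken-and-egg problem he describes.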
- [Man] The last time I played
with clang's implementation
of the modules TS, or
maybe Darwin modules,
I don't know what it was exactly,
you had to, when you used
the module in some file,
you explicitly had to
pass the name of the file
for the BMI as well,
whereas in your slide it looked like GCC
was looking for the BMIs
that had already been compiled somewhere--
- Right, yes.
I did a sleight of hand there.
With clang modules, you
provide these module map files,
which are human-written files that
know this mapping,
and some human has figured out,
oh, these header files
can be legacy headers,
and clang modules sort
of do that by turning
hash includes into the equivalent
of import declarations,
so it's slightly different.
Clang's modules-TS mode is something else.
But I'm not sure of how that
is implemented at the moment.
Anyway, so...
The standard specifies
no particular mapping
between a module named Foo and
where you might find the BMI,
or what the BMI's called,
or anything like that.
So that kind of requires some
external agent to know that.
What I have in GCC is an
interface to a default thing
that provides a very simple mapping,
but there's this protocol
which is kind of experimental,
but interesting, and I haven't
written a paper on it yet,
so it's in the documentation,
but it's in flux,
where it can ask this kind of question.
It can ask this question of the thing:
I'm importing Bob, please
tell me where it is
or I'm exporting Bob, please
tell me where to write it to.
And that's kind of the level
of stuff that's going on there.
But again, as with the
other question I had,
it's an interesting open question
which I don't think we have,
I don't think anybody has
a complete answer to it.
'Cause there are other users of modules,
like you know static checkers
and analysis machines
that may not want BMIs.
They may want to look at the source again
or something like that.
It's not just compilers.
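The simple default mapping he mentions can be thought of as a flat module-name-to-BMI table handed to the compiler; the shape below is only illustrative (the exact syntax lives in the GCC documentation he refers to, and he says it is still in flux), and the file names are made up:

```
# hypothetical mapping file, consumed via something like
# -fmodule-mapper=FILE: one "module-name  BMI-path" pair per line
Foo   gcm.cache/Foo.gcm
Bar   gcm.cache/Bar.gcm
Bob   gcm.cache/Bob.gcm
```

The same protocol can instead talk to a live external agent (a build system, say), which answers "I'm importing Bob, where is it?" queries on demand rather than from a static table.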
- Yeah, okay, thanks.
- Yeah.
- [Audience Member] Can
you go back to the slide
where you have the compiler invocations?
- Yeah.
That one do ya?
- The previous one.
- Not the--
- That one.
- [Audience Member] So if you
look at the way this works,
this is exactly how FORTRAN works:
you compile and
then you get a .o file,
and then you get the .mod file.
- Yes.
- [Audience Member] And
FORTRAN doesn't have macros
or any of that, but just having this,
writing a build system for this
that's actually reliable and fast is
incredibly difficult.
- Yeah, I know, I've talked to
one of the Red Hat developers
who had worked on FORTRAN,
and his comment on how
he debugs FORTRAN build problems
is he invokes make sufficiently many times
for it to complete successfully.
And this does not strike
me as a stable algorithm.
- [Audience Member] So
Intel's official documentation
also says that if your compiles fail,
just reinvoke make until it works,
and this is without any of the macros
or any of the things that
are difficult in C++.
So maybe this is not the best
possible way of doing this.
Just throwing the idea out.
- Right, yeah, like I say,
I think we're all focusing
on the same question.
Hang on, I think I've got a...
What have I got.
Yeah, so with header files,
we use the file system to do this mapping.
With modules, we end up with a magic box.
You know, actually,
I spoke at the GNU cauldron
earlier this month,
and I spoke both about the
internal of the compiler,
so if you wanna learn
about internals of GCC,
go have a look at that.
But I also had a separate talk
that talked about this problem.
That video should be available
in the next month or so,
which may or may not
be interesting to you.
- [Participant] As a library developer,
am I expected to, in a
modules-enabled future,
am I expected to produce a single module
to my users and use includes
for everything internally,
or am I expected to build little
modules internally as well?
- Yeah, I would have thought you
would do modules internally.
I don't think the binary module interface
is gonna be very tightly
coupled to the compilation system.
I have it so coupled to
it because I'm paranoid,
that it's coupled to the
build date of the compiler,
otherwise I would just go
insane with weird errors.
So I don't think,
the binary module interface
you should think of
as a caching artifact and
not a distributable artifact.
People ask about, you know, can
we have some kind of common
interchange format
shared across compilers.
That is just too much
hard work to be reality.
So we have a way of
distributing source code,
it's source code.
So we should carry on
doing that if we need to.
- [Man] So the more I
understand about modules,
the more complicated this looks.
Do you think it's even worth it?
- I'm sorry I can't
quite hear your question.
- [Man] The more I
understand about modules,
the more complicated it
looks, this whole system.
Do you think it's even worth it?
- Yes.
- All right.
(audience laughs)
- [Attendee] Before modules,
one way to improve build times
was to use some compiler-specific
pre-compiled header features,
so I wonder if we could maybe
you know take some experience
from the existing world
of pre-compiled headers
and actually standardize
this process and build
modules on top of this.
What's your thoughts about this?
- Yeah, that's an excellent point.
Actually, quite interesting,
when I started on this project
I was talking to the engineers at Google
who tried to have
composable PCH files in GCC,
and Ian was
very concerned that I was
trying to do the same stuff,
because it didn't work.
In general...
So although I said a lot of
header files are well defined,
a lot of them are really quite awkward,
particularly some pieces
in the C library to do with
where types get declared
for historical reasons,
and it can't be kind of automated.
And PCH implementations that I'm aware of
are essentially a memory dump
of the image of the compiler
after the very first
include has been processed.
So they work fine if all your
programs start with
hash include "everything"
and then get on with their stuff,
but if your source code
consists of a set of individual
hash includes that vary
between different sources,
PCH tends not to work so well,
'cause it's not composable.
It is, like I say, it's
just a memory image.
And in fact, the GCC implementation of PCH
essentially leverages
its garbage collection
machinery and just
serializes all that stuff.
So this stuff that I've been doing
is completely separate from that.
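The classic single-snapshot PCH flow he contrasts with can be sketched as a couple of illustrative GCC invocations (file names are made up, and this is the traditional usage, not his modules work):

```shell
# 1. Precompile the one big header; GCC writes all.h.gch next to it.
g++ -x c++-header all.h

# 2. A TU whose *first* include is all.h picks up the memory image.
g++ -c consumer.cpp        # consumer.cpp begins: #include "all.h"

# A TU that includes a different set or order of headers gets no
# benefit: the .gch is one memory snapshot, not composable pieces.
```

This is exactly the non-composability he describes: the dump is useful only when every translation unit starts from the identical include state.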
- [Male] Since the goal of this is to
improve compilation times,
are there some hard
numbers from large projects,
what percentage this is
actually faster in the end?
- Yeah, I can't give you hard numbers
'cause this implementation that I have
is not sufficiently complete.
I did see Richard Smith
creep into the audience,
and maybe he's got some
clang numbers on Google's code
that he can share, or maybe he hasn't.
Here we go, I'll just point at him.
No I can't see him.
I'm getting a light
shining right in my face,
so I can't really see anything.
I don't have hard numbers.
In this development
I work quite closely with the clang guys,
and their experience
with clang modules is
that yes, it's worth it.
- All right, thank you.
- [Audience Member] Hey,
so from what I recall
of the export template feature
that has since gone away,
an exported template
effectively put the source code
of the template into a pre-compiled file.
It wasn't really pre-compiled,
it was just the template,
and then the declaration
of the exported template was
sort of an implicit include
of that source code.
- Yeah.
- [Audience Member] Really
accomplished nothing.
And I'm wondering, but for modules,
I mean so you can have an AST
representation of the template
but the instantiation of it,
when the argument deduction
takes place, and the optimization of code,
isn't that really the lion's share
of the cost of compiling a template,
and hence the cost of
compiling modern C++ programs?
- Right...
That is probably a significant cost,
but there may be sharing,
because of course if I am
instantiating the template
in my code that's imported something,
there may well be an
instantiation of that template
in stuff that I imported already, right?
So I won't need to
reinstantiate everything
if I'm using the common set of types.
Now if you've got a unique type,
then obviously that argument doesn't hold.
So yes, I think the answer's in theory,
that you're correct, it could dominate,
but I'm going back to
the data I understand
from clang modules is that in practice,
it's still a win to do this serialization.
One of the things with
the export template stuff:
that was kind of exported per template,
and then you get to horrible,
awkward questions about,
well, which other pieces
do I need to write out into
some kind of repository,
which turned out to be
a difficult problem.
This makes it much simpler,
'cause the program is explicitly saying,
this is part of my module
and these things are visible
to the template, via
imports or because the declarations are there.
So yeah, I mean okay,
export template didn't work so well.
Are we going down the same road?
I don't think we are.
- [Participant] I have a
question about something
you sort of glossed over with respect to
the ATOM proposal.
- Sure.
- [Participant] How hash defines
before you do a hash include
modify the actual,
you know, affect code in the hash include.
I know that is not
uncommon in some modules,
and I can think of one in particular,
the PKCS11 module, where
the header file is actually
structured with macros
to generate basically
two completely separate, you
know, implementation files.
- Right.
- [Participant] And I
think you said that it
generates the module, in the Atom proposal,
based on sort of the global state
of hash defines for the project.
How is that gonna work
with like real code?
- Right, okay, so the legacy header,
turning a hash include
into an import, relies on
well-behaved code using that header file
in a consistent way.
Either you turn all the
knobs on and it's fine
or you don't have conflicting knobs.
Now what you've just described is,
you can either go in mode A or mode B
and one's not a subset
of the other, right?
For that one you still have to
use the global module fragment,
where you hash include it
before the module declaration.
So in the TS's global module fragment...
- [Participant] So that's
still an option with the--
- Yes, that's right.
So the merged proposal
has both ways of doing it, for reasons that
you just described.
Unfortunately, while the module preamble
and all that stuff with the legacy headers
works well in some cases,
it doesn't cover all cases,
and you've just described a case
which it doesn't work so well on--
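The escape hatch being discussed can be sketched as a module interface unit; this needs a modules-enabled compiler and the real header, so it is only an illustration, and the module name, macro, and wrapper function are all made up:

```cpp
module;                        // start of the global module fragment
#define CK_PTR *               // configure the header the old way,
#include "pkcs11.h"            // then textually include it, macros
                               // and all -- nothing here is exported
export module crypto.wrap;     // hypothetical module name

export void open_session();    // wraps the configured C API for
                               // importers, hiding the macro games
```

Importers see only `crypto.wrap`'s exports; the macro-dependent header stays a textual include in the global module fragment, which is exactly why that mechanism survived into the merged proposal.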
- [Participant] And as
a follow up to that,
would it be possible in some semantic form
to capture the state
of the compilation unit
and perhaps generate
multiple anonymous modules
based on the state of
the compilation state,
assuming it doesn't get outta control?
Like in the binary A, B case,
it would be perfectly fine
if that generated two modules,
as long as the compiler was smart enough
to import the correct one,
based on the compilation state.
- Yeah, probably again, in theory
yes, but that sounds like a
large amount of work.
So I would be hesitant to
require that from the get go.
Yeah, okay.
Yep.
- [Man] Just to answer that question
about what kind of speed up we can expect.
So we've modularized a small library
and using clang's modules-TS mode,
which is still experimental,
we've got about a three-times speed up.
So I think we can
probably expect something
like that or greater.
Just a data point.
- Thanks.
- That was right at the
beginning, wasn't it,
that was where it was.
- Hey.
- Hi, I know you.
- [Chandler] I'm Chandler from Google
and I was just trying to come up here
and report kind of our performance
experience with clang modules.
- Great.
- [Chandler] So we deployed
clang's header modules
to 10% of our header
files in our code base,
which is like I think over
100,000 different modules.
And they were our slowest to
compile header files certainly.
So we picked those very carefully,
but I think we saw upwards of
between two x and four x
compile time improvements
just by only deploying
them to roughly our worst
10% header files.
- Okay.
- [Chandler] Hopefully that gives people
some intuition that there is
still very, very significant
compile time savings.
- Great, thanks Chandler.
Okay, so if people didn't catch that,
two to four times improvement
on the like 10% worst
set of header files.
- [John] Hi, my name's John
Lakos, I work at Bloomberg,
and I wanted to ask you,
by the way, you and I were
both at the modules meeting
and I wanna thank you for
presenting what is extremely current,
'cause it's only a few days old.
So thank you for doing that.
I wanted to ask you,
you said that the most important
thing for modules is speed
and I know that was a motivating factor,
but there are a lot of other reasons
why we would want modules.
We talked about them
and some of the architectural
advantages that they offer.
And I was gonna say, do you have any
that you personally think
would be sort of the big
tickets in addition to speed?
- Right, yeah,
'cause I was always selling it on the
isolation: the internals of a module
do not pollute another
module in either direction.
And 'cause you know I've been bitten by,
users suddenly finding the
internals of your header file
going, oh I'll start using that.
Oh, no please don't.
You're now constraining my next version.
I put it the way that I did
because I was so surprised by the,
"oh, I don't care about that,
just make my builds go faster,
then I might look at the encapsulation."
- Okay, thanks.
- Yeah.
- Okay, so I think we're about out of time,
so thanks everybody and I'm done.
(audience applauds)
