- Today I would like to
talk about build systems,
specifically what the next
generation build system
for C++ should look like
or what we should expect
or demand from it.
So this is based on our experience developing build2, a build toolchain for C++; to make these things concrete I'll sometimes show you examples or tell you how we do it in build2.
I don't think anyone will argue that
having a standard build system in C++
would greatly benefit the community
and I think you can all feel that
the change is in the air.
There's a lot been going on
in the past couple of years.
So perhaps this is our Git moment. This is our chance to build a next generation build system that everyone is more or less okay with using.
So what's exactly driving this change?
Well, several things.
C++ modules are on the horizon
and as we'll see later in the talk,
they are bound to shake things up
in the build system space.
I believe some more
antiquated build systems
won't be able to support modules properly.
Having to support multiple build systems
is also putting pressure
on our packaging efforts.
I don't know if some of you have seen my talk on Monday where I showed some statistics: the largest package repository in C++ has about 800 packages. In comparison, Rust added 2,000 packages in the last three months. So I think that clearly shows we have a problem.
Also, languages like Rust and Go, I think, demonstrate pretty clearly that things work much smoother and with a lot less friction if the build system is part of an integrated build toolchain that works well with the dependency management tools.
I also think we are at the point where distributed compilation and caching are no longer optional.
CPU progress has stagnated.
We have been adding cores but it
seems like this is
coming to an end as well.
And the C++ Standards Committee is paying attention. There is a study group, SG15, which is focusing on tooling, and dependency management and build systems are a big part of that. And by the way, its mailing list is open, unlike other Standards Committee mailing lists, so everyone is welcome to join and participate.
So these are the key points that I think
will shake things up in
the build system space.
So which one is the next generation build system?
I could tell you my opinion.
All of you have your own opinions.
We'll probably end up with 15 options.
So today I actually want
to try something different.
I want to analyze the overall design
of a build system for C++ based on
our core values as a C++ community.
So this idea is based on a talk by
Bryan Cantrill called Platform
as a Reflection of Values.
Anyone seen this talk?
Nobody, wow.
Highly recommend.
It's only a half an hour and
makes you think about things.
So Bryan listed these platform values
so I just copied them.
As a little detour, let's try to
figure out what are our core
values as a C++ community.
A value is something
that is important to us
and a core value is something
that's really important.
So some of them are in tension, and if we have to pick, then we'll choose the one from our core subset.
Also Bryan suggested that a good way
to determine a value of
a platform or language
is to read an opening
paragraph from a book
by the author of this
language or platform.
So we're gonna do that
but we're gonna start
with C, just a little
philosophical exercise.
This is the opening paragraph from
the C Programming
Language, and I'll read it.
C is a general purpose
programming language
which features economy of expression,
modern control flow and data structures,
and a rich set of operators.
C is not a very high-level
language nor a big one
and is not specialized to any particular
area of application.
But its absence of
restrictions and its generality
make it more convenient and effective for
many tasks than supposedly
more powerful languages.
So I've highlighted some value words
in this paragraph.
So I think this should be
fairly uncontroversial.
So the core values are performance, portability, and simplicity.
We are not confusing simple and easy,
you've all seen the previous talk.
C is simple but might
not necessarily be easy.
Okay, moving on.
That was, I think, fairly easy.
This is the opening paragraph from
the first edition of the
C++ Programming Language.
So it's original C++, C++ '98.
So C++ is a general purpose
programming language
designed to make
programming more enjoyable
for the serious programmer.
Except for minor details,
C++ is a superset
of the C programming language.
In addition to the
facilities provided by C,
C++ provides flexible
and efficient facilities
for defining new types.
Again, highlighted some key words.
So it's based on C so we probably
carry over some values from there
like performance and portability.
I don't think anyone will argue
simplicity is gone.
In return we get extensibility,
the ability to define new types,
and compatibility with
C is also important,
explicitly mentioned in
that first paragraph.
Moving on to C++ 11, brand new language.
This is from the fourth edition of the C++ Programming Language.
C++ is a general purpose
programming language
emphasizing the design
and use of type-rich,
lightweight abstractions.
It is particularly suited
for resource-constrained
applications, such as those found
in software infrastructures.
C++ rewards the programmer
who takes the time
to master techniques for
writing quality code.
C++ is a language for someone who takes
the task of programming seriously.
So some changes there, I
think you all will agree.
All right, let's see.
C++11 is an evolution of the language, so it'll probably carry over some values.
Performance, portability, extensibility.
Note that compatibility with C is not mentioned explicitly any more, but I think we'll all agree that without C compatibility there would be no C++.
We've also got expressiveness.
Nobody will argue that C++ 11 got
a lot more expressive.
There's also talk about code quality and robustness, and it is actually backed by what's been going on; exception-safe design is a good example.
C++ 11 also got quite a
bit more approachable.
There are high-level
constructs and facilities
in the standard library and there's also
things like core guidelines which help
newcomers get started with the language.
So I would say these are the core values
of the modern C++.
I would list them in
the order of preference.
So performance is top priority.
Then we have extensibility
and expressiveness,
portability and compatibility, robustness,
and we have approachability.
So we can just take a little detour
inside the detour.
These are the core values of JavaScript
listed in Bryan's talk.
So as you can see, there is a little
bit of overlap.
We have approachability and expressiveness
in common.
So the reason I'm bringing it up,
I want to highlight an interesting aspect.
As the community grows we might
end up with a divergence of values.
So different sections of the community,
they start pulling the
community in slightly
different directions.
I don't know, I personally think we have
some of that in C++.
C++ now is again a popular language
and there are newcomers that maybe
come from a JavaScript background
and they have different values
or they assign different
priorities to their values.
So just a little thing to think about.
So these are the values.
There are two questions to ask next. The first is whether these values translate to us as a community.
So these are the values as expressed
by Bjarne in his thought for the language
and as interpreted by me.
So the question is, are they actually
the values of our community?
Because it's entirely possible,
I mean Bjarne thought, I'm
gonna design a language
for one thing and the
community came and said no,
we're gonna use it for
something else entirely.
We have something else in mind.
I don't know, what do you think?
Does that represent why you are using C++?
For me personally it does.
So I don't see a violent objection
so I will just go with that.
I will assume that these more or less
represent the values of the C++ community.
So the next question is, do they necessarily translate to our tools?
It's also possible that we will assign a
completely different set
of values to our tooling.
Again, that may sound a bit strange, but the natural thing is to assume that they are the same, so I'll go with that.
All right, so these are our core values, and what we're gonna do next is analyze some high-level design questions of a build system for C++ using these core values of the C++ community.
The first question is actually,
do we need a build system at all?
If you look at Rust, for example, they use what I would call a build system-less model: put things in certain places and everything is done by convention.
I don't think this is
going to scale to C++.
We have fairly complex projects.
In fact, it doesn't scale for those languages either, as they are finding out with their more complex projects. For example, I've heard the latest trend in the Go community is to go back to makefiles for building things.
So I think we'll agree that
we do need a build system.
It would be nice if for simple projects
we could approximate the
build system-less model
and as I will show later,
we can actually do that.
So we can have our cake and eat it too.
Okay, now for the egg-throwing question: native or meta build system, the latter also known as a project generator?
So the idea of a meta build system,
on the surface, is actually quite sane.
We have all these different build systems and different platforms, so why don't we come up with a high-level description language that we will just translate to all of them, and the users of our language can keep using the underlying build system they are comfortable with, and so on.
But if you look under the surface,
you will quickly realize
that a meta build system
is a race to the bottom.
It's a race to the lowest
common denominator.
If you have to target three different
underlying build systems,
the set of features
that you can rely on is basically
an intersection of them all.
And that hurts. One of the major areas where it hurts is C++ modules.
I believe, and I hope I'll show you towards the end of the talk, that for a meta build system to support modules, it will require support in the underlying build systems, pretty much similar to how it is done now with header dependency extraction.
So if the underlying build system
doesn't support it, I don't believe
that a meta build system has a chance
of providing decent module support.
The other pain point of meta build systems is generated source code. Again, the same problem: if the underlying build system doesn't support it, then we have to resort to things like pre-build steps and so on, which have a lot of limitations.
Distributed compilation and caching, the same story.
We have to resort to wrappers because
the underlying build
system doesn't support it
so the meta build system
can't do it either.
Compilation database,
again another example.
A meta build system does not necessarily control or know how things are built, so it simply can not produce one.
There is one probably major reason why
meta build systems are popular
and that's IDE support.
You can use a meta build system to
generate your projects and then you can
work in the comfort of your IDE.
But if you think about it, we ended up in a situation where the tail is wagging the dog: our build systems are using our IDEs to build our code, which is a surreal thing in a sense.
In contrast, a native build system has full control of compilation. We can dump the compilation database, we can change how things are built. We are in control. It is also uniform: it can work exactly the same on all the platforms with all the compilers and, as a bonus, there is no project generation step.
Let's take a look at our values.
Those were my opinions, so here let's try to compare against our values. On the left I have the meta build system, on the right the native one. A meta build system will have issues with performance. I'm not talking about the build itself or the project generation step; that's a minor thing.
I'm talking about forward-looking features
like modules or distributed compilation.
A meta build system might not be able to support them at all, or it might take longer to make them available to its users, who won't be able to take advantage of that benefit.
Extensibility: again, you are in a straitjacket. You can only use the lowest common denominator, so it's hard to extend it with anything.
Robustness also suffers because we now have more moving parts: different underlying build systems that we need to support, test, and so on.
The only major issue with a native build system is compatibility, as I mentioned. So if we have an IDE for which we want to generate a project, that would be problematic.
But in a sane world, I would say, it would be the other way around: an IDE would use the build system to build things, which sounds very sane to me.
And there's actually
movement in that direction.
Visual Studio, for example,
provides Open Folder
functionality for this
kind of integration,
which has some issues.
My personal thinking is
that for a C++ Community
to standardize on a
meta build system would
be a fundamental mistake.
Moving on.
Next up: black box or a conceptual model of build?
Just to illustrate what I mean, here are two examples: one is a makefile that tries to build a portable hello world program and the other one is the CMake equivalent.
So if you're not familiar with the make language, the makefile probably looks to you like line noise; it's basically gibberish. But if you actually read the manual, you will start to understand the underlying concepts: what's going on, the model, how things are built.
So on the first line, on the left we have a target and on the right we have prerequisites. The next line is a rule that specifies how these things are built. Then the last two lines are actually a pattern rule, which is kind of similar to C++ templates and can be used to build several targets.
The idea here is that there's actually
a model underneath that you can understand
and you can adjust and
use to customize things.
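The makefile from the slide isn't captured in the transcript, but a minimal sketch along the lines just described, with hypothetical file names, might be:

    # target: prerequisites
    hello: hello.o
            $(CXX) $(LDFLAGS) -o $@ $^ $(LDLIBS)

    # pattern rule: how to build any .o from the corresponding .cxx
    %.o: %.cxx
            $(CXX) $(CXXFLAGS) -c -o $@ $<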
Now contrast that with CMake: this is how you would build the hello example.
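The slide isn't reproduced in the transcript either; the CMake version, roughly as it appears in the tutorials mentioned below, might be:

    cmake_minimum_required (VERSION 3.1)
    project (hello)
    add_executable(hello hello.cxx)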
I personally don't know what add_executable is.
Does anyone know?
Is it a function?
Is it a macro?
Anyone? Also, there's a space after project but not after add_executable. I saw three tutorials, including the one on the official CMake site, and they all have this difference. I don't know, is it significant?
Okay.
So I think this illustrates the difference between a black box and a conceptual model of build. In CMake you have this something called add_executable; you don't even know what it is, and the documentation is written in the sense of: if you want an executable, put this thing here and those things there and it will happen.
Good, so here is the analysis against our values. The major issue with the black box is extensibility: if you need to do something slightly or completely different, the only option is to create another black box. You can not see inside; you can not understand how things are done.
The problem with a
conceptual model of build
is approachability.
If you look at this makefile, it's not really approachable, but it actually doesn't have to be; that's not a fundamental property, and hopefully I'll show you that it can actually be made to make sense.
So I think it's also pretty clear
we need a conceptual model of build.
I don't think this is really controversial
if we want an extensible build system.
Implementation language: another potentially egg-throwing question. Can we afford to hitch ourselves to another ecosystem, be it Python or Java or some other language?
When I ask developers who chose languages other than C++ why they went this way, the answer is usually: you have to use the right tool for the job. Which suggests that C++ is somehow inadequate for the job.
I don't know; in the case of build2, for us it wasn't even a debate or a question which language we should use.
was probably one of the
best decisions we've made because
nothing gives you as good a reality check
as using your own build system
for everyday development.
Which actually brings
an interesting aspect
to this question.
I don't think anyone will argue that developing a general-purpose build system is a multi-year, full-time job. So if, say, you're using Python or Java for it, you are actually a full-time Python or Java developer. You're not using C++ yourself for anything real-world, and yet in this position you're deciding how things should be built for C++.
Back to our values, we have a little bit
of a bloodbath on the left.
I've chosen Java and Python as examples,
the most common choices.
Again performance suffers,
extensibility suffers.
By extensibility, what I mean here is that I, as a C++ developer, don't necessarily know the language that you've selected for your implementation, or I don't want to use it.
There's also portability: we are now depending on other things. There's robustness: again, more moving parts. And approachability: again the same issue, that I as a C++ developer don't necessarily know your language.
Naturally doing it in C++ doesn't really
conflict with any of our values.
Next question, should our build
definition language be declarative
or scripted?
I personally believe that the complexity and the importance of the problem actually warrant a custom-built language.
Let's analyze it using our values.
So on the left we have scripted.
Performance will probably suffer,
expressiveness will also suffer.
You will probably wrap your scripting
language in some kind of DSL but
it's still not gonna match what
can be done with a custom language.
We also have issues with portability, since we are depending on another system, and with approachability: the same issue, I might not know your scripting language.
The problem with the declarative approach is extensibility, and it's actually a pretty bad problem: this value is pretty high on our list and it's affected severely. So perhaps it's worth trying to fix that.
And perhaps what we can do is a hybrid,
mostly declarative language.
This is what we've done in build2.
So it is declarative, it looks similar in
a sense to make.
So you have declarations: this is a target, this is a prerequisite. You also have variables, which are typed, and we have pure functions. And by pure I mean they can not modify the build state: you can examine and manipulate values and so on, but you can not actually enter new targets programmatically.
We also have conditions. Note that I phrase it as inclusion or exclusion of a buildfile fragment rather than control flow, so it's still not a programming language.
We also have repetition, a for loop. Again, you can repeat a fragment of the buildfile several times, substituting different values and so on.
Extension is done by providing custom functions and custom rules, also written in C++.
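To make this concrete, here is a hedged sketch of what these features can look like in a buildfile; the specific variables and targets are made up and the exact syntax is per the build2 manual:

    # A typed variable initialized from a (pure) expression; nothing here
    # can modify the build state.
    windows = ($cxx.target.class == 'windows')

    # Conditional inclusion of a buildfile fragment, not general control flow.
    if $windows
    {
      cxx.libs += -lws2_32
    }

    # Repetition: the fragment is repeated for each value.
    for n: hello goodbye
    {
      exe{$n}: cxx{$n}
    }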
So I think this is probably the way to go. A declarative language has major benefits, and we can augment it with some nice things to make it more expressive and extensible.
So this is then the summary of our
high-level overview.
The next generation build system
is a native build system with a conceptual
model of build.
It's implemented in and extensible in C++
and has a mostly declarative
type-safe build language.
I'd also add that it probably should be
part of the larger build tool chain
that includes dependency management
and it should be a library
to ease integration
with IDEs and tools.
So that would be my
scorecard for a build system.
So if I were choosing a build system
for my next project,
I would try to see how
it matches these points.
Just for interest's sake, this is the first
paragraph from the build2
Build System Manual.
I'm gonna read it.
build2 is a native,
cross-platform build system
with a terse, mostly declarative
description language,
a conceptual model of build,
and a uniform interface
with consistent behavior
across platforms and compilers.
build2 is an honest build system
without magic or black boxes.
You can expect to understand what's
going on underneath and
be able to customize
most of its behavior to suit your needs.
Also for completeness, here are our makefile and CMake examples again, with the build2 example at the bottom. If you look at it for the first time you'll probably say it's exactly the same as CMake, or at least a lot closer to CMake than to the makefile.
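The build2 buildfile from the slide isn't in the transcript, but it is essentially a single declaration along these lines:

    # Target on the left, prerequisite on the right; compiling and linking
    # are handled by the default C++ rules.
    exe{hello}: cxx{hello}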
In fact, again, if you understand the underlying model, it's actually a lot closer to the makefile than to CMake. We just have a declaration here, with all the implementation details actually handled by the default rules. So on the left we again have a target, on the right we have a prerequisite, and so on.
I think this highlights that the conceptual model of build doesn't actually have to be unapproachable. I think it's fairly clear what's going on here even if you haven't read any documentation.
The second half of my talk will
be about major features
that we should demand
or expect from a next
generation build system.
I'll quickly go over some that most
modern build systems already have
just for completeness and very briefly.
I think by now everyone agrees we need both out-of-source and in-source builds. Most of the time, for development, you will probably have multiple out-of-source builds, but there are situations where you want to build in source: for example, when you just want to build someone's package, it's simply more convenient.
Wildcard patterns, this is a somewhat
controversial topic.
I personally was also skeptical before we added them to build2 and converted all our projects, but now I'm converted too.
The nice thing about wildcards is that
I hardly ever need to touch
my build files any more.
The problem with wildcards that is usually brought up, the major issue, is that you can pick up stray files that happen to be in your source directory. I personally never ran into it, maybe because I use a version control system, so for me a stray file would be immediately shown by the next run of git status.
But I think it's important to give
full control, so in case of build2,
you can do it either way.
So you can use wildcards or you can
spell your source files exactly
but I think wildcards
are actually pretty nice.
It makes the development
workflow much smoother.
This also allows us to approximate the build system-less model for simple projects.
So if your source files are picked up by a wildcard then, provided you put them in the right place, you can add, remove, and rename them, and so on. So that also works quite nicely, especially for someone who wants to start using the toolchain without reading too much documentation up front.
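As a sketch, a wildcard-based declaration in build2 looks something like this; treat the pattern details as approximate and see the build2 manual for the exact syntax:

    # Pick up all header and source files in this directory and below.
    exe{hello}: {hxx cxx}{**}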
Cross-compilation: I shouldn't really need to mention it, but some build systems still can not get it right, so it should be the norm and not an afterthought.
Once you actually have a build system that does it reliably and properly, the next natural thing you want is cross-testing: you want to run the tests for the target you cross-compiled for. And it's actually possible; probably the best example is running tests under Wine emulation.
We do it a lot and we've even coined the term cross-testing; very handy. I, for example, can run tests for Windows on my Linux machine.
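As an illustration, and assuming a MinGW cross-compiler and Wine are available, the idea is something like the following single invocation; treat the config.test.runner variable name as an assumption rather than gospel:

    # Cross-compile for Windows and run the tests under Wine.
    b test config.cxx=x86_64-w64-mingw32-g++ config.test.runner=wine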
A modern build system also rarely needs to support just building; there are other operations: testing, installation, preparation of distributions.
It also would be nice if
configuration management
was integrated into a build system.
And this is a good example of how, when you actually get the design, the underlying model, right, you end up with nice features popping out that you didn't expect, hadn't even thought about.
Just to give you an example of what we found in the case of build2: I have a little hello example here, nothing interesting. I'm going to configure it to use Clang with optimization. This is how you do it in build2, and when I then run the build system it uses Clang, as you can see, with optimization.
Let's say I made some change and I'm not sure whether GCC is gonna be happy with it, so I want to quickly re-test it with GCC, for just one run of the build system. It would be a pain if I had to actually go and re-configure manually or create a separate build configuration; I just want to quickly test it.
So if your configuration management and the actual build system are separate, those are the only choices: you either need to re-configure or you need to create a new configuration. If they're integrated, as in build2 at least, you can just override the configuration for a single build system invocation. Here I'm gonna override it to GCC and a debug build; as you can see, these are now built with GCC and in debug.
So the cool thing is what happens now if I remove those overrides. I tested with GCC, it works great, and I want to go back to Clang. All I need to do is remove the overrides and it rebuilds with Clang. So it's a good example of: once you get these things right, interesting things happen.
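Putting the sequence together, a sketch of the invocations (the option values are illustrative):

    # Configure once: Clang with optimization (the values are remembered).
    b configure config.cxx=clang++ config.cxx.coptions=-O2
    b                                        # builds with clang++ -O2

    # One-off override for a single invocation: GCC, debug.
    b config.cxx=g++ config.cxx.coptions=-g

    # Drop the overrides: build2 detects the change and rebuilds with Clang.
    b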
Now moving to the fun part,
next generation functionality.
I think most existing build systems really
don't handle project
composition very well.
Just to give you an example,
let's say I have two independent projects.
I have a library and an executable
that I want to use it in.
They're separate.
I don't want necessarily
to have them bundled.
They're independent, potentially
independently developed.
Maybe I cloned someone's
project from GitHub.
So there's several scenarios or setups
in which I might want to use it.
So they can be just completely
independent projects
so it would be nice if
I could import a target
from one project to another.
Like for example, a library
that I could just link
and that things automatically
build and update
when I actually want to use that library.
So that's one scenario.
I might also want to copy this library into my project so that I create a dependency-free bundle. This is also useful; not always recommended, but useful to have. And it would be nice if that used the same mechanism as when the projects were completely independent.
And the third scenario is when you want to consume an installed library, potentially built by a different build system. So again, it would be nice if all three scenarios were handled by the same mechanism. That's how we've done it in build2 and I think it is the way to go.
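In build2 the single mechanism is import; a minimal sketch, with hypothetical project and target names:

    # In the executable's buildfile: import the library target by name.
    # Whether this resolves to an independent project, a bundled subproject,
    # or an installed library is decided by the configuration, not this file.
    import libs = libhello%lib{hello}

    exe{hello}: cxx{main} $libs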
I also think testing needs some addressing. If you look, for example, at the LLVM and Clang projects, it's quite an interesting example: they both have a scripting language for running tests, for running the compiler and analyzing its output. The two look nothing like each other, completely different ideas, but they serve essentially the same purpose.
I think there's an underlying common need for a portable language for concise test description: one that allows you to supply input to your test, analyze output using regular expressions, and so on.
So in build2 we came up with such a language; we called it Testscript. It's also optimized for parallel execution of the tests in the same file. So you can have a file with, say, 20 different test cases, and it will execute them all in parallel.
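A hedged sketch of what a couple of tests look like in a Testscript file, with $* standing for the test target being run and the exact redirect syntax per the Testscript manual:

    $* 'World' >'Hello, World!'   # run the target, compare stdout exactly
    $* --help >~/usage.*/         # compare stdout against a regular expression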
Once you have that, the next natural thing is incremental testing. We already have incremental builds, incremental compilation. If you change one file in your project and it takes a second to rebuild, but then 10 minutes to run the tests, it's no use that it took only one second to rebuild. So it would be nice to also have incremental testing: basically, only execute the tests that were affected by the change you've made.
And this is actually tricky but
I think that that's something
we also need to address.
The next point is high-fidelity, hermetic builds. The idea here is that a change to a source file is actually only a small part of what can affect the end result, the object file. Other things, like the environment, tools, options, and even the set of source files, can all affect the end result.
This is actually an important aspect to get right from the start, because it affects trust in the build system. If you don't trust your build system, you think: yeah, everything is supposedly up-to-date, but just to be sure I'm gonna clean everything and rebuild from scratch. If you lose that trust, it's actually very hard to regain, so this is something that needs to be done properly from the get-go.
Actually, the example that I showed you, where once I removed the overrides things got rebuilt with Clang, relied on this: build2 detected that I had changed the compiler and the options, and it automatically rebuilt everything.
A somewhat related area to the previous one is more precise change detection. Quite a few build systems already don't just use timestamps to detect changes: they, for example, hash your translation units, your inputs, and if the hash hasn't changed, they skip rebuilding.
This helps, for example, with switching
branches and so on.
But we can actually go a step further
and ignore more things, like for example,
white space only changes
or changes to comments
and so on.
Again, just to show you an example: I'm in my hello program and I'm gonna go and just add a comment. I haven't actually changed anything semantically, so when I run the build system, it detects that nothing changed and skips updating the target. I'll explain how this is actually done in a few minutes.
Okay, the fun part, right: C++ modules. The two major challenges in supporting modules appear to be discovering the set of modules that a translation unit actually needs, as well as mapping module names to file names. The first one is actually a really hard problem. The second one is actually not that difficult, and I don't know why there's so much noise about it, but for some reason people like to complain about it.
The first one is a real killer, actually. We kind of have a chicken-and-egg problem. It is getting better recently, but until now it's been this way: the build system vendors look at modules and realize, oh, it's so complicated, I have no idea, so I'll just wait for the compiler vendors and the Standards Committee to sort it out; and the compiler vendors and the Standards Committee just wait for the build systems to tell them what they need.
So it's actually important for build system vendors to start looking into it and to provide feedback to the Standards Committee and to the compiler vendors. And the best way to do that, I think, is by submitting papers to the Standards Committee; there are links to some of the relevant papers.
There's also the talk I gave last year about building C++ modules, and Nathan's talk yesterday is also very good if you want to understand how things have changed and what the current state of things is.
Distributed compilation and caching: again, as I mentioned earlier, I think we've reached the point where we need it to be part of the build system, as a reliable and generally available mechanism. We don't want wrappers, a highly-controlled environment, and so on.
Which brings us to the way we build C++. If you look at these items: C++ modules, auto-generated headers, distributed compilation, and ignorable change detection, what they have in common is that most existing build systems don't support them, and it would be really difficult for them to do so. To understand why, we need to look at how we build things currently and how that needs to change.
So this is the current, or I'll call it the old, model of building things. We have a source file and we compile it to an object file, and as a by-product of this compilation we end up with the header dependencies: basically a list of headers that this source file includes, transitively. So the next time we need to decide whether to re-compile anything, we take the source file as well as the extracted header dependencies and we make a decision: if none of them changed, we don't need to re-compile; otherwise we re-compile.
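For reference, this is the classic mechanism, for example with GCC (file names hypothetical):

    # Compile and, as a by-product, write the header dependencies to hello.d.
    g++ -c -MD -MF hello.d hello.cxx -o hello.o

    # hello.d lists every header hello.cxx includes, transitively; the build
    # system reads it on the next run to decide whether hello.o is stale.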
This is how most build systems work these days, and it relies on a couple of key observations. But all of the points above are affected by this model. It makes, for example, handling auto-generated source code painful, and that's why most build systems resort to some kind of pre-build step. So I believe we need to change that.
I think we need a new build model for C++
and I think it'll look like this.
So we have a source file, but instead of going straight to an object file, we first partially pre-process it. I don't know how many of you are familiar with this: both GCC and Clang have an option to partially pre-process a translation unit. What that does is essentially expand the include directives but not the macros.
So you had a file with a bunch of includes, and now you have a file that is self-sufficient: it doesn't have any external references to any other headers.
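These are the options in question (output file names hypothetical):

    # GCC: expand the #include directives but keep macros unexpanded.
    g++ -E -fdirectives-only hello.cxx -o hello.ii

    # Clang: the equivalent partial-preprocessing mode.
    clang++ -E -frewrite-includes hello.cxx -o hello.ii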
As an aside, the Visual Studio team is working on a new pre-processor and I've asked them many times to consider this feature, so hopefully we will have it in Visual Studio as well.
Once we have this partially pre-processed translation unit, we pre-process it fully, maybe in memory, without actually storing anything on disk. Then we tokenize it using the C++ pre-processor rules, which is actually not that difficult. And then we shallow-parse it to extract the module dependency information: the imports and so on.
Yesterday on the panel someone said that to extract module dependencies you need to implement a C++ parser. That is not the case. It was not the case with the Modules TS, and it's even less the case now with the preamble and so on. All of this will most likely eventually be provided by the compiler. We've done it ourselves, and it's actually not that difficult to do.
One interesting thing that you can also do once you've tokenized the input: you can hash the tokens, maybe the location information and so on, and produce a precise change-detection checksum for your translation unit. That will skip all the comments, ignore all the whitespace changes, and so on.
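A minimal sketch of the idea in C++; the token representation and hash are made up for illustration and differ from what build2 actually does:

    #include <cstdint>
    #include <string>
    #include <vector>

    struct token { int kind; std::string spelling; };

    // FNV-1a over token kinds and spellings: comments and whitespace never
    // make it into the token stream, so they cannot affect the checksum.
    std::uint64_t checksum (const std::vector<token>& ts)
    {
      std::uint64_t h = 14695981039346656037ull;
      auto mix = [&h] (unsigned char b) { h ^= b; h *= 1099511628211ull; };

      for (const token& t: ts)
      {
        mix (static_cast<unsigned char> (t.kind));
        for (char c: t.spelling)
          mix (static_cast<unsigned char> (c));
      }
      return h;
    }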
So now we take all this information: the header dependencies, the module dependencies, the checksum, and the partially pre-processed file, and we make a decision whether we need to recompile. If so, we compile the partially pre-processed file into an object file.
I think you can all see how distributed compilation will work in this case. Instead of compiling it locally, we just send this self-sufficient, partially pre-processed file, in the case of modules probably together with the BMIs, the binary module interfaces, that it needs, to a remote machine, where it gets compiled and the object file is sent back.
If you look at this new model, you will realize that on a clean build, the first build, we're actually going to pre-process all our translation units up front. That might sound crazy to some: surely we're going to take a huge performance hit, partially pre-processing and then pre-processing again. It turns out this can actually result in a faster overall build, surprisingly.
I think this is a good example where
our intuition does not
translate to modern hardware.
But if you think about it, it makes sense.
We have all the pre-processing happening up front, at the beginning of the build, in a very small window of time. And pre-processing is mostly reading the same files over and over again: your standard library, your operating system headers. Now this is all concentrated in a small window of time, which means chances are pretty high all these files are actually sitting in memory. Contrast that with pre-processing spread out throughout the compilation, which is, by the way, a memory-intensive process: most likely those files get flushed from the cache. That's why in certain cases this can actually result in a faster build.
So this is the new model; this is how it works, for example, in build2. I didn't want to complicate the diagram, but you can probably see what happens if we have a generated header: if we detect it at that point, we need to generate it, then re-start the whole thing and try again.
So we've implemented this in build2 and it works pretty well.
Just a summary of what I think a next generation build system for C++ should look like: it'll be native with a conceptual model of build, implemented in and extensible in C++, it will use a mostly declarative language, and it will be part of a dependency management toolchain.
That's it, thank you very much.
I think we have quite a
bit of time for questions.
(applause)
- [Audience Member] Just a quick question,
on the previous page, why do you not
use the fully pre-processed
file to compile?
Is it like a space issue?
- No, actually, well, there's a theoretical and a practical answer to that. The theoretical one is that apparently it's not even valid to first fully pre-process and then compile a translation unit. Yeah, I was also surprised.
But the practical issue is that you lose
macro expansion information so the
error messages are not gonna contain
all of the information.
- [Audience Member]
Right, that makes sense.
- So they can not carry
the position information.
- [Audience Member] Thank
you for the presentation.
I would use this build system.
In terms of IDE support, obviously having
a native workflow would be
nicer from the standpoint
of not having to have this configure step
like meta build systems.
But of course you need to
get the IDEs to support it.
I'm curious if there's
been any success so far
in getting build2 in some of the common
C++ IDEs?
- We personally haven't
tried it but I think
Visual Studio Open Folder functionality
looks quite promising so if anyone
wants to take a stab at it, we'll help
and assist as much as we can.
Sorry, just to interrupt: we did some thinking and analysis about it. If you look at how Open Folder works and the metadata that it requires, it looks very similar to the compilation database. So it seems there's some kind of overall model of what a tool, like an IDE or a static checker, needs to know, the include paths, the defines you have, in order to understand your code base. So there seems to be some commonality in that; hopefully we can unify it to some extent.
- [Audience Member] And
then on an unrelated matter,
of course one of the things
with kind of starting
over is trying to get everyone to move.
Have you thought of ways that that could be easier than it has been in the past?
Because usually it's pretty much like
well start over in your build system
and that's kind of a non-starter for
a lot of organizations,
especially smaller ones.
- Yeah, it is a problem, but I don't think there is magic. Without automated tools it'll probably be brutal, and then we'll spend most of our time probably just fixing that. You just have to bite the bullet and do it. And our approach is to provide functionality that makes it worthwhile.
To give you an example: we are running a CI service, so if you have an open source project you can CI your project on all the major platforms and compilers for free. That's our plan at least.
- [Audience Member] All right, thank you.
- [Audience Member] In big builds, dependency checking takes most of the time, because pretty much for every file you have to scan the standard library, Boost, and the number of those files can be huge. So it feels like the solution would be a daemon mode, where you have some server running that uses OS support like inotify to build the dependency tree once, and then you just watch what changed and rebuild what's necessary.
Also this opens an opportunity for your IDE to integrate using some standard API and actually use things like remote builds, and also not to depend on the build system at all. Did you think about developing build2 in this direction?
- Yes, we've thought about that. And also, going back to that point about pre-processing and actually ending up with a faster build: I think our intuition really does not translate very well. To me personally, if you have modern hardware, for example solid state drives and a lot of memory, this pushes the problem to really extremely large projects.
- [Audience Member] Think about NFS.
- Yeah, well, NFS is usually what's brought up. To me it sounds like using NFS is a bad idea. It feels like we would be spending time and adding complexity to optimize for a small subset of users that may have made bad choices, which sounds unfair to the rest of the community.
But going back to the daemon mode and inotify: I think that's a good idea, and that's also why I mentioned that the build system needs to be available as a library. So theoretically you could make a daemon out of it, if you wanted to.
- [Audience Member] Thank you.
- [Audience Member] I'm not
unsympathetic to this model.
I think it's got some benefits.
I guess I'm kind of piggybacking
on something someone else asked,
what success have you had,
or have you had success,
and could you describe it,
with translating existing projects?
Because it's one thing to start from the ground up and build, but another to convert a system so that, not necessarily byte for byte but as close as possible, I have 100% successfully built the same binaries and have some sort of confidence in that.
Because I agree with what you're saying: we shouldn't penalize the rest of the community for bad decisions in how people have designed their builds. But I think everybody in this room can agree that we all work at companies that have made some horrible decisions with build systems, so that's really not the 10% or 20% case; it's probably more like the 97% case that people's build systems are horrendous.
And this is kind of great, but unless we look at this from the standpoint of what the success rate of adopting it is, and whether it's even tenable, I don't see the compelling reason to do it. So do you have examples of translations?
- Yes, we actually converted some of our projects and some of the dependencies that we need. A good example I like to give is the MySQL client library. If you look at it, it's a pretty hairy thing. To start off with, it's actually a C library that has a C++ implementation; I personally hadn't encountered things like that before. It also has a lot of things copied inside and there's, of course, a lot of variation.
And going back to the success rate, I'm not exactly sure, but we've converted the MySQL and MariaDB client libraries. I think one of them just outright said: we don't support ? on Windows. And we converted it and it actually built. So in a sense we've achieved a higher success rate than their native build systems.
- [Audience Member] Okay, but I guess then that's an interesting question: did you miss something in their old build system that indicated, this doesn't work on that compiler? It'll compile but it won't really run, and you've kind of hidden the problem now. Because yeah, it compiles, we translated it, but the translation has missed something that was intrinsic to that build system.
Those are the problems that I think inevitably come up with adopting a new build system. Because, like, I work on the CoreCLR, and there's no way we are changing its build system, because there's no real way to detect such regressions. So adopting and moving to new build systems without a way of checking that is a really hard sell.
Again, not unsympathetic.
I'm curious as to what the success rate is
and how that works over time.
- I think the only way to check would be
if you have tests for your project,
that would be the way to do it.
- [Audience Member] Yeah.
- Which is how we did it. When we converted MySQL and MariaDB, because we needed them for our project, which has an extensive test suite, we just ran it against them. We are pretty confident that at least for our scenario it works.
- [Audience Member] So I have sort of
a meta question observation and it was
sort of glossed over a little bit
earlier in the talk when you talked
about the difference between what an IDE
might require in terms
of project management
structural organization versus what
a build system might require.
And it occurs to me that
there are several steps
in real-world software development that
have high overlap in that
conceptual kind of thing.
Like a manifest of what
all the source files are,
what they all generate, what
steps are supposed to be
done from the IDE, project
management sort of stuff
to the build system, to package
managers, to installers.
And it kind of seems to me like
a next generation build model would be
more encapsulating of a next generation
project management
manifest tracking model.
I'm wondering if you've thought about that
in the larger context of the other related
systems that have a high overlap in
what they need to consider and is there
any sort of a how to get there from here
where we have a lot of disparate systems
that all sort of need the same data and
none of them play nice together unless
you have a very integrated
IDE environment, sometimes?
- Answering your first question: actually, yeah. There are two things that we've done. First of all, let me start with this: I believe that a package manager, a build system, and a project dependency manager should be integrated in the sense that they should inform each other's design, but they shouldn't be the same thing, because sometimes you just want to use the build system alone. A good example of that is a distribution package manager: for example, the packagers at Red Hat, they just hate all these Rust people, because all of it is tied together. That's the prelude, I guess.
The two things we've done to address the question of different tools needing mostly the same information: first, in the build system, we made sure that you actually have to list everything. For example, you have to list all the headers that you use in the project, which is strictly not necessary, at least in the old build model, but we made it a requirement for other reasons, in part to make sure that if someone else wants to use this information for something other than building, it's there. So that's one thing.
And going back to the packaging, this is all covered. Again, I believe that the build definition language and the package description language, the manifest, should be separate, but both are specified and the information is there. So, for example, we specify the version of your package in the manifest.
- [Audience Member] As a quick follow-up, how functionally consumable is that in the real world?
The build system is
gonna generate X number
of object files, X number of executables,
libs, output dependencies.
How much of that is understandable by
other IDEs?
How much of that is understandable
and consumable by other project managers?
In the real world, how much of this
can I actually compose in
my overall project system
versus this being sort
of designed in a vacuum
just for the build part of it?
- You mean right now as
in are there any IDEs
or other package managers that can consume
this information?
- [Audience Member] Yeah,
what's the status right now
and what's the plan to get to something
that is more integrated?
Assuming that that's the end goal.
- The status now: I'm not aware of any other package managers or IDEs consuming it. And the plan, as I said, is to make the build system a library, and the manifest file is easy to parse, so it should all be possible to extract very easily. And there's the compilation database, which is probably the closest thing we currently have to an industry standard for interchanging this kind of information.
- [Audience Member] Okay.
- So yes, a miss, in other words (laughs).
- [Audience Member] First, just a quick sidenote to the earlier question about the inotify-driven daemon: Tup already does that. So if you want to build one using your library or something, that could be a good example.
In your earlier statement about wildcard globbing, though, you mentioned the possibility of picking up a stray source file. Why would I see that as a negative? Why should that file be allowed to persist in the source directory of a module if it's not a source of that module? Shouldn't I see it as a positive that it forces people to confront this?
- Yeah, I believe so, but the rationale that is normally brought up against wildcards is: a stray file might be picked up as a source, I will build it, I won't notice, I will ship this library, I will package it and put it in an archive which can not be deleted. So it's easy to make a mistake that won't be noticed. But again, me personally, using a version control system like Git, I can't remember ever in my life checking garbage into my repository. I don't know, maybe I'm an exception.
- [Audience Member] I actually have a trickier question. One of the trickiest things I've had to do, and had trouble with in other build systems, is using the coverage or profiling information from a unit test to decide which source files the unit test depended on, so that the next time one of those source files is modified, the build system knows to re-compile and rerun this unit test. Do you have an example like that using build2, so we can see how it would be done? Because a lot of times feeding back into the next build description can cause problems for some build systems.
- I'm not sure.
Can you maybe rephrase?
What exactly is the issue?
- For example, reading the gcov information produced by a unit test and somehow feeding that back in, like into a generated makefile or whatever can be included, so that the next time it runs, it knows this unit test also depends on these other things that got linked in through the static linker, for example.
- Okay, got it. This is a more general problem that actually goes back to generated source code: you need to be able to feed ad hoc dependency information into your build system. In build2 we support that.
- [Audience Member] Is
there an example I can
look at on your site to show that?
- The way you would do it: you would actually need to customize a rule. You can not do it in the buildfile; you would need to write some C++ for that.
- [Audience Member] Okay,
thank you, thank you.
- I think we are out
of time and thank you.
(applause)
