- Welcome, everyone, to my talk on writing
UEFI Applications with C++ or Modern C++.
My name is Morris Hafner.
I moved from Germany to
Scotland to eat haggis
and work on compilers.
I'm a software engineer
at Codeplay Software
where I work on SYCL, the Khronos standard
for heterogeneous programming for C++.
I'm also a post-grad student
at the University of Edinburgh
and unfortunately I also have a tendency
to break compilers, debuggers, linkers,
pretty much you name it and I like to say
that it's a curse, but it's probably
also because I want to
do very exotic things
like trying to make modern C++ happen
on exotic platforms like UEFI.
So what is UEFI?
It stands for Unified
Extensible Firmware Interface
and it's meant to be
the replacement for BIOS
on the PC platform.
So it was initially developed
for the Intel Itanium platform
in the '90s and you all
know how that went down,
but the technology works
well and was reused
for x86 and can also be used
for ARM and RISC-V computers.
It has many more features
compared to BIOS.
You have network access,
you have Secure Boot
which was quite controversial
in the Linux community.
You also have non-volatile RAM.
There's even a bytecode specification
if you want to share application code
between different CPU architectures,
but the most interesting thing about UEFI
compared to BIOS is that
you don't need to write
as much assembly code anymore.
So can I get a quick show of hands,
who has written code for BIOS?
Maybe a boot loader or kernel?
Yeah, a couple of people.
So you might all be
familiar with this then.
So the BIOS was the
initialization system so to speak
for the original IBM PC and what happens
if you press the power button, until UEFI,
is that it does some
very basic initialization
on your main board and then loads
the first 512 bytes of code
from the master boot record, jumps
to some specific address
and then you're on your own.
And even very basic things like switching
from 16 bit mode to 32
bit mode to 64 bit mode
you have to do yourself
and this obviously will
require assembly code
because switching to 32 bit mode
isn't covered by the C language.
So it is much
lower level and you have
to call interrupts instead
of having proper functions
like you have in EFI.
So EFI executables are just
binaries on some FAT32 partition
with a very specific file system code.
As you can already see,
it was heavily influenced
by Microsoft
because it uses the same binary
format as Microsoft Windows,
it uses the same calling
convention as Microsoft Windows,
and unfortunately it
also uses UTF-16 strings,
which cause a headache.
And when you press the power button
on pretty much every PC
here in this very room
including MacBooks, it does
initialization of your hardware
like BIOS, but then it doesn't move
or jump to some address
but launches an executable
which is usually located
in EFI/Boot/bootx64.efi.
So this is usually the case.
You may change the default.
You may also have to deal
with firmware limitations
like the one on my laptop
here, which hard-codes
the path of the Windows boot loader
and this is an interesting exercise
if you want to install
Linux on this machine.
So how did we get an executable
to run on this platform?
So first of all, we're going
to need some kind of toolchain.
So the thing you usually want to use
is the TianoCore EDK II, which is just
a truly massive framework
and contains functions
for all kinds of things.
It even has a port of Python 2
so if you're wanting to run Python 2
without an operating
system, download EDK II.
But I want to have
something simpler than that.
So I went with gnu-efi,
which is pretty much
just a set of headers, a static library,
a linker script, and
some basic functionality
for certain C language
features in its crt0.
As for compilers, you can use MSVC.
I am going to use the MinGW
gcc, but you cannot use clang.
If you try to use clang,
you get a nice error message
saying that freestanding COFF
executables are not supported
and you get a nice internal
compiler error afterwards.
Right, so the standard library.
So we have an SDK.
We have a compiler and now
we have a standard library
or we need a standard library.
Unfortunately we don't have one.
The EDK II ships with an implementation
of the C95 standard library,
which is pretty much
just the C89 library with bug
fixes and the ISO 646 keywords,
so and, or, not, and so on.
But we also don't have
a C++ standard library
so things like operator
new, dynamic_cast, RTTI,
everything that would
require runtime support
is just off the table.
But in practice and this
is very, very hacky,
we can still reuse things
from our host implementation
like std::array or
std::tuple or the type traits
because those are just
compile time constructs
that don't have any
dependency on the runtime.
So it's hacky, but it works
and I just went with that.
And with that we can finally
compile an executable.
So your compiler invocation line
will probably look like that.
You want to invoke your compiler.
You want to disable the red zone,
which is an optimization in the x86-64
calling convention and this
allows certain small functions
that use less than I believe
128 bytes of stack space
to not decrease the stack pointer
even though they are using stack space.
But we are on a freestanding
environment here.
We don't have an operating system
that would catch all the interrupts
and if an interrupt occurs on a system,
what happens is that the processor wants
to save its data on the
stack and if we didn't decrease
the stack pointer, we would
cause memory corruption
so we have to disable this optimization.
Next, after that we are
officially in a freestanding environment.
We have no operating system.
We are in a Microsoft universe
where wchar_t is two bytes
instead of four bytes.
We don't have standard library.
We need to change the
entry point from main
to something else because the signature
is slightly different.
And then we tell the
linker that we don't have
a win32 executable.
We have an EFI application
and we are conveying
this information by setting the subsystem
of our application to 10.
So otherwise Windows would just think
that this was a valid win32 application
and you could launch this on Windows
and it would horribly crash.
I can put this in a CMake Toolchain
and then we can finally compile the code.
So Hello World looks like that.
We include two headers, efi.h and efilib.h.
Our entry point is extern "C"
and uses the Microsoft ABI
and the very important thing
is here the system table, which contains
all the services you can
access through the EFI.
For example, ConOut to print
out strings on the terminal.
So I use a UTF-16 string literal with Hello World.
Unfortunately I also need
to cast the const away
because the interface is broken.
But yeah, this is Hello World for EFI.
And you compile this with the command line
I just showed you here.
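
A minimal sketch of such a Hello World, not the code from the slides. To keep it compilable with an ordinary host compiler, the EFI types below are small stand-ins rather than the real gnu-efi declarations, and `fake_console` and `fake_output_string` are hypothetical helpers that only exist so the call can be observed. On real firmware you would include efi.h and efilib.h and mark the entry point with the Microsoft ABI attribute.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-ins for the declarations that efi.h would normally provide.
using CHAR16 = char16_t;
using EFI_STATUS = std::uint64_t;
using EFI_HANDLE = void*;
constexpr EFI_STATUS EFI_SUCCESS = 0;

struct SIMPLE_TEXT_OUTPUT_INTERFACE {
    // The string parameter is not const, which is why a cast is needed below.
    EFI_STATUS (*OutputString)(SIMPLE_TEXT_OUTPUT_INTERFACE* self, CHAR16* str);
};

struct EFI_SYSTEM_TABLE {
    SIMPLE_TEXT_OUTPUT_INTERFACE* ConOut;
};

// Hypothetical host-side console that records what was "printed".
inline std::u16string& fake_console() {
    static std::u16string text;
    return text;
}

inline EFI_STATUS fake_output_string(SIMPLE_TEXT_OUTPUT_INTERFACE*, CHAR16* str) {
    fake_console() += str;
    return EFI_SUCCESS;
}

// On real firmware this would be declared as
//   extern "C" [[gnu::ms_abi]] EFI_STATUS efi_main(EFI_HANDLE, EFI_SYSTEM_TABLE*)
extern "C" EFI_STATUS efi_main(EFI_HANDLE, EFI_SYSTEM_TABLE* system_table) {
    assert(system_table != nullptr);
    auto* out = system_table->ConOut;
    // u"..." is a UTF-16 literal; the interface wants a non-const pointer.
    return out->OutputString(out, const_cast<CHAR16*>(u"Hello World\r\n"));
}
```
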
And then if you want to run
it, you can download OVMF
from the Internet, which is
an Open Source implementation
of UEFI and is compatible with QEMU
so you can just set it as the firmware
for your QEMU virtual machine.
You put your executable
inside of a virtual hard drive
with a FAT32 partition and
then you can hopefully launch
your Hello World application
inside of your virtual machine
or actually on real hardware if you would
like to do that instead.
So I've already shown you the ConOut thing
and there are obviously many more features
than printing something
out on the terminal in EFI
and if you would like to
have access to some service,
you usually need to query
for its existence using GUIDs.
Because, for example, if
I want to render something
on the screen, I need
to have a graphics card
installed on my system
and not every system
has a graphics card
installed on the system
so this might fail.
So we need to query it for the existence
of certain services and if that succeeds,
we get a nice struct back which contains
a bunch of function
pointers and this gives us
a nice object-oriented
style interface even in C.
- [Audience Member] Nice.
- Yeah, nice in terms of C.
We're obviously biased
here. (audience laughs)
And if you look closely
here, the signatures
are very, very interesting
because we always have a return.
We always return an
error code, EFI_STATUS.
We have some input parameters,
we have mutable parameters,
and we have some output parameters.
So Windows developers might already be familiar
with those IN, IN OUT, and OUT annotations
and in EFI you have the same.
So that's very interesting because in C,
this is probably fine and
the thing you want to do,
but in C++ we are more used
to interfaces like that
where we return all of our
actual output arguments
by value
and usually don't want
to deal with any error codes.
So what we would like
to have is a function
that takes all the
input arguments by value
or by const reference for that
matter, the mutable arguments
by reference, returns
an expected of a tuple
or my error code, and then I
have a much nicer interface.
Because the other thing
about output parameters
in C is that you need to
create your output variables
on the stack right before you
make your call to the function
and those values in there don't make sense
and we can't mark those as const
because we only write to
them in the next statement.
So it's just an error prone
way to call the functions.
So how do we improve that strategy?
I like to create a wrapper
for all of those EFI functions
and for that I came up
with a function called wrap
that takes an EFI function
or EFI function pointer,
takes three integers
as template parameters,
the number of input arguments,
the number of mutable arguments,
and the number of output arguments,
and then I like to pass
in a function pointer
and then what I get back
is a new lambda
function
with my fixed interface that is more C++.
For this to work, I
need to do a few things.
First of all, I need to get the list
of all my argument types.
Then, I need to split this up
so it has my input arguments,
the mutable arguments, and the
output arguments separately
and then I can finally
create my EFI function
and then we can add some
error handling to that.
For the first step we need
to get the argument list.
My advice is don't bother with it.
Just use Boost.CallableTraits
in your first step.
It has a really nice
type alias called args_t
where you pass in a callable.
In my case, LocateDevicePath
or the type of LocateDevicePath
and then it returns a tuple
with all the arguments.
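
To illustrate what such a trait computes, here is a hand-rolled stand-in for `args_t` that only handles plain function pointers. This is an assumption-laden sketch, not the Boost.CallableTraits implementation, and `locate_something` is a hypothetical function with an EFI-like signature.

```cpp
#include <tuple>
#include <type_traits>

// Primary template is left undefined: only function pointers are supported here.
template <typename F>
struct args;

// Function pointer: collect the parameter types into a tuple.
template <typename R, typename... Args>
struct args<R (*)(Args...)> {
    using type = std::tuple<Args...>;
};

template <typename F>
using args_t = typename args<F>::type;

// Hypothetical EFI-style function: one input, one mutable, one output argument.
using EFI_STATUS = unsigned long long;  // stand-in for the real typedef
EFI_STATUS locate_something(const int* in, long* in_out, void** out);

// The function is never called; only its type is inspected.
static_assert(std::is_same_v<args_t<decltype(&locate_something)>,
                             std::tuple<const int*, long*, void**>>);
```
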
And then the next step would
be to split up this tuple
so we have our inputs, the mutables,
and the outputs separated.
And we can do this with a
constexpr function like that.
So my idea is that I
implement a one-way split.
So given a tuple, I
want to have a new tuple
that contains two tuples
where the first tuple
has every element from zero
to N and the second tuple
contains all elements from N to the end.
And C++14 with std::make_index_sequence
makes this actually very easy to do.
I just call make_index_sequence with N
for the first tuple and then
call make_index_sequence again
from N to the end for the second tuple
and forward everything
to the st_impl function
that basically just copies
everything into my new tuple.
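
A sketch of such a split at a single index, written from the description above rather than taken from the slides. The helper name `slice` is my own choice, and elements are copied with `std::make_tuple`, so this assumes value-like element types such as integers and pointers.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Copies the elements at positions Offset + Is... of t into a new tuple.
template <std::size_t Offset, typename Tuple, std::size_t... Is>
constexpr auto slice(const Tuple& t, std::index_sequence<Is...>) {
    return std::make_tuple(std::get<Offset + Is>(t)...);
}

// Splits t into a tuple of two tuples: elements [0, N) and elements [N, end).
template <std::size_t N, typename... Ts>
constexpr auto split(const std::tuple<Ts...>& t) {
    static_assert(N <= sizeof...(Ts), "split point out of range");
    return std::make_tuple(
        slice<0>(t, std::make_index_sequence<N>{}),
        slice<N>(t, std::make_index_sequence<sizeof...(Ts) - N>{}));
}
```

Because everything is constexpr, the function can be used on values at compile time, or only inside `decltype` when just the resulting types matter.
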
Note that I actually
don't call this function.
This is all just to get the types right
and here I do my final three way split.
So here I'm just creating
a bunch of type aliases
that don't cause any runtime code
and here if I have my argument list
and I want to get the first part
of all my input arguments,
I just do my split
at the number of input
arguments and to get
the mutable arguments
and the output arguments,
I need to split the second tuple
at the number of mutable arguments.
And then I end up with three
meaningful type lists here,
In containing the input arguments,
InOut containing the mutable arguments,
and Out containing the output arguments.
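
The three aliases can be sketched as follows. This variant slices the argument tuple directly at the two boundaries instead of splitting twice, which is a simplification of mine; the names `slice_types` and `three_way_split` are hypothetical. As in the talk, nothing here is ever called, so no runtime code is generated.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Declared but never defined: it is only used inside decltype to compute
// the tuple of the element types Offset + Is... of Tuple.
template <std::size_t Offset, typename Tuple, std::size_t... Is>
auto slice_types(std::index_sequence<Is...>)
    -> std::tuple<std::tuple_element_t<Offset + Is, Tuple>...>;

template <typename Args, std::size_t NIn, std::size_t NInOut, std::size_t NOut>
struct three_way_split {
    static_assert(NIn + NInOut + NOut == std::tuple_size_v<Args>,
                  "the three counts must cover the whole argument list");

    using In = decltype(slice_types<0, Args>(std::make_index_sequence<NIn>{}));
    using InOut = decltype(slice_types<NIn, Args>(std::make_index_sequence<NInOut>{}));
    using Out = decltype(slice_types<NIn + NInOut, Args>(std::make_index_sequence<NOut>{}));
};
```
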
And with those three typedefs, I can call
another constexpr function
called make_out_param_adapter
that takes three variadic templates.
The sequence of inputs,
the sequence of mutables,
the sequence of outputs.
And it takes the EFI
function as a parameter here
and returns a new lambda
that captures our EFI function
and this lambda only takes
the input and the mutables
as an argument, but not the outputs.
The outputs are just locally
allocated on the stack here.
So I remove the pointer type
and create a tuple of my result.
Then I create a second tuple temporarily
just to get my pointers back
to my result right here.
So now I have my inputs, my mutables,
and the pointers to my outputs.
I concatenate those three, pass everything
to std::apply, and then
can call my EFI function
with my unpacked tuple.
And then I can return the
tuple of my output arguments.
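
A sketch of such an adapter, reconstructed from the description and not identical to the code on the slides. `type_list` is a hypothetical tag type used to pass the three type sequences, `div_mod` is a made-up C-style function, and the status returned by the wrapped function is simply ignored at this stage.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical tag type carrying a sequence of types.
template <typename...>
struct type_list {};

template <typename F, typename... In, typename... InOut, typename... Out>
auto make_out_param_adapter(F f, type_list<In...>, type_list<InOut...>,
                            type_list<Out...>) {
    // The returned lambda takes only the inputs and the mutable arguments.
    return [f](In... in, InOut... in_out) {
        // The outputs live on the stack; Out holds pointer types, so strip them.
        std::tuple<std::remove_pointer_t<Out>...> result;
        // Temporary tuple of pointers to each element of result.
        auto out_ptrs = std::apply(
            [](auto&... r) { return std::make_tuple(&r...); }, result);
        // Concatenate inputs, mutables and output pointers, then unpack the call.
        std::apply(f, std::tuple_cat(std::make_tuple(in..., in_out...), out_ptrs));
        return result;
    };
}

// Made-up C-style function: two inputs, two output parameters.
inline int div_mod(int a, int b, int* quotient, int* remainder) {
    assert(b != 0);
    *quotient = a / b;
    *remainder = a % b;
    return 0;
}
```

With this, `make_out_param_adapter(&div_mod, type_list<int, int>{}, type_list<>{}, type_list<int*, int*>{})` yields a callable that takes two integers and returns a `std::tuple<int, int>`.
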
So I have a question.
Who thinks this causes any overhead?
We have one hand, two hands.
Okay.
So most people were
wrong, we have overhead.
So here I have a function f
that has a single output parameter
and then I came up with a wrapper
that I would probably write in C
so it returns an integer,
allocates an r on the stack,
and returns the result.
And then the cpp function is something
that is most similar to
what we just metaprogrammed
so we allocate a tuple of int
and then return a tuple of int.
And you can see on line 11
that we have a dead store here
so we move the zero to some address.
So we have some overhead
here and the reason for this
is that the tuple causes
value initialization
of its elements, which means in our case
that the integer is initialized as zero
whereas in C this is
just an uninitialized value,
but the compiler cannot optimize that out
because we are wrapping
an opaque function pointer
and the compiler can't
determine that we actually
only write to this memory address.
We never read from it.
In C, that's easy to tell the compiler
because we just say,
"Okay, it's uninitialized
"so you just optimize it
out because it's undefined."
In C++, we set it to zero.
The compiler has to assume
that this may be read
by some other function.
It can't optimize that out.
How do we get around this?
We can just create a small wrapper
which I call uninitialized that has
an empty, explicit default constructor
and when we value
initialize this struct here,
this does only default
initialize our value.
So initialization
unfortunately happens in C++
and we can just wrap all
of our output variables
in this uninitialized
type because we are sure
that we want to have uninitialized values
and pass this to our function.
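
A sketch of what such a wrapper type might look like. The key assumption is that the default constructor is user-provided and empty, so value-initializing the struct just runs that constructor and leaves `value` untouched. `produce` and `call_with_uninitialized_output` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

template <typename T>
struct uninitialized {
    // User-provided and deliberately empty: value-initializing this struct
    // calls this constructor, which leaves `value` default-initialized
    // (indeterminate for fundamental types) instead of zeroing it.
    uninitialized() {}
    T value;
};

// The constructor is user-provided, so it is not trivial.
static_assert(!std::is_trivially_default_constructible_v<uninitialized<int>>);

// Hypothetical function with a single output parameter.
inline void produce(int* out) {
    assert(out != nullptr);
    *out = 42;
}

inline int call_with_uninitialized_output() {
    // The tuple value-initializes its element, but that no longer implies a
    // store of zero. Reading `value` before produce() writes it would be
    // undefined behaviour.
    std::tuple<uninitialized<int>> out;
    produce(&std::get<0>(out).value);
    return std::get<0>(out).value;
}
```
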
So minor change and this
instruction goes away, yeah?
- [Audience Member] If
you had used equals default
instead of empty braces,
would you have gotten
the same results?
- I believe the overhead
still doesn't go away
because this still causes a
value to be default initialized,
but initialization...
Okay, Chris says no.
I believe him.
- Okay.
That's good, I know
you're pressed for time.
- Yeah.
Okay.
And now what we still need to
do is wrap the error codes.
So as I said before, we don't
have access to exceptions
and even then exceptions
cause some overhead
so I don't really want to use them.
Instead I want to use an expected type.
So if you haven't seen
Simon Brand's infamous talk
or Andrei's talk from
earlier this conference,
an expected type is something
like a special variant
that either contains a result
value or an error value.
In my case, I'm using Simon
Brand's implementation
tl::expected and now we just
need to extend our wrapper
with a small if constexpr and we check
if the return type of our
function is EFI_STATUS
so it can return an error code.
If it can return an error
code, we call our function,
check the error code, check
if it's not EFI_SUCCESS.
If it's not EFI_SUCCESS which
means there is some error,
we return only the error
code as an unexpected value.
Otherwise, right, we have a success.
We just return our tuple
inside of the expected.
Then there's also this
case where EFI functions
don't have an error code that they return.
They just return void.
In this case, we can just
return our tuple directly
without wrapping it in an expected.
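
The branching on the return type can be sketched like this. To stay self-contained, the code uses a tiny stand-in `expected` built on `std::variant` instead of tl::expected, and a simplified helper `call` where the output types are given explicitly. All names here, including `find_answer` and `fill_version`, are hypothetical, and the `EFI_NOT_FOUND` value assumes a 64-bit status type.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>

using EFI_STATUS = unsigned long long;  // stand-in for the real typedef
constexpr EFI_STATUS EFI_SUCCESS = 0;
constexpr EFI_STATUS EFI_NOT_FOUND = (1ull << 63) | 14;  // assumes 64-bit status

template <typename E>
struct unexpected { E error; };

// Minimal stand-in for tl::expected<T, E>: holds either a value or an error.
template <typename T, typename E>
class expected {
public:
    expected(T value) : storage_(std::in_place_index<0>, std::move(value)) {}
    expected(unexpected<E> e) : storage_(std::in_place_index<1>, std::move(e.error)) {}
    bool has_value() const { return storage_.index() == 0; }
    const T& value() const { assert(has_value()); return std::get<0>(storage_); }
    const E& error() const { assert(!has_value()); return std::get<1>(storage_); }
private:
    std::variant<T, E> storage_;
};

// Calls f(in..., &out...) and returns the outputs by value.
template <typename... Out, typename F, typename... In>
auto call(F f, In... in) {
    std::tuple<Out...> out;
    using R = decltype(f(in..., static_cast<Out*>(nullptr)...));
    auto invoke = [&] {
        return std::apply([&](Out&... o) { return f(in..., &o...); }, out);
    };
    if constexpr (std::is_same_v<R, EFI_STATUS>) {
        using result = expected<std::tuple<Out...>, EFI_STATUS>;
        const EFI_STATUS status = invoke();
        if (status != EFI_SUCCESS) return result(unexpected<EFI_STATUS>{status});
        return result(out);
    } else {
        invoke();    // no status to check
        return out;  // plain tuple, not wrapped in an expected
    }
}

// Hypothetical functions: one can fail, one returns void.
inline EFI_STATUS find_answer(bool available, int* answer) {
    if (!available) return EFI_NOT_FOUND;
    *answer = 42;
    return EFI_SUCCESS;
}
inline void fill_version(int* major) { *major = 2; }
```
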
So question number two, overhead.
Who thinks this causes overhead?
No one.
Okay.
- [Audience Member] We're
unwilling to guess at this point.
(audience laughs)
Uh-oh.
- Oh.
Okay, nevermind. (audience laughs)
Um...
Okay, so I can't show you that,
but what I essentially did
was I put my entire wrapper
inside of Compiler Explorer.
I wrote the same wrapper
basically in C and C++
so I just call my EFI function and in C
I create my output variables on the stack.
In C++, I just call my
wrapper and then I check
my error code.
So in C if something failed,
I print out an error message.
If it succeeded, I print another message
and do the same in C++ and the result
was that the generated assembly
code is indeed different,
but only because the compiler
inverted the jump instruction.
So unless you want to take
something like branch prediction
into account, it doesn't
really cause any overhead.
Right, so this was the
metaprogramming part,
but I still want to make
some things very clear.
I made a lot of simplifications
and assumptions.
So first of all, I only
have a 30 minute slot.
I can't cover everything.
Secondly, I'm wrapping C and not C++.
I made the assumption
that we are only wrapping
fundamental types and PODs.
I believe the compiler
has a much harder time
optimizing things if
things are not trivially
copy constructible for example.
Also, the trick with the callable traits
only works because we
don't have any overloads.
I'm only wrapping C and C
doesn't have any overloads.
But you may want to add
overloads by yourself
with a future std::overload
when you're writing
a high-level wrapper, using
my wrapper for example.
Okay, so I've shown you a
bunch of metaprogramming,
but I still haven't shown you how to write
an actual application
that is not Hello World.
So we could write our own kernel now
or we could write our own boot loader now.
You just call ExitBootServices
and the machine is your own,
but I chose to render a couple of things
on the framebuffer instead
and because of time
I don't think I have
the time to show that,
but I can just show you
later after the talk.
But yeah.
First of all, I need to create an instance
of the graphics output protocol.
So as I said before, we are not sure
if our hardware actually
supports graphics.
So first of all, we need
to query for the existence
of some graphics adaptor and we do this
by first of all creating two wrappers.
So we need two functions,
locate_handle_buffer
and handle_protocol.
And with those two
functions, we can finally
create our instance.
So I call locate_handle_buffer
and if that succeeds,
I call handle_protocol
and if that succeeds,
I cast the result of handle_protocol
to EFI_GRAPHICS_OUTPUT_PROTOCOL.
And the nice thing is if anything fails,
I just print out fail
and I'm done with it.
And the nice thing is even
though we didn't have exceptions,
I could fit my error
handling on my slides
because I used expected.
Okay, so with our graphics
output protocol instance
we can create a framebuffer
by just iterating
over the available modes.
So different resolutions,
different color depths, and so on.
In my code I just chose to
choose a very specific resolution
that is pretty much
available in every system.
If it doesn't work, well tough luck.
I just exit.
And yeah, we created our framebuffer.
We can actually draw to the screen now.
So in my case, I also wanted
to have some double buffering
so I emulated it by just stack
allocating enough space
and then implementing two functions,
swap_to_screen and clear.
I'm also using std::fill and std::copy
because those don't have any
runtime dependencies either.
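
A sketch of what such an emulated back buffer might look like. The pixel format, the tiny resolution, and the `back_buffer` type are assumptions for illustration; in the real application the resolution and the framebuffer address come from the selected graphics mode.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed 32-bit pixels and a small fixed resolution for illustration.
using pixel = std::uint32_t;
constexpr std::size_t width = 64;
constexpr std::size_t height = 48;

struct back_buffer {
    // Fixed-size storage, so an instance can live on the stack without a heap.
    std::array<pixel, width * height> pixels{};

    // Fills the whole back buffer with a single color.
    void clear(pixel color) {
        std::fill(pixels.begin(), pixels.end(), color);
    }

    // Copies the finished frame into the visible framebuffer.
    void swap_to_screen(pixel* framebuffer) const {
        assert(framebuffer != nullptr);
        std::copy(pixels.begin(), pixels.end(), framebuffer);
    }
};
```
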
And with that I can render
things on the screen
and unfortunately I still
don't have access to the heap
because I was mostly
just too lazy to do it.
I could probably just reimplement malloc,
but I can't just open, say,
a glTF file from the hard drive,
do some mesh processing in memory,
and then render everything on my screen.
I only have my stack,
but implicit surfaces
give me an escape hatch.
Implicit surfaces give me
a functional representation
of my scene and with that,
everything is just a function,
everything is stack allocated, and I
can just ray trace that scene.
Unfortunately no time for details here.
I can show you the demo after the talk.
But yeah, this was
basically the talk on EFI
so I want to cover a few more things.
Everything that I've done
here was technically incorrect
and not conformant.
Partially because I made a bunch of hacks,
but also because of the standard.
Even if we have a standard
library available for EFI,
the standard says about
freestanding environments
that the available subset
of functions available
is pretty much just the subset
of the C standard library.
Now right now, yes?
- [Audience Member] So
that's not exactly true?
- Yeah.
- The way the standard
specifies the freestanding
mode is at least those headers.
So your implementation
is absolutely allowed
to provide more than the bare minimum
that the standard specifies.
- Okay, so to paraphrase, he pointed out
that an implementation
is allowed to provide
more features from the
C++ standard library.
The standard just mandates
that it has to provide
at least more or less the C subset.
Was that more or less correct?
- Yeah.
- Yeah, okay.
So right now to fix that,
there is the SG14 meeting
happening right now and one
of the papers on the agenda
is Ben Craig's proposal for
freestanding environments
that tries to mandate more
classes to be available
in a freestanding environment.
Because when I'm writing this tuple,
why am I not allowed to use the tuple
when I want to make sure that it works
in every freestanding environment?
And there's another interesting
paper going on right now,
the Zero-overhead deterministic
exceptions by Herb Sutter
and those basically reintroduce
something similar
to the exception specifications
and this will probably
allow us to write
try and catch statements,
but everything will just
map down to a language
built-in expected type so to speak.
So maybe in the future
we will still be able
to throw and catch even if you don't have
access to the heap or any
other language runtime features.
So yeah, those are my
references so you can check out
my code here on my GitHub
at mmha/efiraytracer.
If you want to learn more
about UEFI in general,
I definitely can recommend checking out
the OSDev Wiki on UEFI.
There is the Freestanding Proposal.
There is the Implementation
of Expected by Simon
and yeah, that's it.
Thank you. (audience applauds)
Yeah, so to my surprise we have
one and a half minutes of questions.
Yes, Jason?
- [Jason] At the very
beginning of your EFI main,
it looked like you had a C++ attribute
in your extern "C" EFI
main and I was just curious
if that was...
- So the question was about the...
Right, this gnu::ms_abi thing?
- [Jason] Yes.
- So the thing about GCC
is that most attributes
are also available in C++ attribute notation.
So you can also write __attribute__ in GCC,
but I think that's ugly and I just chose
the C++11 notation.
- [Jason] So your compiler
was fine with that?
- Yep.
- Okay.
- Yes?
- [Audience Member] So
one thing you can do
if you use clang for it
is to build an ELF file
and objcopy it. (audience laughs)
It actually works and
you don't actually have
to worry about dealing with
Microsoft performance limits.
- [Audience Member] You
have to manually relocate
the symbols if you do that.
I know because I just did that.
- [Audience Member]
objcopy works if you--
- So the comment was
that I can also use clang
by using objcopy?
That's even more hacky, I won't do that.
- [Audience Member] That's literally what
that program was created for.
- Sure.
Okay, yeah?
So thank you very much.
