- Web and C++ are not an iconic couple.
My name is Lukas.
I work as a Full-Stack Software Engineer at PPRO.
Today we try to understand how C++ fits
into a modern web service.
How come C++ never found widespread adoption in the networking segment? It's a market dominated by C; both Node.js and Python essentially implement their networking stacks in C.
Network applications being
prone to malicious input
makes this particularly ironic,
given C's security track record.
The vast majority of today's security breaches boil down to someone blindly calling memcpy.
The days when such a thing was purely a stylistic faux pas are truly gone.
Today these kinds of bugs
have an ever increasing impact
on real people's lives.
We all know modern C++ tries to
address many of those
historic shortcomings.
Surely this translates into applications.
If you have questions, just raise your arm.
First we try to see where a C++ application makes sense. Then a quick look at the client side of a web application. And finally we'll compare implementing an example server using popular C++ libraries. I'll assume that you have some knowledge about networking applications, and of course this talk is no definitive answer. It's merely subjective observations.
If all you want to do is serve static content, those five lines of Python will most likely be enough: import the HTTP server from the standard library, create a socket server listening on a port, and start the event loop.
Hosting something like
a blog does not need to
be complicated.
In this talk, I want to focus on web services. Quoting the W3C, a web service is a software system designed to support interoperable machine-to-machine interaction over a network.
What makes web services so popular?
The service is just one click away.
Basically everyone can use it.
The tools to interact with
the application are installed
and widely used on the vast majority of
today's consumer platforms.
That leads us into web development.
I want to take some time to talk about the advantages and challenges of modern web development, coming from a C++ background.
Let's talk about advantages.
A feature-rich ecosystem with NPM as its backbone, and a wide variety of tools. Coming from the assemble-it-yourself world that is C++ tooling, the feeling of just writing the word debugger in your code, followed by a sane and intuitive debugging experience inside your browser, is a welcome breath of fresh air after years of fighting tools to achieve seemingly simple goals. And all that without having to install anything special. Frameworks: React, Angular, you name it. There are so many that even sites like TodoMVC exist.
Okay challenges.
Given a system that was never intended to be used in today's manner, and even less at such scale, no matter what you do, at the end of the day your toolchain will be spitting out JavaScript. That leads us to a lack of interoperability with other languages and tools. However, that might be a completely different story once WebAssembly gets DOM bindings.
The never ending story of browser support.
This brings us to the development cycle.
After editing your source files, you compile to JavaScript. Wait a second, okay: you compile to JavaScript of a target ECMAScript version, and to plain CSS. The ES version usually dictates browser support.
However, even older browsers
can support modern features
with the help of polyfills.
Polyfills try to emulate missing language
or library features.
Usually they first check
if there's native support
and then use the appropriate version.
With JavaScript you have the unique problem that you need to send source code over a resource-constrained channel. So there's a notable advantage to shorter variable names and no whitespace.
Of course we don't want
to write source code
where all variables have
single character names.
Okay, now comes the step where we use a tool to uglify our code. The name says it all: it makes our code more ugly. A popular choice is UglifyJS, a parser with the goal of producing the shortest possible JavaScript while retaining the same logic.
On their GitHub page, they have an example where the original file is 450 kilobytes and turns into 220 kilobytes with mangling enabled. Gzipped, the original is 109 kilobytes and the minified version only takes up 73 kilobytes.
And finally we bundle the produced files.
This comes down to application preference.
Some people like single page applications.
They bundle all their
logic into a single file
often called main.js.
With this approach, we can quickly interact with the application, as navigating to a different section does not require fetching new source code. However, this quickly leads to bloated websites with a horrendous first-load experience.
The average website these days sends you more bytes of source code than you'd need for the entire Doom binary.
Another approach is to
split up your application
into a set of sections
and bundle only the code
required for each section.
That can help to drastically reduce
the average bandwidth usage.
To give you a practical
showcase of the technology,
we'll be talking about let us create
a small example service.
I want to focus on two major requirements.
Persistence: user interactions will have lasting changes. Otherwise we could just ship a client-only solution, which would be much easier, and we'd save server costs. Bi-directional communication: we want to keep the user up to date without requiring them to proactively request new information.
Okay, now comes the moment where you have to make a leap of faith and trust me enough to visit this non-HTTPS address.
Anyone feels like doing it?
Also go there so that we
can take a quick look.
There we go.
As you can see it's a
small straw poll service,
and to stay impartial,
I'll vote for Turnip,
but I'll... Okay that's unlucky.
I'm not sure what happened there.
Probably let me reload
it, and see what happens.
Now can some of you go there
and try to vote please?
You can see it should
update automatically.
Okay there we go.
(participant talking off mic)
That's okay.
There we go.
Okay maybe one more vote
and then we'll continue.
Okay so I think you get the idea
people can vote and we
directly see what happens.
We don't have to, like... this site does not fetch new information. It's the server sending information and the site reacting to it.
Let's go back to the foundations.
Okay.
Persistence is easy as
soon as you have a server.
Bi-directional communication
is a little trickier.
Who of you has heard
about WebSockets before?
Okay so like half the audience, nice.
And how many of you have used
them in applications before?
Okay so that's like 10% or so now.
For those new to WebSockets, they are a thin wrapper around TCP, introduced alongside HTML5.
Of course we don't want to
reinvent the wheel ourselves.
We'll use a WebSocket library.
Rather than using one library,
I want to compare three
popular open source choices.
Beast, uWebSockets, and IncludeOS.
We'll compare their interfaces, the code needed for our example service, attack vectors, and of course performance.
The three main requirements
are a request handler.
Each user request should
trigger a call to our function.
We want to send a message
back to the client WebSocket
and finally we want to broadcast a message
to all open WebSocket connections.
Let us start with a batteries-included library. uWebSockets directly provides support for our three requirements.
The setup is easy and straightforward.
We construct a hub,
attach a request handler,
bind to a port and start the event loop.
Each time a user sends a message, our request handler is called with the WebSocket itself, a pointer to the message buffer plus its size, and the message opcode, usually text or binary.
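To make that concrete, here is a minimal sketch following the older uWS 0.14-style API; an echo handler stands in for our real request handler, and exact headers and signatures may differ between versions.

```cpp
#include <uWS/uWS.h>

int main() {
    uWS::Hub h;  // the hub owns the event loop and the default group

    // Called for every client message: the WebSocket itself, a pointer to
    // the message buffer plus its size, and the opcode (text or binary).
    h.onMessage([](uWS::WebSocket<uWS::SERVER> *ws, char *message,
                   size_t length, uWS::OpCode opCode) {
        ws->send(message, length, opCode);  // echo back, just for illustration
    });

    if (h.listen(3000))  // bind to a port
        h.run();         // start the event loop
}
```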
Before we continue we have
to talk about serialization.
Both the client and
server need some kind of
shared language.
Of course we could use JSON.
That would lend itself very
nicely in the front end.
JavaScript objects are
basically JSON objects.
However, JSON in C++ requires a library, and we lack validation, not to mention performance.
That's why I decided to use FlatBuffers.
FlatBuffers is a schema-based serialization library, and what makes FlatBuffers great is the virtual lack of deserialization and serialization. Instead of deserializing the message and constructing an in-memory representation, we jump by offset to the place we care about and just read the memory. Compared to JSON, it can be three orders of magnitude faster and less memory hungry.
FlatBuffers are usually used
with a schema representation.
A schema allows for more
efficient message encoding
and also validation.
All this makes for easy out-of-the-box message validation, as you can see here. However, this does not validate application logic. It can tell us whether a vote is present or not; it doesn't know whether a vote is in bounds or whether the client has voted already.
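A minimal sketch of that validation in C++, assuming a schema whose root type is a table named Request, so that flatc generates GetRequest and VerifyRequestBuffer; the names depend on your schema.

```cpp
#include "request_generated.h"        // hypothetical header generated by flatc
#include "flatbuffers/flatbuffers.h"

// Returns the request if the buffer passes the structural checks, else null.
const Request* parse_request(const char* data, size_t size) {
    flatbuffers::Verifier verifier(
        reinterpret_cast<const uint8_t*>(data), size);
    if (!VerifyRequestBuffer(verifier))  // out-of-the-box structural validation
        return nullptr;
    // No deserialization step: we just read the memory in place.
    return GetRequest(data);
}
```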
Here you see the entire schema for our user requests. FlatBuffers supports PODs, structs, enums, unions, and tables. In this case, all we need is a type enum representing the request type and, optionally, a vote.
Back to uWebSockets. In order to send a message, we need a WebSocket, the message pointer and size, here wrapped in a FlatBuffer ref, and the opcode. This call is synchronous; that means the function will return once the message was sent.
Sending a message to every
open WebSocket connection is
very similar.
Instead of calling send on a WebSocket object, we need the hub.
GetDefaultGroup returns a group object
representing all open connections.
Conveniently this group
object provides a method
to broadcast messages.
The call to broadcast and send
share the same signature.
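In sketch form, again in the older uWS 0.14 style, with the buffer arguments as placeholders:

```cpp
#include <uWS/uWS.h>

// Send a serialized message to one client, then broadcast it to everyone.
// 'hub' is the uWS::Hub from the setup sketch above.
void respond_and_broadcast(uWS::Hub &hub, uWS::WebSocket<uWS::SERVER> *ws,
                           const char *buffer, size_t size) {
    // Synchronous send to a single client.
    ws->send(buffer, size, uWS::OpCode::BINARY);

    // getDefaultGroup() returns the group of all open connections;
    // broadcast shares send's signature.
    hub.getDefaultGroup<uWS::SERVER>().broadcast(buffer, size,
                                                 uWS::OpCode::BINARY);
}
```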
Finally to tie everything together,
the request switch.
After validation we switch
based on the request type.
This works given that
request type is an ENUM.
To keep things simpler,
we'll not talk about
the poll data object for now.
Just note it takes care
of creating and managing
our FlatBuffer objects.
In the case of a poll request, we send back our constant poll object. In the case of a vote request, it should come with a vote. We try to register the vote and give it two callbacks: the first should be triggered if the vote isn't valid, the second if the vote was registered successfully. If the vote was registered successfully, we want to broadcast the updated vote array. Otherwise we send an error stating unrecognized request.
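A condensed sketch of that switch; the generated enum values, the PollData helpers, and their names are my own placeholders for the talk's code, which isn't shown here in full.

```cpp
// Hypothetical request dispatch, following the structure described above.
void dispatch_request(uWS::WebSocket<uWS::SERVER>* ws, const Request* request,
                      uWS::Hub& hub, PollData& poll_data) {
    switch (request->type()) {            // works because the type is an enum
    case RequestType_Poll:
        // Constant, pre-serialized poll object.
        ws->send(poll_data.poll_buffer(), poll_data.poll_size(),
                 uWS::OpCode::BINARY);
        break;
    case RequestType_Vote:
        poll_data.registerVote(
            request->vote(),
            [&] {  // first callback: the vote isn't valid
                ws->send(poll_data.error_buffer(), poll_data.error_size(),
                         uWS::OpCode::BINARY);
            },
            [&] {  // second callback: vote registered, broadcast the update
                hub.getDefaultGroup<uWS::SERVER>().broadcast(
                    poll_data.result_buffer(), poll_data.result_size(),
                    uWS::OpCode::BINARY);
            });
        break;
    default:
        ws->send(poll_data.error_buffer(), poll_data.error_size(),
                 uWS::OpCode::BINARY);  // "unrecognized request"
    }
}
```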
Of course we need a client.
In our case a static web application.
The two requirements we have for our client are WebSocket support, which is technically not part of any ECMAScript version yet has been implemented in nearly every browser by now, giving us 93% of global web browser users, and support for our choice of serialization language.
FlatBuffers currently ships with bindings
for eight languages including JavaScript.
Okay, again we will
take a look at a setup.
First we create a new WebSocket.
Set its binary type to array buffer
and then add our event listeners.
A WebSocket connection usually has three event types. Open: the event triggered once the HTTP request, upgrade, and handshake were successful. Message: the event triggered each time the server sends a message. And close: the event triggered by either the server sending a close frame or a failed ping-pong request.
Again we switch based on an enum, this time on the response type. There are three response types: poll, result, and error. Unless there's an error, we just update the relevant object up there.
Okay, join me on a short expedition to the land of JS wut. Let's take a look at the highly complex task of sorting an array containing numbers, okay. What does options contain now? Anyone willing to guess? One, two, ten, twenty-one? Okay. No, it turns out it's one, ten, two, and twenty-one, okay? Yup, obviously sort reads the numbers as strings and compares them based on their Unicode code points.
Okay, let's give it a lambda. We want to sort it right. This looks very similar to C++: we just tell it, okay, please sort based on those rules. Okay, so what does it look like now? No, it's either of those two, depending on your browser implementation. The standard sort expects the comparison function to return either a positive or a negative number, indicating larger or smaller. So what we need to do is subtract the numbers from each other. That'll do the trick.
Yes, okay, of course this has historical reasons. If the people that are in charge of today's standards got to design this again, I'm pretty sure they wouldn't do it this way, but okay. You thought C++ implicit conversions are bad? Yeah, okay, so what does this valid piece of JS evaluate to?
Okay (mumbles) okay.
(audience laughing)
Forty-two? No.
(participant talking off mic)
Well, let's find out, right? Okay, let's copy it into the browser so we can open the console. The string... it's fine, oh, it's not really. Okay. It turns out you can express the entirety of JavaScript with just six symbols.
Okay back to the occasionally
less insane land of C++.
Let us take a look at
our second high level
WebSocket library.
However IncludeOS is much
more than a WebSocket library.
Who went to Alfred's
talk earlier this week?
(mumbles)
For those unfamiliar with IncludeOS,
it's a young unikernel project.
It allows you to create applications
that are compiled into bootable
special-purpose operating systems.
All the relevant pieces of the OS are statically linked into your application. The result can be a bootable image with a networking stack in the area of a megabyte in size that boots in milliseconds. And as to why someone would want this: later.
First we include the relevant OS components, retrieve a reference to the OS-internal network stack, attach our request handler, bind to our port, and again start the event loop.
This time we have to trigger the HTTP upgrade ourselves. We try to upgrade if the request is both a GET and comes from a WebSocket address. I'm not entirely sure this is 100% RFC compliant, I just saw it in their example, so it might be that there are some edge cases where this is wrong.
Our request handler will only be triggered when a connection is established, so we still need an event handler for the client messages.
This is what a simple Echo
server could look like.
For those of you thinking they're having a déjà vu: yes, sending looks very similar to the uWebSockets interface. Again we pass a pointer, size, and opcode.
IncludeOS does not have
native broadcast support.
We learn how to implement
that once we look at
the Beast example.
Part of the upcoming Boost versions, Beast is by far the lowest-level library of the three. Setting up a Beast server seems harmless enough: a couple of includes, creating an io_service, subscribing our events, and starting the event loop.
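A minimal sketch of that setup, shaped after the Beast async-server example; the listener class is our own and is sketched a little later, and names may differ between Boost versions.

```cpp
#include <boost/asio.hpp>
#include <memory>

int main() {
    boost::asio::io_service ios;                       // the event loop

    // Our own listener (sketched further down) subscribes the first
    // async_accept; everything after that is continuation handlers.
    auto const address = boost::asio::ip::address::from_string("0.0.0.0");
    std::make_shared<listener>(
        ios, boost::asio::ip::tcp::endpoint{address, 8080})->run();

    ios.run();                                         // start the event loop
}
```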
Let's look at listener implemented by us.
Before we do that, let's first revisit event loops. To recap the event loop model, we'll take a quick look at a synchronous Beast example and our top-level loop. Each time around we create a new TCP socket; accept should block until we get a connection. Once we have a new session, we spawn a worker thread and transfer ownership of the socket.
And you can see it's an endless loop.
First we construct the WebSocket stream
by moving in the socket.
Then we block until the
handshake was resolved.
Again we go into an endless loop.
The read function should block until
we receive a client message.
Here we just echo back the message.
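Condensed, the synchronous version looks roughly like this; a sketch shaped after the Beast synchronous-server example, with error handling omitted. The read throwing on a closed connection is what eventually breaks the inner loop.

```cpp
#include <boost/asio.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/websocket.hpp>
#include <thread>
#include <utility>

int main() {
    namespace websocket = boost::beast::websocket;
    using tcp = boost::asio::ip::tcp;

    boost::asio::io_service ios;
    tcp::acceptor acceptor{ios, {tcp::v4(), 8080}};
    for (;;) {                                  // top-level accept loop
        tcp::socket socket{ios};
        acceptor.accept(socket);                // blocks until a connection
        std::thread{[sock = std::move(socket)]() mutable {
            websocket::stream<tcp::socket> ws{std::move(sock)};
            ws.accept();                        // blocks until the handshake
            for (;;) {                          // per-session loop
                boost::beast::multi_buffer buffer;
                ws.read(buffer);                // blocks until a message
                ws.write(buffer.data());        // echo it back
            }
        }}.detach();
    }
}
```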
Okay, of course we want to take advantage of Asio's async interface.
For those of you unfamiliar with the concept of async IO, the core idea is, rather than blocking until an IO event is done, to start the operation and later query the result state. Usually this means that synchronization bubbles up to some kind of event loop. We need some kind of loop that can be suspended and resumed at arbitrary points. No for or while loop is going to cut it.
What can a loop be reduced down to?
- [Participant] Switch?
- Switch, more low-level, more low-level.
(participant talking off mic)
Yes, go-to's, there we go.
Okay, the only problem: as children we've all been told the story of the big bad goto. So maybe there's a better way of doing this.
Using a cyclical call graph, where bar calls foo and foo calls bar, we get the advantage that we can suspend our loop at arbitrary points and give control back to the parent, which can then resume it at a later time.
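As a conceptual sketch of that idea (not Beast code): two functions that schedule each other through an executor-like queue, so control returns to the caller between iterations.

```cpp
#include <cstdio>
#include <functional>
#include <queue>

std::queue<std::function<void()>> executor;   // stand-in for the event loop

void bar(int i);

void foo(int i) {
    if (i == 0) return;                       // the exit condition
    std::printf("iteration %d\n", i);
    executor.push([i] { bar(i); });           // suspend here, resume later
}

void bar(int i) {
    executor.push([i] { foo(i - 1); });       // closes the cycle: bar -> foo
}

int main() {
    foo(3);                                   // kick-start the "loop"
    while (!executor.empty()) {               // the parent resumes it
        auto task = std::move(executor.front());
        executor.pop();
        task();
    }
}
```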
With that, let's take a look at the actual Beast example. Of course, to keep things manageable, there's a lot of code missing.
Constructing the listener starts the event loop. async_accept takes the socket reference to move into and the continuation handler; the strand is an executor. Okay, this creates a new session. Note that on_accept always calls do_accept, which in turn calls on_accept. Anyone see a loop? Okay.
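A rough sketch of what such a listener can look like, modeled on the Beast async example; the session class it spawns is sketched further below, and the names are my own.

```cpp
#include <boost/asio.hpp>
#include <memory>

// Accepts connections forever via mutual recursion between do_accept and
// on_accept; 'session' is the per-connection class sketched further below.
class listener : public std::enable_shared_from_this<listener> {
    boost::asio::ip::tcp::acceptor acceptor_;
    boost::asio::ip::tcp::socket socket_;
    boost::asio::io_service::strand strand_;     // the executor
public:
    listener(boost::asio::io_service& ios,
             boost::asio::ip::tcp::endpoint endpoint)
        : acceptor_{ios, endpoint}, socket_{ios}, strand_{ios} {}

    void run() { do_accept(); }

    void do_accept() {
        // Subscribe the accept; on_accept is the continuation handler.
        acceptor_.async_accept(socket_, strand_.wrap(
            [self = shared_from_this()](boost::system::error_code ec) {
                self->on_accept(ec);
            }));
    }

    void on_accept(boost::system::error_code ec) {
        if (!ec)
            std::make_shared<session>(std::move(socket_))->run();
        do_accept();                             // closes the accept loop
    }
};
```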
Okay, so now we have something that responds to all the incoming connections, but we still need a session loop. That's what we did earlier by spawning a thread and telling it to loop endlessly.
So this is the entry point for our session
and it should look familiar.
It's basically the same as
the listener constructor.
It kick-starts the session event loop.
The session event loop is more elaborate, so I split it up into three core components, this being the start. Here is the reading part of our event loop: after clearing our target buffer, we subscribe a read operation. Once the read is completed, our handler should be called, in this case on_read.
Note that for async operations, Beast and Asio use the much faster error-code pattern instead of exceptions, because in the synchronous example there was an endless loop, and how did we escape that? Of course, exceptions.
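The reading half could look roughly like this; these are member functions of the hypothetical session class, where ws_ is the websocket::stream, buffer_ a multi_buffer, and strand_ the executor.

```cpp
// Part of the hypothetical session class from above.
void session::do_read() {
    buffer_.consume(buffer_.size());           // clear the target buffer
    ws_.async_read(buffer_, strand_.wrap(      // subscribe the read operation
        [self = shared_from_this()](boost::system::error_code ec,
                                    std::size_t bytes_read) {
            self->on_read(ec, bytes_read);     // the continuation handler
        }));
}

void session::on_read(boost::system::error_code ec, std::size_t) {
    if (ec == boost::beast::websocket::error::closed)
        return;                                // the session loop ends here
    do_write();                                // continue the event chain
}
```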
Writing looks very similar.
One thing to note, Beast expects to write
a range of buffers.
However, they will not be represented
as individual messages.
The purpose is to avoid
copying buffers if possible.
Okay, all of this might seem a little overwhelming. Let's recap the session event chain: on_accept calls do_read, which in turn calls on_read, which calls do_write, which calls on_write, and that calls do_read again. And as you can see, this closes the loop.
Yeah.
(participant talking off mic)
Okay, so the question was: how am I not just growing the stack? Well, you saw the strand.wrap call where you pass in a function. That's basically how you express the continuation, and the strand is an executor, so it's not a linear call stack that at some point leads to stack exhaustion. There's this element in there, the executor, that takes care of managing who should call what and when. At the end it bubbles up to the io_service event loop. I go down this path, at some point our handler somewhere in there returns; the executor knows how to continue, but it's up to someone else to call that function again.
Okay earlier, I mentioned
manual broadcast.
We store all our sessions
in an unordered map.
That way we can quickly
add and remove sessions.
A broadcast queues a message
for all open sessions.
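A minimal sketch, with the session registry and the add_message helper as my own stand-ins for the talk's code.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical registry owned by the listener: session id -> session.
std::unordered_map<std::uint64_t, std::shared_ptr<session>> sessions_;

// Queue a message for every open session; each session flushes it from
// within its own event chain (add_message queues and then flushes).
void broadcast(boost::asio::const_buffer message) {
    for (auto& entry : sessions_)
        entry.second->add_message(message);
}
```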
Of course, now that we add messages from the outside and sending is async, we have to take care of the case where we want to send a message but there's already a write in progress. In our application, add_message is only called by broadcast, so we can avoid queuing the message if the client hasn't voted yet or if there is already a write in progress: we assume that every write after a vote is a result response, which should refer to the most up-to-date result.
flush_message_queue is called both externally and internally, and it replaces do_write. Conceptually it starts a write event chain that should deplete our message queue. If this event chain is already in progress, flushing is a no-op. Notice that as soon as we queue a write operation with async_write, we set our internal state to write-in-progress.
Let's look at the after-write callback triggered by the write operation. First it releases our conceptual lock, then calls on_write, which, depending on the queue size, calls either flush_message_queue again or do_read.
It's a similar story for reading. I abbreviated it in the read call, but conceptually after-read is the same as after-write.
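Sketched with placeholder member names: queue_ holds views into long-lived message buffers and writing_ is the conceptual lock.

```cpp
// Hypothetical members: std::deque<boost::asio::const_buffer> queue_;
// bool writing_ = false;
void session::flush_message_queue() {
    if (writing_ || queue_.empty())
        return;                                // already in progress: no-op
    writing_ = true;                           // mark write in progress
    ws_.async_write(queue_.front(), strand_.wrap(
        [self = shared_from_this()](boost::system::error_code ec,
                                    std::size_t bytes) {
            self->after_write(ec, bytes);
        }));
}

void session::after_write(boost::system::error_code ec, std::size_t bytes) {
    writing_ = false;                          // release the conceptual lock
    queue_.pop_front();
    on_write(ec, bytes);  // flushes again if the queue is non-empty,
                          // otherwise falls back to do_read()
}
```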
For this model to work it's imperative
that we avoid fragmenting our event loop.
For each session there should only ever be
one chain of events.
The whole chain forms one big loop
and at the same time, some of its segments
may temporarily form a
smaller loop with each other.
However, given enough
time it should always
unfold to our overall session event loop.
Ideally, having built an infinite loop, there's a way to exit it. A completed read can set the error code to websocket::error::closed. In this case we tell the listener to erase the session with our associated ID, essentially destructing ourselves.
Things like the error responses or the poll response are created once. Given that our poll data object is owned by the listener, pointers to those objects are guaranteed a lifetime that is the same as the lifetime of the server.
Using a synchronous interface averts many pitfalls: before sending an object we serialize it, then send, and the send call blocks until the send has happened; afterwards we can destruct our object right after the write. With async interfaces this gets trickier: we have to guarantee the lifetime until all write operations are done.
Okay shared lifetime,
many would now suggest
using a shared pointer.
Yes that would work.
I believe we can do better.
Let's see how we can
avoid heap allocation.
The only dynamic response
in our application
is the result response.
New votes update the vote array.
However we know that
serializing this object
will always create an
object of the same size.
By copying the new object into the previous location, we avoid invalidating any of our pointers, we avoid heap allocation, and as a result a write will always write the most up-to-date result.
That way we can reduce
the amount of messages
we need to send, saving both
bandwidth and local resources.
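A small sketch of that trick, under the stated assumption that the serialized result always has exactly the same size.

```cpp
#include <cstring>
#include <vector>

// Hypothetical storage for the one dynamic response. The buffer is
// allocated once; pending async writes point into it.
std::vector<char> result_buffer_;

void update_result(const char* serialized, std::size_t size) {
    if (result_buffer_.empty()) {
        result_buffer_.assign(serialized, serialized + size);  // first time
    } else {
        // Same size every time, so overwriting in place never invalidates
        // the pointers already queued for async writes, and any write that
        // happens afterwards sends the most up-to-date result.
        std::memcpy(result_buffer_.data(), serialized, size);
    }
}
```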
Okay to give you a quick
idea about the effort
required to use each
library, I ran our example
implementations through word count.
uWebSockets sits at a comfy 1.6 kilobytes.
IncludeOS isn't far away
with roughly three kilobytes
and Beast however being
a lower level library
results in nine kilobytes of source code
that we had to write.
(participant talking off mic)
Okay the question was how
many lines of code is that?
I think the uWebSockets one is less than 100, and for Beast it's more than 400.
Let us now compare a big productivity factor: the incremental compile time. Note, results are the 90th percentile, compiled with Clang 4. uWebSockets is linked as a static library, making it a breeze to work with. I couldn't get IncludeOS to build as debug, so I measured with the default settings, the 90th percentile being under three seconds. Still nice to quickly iterate with. Well, Beast, being part of Boost, carries the burden of the template-heavy, header-only approach, right?
In terms of documentation,
uWebSockets barely has any.
It's a little better for IncludeOS.
Still you'll find yourself
reading source code
most of the time.
Beast on the other hand has
excellent documentation.
To keep the example smaller and simpler,
all our examples were
purely single threaded.
That gives me little
experience in this context.
However Beast has clearly been designed
with multi-threading in mind.
Still very complicated
and easy to get wrong.
I'm not sure about uWebSockets. I've read conflicting reports, with people claiming it's amazing and people claiming it breaks all the time.
And IncludeOS currently does not support
any kind of threading.
Both uWebSockets and Beast have native support for SSL via OpenSSL. However, it should be easy to put an IncludeOS application behind a reverse proxy like Nginx, which then takes care of the SSL.
Looking at possible attack vectors.
Both Beast and uWebSockets
are regular binaries,
allowing for a wide variety of exploits
paired with a mature set of
tools for reverse engineering.
IncludeOS's unique execution environment
makes that harder.
What IncludeOS can't protect you from
is application flaws.
Let's say there's a bug
in our registration code.
That way someone could
perform vote manipulation.
A unikernel is not going
to protect you from that.
What it can do is limit damage of things
like remote code execution.
Many exploits rely on finding libc and then using it to spawn a shell. IncludeOS can mitigate a set of known exploit propagations, like the aforementioned system call.
Remember the biggest attack
vector is and remains
social engineering.
Okay, as this was my first time benchmarking a network application, please take my results with a bucket of salt.
Again, results are the 90th percentile, from running the uWebSockets scalability test locally. Unfortunately I didn't manage to benchmark IncludeOS.
I ran the benchmark in two settings.
First with 100 concurrent connections.
No big difference in terms
of user space memory usage
both sitting at around four megabytes.
On to the connections per millisecond.
At this small sample size,
this shouldn't be all
that representative.
Now with 10,000 concurrent connections.
With its extremely special-purpose implementation, uWebSockets manages to have an impressively small memory footprint, using only an additional two megabytes for 10,000 concurrent connections. At that rate you could host Google-esque traffic with less RAM than your phone has.
Beast does well enough.
At this scale, your application is much more likely to become the bottleneck. This laptop managed to request and serve 10,000 connections in a third of a second. With that kind of traffic, we should be able to afford better hardware for sure.
So who's the winner?
Which library is the best?
Wrong question.
The three libraries presented all aim to achieve vastly different goals.
The better question is what do you want?
To summarize I'll associate
each library of a key strength.
If you want an easy, fast,
and lightweight solution,
take a look at uWebSockets.
Albeit a young project, IncludeOS shows promising progress towards being a solid choice for security-oriented services. Aiming to be as modular and adaptable as possible, Beast comes with inherent complexity and trade-offs.
Given its aspiration to become part of the standard library, it could very well be at the core of many upcoming libraries.
C++ is seldom the best at anything. Python has more convenient syntax. C reliably beats us in the Computer Language Benchmarks Game. Rust has better type-checking. Elixir has more convenient multi-threading. And all the new languages have a central package manager, facilitating a shared ecosystem.
So what's C++'s niche?
Libraries like uWebSockets show there can be a nice batteries-included experience with outstanding performance and efficiency, fulfilling the long-standing promise of safe, high-level constructs compiling down to efficient low-level components.
Paired with static type checking, no need for a runtime, and mature compilers and tools, that's a sweet spot of versatility no other language hits.
Of course C++ does not stand still.
Let us look at existing issues
and how they might be
solved in the future.
Most glaringly, the complete lack of networking in the standard library, addressed by the Networking TS. The lack of execution control, addressed by the Executors TS. And the lack of suspendable functions, addressed by the Coroutines TS. There's also a lack of Unicode support; there are a couple of papers, but there does not seem to be a unified vision so far.
You'll find the source code for both the presentation and the example here. If you're watching this video and want to tell me how wrong I was, here you go.
Okay let's take a quick look
at the vote result again.
Maybe some of you voted right now.
Okay I'll leave this up.
Who won?
"Never", okay, so we're all hardcore fans I see. Well, you can vote, the address is down there. It's not particularly hard, just pull out your phone.
Okay.
Any remaining questions?
(participant talking off mic)
- [Participant] This IncludeOS not supporting threading, is that on (mumbles), or are you just presenting a single event loop on a single thread?
- Can you repeat the question please?
- [Participant] You said IncludeOS does not support any threading? Is that a property of the software, or is everything just in an event loop on a single thread?
- Okay, to be honest, I think until you reach the limits it's fairly lightweight, right? You don't do context switching in there and everything is basically delegates calling each other. Yes, at some point you'll reach a limit, but I'm betting there are maybe two handfuls of companies in the world that actually see that kind of traffic. You can easily handle something like 10 or 20 thousand connections at a time with a single-threaded approach. But they do have a plan to add support for it; there's a talk at Meeting C++ this year where they will talk about it.
- [Participant] You mentioned FlatBuffers.
Does FlatBuffers make any guarantees about binary compatibility between releases of FlatBuffers?
- Okay, so the question is whether there are binary compatibility guarantees made between releases. Okay, I'm not sure to be honest. I know they're aware of these kinds of issues, and I think the answer is yes, because each field is optional. For example, in a table each field is optional, and if you add new ones you can even tag old ones as deprecated. So I think they guarantee the same level, but I'm not sure about between versions.
It's still a fairly young project.
- [Participant] (low audio) 1.4 to 1.7, something like that, do they guarantee that you can mix versions between different platforms?
- I believe so yes but please look it up.
Okay I think that wraps it up, thanks.
(audience applauding)
