hi I'm Eric and I'm here today to talk
to you about path agnostic binaries
co-installable libraries and how to have
nice things if those words don't mean
anything to you yet that's okay it's
because I just made them up this talk is
going to be about introducing some
terminology and more generally it's
about how software is packaged and how
that could be easier so the first thing
I'm going to do is talk a little bit
about how packaging is not a solved
problem just in case anybody thinks it
is
then in the middle introduce some
terminology to help talk about things we
could improve then I'm going to talk
about a bunch of existing systems and
package managers with that vocabulary
and then last and sort of scattered
about there will be some techniques for
like legitimately GCC flags that might
make your life better some of this is
line broken a little strangely because of
resolution all right
so packaging is not a solved problem we
really need the installation of packages
to be easier and we really need almost
everything about the way we interact
with packaging to be easier there are a
couple of big things that point this out
to us one of them I think is the rise of
containers in the last couple of years
in our industry and this is something I
talked about a bit more last year
actually and there's a whole talk about
that that was wonderfully recorded that
you should go see but one of the big
points from that talk was containers
gave us the ability to install more than
one version of a thing and people liked
this surprise so this is something
that we can do now with containers and
it's caused a huge popularity of this
system this release of energy and
enthusiasm I think indicates that we
have room to go with the way we package
things and I also want to ask if
containers are necessary to do that and
I think the answer is no containers are
something that made it easier for us to
install multiple versions of a thing on
a machine and that's good but this form
of easier might not be the only form of
easier in the world containers have some
other baggage with them the other thing
I want to talk about that makes me think
packaging is not a solved problem
is something I've been meditating a lot
on the last year or so is that there are a
lot of distros in the world and despite
the fact that we're all people trying to
work together we're all open-source
nerds trying to make the world a better
place in some way we have great
difficulty sharing things especially the
binaries that we produce from any of our
distros they're basically completely
non-portable in any sense of the word if I
have an Ubuntu machine and a Debian
machine these have almost the exact same
tooling almost the exact same packages
but can I reliably copy a binary from
one to the other maybe yeah
would I bet on it no if I have Fedora and
CentOS it's the same thing they're
mostly rpm and yum and some new acronym
lately can I copy a binary no even Nix
and Guix which are two relatively
recent Linux distributions and are
extremely similar they use all the same
build tools all the same linking
conventions can I copy a binary
absolutely not this is weird because
we're all trying to work together so
there's some some quiet force which is
causing us to become balkanized like
very small communities that are unable
to work together and it's happening
without our intention and I think we
need to ask a lot of questions about
this and I think these actually
strangely enough have some shared root
causes standards for composable
installation is the thing that we don't
really have I don't think we have enough
language to talk about what we mean with
portability and composability
and I think we should work on that so
here's an attempt and here are some
definitions I think we should talk about
the ability to co-install things and the
definition that I would offer for that
is anytime I install one thing
installing a second version of the same
thing should not be any harder than
installing the first thing was and that
includes using it not just having it on
disk but being able to use it this
sounds trivial but of course nothing is
easy
the other term that I want to introduce
again from the title of the talk is path
agnostic and that means a user of the
system the person who is installing the
thing not the packager not the builder
should be able to decide where it goes
any binary I have I should be able to
take the folder that that binary is in
use the MV command and then keep using
the binary this should not be hard path
agnosticism is also a really nice property
because it quite trivially gives you
co-installability if I have a binary that
I can move then I can take other
versions of that binary and put them in
any path prefix that I want and of
course it's trivial to install more than
one version right and if we could do
this I think this would fix a huge
source of that tendency towards
balkanization that Linux distributions
often find themselves in so there are many
ways that you might try to implement
path agnosticism and something that I
want to introduce early is things that
you can do and things that you should do
are not necessarily the same thing so
for example we already talked about
containers earlier and containers
broadly speaking are a form of cheating
they're a form of chroots and this
is something that works but it's
something that has a lot of additional
baggage with it as well if we use chroots
as a form of packaging well chroots don't
compose very well right I can package
precisely one thing in a chroot
and then that's kind of it I have to
package an entire Linux file system the
whole thing all the libraries in one big
old monolith and this is problematic for
a lot of reasons it's quite opaque the
tools that I use to do this are going to
have a large amount of side effects and
all I'm doing is bundling them in one
chroot and this doesn't help me
understand right it doesn't help me diff
it doesn't there's a lot of limits there
another form of path agnosticism that
you might be thinking about is setting
up some environment variables like
somebody's probably thinking LD_PRELOAD
that's a thing you can do but I think
it's very questionable whether we should
do it because this causes lots of wrapper
scripts to show up
it also doesn't compose very well
because if you set LD_PRELOAD or any
environment variable like that that's trying to
make things path agnostic all the child
processes inherit that too and that's
probably not what you meant and this
just it doesn't compose very well it has
side effects you didn't expect so the
kind of path agnosticism I think we
should chase is having whatever is in
your binary in your filesystem it needs
to explain itself it needs to be
context-free without any other
environments and this is kind of the
harder one so for some systems this is
easy if you're statically linking a
binary you've only got one file and
making a single file thing path agnostic
is pretty trivial it's not looking for
anything outside of itself so you're
done but let's say for some reason or
another we are convinced that we cannot
statically link the entire world so
we're going to do some dynamic linking
instead now if I have more files things
are getting a little more interesting
because if I have like one main binary
in a package let's say and I have some
other files around it I need them all to
be referred to relative to that main
binary if I'm going to keep the property of
being able to mv the entire directory
around that's easy right no not really
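A minimal sketch of that mv property, using symlinks as a stand-in for any reference from one file in a package to another. The file names (`libexample.so`, `abs-ref`, `rel-ref`) are hypothetical and only there for illustration. A reference written as an absolute path dangles after the directory is moved, while a reference written relative to its own location keeps working.

```python
import os
import shutil
import tempfile


def demo_move(root):
    """Build a tiny package, move it, and report which references survive."""
    pkg = os.path.join(root, "pkg")
    os.makedirs(os.path.join(pkg, "bin"))
    os.makedirs(os.path.join(pkg, "lib"))

    # Stand-in for a shared library (hypothetical name).
    lib = os.path.join(pkg, "lib", "libexample.so")
    with open(lib, "w") as f:
        f.write("library contents\n")

    # Two references from bin/ to the library: one absolute, one relative.
    abs_ref = os.path.join(pkg, "bin", "abs-ref")
    rel_ref = os.path.join(pkg, "bin", "rel-ref")
    os.symlink(lib, abs_ref)
    os.symlink(os.path.join("..", "lib", "libexample.so"), rel_ref)

    # The user relocates the whole directory, as with `mv pkg moved`.
    moved = os.path.join(root, "moved")
    shutil.move(pkg, moved)

    # os.path.exists follows symlinks, so a dangling link reports False.
    return {
        "absolute": os.path.exists(os.path.join(moved, "bin", "abs-ref")),
        "relative": os.path.exists(os.path.join(moved, "bin", "rel-ref")),
    }


with tempfile.TemporaryDirectory() as tmp:
    print(demo_move(tmp))
```

The dynamic loader is not a symlink resolver, so this only models the idea: anything that starts with a slash pins the package to one location.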
so let's talk about this a bit more for
a second what happens when you try to do
this in practice with dynamic linking in
the world as we know it if I look at how
bash is linked on my system right now
this is the readout that I get ldd a lot
of people might be familiar with this
but if you're not it's a thing that
looks at which dynamic libraries get
loaded when you execute this program so
on my system this is what bash does
these are absolute paths so right out of
the box we can very quickly see because
there's a slash here this is not path
agnostic if I move bash or if I move any
of these libraries it's not going to
work correctly
so where does this come from this is
kind of a quick primer on how the
dynamic loader works for anyone who's
not familiar with it already
so these absolute paths come from
nothing in the binary itself readelf is
something that will read the executable
headers out of the binary and tell you
what it thinks of them so here it's
showing me the same library names but
they're not absolute paths yet the
absolute paths came from somewhere
further for me they come from this
lovely place called /etc/ld.so.conf
and this is of course another
absolute path and so now we finally hit
rock bottom these are all of the further
absolute paths that the linker is going
to look at when I run bash so this is
this is how this all came to be so if we
wanted something to be path agnostic we
would want our linker to be able to load
these object files from somewhere else
somewhere relative to the binary can we
do that yes it's just a little arcane
you might want to take a screenshot of
this because where do you find those
Docs
I don't know they're somewhere but take
my word for it that's a thing you can do
and this would give you a binary in
which now if you read the headers you'll
see the same requirement for shared
libraries and then this new flag appears
and readelf is telling us it's going to
look for this library runpath RPATH
that is relative to the path of the
binary and if I ask ldd what it actually
resolves it will do something relative
so we can have path agnostic dynamic
linking
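The slide with the flags is not captured in this transcript. With GCC and GNU ld, one common spelling is `-Wl,-rpath,'$ORIGIN/../lib'` at link time (the single quotes keep the shell from expanding `$ORIGIN`), and the `../lib` layout here is an assumption for illustration. What follows is a simplified model of what the loader then does with that entry, not the loader's actual implementation: `$ORIGIN` is replaced by the directory containing the binary, so the search directory moves when the binary moves.

```python
import os


def expand_rpath(binary_path, rpath):
    """Expand a colon-joined RPATH/RUNPATH string for one binary.

    Simplified model: $ORIGIN (or ${ORIGIN}) is replaced by the directory
    that contains the binary, then the result is normalized.
    """
    origin = os.path.dirname(os.path.abspath(binary_path))
    dirs = []
    for entry in rpath.split(":"):
        entry = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        dirs.append(os.path.normpath(entry))
    return dirs


# Hypothetical install locations for the same package.
print(expand_rpath("/opt/app/bin/tool", "$ORIGIN/../lib"))
print(expand_rpath("/home/me/app/bin/tool", "$ORIGIN/../lib"))
```

On a POSIX system the two calls give `/opt/app/lib` and `/home/me/app/lib`: same binary, same header, different prefix.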
it's not commonly done but this works
this is a feature that's been in LD and
the thing that interprets your binary
dynamic links for years for ages in
every form of LD ever as far as I know
there are no Linux distros which use
this commonly but it's absolutely
out there like go run ldd on an
electron binary if you've got one or
three or more on your computer it does
this so let's consider that whole
problem solved what I haven't talked
about yet is how we should actually
organize sharing of objects again so we
can have path agnosticism if we have
path agnosticism we can trivially have
co-installability and now let's talk about
raising the bar even further we want
path agnosticism and co-installability
and to be able to share things so this
requires us to do a little more
organization and there's more than one
way to go about this so I'm going to
introduce more terminology um the word
I'd like to use here is splay and this
is a word for like if you're selling
something in a store you're going to
spread out things for display so here I
want to use the word splay to describe
the way we spread out any shared objects
or dynamic libraries in a bunch of
directories in some organized way that
we can reference there of course more
ways to do this then I can possibly
count but they can be grouped into some
distinct categories so these are the
three major different ways I can imagine
you would ever splay out libraries the
first one is what I'm going to call a
precise splay and this is simply when
I have some library and I want to know
what path I'm going to put it in and I'm
going to hash all the contents of the
library and put it in a folder with the
name of the hash I'll probably use a
cryptographic hash for this because why
wouldn't I
this is probably sounding pretty
familiar we also call this content
addressable
this is a nice way of organizing
information because it's completely
automatic
it's basically immune to conflict and so
this going back to the reason we're
talking about any of this a splay
that is content addressable
trivially satisfies co-installability
if I have more than one version of a
library and I add however many more
versions of library
I will never conflict so this means I
can automate everything with this
organization you have one of these on
your computer it's called git we tend to
like this for all the same reasons
because you can insert an unbounded
amount of stuff and it never generates a
name conflict in itself this is also
like automatically decentralized since
you're using cryptographic hashes you
also get integrity checking for free
this is just a really good place to be
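A minimal sketch of a precise splay, assuming SHA-256 as the cryptographic hash and a hypothetical fixed file name `lib.so` inside each hash-named folder. Adding more versions never produces a name conflict, storing the same bytes twice is a no-op, and the folder name doubles as an integrity check.

```python
import hashlib
import os
import tempfile


def splay_precise(store, content):
    """Store bytes under a directory named by their SHA-256 digest.

    Returns the path of the stored file. Different bytes always land in a
    different directory; identical bytes land in the same one.
    """
    digest = hashlib.sha256(content).hexdigest()
    target_dir = os.path.join(store, digest)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, "lib.so")  # hypothetical file name
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return path


def verify(path):
    """Integrity check: the directory name must match the content hash."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return os.path.basename(os.path.dirname(path)) == digest


with tempfile.TemporaryDirectory() as store:
    v1 = splay_precise(store, b"library version 1")
    v2 = splay_precise(store, b"library version 2")
    again = splay_precise(store, b"library version 1")
    print(v1 != v2, v1 == again, verify(v1), verify(v2))
```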
but it's not the only way you could
imagine splaying libraries so another
way that you could go is of course go
full manual assign names to every file
that you need more than one version of
you can do this but another way of
saying manual organization is basically
you're always doing conflict resolution
and so I think this is very difficult to
say is co-installable and this is of
course kind of the norm if you're
thinking that this looks and sounds like
my libraries on my system yeah it
probably does if you do an ls in
/usr/lib on your computer you're
going to get tons and tons of symlinks
like this on most distros if you're a
Nix or a Guix person of course you
have a very different life but on most
distros you're going to get this you're
gonna get this very manual organization
and so if I was going to install a new
version of a library in here I could
give it a separate name using my human
brain so there's nothing automatic
here but remember our definition of
co-installable explicitly said not just
have the files on my computer
but be able to use them and these
symlinks will at this point betray us if
we have a link which says library name
.so.4 and then it points to a
more precise version this is no longer
co-installable then if I want to install
a different version I can give it a
different name I can have the file here
but can I use it as easily no not
without performing active conflict
resolution so this is not co-installable
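A small sketch of that betrayal, with hypothetical library names. Both versions end up on disk without trouble, but the shared `.so.4` name can only point at one of them, so the second install stops at exactly the point where a human has to decide.

```python
import os
import tempfile


def install_manual(libdir, soname, real_name, content):
    """Install a library file and claim the shared soname symlink for it.

    Raises FileExistsError when another version already owns the soname,
    which is the point where a human has to resolve the conflict.
    """
    with open(os.path.join(libdir, real_name), "wb") as f:
        f.write(content)
    os.symlink(real_name, os.path.join(libdir, soname))


with tempfile.TemporaryDirectory() as libdir:
    # All names here are hypothetical.
    install_manual(libdir, "libexample.so.4", "libexample.so.4.1.0", b"old")
    try:
        install_manual(libdir, "libexample.so.4", "libexample.so.4.2.0", b"new")
    except FileExistsError:
        print("conflict: libexample.so.4 already points at",
              os.readlink(os.path.join(libdir, "libexample.so.4")))
    print(sorted(os.listdir(libdir)))
```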
the other most interesting category of
things you can do is what I'm going to
call a property-based splay and this
is if you calculate some property of the
libraries you're going to share and then
use that as our index whether or not
this is co-installable can be an
interesting question so we're gonna go
over a couple of examples of these in
order to try to figure that out one
common form of property-based splay you
might have seen is anybody who's doing
things with docker images it's very
common to have a shell script which when
you're publishing an image tags it with
the source code hash and this is
something people do because the hash is
already there because of git thank you
git and so it's very easy to do but this
doesn't capture a lot of things right if
I do my build again with a different
compiler that of course is not
represented in my git source hash so
that's not covered in my splay then so
this I would say is again not
co-installable if I used a different compiler and
I want to install that thing on the same
computer as the other thing with a
different compiler I have conflict
resolution to do I'm gonna skip this
slide because I'm running out of time so
what if I got better at this and I came
up with a description of a property
where I have not just the source code
hash but I have all of the other
executables on my path all of my
compilers as part of my property
description as well and this is what the
Nix and the Guix distros do this is
really cool
so this big hash in here
includes not just the source code but
all of the other tool chains that were
used in building it this is still
distinct from a precise splay however
because that hash is not of the content
this can still get in conflict this
would be equivalent to a precise splay if
we could assume that all compilers are
pure functions and all compilers are
deterministic this is unfortunately just
not true
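A simplified model of the difference, not how any particular distro computes its hashes. The property key is derived from the build inputs (source plus toolchain identity), the content key from the build output. The `fake_build` function is a hypothetical non-deterministic compiler whose output embeds a stamp that is not one of the declared inputs, so two builds share a property key while their contents differ.

```python
import hashlib


def property_key(source, toolchain):
    """Key derived from build inputs (source plus toolchain identity)."""
    h = hashlib.sha256()
    h.update(source)
    h.update(b"\0")
    h.update(toolchain)
    return h.hexdigest()


def content_key(artifact):
    """Key derived from the build output itself."""
    return hashlib.sha256(artifact).hexdigest()


def fake_build(source, toolchain, build_stamp):
    """Hypothetical compiler whose output embeds a stamp (think timestamps
    or build paths) that is not part of the declared inputs."""
    return b"compiled:" + source + b"|" + toolchain + b"|" + build_stamp


src, cc = b"int main(void){return 0;}", b"example-cc 1.0"
first = fake_build(src, cc, b"monday")
second = fake_build(src, cc, b"tuesday")

# Same inputs give the same property key for both builds...
print(property_key(src, cc) == property_key(src, cc))
# ...but the two outputs do not hash to the same content key.
print(content_key(first) == content_key(second))
```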
I can hear there are some people in
the room laughing yes that's a whole
other talk there's a reproducible builds
project and a reproducible build
community out there who is working on
this problem and believe me it's a
problem so if we want to share libraries
we can choose any of these categories of
techniques but if you ask me please
choose precise it's by far the most
correct so now I want to get all of
these properties back together I want to
have path agnostic and I want to have
co-install and I want to have shared
objects if we'd have some binaries that
are path agnostic and we could have a
splay of all of their dependencies that
is path agnostic then we could move both
of these things around together and they
would still be path agnostic and we
would still have shared objects and
everything would be awesome but how so
since we just talked about Nix briefly I
want to use Nix as a further example
because they do some interesting things
they use RPATH much like the thing I
mentioned earlier where we can use
relative linking but they don't quite do
relative linking this is what you'll get
when you read the ELF headers on a Nix
system there's actually several library
paths and here you can see and they're
joined by colons this is cool because
it's close to co-installable if you're
ignoring the whole determinism of
compilers part but it's also not path
agnostic this still starts with a slash
and anytime there's a slash in a path
like we've kind of lost
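That slash test can be written down as a small check over a colon-joined RPATH string. The store-style value below is hypothetical, with `<hash>` standing in for a real store hash, and treating "every entry starts with `$ORIGIN`" as the definition of path agnostic is an assumption made for this sketch.

```python
def absolute_entries(rpath):
    """Return the RPATH entries that pin the binary to a fixed location."""
    return [e for e in rpath.split(":") if e.startswith("/")]


def is_path_agnostic(rpath):
    """True when every entry is anchored at $ORIGIN rather than at '/'."""
    entries = rpath.split(":")
    return all(e.startswith(("$ORIGIN", "${ORIGIN}")) for e in entries)


# Hypothetical values; <hash> stands in for a real store hash.
store_style = "/nix/store/<hash>-liba/lib:/nix/store/<hash>-libb/lib"
origin_style = "$ORIGIN/../lib"

print(is_path_agnostic(store_style), absolute_entries(store_style))
print(is_path_agnostic(origin_style), absolute_entries(origin_style))
```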
some people say that Nix in fact can
be installed in any path and that's sort
of true again should questions come up
here a lot these paths as we saw are
literally embedded in the binaries so if
you're going to install one of these
binaries from Nix in a different prefix
path if you're going to try to make it
path agnostic you basically have to
rewrite this header either by
recompiling the whole thing or by using
some tool that patches headers so this
is not path agnostic and going all the
way back to the concept of
auto-balkanization this is a fascinating
example because the Nix and the Guix
distros are almost the same except this
path on a Guix system is different it
doesn't have /nix in the front of
it so despite almost everything about
these systems being identical that
slash really gets in the way so in
the last 60 seconds because I'm not
talking nearly fast enough I have a new
proposal what if we compiled binaries
with this RPATH of $ORIGIN we put all of
our libraries relative to the binaries
but these could be symlinks they can be
the full content or they can be symlinks
this is actually the exact same thing
you can switch back and forth between
these versions of linking and never need
to recompile this binary so this is path
agnostic and if you bundle all the
library full text and make a tarball of
this it's also path agnostic if you need
to patch things on a system that is
arranged like this you want to replace
the libraries separately from your
package manager go ahead it's just
symlinks this is easy we should try this we
might be able to make tarballs of
software which we can distribute and run
without needing a distro if we wanna
share libraries we can have a distro we
can have a form of organization which
makes it easier to do that but we
wouldn't need it one of the reasons I
think this would be cool is if we wanted
to dump a bunch of binaries into a
content addressable
system for permanent storage and sharing
we could do that if it's path agnostic
you can mount it anywhere it'll run if I
wanted to say add a bunch of things to
ipfs I work with ipfs a lot I could do
this this would work but only if the
compile is path agnostic this talk was a
lot about C style linking and I'd like
to apologize to anyone who doesn't do
C things imagine this with PYTHONPATH
imagine this with anything else all the
same principles apply this thing I
offered at the end is just one possible
way of arranging some symlinks and stuff
you don't have to love that solution but
I'd like to talk about these terms more
and I hope that these are useful
concepts for exploring how we compose
software thank you
[Applause]
I have actually no time at all for
questions I am so sorry but one more
quick mention there will be a hackfest
later in the week if anybody wants to
talk about documenting what these terms
mean trying to make more concrete
manifestos maybe tools let's talk later
thank you
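The proposal from the last minute of the talk (binaries looking next to themselves, with each library there being either a symlink or the full content) can be sketched as follows. The layout (`store/abc123`, `pkg/lib`) and the helper names are hypothetical. The point is that the path the binary sees stays the same in both modes, so switching between them needs no recompile.

```python
import os
import shutil
import tempfile


def link_dependency(pkg, store_path, name):
    """Expose a library from a shared splay as pkg/lib/<name> via a relative symlink."""
    libdir = os.path.join(pkg, "lib")
    os.makedirs(libdir, exist_ok=True)
    link = os.path.join(libdir, name)
    os.symlink(os.path.relpath(store_path, libdir), link)
    return link


def vendor_dependency(link):
    """Replace the symlink with a full copy, e.g. before making a tarball."""
    real = os.path.realpath(link)
    os.remove(link)
    shutil.copyfile(real, link)


def read(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/store is the shared splay, root/pkg the package.
    store_file = os.path.join(root, "store", "abc123", "libexample.so")
    os.makedirs(os.path.dirname(store_file))
    with open(store_file, "wb") as f:
        f.write(b"library bytes")

    pkg = os.path.join(root, "pkg")
    link = link_dependency(pkg, store_file, "libexample.so")
    before = read(link)          # read through the symlink
    vendor_dependency(link)
    after = read(link)           # read the bundled copy
    print(os.path.islink(link), before == after)
```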
