Paxos is a really fundamental algorithm in distributed systems. It was invented over a decade ago.
by Leslie Lamport
who is very famous - won the Turing Award which is like the computer science equivalent to the Nobel Prize.
And he has had tons of contributions this is really
one of the most important ones he did
during his time doing research
It's for getting computers who can talk to
each other to agree on something so
computers just like your computer at  home
they sometimes fail sometimes the
network gets cut off and yet we still
need to have a reliable system over that
so for example with your bank if you
transferred money from one person to
another and then the money kind of when
are your account will never arrived at
the other account because a computer somewhere had failed
you wouldn't be very happy about it so we need to be
able to reliably reach agreement between
machines even though they're silly and they
fail and they break and networks fail. Paxos has
two kind of main phases and the key thing to Paxos
is in each of these phases you have to
have the majority of nodes coming to an
agreement before we can move on so that
means in a system of 5 nodes you need
three nodes to be in agreement before we
can move on - so its democracy right - yes absolutely
and its super useful because it means that if
some of the nodes are down you just carry on
without them and that's absolutely fine
when they come back up if they come back
up we can help them along and tell them about what happened. Say
you're a client for example you're a person
who wants to get a lock and it's super
important for the system that only one person can
ever get this lock we might have a
system here of five nodes and we might have
a client over here let's call that alice and she wishes to procure some
super useful lock and we have to make sure
that only one person can ever have this lock. so she asks the nearest
node hey can you get a hold of this lock for
me. In this system we have five nodes and
so we need three of them at each stage to
agree before we can move on and there are two
main stages there is the promised stage and
there's the commit stage the node's
gonna generate a unique ID we'll say
their ID is 13 and they're going to
contact every other node and say hey
guys I want to make some progress on
this will you agreed to let me do that? And
they say yup that's absolutely fine and
they remember that they've agreed so I'll call it
P for promise that they'll remember that
they promised 13 would be the node that's
in charge of getting stuff done okay and
they're all happily - move freely
they've all  happily agreed and  so we can
move on to the next stage. In the next
stage node 3 on behalf of alice is gonna
contact everybody and say hey everybody
can you commit ID 13 the fact let's
say alice is the person is going to have
the lock and all of these guys remember
that it is alice. I'll just write a little "a" for each of them who has it, and they also
remember that it was 13 because the  ID and they'll
respond and say okay and now
node 3 will say alice here you go you can have the lock you can go on do
whatever you want to do and regardless  of
what happens in this system in terms of
failures we'll never give anybody else
that lock ok that's entirely yours - so
just give an example this lock is maybe
access to her accounts or if it were banking or something like that - yup yeah it could be access to an account it could be for example the ability to write to a file
so often in code we have regions of code
where we want to ensure that only one
machine person only one person is
ever executing that so maybe they're editing
a file and you don't want multiple people trying to do this and the
other thing is like the bank account you
might want to kind of have the lock you
know I'm the only person who can change this amount
right now and then when
you're finished give that up lots of super useful and they're
really kind of widely used abstraction so a
simple thing like this can then be used
loads within systems so you know here alice is a person but often alice would
be you know a device thats operating on behalf of a person. lets have a
look at what happens if one of these
machines was to fail for example actually
let's make it worse let's make two machines fail so here comes alice she wants to procure this lock she talks to node 5  and says
hey can you get this lock for me
so node 5 is gonna send out a message to
everyone saying say their IDs 15
we promise to let me in with ID 15 these
guys will respond and it will respond to itself saying
that's okay you can go ahead with that and they'll remember they promised 15 now this is now this is a system of 5
nodes as I said we need a majority and we
have a majority here because we've got
node 1 node 4 and node 5 that's 3 nodes so despite the fact that these two nodes are completely  offline that's
that's fine we can carry on five can then send
out a message to everyone again saying
can you commit the fact that fifteen that's
gonna be Alice and then all of these
guys will remember its Alice who got the lock
with ID 15 now in the future
these nodes might come back up hopefully you know sent for an engineer and they've come
fix the problem for you and nodes 2 and 3 they've recovered don't know anything about what's going on so
maybe someone else comes along
say to 2 hey i'm bob and I would like that lock
please node two is gonna send a message
to everyone saying I would like to be in charge
for 12 and each of these from before have
remembered that the last person they've promised
to had ID 15. Once you've made a promise, you can only break your promise to someone
with a higher ID than the ID you
promised so these numbers go up and up
over time so in this case 12 was smaller
than 15 so these guys are gonna say no
actually three will say yes because they don't
know what's going on with the system
beforehand so they'll say 12 and everyone else is going to say no. So in this case number 2 is going to have another go, try again, and they are going to generate
a new ID and they're gonna make sure that their
ID is bigger than any of the IDs they saw earlier. Say its
gonna be 22 now and try again now saying to everyone hey its 22 can I be in charge of what's going on. and 22 is higher than 15
they're actually all gonna come back and say yeah that's ok but the really important stage here is that if they
have committed a value before when they say
okay they'll include what the value was so
in this case it was node 1 4 and 3 when
they say okay they will also say that
previously it was agreed that Alice
had the lock with ID 15 and so that will be this guy,
this guy, and this guy will all say that. - So is Alice stuffed at this point?
Is she stuffed? - (laughs) No, not quite yet. - She will be though (laughs)
- well its bob that's got the problem here. - Oh ok -
Because it's Bob that won't be able to get this so - ok -
even though 2 was trying to operate on Bob's behalf and get this lock, uh Bob the 2 has just found out
out that actually Alice had already been
agreed so what two will do is instead of
sending out commit for bob, 2 is actually going to send out a commit for alice to everybody and say commit
with its number which is 22 Alice
- Ok you've thrown me there. - (laughs)
so the idea of paxos is that in the end everyone needs
to find out about what happened so in
this example two and 5 weren't aware of what
happened in as a side effect of two
trying to do something which was
conflicting, two has actually found out
about what happened and they've told
everybody so five after they recovered
didn't have a clue what was going on and
now they know actually it was alice that has got the
lock so by going through that extra
stage before you commit anything we
ensured that bob couldn't have got the
lock - so the thing about having the higher ID 
was just him trying it on was it or - yes so you - if it
hadn't been committed if it had just been promised
would he have gotten the lock then? Because he has a
higher ID or was that - if it had just been
promised he would have been able to yeah, so
what you basically do is you once you
get a promise from everybody if nobody has
told you that anything's been committed
you're free to choose whatever by you want to commit, however if things have gone wrong for
example maybe the node that was actually trying to
do the commit might have failed midway through
you might end up getting one person
telling you Alice had it another person saying Bob had it
and another person saying Charlie had it and the importance of the IDs here is that whoever has
the highest ID with it that's the
one that you then disseminated everybody
and say charlie is the man who has the
lock let's have a look at what failure
midway through might actually look like
this is node 3 and we've got our friend Alice still trying to get
her get her lock in the system so as before node three's gonna ask everyone to promise.
with our ID which we will say is 13 and everyone will remember that they promised 13 and respond to say
okay and then three might message these guys and say commit for Alice in thirteen and then node 3 might fail
we have to work out what do we do. Did alice ever get that lock? Did alice not get that lock? So the message was sent to nodes 1 and 2 before 3
failed so these guys are going to agree and send back that these messages won't matter because 3 33 has unfortunately has failed
so we have to kind of look at this and work out what to do about this so in the
future
someone else might come along and try to
get this lock and if say this is 4 trying to
do this when they gonna send out their promise and they're going to try to send out their commit
and when they do this they gonna find
out that people have already committed
alice and therefore they won't give it away again instead of just tell everybody
that it was alice who was successful - Would there be a majority there though, if there were 4 nodes and 2 of them...? - The interesting
thing about majorities
is if you have, say, a set of 5 nodes and if you need 3 for a majority, any majorities will always overlap. So if you ask and if you get
you get a majority and just one of them says
this thing happened that they could be a
majority that exists where that happened. So just one person saying something happened is enough. - I see
because it's committed it must have been a majority at some point - yeah absolutely - got you, ok, and then
does alive ever find out about that if 3 has died? (laugh) - So paxos itself is just kind of
really kind of basic idea and you build
systems on top of it, so it is up to the system.
Um, usually what would happen in this
scenario is alice would try asking
someone else can you do this for me and find out that it was successful
- so basically this is, uh yeah, you might implement it in various ways, is that what you are saying? - Yeah yeah absolutely, um, and the big thing as well is this is just
agreement over a single value but in practice
we want to reach agreement over many
values we want the value to change over time you know you want Alice to be able to get the lock and
then return the lock, and because of that, what we use is something called multipaxos, which is like this but multiple times
basically and the really interesting
insight here is that this took two stages 2 kind of
round trip times across the network
but the first stage when we were sending
out these promised messages none it
didn't matter that it was Alice or Bob it didn't
matter what was going on so what we can
do is we can do this first stage before
any requests are even there, and by doing that when Alice comes along if the nodes
have already done the first stage is just one round trip time - basically, they are putting in a bit of
groundwork - yes - before anyone even asks so if
I had a user here and they wanted
something would you be alright with that.
- yeah basically, so when we when we did this for the first time there was no, there
was no kind of thing that said it was alice when we did that first route,
so they can basically do that in advance before they even
know, and then when someone comes along and asks for it, then they then they can get it, but the restriction there would
be at the moment alice could ask anybody and
they could do it and it would take 2
kind of phases or 2 round trip times if
you if three kind of did their
preparation work first then alice would
only be able to go to three, but if they did go to three,
then it would just take one round trip
time. When we do this bit of work, we call
this kind of electing a leader so you're
basically saying to one machine you're in
charge so you can in pretty much any setting
you can say you're gonna be the leader, you're gonna be in charge
that could be one of the people, one of their machines, one of the servers in the cloud and then
you send the all the requests to that one and then they distribute it like this. - What is probably one of the
most common places this is used? -  I would actually say this example is a really good example of where it's used,
so in big distributor systems it's really important to have this idea of locks so this
idea that only one person can be doing
certain things at a time we like to
think that systems are really parallel and we have lots of things going on at once but at
some point usually we need to say actually only one person is allowed to edit a file at that time only one
person is allowed the permission to
access this key at that time so locks
in a distributor system is a really good example of where you might, where you might need something like this
The problem is that any of these connections if these are all using HTTP then there's no way for
the browser to verify the data that's
coming back here.
