We had this idea of a machine, or a network card or a WiFi card, having a "MAC" address,
which I understand to be an address unique to that
dev... not necessarily that device,
but certainly to that network interface (that's probably the best word for it, is it?). So the question is: why do we need IP addresses if we've got MAC addresses?
It's an interesting question. Because they do different things, I'd say, is probably the glib answer to it.
A MAC address looks something
like this: six bytes would be an Ethernet MAC address.
There are some details about the
first few bits at the start of this, which
can be used to indicate things like "this is
multicast" or "this is broadcast", a couple of things,
and then you've essentially got two halves. The first three bytes
indicate, well, you can think of it as the
manufacturer. You apply to the
standards body, I think it's the IEEE, to get a block.
You pay them some money and you get a block, and then you can mint your own MAC addresses from that.
And then the remaining bytes would normally refer to the station on the Ethernet network.
So that gives you a fairly big address space; you can address quite a lot
in six bytes. It used to be the case
that these were burnt into the hardware
of the Ethernet card. Back in the day, when
it was all physical things with
physical wires, there'd be something on the card
which had this hard-coded in the hardware.
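The layout just described can be sketched in a few lines of Python. This is only an illustration, not any particular driver's code: the function name `parse_mac` and the dictionary keys are made up here, and it assumes the usual colon-separated text form. The multicast flag is the lowest bit of the first byte; the "locally administered" flag next to it is not mentioned in the talk, but it is the bit that conventionally marks an address as set in software rather than assigned from a manufacturer's block.

```python
def parse_mac(mac: str) -> dict:
    """Split a MAC address written as aa:bb:cc:dd:ee:ff into its parts.

    Hypothetical helper for illustration only.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("an Ethernet MAC address is six bytes")
    first = octets[0]
    return {
        # First three bytes: the block assigned to the manufacturer.
        "oui": octets[:3].hex(":"),
        # Last three bytes: the station number minted from that block.
        "station": octets[3:].hex(":"),
        # Lowest bit of the first byte set: group (multicast) address.
        "multicast": bool(first & 0x01),
        # Next bit set: address was assigned locally, e.g. in software.
        "locally_administered": bool(first & 0x02),
        # All ones is the broadcast address.
        "broadcast": octets == b"\xff" * 6,
    }
```

For example, `parse_mac("ff:ff:ff:ff:ff:ff")` reports both the multicast and broadcast flags as set, since broadcast is a special case of a group address.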
Nowadays, and I mean for several years now,
particularly with WiFi, it's been possible to set this through software in many cases, so you can't
in fact guarantee a one-to-one mapping
between the network card, which is what it used to
identify, and the MAC address.
You can change the MAC address for a
network card, particularly in WiFi, which is one of the reasons why spoofing on WiFi
is actually easier than people think. On the other hand, an IP address is a four-byte number, which in hex
might be something like that. I'm not
gonna try and translate that to decimal in my head;
an example of a decimal one would be 10 dot 0 dot 0 dot 1 (10.0.0.1). So that's dotted-quad notation. These
are bytes: each of these goes from 0 to 255. When you have Hollywood doing films, the
number of times that, erm, CSI's a particular example of this as well, the number of
times that people will do traceroutes and other such things to IP addresses that
start with, you know, 700 and something, and it's completely meaningless. [It's a bit like the Hollywood phone numbers: 555...]
Yeah, possibly. Maybe it's been done deliberately; it would be nice to think it was done deliberately. So you've got these two
different address formats and they're doing two different things. The MAC address would
be used to identify the frames for you
on the Ethernet that you were physically connected to.
So, for example, in the Ethernet frame,
this is why you start out with the
destination address, then you have the
source address, and then you have the
protocol field. It's because the
first bits that hit the interface card
should be the destination address, because
the card needs to make a decision:
should I receive this or not, or should I just start ignoring it?
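As a rough sketch of that field order, the Python below unpacks the first 14 bytes of a frame (destination, source, protocol type) and makes the receive-or-ignore decision from the destination alone. The names `parse_ethernet_header` and `should_receive` are invented for this example, and the decision is simplified: it assumes the card accepts only its own address and broadcast, and leaves out multicast group membership.

```python
import struct

BROADCAST = b"\xff" * 6


def parse_ethernet_header(frame: bytes):
    """Return (destination, source, ethertype) from the first 14 bytes."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte type.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype


def should_receive(frame: bytes, my_mac: bytes) -> bool:
    """Decide whether to keep listening, using only the first six bytes.

    Simplified assumption: accept frames for this station or broadcast.
    """
    dst = frame[:6]
    return dst == my_mac or dst == BROADCAST
```

Because the destination comes first, `should_receive` never needs to look past byte six, which is the point of the ordering.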
[So this is a card sitting there on what is
effectively an open phone line...
... and it has to know which conversation is to do with...] Yeah, when you
go back to the first versions of Ethernet,
you had Ethernet cards sitting on an Ethernet and they'd have what were called vampire taps
coming off them (I think that was the first version of this), and then you'd have a much thicker
cable connecting them. It was called a
vampire tap because you literally screwed it into
the coax and it broke through the
sheath and hit the core in the middle, and
that was what connected the card to the cable.
So you'd have multiple stations
electrically connected to the same physical line, and so there'd be signals passing up and down
this line, and each station needs to know: should I listen to the signal or not? Do I need to start
receiving it, turning it into bits, passing it
up to the computer, into the memory, et cetera?
You wanna make that decision early because, back in the day, things were expensive and things
were slow, so you put the destination bits
first and then the card can make a
decision very early: do I need to keep
listening and put this into memory, or can I
just ignore it? And it kinda carries
through into technology today, even though it doesn't matter so much.
If you look at an IP packet, then the first
fields, ah, well actually, the first fields you
get, you've got version, you've got header length and suchlike, but the first fields in address terms,
you get the source and then the destination, interestingly. So it's the other way around, because by the time you've got to this
stage it's maybe a bit less important,
and you're gonna have to do quite a lot more
thinking anyway about
the packet coming in, so you don't need
to make this very fast decision to
decide to ignore it.
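To make the contrast with Ethernet concrete, here is a small Python sketch that reads those fields out of an IPv4 header. It is an illustration under stated assumptions, not a full parser: `parse_ipv4_header` is a made-up name, options and checksums are ignored, and the offsets used are the standard ones (version and header length share byte 0, source address at bytes 12 to 15, destination at bytes 16 to 19).

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Pull version, header length and both addresses from an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    version = packet[0] >> 4
    # Header length is stored as a count of 32-bit words.
    header_len = (packet[0] & 0x0F) * 4
    # Source comes before destination, the reverse of Ethernet.
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return {
        "version": version,
        "header_len": header_len,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }
```

The addresses come back in dotted-quad form, one decimal number per byte, which is why no part can ever exceed 255.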
The other thing about IP addresses is this.
Ethernet starts out with the idea
that you've got a bunch of stations
attached to a particular cable. IP starts
with the idea that you've got multiple
networks with different things in them,
maybe different technologies, and you
want to start interconnecting between
these networks. So it's a much bigger
thing to start with: you're trying to build a
much bigger network than you're thinking
about when you're building an Ethernet, where you've got
things that are generally quite local. That is known as a
local area network, a LAN.
Nowadays you'd think of different sizes of this: you might think of a metropolitan area network,
or a MAN; you might think of a wide area
network, or a WAN. But yeah, it's a much bigger network.
And so you've got a lot more information, in
some sense, to encode. If you try using
MAC addresses, they have no structure other than this manufacturer and this station ID, and so if
you want to look up what to do when
you've got an Ethernet frame that's arrived,
and where you should send it,
you just need a big table of everything,
which on the internet is completely
infeasible: you've got no structure. With an IP address
you can do
prefix matching, so you can represent
blocks of IP addresses with a single
entry. You no longer have to have a
table with two to the 32, about four
billion, entries in it to make a decision about what to do. You match the address against the
entries that you've got, which cover ranges, and so because it's got more structure embedded in it, you can
sort of compress that information, you can make that lookup process quicker and you
can help it to scale out. That's why
IPv4 has this ability to do this global
network, whereas it would be quite
difficult to run a really global Ethernet just
relying on standard Ethernet-style protocols to do that. It wouldn't work so well. And that's why
IPv6 has, well, that's one of the reasons it's got this bigger address space. It's got 128 bits per address rather than
32 bits per address. That gives you more addresses, but then, because they're
structured, you have the possibility to compress
them down, so you don't have to have
two to the 128 entries in each IPv6 router, 'cause that's a lot. [So the IP address is a bit more like a street address...
OK, you can break it down into city, street, number. Whereas a MAC address might be more like the phone
number for that house. It would be hard to find that house from the phone number, and therefore having
a huge list of phone numbers is really not useful?] Yeah. The analogy's not that far away, I
think, and certainly the idea that you've got
structure in the address, and structure
that allows you to compress the entries,
allows you to make decisions from far
away as well. So commonly in an IP
network some of these routers will have
a complete table, so they have, I think
currently we're running at something over 500,000 entries in our table, even given
the fact you can compress these down and
express ranges rather than having to
express each address individually. But some of these routers, particularly if they're
routers in the middle of a network, may not, for whatever reason, erm, so they may have default routes,
for example. So it may be the case
that you've got two hundred thousand
entries and then you've got one entry which covers everything else, and that's the default route. If you're on a smaller network
that's connected to a much bigger
backbone network, quite commonly you'll end
up with some addresses for the
things that are inside your network and
for your customers that are connected
directly to you, and then you might have a
default route which says everything else
that's not one of your own customers goes
up to whoever provides you with service. And so you can sometimes have interesting cases
where, for example, this smaller network
is multihomed, as they call it, so it's
connected to multiple bigger networks, and a mistake happens in the configuration and you
end up being used as a through route for
some reason. And this is then, well, it's not
good for you because you're now carrying
probably a lot more traffic than you can
cope with, and it's not good for anybody
else who's connecting to these two large
networks because now their traffic is
taking this completely unnecessary hop,
going further than it needs to, taking longer than it needs to to get there, perhaps going through a
bottleneck and getting slowed down. So generally it's a bad thing.
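The prefix matching and default route described earlier can be sketched with Python's standard `ipaddress` module. Everything specific here is invented for illustration: the `lookup` function, the `ROUTES` table and the next-hop labels are hypothetical, and a linear scan stands in for whatever lookup structure a real router uses. The rule itself is the usual one: among the entries that cover the address, the most specific (longest) prefix wins, and `0.0.0.0/0` covers everything else.

```python
import ipaddress

# Hypothetical table: two customer ranges plus a default route.
ROUTES = [
    ("10.1.0.0/16", "customer-a"),
    ("10.1.2.0/24", "customer-b"),
    ("0.0.0.0/0", "upstream"),  # default route: matches any IPv4 address
]


def lookup(table, address):
    """Return the next hop of the longest prefix that covers the address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            # Prefer the entry with the most prefix bits.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return None if best is None else best[1]
```

With this table, `lookup(ROUTES, "10.1.2.3")` picks the /24 entry over the /16 that also covers it, and any address outside both ranges falls through to the default route. Three entries are enough to make a decision for every one of the roughly four billion IPv4 addresses.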
[Sat nav sending HGVs through a...] Exactly, yeah, and if you make
a mistake that route suddenly becomes visible to the sat nav and suddenly all the HGVs go: ah, it's really fast to go that way.
It looks like it should be really fast to go that way. In practice it won't be, because now they've all piled in.
So what have I done, I've done a sorting of the data and the approach I've done is something based on similarity measures.
Document four has 18 "my"s and
five "horse"s and document five is only
about the word "my" not about horses and so the first thing you would do....
