PAUL COLLINGE: Hi there.
I'm Paul Collinge.
I'm a senior program manager
with the Microsoft 365 team.
I'm really pleased to be
here presenting to you today
on our Microsoft 365 Network
Connectivity Video Series.
And specifically,
we're going to be
talking about enterprise
network design for the cloud
era.
So, this first slide
you've probably
seen in the first
of the video series,
but we'll just go over it
again for context of how
it applies to this session.
As you can see on
the screen, there
are the four
Microsoft 365 network
connectivity principles.
These are the principles
we need to follow
to get the best performance
out of the Office 365/Microsoft
365 service.
So, in a nutshell, the first is optimize Microsoft 365 traffic.
We publish the endpoints through a REST API service,
and we mark them in certain ways
so that you can differentiate how you treat them,
because there are many of them and they all have different properties.
That marking means you can send some direct,
and others can be treated like normal web browsing traffic.
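As an illustration, here's a minimal sketch of querying that REST API service--the Office 365 IP Address and URL web service--and counting endpoint sets by category. It assumes the worldwide service instance and Python's requests library; the service expects a client-generated GUID as the clientrequestid parameter.

```python
import uuid
from collections import Counter

import requests

SERVICE_URL = "https://endpoints.office.com/endpoints/worldwide"

# The web service requires a client-generated GUID per requesting client.
endpoint_sets = requests.get(
    SERVICE_URL,
    params={"clientrequestid": str(uuid.uuid4())},
    timeout=30,
).json()

# Each endpoint set is marked with a category: Optimize, Allow, or Default.
print(Counter(e["category"] for e in endpoint_sets))
```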
The second principle is enabling local egress--getting
that traffic out to the internet
as close to the user as possible,
with minimal latency.
That's so it can hit Microsoft's
infrastructure, which
is normally very close to
where that user may be.
Third, relatively similar
to the second but enabling
direct connectivity.
And this means
avoiding hairpins.
So, avoiding sending traffic
through a head office location
to come out to the internet;
avoiding VPN hairpins,
where we go in and out
of the corporate network
to get to the
destination; or sending it
through third-party
cloud services
to come back out to Microsoft.
And we'll look into what that
means during this session.
And finally,
modernizing security.
Obviously, this
network transformation
means routing traffic
away from our security
stack that we've designed
to protect our corporation
or enterprise.
The modernized security
for SaaS principle
is about how do we deliver
and technically improve
on that security while still
avoiding that infrastructure
that we've built for
the on-premises world.
Let's dive in.
So, this first
slide, if you've seen
me present over the last few
years, you'll find it familiar.
I like to use it to
set the scene here.
What is the challenge?
Why do we need to
change what we're
doing that's worked very
well for many years?
Well, this slide helps
articulate what an enterprise
network looks like
before that cloud
journey, in that the network,
the security infrastructure
is designed around
the on-premises world.
So that dotted line you see
on the screen is our corp net.
And inside that dotted line
is where we keep our crown
jewels--essentially, the business:
our data, our services, our applications
all sit within that.
So, generally from an
enterprise network perspective,
if we're on that
network, we're trusted.
If we're outside that
network, we're untrusted.
So, to get anything outside
of that, which is generally
web browsing, we have to go
through a security stack.
That means proxy servers,
DLP, firewalls, CASBs,
intrusion prevention, and so on.
This is expensive equipment,
designed to protect that corp net
from unknown and untrusted endpoints outside.
And that's perfectly
normal because everything
that lives within that
corp net is, as I say,
what we trust as a business.
Everything outside is untrusted.
So that means all our
network infrastructure
is built around that.
This stack is
expensive to implement,
so we tend to see these
in head office locations,
maybe one or two in
each regional location
of the world--you know, one
in APAC, one in North America,
one in Europe, and so on.
And then, we build a network
around that, using something
like an MPLS network to
connect our branch offices back
to our head office location to egress out.
So, anything that needs to go out--web
browsing, for example--goes that way,
as does access to those resources that we
tend to host at our head office location.
And our roaming users
are handled in the same way
as a branch office, in that we use a VPN
in this instance to route all that
traffic--web browsing, accessing mail,
file services, whatever--we come
in via the VPN,
and then we can access
all those resources.
And if we want to go
out to the internet,
we hairpin through that
corporate network back out.
And that has served enterprises
very well for many years
because it's there
and it protects
that corp net, as I say, from
anything unknown, untrusted,
that might be trying to
attack it from outside.
Now, the problem comes when
we start to move to the cloud.
Microsoft 365 is often
one of the first things
that pushes this door open.
So, we start to move those
crown jewels out of that corp
net--our data, our services, our
applications start to move out
into the cloud.
And the problem is, if we
treat the traffic connecting
to those endpoints
in the same way
as we treat a user browsing
YouTube or any website,
then we start to
run into trouble
pretty quickly, because all that
work--that security stack, that
back-haul to the
location where we egress--is
expensive, both in latency
and in the physical cost of
paying for that infrastructure
to deal with it.
The impact on Microsoft 365
generally is poor performance.
We're increasing latency as
we do all these functions,
as we VPN in, as we back-haul
from the branch offices,
we go through the proxy,
through the DLP device.
The end result from a
Microsoft 365 perspective
is that we get poor-quality
calls, sluggish behavior
in Outlook, slow uploads
and downloads into OneDrive
as examples.
We're going to focus here
today on Microsoft 365,
but this problem exists
for things like Azure.
For connecting to Azure
through this stack,
the same problems apply.
And then, there are other
things to think about.
There's been a shift in the
way that we deal with Windows
and Office updates.
Previously, we'd roll out a
service pack once every year,
couple of years, and there'd
be a slow, measured approach.
But in the cloud-first world,
everything's evergreen.
We provide updates
on a regular basis
so that the service is
always evolving and improving.
So, that means that fairly
large update packages can be
pushed out on a regular basis.
And what we've seen
with some customers
is that the MPLS circuit there,
for example, or the VPN
concentrator in the time of COVID,
where everyone's coming in
at the same time--that can be
an enormous volume of traffic
to deal with all of a sudden,
and it can take away resources
that are also used and required
for business-critical services.
And it's important
to note here--again,
we're going to be focusing on
Microsoft 365 today--but this
problem exists for cloud
services in general.
Most cloud services will want
to follow very similar
principles to what
we're going to discuss here.
And the network design
we're going to look at
is designed with that in mind.
There's a great anecdote
I can provide here.
A customer I've been dealing
with over the past couple
of years, really,
on and off has moved
from pretty much the
model you see here
to nearly complete cloud usage.
So, they said that 80
percent of their traffic
three years ago was
internal to that corp net
and 20 percent external.
And in three years,
that number has flipped.
So this network model
does not lend itself
well to that world, where
80 percent of the volume
of that traffic is going to some
external cloud-based resource
or the internet.
And they have gone through
some of the changes,
the improvements that
you'll see here today.
And you'll see, it's
a much better fit
for that cloud-first world.
This network infrastructure
served us very well
for a long time,
but it does need
to change as we move
into the cloud era.
So, let's look at that
traditional network
approach just to give
a baseline for what
we're going to move from.
So, in this example
here, on the left
you can see the Sydney
branch office,
where we've got users
that send all their traffic
from that location through the MPLS
back to the head office
location in Singapore.
Then, to get out
to the internet,
we go through a proxy,
we have those DLP devices
we discussed on
the previous slide,
and then that means, from
an Office 365 perspective,
that user accessing
Teams, SharePoint,
Outlook from Sydney,
from Australia,
is routing that traffic
through Singapore,
and it hits Microsoft's
infrastructure in Singapore.
The common result
we see with this
is that we see high latency.
I mentioned earlier, latency
is our key currency here.
We want this as low as
possible to hit what we call
the Microsoft 365 front door.
When I use that phrase "front
door," what I mean is Microsoft
has infrastructure
around the globe.
We have a global network
with internet peering locations
in pretty much every metropolitan
area that has an internet exchange.
Microsoft's network is there.
We also have what we
call a "front door."
This is where the client
connects to consume the service.
So, a user's data
here in Sydney might
be hosted in our
Singapore or APAC data
center region, but
that user in Sydney,
with the right approach, can
use the service front door
for Exchange, for Teams, for
SharePoint and OneDrive locally
in Australia.
So, you can see
there the latency
to do this connectivity
from Sydney
is around 140 milliseconds.
So, that's pretty high.
I mean, Teams, for example,
has a round-trip time aim
of being under 100 milliseconds
to the relay
server--the front
door for Teams.
Now, instantly, in this
scenario, we're over that.
Teams will work, of
course, but we're
pushing that latency envelope.
So, we may see call quality
issues and meeting quality
issues.
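As a rough way to see where you stand, here's a small sketch that approximates the round trip to the nearest front door by timing a TCP connect to outlook.office365.com, one of the Optimize-marked FQDNs. It's an approximation of front-door latency, not the Teams relay measurement itself.

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Best-of-N TCP connect time in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # The three-way handshake time approximates one round trip.
        with socket.create_connection((host, port), timeout=5):
            times.append((time.perf_counter() - start) * 1000)
    return min(times)  # best-of-N filters out transient jitter

print(f"Front door RTT: {tcp_connect_ms('outlook.office365.com'):.1f} ms")
```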
The other problems are
things like high load on that
centralized egress. All
this Exchange traffic,
this Teams traffic--imagine
we're having an all-hands call,
so all that Teams traffic .
. . everyone's on their email
and SharePoint at the same
time--that's an enormous load
going through that
infrastructure,
that internet proxy, that
didn't go through there before.
So that can then have
a knock-on impact on things
like web browsing and other
business-critical use of
that egress, which
is obviously not something
we want.
The other problem: if we think
about the MPLS, the circuit
between Sydney and
Singapore, that's
obviously used for other
business-critical traffic.
We're again putting high
load through that MPLS
as we use all these services.
Again, an all-hands Teams call,
for example--all those users
pulling that traffic back and
forth over that WAN--can then
impact other things.
And if you think back
to the previous slide,
I was talking about updates.
That MPLS can be overwhelmed
with the number of packages
of updates being
pulled in, for example.
And this is an expensive model
to deliver in many locations
around the globe.
MPLS circuits can be extremely
expensive in some areas,
much more expensive in many
cases than local internet.
The other thing
is, I always like
to think about this in
terms of the ability
to adapt to changing
needs and requirements.
We could throw money at
the proxy or the MPLS
here to boost its capacity
to deal with this problem,
but then, what happens when the
next cloud service comes along
that needs to use this model?
Do we throw more money at it?
And we're not really
solving the problem.
We've still got
that latency there
that we can't throw
money at and remove
using this particular model.
And from an Office
perspective, as I say,
the end result will be
poor-quality Teams calls
and meetings, sluggish
Outlook behavior,
slow file transfers into
OneDrive--none of which
anyone at Microsoft
wants our customers to have
and, I'm sure, none of which
anyone deploying Office 365
would want their users to have.
So this model is inherently
not well designed
for SaaS services, for
cloud services in general.
So, what do we do to fix it?
Well, the most popular way
we're seeing customers evolve
to deal with this problem
is swapping the router
they use to connect that
MPLS between the two locations
for an SD-WAN device.
And you'll see there we've got
some additional connections
into that SD-WAN
in Australia now.
We've got two
commodity ISP lines
that connect us to
the internet locally
in Sydney in this example.
We've still got the
MPLS circuit there
that's plugged into the
SD-WAN that connects us.
But what this now
allows us to do
is use the connectivity
principles.
From an Office
perspective, we highlight
those endpoints
that are critical,
the Optimize-marked
endpoints, as we call them.
So, these are the high-volume,
the latency-sensitive,
the high-connection-count
endpoints--all the things that
are going to put pressure on the
MPLS, on the proxy,
on the centralized egress.
These are marked as Optimize.
And there are literally,
at the time of recording,
only four FQDNs
and 20 IP subnets
that correspond to those.
And they don't change
very often at all.
So, what we can do
with this SD-WAN device
is say, OK, let's read
Microsoft's REST API service
and pull that information in.
And if we see connectivity
going from a client
to those Optimize
endpoints, we send them
out of one of our two ISPs
that are connected in here.
And what that means is that
that user connecting to Teams,
to OneDrive, to Exchange--they
can connect via the ISP
to the service front door
that Microsoft operates
in Australia.
So this might be at our
network edge or in the data
centers themselves.
So, what that means
is we can then
deliver much higher performance
levels for that user,
and we've removed that
problem of high volume
on the MPLS, the WAN, and also
on the centralized egress.
And that MPLS circuit is
still there for any traffic
we still want to
send by that path.
So, the bulk of the
Office 365 endpoints
can be sent by here
with no problem.
We've solved the
problem by optimizing
that small subset of endpoints.
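Here's a sketch of the kind of logic the SD-WAN automates: pull the endpoint data from the web service and emit the Optimize FQDNs and IP subnets that should egress via the local ISP links instead of the MPLS. The field names follow the published web service schema; the requests library is assumed.

```python
import uuid

import requests

resp = requests.get(
    "https://endpoints.office.com/endpoints/worldwide",
    params={"clientrequestid": str(uuid.uuid4())},
    timeout=30,
)
optimize = [e for e in resp.json() if e["category"] == "Optimize"]

# Collapse to the unique FQDNs and IP subnets that should break out
# locally via the ISP links instead of traversing the MPLS.
urls = sorted({u for e in optimize for u in e.get("urls", [])})
subnets = sorted({ip for e in optimize for ip in e.get("ips", [])})

print(f"{len(urls)} Optimize FQDNs, {len(subnets)} IP subnets")
for subnet in subnets:
    print("send direct:", subnet)  # feed into the SD-WAN routing policy
```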
So, once we've
implemented this model,
the common result, as
you can see here, is that
the latency to the service
front door from Sydney
in this scenario would
be around 10 milliseconds.
So that's an
enormous improvement
in latency to the
service front door.
And this will then reflect
on the service quality.
You'll have high-quality
Teams calls and meetings;
high-quality, responsive
Outlook functions;
high-speed transfers into
OneDrive for Business.
And it's removed all that
load from the central egress
in the MPLS circuit.
And what we're seeing many
customers do as they implement
this model--you can see
on the slide now that
the MPLS circuit is gone--is
reduce the size of that MPLS circuit,
saving cost.
As we're starting to push out
the bulk of our traffic--and I
think back to that customer
that's 80 percent cloud based
now--there's no need, really,
for an MPLS anymore because all
those services are based
out on the internet.
And we've found a secure way
to connect to those directly
without having to use that
internal infrastructure that
was built for the
on-premises world.
And you can see on
this screen here,
this is extensible to anything
else we deem necessary.
So, we can use
the local breakout
for things like Windows
and Office updates,
any other cloud
service we connect to,
where we have trusted,
known endpoints.
We can configure
the SD-WAN device
to just send those direct.
And again, you can see
here with the green lines,
we can use these
SD-WAN devices to just
build fault-tolerant tunnels
over the internet path
back to head office for
that 20 percent of traffic
that still needs to go there.
So, a very interesting
and very powerful
model that we're seeing
customers very successfully
implement.
And many are
reporting to Microsoft
that they're seeing significant
cost savings on their network
OpEx by implementing this--many
up to about 60 percent
of year-on-year network
spend--with the added bonus of
great performance
and great usability
for cloud services.
Just expanding on that
model a little further,
if we think about
that previous model,
where we're still sending
our web browsing traffic back
to head office to go
through those proxies--now,
in the cloud-first world there's
a much more efficient way
of doing that.
And many customers are
heading down this path,
where instead of hosting their
own internet browsing proxies
on the premises in the
head office data centers,
they're switching to things
like cloud-based secure web
gateways--a fancy term
for Security as a Service,
essentially: proxies and security
services in the cloud.
So you can see in this example,
we've got two branch offices
again: one in Sydney, the one
we were looking at before,
and one in Auckland,
in New Zealand.
So, we can use this
cloud-based secure web gateway,
and this company that
provides that might
have nodes in Sydney, in
Singapore, in Tokyo, in London,
and so on.
So, we find the
nearest one, we fire
all our internet
browsing traffic
at this cloud-based
secure web gateway,
and it connects to the
internet from there.
So, we're utilizing
that lower-cost bandwidth
to break out locally.
And this is where we're
moving towards that,
kind of, internet-first model,
where the corp net--the corporate
network--retracts in size
to the point where,
well, it's not there anymore,
which is the end goal many of
our customers are aiming for.
But this enables customers
to provide secure web
browsing for their
clients, their branches,
utilizing that lower-cost,
high-speed bandwidth they've
got at a branch office
and still maintaining
control of that browsing.
So, we can access things like
Windows Update through here.
But one of the points
we wanted to make here,
those Optimize
endpoints I mentioned
before, we would still recommend
very strongly that they
are sent direct.
And if you look at
this example, you can
see why that might be.
So, in this instance, we're
talking to the nearest cloud
proxy, which is in Sydney.
So, for the Sydney branch
office this is fine.
It would be low latency
to hit that node.
But from Auckland, there
isn't one in New Zealand,
so the nearest one is in Sydney.
So, we've got additional
latency again.
And this is the problem that
principle number 3 addresses:
avoiding hairpins.
This I would consider a hairpin,
where we're sending traffic
away from where it needs to
go before it hits Microsoft
because you can see
on the slide here,
we've got a New Zealand
service front door.
So, we have infrastructure
in New Zealand
which will expand considerably
as a new data center
there opens.
So what we would
always recommend,
even with these cloud-based
secure web gateways, is that we
utilize that SD-WAN to
intelligently read Microsoft's
REST API service and say, if I
see those Optimize endpoints,
I'm just going to
send them direct.
And I can then hit the
service front door,
Microsoft's network, our
infrastructure, instantly,
within a few milliseconds
in these cases here.
And we've still,
again, got those links
back to our head office should
we need them for anything
that remains there.
So, this is a very
popular model I'm seeing
customers use more and more.
It helps them really
sweat the asset,
the SD-WAN, the local high-speed
bandwidth that they've put in.
And it improves things
like web browsing speed.
And if you think,
also, in terms of what
this looks like from
a security perspective
(I'll talk about how we
deal with that problem
on the next slide),
you can essentially
think of a branch
office as a user,
a VPN user, a remote user.
You know, in the time of
COVID, where everyone's
working at home, if we put
these principles in place,
a user becomes exactly the
same as the branch office.
And if you imagine
in this scenario,
if we've got this model in
place, we can tell that user:
any web browsing, just find your
nearest cloud web gateway--we
configure the machine, the PC,
to do that--and for our Optimize
endpoints, we use VPN
forced-tunnel exceptions--split
tunneling--to send
that traffic direct.
So, we're removing
the bottlenecking
of all that traffic that's
coming through my VPN
concentrator by applying
these principles.
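Here's a small sketch of that forced-tunnel-exception decision: given Optimize IP subnets, decide whether a destination should bypass the VPN tunnel. The two subnets shown match the published Optimize list at the time of writing; in practice you'd refresh them from the web service shown earlier rather than hard-coding them.

```python
import ipaddress

# Example subnets from the published Optimize list (SharePoint/OneDrive
# and Teams media); refresh these from the endpoints web service.
OPTIMIZE_SUBNETS = [ipaddress.ip_network(s) for s in (
    "104.146.128.0/17",
    "52.112.0.0/14",
)]

def bypass_vpn(dest_ip: str) -> bool:
    """True if the destination should be excluded from the VPN tunnel."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in OPTIMIZE_SUBNETS)

print(bypass_vpn("52.113.10.5"))    # True: Teams media, send direct
print(bypass_vpn("93.184.216.34"))  # False: keep on the normal path
```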
And that's the great thing:
When we implement these
changes--think back to the
statement I made before about
allowing business to be
agile--the customers I've seen
that have gone down this path
and implemented it to a large
degree have really had to do
very little to deal with 100
percent home working because
there is no bottleneck
on premises, there's no VPN
concentrator that has to deal
with all those users connecting
because most of that traffic is
sent direct to the service.
One of the other
advantages of this model
is these SD-WAN devices
can provide other services,
like local DNS.
Many cloud services, including
Office 365, rely on local DNS
to find the local front door,
the nearest node for you
to connect to.
Another benefit of this SD-WAN
approach is that the SD-WAN
can deal with that problem for you
by doing local DNS.
So, we're not having
to rely on that DNS
in Singapore--which would mean
we might get front
doors in Singapore, because
that's where DNS found them.
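As a quick illustration of why resolver placement matters, resolving an Optimize FQDN from different sites (or against each site's resolver) returns different front-door addresses; this sketch just prints what the local resolver hands back.

```python
import socket

# The DNS answer for a front-door hostname is shaped by where the
# resolving server sits; run this from each site to compare answers.
addrs = sorted({info[4][0] for info in socket.getaddrinfo(
    "outlook.office365.com", 443, proto=socket.IPPROTO_TCP)})
print(addrs)
```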
So, moving on from this,
the obvious question
is, OK, we're going
to bypass our security
infrastructure that we
know and love and trust.
What about security?
You know, what's the risk to
my business of doing this?
I've built this
infrastructure that
inspects my traffic
as it goes through,
and it protects my business.
The way I like to look at
it is that in this example,
we're not talking about
sending every piece of traffic
to Office 365 direct, just those
significant Optimize endpoints.
And if we take
Exchange as an example,
in the on-premises
world, most corporations
will have Outlook
connect to Exchange
without being inspected, without
being sent through a proxy.
You know, so we have a direct
connection between the two.
In the cloud world,
all you've done
is really lifted that
Exchange server and put it
into Microsoft's cloud.
So, we don't need to be treating
that traffic the same way
as we do user browsing,
as I mentioned before.
So, from a security
perspective, what
we do is we deal
with the security
as the data comes in
and out of the service,
not as the client
connects to the service.
That's the difference.
So, how do we do that?
Well, from a
Microsoft perspective,
we have various features
already built into the service--that
can be turned on, or are there
automatically--that often
replicate the things we're
trying to do at the network
edge in that security
stack, in terms of AV scanning, for example.
DLP can be applied
against those endpoints.
We've got Exchange
Online Protection.
We've got Office ATP.
And all these things
often replicate--and
probably do a better
job of, because they understand
the service and the protocol
better--securing that user.
But one of the key components
of this model is zero trust.
I've touched on it
throughout this presentation.
It's really about moving away
from trusting the fact that we
know the user is on a
trusted network--i.e.,
our corp net--and they
have the credentials;
therefore, you're good to go,
you connect to
anything you need.
That model is
inherently insecure.
And that's the benefit of
moving to this cloud model:
we can apply this
zero trust in phases,
and it allows us
to make much more
granular, real-time security
decisions that allow access,
rather than just
relying on the fact
that the user's in a trusted
place and has credentials.
So, there's another
session that we'll
deliver as part
of this that will
cover this in much more detail.
But I just wanted to touch on
some of the elements here that
might be of interest
in terms of . . .
we use identity with strong
authentication across
the estate.
We use the visibility
of the devices,
using things like
Intune, where we
can see the
compliance, the health
status, that it's domain
joined, before we grant access
to our resources.
Applications--again, looking
at things like shadow IT.
And we move that
security to the data:
we move away from
perimeter-based data protection,
because your data doesn't live
within that perimeter anymore.
That's the key thing here:
we move to
data-driven protection.
We classify data.
We ensure it's encrypted,
and we restrict access
based on policy.
We look at infrastructure here,
where we can automatically
flag and block risky behavior.
We employ least-privilege
access principles, for example,
so we could, you know, remove
a user's privileges, rights,
access if they move
roles, for example.
We don't assume that
that access goes
with them to their new role.
We apply it.
And then, I mentioned network.
We don't trust the
network anymore.
That network is going to
reduce in its capacity
to host things.
And there's no reason
why we should just
trust it in the first place.
And one of the key
elements of this
is Azure AD conditional access,
as it says on the screen:
maintaining control in the
cloud-first, mobile-first
world.
We can use elements of
behavior, of access,
to change the access decision.
Again, we're not
trusting this user
and letting them at the data
just because they've got credentials.
We can look at things,
as I said before,
like domain-joined devices.
Is this a device . . .
I've not seen this user
connect to this service before.
If so, we ratchet up
or we block access.
We trigger MFA, maybe,
if we've not seen a user
log in from here before.
Is the IP range something
that we need to consider here?
Is the user connecting
from a region of the world
we don't want access from?
All these things can be
analyzed in real time.
We look at that risk, and
we apply appropriate policy
rather than just
trusting the fact
that this user has
the credentials
and they have access.
As I say, we'll dig
into this more in one
of the later modules.
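As a hedged sketch of what one such policy can look like, here's the shape of a conditional access policy created through Microsoft Graph: require MFA for all users and apps outside trusted locations, starting in report-only mode. The body follows the documented conditionalAccessPolicy resource; get_graph_token is a hypothetical placeholder for your own token acquisition, and the exact schema should be checked against the current Graph documentation.

```python
import requests

def get_graph_token() -> str:
    # Hypothetical placeholder: in a real deployment, acquire a token
    # with the Policy.ReadWrite.ConditionalAccess scope (e.g., via MSAL).
    raise NotImplementedError

# Require MFA when outside trusted locations; report-only mode first
# so the impact can be observed before enforcement.
policy = {
    "displayName": "Require MFA outside trusted locations",
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
        "clientAppTypes": ["all"],
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
        "locations": {
            "includeLocations": ["All"],
            "excludeLocations": ["AllTrusted"],
        },
    },
    "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
}

resp = requests.post(
    "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
    headers={"Authorization": f"Bearer {get_graph_token()}"},
    json=policy,
    timeout=30,
)
resp.raise_for_status()
```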
And one of the key questions
that always comes up
around zero trust and the
modern network approach
is, OK, I've implemented
the feature called
Tenant Restrictions, which
means that I block access
to any tenant other than those
which I explicitly trust.
So, how does that work if you
allow users to connect directly,
given that that feature
is applied on proxies?
Well, the answer is, if you look
at the model on the screen now,
we've still got those local
breakout links via the SD-WAN
from the branch office.
But there is a separation
between the authentication
and the connection
to the service.
So, we optimize those
marked endpoints--the
TCP connection to Exchange,
the UDP connection to Teams,
and so on--but the authentication
is still done through the proxy.
We don't need to optimize that.
That's a relatively
infrequent function.
We ask for a token as
we access a service,
and we can cache that
for a period of time.
So that authentication
can still continue
to come through the MPLS,
through the VPN, whatever
model we have to come into
the corporate network.
And we then apply that
feature as normal.
Then, if the user
is trying to access
a tenant that is not trusted,
we refuse to issue the token.
So, even if I can make a
TCP connection to Exchange,
I do not have a
token to access it.
So, that TCP session
is useless for me.
These features still fully
apply and deliver the security
and control that we need,
and implementing this model
doesn't break that.
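For reference, here's a sketch of what the proxy does for Tenant Restrictions: on traffic to the Azure AD login endpoints, it inserts the two documented restriction headers naming the tenants you permit. The tenant list and directory ID values here are hypothetical placeholders.

```python
# The three login hosts and the two header names are the documented
# ones for tenant restrictions; the values below are placeholders.
LOGIN_HOSTS = {
    "login.microsoftonline.com",
    "login.microsoft.com",
    "login.windows.net",
}

PERMITTED_TENANTS = "contoso.com"                      # placeholder
DIRECTORY_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

def add_tenant_restriction_headers(host: str, headers: dict) -> dict:
    """Insert the restriction headers on authentication traffic only."""
    if host in LOGIN_HOSTS:
        headers["Restrict-Access-To-Tenants"] = PERMITTED_TENANTS
        headers["Restrict-Access-Context"] = DIRECTORY_ID
    return headers

# The direct service connections (Exchange TCP, Teams UDP) never pass
# through this, which is fine: without a token from a permitted tenant,
# those sessions are useless.
print(add_tenant_restriction_headers("login.microsoftonline.com", {}))
```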
Hopefully, that session
was useful for you
to see an insight into how
Microsoft's customers are
evolving their
network infrastructure
and their connectivity to be
much more agile, providing
high performance,
optimizing user
connectivity to the service,
and also giving themselves
much more agility to
deal with problems
that come at them, like
the sudden shift to remote
working that this COVID-19
crisis has forced on all of us.
For additional
resources, we
have our Microsoft 365 Network
Connectivity Video Series page,
where we'll cover in more
detail some of the elements
I've touched on here around
the principles of zero trust.
And we've also got some
interesting tooling that
will help you highlight where
your network connectivity
is aligning to these principles
or where it may not be
and some remediation actions
you can take from there.
And with that, I'll wrap up.
I hope this session
was useful for you.
And I hope the rest of the
sessions are useful, also.
Thank you.
